

# Transcribing a medical conversation
<a name="transcribe-medical-conversation"></a>

You can use Amazon Transcribe Medical to transcribe a medical conversation between a clinician and a patient using either a batch transcription job or a real-time stream. Batch transcription jobs enable you to transcribe audio files. To ensure that Amazon Transcribe Medical produces transcription results with the highest possible accuracy, you must specify the medical specialty of the clinician in your transcription job or stream.

You can transcribe a clinician-patient visit in the following medical specialties:
+ Cardiology – available in streaming transcription only
+ Neurology – available in streaming transcription only
+ Oncology – available in streaming transcription only
+ Primary Care – includes the following types of medical practice:
  + Family medicine
  + Internal medicine
  + Obstetrics and Gynecology (OB-GYN)
  + Pediatrics
+ Urology – available in streaming transcription only

You can improve transcription accuracy by using medical custom vocabularies. For information on how medical custom vocabularies work, see [Improving transcription accuracy with medical custom vocabularies](vocabulary-med.md).

By default, Amazon Transcribe Medical returns the transcription with the highest confidence level. If you'd like to configure it to return alternative transcriptions, see [Generating alternative transcriptions](alternative-med-transcriptions.md).

For information about how numbers and medical measurements appear in the transcription output, see [Transcribing numbers](how-numbers-med.md) and [Transcribing medical terms and measurements](how-measurements-med.md).

**Topics**
+ [Transcribing an audio file of a medical conversation](batch-medical-conversation.md)
+ [Transcribing a medical conversation in a real-time stream](streaming-medical-conversation.md)
+ [Enabling speaker partitioning](conversation-diarization-med.md)
+ [Transcribing multi-channel audio](conversation-channel-id-med.md)

# Transcribing an audio file of a medical conversation
<a name="batch-medical-conversation"></a>

Use a batch transcription job to transcribe audio files of medical conversations, such as a clinician-patient dialogue. You can start a batch transcription job using either the [StartMedicalTranscriptionJob](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartMedicalTranscriptionJob.html) API or the AWS Management Console.

When you start a medical transcription job with the [StartMedicalTranscriptionJob](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartMedicalTranscriptionJob.html) API, you specify `PRIMARYCARE` as the value of the `Specialty` parameter.

## AWS Management Console
<a name="batch-med-conversation-console"></a>

**To transcribe a clinician-patient dialogue (AWS Management Console)**

To use the AWS Management Console to transcribe a clinician-patient dialogue, create a transcription job and choose **Conversation** for **Audio input type**.

1. Sign in to the [AWS Management Console](https://console.aws.amazon.com/transcribe/).

1. In the navigation pane, under Amazon Transcribe Medical, choose **Transcription jobs**.

1. Choose **Create job**.

1. On the **Specify job details** page, under **Job settings**, specify the following.

   1. **Name** – The name of the transcription job.

   1. **Audio input type** – **Conversation**

1. For the remaining fields, specify the Amazon S3 location of your audio file and where you want to store the output of your transcription job.

1. Choose **Next**.

1. Choose **Create**.

## API
<a name="batch-med-conversation-api"></a>

**To transcribe a medical conversation using a batch transcription job (API)**
+ For the [StartMedicalTranscriptionJob](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartMedicalTranscriptionJob.html) API, specify the following.

  1. For `MedicalTranscriptionJobName`, specify a name unique in your AWS account.

  1. For `LanguageCode`, specify the language code that corresponds to the language spoken in your audio file.

  1. For the `MediaFileUri` parameter of the `Media` object, specify the name of the audio file that you want to transcribe.

  1. For `Specialty`, specify the medical specialty of the clinician speaking in the audio file as `PRIMARYCARE`.

  1. For `Type`, specify `CONVERSATION`.

  1. For `OutputBucketName`, specify the Amazon S3 bucket to store the transcription results.

  The following is an example request that uses the AWS SDK for Python (Boto3) to transcribe a conversation between a clinician in the `PRIMARYCARE` specialty and a patient.

  ```
  from __future__ import print_function
  import time
  import boto3

  transcribe = boto3.client('transcribe', 'us-west-2')
  job_name = "my-first-med-transcription-job"
  job_uri = "s3://amzn-s3-demo-bucket/my-input-files/my-audio-file.flac"
  transcribe.start_medical_transcription_job(
      MedicalTranscriptionJobName = job_name,
      Media = {
          'MediaFileUri': job_uri
      },
      OutputBucketName = 'amzn-s3-demo-bucket',
      OutputKey = 'output-files/',
      LanguageCode = 'en-US',
      Specialty = 'PRIMARYCARE',
      Type = 'CONVERSATION'
  )

  while True:
      status = transcribe.get_medical_transcription_job(MedicalTranscriptionJobName = job_name)
      if status['MedicalTranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
          break
      print("Not ready yet...")
      time.sleep(5)
  print(status)
  ```

The following example code shows the transcription results of a clinician-patient conversation.

```
{
    "jobName": "conversation-medical-transcription-job",
    "accountId": "111122223333",
    "results": {
        "transcripts": [
            {
                "transcript": "... come for a follow up visit today..."
            }
        ],
        "items": [
            {
            ...
                "start_time": "4.85",
                "end_time": "5.12",
                "alternatives": [
                    {
                        "confidence": "1.0",
                        "content": "come"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "start_time": "5.12",
                "end_time": "5.29",
                "alternatives": [
                    {
                        "confidence": "1.0",
                        "content": "for"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "start_time": "5.29",
                "end_time": "5.33",
                "alternatives": [
                    {
                        "confidence": "0.9955",
                        "content": "a"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "start_time": "5.33",
                "end_time": "5.66",
                "alternatives": [
                    {
                        "confidence": "0.9754",
                        "content": "follow"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "start_time": "5.66",
                "end_time": "5.75",
                "alternatives": [
                    {
                        "confidence": "0.9754",
                        "content": "up"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "start_time": "5.75",
                "end_time": "6.02",
                "alternatives": [
                    {
                        "confidence": "1.0",
                        "content": "visit"
                    }
                ]
                ...
    },
    "status": "COMPLETED"
}
```

## AWS CLI
<a name="batch-med-conversation-cli"></a>

**To transcribe a medical conversation using a batch transcription job (AWS CLI)**
+ Run the following code.

  ```
  aws transcribe start-medical-transcription-job \
  --region us-west-2 \
  --cli-input-json file://example-start-command.json
  ```

  The following code shows the contents of `example-start-command.json`.

  ```
  {
      "MedicalTranscriptionJobName": "my-first-med-transcription-job",
      "Media": {
          "MediaFileUri": "s3://amzn-s3-demo-bucket/my-input-files/my-audio-file.flac"
      },
      "OutputBucketName": "amzn-s3-demo-bucket",
      "OutputKey": "my-output-files/",
      "LanguageCode": "en-US",
      "Specialty": "PRIMARYCARE",
      "Type": "CONVERSATION"
  }
  ```
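You can check progress with `aws transcribe get-medical-transcription-job --medical-transcription-job-name my-first-med-transcription-job`. When the job status is `COMPLETED`, the response includes a `TranscriptFileUri` that points to the transcript JSON in your output bucket. The following sketch splits that URI into a bucket and key that you can pass to an S3 download call; the helper name `transcript_location` is our own, and it assumes the URI is either an `s3://` URI or a path-style HTTPS S3 URL.

```
from urllib.parse import urlparse

def transcript_location(transcript_file_uri):
    """Split a TranscriptFileUri into (bucket, key) so the transcript JSON
    can be downloaded with s3.download_file(bucket, key, local_path)."""
    parsed = urlparse(transcript_file_uri)
    if parsed.scheme == "s3":
        # s3://<bucket>/<key>
        return parsed.netloc, parsed.path.lstrip("/")
    # Path-style HTTPS URI: https://s3.<region>.amazonaws.com/<bucket>/<key>
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    return bucket, key
```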

# Transcribing a medical conversation in a real-time stream
<a name="streaming-medical-conversation"></a>

You can transcribe an audio stream of a medical conversation using either the HTTP/2 or [WebSocket](https://tools.ietf.org/html/rfc6455) protocol. For information on how to start a stream using the WebSocket protocol, see [Setting up a WebSocket stream](streaming-setting-up.md#streaming-websocket). To start an HTTP/2 stream, use the [StartMedicalStreamTranscription](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_streaming_StartMedicalStreamTranscription.html) API.

You can transcribe streaming audio in the following medical specialties:
+ Cardiology
+ Neurology
+ Oncology
+ Primary Care
+ Urology

Each medical specialty includes many types of procedures and appointments. Clinicians therefore dictate many different types of notes. Use the following examples as guidance to help you specify the value of the `specialty` URI parameter of the WebSocket request, or the `Specialty` parameter of the [StartMedicalStreamTranscription](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_streaming_StartMedicalStreamTranscription.html) API:
+ For electrophysiology or echocardiography consultations, choose `CARDIOLOGY`.
+ For medical oncology, surgical oncology, or radiation oncology consultations, choose `ONCOLOGY`.
+ For a physician providing a consultation to a patient who had a stroke, either a transient ischemic attack or a cerebrovascular accident, choose `NEUROLOGY`.
+ For a consultation around urinary incontinence, choose `UROLOGY`.
+ For yearly checkup or urgent care visits, choose `PRIMARYCARE`.
+ For inpatient hospitalist visits, choose `PRIMARYCARE`.
+ For consultations regarding fertility, tubal ligation, IUD insertion, or abortion, choose `PRIMARYCARE`.
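If your application chooses the specialty programmatically, the guidance above can be captured in a small lookup table. The visit-type keys and the helper name below are our own illustration; only the uppercase values are constants accepted by the API.

```
# Hypothetical mapping from visit type to the Specialty parameter value.
# The keys are our own labels; the values are API constants.
VISIT_SPECIALTY = {
    "echocardiography": "CARDIOLOGY",
    "electrophysiology": "CARDIOLOGY",
    "radiation-oncology": "ONCOLOGY",
    "stroke-follow-up": "NEUROLOGY",
    "urinary-incontinence": "UROLOGY",
    "yearly-checkup": "PRIMARYCARE",
    "inpatient-hospitalist": "PRIMARYCARE",
    "fertility": "PRIMARYCARE",
}

def specialty_for(visit_type):
    # Default to PRIMARYCARE for visit types not listed above.
    return VISIT_SPECIALTY.get(visit_type, "PRIMARYCARE")
```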

## AWS Management Console
<a name="streaming-medical-conversation-console"></a>

**To transcribe a streaming medical conversation (AWS Management Console)**

To use the AWS Management Console to transcribe a clinician-patient dialogue in a real-time stream, choose the option to transcribe a medical conversation, start the stream, and begin speaking into the microphone.

1. Sign in to the [AWS Management Console](https://console.aws.amazon.com/transcribe/).

1. In the navigation pane, under Amazon Transcribe Medical, choose **Real-time transcription**.

1. Choose **Conversation**.

1. For **Medical specialty**, choose the clinician's specialty.

1. Choose **Start streaming**.

1. Speak into the microphone.

## Transcribing a medical conversation in an HTTP/2 stream
<a name="http2-med-conversation-streaming"></a>

The following is the syntax for the parameters of an HTTP/2 request.

To transcribe an HTTP/2 stream of a medical conversation, use the [StartMedicalStreamTranscription](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_streaming_StartMedicalStreamTranscription.html) API and specify the following:
+ `LanguageCode` – The language code. The valid value is `en-US`.
+ `MediaEncoding` – The encoding used for the input audio. Valid values are `pcm`, `ogg-opus`, and `flac`.
+ `Specialty` – The specialty of the medical professional.
+ `Type` – `CONVERSATION`

To improve transcription accuracy of specific terms in a real-time stream, use a custom vocabulary. To enable a custom vocabulary, set the value of the `VocabularyName` parameter to the name of the custom vocabulary that you want to use. For more information, see [Improving transcription accuracy with medical custom vocabularies](vocabulary-med.md).

To label the speech from different speakers, set the `ShowSpeakerLabel` parameter to `true`. For more information, see [Enabling speaker partitioning](conversation-diarization-med.md).

For more information on setting up an HTTP/2 stream to transcribe a medical conversation, see [Setting up an HTTP/2 stream](streaming-setting-up.md#streaming-http2).

## Transcribing a medical conversation in a WebSocket stream
<a name="transcribe-medical-conversation-websocket"></a>

You can use a WebSocket request to transcribe a medical conversation. When you make a WebSocket request, you create a presigned URI. This URI contains the information needed to set up the audio stream between your application and Amazon Transcribe Medical. For more information on creating WebSocket requests, see [Setting up a WebSocket stream](streaming-setting-up.md#streaming-websocket).

Use the following template to create your presigned URI.

```
GET wss://transcribestreaming.us-west-2.amazonaws.com:8443/medical-stream-transcription-websocket
?language-code=languageCode
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE%2F20220208%2Fus-west-2%2Ftranscribe%2Faws4_request
&X-Amz-Date=20220208T235959Z
&X-Amz-Expires=300
&X-Amz-Security-Token=security-token
&X-Amz-Signature=Signature Version 4 signature 
&X-Amz-SignedHeaders=host
&media-encoding=flac
&sample-rate=16000
&session-id=sessionId
&specialty=medicalSpecialty
&type=CONVERSATION
&vocabulary-name=vocabularyName
&show-speaker-label=boolean
```
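As a sketch of how the query parameters above fit together, the following Python snippet assembles the non-signature portion of the request URL. It deliberately omits the `X-Amz-*` Signature Version 4 parameters, which must be computed as described in [Setting up a WebSocket stream](streaming-setting-up.md#streaming-websocket).

```
from urllib.parse import urlencode

# Non-signature query parameters from the template above.
params = {
    "language-code": "en-US",
    "media-encoding": "flac",
    "sample-rate": "16000",
    "specialty": "PRIMARYCARE",
    "type": "CONVERSATION",
    "show-speaker-label": "true",
}
endpoint = ("wss://transcribestreaming.us-west-2.amazonaws.com:8443"
            "/medical-stream-transcription-websocket")
request_url = endpoint + "?" + urlencode(params)
```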

To improve transcription accuracy of specific terms in a real-time stream, use a custom vocabulary. To enable a custom vocabulary, set the value of `vocabulary-name` to the name of the custom vocabulary that you want to use. For more information, see [Improving transcription accuracy with medical custom vocabularies](vocabulary-med.md).

To label the speech from different speakers, set the `show-speaker-label` parameter to `true`. For more information, see [Enabling speaker partitioning](conversation-diarization-med.md).

For more information on creating pre-signed URIs, see [Setting up a WebSocket stream](streaming-setting-up.md#streaming-websocket).

# Enabling speaker partitioning
<a name="conversation-diarization-med"></a>

Speaker partitioning, also called *speaker diarization*, enables you to see what the patient said and what the clinician said in the transcription output.

When you enable speaker diarization, Amazon Transcribe Medical labels each *utterance* with a unique identifier for each speaker. An utterance is a unit of speech that is typically separated from other utterances by silence. In batch transcription, an utterance from the clinician could receive a label of `spk_0` and an utterance from the patient could receive a label of `spk_1`.

If an utterance from one speaker overlaps with an utterance from another speaker, Amazon Transcribe Medical orders them in the transcription by their start times. Utterances that overlap in the input audio don't overlap in the transcription output.

You can enable speaker diarization when you transcribe an audio file using a batch transcription job, or in a real-time stream.

**Topics**
+ [Enabling speaker partitioning in batch transcriptions](conversation-diarization-batch-med.md)
+ [Enabling speaker partitioning in real-time streams](conversation-diarization-streaming-med.md)

# Enabling speaker partitioning in batch transcriptions
<a name="conversation-diarization-batch-med"></a>

You can enable speaker partitioning in a batch transcription job using either the [StartMedicalTranscriptionJob](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartMedicalTranscriptionJob.html) API or the AWS Management Console. This enables you to partition the text per speaker in a clinician-patient conversation and determine who said what in the transcription output.

## AWS Management Console
<a name="conversation-diarization-batch-med-console"></a>

To use the AWS Management Console to enable speaker diarization in your transcription job, you enable audio identification and then speaker partitioning.

1. Sign in to the [AWS Management Console](https://console.aws.amazon.com/transcribe/).

1. In the navigation pane, under Amazon Transcribe Medical, choose **Transcription jobs**.

1. Choose **Create job**.

1. On the **Specify job details** page, provide information about your transcription job.

1. Choose **Next**.

1. Enable **Audio identification**.

1. For **Audio identification type**, choose **Speaker partitioning**.

1. For **Maximum number of speakers**, enter the maximum number of speakers that you think are speaking in your audio file.

1. Choose **Create**.

## API
<a name="conversation-diarization-batch-med-api"></a>

**To enable speaker partitioning using a batch transcription job (API)**
+ For the [StartMedicalTranscriptionJob](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartMedicalTranscriptionJob.html) API, specify the following.

  1. For `MedicalTranscriptionJobName`, specify a name that is unique in your AWS account.

  1. For `LanguageCode`, specify the language code that corresponds to the language spoken in the audio file.

  1. For the `MediaFileUri` parameter of the `Media` object, specify the name of the audio file that you want to transcribe.

  1. For `Specialty`, specify the medical specialty of the clinician speaking in the audio file.

  1. For `Type`, specify `CONVERSATION`.

  1. For `OutputBucketName`, specify the Amazon S3 bucket to store the transcription results.

  1. For the `Settings` object, specify the following.

     1. `ShowSpeakerLabels` – `true`.

     1. `MaxSpeakerLabels` – An integer between 2 and 10 to indicate the number of speakers that you think are speaking in your audio.

The following request uses the AWS SDK for Python (Boto3) to start a batch transcription job of a primary care clinician-patient dialogue with speaker partitioning enabled.

```
from __future__ import print_function
import time
import boto3

transcribe = boto3.client('transcribe', 'us-west-2')
job_name = "my-first-transcription-job"
job_uri = "s3://amzn-s3-demo-bucket/my-input-files/my-media-file.flac"
transcribe.start_medical_transcription_job(
    MedicalTranscriptionJobName = job_name,
    Media = {
        'MediaFileUri': job_uri
    },
    OutputBucketName = 'amzn-s3-demo-bucket',
    OutputKey = 'my-output-files/',
    LanguageCode = 'en-US',
    Specialty = 'PRIMARYCARE',
    Type = 'CONVERSATION',
    Settings = {
        'ShowSpeakerLabels': True,
        'MaxSpeakerLabels': 2
    }
)

while True:
    status = transcribe.get_medical_transcription_job(MedicalTranscriptionJobName = job_name)
    if status['MedicalTranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    print("Not ready yet...")
    time.sleep(5)
print(status)
```

The following example code shows the transcription results of a transcription job with speaker partitioning enabled.

```
{
    "jobName": "job ID",
    "accountId": "111122223333",
    "results": {
        "transcripts": [
            {
                "transcript": "Professional answer."
            }
        ],
        "speaker_labels": {
            "speakers": 1,
            "segments": [
                {
                    "start_time": "0.000000",
                    "speaker_label": "spk_0",
                    "end_time": "1.430",
                    "items": [
                        {
                            "start_time": "0.100",
                            "speaker_label": "spk_0",
                            "end_time": "0.690"
                        },
                        {
                            "start_time": "0.690",
                            "speaker_label": "spk_0",
                            "end_time": "1.210"
                        }
                    ]
                }
            ]
        },
        "items": [
            {
                "start_time": "0.100",
                "end_time": "0.690",
                "alternatives": [
                    {
                        "confidence": "0.8162",
                        "content": "Professional"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "start_time": "0.690",
                "end_time": "1.210",
                "alternatives": [
                    {
                        "confidence": "0.9939",
                        "content": "answer"
                    }
                ],
                "type": "pronunciation"
            },
            {
                "alternatives": [
                    {
                        "content": "."
                    }
                ],
                "type": "punctuation"
            }
        ]
    },
    "status": "COMPLETED"
}
```
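To reassemble output like the preceding example into per-speaker text, you can match each segment's items to the transcribed words by start time. The helper below is our own sketch and assumes the batch output layout shown above.

```
def speaker_turns(results):
    """Pair each diarization segment with its words, keyed by start_time.
    Our own helper; assumes the batch "results" layout shown above."""
    # Index pronunciation items by start time (punctuation has no timing).
    words = {item["start_time"]: item["alternatives"][0]["content"]
             for item in results["items"] if item["type"] == "pronunciation"}
    turns = []
    for segment in results["speaker_labels"]["segments"]:
        text = " ".join(words[i["start_time"]] for i in segment["items"])
        turns.append((segment["speaker_label"], text))
    return turns
```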

## AWS CLI
<a name="diarization-batch-cli"></a>

**To transcribe an audio file of a conversation between a clinician practicing primary care and a patient (AWS CLI)**
+ Run the following code.

  ```
  aws transcribe start-medical-transcription-job \
  --region us-west-2 \
  --cli-input-json file://example-start-command.json
  ```

  The following code shows the contents of `example-start-command.json`.

  ```
  {
      "MedicalTranscriptionJobName": "my-first-med-transcription-job",
      "Media": {
          "MediaFileUri": "s3://amzn-s3-demo-bucket/my-input-files/my-audio-file.flac"
      },
      "OutputBucketName": "amzn-s3-demo-bucket",
      "OutputKey": "my-output-files/",
      "LanguageCode": "en-US",
      "Specialty": "PRIMARYCARE",
      "Type": "CONVERSATION",
      "Settings": {
          "ShowSpeakerLabels": true,
          "MaxSpeakerLabels": 2
      }
  }
  ```

# Enabling speaker partitioning in real-time streams
<a name="conversation-diarization-streaming-med"></a>

To partition speakers and label their speech in a real-time stream, use the AWS Management Console or a streaming request. Speaker partitioning works best for between two and five speakers in a stream. Although Amazon Transcribe Medical can partition more than five speakers in a stream, the accuracy of the partitions decreases if you exceed that number.

To start an HTTP/2 request, use the [StartMedicalStreamTranscription](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_streaming_StartMedicalStreamTranscription.html) API. To start a WebSocket request, use a pre-signed URI. The URI contains the information required to set up bi-directional communication between your application and Amazon Transcribe Medical.

## Enabling speaker partitioning in audio that is spoken into your microphone (AWS Management Console)
<a name="conversation-diarization-console"></a>

You can use the AWS Management Console to start a real-time stream of a clinician-patient conversation, or a dictation that is spoken into your microphone in real time.

1. Sign in to the [AWS Management Console](https://console.aws.amazon.com/transcribe/).

1. In the navigation pane, under Amazon Transcribe Medical, choose **Real-time transcription**.

1. For **Audio input type**, choose the type of medical speech that you want to transcribe.

1. For **Additional settings**, choose **Speaker partitioning**.

1. Choose **Start streaming** to start transcribing your real-time audio.

1. Speak into the microphone.

## Enabling speaker partitioning in an HTTP/2 stream
<a name="conversation-diarization-med-http2"></a>

To enable speaker partitioning in an HTTP/2 stream of a medical conversation, use the [StartMedicalStreamTranscription](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_streaming_StartMedicalStreamTranscription.html) API and specify the following:
+ For `LanguageCode`, specify the language code that corresponds to the language in the stream. The valid value is `en-US`.
+ For `MediaSampleRateHertz`, specify the sample rate of the audio.
+ For `Specialty`, specify the medical specialty of the provider.
+ For `ShowSpeakerLabel`, specify `true`.

For more information on setting up an HTTP/2 stream to transcribe a medical conversation, see [Setting up an HTTP/2 stream](streaming-setting-up.md#streaming-http2).

## Enabling speaker partitioning in a WebSocket request
<a name="conversation-diarization-med-websocket"></a>

To partition speakers in a WebSocket stream, use the following format to create a pre-signed URI that starts the WebSocket request, and set `show-speaker-label` to `true`.

```
GET wss://transcribestreaming.us-west-2.amazonaws.com:8443/medical-stream-transcription-websocket
?language-code=languageCode
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE%2F20220208%2Fus-west-2%2Ftranscribe%2Faws4_request
&X-Amz-Date=20220208T235959Z
&X-Amz-Expires=300
&X-Amz-Security-Token=security-token
&X-Amz-Signature=Signature Version 4 signature 
&X-Amz-SignedHeaders=host
&media-encoding=flac
&sample-rate=16000
&session-id=sessionId
&specialty=medicalSpecialty
&type=CONVERSATION
&vocabulary-name=vocabularyName
&show-speaker-label=boolean
```

The following code shows the truncated example response of a streaming request.

```
{
  "Transcript": {
    "Results": [
      {
        "Alternatives": [
          {
            "Items": [
              {
                "Confidence": 0.97,
                "Content": "From",
                "EndTime": 18.98,
                "Speaker": "0",
                "StartTime": 18.74,
                "Type": "pronunciation",
                "VocabularyFilterMatch": false
              },
              {
                "Confidence": 1,
                "Content": "the",
                "EndTime": 19.31,
                "Speaker": "0",
                "StartTime": 19,
                "Type": "pronunciation",
                "VocabularyFilterMatch": false
              },
              {
                "Confidence": 1,
                "Content": "last",
                "EndTime": 19.86,
                "Speaker": "0",
                "StartTime": 19.32,
                "Type": "pronunciation",
                "VocabularyFilterMatch": false
              },
             ...
              {
                "Confidence": 1,
                "Content": "chronic",
                "EndTime": 22.55,
                "Speaker": "0",
                "StartTime": 21.97,
                "Type": "pronunciation",
                "VocabularyFilterMatch": false
              },
              ...
                "Confidence": 1,
                "Content": "fatigue",
                "EndTime": 24.42,
                "Speaker": "0",
                "StartTime": 23.95,
                "Type": "pronunciation",
                "VocabularyFilterMatch": false
              },
              {
                "EndTime": 25.22,
                "StartTime": 25.22,
                "Type": "speaker-change",
                "VocabularyFilterMatch": false
              },
              {
                "Confidence": 0.99,
                "Content": "True",
                "EndTime": 25.63,
                "Speaker": "1",
                "StartTime": 25.22,
                "Type": "pronunciation",
                "VocabularyFilterMatch": false
              },
              {
                "Content": ".",
                "EndTime": 25.63,
                "StartTime": 25.63,
                "Type": "punctuation",
                "VocabularyFilterMatch": false
              }
            ],
            "Transcript": "From the last note she still has mild sleep deprivation and chronic fatigue True."
          }
        ],
        "EndTime": 25.63,
        "IsPartial": false,
        "ResultId": "XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX",
        "StartTime": 18.74
      }
    ]
  }
}
```

Amazon Transcribe Medical breaks your incoming audio stream into natural speech segments, for example at a change in speaker or a pause in the audio. The transcription is returned progressively to your application, with each response containing more transcribed speech until the entire segment is transcribed. The preceding code is a truncated example of a fully transcribed speech segment. Speaker labels appear only for fully transcribed segments.

The following list shows the organization of the objects and parameters in a streaming transcription output.

**`Transcript`**  
Each speech segment has its own `Transcript` object.

**`Results`**  
Each `Transcript` object has its own `Results` object. This object contains the `IsPartial` field. When its value is `false`, the results returned are for an entire speech segment.

**`Alternatives`**  
Each `Results` object has an `Alternatives` object.

**`Items`**  
Each `Alternatives` object has its own `Items` object that contains information about each word and punctuation mark in the transcription output. When you enable speaker partitioning, each word in a fully transcribed speech segment has a `Speaker` label, a unique integer that Amazon Transcribe Medical assigns to each speaker in the stream. An item whose `Type` value is `speaker-change` indicates that one person has stopped speaking and that another person is about to begin.

**`Transcript`**  
Each `Alternatives` object also contains the transcribed speech segment as the value of its `Transcript` field.
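As a sketch of how these objects nest, the following helper (our own, assuming the response shape shown earlier) walks a streaming response event and collects each word with its speaker label, skipping partial results.

```
def final_segments(event):
    """Collect (speaker, word) pairs from fully transcribed results in a
    streaming response event. Our own helper; assumes the shape above."""
    pairs = []
    for result in event["Transcript"]["Results"]:
        if result["IsPartial"]:
            continue  # speaker labels appear only on complete segments
        for item in result["Alternatives"][0]["Items"]:
            if item["Type"] == "pronunciation":
                pairs.append((item["Speaker"], item["Content"]))
    return pairs
```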

For more information about WebSocket requests, see [Setting up a WebSocket stream](streaming-setting-up.md#streaming-websocket).

# Transcribing multi-channel audio
<a name="conversation-channel-id-med"></a>

If you have an audio file or stream that has multiple channels, you can use *channel identification* to transcribe the speech from each of those channels. Amazon Transcribe Medical transcribes the speech from each channel separately. It combines the separate transcriptions of each channel into a single transcription output.

Use channel identification to identify the separate channels in your audio and transcribe the speech from each of them. Enable it in scenarios such as a caller-agent call, for example to distinguish the caller from the agent in recordings or streams from contact centers that perform drug safety monitoring.

You can enable channel identification for both batch processing and real-time streaming. The following list describes how to enable it for each method.
+ Batch transcription – AWS Management Console and [StartMedicalTranscriptionJob](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartMedicalTranscriptionJob.html) API
+ Streaming transcription – WebSocket streaming and [StartMedicalStreamTranscription](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_streaming_StartMedicalStreamTranscription.html) API

## Transcribing multi-channel audio files
<a name="conversation-channel-id-med-batch"></a>

When you transcribe an audio file, Amazon Transcribe Medical returns a list of *items* for each channel. An item is a transcribed word or punctuation mark. Each word has a start time and an end time. If a person on one channel speaks over a person on a separate channel, the start times and end times of the items for each channel overlap while the individuals are speaking over each other.
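The overlap described above can be checked programmatically. The following is a minimal sketch; the two items are hypothetical, using the string `start_time` and `end_time` values from the batch output format.

```python
# Hypothetical items from two different channels; the batch output stores
# times as strings, so convert them to float before comparing.
item_ch0 = {"start_time": "12.282", "end_time": "12.592", "content": "When"}
item_ch1 = {"start_time": "12.379", "end_time": "12.589", "content": "It"}

def overlaps(a, b):
    """Return True if the two items' time ranges intersect (cross-talk)."""
    return (float(a["start_time"]) < float(b["end_time"])
            and float(b["start_time"]) < float(a["end_time"]))

print(overlaps(item_ch0, item_ch1))  # True: the speakers talked over each other
```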

By default, you can transcribe audio files with two channels. You can request a quota increase if you need to transcribe files that have more than two channels. For information about requesting a quota increase, see [AWS service quotas](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html).

To transcribe multi-channel audio in a batch transcription job, use the AWS Management Console or the [StartMedicalTranscriptionJob](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartMedicalTranscriptionJob.html) API.

### AWS Management Console
<a name="channel-id-batch-med-console"></a>

To use the AWS Management Console to enable channel identification in your batch transcription job, you enable audio identification and then channel identification. Channel identification is a subset of audio identification in the AWS Management Console.

1. Sign in to the [AWS Management Console](https://console.aws.amazon.com/transcribe/).

1. In the navigation pane, under Amazon Transcribe Medical, choose **Transcription jobs**.

1. Choose **Create job**.

1. On the **Specify job details** page, provide information about your transcription job.

1. Choose **Next**.

1. Enable **Audio identification**.

1. For **Audio identification type**, choose **Channel identification**.

1. Choose **Create**.

### API
<a name="channel-id-batch-med-api"></a>

**To transcribe a multi-channel audio file (API)**
+ For the [StartMedicalTranscriptionJob](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartMedicalTranscriptionJob.html) API, specify the following.

  1. For `MedicalTranscriptionJobName`, specify a name unique to your AWS account.

  1. For `LanguageCode`, specify the language code that corresponds to the language spoken in the audio file. The valid value is `en-US`.

  1. For the `MediaFileUri` parameter of the `Media` object, specify the name of the media file that you want to transcribe.

  1. For the `Settings` object, set `ChannelIdentification` to `true`.

The following is an example request using the AWS SDK for Python (Boto3).

```
import time
import boto3

transcribe = boto3.client('transcribe', 'us-west-2')
job_name = "my-first-med-transcription-job"
job_uri = "s3://amzn-s3-demo-bucket/my-input-files/my-media-file.flac"

transcribe.start_medical_transcription_job(
      MedicalTranscriptionJobName = job_name,
      Media = {
        'MediaFileUri': job_uri
      },
      OutputBucketName = 'amzn-s3-demo-bucket',
      OutputKey = 'output-files/',
      LanguageCode = 'en-US',
      Specialty = 'PRIMARYCARE',
      Type = 'CONVERSATION',
      Settings = {
        'ChannelIdentification': True
      }
)

# Poll until the transcription job completes or fails.
while True:
    status = transcribe.get_medical_transcription_job(MedicalTranscriptionJobName = job_name)
    if status['MedicalTranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    print("Not ready yet...")
    time.sleep(5)
print(status)
```

### AWS CLI
<a name="channel-id-med-cli"></a>

**To transcribe a multi-channel audio file using a batch transcription job (AWS CLI)**
+ Run the following code.

  ```
  aws transcribe start-medical-transcription-job \
  --region us-west-2 \
  --cli-input-json file://example-start-command.json
  ```

  The following is the content of `example-start-command.json`.

  ```
  {
      "MedicalTranscriptionJobName": "my-first-med-transcription-job",
      "Media": {
          "MediaFileUri": "s3://amzn-s3-demo-bucket/my-input-files/my-audio-file.flac"
      },
      "OutputBucketName": "amzn-s3-demo-bucket",
      "OutputKey": "my-output-files/",
      "LanguageCode": "en-US",
      "Specialty": "PRIMARYCARE",
      "Type": "CONVERSATION",
      "Settings": {
          "ChannelIdentification": true
      }
  }
  ```

The following code shows the transcription output for an audio file that has a conversation on two channels.

```
{
  "jobName": "job id",
  "accountId": "111122223333",
  "results": {
    "transcripts": [
      {
        "transcript": "When you try ... It seems to ..."
      }
    ],
    "channel_labels": {
      "channels": [
        {
          "channel_label": "ch_0",
          "items": [
            {
              "start_time": "12.282",
              "end_time": "12.592",
              "alternatives": [
                {
                  "confidence": "1.0000",
                  "content": "When"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.592",
              "end_time": "12.692",
              "alternatives": [
                {
                  "confidence": "0.8787",
                  "content": "you"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.702",
              "end_time": "13.252",
              "alternatives": [
                {
                  "confidence": "0.8318",
                  "content": "try"
                }
              ],
              "type": "pronunciation"
            },
            ...
          ]
        },
      {
          "channel_label": "ch_1",
          "items": [
            {
              "start_time": "12.379",
              "end_time": "12.589",
              "alternatives": [
                {
                  "confidence": "0.5645",
                  "content": "It"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.599",
              "end_time": "12.659",
              "alternatives": [
                {
                  "confidence": "0.2907",
                  "content": "seems"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.669",
              "end_time": "13.029",
              "alternatives": [
                {
                  "confidence": "0.2497",
                  "content": "to"
                }
              ],
              "type": "pronunciation"
            },
            ...
          ]
        }
      ]
    }
  }
}
```
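To show how this output might be consumed, here is a hedged sketch that groups the transcribed words by channel. The `output` dictionary is a trimmed, hypothetical version of the transcript above.

```python
# Hypothetical, trimmed transcript shaped like the batch output above.
output = {
    "results": {
        "channel_labels": {
            "channels": [
                {"channel_label": "ch_0",
                 "items": [
                     {"start_time": "12.282", "end_time": "12.592",
                      "alternatives": [{"confidence": "1.0000", "content": "When"}],
                      "type": "pronunciation"}]},
                {"channel_label": "ch_1",
                 "items": [
                     {"start_time": "12.379", "end_time": "12.589",
                      "alternatives": [{"confidence": "0.5645", "content": "It"}],
                      "type": "pronunciation"}]},
            ]
        }
    }
}

# Collect each channel's spoken words in order, skipping punctuation items.
words_by_channel = {}
for channel in output["results"]["channel_labels"]["channels"]:
    words = [item["alternatives"][0]["content"]
             for item in channel["items"] if item["type"] == "pronunciation"]
    words_by_channel[channel["channel_label"]] = words

print(words_by_channel)  # {'ch_0': ['When'], 'ch_1': ['It']}
```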

## Transcribing multi-channel audio streams
<a name="conversation-channel-id-med-stream"></a>

You can transcribe audio from separate channels in either HTTP/2 or WebSocket streams using the [StartMedicalStreamTranscription](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_streaming_StartMedicalStreamTranscription.html) API.

By default, you can transcribe streams with two channels. You can request a quota increase if you need to transcribe streams that have more than two channels. For information about requesting a quota increase, see [AWS service quotas](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html).

### Transcribing multi-channel audio in an HTTP/2 stream
<a name="conversation-channel-id-http2"></a>

To transcribe multi-channel audio in an HTTP/2 stream, use the [StartMedicalStreamTranscription](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_streaming_StartMedicalStreamTranscription.html) API and specify the following:
+ `LanguageCode` – The language code of the audio. The valid value is `en-US`.
+ `MediaEncoding` – The encoding of the audio. Valid values are `ogg-opus`, `flac`, and `pcm`.
+ `EnableChannelIdentification` – `true`
+ `NumberOfChannels` – The number of channels in your streaming audio.

For more information on setting up an HTTP/2 stream to transcribe a medical conversation, see [Setting up an HTTP/2 stream](streaming-setting-up.md#streaming-http2).
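As a sketch, the request settings above might be collected as follows. The sample rate, specialty, and type values are illustrative assumptions, not requirements of channel identification itself.

```python
# Hypothetical parameter set for a StartMedicalStreamTranscription request
# with channel identification enabled; sample rate, specialty, and type
# are illustrative assumptions.
request_params = {
    "LanguageCode": "en-US",               # the only valid value
    "MediaEncoding": "pcm",                # or "ogg-opus" / "flac"
    "MediaSampleRateHertz": 16000,         # assumption for illustration
    "Specialty": "PRIMARYCARE",            # assumption for illustration
    "Type": "CONVERSATION",                # assumption for illustration
    "EnableChannelIdentification": True,
    "NumberOfChannels": 2,
}

print(request_params["NumberOfChannels"])  # 2
```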

### Transcribing multi-channel audio in a WebSocket stream
<a name="channel-id-med-websocket"></a>

To transcribe multi-channel audio in a WebSocket stream, use the following format to create a pre-signed URL and start a WebSocket request. Set `enable-channel-identification` to `true` and specify the number of channels in your stream in `number-of-channels`. A pre-signed URL contains the information needed to set up bi-directional communication between your application and Amazon Transcribe Medical.

```
GET wss://transcribestreaming.us-west-2.amazonaws.com:8443/medical-stream-transcription-websocket
?language-code=languageCode
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE%2F20220208%2Fus-west-2%2Ftranscribe%2Faws4_request
&X-Amz-Date=20220208T235959Z
&X-Amz-Expires=300
&X-Amz-Security-Token=security-token
&X-Amz-Signature=Signature Version 4 signature
&X-Amz-SignedHeaders=host
&media-encoding=flac
&sample-rate=16000
&session-id=sessionId
&enable-channel-identification=true
&number-of-channels=2
```

Parameter definitions can be found in the [API Reference](https://docs.aws.amazon.com/transcribe/latest/APIReference/API_Reference.html); parameters common to all AWS API operations are listed in the [Common Parameters](https://docs.aws.amazon.com/transcribe/latest/APIReference/CommonParameters.html) section.

For more information about WebSocket requests, see [Setting up a WebSocket stream](streaming-setting-up.md#streaming-websocket).

### Multi-channel streaming output
<a name="streaming-med-output"></a>

The output of a streaming transcription is the same for HTTP/2 and WebSocket requests. The following is an example output.

```
{
    "resultId": "XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXX",
    "startTime": 0.11,
    "endTime": 0.66,
    "isPartial": false,
    "alternatives": [
        {
            "transcript": "Left.",
            "items": [
                {
                    "startTime": 0.11,
                    "endTime": 0.45,
                    "type": "pronunciation",
                    "content": "Left",
                    "vocabularyFilterMatch": false
                },
                {
                    "startTime": 0.45,
                    "endTime": 0.45,
                    "type": "punctuation",
                    "content": ".",
                    "vocabularyFilterMatch": false
                }
            ]
        }
    ],
    "channelId": "ch_0"
}
```

For each speech segment, the `channelId` field indicates which channel the speech belongs to.
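For example, the segments of a stream can be grouped by channel as in the following hedged sketch; the `segments` list is hypothetical, shaped like the output above.

```python
from collections import defaultdict

# Hypothetical stream of transcription results shaped like the output above.
segments = [
    {"isPartial": False, "channelId": "ch_0",
     "alternatives": [{"transcript": "Left."}]},
    {"isPartial": False, "channelId": "ch_1",
     "alternatives": [{"transcript": "Okay."}]},
]

# Group each fully transcribed segment's best transcript by its channel.
by_channel = defaultdict(list)
for segment in segments:
    if not segment["isPartial"]:
        by_channel[segment["channelId"]].append(
            segment["alternatives"][0]["transcript"])

print(dict(by_channel))  # {'ch_0': ['Left.'], 'ch_1': ['Okay.']}
```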