Transcription

Attendee offers two methods for real-time meeting transcription: Third-party-based Transcription and Closed Caption-based Transcription. Both methods deliver real-time updates via webhooks, and both provide perfect speaker identification, also known as diarization.

For an example of a simple web application that uses Attendee to transcribe meeting audio in real time, see the real time transcription example repository.

Third-party-based Transcription

This method relies on access to a per-speaker audio stream for each participant. Attendee identifies when a participant starts and stops speaking. When a participant pauses for a few seconds, the audio segment is sent to a third-party transcription provider for processing.

Latency

The latency of third-party-based transcription is dependent on two main factors: the time it takes for the third-party provider to transcribe the audio, and the size of the audio segment itself. If a participant speaks for a long time without pausing, the audio segment sent for transcription will be large, increasing processing time. These two factors mean that third-party-based transcription generally has higher latency compared to closed caption-based transcription.

Quality

Third-party-based transcription is generally of higher quality than closed caption-based transcription.

Supported providers

  • Deepgram
  • OpenAI
  • Gladia
  • Assembly AI

See the API reference for supported parameters for configuring the transcription providers.

Cost

Third-party-based transcription incurs costs from the transcription provider. Attendee calls the provider using an API key that you supply in the Credentials section of the dashboard.

Closed Caption-based Transcription

This method takes advantage of the built-in closed captioning feature of the meeting platform. Attendee captures these captions as they are generated by the platform.

Latency

This method offers lower latency. Captions are captured as soon as they are generated by the platform.

Cost

Closed caption-based transcription is free.

Choosing the Right Method

| Feature | Third-party-based Transcription | Closed Caption-based Transcription |
| --- | --- | --- |
| Source | Per-participant audio segments | Built-in captions from the meeting platform (Zoom, Google Meet) |
| Transcription Quality | High (depends on the provider, e.g., OpenAI, Deepgram) | Generally lower than third-party-based transcription |
| Word-level timestamps | Supported by all providers except OpenAI | No |
| Speaker Diarization | Yes, perfect speaker identification | Yes, perfect speaker identification |
| Latency | Higher, due to provider processing and segment size | Lower, near-instantaneous |
| Cost | Incurs costs from third-party transcription providers | No additional costs |
| Setup | Requires configuring a third-party transcription provider | No setup required |

Adding transcription providers in the dashboard

For third-party-based transcription, you need to add your API key for a provider such as Deepgram, OpenAI, Gladia, or Assembly AI on the Settings > Credentials page.

Transcription errors

If you are using third-party-based transcription, you may encounter errors from the transcription provider. These errors are visible in the transcription section of the bot detail page in the dashboard.

Additionally, the post-processing complete bot event will contain a list of transcription errors in the event metadata.
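For example, a webhook consumer could surface these errors when that event arrives. The sketch below is illustrative only: the trigger name and metadata field names are assumptions, so check the webhooks page for the actual event schema.

def report_transcription_errors(event: dict) -> None:
    # Hypothetical trigger and field names; the actual schema is on the webhooks page.
    if event.get("trigger") != "post_processing_completed":
        return
    for error in event.get("metadata", {}).get("transcription_errors", []):
        print("Transcription error:", error)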

Configuring transcription in the API call

You can configure transcription settings when creating a bot, including the transcription provider and provider-specific options such as language and model. See the API reference for details. Set these parameters in the transcription_settings object of the create bot request body, which has the following form:

{
    "<transcription provider>": {
        <provider-specific parameters>
    }
}

For example, if you want to use Deepgram with English and the nova-2 model, set transcription_settings to:

{
    "deepgram": {
        "language": "en-US",
        "model": "nova-2"
    }
}
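Putting this together, the sketch below shows a create bot request made from Python with these settings. The base URL, the /api/v1/bots path, the Token authorization scheme, and the meeting URL are assumptions and placeholders, not guaranteed values; check the API reference for the exact request format.

import requests

BASE_URL = "https://app.attendee.dev/api/v1"  # assumed base URL; adjust for your deployment
API_KEY = "your-attendee-api-key"  # placeholder; create a key in the dashboard

response = requests.post(
    f"{BASE_URL}/bots",
    headers={"Authorization": f"Token {API_KEY}"},  # assumed auth scheme
    json={
        "meeting_url": "https://meet.google.com/abc-defg-hij",  # placeholder meeting
        "bot_name": "Transcription Bot",
        # The transcription_settings object from the example above.
        "transcription_settings": {
            "deepgram": {
                "language": "en-US",
                "model": "nova-2",
            }
        },
    },
)
response.raise_for_status()
print(response.json())  # the created bot, including its id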

Setting up webhooks for real time transcription

You can set up webhooks for real time transcription in the dashboard. Go to the Settings > Webhooks page and click the 'Create Webhook' button.

Make sure the transcript.update trigger is enabled for your webhook. This will fire a webhook event every time a new utterance is added to the transcript. See the webhooks page for more details on the webhook payload.
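To experiment locally, you can sketch a minimal receiver with Python's standard library, as below. The payload shape assumed here (a JSON body with a trigger field and a data field) is an assumption; the webhooks page documents the actual schema and how to verify webhook signatures.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON body of the webhook delivery.
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length))
        # "trigger" and "data" are assumed field names; see the webhooks page.
        if event.get("trigger") == "transcript.update":
            print("New utterance:", event.get("data"))
        # Acknowledge receipt so the delivery is not retried.
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8000), WebhookHandler).serve_forever()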

Fetching transcripts during and after the meeting

You can fetch transcripts during and after the meeting by calling the /transcript endpoint. See the API reference for details.
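For example, a Python sketch that fetches the transcript might look like the following. The /api/v1/bots/<id>/transcript path and the response shape (a list of utterances) are assumptions; confirm both in the API reference.

import requests

BASE_URL = "https://app.attendee.dev/api/v1"  # assumed base URL
API_KEY = "your-attendee-api-key"  # placeholder
BOT_ID = "your-bot-id"  # id returned when the bot was created

response = requests.get(
    f"{BASE_URL}/bots/{BOT_ID}/transcript",  # assumed endpoint path
    headers={"Authorization": f"Token {API_KEY}"},  # assumed auth scheme
)
response.raise_for_status()
for utterance in response.json():  # assumed: a list of utterance objects
    print(utterance)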

Multilingual transcription

All transcription methods can transcribe audio in multiple languages, but the set of supported languages varies by method. See the API reference for details on how to specify the language.

All third-party transcription providers support automatic language detection, but closed caption-based transcription does not. Some third-party providers can also transcribe audio in which the speaker switches languages mid-sentence; see the list below for details.

Choosing the right transcription provider

Deepgram

Cheap, fast, and good quality; the only downside is that it doesn't support as many languages as some of the other providers.

Can transcribe audio where the speaker is switching languages in the middle of a sentence.

$200 in free credits for new users.

Gladia

Similar to Deepgram, but more expensive and supports more languages.

Can transcribe audio where the speaker is switching languages in the middle of a sentence.

10 hours of free transcription each month.

Assembly AI

Similar to Deepgram in price and quality but lacks the ability to transcribe audio where the speaker is switching languages in the middle of a sentence. Very accurate word-level timestamps.

$50 in free credits for new users.

OpenAI

Cheaper than the other providers, but less accurate, and it often chooses the wrong language when the language is not specified in advance. Can transcribe audio where the speaker is switching languages in the middle of a sentence. Lacks word-level timestamps.