Audio
Generate speech from text, transcribe audio into text, or create accessible WebVTT captions.
Server Components
<Audio>
Creates an audio file from text content or a React tree. The component also generates WebVTT captions for the audio file to improve accessibility. Consumes `<Track>` to generate the captions, unless the `noCaption` prop is `true`.
Live Example
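A minimal sketch of using `<Audio>` in a page, assuming the component is exported from the package root (the import path and audio text are illustrative assumptions):

```tsx
// app/page.tsx: a Server Component.
// The import path is an assumption; adjust it to your setup.
import { Audio } from '@trikinco/fullstack-components'

export default function Page() {
  return (
    <Audio
      model="tts-1-hd" // the default, shown for clarity
      voice="nova"
      content="Welcome! This page can read itself out loud."
    />
  )
}
```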
Note on file size constraints: the `<Audio>` and `<Track>` Server Components currently transform the response body to base64 data URLs, which limits the maximum file size that can be generated. Data URLs are used to inline the data instead of storing the audio and caption files on the server. This may change in the future. If you want to store the files yourself, you can build your own Server Components using the `getAudio` utility function.
<Track>
Creates WebVTT captions from an audio file.
Live Example
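A sketch of adding generated captions to a standard `<audio>` element, using the documented `src`, `kind`, `label`, and `srcLang` props (the import path and file path are assumptions):

```tsx
// A Server Component rendering an audio element with generated captions.
// The import path is an assumption; adjust it to your setup.
import { Track } from '@trikinco/fullstack-components'

export default function Page() {
  return (
    <audio controls src="/speech.mp3">
      <Track
        src="/speech.mp3" // path to the audio file to transcribe
        kind="captions"
        label="English"
        srcLang="en"
      />
    </audio>
  )
}
```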
<Transcript>
Transcribes an audio file to text.
Live Example
Hey there, my friend, how are you? With full stack components, you can easily build Next.js applications powered by AI. The set of tools includes server components, custom hooks, and much more. In this example, I'm using the audio component to generate speech from text. You can read more about it in the documentation below. See you around.
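The quoted paragraph above is the transcript produced by the live example. A minimal sketch of rendering one with the documented `src` and `format` props (the import path and file path are assumptions):

```tsx
// Renders the transcribed text of an audio file as plain text.
// The import path is an assumption; adjust it to your setup.
import { Transcript } from '@trikinco/fullstack-components'

export default function Page() {
  return <Transcript src="/speech.mp3" format="text" />
}
```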
useAudio hook
`useAudio` is a utility hook that provides full access to the same features as `getAudio`, in addition to the ability to control audio playback.
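A sketch of a Client Component using the hook in `speech` mode. The request body shape follows `AudioSpeechModeRequestBody` in the API Reference below; the `/client` import path is an assumption:

```tsx
'use client'
// The import path is an assumption; adjust it to your setup.
import { useAudio } from '@trikinco/fullstack-components/client'

export default function Player() {
  const { isLoading, play, pause, setPlayBackRate } = useAudio({
    mode: 'speech',
    content: 'Hello from the useAudio hook.',
  })

  if (isLoading) return <p>Generating audio...</p>

  return (
    <>
      <button onClick={() => play()}>Play</button>
      <button onClick={() => pause()}>Pause</button>
      <button onClick={() => setPlayBackRate(1.5)}>Speed up</button>
    </>
  )
}
```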
Custom Server Component with getAudio
Live Example
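A sketch of a custom Server Component that stores the generated file itself instead of inlining a data URL. `getAudio` and the `AudioFileResponse` fields are documented below; the import path and the storage step are assumptions:

```tsx
// The import path is an assumption; adjust it to your setup.
import { getAudio } from '@trikinco/fullstack-components'
import { writeFile } from 'node:fs/promises'

// Hypothetical storage step: writes next to static assets so the file
// can be served. Swap this for S3, blob storage, a CDN, etc.
async function store(file: ArrayBuffer): Promise<string> {
  await writeFile('public/speech.mp3', Buffer.from(file))
  return '/speech.mp3'
}

export default async function SpeechPlayer({ text }: { text: string }) {
  const { responseFile, errorMessage } = await getAudio({
    mode: 'speech',
    content: text,
  })

  if (errorMessage || !responseFile) {
    return <p>Could not generate audio.</p>
  }

  const src = await store(responseFile)
  return <audio controls src={src} />
}
```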
Setup
Add `audio: handleAudioRequest()` to the API route handler.
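A sketch of that wiring, assuming the library's catch-all API route and a top-level handler factory. The `handleFSComponents` name and route path are assumptions; `handleAudioRequest` and `AudioOptions` are documented below:

```ts
// app/api/fsutils/[...fscomponents]/route.ts
// Route path and `handleFSComponents` are assumptions; adjust to your setup.
import {
  handleFSComponents,
  handleAudioRequest,
} from '@trikinco/fullstack-components'

const fscHandler = handleFSComponents({
  // `openAiApiKey` defaults to `process.env.OPENAI_API_KEY` (see AudioOptions)
  audio: handleAudioRequest(),
})

export { fscHandler as GET, fscHandler as POST }
```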
API Reference
Types
AudioOptions
Audio API route handler options.
Properties
- openAiApiKey: string
  - default: `process.env.OPENAI_API_KEY`
AudioSpeechModeRequestBody
Audio request body when using `speech` mode to generate audio from text.
Properties
- mode: 'speech'
  Speech mode generates audio files from text.
- model: 'tts-1' | 'tts-1-hd'
  ID of the model to use. See the model endpoint compatibility table for details on which models work with the Audio API.
  - link: https://platform.openai.com/docs/models/model-endpoint-compatibility
  - default: 'tts-1-hd', the speech model optimized for quality.
- voice: 'alloy' | 'echo' | 'fable' | 'onyx' | 'nova' | 'shimmer'
  The voice to use when generating the audio. See the text to speech guide for a preview of the available voices.
  - link: https://platform.openai.com/docs/guides/text-to-speech/voice-options
  - default: 'alloy'
- content: string
  A text string to generate an audio file from.
AudioTranscriptionModeRequestBody
Audio request body when using `transcription` mode to generate text from audio.
Properties
- mode: 'transcription'
  Transcription mode generates text from audio.
- model: 'whisper-1'
  ID of the model to use. See the model endpoint compatibility table for details on which models work with the Audio API.
  - link: https://platform.openai.com/docs/models/model-endpoint-compatibility
  - default: 'whisper-1'
- language: string
  The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
  - default: 'en' (English)
- content: Uploadable
  An audio file to transcribe.
AudioTranslationModeRequestBody
Audio request body when using `translation` mode to translate audio into English.
Properties
- mode: 'translation'
  Translation mode translates audio files to English.
- model: 'whisper-1'
  ID of the model to use. See the model endpoint compatibility table for details on which models work with the Audio API.
  - link: https://platform.openai.com/docs/models/model-endpoint-compatibility
  - default: 'whisper-1'
- content: Uploadable
  An audio file to translate.
GetAudioResponse
The response type depends on the `mode`: `speech` returns `AudioFileResponse`, while `transcription` and `translation` return `AudioTextResponse`.
AudioTextResponse
Audio response when using `transcription` or `translation` mode.
Properties
- responseText: string
  The response text when using `transcription` or `translation` mode.
- tokensUsed: number
  Total number of tokens used in the request (prompt + completion).
- finishReason: string
  The reason the audio creation, transcription, or translation stopped.
- errorMessage: string
  The error message if there was an error.
AudioFileResponse
Audio response when using `speech` mode.
Properties
- responseFile: ArrayBuffer
  The audio file as an ArrayBuffer.
- responseFormat: string
  The OpenAI `response_format`, used to determine the file extension.
  - default: 'mp3'
- contentType: string
  The `Content-Type` header from the OpenAI response, passed through.
  - default: 'audio/mpeg'
- tokensUsed: number
  Total number of tokens used in the request (prompt + completion).
- finishReason: string
  The reason the audio creation, transcription, or translation stopped.
- errorMessage: string
  The error message if there was an error.
Components
AudioProps
Props to pass to the `<Audio>` Server Component. Extends `AudioHTMLAttributes`.
Properties
- content: string
  Content to be converted to speech.
  - note: overrides `children` if both are provided.
- model: 'tts-1' | 'tts-1-hd'
  ID of the model to use. See the model endpoint compatibility table for details on which models work with the Audio API.
  - link: https://platform.openai.com/docs/models/model-endpoint-compatibility
  - default: 'tts-1-hd'
- voice: 'alloy' | 'echo' | 'fable' | 'onyx' | 'nova' | 'shimmer'
  The voice to use when generating the audio. See the text to speech guide for a preview of the available voices.
  - link: https://platform.openai.com/docs/guides/text-to-speech/voice-options
  - default: 'alloy'
- noCaption: boolean
  Prevents generating a caption track.
  - note: WCAG requires captions for audio-only content.
  - link: https://www.w3.org/WAI/WCAG21/Understanding/captions-prerecorded.html
- disclosure: ReactNode
  Disclosure to be displayed to end users. The OpenAI usage policies require you to provide a clear disclosure to end users that the TTS voice they are hearing is AI-generated and not a human voice.
  - link: https://openai.com/policies/usage-policies
  - default: 'This audio is AI-generated and not a human voice.'
Audio
Automatically turns text into audio and creates audio captions in WebVTT format.
Parameters
- props: AsComponent<C, AudioProps> (required)
  - link: AudioProps
TrackProps
Props to pass to the `<Track>` Server Component.
Properties
- model: 'whisper-1'
  ID of the model to use. See the model endpoint compatibility table for details on which models work with the Audio API.
  - link: https://platform.openai.com/docs/models/model-endpoint-compatibility
  - default: 'whisper-1'
- language: string
  The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
  - default: 'en'
- prompt: string
  Optional text to guide the model's style or to continue a previous audio segment. The prompt should match the audio language.
  - link: https://platform.openai.com/docs/guides/speech-to-text/prompting
  - default: `vtt ${kind} for this ${media} file`
- media: 'audio' | 'video'
  The type of media the text track belongs to.
  - default: 'audio'
- name: string
  Full file name, including the file extension, of the audio file to transcribe.
  - note: if this is removed, the API call will fail.
  - default: 'audio.mpeg'
- type: 'audio/flac' | 'audio/mpeg' | 'audio/mp4' | 'audio/ogg' | 'audio/wav' | 'audio/webm'
  The type of audio file to generate a text transcript from.
  - default: 'audio/mpeg'
- kind: 'captions' | 'subtitles' | 'descriptions' | 'chapters' | 'metadata'
  How the text track is meant to be used. Captions are necessary for deaf viewers to understand the content; they include a text description of all important background noises and other sounds, in addition to the text of all dialog and narration. Subtitles are generally language translations to help listeners understand content presented in a language they don't understand; they generally include only dialog and narration.
  - link: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/track#kind
  - default: 'captions'
- label: string
  Label to use for the track. Since its primary usage is for captions, this should be the language of the captions.
  - default: 'English'
- srcLang: string
  Language of the track text data. Must be a valid BCP 47 language tag.
- src: ArrayBuffer | string
  Audio file, or the path to the audio file to transcribe.
  - note: transcription can only be done with audio files.
- transform: function
  A function that allows for transforming the cues before they are added to the track. See the sketch after this reference entry.
  Parameters
  - cues: array (required)
Track
Generates captions from an audio file, for use in `<audio>` or `<video>`. It lets you specify timed text tracks, for example to automatically handle subtitles. The track is formatted as WebVTT (Web Video Text Tracks, `.vtt` files).
Parameters
- props: TrackProps (required)
  - link: TrackProps
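As an illustration of the `transform` prop above, a sketch that rewrites cue text before the track is emitted. The cue object shape, with a `text` field as in WebVTT cues, is an assumption, as are the import path and file path:

```tsx
// The import path is an assumption; adjust it to your setup.
import { Track } from '@trikinco/fullstack-components'

export function ShoutingCaptions() {
  return (
    <audio controls src="/speech.mp3">
      <Track
        src="/speech.mp3"
        // Hypothetical cue shape: each cue carries a `text` field.
        transform={(cues: any[]) =>
          cues.map((cue) => ({ ...cue, text: cue.text.toUpperCase() }))
        }
      />
    </audio>
  )
}
```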
TranscriptProps
Props to pass to the `<Transcript>` Server Component.
Properties
- model: 'whisper-1'
  ID of the model to use. See the model endpoint compatibility table for details on which models work with the Audio API.
  - link: https://platform.openai.com/docs/models/model-endpoint-compatibility
  - default: 'whisper-1'
- language: string
  The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
  - default: 'en'
- prompt: string
  Optional text to guide the model's style or to continue a previous audio segment. The prompt should match the audio language.
- src: ArrayBuffer | string
  Audio file, or the path to the audio file to transcribe.
  - note: transcription can only be done with audio files.
- name: string
  Full file name, including the file extension, of the audio file to transcribe.
  - note: if this is removed, the API call will fail.
  - default: 'audio.mpeg'
- type: 'audio/flac' | 'audio/mpeg' | 'audio/mp4' | 'audio/ogg' | 'audio/wav' | 'audio/webm'
  The type of audio file to generate a text transcript from.
  - default: 'audio/mpeg'
- format: 'json' | 'text' | 'srt' | 'verbose_json' | 'vtt'
  The `response_format` of the transcript output. Use 'vtt' for WebVTT / captions format.
  - default: 'json'
Transcript
Generates text from an audio file.
Parameters
- props: AsComponent<C, TranscriptProps> (required)
  - link: TranscriptProps
Hooks
useAudio
A client-side fetch handler hook that generates audio files from text, or text from audio. Includes utilities for controlling audio playback when the `mode` is `speech`.
Parameters
- body: AudioRequestBody (required)
  - link: AudioRequestBody
- config: UseRequestConsumerConfig<AudioRequestBody>
  Fetch utility hook request options without the `fetcher`. Allows for overriding the default `request` config.
Returns
- isLoading: boolean
  Fetch loading state. `true` if the fetch is in progress.
- isError: boolean
  Fetch error state. `true` if an error occurred.
- error: unknown
  Fetch error object if `isError` is `true`.
- data: T | undefined
  Fetch response data if the fetch was successful.
- refetch: function
  Refetches the data.
- play: function
  Plays the audio file.
- pause: function
  Pauses the audio file.
- setPlayBackRate: function
  Sets the audio playback rate / speed.
  Parameters
  - playBackRate: number (required)
- setAudioContext: function
  Sets the audio context.
  Parameters
  - audioContext: object (required)
- setAudioSource: function
  Sets the audio source.
  Parameters
  - audioSource: object (required)
- audioContext: object | null
  The audio context.
- audioSource: object | null
  The audio source.
- audioRef: RefObject<object>
  The audio element ref.
- playBackRate: number
  The audio playback rate / speed.
useAudioSource
A client-side audio file handler with basic utilities for controlling audio file playback.
Parameters
- isEnabled: boolean (required)
  Enables connecting the audio source to the audio context when the audio file is loaded and the `audioRef` is set.
Returns
- play: function
  Plays the audio file.
- pause: function
  Pauses the audio file.
- setPlayBackRate: function
  Sets the audio playback rate / speed.
  Parameters
  - playBackRate: number (required)
- setAudioContext: function
  Sets the audio context.
  Parameters
  - audioContext: object (required)
- setAudioSource: function
  Sets the audio source.
  Parameters
  - audioSource: object (required)
- audioContext: object | null
  The audio context.
- audioSource: object | null
  The audio source.
- audioRef: RefObject<object>
  The audio element ref.
- playBackRate: number
  The audio playback rate / speed.
Utilities
getAudio
Depending on the `mode`, generates audio files from text (`speech`) or text from audio (`transcription` and `translation`). A Server Action that calls the third-party API directly on the server. This avoids calling the Next.js API route handler, allowing for performant Server Components.
Parameters
- request: AudioRequestBody (required)
  - link: AudioRequestBody
- options: AudioOptions
  - link: AudioOptions
fetchAudio
Depending on the `mode`, generates audio files from text (`speech`) or text from audio (`transcription` and `translation`). A client-side fetch handler that calls the internal Next.js API route handler, which in turn calls the third-party API. Best used for Client Components and client-side functionality.
Parameters
- body: AudioRequestBody (required)
  - link: AudioRequestBody
- config: RequestConfigOnly
  Fetch utility request options without the `body`.
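A sketch of calling `fetchAudio` from the client in `speech` mode, turning the documented `AudioFileResponse` into a playable object URL (the `/client` import path is an assumption):

```ts
'use client'
// The import path is an assumption; adjust it to your setup.
import { fetchAudio } from '@trikinco/fullstack-components/client'

export async function speak(text: string) {
  // Body shape follows AudioSpeechModeRequestBody.
  const response = await fetchAudio({ mode: 'speech', content: text })

  if ('responseFile' in response && response.responseFile) {
    // Turn the ArrayBuffer into a playable object URL.
    const blob = new Blob([response.responseFile], {
      type: response.contentType,
    })
    await new Audio(URL.createObjectURL(blob)).play()
  }
}
```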