Audio
Generate speech from text, transcribe audio into text, or create accessible WebVTT captions.
Server Components
<Audio>
Creates an audio file from text content or a React tree. The component also generates WebVTT captions for the audio file to improve accessibility. Consumes `<Track>` to generate the captions, unless the `noCaption` prop is `true`.
Live Example
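A minimal sketch of using `<Audio>` in a page, assuming the component is exported from the package root (the import path and audio text are illustrative assumptions):

```tsx
// app/page.tsx: a Server Component.
// The import path is an assumption; adjust it to your setup.
import { Audio } from '@trikinco/fullstack-components'

export default function Page() {
  return (
    <Audio
      model="tts-1-hd" // the default, shown for clarity
      voice="nova"
      content="Welcome! This page can read itself out loud."
    />
  )
}
```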
Note on file size constraints: the `<Audio>` and `<Track>` Server Components currently transform the response body to base64 data URLs, which limits the maximum file size that can be generated. Data URLs are used to inline the data instead of storing the audio and caption files on the server. This may change in the future. If you want to store the files yourself, you can build your own Server Components using the `getAudio` utility function.
<Track>
Creates WebVTT captions from an audio file.
Live Example
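A sketch of adding generated captions to a standard `<audio>` element, using the documented `src`, `kind`, `label`, and `srcLang` props (the import path and file path are assumptions):

```tsx
// A Server Component rendering an audio element with generated captions.
// The import path is an assumption; adjust it to your setup.
import { Track } from '@trikinco/fullstack-components'

export default function Page() {
  return (
    <audio controls src="/speech.mp3">
      <Track
        src="/speech.mp3" // path to the audio file to transcribe
        kind="captions"
        label="English"
        srcLang="en"
      />
    </audio>
  )
}
```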
<Transcript>
Transcribes an audio file to text.
Live Example
Hey there, my friend, how are you? With full stack components, you can easily build Next.js applications powered by AI. The set of tools includes server components, custom hooks, and much more. In this example, I'm using the audio component to generate speech from text. You can read more about it in the documentation below. See you around.
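The quoted paragraph above is the transcript produced by the live example. A minimal sketch of rendering one with the documented `src` and `format` props (the import path and file path are assumptions):

```tsx
// Renders the transcribed text of an audio file as plain text.
// The import path is an assumption; adjust it to your setup.
import { Transcript } from '@trikinco/fullstack-components'

export default function Page() {
  return <Transcript src="/speech.mp3" format="text" />
}
```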
useAudio hook
`useAudio` is a utility hook that provides full access to the same features as `getAudio`, in addition to the ability to control audio playback.
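A sketch of a Client Component using the hook in `speech` mode. The request body shape follows `AudioSpeechModeRequestBody` in the API Reference below; the `/client` import path is an assumption:

```tsx
'use client'
// The import path is an assumption; adjust it to your setup.
import { useAudio } from '@trikinco/fullstack-components/client'

export default function Player() {
  const { isLoading, play, pause, setPlayBackRate } = useAudio({
    mode: 'speech',
    content: 'Hello from the useAudio hook.',
  })

  if (isLoading) return <p>Generating audio...</p>

  return (
    <>
      <button onClick={() => play()}>Play</button>
      <button onClick={() => pause()}>Pause</button>
      <button onClick={() => setPlayBackRate(1.5)}>Speed up</button>
    </>
  )
}
```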
Custom Server Component with getAudio
Live Example
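A sketch of a custom Server Component that stores the generated file itself instead of inlining a data URL. `getAudio` and the `AudioFileResponse` fields are documented below; the import path and the storage step are assumptions:

```tsx
// The import path is an assumption; adjust it to your setup.
import { getAudio } from '@trikinco/fullstack-components'
import { writeFile } from 'node:fs/promises'

// Hypothetical storage step: writes next to static assets so the file
// can be served. Swap this for S3, blob storage, a CDN, etc.
async function store(file: ArrayBuffer): Promise<string> {
  await writeFile('public/speech.mp3', Buffer.from(file))
  return '/speech.mp3'
}

export default async function SpeechPlayer({ text }: { text: string }) {
  const { responseFile, errorMessage } = await getAudio({
    mode: 'speech',
    content: text,
  })

  if (errorMessage || !responseFile) {
    return <p>Could not generate audio.</p>
  }

  const src = await store(responseFile)
  return <audio controls src={src} />
}
```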
Setup
Add `audio: handleAudioRequest()` to the API route handler.
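A sketch of that wiring, assuming the library's catch-all API route and a top-level handler factory. The `handleFSComponents` name and route path are assumptions; `handleAudioRequest` and `AudioOptions` are documented below:

```ts
// app/api/fsutils/[...fscomponents]/route.ts
// Route path and `handleFSComponents` are assumptions; adjust to your setup.
import {
  handleFSComponents,
  handleAudioRequest,
} from '@trikinco/fullstack-components'

const fscHandler = handleFSComponents({
  // `openAiApiKey` defaults to `process.env.OPENAI_API_KEY` (see AudioOptions)
  audio: handleAudioRequest(),
})

export { fscHandler as GET, fscHandler as POST }
```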
API Reference
Types
AudioOptions
Audio API route handler options.
Properties
- openAiApiKey: string
  - default: `process.env.OPENAI_API_KEY`
AudioSpeechModeRequestBody
Audio request body when using `speech` mode to generate audio from text.
Properties
- mode: 'speech'
  Speech mode generates audio files from text.
- model: 'tts-1' | 'tts-1-hd'
  ID of the model to use. See the model endpoint compatibility table for details on which models work with the Audio API.
  - link: https://platform.openai.com/docs/models/model-endpoint-compatibility
  - default: 'tts-1-hd', the speech model optimized for quality.
- voice: 'alloy' | 'echo' | 'fable' | 'onyx' | 'nova' | 'shimmer'
  The voice to use when generating the audio. See the text to speech guide for a preview of the available voices.
  - link: https://platform.openai.com/docs/guides/text-to-speech/voice-options
  - default: 'alloy'
- content: string
  A text string to generate an audio file from.
AudioTranscriptionModeRequestBody
Audio request body when using `transcription` mode to generate text from audio.
Properties
- mode: 'transcription'
  Transcription mode generates text from audio.
- model: 'whisper-1'
  ID of the model to use. See the model endpoint compatibility table for details on which models work with the Audio API.
  - link: https://platform.openai.com/docs/models/model-endpoint-compatibility
  - default: 'whisper-1'
- language: string
  The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
  - default: 'en' (English)
- content: Uploadable
  An audio file to transcribe.
AudioTranslationModeRequestBody
Audio request body when using `translation` mode to translate audio into English.
Properties
- mode: 'translation'
  Translation mode translates audio files to English.
- model: 'whisper-1'
  ID of the model to use. See the model endpoint compatibility table for details on which models work with the Audio API.
  - link: https://platform.openai.com/docs/models/model-endpoint-compatibility
  - default: 'whisper-1'
- content: Uploadable
  An audio file to translate.
GetAudioResponse
The response type depends on the `mode`: `speech` returns `AudioFileResponse`, while `transcription` and `translation` return `AudioTextResponse`.
AudioTextResponse
Audio response when using `transcription` or `translation` mode.
Properties
- responseText: string
  The response text when using `transcription` or `translation` mode.
- tokensUsed: number
  Total number of tokens used in the request (prompt + completion).
- finishReason: string
  The reason the audio creation, transcription, or translation stopped.
- errorMessage: string
  The error message if there was an error.
AudioFileResponse
Audio response when using `speech` mode.
Properties
- responseFile: ArrayBuffer
  The audio file as an ArrayBuffer.
- responseFormat: string
  The OpenAI `response_format`, used to determine the file extension.
  - default: 'mp3'
- contentType: string
  The `Content-Type` header from the OpenAI response, passed through.
  - default: 'audio/mpeg'
- tokensUsed: number
  Total number of tokens used in the request (prompt + completion).
- finishReason: string
  The reason the audio creation, transcription, or translation stopped.
- errorMessage: string
  The error message if there was an error.
Components
AudioProps
Props to pass to the `<Audio>` Server Component. Extends `AudioHTMLAttributes`.
Properties
- content: string
  Content to be converted to speech.
  - note: overrides `children` if both are provided.
- model: 'tts-1' | 'tts-1-hd'
  ID of the model to use. See the model endpoint compatibility table for details on which models work with the Audio API.
  - link: https://platform.openai.com/docs/models/model-endpoint-compatibility
  - default: 'tts-1-hd'
- voice: 'alloy' | 'echo' | 'fable' | 'onyx' | 'nova' | 'shimmer'
  The voice to use when generating the audio. See the text to speech guide for a preview of the available voices.
  - link: https://platform.openai.com/docs/guides/text-to-speech/voice-options
  - default: 'alloy'
- noCaption: boolean
  Prevents generating a caption track.
  - note: WCAG requires captions for audio-only content.
  - link: https://www.w3.org/WAI/WCAG21/Understanding/captions-prerecorded.html
- disclosure: ReactNode
  Disclosure to be displayed to end users. The OpenAI usage policies require you to provide a clear disclosure to end users that the TTS voice they are hearing is AI-generated and not a human voice.
  - link: https://openai.com/policies/usage-policies
  - default: 'This audio is AI-generated and not a human voice.'
Audio
Automatically turns text into audio and creates audio captions in WebVTT format.
Parameters
- props: AsComponent<C, AudioProps> (required)
  - link: AudioProps
TrackProps
Props to pass to the `<Track>` Server Component.
Properties
- model: 'whisper-1'
  ID of the model to use. See the model endpoint compatibility table for details on which models work with the Audio API.
  - link: https://platform.openai.com/docs/models/model-endpoint-compatibility
  - default: 'whisper-1'
- language: string
  The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
  - default: 'en'
- prompt: string
  Optional text to guide the model's style or to continue a previous audio segment. The prompt should match the audio language.
  - link: https://platform.openai.com/docs/guides/speech-to-text/prompting
  - default: `vtt ${kind} for this ${media} file`
- media: 'audio' | 'video'
  The type of media the text track belongs to.
  - default: 'audio'
- name: string
  Full file name, including the file extension, of the audio file to transcribe.
  - note: if this is removed, the API call will fail.
  - default: 'audio.mpeg'
- type: 'audio/flac' | 'audio/mpeg' | 'audio/mp4' | 'audio/ogg' | 'audio/wav' | 'audio/webm'
  The type of audio file to generate a text transcript from.
  - default: 'audio/mpeg'
- kind: 'captions' | 'subtitles' | 'descriptions' | 'chapters' | 'metadata'
  How the text track is meant to be used. Captions are necessary for deaf viewers to understand the content; they include a text description of all important background noises and other sounds, in addition to the text of all dialog and narration. Subtitles are generally language translations to help listeners understand content presented in a language they don't understand; they generally include only dialog and narration.
  - link: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/track#kind
  - default: 'captions'
- label: string
  Label to use for the track. Since its primary usage is for captions, this should be the language of the captions.
  - default: 'English'
- srcLang: string
  Language of the track text data. Must be a valid BCP 47 language tag.
- src: ArrayBuffer | string
  Audio file, or the path to the audio file to transcribe.
  - note: transcription can only be done with audio files.
- transform: function
  A function that allows for transforming the cues before they are added to the track. See the sketch after this reference entry.
  Parameters
  - cues: array (required)
Track
Generates captions from an audio file, for use in `<audio>` or `<video>`. It lets you specify timed text tracks, for example to automatically handle subtitles. The track is formatted as WebVTT (Web Video Text Tracks, `.vtt` files).
Parameters
- props: TrackProps (required)
  - link: TrackProps
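As an illustration of the `transform` prop above, a sketch that rewrites cue text before the track is emitted. The cue object shape, with a `text` field as in WebVTT cues, is an assumption, as are the import path and file path:

```tsx
// The import path is an assumption; adjust it to your setup.
import { Track } from '@trikinco/fullstack-components'

export function ShoutingCaptions() {
  return (
    <audio controls src="/speech.mp3">
      <Track
        src="/speech.mp3"
        // Hypothetical cue shape: each cue carries a `text` field.
        transform={(cues: any[]) =>
          cues.map((cue) => ({ ...cue, text: cue.text.toUpperCase() }))
        }
      />
    </audio>
  )
}
```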
TranscriptProps
Props to pass to the `<Transcript>` Server Component.
Properties
- model: 'whisper-1'
  ID of the model to use. See the model endpoint compatibility table for details on which models work with the Audio API.
  - link: https://platform.openai.com/docs/models/model-endpoint-compatibility
  - default: 'whisper-1'
- language: string
  The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
  - default: 'en'
- prompt: string
  Optional text to guide the model's style or to continue a previous audio segment. The prompt should match the audio language.
- src: ArrayBuffer | string
  Audio file, or the path to the audio file to transcribe.
  - note: transcription can only be done with audio files.
- name: string
  Full file name, including the file extension, of the audio file to transcribe.
  - note: if this is removed, the API call will fail.
  - default: 'audio.mpeg'
- type: 'audio/flac' | 'audio/mpeg' | 'audio/mp4' | 'audio/ogg' | 'audio/wav' | 'audio/webm'
  The type of audio file to generate a text transcript from.
  - default: 'audio/mpeg'
- format: 'json' | 'text' | 'srt' | 'verbose_json' | 'vtt'
  The `response_format` of the transcript output. Use 'vtt' for WebVTT / captions format.
  - default: 'json'
Transcript
Generates text from an audio file.
Parameters
- props: AsComponent<C, TranscriptProps> (required)
  - link: TranscriptProps
Hooks
useAudio
A client-side fetch handler hook that generates audio files from text, or text from audio. Includes utilities for controlling audio playback when the `mode` is `speech`.
Parameters
- body: AudioRequestBody (required)
  - link: AudioRequestBody
- config: UseRequestConsumerConfig<AudioRequestBody>
  Fetch utility hook request options without the `fetcher`. Allows for overriding the default `request` config.
Returns
- isLoading: boolean
  Fetch loading state. `true` if the fetch is in progress.
- isError: boolean
  Fetch error state. `true` if an error occurred.
- error: unknown
  Fetch error object if `isError` is `true`.
- data: T | undefined
  Fetch response data if the fetch was successful.
- refetch: function
  Refetches the data.
- play: function
  Plays the audio file.
- pause: function
  Pauses the audio file.
- setPlayBackRate: function
  Sets the audio playback rate / speed.
  Parameters
  - playBackRate: number (required)
- setAudioContext: function
  Sets the audio context.
  Parameters
  - audioContext: object (required)
- setAudioSource: function
  Sets the audio source.
  Parameters
  - audioSource: object (required)
- audioContext: object | null
  The audio context.
- audioSource: object | null
  The audio source.
- audioRef: RefObject<object>
  The audio element ref.
- playBackRate: number
  The audio playback rate / speed.
useAudioSource
A client-side audio file handler with basic utilities for controlling audio file playback.
Parameters
- isEnabled: boolean (required)
  Enables connecting the audio source to the audio context when the audio file is loaded and the `audioRef` is set.
Returns
- play: function
  Plays the audio file.
- pause: function
  Pauses the audio file.
- setPlayBackRate: function
  Sets the audio playback rate / speed.
  Parameters
  - playBackRate: number (required)
- setAudioContext: function
  Sets the audio context.
  Parameters
  - audioContext: object (required)
- setAudioSource: function
  Sets the audio source.
  Parameters
  - audioSource: object (required)
- audioContext: object | null
  The audio context.
- audioSource: object | null
  The audio source.
- audioRef: RefObject<object>
  The audio element ref.
- playBackRate: number
  The audio playback rate / speed.
Utilities
getAudio
Depending on the `mode`, generates audio files from text (`speech`) or text from audio (`transcription` and `translation`). A Server Action that calls the third-party API directly on the server. This avoids calling the Next.js API route handler, allowing for performant Server Components.
Parameters
- request: AudioRequestBody (required)
  - link: AudioRequestBody
- options: AudioOptions
  - link: AudioOptions
fetchAudio
Depending on the `mode`, generates audio files from text (`speech`) or text from audio (`transcription` and `translation`). A client-side fetch handler that calls the internal Next.js API route handler, which in turn calls the third-party API. Best used for Client Components and client-side functionality.
Parameters
- body: AudioRequestBody (required)
  - link: AudioRequestBody
- config: RequestConfigOnly
  Fetch utility request options without the `body`.
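A sketch of calling `fetchAudio` from the client in `speech` mode, turning the documented `AudioFileResponse` into a playable object URL (the `/client` import path is an assumption):

```ts
'use client'
// The import path is an assumption; adjust it to your setup.
import { fetchAudio } from '@trikinco/fullstack-components/client'

export async function speak(text: string) {
  // Body shape follows AudioSpeechModeRequestBody.
  const response = await fetchAudio({ mode: 'speech', content: text })

  if ('responseFile' in response && response.responseFile) {
    // Turn the ArrayBuffer into a playable object URL.
    const blob = new Blob([response.responseFile], {
      type: response.contentType,
    })
    await new Audio(URL.createObjectURL(blob)).play()
  }
}
```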