Audio To Text
API reference for the audio-to-text pipeline. Accepts audio input and returns a transcript using Whisper-compatible models.
POST
Audio To Text
Authorizations
Bearer authentication: send an Authorization header of the form Bearer <token>, where <token> is your auth token.
Body
multipart/form-data
Uploaded audio file to be transcribed.
Hugging Face model ID used for transcription.
Whether to return timestamps for the transcribed text. Supported values: 'sentence', 'word', or a string boolean ('true' or 'false'). The default, 'true', is equivalent to 'sentence' and returns sentence-level timestamps. 'word' returns word-level timestamps. 'false' returns no timestamps.
Additional job information to be passed to the pipeline.
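The request described above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the reference client: the form field names (audio, model_id, return_timestamps), the helper names, and the example URL are hypothetical, since this page lists the field descriptions without their names. The model ID shown is only an example of a Whisper model on Hugging Face.

```python
import uuid
import urllib.request

# Values the reference lists for the timestamp option.
ALLOWED_TIMESTAMPS = {"sentence", "word", "true", "false"}


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(url, token, audio_bytes, filename, model_id,
                  return_timestamps="true"):
    """Build (but do not send) a POST request for the audio-to-text pipeline."""
    if return_timestamps not in ALLOWED_TIMESTAMPS:
        raise ValueError(f"unsupported return_timestamps: {return_timestamps!r}")
    # Field names here are assumptions; check them against the live schema.
    body, content_type = encode_multipart(
        {"model_id": model_id, "return_timestamps": return_timestamps},
        file_field="audio",
        filename=filename,
        file_bytes=audio_bytes,
    )
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
    )


# Placeholder URL and token; substitute the real endpoint and credentials.
request = build_request(
    "https://example.invalid/audio-to-text",
    "YOUR_TOKEN",
    b"\x00\x01fake-audio-bytes",
    "clip.wav",
    "openai/whisper-large-v3",
    return_timestamps="word",
)
```

Passing the resulting object to urllib.request.urlopen would send the request. The response format is not described in this section, so the sketch stops at building the request.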
Last modified on May 18, 2026