Faster-Whisper Transcriber — Server API

Integrate transcription into your own scripts, apps, and automation workflows

1. Starting the Server

  1. Launch the Faster-Whisper Transcriber GUI.
  2. Click the Settings (gear) icon in the main window.
  3. In the Server Mode group box, flip the toggle from Off to On and set the Port (default: 8765).
  4. Click Update Settings.
  5. The status bar at the bottom of the main window shows "… | server: on (8765)". The voice recorder is disabled while the server runs.
  6. The server is now accepting requests.

To stop the server, open Settings again and flip the Server Mode toggle back to Off.

Tip
The server binds to 0.0.0.0, so it is accessible at http://127.0.0.1:8765 locally and at http://<your-ip>:8765 from other machines on your network.
Note
This backend uses the faster-whisper library, which exposes the standard Whisper decoding options (beam_size, vad_filter, condition_on_previous_text, word_timestamps, temperature, …). The server exposes only the subset that matters for integration as request parameters; Section 5 lists them.

2. Endpoints

| Endpoint        | Method | Description                                                   |
|-----------------|--------|---------------------------------------------------------------|
| /health         | GET    | Check whether the server is running                           |
| /status         | GET    | Server status, queue depth, whether a transcription is active |
| /models         | GET    | List all available models and whether they support translation |
| /transcribe     | POST   | Transcribe audio from a file upload (multipart form)          |
| /transcribe/raw | POST   | Transcribe audio from base64-encoded data (JSON body)         |

The server also provides interactive API documentation.

3. Quick Start

The simplest possible request — send an audio file and get text back:

import requests

# Open the file in a with-block so it is closed after the upload
with open("my_audio.mp3", "rb") as f:
    response = requests.post(
        "http://127.0.0.1:8765/transcribe",
        files={"audio": f},
    )

print(response.json()["text"])

That's it. The server uses whatever model, quantization, task, and Whisper parameters were configured in the GUI when Server Mode was turned on. Language defaults to auto-detection.

4. Accepted Audio Input Formats

4a. Audio Files (most common)

Standard audio files uploaded directly. Supported formats include: .mp3 .wav .flac .m4a .ogg .aac .wma .webm .mp4 .mkv .avi .asf .amr

import requests

with open("recording.wav", "rb") as f:
    response = requests.post(
        "http://127.0.0.1:8765/transcribe",
        files={"audio": ("recording.wav", f, "audio/wav")},
    )

print(response.json()["text"])

4b. NumPy Arrays

If your program already has audio as a NumPy array, serialize it with np.save() and upload the .npy file:

import io
import numpy as np
import requests

# Your audio as a numpy array (float32, mono)
audio_array = np.random.randn(16000 * 5).astype(np.float32)  # 5 seconds at 16kHz

buffer = io.BytesIO()
np.save(buffer, audio_array)
buffer.seek(0)

response = requests.post(
    "http://127.0.0.1:8765/transcribe",
    files={"audio": ("audio.npy", buffer, "application/octet-stream")},
    data={"sample_rate": "16000"},
)

print(response.json()["text"])
Note
If your audio is at a different sample rate (e.g., 44100 Hz), pass sample_rate=44100 and the server will resample it to 16 kHz automatically.

4c. PyTorch Tensors

If your audio is already a PyTorch tensor, serialize it with torch.save() and upload the .pt file:

import io
import torch
import requests

audio_tensor = torch.randn(16000 * 5)  # 5 seconds at 16kHz

buffer = io.BytesIO()
torch.save(audio_tensor, buffer)
buffer.seek(0)

response = requests.post(
    "http://127.0.0.1:8765/transcribe",
    files={"audio": ("audio.pt", buffer, "application/octet-stream")},
    data={"sample_rate": "16000"},
)

print(response.json()["text"])

4d. Raw PCM Bytes

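Raw, headerless PCM samples are common in real-time audio pipelines. Because raw bytes carry no header, set audio_format to "pcm" and tell the server the sample rate and sample dtype: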

import numpy as np
import requests

audio = np.random.randn(16000 * 3).astype(np.float32)  # 3 seconds
raw_bytes = audio.tobytes()

response = requests.post(
    "http://127.0.0.1:8765/transcribe",
    files={"audio": ("audio.raw", raw_bytes, "application/octet-stream")},
    data={
        "audio_format": "pcm",
        "sample_rate": "16000",
        "dtype": "float32",  # also supports: int16, int32, float64
    },
)

print(response.json()["text"])
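The sample_rate and dtype fields have to match the bytes exactly. As a minimal sketch of deriving them from a WAV source, the hypothetical helper wav_to_pcm_request below reads a mono 16- or 32-bit integer WAV with Python's standard wave module and returns the frames plus matching form fields. It assumes the server expects mono, little-endian samples (the NumPy example above uses mono audio). For an ordinary WAV file, uploading it as a file as in 4a is simpler; this only illustrates how the fields relate to the data.

```python
import wave

# Maps WAV sample width (bytes per sample) to the dtype names the server lists.
# 8-bit WAV is unsigned and has no matching dtype, so it is rejected.
_WIDTH_TO_DTYPE = {2: "int16", 4: "int32"}


def wav_to_pcm_request(wav_file):
    """Return (raw_bytes, form_fields) for a PCM upload to /transcribe.

    wav_file may be a path or a binary file-like object.
    Hypothetical helper; assumes the server wants mono little-endian samples.
    """
    with wave.open(wav_file, "rb") as w:
        if w.getnchannels() != 1:
            raise ValueError("expected mono audio; downmix before sending")
        width = w.getsampwidth()
        if width not in _WIDTH_TO_DTYPE:
            raise ValueError(f"unsupported sample width: {width} bytes")
        raw = w.readframes(w.getnframes())
        fields = {
            "audio_format": "pcm",
            "sample_rate": str(w.getframerate()),
            "dtype": _WIDTH_TO_DTYPE[width],
        }
    return raw, fields
```

Send the result with requests.post(url, files={"audio": ("audio.raw", raw)}, data=fields), exactly like the PCM example above.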

4e. Base64-Encoded Data (JSON endpoint)

To send everything in a JSON body instead of a multipart form, base64-encode the audio and POST it to /transcribe/raw:

import base64, io
import numpy as np
import requests

audio = np.random.randn(16000 * 5).astype(np.float32)
buffer = io.BytesIO()
np.save(buffer, audio)
b64_data = base64.b64encode(buffer.getvalue()).decode("utf-8")

response = requests.post(
    "http://127.0.0.1:8765/transcribe/raw",
    json={
        "audio_data": b64_data,
        "audio_format": "numpy",
        "sample_rate": 16000,
    },
)

print(response.json()["text"])

You can also send a complete audio file as base64:

import base64
import requests

with open("my_audio.mp3", "rb") as f:
    b64_data = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "http://127.0.0.1:8765/transcribe/raw",
    json={
        "audio_data": b64_data,
        "audio_format": "file",
    },
)

print(response.json()["text"])
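Both JSON variants share one body shape, so a small builder keeps them consistent. This is a sketch: build_raw_body is a hypothetical helper, and it only adds sample_rate when you pass one, mirroring the file example above, which omits it. Keep in mind that base64 turns every 3 bytes into 4 characters, so the JSON body is about a third larger than a multipart upload of the same audio.

```python
import base64


def build_raw_body(audio_bytes, audio_format="file", sample_rate=None):
    """Build the JSON body for POST /transcribe/raw (hypothetical helper).

    audio_bytes: complete file contents, or a serialized array such as
    the bytes written by np.save().
    """
    body = {
        # base64 output is pure ASCII, so decoding as ASCII is safe
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
        "audio_format": audio_format,
    }
    if sample_rate is not None:
        # Only raw arrays need this; complete audio files carry their own rate
        body["sample_rate"] = int(sample_rate)
    return body
```

For the NumPy case above, this would be requests.post("http://127.0.0.1:8765/transcribe/raw", json=build_raw_body(buffer.getvalue(), "numpy", 16000)).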

5. Settings You Can Control

Every setting is optional. If you omit a value, the server uses whatever was configured in the GUI when Server Mode was turned on.

| Parameter | Type | Description | Values |
|-----------|------|-------------|--------|
| model | string | Name of the Whisper checkpoint | "large-v3", "large-v3-turbo", "medium", "medium.en", "small", "small.en", "base", "base.en", "tiny", "tiny.en", "distil-whisper-large-v3", "distil-whisper-medium.en", "distil-whisper-small.en" |
| quantization | string | CTranslate2 compute_type: which pre-converted variant of the model to load | "float32", "float16", "bfloat16", "int8", "int8_float16", "int8_bfloat16", "int8_float32" |
| device | string | Run on CPU or GPU | "cuda", "cpu" |
| language | string | ISO 639-1 language code. Omit or leave empty to auto-detect. | "en", "fr", "es", "de", "zh", … (99 Whisper languages) |
| task_mode | string | Transcribe in the source language or translate to English | "transcribe", "translate" |
| include_timestamps | boolean | Include segment timestamps in the response. When false, the server calls faster-whisper with without_timestamps=True and returns segments: []. | "true", "false" |
| word_timestamps | boolean | Forwarded to faster-whisper, which then computes word-level timings within each segment. The response still reports timestamps only per segment. | "true", "false" |
| beam_size | integer | Number of beams for decoding. Higher is usually more accurate but slower. | 1–20 (default 5) |
| vad_filter | boolean | Run Silero VAD before decoding. Forced on when batch_size > 1. | "true", "false" |
| condition_on_previous_text | boolean | Use the previous segment's output as context for the next segment. Helps coherence but can propagate hallucinations. | "true", "false" |
| batch_size | integer | When > 1, uses BatchedInferencePipeline with tuned VAD parameters and processes VAD-chunked speech in parallel. | 1–128 (default 1) |
| audio_format | string | Override input format auto-detection | "auto", "file", "numpy", "tensor", "pcm" |
| sample_rate | integer | Sample rate of raw audio input (resampled to 16 kHz) | e.g. "16000", "22050", "44100", "48000" |
| dtype | string | Data type for raw PCM input | "float32", "float64", "int16", "int32" |
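Multipart form fields are sent as strings, and Python's str(True) gives "True" rather than the lowercase "true" listed above. A minimal sketch of a hypothetical to_form_fields helper that produces the spellings from the table and drops None values, so that omitted settings fall back to the GUI configuration:

```python
def to_form_fields(**settings):
    """Convert keyword settings to multipart form strings (hypothetical helper).

    Booleans become "true"/"false", other values use str(), and None
    values are dropped so the server uses the GUI setting instead.
    """
    fields = {}
    for name, value in settings.items():
        if value is None:
            continue  # omitted: server falls back to the GUI configuration
        if isinstance(value, bool):
            fields[name] = "true" if value else "false"
        else:
            fields[name] = str(value)
    return fields
```

Pass the result as data= in any of the /transcribe examples, for instance data=to_form_fields(include_timestamps=True, beam_size=5).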

Example with Custom Settings

import requests

with open("lecture.mp3", "rb") as f:
    response = requests.post(
        "http://127.0.0.1:8765/transcribe",
        files={"audio": ("lecture.mp3", f, "audio/mpeg")},
        data={
            "model": "large-v3",
            "quantization": "float16",
            "device": "cuda",
            "task_mode": "transcribe",
            "language": "en",
            "include_timestamps": "true",
            "beam_size": "5",
            "vad_filter": "true",
            "batch_size": "8",
        },
    )

result = response.json()
print(result["text"])
print(f"Detected language: {result['language']}")
print(f"Took {result['processing_time_seconds']} seconds")
Important
When batch_size > 1 the server forces vad_filter=true and applies tuned VAD parameters (threshold, min/max speech duration, speech_pad_ms) needed by faster-whisper's BatchedInferencePipeline. You cannot disable VAD while batching.

6. Response Format

Every transcription request returns a JSON object. Here is what a real response looks like:

6a. Plain text only (include_timestamps omitted or false)

// POST /transcribe  —  include_timestamps defaults to false
{
    "text": "Good morning everyone. Today we'll be discussing the quarterly results and our plans for the upcoming product launch.",
    "segments": [],
    "language": "en",
    "duration": 138.135,
    "task": "transcribe",
    "model_used": "large-v3 - float16",
    "processing_time_seconds": 2.418
}

When timestamps are off, segments is always an empty list [].

6b. With timestamps (include_timestamps=true)

// POST /transcribe  —  include_timestamps=true
{
    "text": "Good morning everyone. Today we'll be discussing the quarterly results and our plans for the upcoming product launch.",
    "segments": [
        {
            "start": 0.081,
            "end": 4.862,
            "text": " Good morning everyone. Today we'll be discussing"
        },
        {
            "start": 4.862,
            "end": 8.241,
            "text": " the quarterly results and our plans for the upcoming product launch."
        }
    ],
    "language": "en",
    "duration": 138.135,
    "task": "transcribe",
    "model_used": "large-v3 - float16",
    "processing_time_seconds": 3.012
}

Each segment corresponds to one Whisper decoding segment (VAD-chunked if vad_filter is on). Timestamps are in seconds with millisecond precision (3 decimal places).

6c. Field Reference

| Field | Type | Always present | Description |
|-------|------|----------------|-------------|
| text | string | Yes | The complete transcription as a single newline-joined string. |
| segments | array | Yes | Timestamped segments. Empty [] when include_timestamps is false. |
| segments[].start | float | | Segment start time in seconds. |
| segments[].end | float | | Segment end time in seconds. |
| segments[].text | string | | The transcribed words within this time range (faster-whisper leaves a leading space). |
| language | string | Yes | Detected (or echoed) ISO language code. |
| duration | float | Yes | Duration of the processed audio, in seconds. |
| task | string | Yes | "transcribe" or "translate". |
| model_used | string | Yes | Full model key used, e.g. "large-v3 - float16". |
| processing_time_seconds | float | Yes | How long the transcription took (excludes network transfer). |
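If you prefer attribute access and type hints over raw dictionary lookups, the fields above map naturally onto dataclasses. A sketch: Segment, Transcription, parse_transcription, and the derived realtime_factor property are hypothetical client-side names, not part of the server's response.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class Transcription:
    text: str
    language: str
    duration: float
    task: str
    model_used: str
    processing_time_seconds: float
    segments: list[Segment] = field(default_factory=list)

    @property
    def realtime_factor(self):
        """Seconds of audio processed per second of processing time."""
        if self.processing_time_seconds <= 0:
            return float("inf")
        return self.duration / self.processing_time_seconds


def parse_transcription(payload):
    """Turn the JSON dict returned by /transcribe into typed objects."""
    segments = [
        # Strip the leading space faster-whisper leaves on segment text
        Segment(float(s["start"]), float(s["end"]), s["text"].strip())
        for s in payload["segments"]
    ]
    return Transcription(
        text=payload["text"],
        language=payload["language"],
        duration=float(payload["duration"]),
        task=payload["task"],
        model_used=payload["model_used"],
        processing_time_seconds=float(payload["processing_time_seconds"]),
        segments=segments,
    )
```

Usage: t = parse_transcription(response.json()), then read t.language, t.segments[0].start, or t.realtime_factor.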

6d. How to Access Each Field in Python

import requests

with open("meeting.mp3", "rb") as f:
    r = requests.post(
        "http://127.0.0.1:8765/transcribe",
        files={"audio": ("meeting.mp3", f)},
        data={"include_timestamps": "true"},
    )

result = r.json()

print("Transcript:", result["text"])
print(f"Detected: {result['language']}, duration {result['duration']:.1f}s")
print(f"Model: {result['model_used']}")
print(f"Task: {result['task']}, took {result['processing_time_seconds']:.1f}s")

for seg in result["segments"]:
    print(f"[{seg['start']:07.3f} -> {seg['end']:07.3f}] {seg['text'].strip()}")

6e. Converting Segments to SRT in Your Code

The API always returns JSON. If you need subtitles, convert the segments yourself. The request must use include_timestamps=true; otherwise segments is empty:

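def seconds_to_srt_time(s):
    # Work in whole milliseconds so rounding can never produce "60.000" seconds
    total_ms = int(round(s * 1000))
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    sec, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{sec:02d},{ms:03d}"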

def to_srt(segments):
    lines = []
    for i, seg in enumerate(segments, 1):
        lines.append(f"{i}")
        lines.append(f"{seconds_to_srt_time(seg['start'])} --> {seconds_to_srt_time(seg['end'])}")
        lines.append(seg["text"].strip())
        lines.append("")
    return "\n".join(lines)

# result is the parsed response from 6d, requested with include_timestamps=true
srt_text = to_srt(result["segments"])
print(srt_text)

7. Model Notes

This backend uses faster-whisper, which loads CTranslate2-converted Whisper checkpoints. Models are downloaded on demand from Hugging Face under ctranslate2-4you/whisper-<model>-ct2-<quantization> (or ctranslate2-4you/distil-whisper-<model>-ct2-<quantization> for the Distil variants).

| Model family | English-only? | Translation? | Notes |
|--------------|---------------|--------------|-------|
| large-v3 / large-v3-turbo | No | Yes | Multilingual; turbo is a fine-tune with fewer decoder layers. |
| medium / small / base / tiny | No | Yes | Multilingual; smaller models are faster and need less VRAM / RAM. |
| medium.en / small.en / base.en / tiny.en | Yes | No | English-only. Slightly better English accuracy at the same size. Pass language="en". |
| distil-whisper-large-v3 / medium.en / small.en | Varies | No | Distilled variants: faster, smaller, English-focused. |
Important
.en models and all Distil variants do not support translation. If you request task_mode="translate" with one of these models, the transcription fails and the server returns HTTP 500.
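Since this mistake only surfaces as an HTTP 500 after the audio has been uploaded, a client can consult the /models response (see Section 8) first. A minimal sketch with a hypothetical check_task helper that relies only on the supports_translation flag that endpoint returns:

```python
def check_task(models, model_name, task_mode):
    """Raise ValueError before sending a request that cannot succeed.

    models: the dict returned by GET /models, keyed by model name.
    Hypothetical client-side helper.
    """
    info = models.get(model_name)
    if info is None:
        raise ValueError(f"unknown model {model_name!r}; available: {sorted(models)}")
    if task_mode == "translate" and not info.get("supports_translation", False):
        raise ValueError(f"{model_name!r} does not support translation")
```

For example, check_task(requests.get("http://127.0.0.1:8765/models").json(), "medium.en", "translate") raises ValueError before any audio is sent.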

8. Checking Server Status

Health Check

import requests

r = requests.get("http://127.0.0.1:8765/health")
print(r.json())

Returns:

{
    "status": "ok"
}

Server Status

r = requests.get("http://127.0.0.1:8765/status")
status = r.json()
print(status)

Returns (when idle):

{
    "server_running": true,
    "queue_depth": 0,
    "transcription_active": false
}

Returns (while processing one request with two more waiting):

{
    "server_running": true,
    "queue_depth": 2,
    "transcription_active": true
}

| Field | Type | Description |
|-------|------|-------------|
| server_running | bool | Always true (if the server were not running, the request would fail). |
| queue_depth | int | Number of requests waiting in line. 0 means no queue. |
| transcription_active | bool | true if a transcription is currently being processed. |

Example — wait until the server is free before submitting:

import time
import requests

while True:
    status = requests.get("http://127.0.0.1:8765/status").json()
    if status["queue_depth"] == 0 and not status["transcription_active"]:
        break
    print(f"Server busy (queue: {status['queue_depth']}), waiting...")
    time.sleep(2)

print("Server is free, submitting...")
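The loop above waits forever if the server stays busy. A variant with a deadline, sketched as a hypothetical wait_until_idle function; the status fetcher, sleep, and clock are passed in so the logic can be exercised without a running server:

```python
import time


def wait_until_idle(get_status, timeout=300.0, poll_interval=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the server reports an empty queue and no active job.

    get_status: zero-argument callable returning the /status JSON dict.
    Returns True once idle, or False if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status["queue_depth"] == 0 and not status["transcription_active"]:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

Call it as wait_until_idle(lambda: requests.get("http://127.0.0.1:8765/status", timeout=5).json()). An idle reading does not reserve the server: another client may submit in between, in which case your request simply joins the queue.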

List Available Models

r = requests.get("http://127.0.0.1:8765/models")
models = r.json()
print(models)

Returns a dictionary keyed by model name:

{
    "large-v3": {
        "name": "large-v3",
        "supports_translation": true
    },
    "medium.en": {
        "name": "medium.en",
        "supports_translation": false
    },
    "distil-whisper-large-v3": {
        "name": "distil-whisper-large-v3",
        "supports_translation": false
    }
    // ... etc
}
Note
Unlike the WhisperS2T sister project, this /models endpoint returns one entry per base model name, not one entry per (name, quantization) pair. Use the quantization field on /transcribe to pick the precision.

Example — find all multilingual models:

models = requests.get("http://127.0.0.1:8765/models").json()

for name, info in models.items():
    if info["supports_translation"]:
        print(name)

9. Request Queuing

The server processes one transcription at a time (GPU is a shared resource). If you send multiple requests simultaneously, they are placed in a queue and processed in order. Each client waits for its own result — you don't need to poll.

import threading
import requests

def transcribe(file_path):
    with open(file_path, "rb") as f:
        r = requests.post(
            "http://127.0.0.1:8765/transcribe",
            files={"audio": (file_path, f)},
        )
    print(f"{file_path}: {r.json()['text'][:80]}...")

threads = [
    threading.Thread(target=transcribe, args=("file1.mp3",)),
    threading.Thread(target=transcribe, args=("file2.mp3",)),
    threading.Thread(target=transcribe, args=("file3.mp3",)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
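The same fan-out with concurrent.futures, which hands the results back to the caller instead of printing inside each thread. This is a sketch: transcribe_many is a hypothetical helper, and transcribe_one stands for any function that uploads one file and returns the parsed JSON, such as transcribe_file from Section 11. Because the server transcribes one request at a time, extra workers do not make transcription itself faster; they only let requests wait in the server's queue rather than on the client.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe_many(paths, transcribe_one, max_workers=3):
    """Submit several files concurrently; return {path: result} in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, whatever order they finish in
        return dict(zip(paths, pool.map(transcribe_one, paths)))
```

Since queued requests wait on the server, give each request a timeout long enough to cover time spent in the queue as well as its own processing.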

10. Using curl (Command Line)

# Health check
curl http://127.0.0.1:8765/health

# Transcribe a file
curl -F "audio=@my_audio.mp3" http://127.0.0.1:8765/transcribe

# Transcribe with settings
curl -F "audio=@my_audio.mp3" \
     -F "model=large-v3" \
     -F "quantization=float16" \
     -F "task_mode=transcribe" \
     -F "language=en" \
     -F "include_timestamps=true" \
     -F "beam_size=5" \
     http://127.0.0.1:8765/transcribe

11. Error Handling

All error responses return a JSON object with a detail field explaining what went wrong.

400 — Bad Request

// Invalid model name
{"detail": "Unknown model 'FakeModel'. Available: ['tiny', 'tiny.en', 'base', ...]"}

// Empty audio
{"detail": "Empty audio data"}

// Unreadable audio format
{"detail": "Failed to process audio: [decoder error details]"}

422 — Validation Error

{
    "detail": [
        {
            "type": "missing",
            "loc": ["body", "audio"],
            "msg": "Field required",
            "input": null
        }
    ]
}
Note
For 422 errors, detail is an array of error objects. Each has loc, msg, and type.

500 — Internal Server Error

// CUDA OOM
{"detail": "Transcription failed: CUDA out of memory. Tried to allocate 512.00 MiB..."}

// Translate on an .en / Distil model
{"detail": "Transcription failed: This model is English-only and does not support translation."}

503 — Service Unavailable

{"detail": "Server shutting down"}
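Because detail is a string for most status codes but an array for 422, a small normalizing function keeps calling code simple. A sketch: describe_error is a hypothetical helper, and its handling of non-JSON bodies (for example from a proxy in front of the server) goes beyond what the server itself documents.

```python
def describe_error(status_code, body):
    """Return a one-line message for an error response.

    body: the parsed JSON, or None if the response was not JSON.
    """
    detail = body.get("detail") if isinstance(body, dict) else None
    if isinstance(detail, list):
        # 422 validation errors: join each error's location path and message
        parts = []
        for err in detail:
            loc = ".".join(str(p) for p in err.get("loc", []))
            parts.append(f"{loc}: {err.get('msg', 'invalid')}")
        return f"HTTP {status_code}: " + "; ".join(parts)
    if detail:
        return f"HTTP {status_code}: {detail}"
    return f"HTTP {status_code}: no detail provided"
```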

Robust Error Handling Pattern

import requests

def transcribe_file(file_path, **settings):
    try:
        with open(file_path, "rb") as f:
            r = requests.post(
                "http://127.0.0.1:8765/transcribe",
                files={"audio": (file_path, f)},
                data=settings,
                timeout=300,  # read timeout; time spent in the server's queue counts toward it
            )
    except requests.ConnectionError:
        print("Cannot connect. Is the server running?")
        return None
    except requests.Timeout:
        print("Request timed out")
        return None

    if r.status_code == 200:
        return r.json()

    # Error bodies are normally JSON with a "detail" field; guard against anything else
    try:
        detail = r.json().get("detail", "Unknown error")
    except ValueError:
        detail = r.text or "Unknown error"

    if r.status_code == 400:
        print(f"Bad request: {detail}")
    elif r.status_code == 422 and isinstance(detail, list):
        for err in detail:
            print(f"Validation error at {err['loc']}: {err['msg']}")
    elif r.status_code == 500:
        print(f"Server error: {detail}")
    elif r.status_code == 503:
        print("Server is shutting down, try again later")
    else:
        print(f"Unexpected HTTP {r.status_code}: {detail}")

    return None

result = transcribe_file("meeting.mp3", model="large-v3", language="en")
if result:
    print(result["text"])

Quick Reference

| Code | detail type | Meaning | What to do |
|------|-------------|---------|------------|
| 200 | | Success | Use result["text"] and result["segments"] |
| 400 | string | Bad input | Fix model, language, task, or audio |
| 422 | array | Missing field | Check that audio is included |
| 500 | string | Model error | Check VRAM, language/model compatibility |
| 503 | string | Server stopping | Wait and retry |
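The table suggests waiting and retrying on 503. A minimal sketch of that with exponential backoff, using a hypothetical post_with_retry helper where send is any zero-argument callable that performs the request. Only 503 is retried, because the other errors call for a changed request or more VRAM rather than another attempt.

```python
import time


def post_with_retry(send, retries=3, base_delay=2.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff while it returns 503.

    send: zero-argument callable returning a response-like object
    with a .status_code attribute. Any other status is returned at once.
    """
    response = send()
    for attempt in range(retries):
        if response.status_code != 503:
            break
        sleep(base_delay * 2 ** attempt)  # 2 s, 4 s, 8 s, ... with the defaults
        response = send()
    return response
```

Make send open the audio file on each call (for example a small function with its own with open(...) block), because a file object already read by the first upload would send nothing on a retry. Once the server has fully stopped, requests raise ConnectionError instead of returning 503, and this sketch does not catch that.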

12. Complete Example

A full script that transcribes all .mp3 files in a folder:

import sys
import requests
from pathlib import Path

SERVER = "http://127.0.0.1:8765"
AUDIO_DIR = Path("./my_audio_files")

# Check that the server is running (a stopped server raises ConnectionError)
try:
    health = requests.get(f"{SERVER}/health", timeout=5)
    health.raise_for_status()
except requests.RequestException:
    print("Server is not running!")
    sys.exit(1)

for audio_file in sorted(AUDIO_DIR.glob("*.mp3")):
    print(f"Transcribing: {audio_file.name}...", end=" ", flush=True)

    with open(audio_file, "rb") as f:
        response = requests.post(
            f"{SERVER}/transcribe",
            files={"audio": (audio_file.name, f, "audio/mpeg")},
            data={"language": "en", "beam_size": "5"},
        )

    if response.status_code == 200:
        result = response.json()
        text = result["text"]
        duration = result["duration"]
        speed = result["processing_time_seconds"]

        output_file = audio_file.with_suffix(".txt")
        output_file.write_text(text, encoding="utf-8")
        print(f"Done ({duration:.1f}s audio in {speed:.1f}s)")
    else:
        print(f"Failed: {response.json().get('detail', 'Unknown error')}")
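The script above only picks up .mp3 files. A sketch of collecting every extension listed in Section 4a instead; find_audio_files is a hypothetical helper, and since 4a says the supported formats "include" these, the set may not be exhaustive:

```python
from pathlib import Path

# Extensions listed in Section 4a (possibly not the complete set the server accepts)
SUPPORTED_EXTENSIONS = {
    ".mp3", ".wav", ".flac", ".m4a", ".ogg", ".aac", ".wma",
    ".webm", ".mp4", ".mkv", ".avi", ".asf", ".amr",
}


def find_audio_files(folder):
    """Return supported files directly inside folder, sorted by path.

    Matching is case-insensitive, so RECORDING.WAV is included too.
    """
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Swap it in for AUDIO_DIR.glob("*.mp3") and drop the hard-coded "audio/mpeg" content type from the upload tuple, e.g. files={"audio": (audio_file.name, f)}, as the other examples do.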
