vllm.entrypoints.openai.realtime.serving ¶
OpenAIServingRealtime ¶
Bases: OpenAIServing
Realtime audio transcription service via WebSocket streaming.
Provides streaming audio-to-text transcription by transforming audio chunks into StreamingInput objects that can be consumed by the engine.
Source code in vllm/entrypoints/openai/realtime/serving.py
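The service consumes float32 NumPy audio arrays (see `transcribe_realtime` below). As a minimal, hedged sketch of producing such a stream from raw 16-bit PCM chunks, e.g. bytes received over a WebSocket; the `raw_chunks` source and the helper name `pcm16_to_float32` are illustrative, not part of this module:

```python
from collections.abc import AsyncGenerator

import numpy as np


async def pcm16_to_float32(
    raw_chunks: AsyncGenerator[bytes, None],  # hypothetical byte-chunk source
) -> AsyncGenerator[np.ndarray, None]:
    """Convert raw 16-bit PCM chunks into float32 arrays in [-1.0, 1.0]."""
    async for chunk in raw_chunks:
        pcm = np.frombuffer(chunk, dtype=np.int16)
        # Scale int16 samples into the float32 range the service expects.
        yield pcm.astype(np.float32) / 32768.0
```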
model_cls cached property ¶
model_cls: type[SupportsRealtime]
Get the model class that supports realtime transcription.
transcribe_realtime async ¶
transcribe_realtime(
audio_stream: AsyncGenerator[ndarray, None],
input_stream: Queue[list[int]],
) -> AsyncGenerator[StreamingInput, None]
Transform an audio stream into StreamingInput objects for engine.generate().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `audio_stream` | `AsyncGenerator[ndarray, None]` | Async generator yielding float32 NumPy audio arrays. | required |
| `input_stream` | `Queue[list[int]]` | Queue containing context token IDs from previous generation outputs. Used for autoregressive multi-turn processing, where each generation's output becomes the context for the next iteration. | required |
Yields:
| Type | Description |
|---|---|
| `AsyncGenerator[StreamingInput, None]` | `StreamingInput` objects containing audio prompts for the engine. |
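To illustrate the multi-turn contract, a hedged sketch of a driver loop that wires `transcribe_realtime` to the engine and pushes each turn's output token IDs back onto `input_stream` as context for the next turn. The names `serving`, `engine`, and `request_id`, the `engine.generate()` call shape, and the output attributes are assumptions for illustration, not the actual vLLM API:

```python
import asyncio


async def run_session(serving, engine, audio_stream, request_id):
    # Context queue: each turn's output token IDs are fed back here so the
    # next StreamingInput carries the accumulated conversation context.
    input_stream: asyncio.Queue[list[int]] = asyncio.Queue()

    # Transform raw float32 audio into StreamingInput objects.
    streaming_inputs = serving.transcribe_realtime(audio_stream, input_stream)

    # Hypothetical engine call; the real generate() signature may differ.
    async for output in engine.generate(streaming_inputs, request_id=request_id):
        completion = output.outputs[0]  # assumed output shape
        print(completion.text, end="", flush=True)
        # Close the autoregressive loop: this turn's token IDs become the
        # context for the next iteration, per the input_stream contract.
        await input_stream.put(list(completion.token_ids))
```

The queue-based feedback is what makes the processing autoregressive: the generator yielded by `transcribe_realtime` pulls context tokens from `input_stream` as it builds each subsequent prompt.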