Streaming STT, Gemini TTS, guardrails & provider controls

Live streaming transcription, new audio providers, per-project content guardrails, and org-level provider blocking — a dense release week for audio and safety capabilities.

Deepgram live streaming STT

Deepgram now supports real-time streaming transcription via WebSocket — send audio as it’s recorded and get transcripts back word by word instead of waiting for the full file. Streaming-capable models are catalogued with their supported encodings, languages, and sample rates so you can pick the right one for your use case.

OpenAI Whisper & GPT-4o Transcribe

OpenAI’s STT models are now available on the transcriptions endpoint. Whisper-1 and GPT-4o Transcribe both support MP3, MP4, WAV, OGG, WebM, FLAC, and M4A — up to 25 MB per file — with optional language selection and word-level timestamps.

Google Gemini Text-to-Speech

Gemini TTS is now available as a provider. Multiple models with support for PCM, MP3, OGG, A-law, and Mu-law output formats.

Guardrails

You can now define per-project content rules that inspect every request before it reaches an LLM and every response before it reaches your users. Each rule specifies what to match and what to do: reject the request outright, or silently mask the matched content with a replacement. Building blocks for PII protection, topic restrictions, and output safety — without changing your application code.

Org provider blocking

Admins can block specific AI providers for their entire org — blocked providers disappear from the chat UI and are excluded from routing. Useful for compliance requirements or keeping teams on an approved vendor list.

Vision fix

A regression was causing multimodal image_url content to be silently dropped in the gateway — models received only the text and replied that they couldn’t see any image. This is fixed; image inputs now reach the model correctly across all providers.