Live streaming transcription, new audio providers, per-project content guardrails, and org-level provider blocking — a dense release week for audio and safety capabilities.
Deepgram live streaming STT
Deepgram now supports real-time streaming transcription via WebSocket — send audio as it’s recorded and get transcripts back word by word instead of waiting for the full file. Streaming-capable models are catalogued with their supported encodings, languages, and sample rates so you can pick the right one for your use case.
OpenAI Whisper & GPT-4o Transcribe
OpenAI’s STT models are now available on the transcriptions endpoint. Whisper-1 and GPT-4o Transcribe both support MP3, MP4, WAV, OGG, WebM, FLAC, and M4A — up to 25 MB per file — with optional language selection and word-level timestamps.
Google Gemini Text-to-Speech
Gemini TTS is now available as a provider. Multiple models with support for PCM, MP3, OGG, A-law, and Mu-law output formats.
Guardrails
You can now define per-project content rules that inspect every request before it reaches an LLM and every response before it reaches your users. Each rule specifies what to match and what to do: reject the request outright, or silently mask the matched content with a replacement. Building blocks for PII protection, topic restrictions, and output safety — without changing your application code.
Org provider blocking
Admins can block specific AI providers for their entire org — blocked providers disappear from the chat UI and are excluded from routing. Useful for compliance requirements or keeping teams on an approved vendor list.
Vision fix
A regression was causing multimodal image_url content to be silently dropped in the gateway — models received only the text and replied that they couldn’t see any image. This is fixed; image inputs now reach the model correctly across all providers.
