Claude web search, new voice models, and model retirements

Claude can now search the web for real answers, Cartesia adds two new voice models, four new text models join the catalog, and a set of older models are retired or scheduled for retirement at the end of the year.

Claude web search

Claude models now correctly route to a backend that runs live web search. Previously, search requests sent to Claude would silently skip the retrieval step and fall back to the model’s training data — so you’d get an answer, but not necessarily an up-to-date one. That’s fixed. When you enable web search on a Claude request, the model actually searches and can cite current information in its response.

New voice models — Cartesia

Two new Cartesia models are now available:
– Sonic 3.5 — text-to-speech. Cartesia’s latest generation, faster and more expressive than Sonic 2.
– Ink 2 — streaming speech-to-text. Low-latency transcription over WebSocket, designed for real-time voice applications.

New models

– Kimi K2.7 Code (Moonshot) — multimodal model with streaming, vision, and tool support. Built for coding tasks.
– GLM 5.2 (z-ai) — available across multiple backends.
– llmapi-os — a new open-source model provider, adding a curated set of open-source models to the catalog.

Also this week

– Images work for Claude via OpenAI endpoint
— sending image content through the OpenAI-compatible endpoint to Claude models was silently dropped. Fixed: vision inputs now reach Claude correctly.
– Clearer provider errors — when an upstream AI provider returns an error, you now see the real reason and error code instead of a generic failure message. Much easier to troubleshoot.
– More accurate reasoning effort — reasoning level options are now more precisely mapped, so low, medium, and high behave consistently regardless of which model you’re using.
– Discount savings on the billing page — a new Savings card on the billing page shows your organization’s cumulative discount savings: all-time total and a breakdown by time window.
– BYOK at zero balance — if you use your own provider keys (BYOK), requests now go through correctly even when your LLM API credit balance is at $0.
– LLM setup guide — a new “Configure Your AI Tools” section in the docs covers how to connect LLM API to your coding tools and IDE integrations.

Model retirements

The following models have already been retired and are no longer available:

– qwen3-235b-a22b-thinking-2507, qwen35-397b-a17b, qwen-3-235b-a22b-instruct-2507, gemma-3-12b-it, llama3.1-8b, llama-guard-3-8b

The following OpenAI models are still available but will be retired on December 10, 2026.
Migrate before then: o3, gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-pro

Their successors — gpt-5.5, gpt-5.5-pro, and o4 — are available today with the same request shape.