Step 3.5 Flash

Instruction Following

Step 3.5 Flash is StepFun’s sparse Mixture-of-Experts language model that delivers frontier-level reasoning and agentic capabilities while remaining highly efficient and fast for production use.

Start Using API

API Performance

Latency: 0.85s time to first token
Context: 256K token context
Input: $0.10 per 1M tokens
Output: $0.30 per 1M tokens
Uptime: 99% 99%

About the model

What is Step 3.5 Flash?

Step 3.5 Flash is a sparse Mixture-of-Experts large language model from StepFun designed to combine frontier-level reasoning with high-throughput, low-latency inference. It is mainly used for complex reasoning tasks, code generation, and agentic workflows that benefit from its long context window and efficient token usage. The model is also applied to natural language processing, data analysis, and long-document or codebase understanding where cost and speed are critical. It belongs to StepFun’s Step family of models and builds on the Step 3.x architecture and research line.

Input / Output

Input

Text (prompts, messages; up to ~256K-token context)

Output

Generated text completions and chat-style responses
Generated source code in various programming languages

Model capabilities

5 Core Capabilities

Conversational Chat

Handles general-purpose dialogue, explanations, brainstorming, and question answering, optimized for fast, low-latency text generation and responses.
Structured JSON Output

Generates well-formed JSON and structured text suitable for programmatic consumption, including configuration data, responses, and tool outputs.
Multilingual Translation

Translates between multiple natural languages, leveraging its large context and reasoning capabilities to preserve meaning and style.
Code Generation

Writes and edits code, explains snippets, and assists with debugging across common programming languages, tuned for agentic coding workflows.
Long-Context Reasoning

Performs reasoning and synthesis over very long texts using its 256K-token context window for documents, logs, or multi-step analyses.

Use cases

6 Most Valuable Use Cases

Multilingual Customer Support
Invoice Field Extraction
Legal Clause Retrieval
Regulatory Change Monitoring
E-commerce Product Assistant
Code Generation and Debugging

Transparent pricing

Cost Comparison

LLM API offers the lowest cost and fastest access to Step 3.5 Flash–class models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	110ms	80 tps	99.99%	$0.08	$0.24	256K
StepFun	Global	~180ms	~40 tps	~99.9%	~$0.10	~$0.30	~128K
OpenAI-compatible gateway	US East	~220ms	~35 tps	~99.9%	~$0.12	~$0.36	~128K
Cloud Hyperscaler A	EU West	~250ms	~30 tps	~99.95%	~$0.14	~$0.40	~128K

Performance benchmarks

Technical Specifications

Metric	Step 3.5 Flash (StepFun)	GPT-4.1 mini (OpenAI)	Claude 3 Haiku (Anthropic)
Avg Latency	~180ms	~200ms	~220ms
Context Window	128K	128K	200K
Input Price ($/1M)	$0.10	$0.15	$0.25
Output Price ($/1M)	$0.40	$0.60	$0.80
Max Output Tokens	4K	4K	4K
Throughput	~120 tps	~100 tps	~90 tps
Uptime	99.9%	99.9%	99.9%

30-day usage via LLM API

5.6B: Prompt tokens processed (last 30 days)
24M: Completion tokens generated (last 30 days)
2.1M: API requests served (last 30 days)
99.8%: Avg API uptime (last 30 days)

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Intelligent AI Routing

Automatically route each request to the optimal model based on latency, cost, and quality—without changing your integration or redeploying services.
One endpoint, every model.
Cost-Aware Orchestration

Control spend with dynamic model selection, price ceilings, and transparent usage metrics so you can scale AI features without runaway cloud bills.
Max performance, minimal spend.
Resilient Fallback Logic

Define automatic failover chains across providers so timeouts, rate limits, or outages don’t break your workloads or user experience.
Stay online, even upstream.
End-to-End Observability

Trace every call across providers with logs, metrics, and latency breakdowns to debug faster and optimize prompt, model, and routing decisions.
See every token’s journey.
Task-Level Orchestration

Describe high-level tasks once and let LLM.API handle tool calls, multi-step flows, and provider choices for consistent results across environments.
Think tasks, not models.
High-Throughput Batch APIs

Send thousands of operations in a single request with controlled concurrency, retries, and deduplication to power large-scale inference pipelines efficiently.
Ship at batch scale.

Decision guide

When to Use — When NOT to Use

Use it if...

You need a low-cost general-purpose model for everyday chat and task automation.
You need fast responses for lightweight classification, tagging, and basic extraction tasks.
Your use case involves generating short marketing copy, social content, or product descriptions.
Your use case involves simple code help, like small snippets, comments, or refactors.
You need an inexpensive model to power AI features in consumer or internal tools.
You need a model to summarize short documents, tickets, or support conversations efficiently.
Your use case involves multilingual but simple Q&A where perfect nuance is not critical.

Avoid if...

You need frontier-level reasoning quality for complex problem solving or strategic decision support.
Your workload requires highly reliable long-context handling across very large documents or codebases.
You need state-of-the-art code generation for complex systems, architectures, or optimization-heavy tasks.
Your workload requires strong domain expertise in sensitive areas like medicine, law, or finance.
You need advanced tool use, multi-step planning, or orchestration across many external systems.
Your workload requires carefully controlled style-mimicry, safety-tuned outputs, and strict content controls.
You need highest possible model quality for user-facing flagship features or premium products.

FAQ

Frequently Asked Questions

What is Step 3.5 Flash?

Step 3.5 Flash is a fast, cost-efficient StepFun language model for general-purpose text generation and reasoning, accessible through the LLM.API gateway.
What is Step 3.5 Flash best suited for?

Step 3.5 Flash is best for high-throughput tasks like chatbots, agents, data processing, and lightweight reasoning where low latency and low cost matter.
What is the context window of Step 3.5 Flash?

Step 3.5 Flash supports a context window of up to 32K tokens, including both prompt and completion tokens.
How fast is Step 3.5 Flash in terms of latency?

Step 3.5 Flash is optimized for low latency, typically returning first tokens within a few hundred milliseconds depending on prompt size and load.
What modalities does Step 3.5 Flash support?

Step 3.5 Flash is a text-only model that accepts textual prompts and returns textual completions.
How do I call Step 3.5 Flash through LLM.API?

Specify the provider as "StepFun" and the model name "step-3.5-flash" in your LLM.API request, sending standard chat or completion payloads.
How is Step 3.5 Flash priced on LLM.API?

On LLM.API, Step 3.5 Flash is billed per 1,000 tokens of prompt and completion; check the LLM.API pricing page for exact rates.
How does Step 3.5 Flash compare to larger StepFun models?

Compared to larger StepFun models, Step 3.5 Flash is cheaper and faster but generally less capable on complex reasoning and intricate long-context tasks.
Does Step 3.5 Flash support streaming responses via LLM.API?

Yes, Step 3.5 Flash can stream tokens incrementally when you enable streaming mode in your LLM.API request.
What are the main limitations of Step 3.5 Flash?

Step 3.5 Flash may struggle with very long multi-step reasoning, domain-expert tasks, strict factual accuracy, and does not access external tools or the internet.

Start in 2 lines of code

Get My API Key

Step 3.5 Flash

What is Step 3.5 Flash?

5 Core Capabilities

Conversational Chat

Structured JSON Output

Multilingual Translation

Code Generation

Long-Context Reasoning

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Intelligent AI Routing

Cost-Aware Orchestration

Resilient Fallback Logic

End-to-End Observability

Task-Level Orchestration

High-Throughput Batch APIs

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code