Powered by StepFun

Step 3.5 Flash

  • Instruction Following

Step 3.5 Flash is StepFun’s sparse Mixture-of-Experts language model that delivers frontier-level reasoning and agentic capabilities while remaining highly efficient and fast for production use.

Start Using API

What is Step 3.5 Flash?

Step 3.5 Flash is a sparse Mixture-of-Experts large language model from StepFun designed to combine frontier-level reasoning with high-throughput, low-latency inference. It is mainly used for complex reasoning tasks, code generation, and agentic workflows that benefit from its long context window and efficient token usage. The model is also applied to natural language processing, data analysis, and long-document or codebase understanding where cost and speed are critical. It belongs to StepFun’s Step family of models and builds on the Step 3.x architecture and research line.

5 Core Capabilities

  • Conversational Chat

    Handles general-purpose dialogue, explanations, brainstorming, and question answering, optimized for fast, low-latency text generation and responses.

  • Structured JSON Output

    Generates well-formed JSON and structured text suitable for programmatic consumption, including configuration data, responses, and tool outputs.

  • Multilingual Translation

    Translates between multiple natural languages, leveraging its large context and reasoning capabilities to preserve meaning and style.

  • Code Generation

    Writes and edits code, explains snippets, and assists with debugging across common programming languages, tuned for agentic coding workflows.

  • Long-Context Reasoning

    Performs reasoning and synthesis over very long texts using its 256K-token context window for documents, logs, or multi-step analyses.

6 Most Valuable Use Cases

  • Multilingual Customer Support
  • Invoice Field Extraction
  • Legal Clause Retrieval
  • Regulatory Change Monitoring
  • E-commerce Product Assistant
  • Code Generation and Debugging

Cost Comparison

LLM API offers the lowest cost and fastest access to Step 3.5 Flash–class models.

Provider Region Latency Throughput Uptime Input ($/1M) Output ($/1M) Context
LLM API BEST Global 110ms 80 tps 99.99% $0.08 $0.24 256K
StepFun Global ~180ms ~40 tps ~99.9% ~$0.10 ~$0.30 ~128K
OpenAI-compatible gateway US East ~220ms ~35 tps ~99.9% ~$0.12 ~$0.36 ~128K
Cloud Hyperscaler A EU West ~250ms ~30 tps ~99.95% ~$0.14 ~$0.40 ~128K

Technical Specifications

Metric Step 3.5 Flash (StepFun) GPT-4.1 mini (OpenAI) Claude 3 Haiku (Anthropic)
Avg Latency ~180ms ~200ms ~220ms
Context Window 128K 128K 200K
Input Price ($/1M) $0.10 $0.15 $0.25
Output Price ($/1M) $0.40 $0.60 $0.80
Max Output Tokens 4K 4K 4K
Throughput ~120 tps ~100 tps ~90 tps
Uptime 99.9% 99.9% 99.9%

30-day usage via LLM API

5.6B
Prompt tokens processed (last 30 days)
24M
Completion tokens generated (last 30 days)
2.1M
API requests served (last 30 days)
99.8%
Avg API uptime (last 30 days)
Start Using API

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Intelligent AI Routing

    Automatically route each request to the optimal model based on latency, cost, and quality—without changing your integration or redeploying services.

    One endpoint, every model.
  • Cost-Aware Orchestration

    Control spend with dynamic model selection, price ceilings, and transparent usage metrics so you can scale AI features without runaway cloud bills.

    Max performance, minimal spend.
  • Resilient Fallback Logic

    Define automatic failover chains across providers so timeouts, rate limits, or outages don’t break your workloads or user experience.

    Stay online, even upstream.
  • End-to-End Observability

    Trace every call across providers with logs, metrics, and latency breakdowns to debug faster and optimize prompt, model, and routing decisions.

    See every token’s journey.
  • Task-Level Orchestration

    Describe high-level tasks once and let LLM.API handle tool calls, multi-step flows, and provider choices for consistent results across environments.

    Think tasks, not models.
  • High-Throughput Batch APIs

    Send thousands of operations in a single request with controlled concurrency, retries, and deduplication to power large-scale inference pipelines efficiently.

    Ship at batch scale.

When to Use — When NOT to Use

Use it if...

  • You need a low-cost general-purpose model for everyday chat and task automation.
  • You need fast responses for lightweight classification, tagging, and basic extraction tasks.
  • Your use case involves generating short marketing copy, social content, or product descriptions.
  • Your use case involves simple code help, like small snippets, comments, or refactors.
  • You need an inexpensive model to power AI features in consumer or internal tools.
  • You need a model to summarize short documents, tickets, or support conversations efficiently.
  • Your use case involves multilingual but simple Q&A where perfect nuance is not critical.

Avoid if...

  • You need frontier-level reasoning quality for complex problem solving or strategic decision support.
  • Your workload requires highly reliable long-context handling across very large documents or codebases.
  • You need state-of-the-art code generation for complex systems, architectures, or optimization-heavy tasks.
  • Your workload requires strong domain expertise in sensitive areas like medicine, law, or finance.
  • You need advanced tool use, multi-step planning, or orchestration across many external systems.
  • Your workload requires carefully controlled style-mimicry, safety-tuned outputs, and strict content controls.
  • You need highest possible model quality for user-facing flagship features or premium products.

Frequently Asked Questions

  • What is Step 3.5 Flash?

    Step 3.5 Flash is a fast, cost-efficient StepFun language model for general-purpose text generation and reasoning, accessible through the LLM.API gateway.

  • What is Step 3.5 Flash best suited for?

    Step 3.5 Flash is best for high-throughput tasks like chatbots, agents, data processing, and lightweight reasoning where low latency and low cost matter.

  • What is the context window of Step 3.5 Flash?

    Step 3.5 Flash supports a context window of up to 32K tokens, including both prompt and completion tokens.

  • How fast is Step 3.5 Flash in terms of latency?

    Step 3.5 Flash is optimized for low latency, typically returning first tokens within a few hundred milliseconds depending on prompt size and load.

  • What modalities does Step 3.5 Flash support?

    Step 3.5 Flash is a text-only model that accepts textual prompts and returns textual completions.

  • How do I call Step 3.5 Flash through LLM.API?

    Specify the provider as "StepFun" and the model name "step-3.5-flash" in your LLM.API request, sending standard chat or completion payloads.

  • How is Step 3.5 Flash priced on LLM.API?

    On LLM.API, Step 3.5 Flash is billed per 1,000 tokens of prompt and completion; check the LLM.API pricing page for exact rates.

  • How does Step 3.5 Flash compare to larger StepFun models?

    Compared to larger StepFun models, Step 3.5 Flash is cheaper and faster but generally less capable on complex reasoning and intricate long-context tasks.

  • Does Step 3.5 Flash support streaming responses via LLM.API?

    Yes, Step 3.5 Flash can stream tokens incrementally when you enable streaming mode in your LLM.API request.

  • What are the main limitations of Step 3.5 Flash?

    Step 3.5 Flash may struggle with very long multi-step reasoning, domain-expert tasks, strict factual accuracy, and does not access external tools or the internet.

Start in 2 lines of code

Get My API Key