Text Embedding 3 Small

Text Embeddings

Text Embedding 3 Small is an OpenAI embedding model optimized for low-latency, low-cost vector representations of text. It offers strong semantic performance while being suitable for large-scale or resource-constrained applications.

API Performance

Latency: ~0.4s avg embedding request
Context: 8K tokens
Input: ~$0.02 per 1M tokens
Output: $0.00 per 1M tokens (embeddings only)
Uptime: 99%

About the model

What is Text Embedding 3 Small?

Text Embedding 3 Small is an OpenAI model that converts text into dense vector embeddings for efficient semantic comparison and retrieval. It is primarily used for semantic search, information retrieval, and clustering, where many documents or queries must be embedded cost-effectively. It is also commonly applied in recommendation systems, topic modeling, and classification workflows that rely on vector similarity. It belongs to OpenAI’s third-generation text embedding family, which follows the earlier text-embedding-ada-002 model.

Input / Output

Input

Text strings (plain text for embedding)
Lists of text strings (batched text inputs)

Output

Numeric embedding vectors (arrays of floating point numbers)
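
The input and output shapes above can be sketched as follows. This is a minimal illustration that assumes an OpenAI-style payload (`model`, `input`) and response (`data[i].index`, `data[i].embedding`); the `batch_size` value and the hand-written `sample_response` are illustrative, and the schema LLM.API actually uses may differ.

```python
import json

MODEL = "text-embedding-3-small"

def build_payloads(texts, batch_size=2):
    """Split texts into batches and build one JSON-ready payload per batch."""
    payloads = []
    for start in range(0, len(texts), batch_size):
        payloads.append({"model": MODEL, "input": texts[start:start + batch_size]})
    return payloads

def parse_vectors(response):
    """Return embeddings ordered by their `index` field (assumed response shape)."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]

texts = ["first document", "second document", "third document"]
payloads = build_payloads(texts)

# Hand-written stand-in for a response; real vectors are much longer.
sample_response = {
    "data": [
        {"index": 1, "embedding": [0.3, 0.4]},
        {"index": 0, "embedding": [0.1, 0.2]},
    ]
}
vectors = parse_vectors(sample_response)
print(json.dumps(payloads[0]))
```

Sorting by `index` keeps each vector aligned with the input string it came from, even if items arrive out of order.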

Model capabilities

5 Core Capabilities

Text Embedding

Generates dense numerical vector representations of text optimized for semantic tasks like search, clustering, and retrieval.

Semantic Similarity

Enables comparison of texts by measuring vector similarity, supporting relevance ranking, deduplication, and near-duplicate detection.

Document Clustering

Supports grouping of related documents or sentences based on embedding proximity, useful for topic discovery and organization.

Multilingual Support

Handles multiple languages for embeddings, enabling cross-lingual similarity search and retrieval in multilingual datasets.

Classification Features

Provides embedding vectors that can be used as input features for downstream classifiers and other machine learning models.
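
To make the similarity capability concrete, here is a small sketch of relevance ranking with cosine similarity. The three-dimensional vectors and document IDs are made up for readability; real embeddings from this model would be far longer, and cosine similarity is one common choice of metric rather than the only one.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for real embeddings (hypothetical data).
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "returns-faq": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```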

Use cases

6 Most Valuable Use Cases

Semantic Text Search
Document Clustering
Recommendation Matching
Duplicate Content Detection
Cross-Lingual Retrieval
Efficient Vector Indexing
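
As an illustration of the duplicate-detection use case, the sketch below flags pairs of documents whose embeddings are nearly parallel. The vectors, IDs, and the 0.95 threshold are all hypothetical; a suitable threshold depends on the data and should be tuned. The pairwise loop is quadratic, so a large corpus would normally use a vector index instead.

```python
import math
from itertools import combinations

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def near_duplicates(vectors, threshold=0.95):
    """Return ID pairs whose cosine similarity is at or above the threshold."""
    unit = {doc_id: normalize(vec) for doc_id, vec in vectors.items()}
    pairs = []
    for a, b in combinations(sorted(unit), 2):
        similarity = sum(x * y for x, y in zip(unit[a], unit[b]))
        if similarity >= threshold:
            pairs.append((a, b))
    return pairs

# Toy vectors standing in for real embeddings (hypothetical data).
vectors = {
    "post-1": [0.70, 0.70, 0.10],
    "post-2": [0.69, 0.71, 0.12],
    "post-3": [0.10, 0.20, 0.97],
}
duplicates = near_duplicates(vectors)
print(duplicates)
```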

Transparent pricing

Cost Comparison

In the comparison below, LLM.API lists the lowest input price and the highest throughput among providers of Text Embedding 3 Small-class embeddings.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM.API	Global	80ms	120k tps	99.99%	$0.010	$0.00	8K tokens
OpenAI	Global	120ms	80k tps	99.9%	$0.020	$0.00	8K tokens
Azure OpenAI	US East, EU West	~140ms	~60k tps	99.9%	~$0.022	$0.00	~8K tokens
Anthropic	US, EU	~150ms	~50k tps	99.9%	~$0.025	$0.00	200K tokens
Google Cloud	Global	210ms	55k tps	99.9%	~$0.023	$0.00	200K tokens

Performance benchmarks

Technical Specifications

Metric	text-embedding-3-small (OpenAI)	text-embedding-3-large (OpenAI)	text-embedding-ada-002 (OpenAI)
Dimensions	1536	3072	1536
Max Input Tokens	8K	8K	8K
Price per 1M Tokens	$0.02	$0.13	$0.10
Avg Latency	~120ms	~180ms	~200ms
Throughput	~1,200 tps	~900 tps	~800 tps
Uptime	99.9%	99.9%	99.9%
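
The price and dimension rows above translate into budget and storage estimates with simple arithmetic. The sketch below uses the table's figures; the 4-bytes-per-value (float32) storage format and the workload sizes are assumptions for illustration, and index overhead is ignored.

```python
# Figures taken from the specification table above.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}
DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}
BYTES_PER_VALUE = 4  # assumption: vectors stored as float32

def embedding_cost_usd(model, total_tokens):
    """Cost of embedding `total_tokens` input tokens at the listed price."""
    return PRICE_PER_MILLION_TOKENS[model] * total_tokens / 1_000_000

def raw_vector_bytes(model, num_vectors):
    """Raw storage for `num_vectors` embeddings, excluding index overhead."""
    return DIMENSIONS[model] * BYTES_PER_VALUE * num_vectors

for model in PRICE_PER_MILLION_TOKENS:
    cost = embedding_cost_usd(model, 500_000_000)
    size_gb = raw_vector_bytes(model, 1_000_000) / 1e9
    print(f"{model}: ${cost:.2f} per 500M tokens, {size_gb:.3f} GB per 1M vectors")
```

Under these assumptions, embedding 500M tokens costs about $10 with the small model versus about $65 with the large one, and one million small-model vectors occupy 6,144,000,000 bytes before any index overhead.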

30-day usage via LLM.API

11.4B: Prompt tokens processed (30 days)
7.8M: Embedding API requests (30 days)
620K: Active developer accounts (30 days)
99.98%: Average API uptime (30 days)

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Automatically route each request to the optimal model across providers based on latency, cost, and quality. No client changes are needed, just smarter traffic control.
One endpoint, every model

Cost-Aware Orchestration

Control spend with per-route pricing rules, dynamic model downgrades, and usage caps so you can experiment freely without surprise bills.
Optimize every token

Resilient Fallback Flows

Define automatic fallbacks when a provider fails or times out, ensuring mission-critical flows stay online even during upstream incidents.
Don’t ship single points of failure

Deep Observability

Trace every request end-to-end with logs, metrics, and latency breakdowns across providers so you can debug production issues in minutes, not days.
See every token hop

Task-Level Abstractions

Declare tasks like chat, tools, or RAG once and let LLM.API translate them into provider-specific calls, keeping your app logic clean and portable.
Code to tasks, not vendors

High-Throughput Batch

Submit large batches of prompts with automatic chunking, retries, and concurrency control to process millions of tokens efficiently and predictably.
Scale jobs, not scripts
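
The fallback behavior described above can be pictured with a client-side sketch. This is not LLM.API's implementation, which the page does not show; the function and provider names are hypothetical, and the two stand-in callables simulate a timeout and a success.

```python
def embed_with_fallback(text, providers):
    """Try each (name, embed_fn) in order and return the first success."""
    errors = []
    for name, embed_fn in providers:
        try:
            return name, embed_fn(text)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

def failing_primary(text):
    """Stand-in for a provider that times out."""
    raise TimeoutError("upstream timed out")

def working_secondary(text):
    """Stand-in for a healthy provider; returns a fake two-value vector."""
    return [0.0, 1.0]

used, vector = embed_with_fallback(
    "hello",
    [("primary", failing_primary), ("secondary", working_secondary)],
)
print(used, vector)
```

For embeddings, every fallback target should serve the same model, because vectors produced by different embedding models are not comparable with each other.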

Decision guide

When to Use — When NOT to Use

Use it if...

You need low-cost, general-purpose text embeddings for large-scale production applications.
You need strong performance on semantic search, retrieval, and reranking across many domains.
Your use case involves clustering or deduplicating large text corpora efficiently and cheaply.
Your use case involves building recommendation systems based on textual or metadata similarity.
You need compact embeddings that balance quality and speed for interactive applications.
Your use case involves multilingual semantic search where most text is in high-resource languages.

Avoid if...

You need cross-modal embeddings that jointly represent text and images in one vector space.
Your workload requires domain-specialized embeddings, heavily tuned to a narrow technical field.
You need sentence embeddings explicitly aligned with another vendor’s proprietary embedding space.
Your workload requires extremely long-context representations beyond the model’s documented token limits.
You need embeddings optimized for code understanding rather than natural language text.
Your workload requires strict on-prem deployment without any connection to OpenAI’s API.
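
When a document exceeds the model's input limit, a common workaround is to split it into overlapping chunks and embed each chunk separately. The sketch below counts whitespace-separated words as a crude stand-in for tokens, which is an assumption: real token counts require the model's tokenizer and are usually higher than word counts, so `max_words` should be set well below the 8,192-token limit.

```python
MAX_INPUT_TOKENS = 8192  # limit stated for this model in this document

def chunk_words(text, max_words, overlap=0):
    """Split text into chunks of at most `max_words` words.

    Consecutive chunks share `overlap` words so that context spanning
    a boundary appears in both chunks.
    """
    if max_words <= 0 or overlap < 0 or overlap >= max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, max_words=4, overlap=1)
for chunk in chunks:
    print(chunk)
```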

FAQ

Frequently Asked Questions

What is Text Embedding 3 Small?

Text Embedding 3 Small is an OpenAI model that generates vector representations of text, optimized for low cost and strong retrieval performance.

What is Text Embedding 3 Small best suited for?

Text Embedding 3 Small is best for semantic search, retrieval-augmented generation, clustering, recommendations, and other tasks requiring dense text similarity comparisons.

What is the context window of Text Embedding 3 Small?

Text Embedding 3 Small supports input texts up to 8,192 tokens in length.

How fast is Text Embedding 3 Small when called through LLM.API?

Text Embedding 3 Small is designed for high-throughput, low-latency embedding generation; actual latency depends on your request size and LLM.API region.

Which input modalities does Text Embedding 3 Small support?

Text Embedding 3 Small supports text-only input and outputs numeric embedding vectors; it does not accept images, audio, or other modalities.

How do I access Text Embedding 3 Small via LLM.API?

Call the LLM.API embeddings endpoint, set the provider to OpenAI, and specify the model name "text-embedding-3-small" in your request.

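
A request along those lines might look like the sketch below. The endpoint URL, the `provider` field name, and the bearer-token header are placeholders assumed for illustration, since this page does not document them; only the model name comes from the answer above. The code builds the request without sending it.

```python
import json
import urllib.request

# Hypothetical values: substitute the endpoint and key from your LLM.API account.
ENDPOINT = "https://llm-api.example/v1/embeddings"
API_KEY = "YOUR_API_KEY"

def build_embedding_request(texts):
    """Build (but do not send) a POST request for a batch of texts."""
    body = {
        "provider": "openai",  # assumed field name for selecting the provider
        "model": "text-embedding-3-small",
        "input": texts,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

request = build_embedding_request(["How do I reset my password?"])
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```
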
How does Text Embedding 3 Small compare to Text Embedding 3 Large?

Text Embedding 3 Small is cheaper ($0.02 vs. $0.13 per 1M tokens) and produces smaller vectors (1536 vs. 3072 dimensions) than Text Embedding 3 Large, at slightly lower quality, which makes it preferable for large-scale or latency-sensitive workloads.

What are the limitations of Text Embedding 3 Small?

Text Embedding 3 Small cannot generate natural language, code, or images and may underperform on highly specialized or domain-specific semantic tasks.

How is pricing for Text Embedding 3 Small handled on LLM.API?

Pricing for Text Embedding 3 Small on LLM.API is usage-based per input token, with no output-token charge, and may differ slightly from OpenAI's direct pricing.

Can I use Text Embedding 3 Small for multilingual text?

Text Embedding 3 Small supports multiple languages, but performance may vary by language and is generally strongest for English content.
