Powered by OpenAI
Text Embedding 3 Small
Text Embedding 3 Small is an OpenAI embedding model optimized for low-latency, low-cost vector representations of text. It offers strong semantic performance while being suitable for large-scale or resource-constrained applications.
About the model
What is Text Embedding 3 Small?
Text Embedding 3 Small is an OpenAI model that converts text into dense vector embeddings for efficient semantic comparison and retrieval. It is primarily used for tasks like semantic search, information retrieval, and clustering where many documents or queries must be embedded cost-effectively. It is also commonly applied in recommendation systems, topic modeling, and classification workflows that rely on vector similarity. It belongs to OpenAI’s third-generation text embedding family, which succeeds earlier models such as text-embedding-ada-002.
Model capabilities
5 Core Capabilities
- Text Embedding: Generates dense numerical vector representations of text optimized for semantic tasks like search, clustering, and retrieval.
- Semantic Similarity: Enables comparison of texts by measuring vector similarity, supporting relevance ranking, deduplication, and near-duplicate detection.
- Document Clustering: Supports grouping of related documents or sentences based on embedding proximity, useful for topic discovery and organization.
- Multilingual Support: Handles multiple languages for embeddings, enabling cross-lingual similarity search and retrieval in multilingual datasets.
- Classification Features: Provides embedding vectors that can be used as input features for downstream classifiers and other machine learning models.
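The similarity and ranking capabilities above usually come down to comparing vectors with cosine similarity. The following is a minimal sketch of that comparison, not a description of how any provider implements search. The 3-dimensional vectors and document ids are made up for illustration and stand in for real embeddings returned by the model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: list[float], docs: dict[str, list[float]]):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (hypothetical); real embeddings from this model are much longer.
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "returns-faq": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query, docs)
print([doc_id for doc_id, _ in ranking])
# ['refund-policy', 'returns-faq', 'shipping-times']
```

For a corpus of more than a few thousand vectors, a vector index is normally used instead of this linear scan.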
Use cases
6 Most Valuable Use Cases
- Semantic Text Search
- Document Clustering
- Recommendation Matching
- Duplicate Content Detection
- Cross-Lingual Retrieval
- Efficient Vector Indexing
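As one illustration of the duplicate-detection use case, the sketch below flags pairs of items whose embeddings are nearly parallel. The 0.95 threshold, the ids, and the 2-dimensional vectors are assumptions chosen for the example. A suitable threshold depends on the data and should be tuned against labeled pairs.

```python
import math


def _cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def find_near_duplicates(
    vectors: dict[str, list[float]], threshold: float = 0.95
) -> list[tuple[str, str]]:
    """Return id pairs whose cosine similarity is at or above the threshold."""
    ids = sorted(vectors)
    pairs = []
    for i, left in enumerate(ids):
        for right in ids[i + 1:]:  # compare each unordered pair once
            if _cosine(vectors[left], vectors[right]) >= threshold:
                pairs.append((left, right))
    return pairs


# Toy vectors (hypothetical): "a" and "b" point in almost the same direction.
vectors = {"a": [1.0, 0.0], "b": [0.99, 0.05], "c": [0.0, 1.0]}
print(find_near_duplicates(vectors))  # [('a', 'b')]
```

The pairwise loop is quadratic in the number of items, so it only suits small batches. Larger corpora typically rely on an approximate nearest-neighbor index.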
Transparent pricing
Cost Comparison
Among the providers compared below, LLM.API lists the lowest input price and the highest throughput for Text Embedding 3 Small–class embeddings.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM.API | Global | 80ms | 120k tps | 99.99% | $0.010 | $0.00 | 200K tokens |
| OpenAI | Global | 120ms | 80k tps | 99.9% | $0.020 | $0.00 | 128K tokens |
| Azure OpenAI | US East, EU West | ~140ms | ~60k tps | 99.9% | ~$0.022 | $0.00 | ~128K tokens |
| Anthropic | US, EU | ~150ms | ~50k tps | 99.9% | ~$0.025 | $0.00 | 200K tokens |
| Google Cloud | Global | 210ms | 55k tps | 99.9% | ~$0.023 | $0.00 | 200K tokens |
Performance benchmarks
Technical Specifications
| Metric | Text Embedding 3 Small (OpenAI) | text-embedding-3-large (OpenAI) | text-embedding-ada-002 (OpenAI) |
|---|---|---|---|
| Dimensions | 1536 | 3072 | 1536 |
| Max Input Tokens | 8K | 8K | 8K |
| Price per 1M Tokens | $0.02 | $0.13 | $0.10 |
| Avg Latency | ~120ms | ~180ms | ~200ms |
| Throughput | ~1,200 tps | ~900 tps | ~800 tps |
| Uptime | 99.9% | 99.9% | 99.9% |
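The price and dimension figures in the table translate directly into budget and storage estimates. The sketch below works through that arithmetic for a hypothetical corpus of one million documents averaging 500 tokens each. The corpus size and the choice of float32 storage are assumptions, not figures from this page.

```python
PRICE_PER_MILLION_TOKENS_USD = 0.02  # Text Embedding 3 Small, from the table
DIMENSIONS = 1536                    # Text Embedding 3 Small, from the table
BYTES_PER_FLOAT32 = 4                # assumption: vectors stored as float32


def embedding_cost_usd(total_tokens: int) -> float:
    """Cost of embedding the given number of input tokens."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD


def index_size_bytes(num_vectors: int) -> int:
    """Raw vector storage, excluding index overhead and metadata."""
    return num_vectors * DIMENSIONS * BYTES_PER_FLOAT32


# Hypothetical corpus: 1,000,000 documents averaging 500 tokens each.
num_docs = 1_000_000
total_tokens = num_docs * 500

print(f"cost: ${embedding_cost_usd(total_tokens):.2f}")                 # cost: $10.00
print(f"raw index size: {index_size_bytes(num_docs) / 1e9:.2f} GB")      # raw index size: 6.14 GB
```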
30-day usage via LLM.API
- 11.4B prompt tokens processed (30 days)
- 7.8M embedding API requests (30 days)
- 620K active developer accounts (30 days)
- 99.98% average API uptime (30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
- Unified AI Routing: Automatically route each request to the optimal model across providers based on latency, cost, and quality, with no client changes required. One endpoint, every model.
- Cost-Aware Orchestration: Control spend with per-route pricing rules, dynamic model downgrades, and usage caps so you can experiment freely without surprise bills. Optimize every token.
- Resilient Fallback Flows: Define automatic fallbacks when a provider fails or times out, ensuring mission-critical flows stay online even during upstream incidents. Don’t ship single points.
- Deep Observability: Trace every request end-to-end with logs, metrics, and latency breakdowns across providers so you can debug production issues in minutes, not days. See every token hop.
- Task-Level Abstractions: Declare tasks like chat, tools, or RAG once and let LLM.API translate them into provider-specific calls, keeping your app logic clean and portable. Code to tasks, not vendors.
- High-Throughput Batch: Submit large batches of prompts with automatic chunking, retries, and concurrency control to process millions of tokens efficiently and predictably. Scale jobs, not scripts.
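To make the batching and retry ideas above concrete, here is a client-side sketch of chunking a list of texts and retrying failed batches with exponential backoff. It is not LLM.API's implementation. `embed_in_batches`, `fake_embed`, and every parameter name are hypothetical, and the embedding call is passed in as a function so the example runs without network access.

```python
import time
from typing import Callable, Sequence

Vector = list[float]
EmbedFn = Callable[[Sequence[str]], list[Vector]]


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: EmbedFn,
    batch_size: int = 100,
    max_retries: int = 3,
    backoff_seconds: float = 0.5,
) -> list[Vector]:
    """Embed texts in fixed-size batches, retrying each failed batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: list[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries + 1):
            try:
                vectors.extend(embed_fn(batch))
                break
            except Exception:
                # A sketch: real code should catch only transient errors
                # such as timeouts or rate-limit responses.
                if attempt == max_retries:
                    raise
                time.sleep(backoff_seconds * 2 ** attempt)
    return vectors


def fake_embed(batch: Sequence[str]) -> list[Vector]:
    """Stand-in for a real embeddings call: one 1-dimensional vector per text."""
    return [[float(len(text))] for text in batch]


print(embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2))
# [[1.0], [2.0], [3.0]]
```

Passing the embedding call in as `embed_fn` keeps the retry logic independent of any particular provider or SDK.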
Decision guide
When to Use — When NOT to Use
Use it if...
- You need low-cost, general-purpose text embeddings for large-scale production applications.
- You need strong performance on semantic search, retrieval, and reranking across many domains.
- Your use case involves clustering or deduplicating large text corpora efficiently and cheaply.
- Your use case involves building recommendation systems based on textual or metadata similarity.
- You need compact embeddings that balance quality and speed for interactive applications.
- Your use case involves multilingual semantic search where most text is in high-resource languages.
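Where compactness matters, OpenAI's third-generation embedding models accept a `dimensions` request parameter that returns shorter vectors. Whether a given gateway forwards that parameter is an assumption to verify. The same effect can be approximated client-side by truncating a full vector and re-normalizing it, as sketched below with a made-up 4-dimensional vector.

```python
import math


def shorten_embedding(vector: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("truncated vector is all zeros")
    return [x / norm for x in head]


# Toy unit vector (hypothetical); a real embedding has 1536 components.
full = [0.6, 0.0, 0.8, 0.0]
print(shorten_embedding(full, 2))  # [1.0, 0.0]
```

Shorter vectors reduce storage and speed up similarity search at some cost in retrieval quality, so the target size is worth validating on your own data.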
Avoid if...
- You need cross-modal embeddings that jointly represent text and images in one vector space.
- Your workload requires domain-specialized embeddings, heavily tuned to a narrow technical field.
- You need sentence embeddings explicitly aligned with another vendor’s proprietary embedding space.
- Your workload requires extremely long-context representations beyond the model’s documented token limits.
- You need embeddings optimized for code understanding rather than natural language text.
- Your workload requires strict on-prem deployment without any connection to OpenAI’s API.
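The long-context caveat above applies when one vector must represent more text than the model accepts. When inputs only occasionally exceed the limit, a common workaround is to split them into windows and embed each window separately. The sketch below shows that splitting step only. It assumes the text has already been tokenized, and the single-character tokens in the example are a stand-in for real tokens from the model's tokenizer.

```python
def split_into_windows(
    tokens: list[str], max_tokens: int = 8192, overlap: int = 0
) -> list[list[str]]:
    """Split a token list into windows of at most `max_tokens` tokens.

    Consecutive windows share `overlap` tokens so that context at the
    boundaries is not lost entirely.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last window already reaches the end
    return windows


# Toy input (hypothetical): ten single-character "tokens", tiny window size.
tokens = list("abcdefghij")
for window in split_into_windows(tokens, max_tokens=4, overlap=1):
    print("".join(window))
# abcd
# defg
# ghij
```

How the per-window vectors are combined afterwards (for example, stored individually or averaged) is a design choice this sketch leaves open.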
FAQ
Frequently Asked Questions
- What is Text Embedding 3 Small?
  Text Embedding 3 Small is an OpenAI model that generates vector representations of text, optimized for low cost and strong retrieval performance.
- What is Text Embedding 3 Small best suited for?
  Text Embedding 3 Small is best for semantic search, retrieval-augmented generation, clustering, recommendations, and other tasks requiring dense text similarity comparisons.
- What is the context window of Text Embedding 3 Small?
  Text Embedding 3 Small supports input texts up to 8,192 tokens in length.
- How fast is Text Embedding 3 Small when called through LLM.API?
  Text Embedding 3 Small is designed for high-throughput, low-latency embedding generation; actual latency depends on your request size and LLM.API region.
- Which input modalities does Text Embedding 3 Small support?
  Text Embedding 3 Small supports text-only input and outputs numeric embedding vectors; it does not accept images, audio, or other modalities.
- How do I access Text Embedding 3 Small via LLM.API?
  Call the LLM.API embeddings endpoint, set the provider to OpenAI, and specify the model name "text-embedding-3-small" in your request.
- How does Text Embedding 3 Small compare to Text Embedding 3 Large?
  Text Embedding 3 Small is cheaper and slightly lower quality than Text Embedding 3 Large, making it preferable for large-scale or latency-sensitive workloads.
- What are the limitations of Text Embedding 3 Small?
  Text Embedding 3 Small cannot generate natural language, code, or images and may underperform on highly specialized or domain-specific semantic tasks.
- How is pricing for Text Embedding 3 Small handled on LLM.API?
  Pricing for Text Embedding 3 Small on LLM.API is typically usage-based per token or character and may differ slightly from OpenAI direct pricing.
- Can I use Text Embedding 3 Small for multilingual text?
  Text Embedding 3 Small supports multiple languages, but performance may vary by language and is generally strongest for English content.
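The access answer above can be sketched as a request builder. This is an illustration under stated assumptions, not LLM.API's documented interface. The URL is a placeholder, the `provider` field name is a guess based on the wording "set the provider to OpenAI", and the `model` and `input` fields follow the shape of OpenAI's embeddings requests. The request is built but never sent.

```python
import json
import urllib.request

# Hypothetical placeholder: the real LLM.API endpoint URL is not given on this page.
LLM_API_EMBEDDINGS_URL = "https://llm-api.example.invalid/v1/embeddings"


def build_embedding_request(texts: list[str], api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a batch of texts."""
    body = {
        "provider": "openai",               # assumed field name
        "model": "text-embedding-3-small",  # model name from the FAQ
        "input": texts,
    }
    return urllib.request.Request(
        LLM_API_EMBEDDINGS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_embedding_request(["hello world"], api_key="YOUR_API_KEY")
print(request.get_method(), json.loads(request.data)["model"])
# POST text-embedding-3-small
```

Sending the request would be a call to `urllib.request.urlopen(request)` once the real endpoint and credentials are known.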
