Powered by OpenAI
Text Embedding 3 Large
- Text Embeddings
Text Embedding 3 Large is OpenAI’s high-capacity embedding model, optimized for semantic search, retrieval, and clustering. It returns 3,072-dimensional vectors by default, accepts up to 8,191 tokens per input, and performs strongly across diverse domains.
About the model
What is Text Embedding 3 Large?
Text Embedding 3 Large is an OpenAI model that converts text into high-dimensional vector embeddings for downstream machine learning and retrieval applications. Its primary uses are semantic search, reranking, and retrieval-augmented generation (RAG) over large document collections; it also supports clustering, classification features, and other applications that rely on dense vector representations of text. It belongs to OpenAI’s text-embedding-3 family, alongside text-embedding-3-small, and succeeds the earlier text-embedding-ada-002 model.
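Retrieval built on these embeddings usually ranks stored document vectors by cosine similarity to a query vector. The sketch below shows only that ranking step in plain Python, assuming the embeddings have already been fetched; the document IDs and toy 3-dimensional vectors are hypothetical stand-ins for real 3,072-dimensional outputs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], docs: dict, k: int = 2) -> list:
    """Return the k (doc_id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors standing in for real embeddings.
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "returns-faq": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
print(top_k(query, docs))  # refund-policy ranks first, then returns-faq
```

OpenAI states that its embeddings are normalized to length 1, so a plain dot product should produce the same ranking as full cosine similarity. At corpus scale, a vector index would normally replace this linear scan.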
Model capabilities
5 Core Capabilities
-
High-Dimensional Embeddings
Generates 3,072-dimensional vector representations of text by default, optimized for semantic tasks like clustering, retrieval, and similarity search.
-
Semantic Search
Enables retrieval of conceptually related documents by comparing embedding vectors instead of relying purely on keyword matching.
-
Text Clustering
Supports grouping related texts by embedding them into a shared vector space for downstream clustering and topic analysis.
-
Multilingual Semantics
Produces embeddings that capture meaning across multiple languages, enabling cross-lingual similarity and retrieval workflows.
-
Document Classification
Provides embeddings usable as features for training classifiers to categorize documents by topic, intent, or other labels.
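Using embeddings as classification features can be illustrated with a nearest-centroid classifier, a deliberately simple stand-in for the trained classifiers mentioned above. A minimal sketch, assuming labeled embeddings are already available: it averages each label's vectors and assigns a new vector to the most similar centroid. The labels and 2-dimensional vectors are hypothetical.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Component-wise average of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def fit_centroids(examples):
    """examples: list of (embedding, label) pairs -> {label: centroid}."""
    grouped = defaultdict(list)
    for vec, label in examples:
        grouped[label].append(vec)
    return {label: mean_vector(vecs) for label, vecs in grouped.items()}


def predict(centroids, vec):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


# Hypothetical labeled embeddings.
training = [
    ([0.9, 0.1], "billing"),
    ([0.8, 0.3], "billing"),
    ([0.1, 0.9], "shipping"),
    ([0.2, 0.8], "shipping"),
]
centroids = fit_centroids(training)
print(predict(centroids, [0.7, 0.2]))  # billing
```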
Use cases
6 Most Valuable Use Cases
- Semantic Text Search
- Document Clustering
- Topic Tagging Support
- Product Recommendation Matching
- Cross-Lingual Similarity
- Legal Case Monitoring
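For product recommendation matching, one common approach (an assumption here, not a method this page prescribes) averages the embeddings of items a user liked into a profile vector and ranks the rest of the catalog by cosine similarity to it. The item names and toy vectors below are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def recommend(liked, catalog, k=1):
    """Rank catalog items the user has not liked yet by similarity to their profile.

    liked: list of item ids the user interacted with
    catalog: {item_id: embedding}
    """
    dim = len(next(iter(catalog.values())))
    # Profile = unit-length mean of the liked items' embeddings.
    profile = normalize([sum(catalog[i][d] for i in liked) / len(liked) for d in range(dim)])
    candidates = []
    for item_id, vec in catalog.items():
        if item_id in liked:
            continue
        score = sum(p * x for p, x in zip(profile, normalize(vec)))
        candidates.append((score, item_id))
    candidates.sort(reverse=True)
    return [item_id for _, item_id in candidates[:k]]


# Hypothetical catalog embeddings.
catalog = {
    "trail-running-shoes": [0.9, 0.1, 0.0],
    "hiking-boots": [0.8, 0.3, 0.1],
    "rain-jacket": [0.6, 0.5, 0.1],
    "espresso-machine": [0.0, 0.1, 0.9],
}
print(recommend(["trail-running-shoes"], catalog, k=2))  # ['hiking-boots', 'rain-jacket']
```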
Transparent pricing
Cost Comparison
LLM API lists the lowest per-token price among the providers compared below for Text Embedding 3 Large.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M tokens) | Output ($/1M tokens) | Max input |
|---|---|---|---|---|---|---|---|
| LLM API (best) | Global | ~110ms | ~65K tokens/s | 99.99% | ~$0.02 | $0.00 | 8,191 tokens |
| OpenAI | Global | ~180ms | ~40K tokens/s | 99.9% | $0.13 | $0.00 | 8,191 tokens |
| Azure OpenAI | US East | ~190ms | ~35K tokens/s | 99.9% | ~$0.14 | $0.00 | 8,191 tokens |
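Embedding calls bill input tokens only (the output price is $0.00), so the cost of embedding a corpus is a single multiplication. The sketch below uses the approximate per-1M-token prices from the table above; the 50M-token corpus size is hypothetical.

```python
def embedding_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of embedding `tokens` input tokens (no output tokens are billed)."""
    return tokens / 1_000_000 * price_per_million


corpus_tokens = 50_000_000  # hypothetical corpus size
for provider, price in [("LLM API", 0.02), ("OpenAI", 0.13), ("Azure OpenAI", 0.14)]:
    print(f"{provider}: ${embedding_cost(corpus_tokens, price):.2f}")
```

With these inputs it prints LLM API: $1.00, OpenAI: $6.50, and Azure OpenAI: $7.00.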
Performance benchmarks
Technical Specifications
| Metric | Text Embedding 3 Large (OpenAI) | text-embedding-3-large (OpenAI) | text-embedding-ada-002 (OpenAI) |
|---|---|---|---|
| Dimensions | 3072 | 3072 | 1536 |
| Max Input Tokens | 8,191 | 8,191 | 8,191 |
| Price per 1M Tokens | $0.13 | $0.13 | $0.10 |
| Avg Latency | ~220ms | ~250ms | ~260ms |
| Throughput | ~1,200 tps | ~1,000 tps | ~950 tps |
| Uptime | 99.9% | 99.9% | 99.9% |
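The 3,072-dimension default is not fixed: OpenAI's API accepts a `dimensions` parameter for text-embedding-3 models, and OpenAI's embeddings guide also describes truncating a full vector and re-normalizing it. The sketch below shows the manual route on a toy vector; the target size is an arbitrary example. At 4 bytes per float32 component, shortening 3,072 dimensions to 256 reduces storage from 12,288 to 1,024 bytes per vector, at some cost in retrieval quality.

```python
import math


def shorten(embedding, dims):
    """Keep the first `dims` components and rescale the result to unit length."""
    head = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]


full = [0.5, 0.5, 0.5, 0.5]  # toy stand-in for a 3,072-dimensional vector
short = shorten(full, 2)
print(short)
```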
30-day usage via LLM API
- 9.8T embedding tokens processed (30 days)
- 120M API requests served (30 days)
- 480K active developer accounts (30 days)
- 99.98% average API uptime (30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Unified AI Routing
Define policies once and let LLM.API automatically route each call to the best model across providers, balancing latency, accuracy, and availability with no client changes.
One endpoint, every model.
-
Cost-Aware Orchestration
Set hard budgets and price tiers, then let LLM.API choose cheaper models by default and escalate only when needed, cutting spend without touching application code.
Spend less per token.
-
Automatic Failover Guardrails
When a provider is slow, degraded, or down, LLM.API retries and fails over to healthy models, preserving SLAs and user experience without manual incident playbooks.
Resilience by default.
-
End-to-End Observability
Get centralized traces, logs, and metrics for every request across all models and providers, making debugging, performance tuning, and cost attribution straightforward.
See every token.
-
Task-Level Abstractions
Express work as tasks—chat, generation, extraction, tools—and let LLM.API pick the right model and configuration, so you ship features instead of juggling parameters.
Code to tasks, not models.
-
High-Throughput Batch APIs
Submit massive job batches through a single endpoint with built-in concurrency control, retries, and progress tracking, maximizing throughput while protecting upstream systems.
Scale jobs, not scripts.
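LLM.API performs failover on the server side. The client-side sketch below only illustrates the general retry-then-fail-over pattern described in the Automatic Failover Guardrails card and is not LLM.API's implementation; the provider functions, retry count, and backoff values are hypothetical.

```python
import time


def embed_with_failover(texts, providers, retries_per_provider=2, backoff_s=0.5):
    """Return (provider_name, vectors) from the first provider that succeeds.

    providers: ordered list of (name, embed_fn) pairs, where each hypothetical
    embed_fn takes a list of strings and returns one vector per string.
    """
    errors = []
    for name, embed_fn in providers:
        for attempt in range(retries_per_provider):
            try:
                return name, embed_fn(texts)
            except Exception as exc:  # real code should catch only transient errors
                errors.append(f"{name} attempt {attempt + 1}: {exc!r}")
                # Exponential backoff, but only if this provider gets another attempt.
                if attempt < retries_per_provider - 1:
                    time.sleep(backoff_s * 2 ** attempt)
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def primary(texts):  # hypothetical provider that is currently timing out
    raise TimeoutError("upstream timeout")


def backup(texts):  # hypothetical healthy provider returning toy 2-d vectors
    return [[0.0, 1.0] for _ in texts]


print(embed_with_failover(["hello"], [("primary", primary), ("backup", backup)], backoff_s=0.0))
# ('backup', [[0.0, 1.0]])
```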
Decision guide
When to Use — When NOT to Use
Use it if...
- You need high-quality semantic embeddings for search, clustering, or retrieval over large corpora.
- You need strong multilingual text understanding across many languages in a single model.
- You are building retrieval-augmented generation (RAG) pipelines, for example alongside OpenAI chat models.
- Your use case involves semantic deduplication, near-duplicate detection, or content similarity scoring.
- Your use case involves recommendation or ranking systems based on textual semantic similarity.
- You need fixed-length numeric representations of text as features for downstream ML models or classifiers.
Avoid if...
- You need a model that directly generates text, code, or natural language responses.
- Your workload requires image, audio, or multimodal embeddings rather than pure text embeddings.
- You need on-device or fully offline embeddings without relying on external APIs.
- Your workload requires real-time per-token streaming responses instead of whole-vector outputs.
- You need controllable text generation, reasoning, or tool use rather than representation learning.
- Your workload requires extremely small embeddings tailored for ultra-low-latency edge deployments.
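Semantic deduplication and near-duplicate detection, listed under "Use it if...", can be sketched as a pairwise cosine-similarity check against a threshold. The 0.95 cutoff and the toy vectors are hypothetical; a real threshold should be tuned on labeled pairs, and the O(n^2) pairwise loop would be replaced by an approximate-nearest-neighbor index for large corpora.

```python
import math
from itertools import combinations


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def near_duplicates(embeddings, threshold=0.95):
    """Return (id_a, id_b) pairs whose cosine similarity meets the threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        if cosine(vec_a, vec_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical embeddings for three posts.
embeddings = {
    "post-1": [0.70, 0.70, 0.10],
    "post-2": [0.69, 0.71, 0.12],
    "post-3": [0.10, 0.20, 0.97],
}
print(near_duplicates(embeddings))  # [('post-1', 'post-2')]
```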
FAQ
Frequently Asked Questions
-
What is Text Embedding 3 Large?
Text Embedding 3 Large is OpenAI’s text-only embedding model. It turns inputs of up to 8,191 tokens into high-quality dense vectors (3,072 dimensions by default).
-
What is Text Embedding 3 Large best suited for?
It is best for high-accuracy semantic search, retrieval-augmented generation, clustering, recommendations, and other tasks needing rich semantic text similarity.
-
What context length does Text Embedding 3 Large support?
Text Embedding 3 Large supports input sequences up to 8,191 tokens in length.
-
What modalities does Text Embedding 3 Large support?
Text Embedding 3 Large supports text input only and outputs numerical embedding vectors.
-
How does the pricing of Text Embedding 3 Large work on LLM.API?
Pricing is per input token, quoted per 1M tokens in the tables above; embedding calls produce no billable output tokens. LLM.API applies OpenAI’s base rate plus any LLM.API-specific fees or discounts.
-
How fast is Text Embedding 3 Large in terms of latency?
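Each request returns whole vectors rather than streamed tokens, and the tables above list average latencies of roughly 110 to 250 ms per request, so Text Embedding 3 Large is suitable for real-time or near-real-time semantic search workloads.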
-
How do I call Text Embedding 3 Large through LLM.API?
Use LLM.API’s embeddings endpoint, specify provider "openai" and model "text-embedding-3-large", and pass your text inputs in the request body.
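The answer above names the provider and model strings but not the URL, body field names, or authentication scheme. The sketch below builds (without sending) such a request with Python's standard library; the endpoint URL, the `input` field name, and the bearer-token header are assumptions modeled on OpenAI-style APIs, so check LLM.API's documentation before relying on them.

```python
import json
import urllib.request

# Hypothetical endpoint; the real LLM.API URL is not given on this page.
LLM_API_EMBEDDINGS_URL = "https://llm-api.example/v1/embeddings"


def build_embeddings_request(texts, api_key):
    """Build (but do not send) a POST request for the embeddings endpoint."""
    body = {
        "provider": "openai",               # as stated in the FAQ answer
        "model": "text-embedding-3-large",  # as stated in the FAQ answer
        "input": texts,                     # assumed field name (OpenAI-style)
    }
    return urllib.request.Request(
        LLM_API_EMBEDDINGS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_embeddings_request(["What is the refund policy?"], api_key="YOUR_KEY")
print(req.get_method(), req.full_url)
# Sending it would use urllib.request.urlopen(req) against the real endpoint.
```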
-
How does Text Embedding 3 Large compare to Text Embedding 3 Small?
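Text Embedding 3 Large offers higher embedding quality and accuracy (3,072 dimensions by default), while Text Embedding 3 Small (1,536 dimensions) is cheaper ($0.02 versus $0.13 per 1M tokens at OpenAI’s list prices) and faster, but somewhat less accurate.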
-
Does Text Embedding 3 Large support multilingual text?
Yes, Text Embedding 3 Large supports multiple languages, making it suitable for cross-lingual semantic search and similarity tasks.
-
What are the main limitations of Text Embedding 3 Large?
It cannot generate text or process images, may encode biases from its training data, and the API rejects inputs longer than 8,191 tokens, so long documents must be truncated or split into chunks before embedding.
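Long documents are usually split into token windows that each fit under the 8,191-token limit. A minimal sketch, assuming the text has already been tokenized (for example with OpenAI's tiktoken library); the 200-token overlap is a hypothetical choice that keeps some context across chunk boundaries, and each chunk would then be embedded separately.

```python
def chunk_tokens(tokens, max_tokens=8191, overlap=200):
    """Split a token sequence into windows of at most max_tokens, overlapping by `overlap`."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the sequence
    return chunks


tokens = list(range(20_000))  # stand-in for token IDs produced by a tokenizer
chunks = chunk_tokens(tokens)
print(len(chunks), [len(c) for c in chunks])  # 3 [8191, 8191, 4018]
```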
