Powered by OpenAI

Text Embedding 3 Small

  • Text Embeddings

Text Embedding 3 Small is an OpenAI embedding model optimized for low-latency, low-cost vector representations of text. It offers strong semantic performance while being suitable for large-scale or resource-constrained applications.

Start Using API

What is Text Embedding 3 Small?

Text Embedding 3 Small is an OpenAI model that converts text into dense vector embeddings for efficient semantic comparison and retrieval. It is primarily used for tasks like semantic search, information retrieval, and clustering, where many documents or queries must be embedded cost-effectively. It is also commonly applied in recommendation systems, topic modeling, and classification workflows that rely on vector similarity. It belongs to OpenAI’s third-generation text embedding family, which succeeds the earlier text-embedding-ada-002 model.

5 Core Capabilities

  • Text Embedding

    Generates dense numerical vector representations of text optimized for semantic tasks like search, clustering, and retrieval.

  • Semantic Similarity

    Enables comparison of texts by measuring vector similarity, supporting relevance ranking, deduplication, and near-duplicate detection.

  • Document Clustering

    Supports grouping of related documents or sentences based on embedding proximity, useful for topic discovery and organization.

  • Multilingual Support

    Handles multiple languages for embeddings, enabling cross-lingual similarity search and retrieval in multilingual datasets.

  • Classification Features

    Provides embedding vectors that can be used as input features for downstream classifiers and other machine learning models.
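The similarity-based capabilities above all reduce to comparing vectors. A minimal sketch of cosine-similarity ranking follows; the three-dimensional vectors and document names are hypothetical placeholders standing in for real embeddings, which would come from the embeddings endpoint.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors (hypothetical values, not real model output).
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "returns-faq": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
ranking = rank_by_similarity(query, docs)
```

The same score can drive deduplication (flag pairs above a chosen threshold) or serve as the distance measure for clustering; the threshold itself is application-specific and has to be tuned on your own data.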

6 Most Valuable Use Cases

  • Semantic Text Search
  • Document Clustering
  • Recommendation Matching
  • Duplicate Content Detection
  • Cross-Lingual Retrieval
  • Efficient Vector Indexing

Cost Comparison

Among the providers listed below, LLM API has the lowest input price and the highest listed throughput for Text Embedding 3 Small-class embeddings.

Provider        Region            Latency  Throughput  Uptime  Input ($/1M)  Output ($/1M)  Context
LLM API (BEST)  Global            80ms     120k tps    99.99%  $0.010        $0.00          200K tokens
OpenAI          Global            120ms    80k tps     99.9%   $0.020        $0.00          128K tokens
Azure OpenAI    US East, EU West  ~140ms   ~60k tps    99.9%   ~$0.022       $0.00          ~128K tokens
Anthropic       US, EU            ~150ms   ~50k tps    99.9%   ~$0.025       $0.00          200K tokens
Google Cloud    Global            210ms    55k tps     99.9%   ~$0.023       $0.00          200K tokens

Technical Specifications

Metric               Text Embedding 3 Small (OpenAI)  text-embedding-3-large (OpenAI)  text-embedding-ada-002 (OpenAI)
Dimensions           1536                             3072                             1536
Max Input Tokens     8K                               8K                               8K
Price per 1M Tokens  $0.02                            $0.13                            $0.10
Avg Latency          ~120ms                           ~180ms                           ~200ms
Throughput           ~1,200 tps                       ~900 tps                         ~800 tps
Uptime               99.9%                            99.9%                            99.9%
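To make the price row concrete, here is a small sketch that estimates the cost of embedding a corpus with each model. The prices are copied from the table above; the corpus size (2 million documents averaging 500 tokens) is a hypothetical example, and the function name is illustrative.

```python
from decimal import Decimal

# USD per 1M input tokens, copied from the specification table above.
PRICE_PER_MILLION = {
    "text-embedding-3-small": Decimal("0.02"),
    "text-embedding-3-large": Decimal("0.13"),
    "text-embedding-ada-002": Decimal("0.10"),
}


def embedding_cost_usd(model, total_tokens):
    """Estimated cost in USD for embedding total_tokens input tokens."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return PRICE_PER_MILLION[model] * Decimal(total_tokens) / Decimal(1_000_000)


# Hypothetical corpus: 2 million documents averaging 500 tokens each.
corpus_tokens = 2_000_000 * 500
costs = {model: embedding_cost_usd(model, corpus_tokens)
         for model in PRICE_PER_MILLION}
```

Under these assumptions the one billion tokens cost about $20 with Text Embedding 3 Small, against $130 with text-embedding-3-large and $100 with text-embedding-ada-002. Re-embedding after a model change incurs the same cost again, which is worth budgeting for.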

30-day usage via LLM API

  • 11.4B prompt tokens processed (30 days)
  • 7.8M embedding API requests (30 days)
  • 620K active developer accounts (30 days)
  • 99.98% average API uptime (30 days)

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Unified AI Routing

    Automatically route each request to the optimal model across providers based on latency, cost, and quality—no client changes, just smarter traffic control.

    One endpoint, every model
  • Cost-Aware Orchestration

    Control spend with per-route pricing rules, dynamic model downgrades, and usage caps so you can experiment freely without surprise bills.

    Optimize every token
  • Resilient Fallback Flows

    Define automatic fallbacks when a provider fails or times out, ensuring mission-critical flows stay online even during upstream incidents.

    Don’t ship single points of failure
  • Deep Observability

    Trace every request end-to-end with logs, metrics, and latency breakdowns across providers so you can debug production issues in minutes, not days.

    See every token hop
  • Task-Level Abstractions

    Declare tasks like chat, tools, or RAG once and let LLM.API translate them into provider-specific calls, keeping your app logic clean and portable.

    Code to tasks, not vendors
  • High-Throughput Batch

    Submit large batches of prompts with automatic chunking, retries, and concurrency control to process millions of tokens efficiently and predictably.

    Scale jobs, not scripts
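The fallback idea described above can be illustrated with a short client-side sketch. This is not LLM.API's implementation, which the page does not describe; `ProviderError`, `embed_with_fallback`, and the two stand-in providers are hypothetical names introduced only for illustration.

```python
class ProviderError(Exception):
    """Raised by a provider callable when it fails or times out."""


def embed_with_fallback(text, providers):
    """Try each (name, embed_fn) pair in order.

    Returns (name, vector) from the first provider that succeeds and
    raises ProviderError if every provider fails.
    """
    errors = []
    for name, embed_fn in providers:
        try:
            return name, embed_fn(text)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for real provider clients.
def flaky_primary(text):
    raise ProviderError("upstream timeout")


def stable_secondary(text):
    return [float(len(text)), 0.0]


used, vector = embed_with_fallback(
    "hello",
    [("primary", flaky_primary), ("secondary", stable_secondary)],
)
```

One caveat specific to embeddings: vectors produced by different embedding models are not comparable with each other, so a fallback route should serve the same model from another provider or region rather than switch to a different model.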

When to Use — When NOT to Use

Use it if...

  • You need low-cost, general-purpose text embeddings for large-scale production applications.
  • You need strong performance on semantic search, retrieval, and reranking across many domains.
  • Your use case involves clustering or deduplicating large text corpora efficiently and cheaply.
  • Your use case involves building recommendation systems based on textual or metadata similarity.
  • You need compact embeddings that balance quality and speed for interactive applications.
  • Your use case involves multilingual semantic search where most text is in high-resource languages.

Avoid if...

  • You need cross-modal embeddings that jointly represent text and images in one vector space.
  • Your workload requires domain-specialized embeddings, heavily tuned to a narrow technical field.
  • You need sentence embeddings explicitly aligned with another vendor’s proprietary embedding space.
  • Your workload requires extremely long-context representations beyond the model’s documented token limits.
  • You need embeddings optimized for code understanding rather than natural language text.
  • Your workload requires strict on-prem deployment without any connection to OpenAI’s API.
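When a document exceeds the model's input limit, a common workaround is to split it into overlapping chunks and embed each chunk separately. The sketch below is a rough illustration under a loud assumption: it counts whitespace-separated words, not real tokens. A word usually maps to one or more tokens, so the budget should be set well below the documented limit, or the counting replaced with the model's actual tokenizer. The function name and parameters are hypothetical.

```python
def chunk_by_budget(text, max_units, overlap=0):
    """Split text into chunks of at most max_units words.

    Consecutive chunks share `overlap` words so that content cut at a
    boundary still appears whole in one chunk. Words are only a rough
    proxy for tokens (assumption); keep max_units conservative.
    """
    if max_units <= 0:
        raise ValueError("max_units must be positive")
    if not 0 <= overlap < max_units:
        raise ValueError("overlap must satisfy 0 <= overlap < max_units")
    words = text.split()
    step = max_units - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_units]))
        if start + max_units >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_by_budget(sample, max_units=4, overlap=1)
```

Each chunk then gets its own vector. Whether to search over chunk vectors directly or to combine them into one document vector is a design choice that depends on the retrieval task.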

Frequently Asked Questions

  • What is Text Embedding 3 Small?

    Text Embedding 3 Small is an OpenAI model that generates vector representations of text, optimized for low cost and strong retrieval performance.

  • What is Text Embedding 3 Small best suited for?

    Text Embedding 3 Small is best for semantic search, retrieval-augmented generation, clustering, recommendations, and other tasks requiring dense text similarity comparisons.

  • What is the context window of Text Embedding 3 Small?

    Text Embedding 3 Small supports input texts up to 8,192 tokens in length.

  • How fast is Text Embedding 3 Small when called through LLM.API?

    Text Embedding 3 Small is designed for high-throughput, low-latency embedding generation; actual latency depends on your request size and LLM.API region.

  • Which input modalities does Text Embedding 3 Small support?

    Text Embedding 3 Small supports text-only input and outputs numeric embedding vectors; it does not accept images, audio, or other modalities.

  • How do I access Text Embedding 3 Small via LLM.API?

    Call the LLM.API embeddings endpoint, set the provider to OpenAI, and specify the model name "text-embedding-3-small" in your request.

  • How does Text Embedding 3 Small compare to Text Embedding 3 Large?

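    Text Embedding 3 Small is cheaper than Text Embedding 3 Large ($0.02 versus $0.13 per 1M tokens in the Technical Specifications) and produces smaller vectors (1536 versus 3072 dimensions) at slightly lower quality, making it preferable for large-scale or latency-sensitive workloads.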

  • What are the limitations of Text Embedding 3 Small?

    Text Embedding 3 Small cannot generate natural language, code, or images and may underperform on highly specialized or domain-specific semantic tasks.

  • How is pricing for Text Embedding 3 Small handled on LLM.API?

    Pricing for Text Embedding 3 Small on LLM.API is usage-based, billed per input token, and may differ from OpenAI's direct pricing; the Cost Comparison lists $0.010 per 1M input tokens for LLM API and $0.020 for OpenAI.

  • Can I use Text Embedding 3 Small for multilingual text?

    Text Embedding 3 Small supports multiple languages, but performance may vary by language and is generally strongest for English content.
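The access steps from the FAQ (call the embeddings endpoint, set the provider to OpenAI, name the model) might look like the following sketch. Only the model name "text-embedding-3-small" comes from this page. The endpoint URL, the `provider` field name, and the bearer-token header are assumptions made for illustration, so check the LLM.API documentation for the real values. The code builds the request without sending it.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the endpoint from your LLM.API account.
ASSUMED_ENDPOINT = "https://example.invalid/v1/embeddings"


def build_embedding_request(texts, api_key, endpoint=ASSUMED_ENDPOINT):
    """Build (but do not send) a POST request for text embeddings."""
    payload = {
        "provider": "openai",  # assumed field name for "set the provider to OpenAI"
        "model": "text-embedding-3-small",
        "input": list(texts),
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_embedding_request(["hello world"], api_key="YOUR_API_KEY")
body = json.loads(request.data)
```

Passing the request to `urllib.request.urlopen` would send it. The response shape is not documented on this page, so parsing is left out here.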
