Reka Edge

Instruction Following

Reka Edge is a 7B-parameter multimodal vision-language model from RekaAI that processes text, image, and video inputs to generate text outputs, optimized for fast, efficient edge and real-time applications.

Start Using API

API Performance

Latency: 0.87s p50 total latency (OpenRouter, best provider)
Context: 16K token context
Input: $0.10 per 1M tokens
Output: $0.10 per 1M tokens
Uptime: 99% 99%

About the model

What is Reka Edge?

Reka Edge is a 7B multimodal vision-language model that accepts text, image, and video inputs and produces text outputs with strong visual reasoning performance in its size class. It is mainly used for tasks such as image and video understanding, object detection, OCR-style layout-aware reading, and other visual question answering or analysis workloads. It is also suited to low-latency, real-time scenarios like robotics, automotive systems, and augmented or mixed reality on edge hardware. Reka Edge belongs to the Reka multimodal family introduced alongside Reka Core and Reka Flash.

Input / Output

Input

Text prompts
Images (e.g. PNG, JPEG)
Video frames or clips

Output

Structured or free-form text responses

Model capabilities

5 Core Capabilities

Multimodal Chat

Engages in instruction-following and conversational tasks, including reasoning, knowledge queries, coding, and creative writing, optimized for efficiency.
Image Understanding

Processes images to describe scenes, identify objects, and answer visual questions as part of its multimodal vision-language capabilities.
Video Analysis

Accepts video inputs for efficient vision-language reasoning, enabling event understanding and object-centric analysis over sequences of frames.
Optical Character Recognition

Reads and interprets textual content embedded in images or video frames, enabling text extraction for downstream reasoning tasks.
Multilingual Translation

Supports translation between English and multiple other languages as part of its general-purpose multimodal language modeling abilities.

Use cases

6 Most Valuable Use Cases

On-device Text Chatbot
Mobile Voice Assistant
Edge Content Summarization
Multilingual Text Translation
Low-latency App Copilot
Embedded Systems Reasoning

Transparent pricing

Cost Comparison

LLM API offers the lowest cost and latency for Reka Edge-compatible workloads.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	~140ms	~110 tps	99.99%	~$0.05 per 1M tokens	~$0.10 per 1M tokens	~256K tokens
Rekaai	Global	~220ms	~80 tps	99.9%	~$0.20 per 1M tokens	~$0.60 per 1M tokens	~128K tokens
AWS Bedrock (Reka Edge-equivalent)	US East	~260ms	~60 tps	99.9%	~$0.30 per 1M tokens	~$0.90 per 1M tokens	~128K tokens
Azure AI Studio (Reka Edge-equivalent)	EU West	~280ms	~55 tps	99.9%	~$0.32 per 1M tokens	~$0.95 per 1M tokens	~128K tokens
Google Cloud Vertex AI (Reka Edge-equivalent)	Global	~250ms	~65 tps	99.9%	~$0.28 per 1M tokens	~$0.85 per 1M tokens	~128K tokens

Performance benchmarks

Technical Specifications

Metric	Reka Edge	GPT-4o mini	Claude 3.5 Haiku
Avg Latency	~180ms	~250ms	~220ms
Context Window	128K	128K	200K
Input Price ($/1M)	$0.20	$0.15	$0.25
Output Price ($/1M)	$0.60	$0.60	$0.80
Max Output Tokens	4K	4K	4K
Throughput	50 tps	40 tps	35 tps
Uptime	99.9%	99.9%	99.9%

30-day usage via LLM API

3.8B: Prompt tokens processed (last 30 days)
11.5M: Completion tokens generated (last 30 days)
2.4M: API requests served (last 30 days)
99.8%: Average API uptime (last 30 days)

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Automatically route each request to the optimal model across providers based on latency, cost, and capabilities—without changing your integration or redeploying code.
One endpoint, every model.
Cost-Aware Orchestration

Set budget and quality constraints, then let LLM.API choose the cheapest model that still meets performance requirements, with clear per-call cost visibility and controls.
Max performance, minimal spend.
Resilient Fallback Flows

Configure automatic failover to alternative models or providers on errors, rate limits, or timeouts so your AI features stay online even when vendors don’t.
No more brittle integrations.
Full-Stack Observability

Get unified logs, traces, metrics, and payload inspection for every provider and model in one place, making debugging, tuning, and compliance reviews straightforward.
See every token, everywhere.
Task-Level Abstractions

Define tasks like chat, retrieval, or extraction once, then swap models or chains behind the scenes without touching application logic or reworking prompts.
Code to tasks, not models.
High-Throughput Batch Jobs

Submit massive batches of LLM calls through a single API with provider-aware concurrency, retries, and cost controls built in for large-scale workloads.
Ship at batch scale.

Decision guide

When to Use — When NOT to Use

Use it if...

You need an on-device or edge-deployable LLM with low-latency inference.
You need to keep data on-prem or on-device for privacy and compliance.
Your use case involves mobile, IoT, or embedded scenarios with intermittent connectivity.
You need a compact model suitable for cost-efficient, high-request-volume applications.
Your use case involves lightweight chatbots or assistants embedded into existing applications.
You need faster response times than large cloud-hosted models for interactive interfaces.

Avoid if...

You need state-of-the-art frontier performance on complex reasoning or coding benchmarks.
Your workload requires very long-context processing, such as full-book or codebase analysis.
You need top-tier multimodal capabilities like advanced image, audio, and video understanding.
Your workload requires highly specialized domain expertise, such as cutting-edge scientific research.
You need maximum accuracy for safety-critical tasks like medical, legal, or financial decisions.
Your workload requires heavy fine-tuning infrastructure and ecosystem comparable to big-cloud providers.

FAQ

Frequently Asked Questions

What is Reka Edge?

Reka Edge is a compact multimodal model by Rekaai designed for fast, low-cost inference on LLM.API.
What types of tasks is Reka Edge best suited for?

Reka Edge is best for lightweight chatbots, classification, routing, and low-latency assistants where cost and speed are critical.
What is the context window of Reka Edge?

Reka Edge supports a context window of up to 8,192 tokens on LLM.API.
How fast is Reka Edge in terms of latency?

Reka Edge is optimized for low latency, typically returning initial tokens within a few hundred milliseconds depending on request size and load.
What modalities does Reka Edge support?

Reka Edge currently supports text input and output; image or audio inputs are not available via LLM.API for this model.
How is Reka Edge priced on LLM.API?

Reka Edge uses LLM.API’s unified per-token pricing, billed separately for input and output tokens according to your LLM.API plan.
How do I call Reka Edge through the LLM.API?

You select the model name "Reka Edge" in the LLM.API completion or chat endpoint and authenticate with your standard LLM.API key.
How does Reka Edge compare to larger Rekaai models?

Reka Edge is cheaper and faster but generally less capable on complex reasoning and long-context tasks than larger Rekaai models.
Does Reka Edge have any notable limitations?

Reka Edge may struggle with highly technical reasoning, very long multi-step instructions, and tasks requiring detailed domain expertise.
Can I fine-tune or customize Reka Edge via LLM.API?

Direct fine-tuning is not exposed; you customize behavior using system prompts, few-shot examples, and application-level logic.

Start in 2 lines of code

Get My API Key

Reka Edge

What is Reka Edge?

5 Core Capabilities

Multimodal Chat

Image Understanding

Video Analysis

Optical Character Recognition

Multilingual Translation

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Unified AI Routing

Cost-Aware Orchestration

Resilient Fallback Flows

Full-Stack Observability

Task-Level Abstractions

High-Throughput Batch Jobs

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code