Powered by Perceptron
Perceptron Mk1
- Text Classification
Perceptron Mk1 is Perceptron's highest‑quality proprietary vision‑language model focused on video understanding and embodied visual reasoning. It is notable for combining multimodal inputs (text, images, video) with long‑context reasoning and structured visual outputs for production use.
About the model
What is Perceptron Mk1?
Perceptron Mk1 is a closed-source vision-language model from Perceptron designed for image and video understanding, OCR, object detection, document parsing, and embodied visual reasoning. It is mainly used for tasks like video question answering, summarization, event and temporal segment detection, and detailed scene analysis across robotics, autonomous systems, surveillance, and AR applications. It also supports structured outputs such as spatial annotations (points, boxes, polygons) and temporal clips for production APIs that need precise localization and grounding in complex visual data. Perceptron Mk1 is the first and highest-quality model in the proprietary Perceptron Mk family of multimodal vision and reasoning models.
Model capabilities
5 Core Capabilities
-
Conversational Chat
Engages in multi-turn text conversations, following instructions, answering questions, and adapting responses to user context and intent.
-
Image Analysis
Interprets visual content in images, identifying objects, scenes, relationships, and other salient details to support downstream reasoning tasks.
-
Text Translation
Translates written text between multiple languages while preserving meaning, tone, and basic formatting across diverse domains and styles.
-
Screen Content Handling
Processes user interface or webpage-like content, enabling reasoning about layouts, elements, and on-screen information for digital tasks.
-
Optical Character Recognition
Extracts machine-readable text from images, screenshots, or scanned documents to enable search, editing, or further automated processing.
Use cases
6 Most Valuable Use Cases
- Video Question Answering
- Video Event Detection
- Image Object Detection
- Document OCR Parsing
- Robotics Data Curation
- Embodied Spatial Reasoning
Transparent pricing
Cost Comparison
LLM API offers the lowest cost and best performance for Perceptron‑class models.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API BEST | Global | 120ms | 120 tps | 99.99% | $0.20 | $0.20 | 64K tokens |
| Perceptron | US East | ~220ms | ~60 tps | ~99.9% | ~$0.45 | ~$0.45 | ~32K tokens |
| OpenAI | Global | ~250ms | ~80 tps | ~99.9% | ~$0.50 | ~$0.50 | ~32K tokens |
| Anthropic | US West | ~260ms | ~70 tps | ~99.9% | ~$0.55 | ~$0.55 | ~200K tokens |
| Azure AI | EU West | ~240ms | ~75 tps | ~99.95% | ~$0.52 | ~$0.52 | ~32K tokens |
Performance benchmarks
Technical Specifications
| Metric | Perceptron Mk1 | OpenAI GPT-4.1 Mini | Anthropic Claude 3 Haiku |
|---|---|---|---|
| Avg Latency | ~180ms | ~200ms | ~220ms |
| Context Window | 128K | 128K | 200K |
| Input Price ($/1M) | $0.15 | $0.15 | $0.25 |
| Output Price ($/1M) | $0.60 | $0.60 | $0.80 |
| Max Output Tokens | 8K | 4K | 8K |
| Throughput | ~120 tps | ~100 tps | ~90 tps |
| Uptime | 99.9% | 99.9% | 99.9% |
30-day usage via LLM API
- 620M
- Prompt tokens processed (last 30 days)
- 4.8M
- API requests served (last 30 days)
- 710M
- Completion tokens generated (last 30 days)
- 99.6%
- Avg uptime over API (last 30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Intelligent Model Routing
Dynamically route each request to the optimal model across providers using policies, constraints, and real-time signals—no client rewrites or vendor-specific logic needed.
One endpoint, any model -
Cost-Aware Orchestration
Control spend with price-based routing, per-project limits, and smart downgrades—keeping quality high while preventing surprise bills in production workloads.
Optimize quality per dollar -
Resilient Fallback Flows
Define graceful failover chains across models and providers so timeouts, rate limits, or outages degrade smoothly instead of breaking user experiences.
Never ship single points of failure -
End-to-End Observability
Trace every request across providers with unified logs, metrics, and latency breakdowns to debug issues fast and tune prompts and routing policies confidently.
See every token, everywhere -
Task-Level Abstractions
Describe tasks like chat, embeddings, tools, or reranking once and let LLM.API translate them into each provider’s API, schema, and capabilities.
Standard tasks, many backends -
High-Throughput Batch APIs
Ship large workloads as batches with concurrency control, retries, and partial-failure handling built in—cutting overhead and maximizing throughput across providers.
Batch at scale, safely
Decision guide
When to Use — When NOT to Use
Use it if...
- You need a general-purpose model from Perceptron for prototyping typical AI features.
- You need basic question-answering or summarization without demanding complex multi-step reasoning.
- Your use case involves moderate-length chat interactions with straightforward instructions and responses.
- You need an additional non-OpenAI model to diversify provider or vendor dependencies.
- Your use case involves experimentation or benchmarking across multiple mid-range foundation models.
- You need a Perceptron-native model to integrate with existing Perceptron tooling or dashboards.
Avoid if...
- You need state-of-the-art reasoning, planning, or tool use comparable to leading flagship models.
- Your workload requires very long context windows handling large codebases or multi-document reasoning.
- You need specialized capabilities like high-fidelity image generation or advanced multimodal understanding.
- Your workload requires strict, well-documented compliance certifications or audited industry-specific guarantees.
- You need a highly mature ecosystem with extensive third-party integrations, plugins, and community tools.
- Your workload requires detailed, battle-tested performance benchmarks across many public evaluation suites.
FAQ
Frequently Asked Questions
-
What is Perceptron Mk1?
Perceptron Mk1 is a large language model from Perceptron focused on fast, low-cost text generation for general software and product development use cases.
-
What is Perceptron Mk1 best suited for?
Perceptron Mk1 is best for code generation, refactoring, technical writing, and structured data transformations where low latency and high throughput matter.
-
How is Perceptron Mk1 priced on LLM.API?
Perceptron Mk1 uses LLM.API’s unified per-token billing; check your LLM.API pricing dashboard for current input and output token rates.
-
What is the context window of Perceptron Mk1?
Perceptron Mk1 supports a context window defined by the LLM.API integration; refer to the model card for the latest maximum token limit.
-
How fast is Perceptron Mk1 in terms of latency?
Perceptron Mk1 is optimized for low latency, typically returning first tokens quickly enough for interactive applications when used via LLM.API.
-
What modalities does Perceptron Mk1 support?
Perceptron Mk1 is a text-only model, accepting text prompts and returning text completions via the LLM.API interface.
-
How do I call Perceptron Mk1 through LLM.API?
Specify the Perceptron Mk1 model name in your LLM.API completion or chat endpoint request, using the same authentication and parameters as other models.
-
How does Perceptron Mk1 compare to similar models on LLM.API?
Perceptron Mk1 typically trades some reasoning depth for higher throughput and lower cost compared with larger, more capable general-purpose models.
-
What are the main limitations of Perceptron Mk1?
Perceptron Mk1 can struggle with complex multi-step reasoning, domain-expert answers, and may occasionally produce incorrect or hallucinated information.
-
Can I use Perceptron Mk1 for streaming responses?
Yes, when enabled in LLM.API, Perceptron Mk1 supports streaming token responses to improve perceived latency for end users.
