Powered by TheDrummer
Cydonia 24B V4.1
- Text Generation
Cydonia 24B V4.1 is a 24-billion-parameter, open-source text language model by TheDrummer, fine-tuned from Mistral Small 3.2 and optimized for uncensored creative writing with a 131K-token context window. It is notable for combining strong long-context handling with budget-friendly pricing for high-volume use.
About the model
What is Cydonia 24B V4.1?
Cydonia 24B V4.1 is an open‑source, text‑to‑text language model by TheDrummer built on Mistral Small 3.2 24B with a ~131K token context window. It is primarily used for uncensored creative writing, roleplay, and narrative-heavy chat where mood, nuance, and consistent characterization matter over long conversations. It is also applied as a general-purpose assistant model in enterprise and hobbyist settings, offering relatively low per-token costs for large-context workloads. Cydonia 24B V4.1 continues TheDrummer’s Cydonia series, improving on earlier variants such as Cydonia-22B and Cydonia-24B-v2.x in focus, coherence, and writing quality.
Model capabilities
5 Core Capabilities
-
Conversational Chat
Engages in multi-turn dialogue, answering questions, following instructions, and maintaining context within general conversational and assistant tasks.
-
Code and Logs
Reads and writes code or technical text, explaining behavior, debugging issues, and providing structured suggestions within its training scope.
-
Visual Content
Processes image inputs to identify objects and scenes and provide descriptive text responses within its supported visual understanding abilities.
-
Optical Text Reading
Extracts readable text from images or screenshots and converts it into machine-readable form for further processing or analysis.
-
Language Translation
Translates written text between multiple languages, preserving meaning and tone as closely as possible within its training limitations.
Use cases
6 Most Valuable Use Cases
- Long-form Storytelling
- Roleplay Chatbots
- Creative Writing Assistant
- Dialogue Generation
- Cost-Efficient Assistant
- Long-Context Text Processing
Transparent pricing
Cost Comparison
LLM API offers the lowest cost and highest performance for Cydonia 24B–class models.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API BEST | Global | ~120ms | ~65 tps | ~99.99% | ~$0.25 | ~$0.25 | ~256K |
| TheDrummer | Global | ~220ms | ~30 tps | ~99.5% | ~$0.80 | ~$0.80 | ~64K |
| Together AI | US East | ~210ms | ~35 tps | ~99.9% | ~$0.70 | ~$0.70 | ~128K |
| RunPod | US West | ~260ms | ~25 tps | ~99.0% | ~$0.90 | ~$0.90 | ~32K |
| Banana | Global | ~240ms | ~28 tps | ~99.5% | ~$0.85 | ~$0.85 | ~64K |
Performance benchmarks
Technical Specifications
| Metric | Cydonia 24B V4.1 | Llama 3 70B Instruct | Qwen2 72B |
|---|---|---|---|
| Avg Latency | ~220ms | ~260ms | ~280ms |
| Context Window | 128K | 8K | 32K |
| Input Price ($/1M) | $0.40 | $0.60 | $0.45 |
| Output Price ($/1M) | $0.60 | $0.90 | $0.70 |
| Max Output Tokens | 4K | 4K | 4K |
| Throughput | 65 tps | 50 tps | 55 tps |
| Uptime | 99.5% | 99.9% | 99.5% |
30-day usage via LLM API
- 3.4B
- Prompt tokens processed (last 30 days)
- 210M
- Completion tokens generated (last 30 days)
- 6.8M
- API requests served (last 30 days)
- 1.9K
- Unique developers using this model (last 30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Intelligent AI Routing
Dynamically route each request across providers and models based on latency, cost, and quality—without changing your integration.
One endpoint, any model -
Cost-Aware Orchestration
Automatically pick the most cost-effective model for each task and track spend per project, environment, and feature in one place.
Optimize tokens, not code -
Resilient Fallback Flows
Define fallback chains across providers so requests transparently recover from outages, rate limits, and model regressions.
Keep responses flowing -
Full-Stack Observability
Get end-to-end traces, latency and error metrics, and model-level analytics to debug prompts and production traffic in real time.
See every token hop -
Task-Level Abstractions
Describe tasks like chat, tool use, search, or generation once, then plug in any model or provider behind the same interface.
Ship tasks, not glue code -
High-Throughput Batch APIs
Fan out thousands of requests per call with built-in retries, rate management, and structured result aggregation.
Scale from 10 to 10M calls
Decision guide
When to Use — When NOT to Use
Use it if...
- You need a mid-sized 24B model balancing capability with more moderate hardware requirements.
- You need a community-driven open model that can be self-hosted and customized.
- Your use case involves general coding assistance, debugging, and small-to-medium code generation.
- Your use case involves chat-style assistants for customer support or internal knowledge bases.
- You need an experimentation model for fine-tuning or benchmarking against other 20–30B models.
- Your use case involves educational tutoring, explanations, and walkthroughs of technical concepts.
Avoid if...
- You need frontier-level reasoning comparable to the very latest large proprietary flagship models.
- Your workload requires extremely long context handling, such as full-book ingestion or analysis.
- You need highly specialized domain performance in law, medicine, or finance with certifications.
- Your workload requires ultra-low latency inference at massive scale on very limited hardware.
- You need guaranteed first-party support, SLAs, and an enterprise governance or compliance program.
- Your workload requires cutting-edge multimodal capabilities like advanced vision, audio, or video understanding.
FAQ
Frequently Asked Questions
-
What is Cydonia 24B V4.1?
Cydonia 24B V4.1 is a 24-billion-parameter language model by TheDrummer focused on fast, general-purpose code and text generation via LLM.API.
-
What is Cydonia 24B V4.1 best suited for?
It is best for code completion, technical writing, tool-using agents, and structured data generation where latency and cost matter.
-
What is the context window of Cydonia 24B V4.1?
Cydonia 24B V4.1 supports a context window of up to 32,000 tokens per request.
-
What modalities does Cydonia 24B V4.1 support?
Cydonia 24B V4.1 is a text-only model that accepts and outputs UTF-8 text.
-
How is Cydonia 24B V4.1 priced on LLM.API?
Pricing is usage-based per 1,000 tokens, with separate rates for input and output tokens defined in your LLM.API account.
-
How fast is Cydonia 24B V4.1 in production use?
Typical end-to-end latency is in the low hundreds of milliseconds for short prompts, depending on load and request size.
-
How do I call Cydonia 24B V4.1 through LLM.API?
Specify the model name "TheDrummer/cydonia-24b-v4.1" in your LLM.API completion or chat endpoint requests with your API key.
-
How does Cydonia 24B V4.1 compare to similar 20–30B models?
It targets a balance of stronger coding ability and lower latency than many open 20–30B models at similar price points.
-
Does Cydonia 24B V4.1 support function calling or tools via LLM.API?
Yes, you can use LLM.API's tool or function-calling conventions with this model for agent-style workflows.
-
What are the main limitations of Cydonia 24B V4.1?
It may hallucinate facts, lacks real-time knowledge, and is not guaranteed safe for high-stakes decisions without human review.
