Powered by TheDrummer

Cydonia 24B V4.1

  • Text Generation

Cydonia 24B V4.1 is a 24-billion-parameter, open-source text language model by TheDrummer, fine-tuned from Mistral Small 3.2 and optimized for uncensored creative writing with a 131K-token context window. It is notable for combining strong long-context handling with budget-friendly pricing for high-volume use.

Start Using API

What is Cydonia 24B V4.1?

Cydonia 24B V4.1 is an open‑source, text‑to‑text language model by TheDrummer built on Mistral Small 3.2 24B with a ~131K token context window. It is primarily used for uncensored creative writing, roleplay, and narrative-heavy chat where mood, nuance, and consistent characterization matter over long conversations. It is also applied as a general-purpose assistant model in enterprise and hobbyist settings, offering relatively low per-token costs for large-context workloads. Cydonia 24B V4.1 continues TheDrummer’s Cydonia series, improving on earlier variants such as Cydonia-22B and Cydonia-24B-v2.x in focus, coherence, and writing quality.

5 Core Capabilities

  • Conversational Chat

    Engages in multi-turn dialogue, answering questions, following instructions, and maintaining context within general conversational and assistant tasks.

  • Code and Logs

    Reads and writes code or technical text, explaining behavior, debugging issues, and providing structured suggestions within its training scope.

  • Visual Content

    Processes image inputs to identify objects and scenes and provide descriptive text responses within its supported visual understanding abilities.

  • Optical Text Reading

    Extracts readable text from images or screenshots and converts it into machine-readable form for further processing or analysis.

  • Language Translation

    Translates written text between multiple languages, preserving meaning and tone as closely as possible within its training limitations.

6 Most Valuable Use Cases

  • Long-form Storytelling
  • Roleplay Chatbots
  • Creative Writing Assistant
  • Dialogue Generation
  • Cost-Efficient Assistant
  • Long-Context Text Processing

Cost Comparison

LLM API offers the lowest cost and highest performance for Cydonia 24B–class models.

Provider Region Latency Throughput Uptime Input ($/1M) Output ($/1M) Context
LLM API BEST Global ~120ms ~65 tps ~99.99% ~$0.25 ~$0.25 ~256K
TheDrummer Global ~220ms ~30 tps ~99.5% ~$0.80 ~$0.80 ~64K
Together AI US East ~210ms ~35 tps ~99.9% ~$0.70 ~$0.70 ~128K
RunPod US West ~260ms ~25 tps ~99.0% ~$0.90 ~$0.90 ~32K
Banana Global ~240ms ~28 tps ~99.5% ~$0.85 ~$0.85 ~64K

Technical Specifications

Metric Cydonia 24B V4.1 Llama 3 70B Instruct Qwen2 72B
Avg Latency ~220ms ~260ms ~280ms
Context Window 128K 8K 32K
Input Price ($/1M) $0.40 $0.60 $0.45
Output Price ($/1M) $0.60 $0.90 $0.70
Max Output Tokens 4K 4K 4K
Throughput 65 tps 50 tps 55 tps
Uptime 99.5% 99.9% 99.5%

30-day usage via LLM API

3.4B
Prompt tokens processed (last 30 days)
210M
Completion tokens generated (last 30 days)
6.8M
API requests served (last 30 days)
1.9K
Unique developers using this model (last 30 days)
Start Using API

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Intelligent AI Routing

    Dynamically route each request across providers and models based on latency, cost, and quality—without changing your integration.

    One endpoint, any model
  • Cost-Aware Orchestration

    Automatically pick the most cost-effective model for each task and track spend per project, environment, and feature in one place.

    Optimize tokens, not code
  • Resilient Fallback Flows

    Define fallback chains across providers so requests transparently recover from outages, rate limits, and model regressions.

    Keep responses flowing
  • Full-Stack Observability

    Get end-to-end traces, latency and error metrics, and model-level analytics to debug prompts and production traffic in real time.

    See every token hop
  • Task-Level Abstractions

    Describe tasks like chat, tool use, search, or generation once, then plug in any model or provider behind the same interface.

    Ship tasks, not glue code
  • High-Throughput Batch APIs

    Fan out thousands of requests per call with built-in retries, rate management, and structured result aggregation.

    Scale from 10 to 10M calls

When to Use — When NOT to Use

Use it if...

  • You need a mid-sized 24B model balancing capability with more moderate hardware requirements.
  • You need a community-driven open model that can be self-hosted and customized.
  • Your use case involves general coding assistance, debugging, and small-to-medium code generation.
  • Your use case involves chat-style assistants for customer support or internal knowledge bases.
  • You need an experimentation model for fine-tuning or benchmarking against other 20–30B models.
  • Your use case involves educational tutoring, explanations, and walkthroughs of technical concepts.

Avoid if...

  • You need frontier-level reasoning comparable to the very latest large proprietary flagship models.
  • Your workload requires extremely long context handling, such as full-book ingestion or analysis.
  • You need highly specialized domain performance in law, medicine, or finance with certifications.
  • Your workload requires ultra-low latency inference at massive scale on very limited hardware.
  • You need guaranteed first-party support, SLAs, and an enterprise governance or compliance program.
  • Your workload requires cutting-edge multimodal capabilities like advanced vision, audio, or video understanding.

Frequently Asked Questions

  • What is Cydonia 24B V4.1?

    Cydonia 24B V4.1 is a 24-billion-parameter language model by TheDrummer focused on fast, general-purpose code and text generation via LLM.API.

  • What is Cydonia 24B V4.1 best suited for?

    It is best for code completion, technical writing, tool-using agents, and structured data generation where latency and cost matter.

  • What is the context window of Cydonia 24B V4.1?

    Cydonia 24B V4.1 supports a context window of up to 32,000 tokens per request.

  • What modalities does Cydonia 24B V4.1 support?

    Cydonia 24B V4.1 is a text-only model that accepts and outputs UTF-8 text.

  • How is Cydonia 24B V4.1 priced on LLM.API?

    Pricing is usage-based per 1,000 tokens, with separate rates for input and output tokens defined in your LLM.API account.

  • How fast is Cydonia 24B V4.1 in production use?

    Typical end-to-end latency is in the low hundreds of milliseconds for short prompts, depending on load and request size.

  • How do I call Cydonia 24B V4.1 through LLM.API?

    Specify the model name "TheDrummer/cydonia-24b-v4.1" in your LLM.API completion or chat endpoint requests with your API key.

  • How does Cydonia 24B V4.1 compare to similar 20–30B models?

    It targets a balance of stronger coding ability and lower latency than many open 20–30B models at similar price points.

  • Does Cydonia 24B V4.1 support function calling or tools via LLM.API?

    Yes, you can use LLM.API's tool or function-calling conventions with this model for agent-style workflows.

  • What are the main limitations of Cydonia 24B V4.1?

    It may hallucinate facts, lacks real-time knowledge, and is not guaranteed safe for high-stakes decisions without human review.

Start in 2 lines of code

Get My API Key