The Model Registry
Available models for Guildhall workflows, sized for Blackthorn’s hardware: 768GB RAM, 80 threads, and 2× RTX 2080 (8GB VRAM each).
Current assignments (Quorum)
| Model | Size (Q4) | Inference | Seats | Notes |
|---|---|---|---|---|
| Qwen3-30B-A3B | ~18GB | CPU | 1, 5, 8 | MoE, only 3B active; fast for its capability |
| Cogito 14B | ~9GB | CPU | 2, 9 | Hybrid reasoning, deep thinking mode |
| Qwen3.5-9B | ~7GB | GPU 1 | 3 | Fast instinctive responses |
| Qwen3.5-27B | ~17GB | CPU | 4 | User-perspective reasoning, vision-capable |
| Mistral 7B Instruct | ~5GB | GPU 2 | 6 | Punchy persuasive framing |
| Qwen3.5-35B | ~24GB | CPU | 7 | MoE, creative/unpredictable outputs |
| DeepSeek R1 14B | ~9GB | CPU | 10 | Systems/incentive reasoning |
| Qwen3-8B | ~5GB | CPU | 11 | Structured competitive analysis |
| DeepSeek R1 32B | ~20GB | CPU | 12 | Transparent reasoning for mandated dissent |
| Qwen3-32B | ~20GB | CPU | 13 | Synthesis across all outputs |
Total estimated memory footprint (all models loaded): ~134GB
Remaining RAM: ~634GB free (worst case; in practice ~12GB of the total sits in GPU VRAM on seats 3 and 6, so CPU RAM usage is closer to ~122GB)
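The memory math above can be sanity-checked with a short script. Sizes and device assignments below are copied from the table; treating the Inference column as the RAM-vs-VRAM split is an assumption.

```python
# Sanity-check the registry's memory budget against Blackthorn's hardware.
# Sizes (GB, Q4 quant) and device assignments are taken from the table above.
MODELS = {
    "Qwen3-30B-A3B":       (18, "cpu"),
    "Cogito 14B":          (9,  "cpu"),
    "Qwen3.5-9B":          (7,  "gpu1"),
    "Qwen3.5-27B":         (17, "cpu"),
    "Mistral 7B Instruct": (5,  "gpu2"),
    "Qwen3.5-35B":         (24, "cpu"),
    "Qwen3-8B":            (5,  "cpu"),
    "DeepSeek R1 14B":     (9,  "cpu"),
    "DeepSeek R1 32B":     (20, "cpu"),
    "Qwen3-32B":           (20, "cpu"),
}

TOTAL_RAM_GB = 768
VRAM_PER_GPU_GB = 8  # each RTX 2080

total = sum(size for size, _ in MODELS.values())
ram_used = sum(size for size, dev in MODELS.values() if dev == "cpu")

print(f"Total footprint: ~{total}GB")     # ~134GB across RAM + VRAM
print(f"CPU RAM used:    ~{ram_used}GB")  # CPU-resident models only
print(f"RAM free (worst case, all in RAM): ~{TOTAL_RAM_GB - total}GB")

# Each GPU-resident model must fit its card's VRAM.
for name, (size, dev) in MODELS.items():
    if dev.startswith("gpu"):
        assert size <= VRAM_PER_GPU_GB, f"{name} exceeds {VRAM_PER_GPU_GB}GB VRAM"
```

Re-running this after any reassignment keeps the table and the headroom numbers honest.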
Model selection principles
- Capability match: Use the smallest model that handles the seat’s cognitive task well
- Diversity for tension: Seats that check each other’s work run different models
- Efficiency for agreement: Seats with orthogonal (non-adversarial) perspectives can share models
- GPU for latency: Only seats needing fast responses get GPU allocation
- MoE preference: Mixture-of-experts models (Qwen3-30B-A3B, Qwen3.5-35B) offer better capability-per-active-parameter for CPU inference
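The “diversity for tension” principle can be enforced mechanically rather than by eyeballing the table. A minimal sketch: the seat→model map comes from the table above, but `ADVERSARIAL_PAIRS` is purely illustrative, since the document doesn’t specify which seats check each other’s work.

```python
# Enforce "diversity for tension": seats that check each other's work
# must run different models. Seat->model assignments mirror the table;
# the adversarial pairs below are hypothetical placeholders.
SEAT_MODEL = {
    1: "Qwen3-30B-A3B", 2: "Cogito 14B", 3: "Qwen3.5-9B",
    4: "Qwen3.5-27B", 5: "Qwen3-30B-A3B", 6: "Mistral 7B Instruct",
    7: "Qwen3.5-35B", 8: "Qwen3-30B-A3B", 9: "Cogito 14B",
    10: "DeepSeek R1 14B", 11: "Qwen3-8B", 12: "DeepSeek R1 32B",
    13: "Qwen3-32B",
}

# Hypothetical: which seat pairs review each other's output.
ADVERSARIAL_PAIRS = [(4, 12), (6, 11), (2, 10)]

def check_diversity(assignments, pairs):
    """Return the pairs that violate the different-model rule."""
    return [(a, b) for a, b in pairs if assignments[a] == assignments[b]]

violations = check_diversity(SEAT_MODEL, ADVERSARIAL_PAIRS)
print("violations:", violations)
```

Note that non-adversarial model sharing still passes: seats 1, 5, and 8 all run Qwen3-30B-A3B, and the check only flags a shared model when the two seats are declared adversarial.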
Models to evaluate
- Llama 3.1 70B — alternative for Facilitator (Seat 13) if Qwen3-32B underperforms on synthesis
- Gemma 4 — recently released, tool-calling native, worth testing for structured seats
- GLM-5.1 — currently cloud-only via Ollama, local weights not yet available. Monitor for release.
- Qwen3-Coder 30B — potential for any seats that need code analysis or technical evaluation
Lugh model assignments
TBD — Lugh’s pipeline stages have different requirements from Quorum’s seats: the Feynman tutor needs conversational depth, the research/synthesis stage needs factual grounding plus RAG, and the script generator needs narrative capability. These stages may share some Quorum models but will likely need their own assignments.