Known Limitations & Mitigations
Honest critique of the Lugh design, with mitigations where they exist and open questions where they don’t.
1. Source discovery is the agent’s job, not the user’s
The user says “Understanding Design Patterns” and nothing else. The agent must:
- Do a breadth-first search to map the topic territory
- Identify authoritative sources (books, sites, papers, documentation)
- Build enough understanding to scope a syllabus
- During episode generation, do depth-first research into each episode’s slice
- Validate and cite sources in a refinement pass
This is a research agent problem, not a RAG-over-curated-docs problem. The quality ceiling is determined by the agent's ability to find good sources and distinguish authoritative material from noise.
Mitigation: The refinement loop after script generation validates claims against sources and adds citations. If the agent can’t find a source for a claim, it flags it rather than presenting it as fact.
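The flag-rather-than-assert behavior can be sketched as a simple split over claims. This is an illustrative shape only (the `Claim` type and `refine` function are hypothetical, not the actual Lugh implementation); the real refinement pass would attach retrieved documents as sources.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    sources: list = field(default_factory=list)  # citations found during refinement

def refine(claims):
    """Split claims into cited and flagged-for-review buckets.

    A claim with no supporting source is flagged rather than
    presented as fact.
    """
    cited, flagged = [], []
    for claim in claims:
        (cited if claim.sources else flagged).append(claim)
    return cited, flagged
```

Downstream, flagged claims would either be cut from the script or surfaced to the learner as explicitly unverified.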
Optional override: Users who want to provide their own source materials (PDFs, notes, links) can do so. This is the “NotebookLM it” path — valid, but not the default. The default is agent-driven discovery.
Open question: How does the agent handle topics where source quality is contested? Design patterns have consensus sources. MCAS has contradictory research. Socialism has fundamentally different schools of thought. The agent needs a strategy for surfacing disagreement rather than picking a side.
2. Local LLM quality floor
The reference syllabus was generated by a frontier model. Local models may produce content that is structurally correct but pedagogically flat — missing analogies, editorial judgment, corrective framing.
Mitigation: The landscape is moving fast. Google’s TurboQuant enables 20B+ parameter models on consumer hardware (MacBook Airs). Quality will improve. The pipeline design is model-agnostic — it doesn’t care whether the backend is Ollama, an API call, or a hybrid.
Design decision: Include optional API key configuration. Users who want frontier model quality can plug in an API key. Users who want local-only can use whatever they can run. The pipeline doesn’t change, just the model endpoint.
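The "same pipeline, different endpoint" decision amounts to a small resolver at startup. A minimal sketch, assuming a config dict with hypothetical keys (`api_key`, `api_url`, `local_url`); the fallback URL shown is Ollama's default local port:

```python
def resolve_backend(config):
    """Pick a model endpoint without changing the pipeline.

    An API key selects a hosted frontier model; otherwise fall
    back to a local endpoint. Key names are illustrative.
    """
    if config.get("api_key"):
        return {"kind": "api", "base_url": config.get("api_url", "https://api.example.com/v1")}
    return {"kind": "local", "base_url": config.get("local_url", "http://localhost:11434")}
```

Every generation stage calls the same resolver, so swapping backends never touches pipeline logic.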
Open question: Is there a minimum model quality below which the pipeline produces harmful output (confident misinformation)? We won’t know until we test, but the self-check Feynman step should catch the worst cases.
3. Pre-assessment accessibility
“Tell me what you know about X” is a recall task. Many learners — especially neurodivergent learners, developers who learn by doing, or anyone who struggles with verbal articulation — may know things through practice but can’t narrate their knowledge on demand.
Mitigation: The pre-assessment is run by a conversational AI. It has the flexibility to adapt its approach based on the learner. Options include:
- Recognition tasks instead of recall: show code snippets, scenarios, or examples and ask “what’s happening here?” or “how would you approach this?”
- Meta-assessment: Before diving into the topic, ask about learning style, self-identified labels (neurodivergent, visual learner, etc.), special interests that might provide useful analogies, professional background
- Gradual warm-up: Start with broad, low-pressure questions (“have you ever heard of X?”) before asking for explanations
The pre-assessment prompt needs to be designed with accessibility as a first-class concern, not an afterthought. The meta-assessment information could also inform episode generation — choosing analogies, adjusting pacing, matching the learner’s frame of reference.
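One way the meta-assessment could steer the opener is a simple heuristic over the learner profile. This is a deliberately crude sketch (the `pick_opening_task` function and profile keys are hypothetical); the real decision would sit inside the conversational AI's prompt, not hard-coded logic:

```python
def pick_opening_task(profile):
    """Choose a pre-assessment opener from the meta-assessment.

    Learners who flag recall difficulty get a recognition task;
    everyone else starts with a low-pressure warm-up rather than
    a cold "explain X" prompt.
    """
    if profile.get("prefers_recognition") or profile.get("neurodivergent"):
        return "recognition"   # show an example, ask "what's happening here?"
    return "warm_up"           # "have you ever heard of X?"
```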
4. Feynman tutor prompt robustness
The Feynman Tutor Prompt is the core of the assessment system. If it’s too lenient, learners pass without understanding. If it’s too aggressive, it creates anxiety and disengagement. If it’s not specific enough to the topic, it asks generic questions that don’t test real understanding.
Mitigation: The prompt needs extensive examples — not just the protocol, but demonstrations of:
- Accepting a correct explanation gracefully
- Catching a fluent-but-wrong explanation (the hardest case)
- Probing for judgment, not just recall (“when would you NOT use this?”)
- Handling “I don’t know” without making it punitive
- Adapting to the learner’s communication style (terse vs. verbose, technical vs. analogical)
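Assembling the tutor prompt from a universal protocol plus topic-specific material might look like the sketch below. The function and structure are hypothetical; per the design, the rubric points would come from the self-check step (Stage 3d) and the demonstrations from the curated example library:

```python
FEYNMAN_PROTOCOL = """You are a Feynman tutor. Ask the learner to explain
the concept in their own words. Probe for judgment, not just recall.
Accept correct explanations gracefully; never punish "I don't know"."""

def build_tutor_prompt(topic, rubric_points, demonstrations):
    """Assemble a per-episode tutor prompt: universal protocol plus
    topic-specific rubric and worked demonstrations."""
    lines = [FEYNMAN_PROTOCOL, f"\nTopic: {topic}", "\nRubric:"]
    lines += [f"- {p}" for p in rubric_points]
    lines.append("\nDemonstrations:")
    lines += [f"- {d}" for d in demonstrations]
    return "\n".join(lines)
```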
Open question: Can a single Feynman prompt work across all topics, or does each topic need a topic-specific tutor prompt? Probably the base protocol is universal but the examples and edge cases need to be topic-specific — which means the rubric generation step is critical.
5. Remediation framing
Calling it “remediation” has failure energy. Even with learner autonomy baked in, “you didn’t pass, here’s extra content” feels punitive to many learners.
Mitigation: Don’t call it remediation. Frame it as what it actually is:
- “Bonus episode” or “deep dive” — “You had some great questions about wrapper patterns, here’s a deeper look at the Decorator/Proxy distinction”
- The system is transparent about what it’s doing, but the language is invitational, not corrective
Design principle: This is a voluntary learning experience. If a learner wants to skip assessments entirely, that’s their choice. The gate exists for people who want verified understanding. The system should make the value proposition clear (“the quiz helps it stick”) without making it mandatory. It’s their compute, their tokens, their time.
6. Source material is Stage 0, not a prerequisite
Restating for clarity: the default pipeline does NOT require the user to provide sources. The user provides a topic string. The agent does the research.
The pipeline stages are:
- Stage 0a: Breadth-first topic discovery and source identification
- Stage 0b: Curriculum design based on discovered sources + depth selection
- Stage 1: Pre-assessment
- Stage 2: Depth-first research for Episode N (with source validation)
- Stage 3: Script generation (with citations)
- Stage 3b: Accuracy review and citation validation
- Stage 3c: Self-check Feynman (can the script pass its own gate?)
- Stage 3d: Rubric extraction from self-check
- Stage 4: TTS
- Stage 5: Learner listens
- Stage 6: Feynman tutor session (using rubric from 3d)
- Stage 7: The gate
Optional: Users can provide their own source materials, in which case Stage 0a is replaced by ingestion of provided materials.
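The stage list above, including the Stage 0a swap for user-provided materials, can be expressed as data. A minimal sketch with hypothetical stage names; the real orchestrator would attach handlers to each stage:

```python
STAGES = [
    ("0a", "breadth_first_discovery"),
    ("0b", "curriculum_design"),
    ("1",  "pre_assessment"),
    ("2",  "episode_research"),
    ("3",  "script_generation"),
    ("3b", "accuracy_review"),
    ("3c", "self_check_feynman"),
    ("3d", "rubric_extraction"),
    ("4",  "tts"),
    ("5",  "listen"),
    ("6",  "feynman_tutor"),
    ("7",  "gate"),
]

def plan(user_sources=None):
    """Return the stage list, swapping Stage 0a for ingestion when
    the user supplies their own materials (the optional override)."""
    stages = list(STAGES)
    if user_sources:
        stages[0] = ("0a", "ingest_user_sources")
    return stages
```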
7. No cross-topic memory (v2)
Each topic is a silo. Completing “Understanding Design Patterns” and starting “Understanding Software Architecture” doesn’t carry forward the knowledge that you already understand Facade.
v2 mitigation: A learner profile that accumulates solid concepts across courses. Future pre-assessments check against it. Not a v1 concern.
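The v2 profile could be little more than a map of verified concepts to the course that verified them. A sketch of the data structure only (the class and method names are hypothetical):

```python
class LearnerProfile:
    """Cross-topic memory: concepts a learner has verified,
    keyed by name, with the course they came from."""

    def __init__(self):
        self.solid = {}  # concept -> course it was verified in

    def record(self, concept, course):
        self.solid[concept] = course

    def already_solid(self, concept):
        return concept in self.solid
```

A future pre-assessment would call `already_solid("Facade")` before asking the learner to re-explain it.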
Resolved — moved to Design Decisions
- How does the agent handle contested or contradictory source material? → Transparently, like a good teacher.
- What’s the minimum local model quality for safe output? → The refinement chain is the mitigation; the floor is an implementation discovery.
- Can one Feynman prompt work across all topics? → Base protocol is universal, assembled per-episode with dynamic context.
- How does meta-assessment flow through the pipeline? → As prompt context at every generation and evaluation stage.