Walkthrough Findings

Gaps discovered by dry-running the “Understanding Design Patterns” request through the full pipeline, and the design decisions that resolve them.

1. Depth selection needs a shallow research pass

Problem: The system can’t write topic-aware depth-level descriptions while it still knows nothing about the topic. “Awareness” means something different for design patterns than for MCAS.

Resolution: Add a quick shallow scan before depth selection. The flow becomes:

Topic string → shallow scan (~30 seconds, just enough to understand the territory) → depth selection with topic-aware descriptions → full breadth-first research scoped to the selected depth

The shallow scan isn’t a full research pass. It’s enough to generate meaningful depth descriptions like: “Depth 1 gets you enough to follow your tech lead’s PR comments. Depth 2 gives you the vocabulary to choose patterns deliberately.”
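The scan-then-describe flow can be sketched as a small orchestration function. The `scan` and `describe` callables are hypothetical stand-ins for the real LLM-backed prompts; the stub lambdas below exist only to show the shape of the data flow:

```python
from dataclasses import dataclass

@dataclass
class DepthOption:
    level: int
    description: str

def select_depth_options(topic, scan, describe, levels=(1, 2, 3)):
    # Shallow scan first: a quick pass that maps the territory, not full research.
    territory = scan(topic)
    # Depth descriptions are then grounded in what the scan found.
    return [DepthOption(lvl, describe(topic, territory, lvl)) for lvl in levels]

# Stub callables stand in for the real LLM-backed scan/describe prompts.
options = select_depth_options(
    "Understanding Design Patterns",
    scan=lambda t: "GoF catalog; creational/structural/behavioral split",
    describe=lambda t, terr, lvl: f"Depth {lvl}: {t}, scoped by scan ({terr})",
)
```

The point of the injection is that depth selection never runs before the scan: the descriptions can't be generated without the territory summary in hand.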

2. The system is online

Problem: Local LLM doesn’t mean offline. The agent needs to find and retrieve external sources, not just generate from training data.

Resolution: The system is online. Local LLM is a preference driven by cost (the refinement chain would be expensive at API rates, and TTS services like ElevenLabs add up). But the agent has internet access for source discovery and retrieval.

Additional consideration: A local knowledge store (Kiwix with Wikipedia, or similar) could provide a baseline reference layer that doesn’t require network calls for every query. Useful for foundational concepts that don’t change. Web search handles current/specialized sources.
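The local-first routing described above amounts to a simple fallback. A minimal sketch, assuming hypothetical `local_lookup` and `web_search` callables standing in for a Kiwix/offline index and a search API:

```python
def retrieve(query, local_lookup, web_search):
    # Prefer the local reference layer; it needs no network call.
    hits = local_lookup(query)
    if hits:
        return {"source": "local", "results": hits}
    # Fall back to web search for current or specialized material.
    return {"source": "web", "results": web_search(query)}

# Foundational concept -> served locally; niche query -> web.
local_index = {"observer pattern": ["Wikipedia: Observer pattern"]}
r1 = retrieve("observer pattern", local_index.get, lambda q: [])
r2 = retrieve("Slint vs Tauri comparison", local_index.get, lambda q: ["web result"])
```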

3. Syllabus is not negotiable (v1)

Problem: Should the user be able to edit the syllabus?

Resolution: No, not in v1. You don’t negotiate the syllabus with your professor. If you selected Depth 2, the syllabus is what Depth 2 means for this topic. The system is the curriculum designer.

If the syllabus is wrong, the right fix is improving Stage 0, not adding a manual editing step.

v2 consideration: Syllabus editing could be a power-user feature later.

Efficiency note: Using medium-sized models for grunt work (research, initial drafts) and larger models for validation and approval could reduce compute cost while maintaining quality at decision points.
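The tiering could be as simple as a task-to-model routing table. Tier names below are placeholders, not real model IDs:

```python
# Hypothetical tier names; the actual models are unspecified in the design.
MODEL_FOR_TASK = {
    "research": "medium",
    "initial_draft": "medium",
    "accuracy_review": "large",
    "approval": "large",
}

def pick_model(task):
    # Grunt work defaults to the medium tier; only decision points escalate.
    return MODEL_FOR_TASK.get(task, "medium")
```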

4. Desktop GUI application

Problem: The pipeline needs different interaction modes — chat for assessment, structured display for syllabi, audio playback for episodes.

Resolution: Lugh is a desktop GUI application. Framework candidates: Slint, Tauri, or Electron. The GUI handles:

  • Topic input and depth selection
  • Syllabus display
  • Pre-assessment and Feynman tutor chat interfaces
  • Audio playback
  • Listener question submission
  • Progress tracking across the course

Framework selection is an implementation decision.

5. Shared knowledge store

Problem: Local models have limited context windows. Source material, pre-assessment results, learner profile, episode history — it all needs to be accessible without stuffing everything into every prompt.

Resolution: A shared knowledge store using Redis, a vector DB, or similar. This serves as:

  • Source material storage and retrieval (chunked and embedded for relevant lookup)
  • Learner profile persistence (meta-assessment results, solid/shaky/blank maps)
  • Episode state (which episodes are generated, completed, in progress)
  • Cross-stage context (the pre-assessment results need to be available at script generation time without being fully in the prompt)

The knowledge store is what lets each stage be a single prompt: each stage pulls only the context it needs from the store rather than carrying the entire history in the window.
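The store's interface can be sketched in memory; a real build would back it with Redis or a vector DB (the candidates named above), with embedding-based lookup for source chunks. Namespaces and keys here are illustrative:

```python
class KnowledgeStore:
    """In-memory sketch of the shared store (namespaced key-value)."""

    def __init__(self):
        self._data = {}

    def put(self, namespace, key, value):
        self._data.setdefault(namespace, {})[key] = value

    def stage_context(self, namespace, keys):
        # A stage pulls only the slices it needs, not the whole history.
        ns = self._data.get(namespace, {})
        return {k: ns[k] for k in keys if k in ns}

store = KnowledgeStore()
store.put("learner", "shaky_concepts", ["Decorator vs Adapter"])
store.put("learner", "pre_assessment", {"score": 0.6})
store.put("episodes", "ep1", {"status": "complete"})

# Script generation needs the learner profile, not the episode ledger.
ctx = store.stage_context("learner", ["shaky_concepts", "pre_assessment"])
```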

6. Dual narrator format

Problem: What’s the voice and format of the episodes?

Resolution: Two voices:

  • Host: Advocates for the learner. Asks the “stupid” questions. Represents the learner’s perspective. “Wait, so you’re saying Adapter and Decorator are basically the same thing structurally?”
  • Expert: Conveys the information. Answers the questions. Provides the corrective framing. “Structurally yes, but the intent is completely different — let me explain why that matters.”

This format has several advantages:

  • The Host gives the learner permission to not know things
  • The Expert can correct misconceptions without it feeling like a lecture
  • The back-and-forth is more engaging than a monologue
  • The Host’s questions can be seeded from Listener Questions
  • Call-in segments integrate naturally: “We got a question from a listener…” becomes a third voice in the conversation

Script implications: Every script is a dialogue, not a monologue. The episode format template needs to define both voices and their roles. TTS needs two distinct voice profiles.
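The dialogue constraint can be made concrete as a script data structure with a guard that enforces it. Field names are assumptions, not a fixed schema:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str   # "host" or "expert" -- maps to the two TTS voice profiles
    text: str

def is_dialogue(script):
    # Guard for the "every script is a dialogue" rule: both voices must appear.
    return {"host", "expert"} <= {t.speaker for t in script}

script = [
    Turn("host", "Wait, so Adapter and Decorator are basically the same thing structurally?"),
    Turn("expert", "Structurally yes, but the intent is completely different."),
]
```

A call-in segment would simply add a third speaker value without changing the structure.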

7. Generation time is not instant

Problem: Multiple LLM calls per episode (research, script, accuracy review, self-check, rubric extraction) plus TTS rendering means episode generation takes real time.

Resolution: Set expectations in the UI. Progress indicators per stage. Possibly batch generation (generate overnight, episodes ready in the morning). The user experience should be: request episode → go do something else → get notified when it’s ready.
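The per-stage progress reporting could be wired as a pipeline runner with injected callbacks. All three callables are hypothetical; in the app this would run off the UI thread (or overnight as a batch), with `notify` firing when the episode is ready:

```python
STAGES = ["research", "script", "accuracy_review", "self_check",
          "rubric_extraction", "tts_rendering"]

def generate_episode(episode_id, run_stage, on_progress, notify):
    # Run each stage in order, reporting fractional progress after each.
    for i, stage in enumerate(STAGES, start=1):
        run_stage(episode_id, stage)
        on_progress(episode_id, stage, i / len(STAGES))
    notify(episode_id)  # "your episode is ready"

events = []
generate_episode(
    "ep1",
    run_stage=lambda ep, s: None,
    on_progress=lambda ep, s, frac: events.append((s, frac)),
    notify=lambda ep: events.append(("ready", 1.0)),
)
```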

8. Scripts are TTS-marked-up

Problem: Raw text doesn’t produce good audio. “GoF” gets pronounced wrong, emphasis gets lost, pauses are missing.

Resolution: The script generation prompt produces TTS-ready output with markup for:

  • Pronunciation guides (GoF → “Gang of Four”, SOLID → “S-O-L-I-D”)
  • Emphasis markers
  • Pause indicators (between sections, before key points)
  • Speaker tags (Host vs Expert)
  • Emotional tone hints (curious, corrective, enthusiastic)

The script format is designed for TTS consumption, not human reading. This is baked into the Stage 3 prompt and the episode format template, not a separate conversion step.
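One slice of the markup step, pronunciation expansion, can be sketched as a preprocessing pass. The table below is illustrative; in the design the pronunciation guides live in the episode format template, and emphasis, pauses, and tone would be carried by a TTS markup such as SSML rather than plain substitution:

```python
# Hypothetical pronunciation table (the real one lives in the template).
PRONUNCIATIONS = {"GoF": "Gang of Four", "SOLID": "S-O-L-I-D"}

def expand_pronunciations(line):
    # Substitute written forms with their spoken forms before TTS rendering.
    for written, spoken in PRONUNCIATIONS.items():
        line = line.replace(written, spoken)
    return line

spoken = expand_pronunciations("The GoF book defines 23 patterns.")
```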

9. App persists state

Problem: Tutor sessions, episode progress, and learner data need to survive across sessions.

Resolution: The desktop app persists all state. The learner can:

  • Close mid-tutor-session and resume later
  • See their progress across the course
  • Review past tutor session results
  • Access completed audio episodes for re-listening

State lives in the shared knowledge store (see #5). The app is the interface; the knowledge store is the memory.

Updated pipeline flow

With these decisions, the refined pipeline is:

Topic string
  → Shallow scan (map territory)
  → Depth selection (topic-aware descriptions)
  → Breadth-first research (scoped to depth)
  → Curriculum design (syllabus generation)
  → Pre-assessment (meta + diagnostic, chat-based)
  → [Per episode loop:]
      → Depth-first research (from knowledge store + web)
      → Script generation (dual narrator, TTS-marked-up)
      → Accuracy review
      → Self-check Feynman
      → Rubric extraction
      → TTS rendering (two voices)
      → Learner listens
      → Feynman tutor session (chat-based, persistent)
      → Gate decision (advance / deep dive / complete)
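The gate step that closes the loop could look like the following. The score input and threshold are assumptions; the design only names the three outcomes:

```python
def gate_decision(tutor_score, passing=0.8, is_last_episode=False):
    # Below the bar: deep dive (re-teach) before moving on.
    if tutor_score < passing:
        return "deep dive"
    # Passed: advance to the next episode, or complete the course.
    return "complete" if is_last_episode else "advance"
```

In practice the inputs would come from the rubric extracted in the same loop iteration and the tutor-session results in the knowledge store.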