Every empirical claim in the pitch and pedagogy docs traces to one of the artifacts below. PDFs link directly; Notes links open our internal summary of the source; Source ↗ links go to the original publisher.
An honest note on coverage: the foundational and grey-literature sections are well stocked, but the adult-specific empirical literature is thin (see the Adult CT and digital-skills transfer synthesis for a direct discussion of why, and what it means for the project).
Foundational sources
The conceptual anchors for the pedagogy doc — transfer of learning, contrasting cases, computational thinking, the upstream RCT on adaptive AI tutoring.
Schwartz & Bransford (1998), A Time for Telling
Three classroom studies showing that analyzing contrasting cases before a lecture produces significantly better far-transfer prediction than other instructional sequences. The empirical anchor for our 'pattern naming after attempt' co-pilot rule.
Salomon & Perkins (1989), Rocky Roads to Transfer
Defines the low-road / high-road transfer distinction. Our cross-domain task families work via low road; explicit pattern naming + metacognitive debrief work via high road. Cites Pea & Kurland's LOGO transfer-failure finding directly.
Wing (2006), Computational Thinking
The original CT manifesto. Source of 'fundamental, not rote skill' — the framing our pitch borrows for the schemas-vs-procedures position. Silent on the transfer problem; the subsequent 20 years of CT research is largely a sustained engagement with that gap.
How People Learn II (2018), Ch 5: Knowledge & Reasoning
National Academies synthesis on the five evidence-supported strategies (retrieval practice, spacing, interleaving, self-explanation, transformation) that produce durable, flexible knowledge. Our five design moves map directly onto these.
Bastani et al. (2026), Effective Personalized AI Tutors via LLM-Guided RL
Pre-registered RCT: 770 students across 10 Taipei schools over 5 months. +0.15 SD on an unassisted final exam (≈6–9 months of additional schooling). Effects concentrated in lower-tier schools and among students with little prior experience; mediated by engagement. The strongest single piece of upstream evidence in our pitch.
Grey literature
Practitioner and policy reports — not peer-reviewed but field-grounded.
Hecker & Loprest (Urban Institute, 2019), Foundational Digital Skills for Career Progress
Provider interviews + literature synthesis. The smartphone-to-office transfer-failure quote ('fluid use of a smartphone does not always translate to broader digital skills') is the sharpest single design constraint we have. Closes by naming the exact gap our project addresses.
Adult-transfer literature
Studies on AI-integrated learning interventions in adult / undergraduate populations. Mostly tangential to our specific population (low-fluency adults), but useful for triangulating effect sizes and design patterns.
Saritepeci & Durak (2024), AI integration in design-based learning
Quasi-experimental study of undergraduates (n = 87 + 99); ChatGPT + Midjourney in digital storytelling. Significant gains on creative/reflective self-efficacy; null on design-thinking mindset. Useful as evidence that AI-integrated designs produce mixed results across outcome dimensions — an argument for measurement precision.
Synthesis summaries
Our own research syntheses, drawing on multiple sources. Each compiles findings on a specific question that shaped the pitch, pedagogy, or technical-approach docs.
Adult CT and digital-skills transfer
Maps what's empirically known about transfer in adult/workforce populations. Headline: the literature is genuinely thin — zero RCTs on adult CT transfer. K-12 meta-analyses (Ye 2022, 55 studies; CT-STEM 2024, 37 studies / 7,832 students) explicitly note that adults are under-studied. Frames the contribution opportunity for our project.
Metacognitive prompting in LLM tutors — short synthesis
Headline finding: the literature is sparse and partly null. Zengilowski 2025, a preregistered RCT (n=1,005), found a null effect of metacognitive reflection prompts. The strongest upstream evidence is pre-LLM ITS work (Aleven & Koedinger 2002; VanLehn 2011 meta-analysis: d ≈ 0.33–0.55 on transfer for self-explanation).
Metacognitive prompting in LLM tutors — full literature review
Longer agent-produced literature review of the same question. Same headline conclusions, more thorough citation work. Includes the LearnLM UK RCT (2025; +5.5pp on transfer via Socratic dialogue) and Kestin et al.'s 2025 Harvard physics study (d = 0.73–1.3 immediate; transfer not measured).
Instrumented environment vs. vision-based AI overlay
The synthesis behind the simulated-environment architectural choice in technical-approach.md §1: vision-agent latency (2–7 s/step) vs. instrumented sandbox (<10 ms); telemetry signals invisible to vision; the benchmark-vs-production reliability gap (79.6% on OSWorld vs. 30% on Online-Mind2Web).
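To make the "signals invisible to vision" point concrete, here is a minimal sketch of the kind of event log an instrumented sandbox can emit and query in-process. All names (`TelemetryEvent`, `pastedInto`, the field IDs) are illustrative assumptions, not the project's actual schema; the point is that process questions like "did the learner paste rather than type?" are a cheap structured query here, versus unrecoverable from pixels for a vision overlay.

```typescript
// Hypothetical telemetry sketch — illustrative names, not the real schema.
// A discriminated union of events the instrumented environment emits directly.
type TelemetryEvent =
  | { kind: "field_focus"; field: string; at: number } // at = ms since task start
  | { kind: "paste"; field: string; chars: number; at: number }
  | { kind: "undo"; at: number }
  | { kind: "task_done"; at: number };

// A process question a vision model cannot answer from screenshots:
// did the learner paste into this field? Answered by a linear scan of the log.
function pastedInto(events: TelemetryEvent[], field: string): boolean {
  return events.some((e) => e.kind === "paste" && e.field === field);
}

// Example log for a single email-composition task.
const log: TelemetryEvent[] = [
  { kind: "field_focus", field: "email_to", at: 0 },
  { kind: "paste", field: "email_to", chars: 24, at: 350 },
  { kind: "task_done", at: 4200 },
];

console.log(pastedInto(log, "email_to")); // true
console.log(pastedInto(log, "email_body")); // false
```

The query runs in microseconds against an in-memory log, which is where the 2–7 s/step vs. <10 ms contrast in the synthesis comes from: the vision agent pays a model round-trip per observation, while the instrumented environment already holds the ground-truth events.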
UI-pattern transfer in simulated environments
Validates the claim that UI standardization plus abstract workflow understanding enable transfer from sim to real. Functional fidelity matters more than physical fidelity (aviation/medical simulator literature). The mental-models caveat: surface UI familiarity isn't enough when the underlying conceptual structure differs.
Browser-based simulated desktop ecosystem
Surveys the library landscape for building the simulated environment. Roughly 50–60% of the stack can come from MIT-licensed open source: daedalOS (window manager), TipTap (editor), ZenFS (filesystem), React Hook Form (forms). Custom work: email mock, tutor integration, pedagogical telemetry. Estimated ~6–9 months for a small team.