Objective
Enable low- to mid-skill digital users to:
- Complete real-world digital tasks
- Operate across applications
- Build transferable schemas (not procedural memorization) for digital problem-solving
- Use AI as a directable tool, not a passive answer source
Target: move users from PIAAC Level 1 (single-step tasks in a generic interface) to Level 2+ (multi-step problem-solving with inferential reasoning), and demonstrate far transfer: success on novel-context tasks that share structure but not surface form with training tasks.
The product surfaces, AI co-pilot behavior, and assessment metrics in this spec are direct consequences of the pedagogical commitments in pedagogy.md. The architectural choices are detailed in technical-approach.md.
Target user
Can:
- Use a smartphone confidently (browser, messaging, simple apps)
- Send a basic email
- Navigate a single web interface
Cannot reliably:
- Complete tasks that span multiple applications
- Move information between apps (e.g., copy from a web page into a form)
- Manage persistent state (files, downloads, draft documents)
- Recover from errors without abandoning the task
- Direct an AI to help, then evaluate whether its output is correct
Mental-model gaps (the deeper constraint)
Per the Urban Institute's 2019 finding[^1] and the conceptual-change literature[^2], the gating issue for adult digital fluency is not UI familiarity but mental models. Smartphone-fluent users lack:
- A model of the file system (files exist persistently in named locations, not encapsulated inside apps)
- A model of multi-window state (multiple things can be open and operated on in parallel)
- A model of hierarchical organization (folders within folders, threads within mailboxes)
- A model of versioning and history (current state was reached by a series of edits that can be undone, branched, or compared)
The curriculum must construct these. Procedural training layered over these gaps produces sandbox-procedural users who fail in the wild. See pedagogy.md §3.
Core product concept
A task-based learning system inside an instrumented simulated desktop, with an embedded AI co-pilot that observes the user's work and intervenes pedagogically.
The simulated environment is a deliberate architectural choice driven by latency, telemetry richness, and reliability constraints — see technical-approach.md §1.
System overview
1. Simulated desktop (in-browser)
A simplified OS-like environment with:
- Browser (sandboxed iframe with overlay; we control all "websites" the user can navigate to)
- Email client (custom mock — no real send/receive; messages are scripted by the task engine)
- Document editor (TipTap-based)
- Form interface (React Hook Form-based)
- File browser (over a virtual filesystem, ZenFS)
Constraints:
- Minimal UI — controlled complexity, only the affordances the curriculum needs
- Realistic enough that schemas transfer (functional fidelity, per the simulator-fidelity literature)
- Fully instrumented at the event level (see technical-approach.md §3)
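The "only the affordances the curriculum needs" constraint implies per-task gating of what is interactable. A minimal sketch of what that gating could look like; app and action identifiers here are illustrative placeholders, not the actual registry in technical-approach.md.

```typescript
// Illustrative sketch: per-task affordance gating for the simulated desktop.
// App and action identifiers are placeholders, not the real registry.
type AppId = 'browser' | 'email' | 'editor' | 'form' | 'files';

interface AffordanceGrant {
  app: AppId;
  actions: string[]; // e.g. 'open', 'attach', 'saveDraft', whatever the task needs
}

interface TaskSurfaceConfig {
  taskId: string;
  enabledApps: AppId[];           // apps visible on the simulated desktop for this task
  affordances: AffordanceGrant[]; // controls that are interactable within those apps
}

// The shell consults this before rendering a control as interactable.
function isInteractable(config: TaskSurfaceConfig, app: AppId, action: string): boolean {
  return config.affordances.some((g) => g.app === app && g.actions.includes(action));
}
```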
2. Task engine
Each task:
- Multi-step
- Specifies a primary pattern and 0–2 secondary patterns being practiced
- Specifies an underlying schema being constructed (which mental model the task targets)
- Cross-application when appropriate to the level
- Drawn from real-world scenarios
Examples (these are placeholder content — final task content is shaped by fieldwork.md Phase 1):
- Apply for a service: download a benefit form, fill it out, attach proof, submit.
- Research + summarize + send: read three short articles, extract three claims, draft an email summarizing them.
- Verify + complete: receive an email asking for confirmation, find the original document referenced, verify a fact, reply.
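A hedged sketch of the task record implied by the list above. Field names and the level/surface enumerations are assumptions for illustration; the pattern identifiers stand in for the controlled vocabulary defined in curriculum.md.

```typescript
// Illustrative task record; not the task engine's actual schema.
type PatternId = string; // drawn from the controlled vocabulary (~12–20 patterns) in curriculum.md
type MentalModel = 'file-system' | 'multi-window-state' | 'versioning-history' | 'ai-as-directable-system';
type Surface = 'browser' | 'email' | 'editor' | 'form' | 'files';

interface TaskSpec {
  id: string;
  level: 1 | 2 | 3 | 4 | 5;
  primaryPattern: PatternId;
  secondaryPatterns: PatternId[]; // 0–2 entries
  targetSchema: MentalModel;      // which mental model the task constructs
  surfaces: Surface[];            // cross-application when appropriate to the level
  scenario: string;               // the real-world framing shown to the user
  heldOutForAssessment: boolean;  // reserved for far-transfer assessment (TDP-5), never used in training
}
```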
3. AI co-pilot
Persistent side panel. Capabilities:
- Context-aware hints (graduated escalation, not direct answers)
- Pattern naming after task subgoals
- Metacognitive debrief at task end
- Demonstration on explicit user request, after attempt
- Error explanation when the user gets stuck after multiple attempts
Rules (see pedagogy.md §4 and technical-approach.md §4):
- Default behavior is silence. The co-pilot speaks only when a defined trigger fires.
- Attempt before assist (productive struggle).
- Pattern naming uses the curriculum's controlled vocabulary; the same pattern is named the same way every time it appears.
- Forbidden moves: condescending language ("simply", "just"), non-specific praise, false-flattering pattern attribution, more than one debrief question per task.
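One way the silence-by-default, trigger-driven behavior could be encoded. Trigger names, the attempt threshold, and the lexical guard below are illustrative assumptions, not the orchestrator described in technical-approach.md §4.

```typescript
// Illustrative orchestrator gate: the co-pilot emits nothing unless a defined trigger fires.
type Trigger =
  | { kind: 'stuck'; failedAttempts: number }     // error-explanation path
  | { kind: 'subgoal-complete'; pattern: string } // pattern naming (TDP-2)
  | { kind: 'task-end' }                          // metacognitive debrief (TDP-3)
  | { kind: 'demo-requested'; userHasAttempted: boolean };

function shouldIntervene(trigger: Trigger | null): boolean {
  if (!trigger) return false; // default behavior is silence
  switch (trigger.kind) {
    case 'stuck':
      return trigger.failedAttempts >= 2; // attempt before assist: productive struggle first
    case 'demo-requested':
      return trigger.userHasAttempted;    // demonstration only after an attempt
    case 'subgoal-complete':
    case 'task-end':
      return true;
  }
}

// Cheap lexical guard for two of the forbidden moves ("simply", "just"); the full list is
// broader and would be enforced upstream in prompting and review, not only by string matching.
const FORBIDDEN = /\b(simply|just)\b/i;
const violatesForbiddenMoves = (draft: string) => FORBIDDEN.test(draft);
```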
4. Assessment engine
Tracks two distinct categories of metric:
Near-transfer (procedural fluency):
- Completion rate on training-like tasks
- Time-to-completion improvement
- Error frequency
- AI usage patterns (productive vs. answer-seeking, scored by the LLM-as-judge chat-quality evaluator)
Far-transfer (schema application), the headline metric:
- % of users who successfully complete a task they've never seen before, in a context they've never trained in, using a pattern from earlier in the curriculum
- Pattern-naming recall (do users name the patterns the co-pilot named for them, weeks later, without prompting?)
- Self-efficacy on novel tasks
Outputs:
- Per-pattern mastery score (decays over time)
- Per-mental-model mastery score
- Progression readiness signal (next-task selection driver)
The near/far distinction is non-negotiable. A system that only measures near-transfer will produce sandbox-procedural users — see pedagogy.md §6 (falsification).
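The per-pattern mastery score above decays over time. A minimal sketch of what that update could look like; the half-life and learning rate are placeholder assumptions, not the Assessment Engine's actual model.

```typescript
// Illustrative decaying mastery score. Constants (half-life, learning rate) are placeholders.
interface MasteryState {
  score: number;     // 0..1
  updatedAt: number; // ms epoch of last observation
}

const HALF_LIFE_DAYS = 21; // assumed: without practice, mastery halves in roughly three weeks
const LEARNING_RATE = 0.3; // assumed: how far a single observation moves the score

function decayed(state: MasteryState, now: number): number {
  const days = (now - state.updatedAt) / 86_400_000;
  return state.score * Math.pow(0.5, days / HALF_LIFE_DAYS);
}

// Fold in one task outcome (1 = pattern applied successfully, 0 = not) after applying decay.
function update(state: MasteryState, outcome: 0 | 1, now: number): MasteryState {
  const current = decayed(state, now);
  return { score: current + LEARNING_RATE * (outcome - current), updatedAt: now };
}
```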
Transfer Design Principles
The five evidence-supported moves from pedagogy.md, translated into operational product constraints:
TDP-1: Cross-domain task families
Each skill is practiced in at least three different surface forms before being marked "taught." Not three email tasks — one email task, one form task, one document task that share the same underlying pattern. The variation across surfaces produces low-road transfer; the consistency of pattern builds the schema.
Implementation: Curriculum-engine constraint. The task selection algorithm cannot mark a pattern "mastered" on a single surface form, regardless of completion success.
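The constraint can be expressed as a simple gate in the curriculum engine; the data shape in this sketch is assumed.

```typescript
// Illustrative TDP-1 gate: a pattern cannot be marked "taught" until it has been
// completed successfully on at least three distinct surface forms.
interface CompletionRecord {
  pattern: string;
  surface: 'browser' | 'email' | 'editor' | 'form' | 'files';
  success: boolean;
}

function patternTaught(history: CompletionRecord[], pattern: string): boolean {
  const surfaces = new Set(
    history.filter((r) => r.pattern === pattern && r.success).map((r) => r.surface),
  );
  return surfaces.size >= 3;
}
```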
TDP-2: Explicit pattern naming
On task subgoal completion, the co-pilot names the pattern the user just used, in the curriculum's controlled vocabulary, with one example of where the pattern recurs.
Implementation: A defined trigger in the co-pilot orchestrator (task-complete trigger). The pattern vocabulary is a small, stable set (~12–20 patterns) defined as part of curriculum design, not invented per-response.
TDP-3: Metacognitive debrief
At task end, the co-pilot asks one question: "What pattern did you use? Where else might it apply?" The user articulates the schema in their own words, the co-pilot confirms or refines.
Implementation: A defined trigger in the co-pilot orchestrator (task-end trigger). Frequency limit: one debrief per task. Skipped if the user shows signs of metacognitive overload (per McCarthy 2018 — see pedagogy.md §4).
TDP-4: Contrasting cases
Periodically (~every 4–6 tasks), the system inserts a near-neighbor task that looks similar to the previous one but requires a different pattern. The co-pilot prompts: "Is this the same pattern as last time, or different? How can you tell?"
Implementation: Task-selection logic. The contrasting-case trigger fires when the user has just completed a task with primary pattern X and a near-neighbor task with primary pattern Y exists.
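A sketch of that selection condition; the field names and the cadence check are illustrative assumptions.

```typescript
// Illustrative contrasting-case selection (TDP-4). Names and the 4–6-task cadence are assumptions.
interface TaskSummary {
  id: string;
  primaryPattern: string;
  nearNeighborOf?: string; // id of the task this one superficially resembles
}

function pickContrastingCase(
  justCompleted: TaskSummary,
  pool: TaskSummary[],
  tasksSinceLastContrast: number,
): TaskSummary | null {
  if (tasksSinceLastContrast < 4) return null; // roughly every 4–6 tasks
  return (
    pool.find(
      (t) => t.nearNeighborOf === justCompleted.id && t.primaryPattern !== justCompleted.primaryPattern,
    ) ?? null
  );
}
```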
TDP-5: Far-transfer assessment
Assessment cycles include tasks the user has never seen, in surface forms they've never trained in, that require previously-taught patterns.
Implementation: The Assessment Engine reserves a held-out pool of "novel surface form" tasks per pattern. These are never used as training tasks for any user.
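A sketch of how candidates could be drawn from that held-out pool: novel surface form for this user, previously taught pattern, never used in training. Field names are assumptions.

```typescript
// Illustrative far-transfer selection (TDP-5).
interface AssessmentTask {
  id: string;
  primaryPattern: string;
  surface: string;
  heldOutForAssessment: boolean;
}

function farTransferCandidates(
  pool: AssessmentTask[],
  taughtPatterns: Set<string>,
  trainedSurfaces: Set<string>,
): AssessmentTask[] {
  return pool.filter(
    (t) =>
      t.heldOutForAssessment &&
      taughtPatterns.has(t.primaryPattern) &&
      !trainedSurfaces.has(t.surface),
  );
}
```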
Curriculum structure
Five levels. Each is defined by the kind of mental-model construction it requires, not by procedural difficulty alone. Constraint per TDP-1: each skill must appear in ≥3 distinct surface forms before being considered taught.
| Level | Goal | Mission alignment |
|---|---|---|
| L1 — Operational basics | Construct the desktop mental model: file persistence, multi-window state, basic hierarchy. | Bridge level. Probably not where our distinctive contribution lives — public libraries already deliver this for free, with in-person support, at scale. v1 likely treats this as a fast-track diagnostic or partners with library Level-1 onboarding. |
| L2 — Transactional tasks | Sequencing and verification. Multi-step processes within one or two apps; recovery from errors. | Bridge level. Some library systems handle this; many don't. Our value-add starts here. |
| L3 — Workflow execution | Operate across applications. Compose Level 1–2 patterns into longer workflows that span apps and sustain state. | Where our core mission begins. The library system does not scaffold this depth. The Urban Institute (2019) names this as the gap providers can't fill. |
| L4 — AI-augmented workflows | Direct AI as a tool inside a workflow. Evaluate AI output. Refine. | Novel. No established curriculum exists at this level for adult digital-fluency populations. |
| L5 — Adaptation layer | Encounter an unfamiliar tool, recognize patterns, figure it out. The closest the platform comes to certifying "I am now digitally fluent." | Most aligned with the pitch's thesis. Tools change; AI agents proliferate; the only durable skill is the ability to meet new ones. |
Full curriculum content — task families, patterns, mental-model tracks, the controlled pattern vocabulary the AI co-pilot uses — lives in curriculum.md. That document also flags the strategic question of where v1 should target on this scale (see Implications below).
Cross-cutting: mental-model construction tracks
Independent of the procedural-skill progression, the curriculum tracks four mental-model constructions:
- File system (hierarchical, persistent, named locations) — primarily Level 1
- Multi-window state (parallel things, focus, switching) — Levels 1–3
- Versioning and history (undo, drafts, edits-over-time) — Levels 2–4
- AI as directable system (delegation, evaluation, refinement) — Levels 4–5
The Assessment Engine tracks each as a separate mastery dimension.
UX principles
- One task at a time. No competing surfaces, no notifications.
- Minimal interface. Only the affordances the current task requires are interactable.
- Immediate feedback. Telemetry-driven; the interface reacts in under 100 ms even if the AI co-pilot's response takes longer (see the sketch after this list).
- AI always available, never intrusive. The co-pilot side panel is persistent but speaks only on defined triggers (per TDP).
- Errors are tasks, not failures. A wrong attempt becomes a contrasting-case opportunity, not a marked failure.
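A sketch of how local feedback stays inside the 100 ms budget while co-pilot evaluation runs off the interaction path. The function names (recordEvent, renderLocalFeedback, requestCopilot, showCopilotPanelMessage) are hypothetical, not the event pipeline in technical-approach.md §3.

```typescript
// Illustrative decoupling of local feedback from co-pilot latency.
interface UiEvent { type: string; target: string; at: number }

function onUserAction(event: UiEvent) {
  recordEvent(event);         // telemetry write: synchronous and cheap
  renderLocalFeedback(event); // deterministic UI reaction, well under 100 ms

  // Co-pilot evaluation happens asynchronously; the interface never waits on it.
  void requestCopilot(event).then((msg) => {
    if (msg) showCopilotPanelMessage(msg);
  });
}

// Hypothetical helpers, declared here only to keep the sketch self-contained.
declare function recordEvent(e: UiEvent): void;
declare function renderLocalFeedback(e: UiEvent): void;
declare function requestCopilot(e: UiEvent): Promise<string | null>;
declare function showCopilotPanelMessage(msg: string): void;
```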
MVP scope
Open question: where does v1 target on the curriculum?
The previous draft of this spec scoped v1 to "Curriculum Levels 1–2 (~10–15 tasks total)." Drafting curriculum.md surfaced a problem with that framing: Level 1–2 work is largely already served by the public library system (free, in-person, with thirty years of operational experience). A v1 that targets the same competencies competes with established infrastructure and undercuts the pitch's "transferable schemas in the age of AI" thesis, because the novel content lives at Levels 3+.
Two alternatives we are considering for v1 — to be decided jointly with Phase 2 of fieldwork.md:
- Levels 2–3, Level 1 as a fast-track diagnostic. Users with smartphone fluency (most of the target population) skip Level 1 in 10–20 minutes; users without it get routed to library/Northstar acquisition before re-entering. Targets the real gap libraries don't fill.
- Levels 3–4 only, library/ABE partnership for Levels 1–2. Sharper. Concedes the basics to established infrastructure and concentrates engineering on the novel-contribution levels. Distribution requires a partner — fieldwork.md is set up to identify one.
v1 features (regardless of curriculum scope):
- Simulated desktop with five core apps
- Full telemetry layer per technical-approach.md §3
- AI co-pilot with the trigger system, model split (Sonnet + Haiku), and forbidden-moves list
- LLM-as-judge knowledge-state estimator
- Near-transfer + far-transfer assessment instruments
- Per-user analytics dashboard for the team
- ~12–20 tasks total (curriculum scope determines which levels they cover)
Not in v1:
- Whichever curriculum levels aren't selected as the v1 target
- Adaptive sequencing via BKT/DKT (heuristic in v1)
- Cohort or peer-learning architecture (deferred until informed by field research)
- Offline mode
- Mobile
- Multilingual support
Detailed build phases in technical-approach.md §7.
Key metrics
Primary (headline):
- Far-transfer rate: % of users completing a novel-context task using a previously-taught pattern.
Secondary:
- Pattern-naming recall (delayed)
- Procedural completion rate
- Time-to-completion improvement
- Sustained engagement (per Bastani: time-on-task and persistence)
- Self-efficacy change
Diagnostic (for our team, not the user):
- Per-intervention type frequency (are we triggering pattern-naming and debriefs at the right rates?)
- Co-pilot chat-quality score (LLM-as-judge on whether interventions were productive)
- Mental-model mastery scores per dimension
Risks and mitigations
Pedagogical risks
Sandbox-procedural failure — users become competent in our environment but cannot transfer to real apps.
→ Mitigated by TDP-1 (≥3 surface forms per skill), TDP-5 (far-transfer assessment), and the falsification metrics in pedagogy.md §6.
AI dependency — users learn to ask the co-pilot rather than to think. → Mitigated by attempt-before-assist (productive struggle), graduated escalation in help, and tracking AI-usage patterns as a chat-quality metric, not just volume.
Metacognitive overload — debrief prompts demoralize low-confidence users.
→ Mitigated by McCarthy-2018-aware design (one debrief per task max, scaffolded modeling first, skip if affective signals indicate distress). See pedagogy.md §5.
Engagement risks
Novelty effect decay — Bastani's mediator wanes after the first sessions. → Mitigated by the engagement-architecture decision (deferred; see open questions). Field research will likely surface a cohort/community-layer requirement.
Distress vs. productive struggle confusion — the co-pilot intervenes too late or too early.
→ Mitigated by field research with experienced instructors (fieldwork.md Q2). The v1 trigger thresholds are educated guesses; calibration comes from observation.
Distribution and access risks
The PIAAC Level-1 population is the least likely to find a D2C learning product.
→ Distribution channel is unspecified. See pitch-and-overview.md "Distribution channel" TODO.
Third-level digital divide — users without home internet or a device cannot practice between sessions; OECD data shows skills decline without sustained practice. → Out of scope for v1 product; partially addressed by partner-distribution choice (libraries provide access).
Competitive risks
Khan / Google / Microsoft can ship a competitor in a quarter if this works. → Mitigated by (a) the population specificity (incumbents have not targeted PIAAC Level-1 adults), (b) the pedagogy specificity (transfer-targeted design is a non-trivial intellectual commitment), (c) the open-source platform layer strategy (long-term defensibility comes from being the standard, not from the moat).
Technical notes
Architecture, telemetry layer, AI integration, model choices, state estimation, cost model, and build phases: technical-approach.md.
Headline points:
- Browser-only (no app install). React-based.
- Backend handles task engine, tutor orchestration, knowledge state.
- Anthropic Claude (Sonnet for orchestration, Haiku for high-frequency).
- Per-active-user-hour cost: ~$0.05–0.15 with prompt caching (rough arithmetic sketch after this list).
- v1 buildable in 4–6 months with a small team; ~50–60% assembled from MIT-licensed open-source (daedalOS, TipTap, ZenFS, React Hook Form).
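A back-of-envelope sanity check on the per-active-user-hour figure. Every constant below (call volumes per hour and per-call costs) is an illustrative placeholder, not Anthropic pricing and not the cost model in technical-approach.md.

```typescript
// Back-of-envelope cost sketch. All constants are assumed placeholders.
const ORCHESTRATOR_CALLS_PER_HOUR = 12;   // assumed: debriefs, pattern naming, escalated hints (Sonnet)
const HIGH_FREQ_CALLS_PER_HOUR = 60;      // assumed: lightweight classification on telemetry (Haiku)
const ORCHESTRATOR_COST_PER_CALL = 0.008; // assumed USD per call, with prompt caching on shared context
const HIGH_FREQ_COST_PER_CALL = 0.0005;   // assumed USD per call

const costPerActiveUserHour =
  ORCHESTRATOR_CALLS_PER_HOUR * ORCHESTRATOR_COST_PER_CALL +
  HIGH_FREQ_CALLS_PER_HOUR * HIGH_FREQ_COST_PER_CALL;

console.log(costPerActiveUserHour.toFixed(3)); // ≈ 0.126, inside the ~$0.05–0.15 range above
```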
Distribution and pilot plan
TODO (Matt): Distribution channel commitment + first-cohort plan.
Field-research program (fieldwork.md) is designed to produce 2–3 named candidate partner organizations as a deliverable. Defer commitment until Phase 2 of that program completes. Likely shape (subject to revision):
- Partner with one library system or ABE provider for the first 100–500-user pilot.
- Co-design content for a cohort with the partner organization (their staff knows their learners; co-design has compounding pedagogical value).
- Partner organization runs in-person component (per Urban Institute: "humanizing" matters); product runs the AI-coached interactive component.
- Pilot success criterion: defined jointly with partner against far-transfer rate and engagement-retention thresholds.
Long-term expansion
- Adaptive difficulty engine (BKT/DKT) replacing v1 heuristic
- Levels 3–5 of the curriculum
- Larger task library
- Benchmarking system (the dataset becomes the largest empirical record of adult digital-fluency transfer)
- Enterprise / employer deployment
- Certification layer (third-party-recognized credential for completion)
- Open-source platform release (daedalOS-derived shell + telemetry layer + co-pilot integration; other adult-education orgs adopt and extend)
End state
A system that:
- Teaches transferable digital competence (not procedural memorization)
- Measures it with both near and far transfer instruments
- Improves continuously as the dataset grows
- Becomes the standard layer between low-fluency adults and AI-augmented digital systems
- Generates the empirical evidence the field does not have
Footnotes
[^1]: Hecker & Loprest, Foundational Digital Skills for Career Progress, Urban Institute, 2019. Notes: research/grey-literature/urban-institute-2019-foundational-digital-skills.md.
[^2]: Synthesis of conceptual-change literature (Chi 2008 framework, NN/G mental-model studies). See research/summaries/adult-ct-and-digital-skills-transfer.md and pedagogy.md §3.