The Other Jagged Frontier

The problem

Only 10–15% of US adults have the digital fluency and ability to reliably handle the cross-source, evaluate-and-adapt work that directing an AI requires.¹²

Thesis

Whether AI's dividend is broadly shared or concentrates at the top depends on whether that gap closes.

Benefiting from AI is a stack of three capacities, in order:

Execute a multi-step task that spans applications. Hold the goal, move between tools, recover from errors. Below this line, no AI tool helps.
Structure a problem clearly enough to direct an AI. Decompose the task, name what you want, supply the right context.
Evaluate AI output against the real task. Notice when the answer is wrong, refine, integrate.

Most adults today fail at (1), never reach (2), and cannot leverage (3). The skills that produce digital fluency in one context have to generalize to unfamiliar ones, or they don't count; that makes this a problem of transfer, the one our pedagogy targets. See pedagogy.md for the full argument. Without a deliberate intervention, the adults below the line will miss the gains, and the gap between them and everyone else will widen.³

We propose to build the platform that skills up for digital problem-solving (rather than memorization) at population scale, using AI as the cost-collapse mechanism that makes structured 1:1 coaching deployable at $4 per learner.

The floor

The OECD has measured this with two different instruments, and the floor is the same either way:

PIAAC's Problem Solving in Technology-Rich Environments (PSTRE), through 2017: ~31% of US adults scored at Level 1 or below, and an additional ~19% could not complete the digital assessment at all (no computer experience, failed ICT screener, or opted out). Roughly half of US adults — 130+ million people — were at or below the floor of cross-application digital problem-solving.¹
PIAAC's Adaptive Problem Solving (APS), 2023: ~32% of US adults — about 68 million — score at Level 1 or below.⁴ APS is broader than PSTRE (it covers digital, physical, and social information environments) and the OECD is explicit that the two measures are not directly comparable.⁴ But Level 1 describes the same cognitive ceiling: solving "problems that do not change and thereby do not require adaptivity."

Above the floor, the picture is no better. Only 10–15% of OECD adults reach Level 3 or 4, the bands that describe holding multiple criteria, switching between information sources, evaluating conflicting evidence, and adjusting when conditions change.² Directing an AI sits squarely in that band: the prompt-evaluate-refine loop is adaptivity by another name. AI's productivity gains are real and growing, but they are accruing only to the small fraction of adults already operating there.

The bridge nobody has built

The adult digital-skills field already knows where it fails. The Urban Institute's 2019 synthesis names it directly: training programs can teach an isolated digital task in context, but moving learners past basic familiarity into genuine fluency (the ability to use unfamiliar tools or recover from errors) is the part nobody has solved. Their words:⁵

"Multiple respondents suggested it is not clear how to train people to move from this initial level to more fluency."

Today, public libraries deliver intro-to-computer and intro-to-email instruction at scale, free, with in-person human support. But the bridge from "I can use Gmail" to "I can complete a Medicare appeal that requires reading the denial letter, locating supporting documents, drafting a response with AI assistance, and tracking the case across email and a benefits portal" does not exist as a product.

With fronteir AI models, this bridge is buildable now.

What "fluency" means here

Adult digital competence is measured by the OECD's PIAAC assessment on a five-band scale. The scale runs from "no digital skills" through Level 4. Most modern jobs and government services require Level 2 minimum; AI-augmented work pushes the bar toward Level 3+. The middle bands are where the gap lives:

Level	What an adult at this level can do	Where it's served today
Level 2	Multi-step tasks within a single tool, with inferential reasoning. Example: sort a spreadsheet to count entries matching criteria from another app.	Some library systems; partial.
Level 3	Higher-order tasks across multiple sources, evaluating relevance and reliability. Example: schedule a meeting using a new web app under multiple constraints — booked rooms, participant schedules.	Almost nowhere.
Level 4	Complex problem-solving across unfamiliar tools, integrating evidence to support a decision. Example: research a major purchase across vendors, evaluate source credibility, reconcile contradictions, produce a justified recommendation.	Nowhere.

Why us

I run Happy Robots, an AI consulting firm training Fortune 500 teams to adopt and direct AI systems. Our 15-week enablement programs cover LLM fundamentals through task-level evaluation. One client, Une Femme Wines, went from zero to 100% daily AI adoption in six weeks. I have spent the last two years watching the failure mode this project targets: employees who can operate individual tools but cannot compose cross-application workflows, cannot structure a problem clearly enough to direct an AI, and cannot evaluate whether the AI's output is correct. The curriculum in this proposal is not theoretical. It is what I already teach, restructured around transfer-of-learning research and delivered through software instead of consulting.

I can build this myself. My background is Gettysburg College (CS, economics, statistics), a decade in brand and product at L'Oreal, AB InBev, and Drinkworks, then independent consulting and product development since 2021. In the last year I have shipped 16 projects spanning full-stack development, RAG systems, reinforcement learning, OCR pipelines, agentic frameworks, and browser-based tools, all built with Claude Code in tight iteration cycles. The simulated desktop, the telemetry layer, and the co-pilot integration described in technical-approach.md are within my solo build capacity for v1.

Why now

Three conditions converged in the last 18 months.

AI shifted where the bottleneck sits. When complex tasks become possible for anyone who can direct an AI, directing becomes the gating skill. The OECD's 2025 Bridging the AI Skills Gap report quantifies this: roughly 1 in 3 job vacancies have high AI exposure, but only ~1% require specialized AI skills.⁶ The other 99% require general digital fluency.

Untargeted AI access is widening the divide, not closing it. Microsoft's 2025 diffusion data shows cross-country AI adoption gaps grew from 2–16% (2021) to 4–28% (2024).³ Without structured intervention, AI-powered tools accrue only to the already-skilled.

Frontier AI tutoring works at scale, with measured effect, when it's structured correctly. Bastani et al. (2026), a preregistered RCT of 770 students across 10 Taipei schools over 5 months, produced +0.15 SD on an unassisted final exam, equivalent to 6–9 months of additional schooling. Effects concentrated in lower-tier schools and prior novices. ⁷ See pedagogy.md for what "structured correctly" means and how our design implements it.

Solution

A simulated digital workspace (browser, email, document editor, forms, file system) running in the user's browser, with measurement at the keystroke and event level, with an embedded AI co-pilot observing the user's work,intervening as necessary.

Users complete real tasks across real-feeling apps. The co-pilot watches without interrupting, names the patterns the user just used, and prompts metacognitive reflection at task end. Curriculum content is structured around five evidence-supported design moves for transfer: cross-domain task families, explicit pattern naming, metacognitive debrief, contrasting cases, and far-transfer assessment. See pedagogy.md and curriculum.md.

What that looks like

A learner four months into the platform. She arrived a smartphone-native who had rarely used a desktop email client. Now she is mid-task: choosing a Medicare Advantage plan for her father.

Three vendor sites are open across two browser tabs. A benefit-comparison document sits in a third, half-drafted. She asks the AI co-pilot to summarize the prescription drug coverage differences across the three plans. The AI produces a comparison; one claim contradicts what she just read on Vendor B's page, so she asks the AI to verify against the source; the AI corrects itself; the corrected comparison goes into her document. She finishes by writing a one-paragraph recommendation for her father with three supporting bullets.

She cannot recite every keystroke. But asked what she did, she names the patterns: "I broke it into pieces. I had the AI draft, but I checked its claims against the actual sites. I made a recommendation I believe in."

That is what far-transfer success looks like. The Medicare comparison was never a training task; the patterns were.

Evidence base

Three claims, three pieces of evidence:

1. Adaptive AI tutoring works, and the design pattern matters. Bastani et al. (2026), cited above, is the strongest single piece of upstream evidence we have: engagement-mediated gains, equity-positive distribution, mechanism-isolated experimental design.⁷

2. The transfer mechanisms our design uses have decades of evidence in adjacent domains. Self-explanation prompts in Intelligent Tutoring Systems produce transfer (VanLehn 2011 meta-analysis of 50+ studies). Contrasting-cases pedagogy (Schwartz & Bransford 1998) produces measurable far-transfer gains. The mindful-abstraction mechanism (Salomon & Perkins 1989) is the foundation of high-road transfer. Full citations and product implications in pedagogy.md.

3. The specific evidence base for our population is thin, and that is the contribution opportunity. No RCTs exist on adult computational-thinking transfer. No empirical studies exist on how adults build the mental models digital fluency requires. A platform that deploys a structured intervention to thousands of low-fluency adults and reports honestly on what works will produce the evidence the field lacks.

Why a simulated environment, not an AI agent on real apps

A skeptic's first question: why not deploy a Claude or Operator coach on top of real Gmail, real Google Docs, real government forms? Three reasons.

Latency. Real-time coaching needs sub-1-second response. Frontier vision-based computer-use agents take 2–7 seconds per step (screenshot → inference → action). For productive-struggle pedagogy ("intervene when the user gets stuck, before they give up"), that is 4–8x too slow. An instrumented sandbox responds in <10ms.

Reliability. OSWorld benchmarks show ~80% top-line, but on real production websites (Online-Mind2Web) frontier agents drop to ~30%. Production sites defend against agents with CAPTCHAs, bot detection, and dynamic DOM. That gap is structural and won't close on its own.

Telemetry. Vision agents see pixels. An instrumented sandbox sees keystroke timing, dwell, hover-without-click, partial input, undo events, and paste origin: the cognitive signals that mediate learning gains in Bastani's study and that are invisible to vision.

Full architecture, build-vs-buy analysis, and cost model in technical-approach.md.

Outcomes (measurable)

The headline metric is far-transfer rate: % of users who complete a task they have never seen before, in a context they have never trained in, using a pattern from earlier in the curriculum.

Secondary metrics:

Pattern-naming recall (do users name the patterns the co-pilot named for them, weeks later, without prompting?)
Sustained engagement (Bastani's mediator, measured as time-on-task and persistence)
Procedural completion rate, time-to-completion, error frequency
Self-efficacy on novel digital tasks

A platform without far-transfer measurement is procedural training in disguise. pedagogy.md §6 commits, in advance, to three concrete failure signals that would tell us the design is not working.

Scale economics

Per active user-hour: ~$0.05–0.15 in LLM inference (Anthropic Sonnet + Haiku, with prompt caching). Detailed cost model in technical-approach.md §6.

Per 5-month learning relationship at 2 hours/week (40 hours): **$4 per learner.**
Roughly an order of magnitude lower than human-tutored alternatives; 5–10x lower than vision-agent-based coaching alternatives.
LLM pricing trajectory has been deflationary and is expected to remain so.

The product is price-taking on a fast-deflating curve rather than zero-marginal-cost; plan, budget, and pilot scope assume current prices and improve from there.

What we have already done

Reviewed and synthesized the foundational transfer-of-learning literature: Schwartz & Bransford 1998 (contrasting cases), Salomon & Perkins 1989 (low-road / high-road transfer), How People Learn II Ch 5, Wing 2006 (computational thinking), Bastani et al. 2026 (adaptive AI tutoring RCT). Notes and PDFs bundled in the Research library.
Validated the simulated-vs-vision-agent architectural choice against current benchmark and production-reliability data (OSWorld, Online-Mind2Web, Mind2Web 2). See technical-approach.md §1.
Mapped the open-source build stack: roughly 50–60% of the simulated environment is assembled from MIT-licensed projects (daedalOS, TipTap, ZenFS, React Hook Form). See technical-approach.md §2.
Drafted full pedagogy, curriculum, technical-approach, spec, and field-research documents. Each is a separate, citation-grounded artifact.
Defined a research program for field interviews with 6–8 adult digital-skills providers (libraries, ABE programs, senior programs); see fieldwork.md. Phase 1 begins as soon as funding is secured.

What we propose to do

Build the v1 MVP per the technical-approach doc:

v0 proof of concept (4–6 weeks): one app pair, one task, full telemetry, AI co-pilot loop validated.
v1 MVP (4–6 months): five apps (browser, email, docs, forms, files), curriculum scoped to Levels 2–3 or 3–4 (decision contingent on field-research findings; see product-spec.md MVP scope), far-transfer assessment instrument, deployable to a partner-cohort pilot.
Field research in parallel: visit 6–8 digital-skills providers, reshape co-pilot intervention rules and curriculum content against observed practice.
Pilot deployment: 100–500 learners in collaboration with a library system, ABE provider, or workforce development partner.

Detailed scope, build-vs-buy analysis, and cost model in technical-approach.md. Curriculum content in curriculum.md. Assessment design in product-spec.md.

Ask

TODO: Specific dollar amount, timeline, what the funding buys.

Distribution channel

TODO: Which path to first cohort?

Options to consider:

Library partnership. Public libraries have decades of experience with this exact population. Distribution is solved; pilot recruitment is plausible.

Workforce development partnership. Goodwill, local workforce boards, AEFLA-funded ABE providers. Outcome incentives align (employment outcomes are funder-tracked).

Direct-to-consumer. Highest leverage if it works, but the target population is precisely the one least likely to find a D2C learning product on their own.

Government / public service. Slow but potentially large; possibly via state workforce agencies.

The field-research program (fieldwork.md) is designed to produce a defensible answer here. Defer the commitment until Phase 2 of that program is done.

First-cohort plan

TODO: How we get from v1 MVP to 500–1,000 real learners.

Long-term vision

The standard instrument for measuring digital + AI fluency in adult populations.
The largest empirical dataset on real-world human + AI performance in cross-application work.
An open-source platform layer that other adult-education organizations can adopt and extend.
The foundation layer for workforce adaptation to AI systems: the specific software deployed in libraries, ABE programs, and workforce development centers.

NCES, PIAAC PSTRE Proficiency Level Results (Cycle 1, US data through 2017). https://nces.ed.gov/surveys/piaac/pstreproficiencylevel.asp · 31% of US adults scored at Level 1 or below on PSTRE; an additional ~19% could not complete the digital assessment at all (no computer experience, failed ICT screener, or opted out of computer-based assessment). PSTRE specifically measured the ability to use digital tools — email, web, spreadsheets, simulated apps — to solve information problems. It was retired after Cycle 1. ↩ ↩²
NCES, PIAAC 2023 National Results, Dec 2024. https://www.nces.ed.gov/surveys/piaac/2023/national_results.asp ↩ ↩²
Microsoft Research, AI Diffusion Report 2025 H2. https://www.microsoft.com/en-us/research/wp-content/uploads/2026/01/Microsoft-AI-Diffusion-Report-2025-H2.pdf ↩ ↩²
NCES, PIAAC 2023 National Results — Adaptive Problem Solving, Dec 2024. 32% of US adults at Level 1 or below; OECD average 29%. https://www.nces.ed.gov/surveys/piaac/2023/national_results.asp · APS measures the capacity to achieve goals in dynamic situations where information changes mid-task, across digital, physical, and social information environments. It replaces the prior cycle's Problem Solving in Technology-Rich Environments (PSTRE) measure. The OECD's stated reason for the switch: PSTRE "conflated problem solving and information and communication technologies (ICT) skills, as only test-takers with some (basic) ICT skills could participate" and excluded between 8% and 57% of the target population per country who could not pass the ICT screener (Survey of Adult Skills 2023 — Reader's Companion, OECD 2024, p. 37 — see our notes and the PDF). The two measures are not comparable; NCES is explicit that "the digital problem-solving and adaptive problem-solving domains cannot be compared due to differences in their assessment frameworks." The APS Level 1 descriptor includes the phrase "solve problems that do not change and thereby do not require adaptivity" — a plain-English match for the cognitive operation an AI workflow requires. ↩ ↩²
Hecker & Loprest (Urban Institute), Foundational Digital Skills for Career Progress, 2019. PDF · our notes · original source ↩
OECD, Bridging the AI Skills Gap, 2025. https://www.oecd.org/en/publications/bridging-the-ai-skills-gap_66d0702e-en.html ↩
Chung, Zhang, Kung, Bastani & Bastani (2026), Effective Personalized AI Tutors via LLM-Guided Reinforcement Learning. SSRN 6423358. PDF · original source ↩ ↩²