Every skill you've ever learned, you learned by watching someone do it first.
Reading about bikes doesn't teach you to ride. Documentation doesn't make you a Photoshop expert. The knowledge that matters — procedural knowledge — lives in demonstration, not description.
◆
LLMs solved reasoning. VLMs solved perception. What's left is the embodiment gap.
The distance between knowing and doing. Robots had this problem — they could see, they could plan, but they couldn't move. Static images didn't capture the micro-corrections, the timing, the fluid motion of a human hand. They needed to watch humans navigate space in real time.
VLA (vision-language-action) models closed the gap. Video in, actions out. Robots learned embodiment by watching embodiment.
Computer use has the same gap. Cursor precision. Timing. Error recovery. The fluidity of 1-2 actions per second instead of 3-10 seconds per action. Models can reason about what to click — they can't click like humans click. They need to watch us do it.
◆
We believe:
the automation of all digital work is inevitable
the bottleneck is demonstrations, not compute
procedural knowledge can only be shown, never written
screens are just another environment for embodied AI
the embodiment gap closes with human data, not more parameters
whoever captures the demonstrations owns the future
experts should be paid to teach machines, not quietly replaced by them
this is the last generation that will work this way — we should record what they know
◆
We didn't start here. We built computer-use agents. Shipped before it was a category. Watched them fail in ways no benchmark captures — not from lack of intelligence, but from lack of embodiment.
The models could reason about what to do. They couldn't do it. The hands were missing.
So we're building the hands. Not as code — as data. Human demonstrations, semantically labeled, at scale. The training signal for embodied digital intelligence.
The frontier labs will build the brains. We're building the memory of how humans worked.
The last human demonstrations.