[{"data":1,"prerenderedAt":792},["ShallowReactive",2],{"navigation":3,"\u002Fblog\u002Fsymbolic-descent":39,"\u002Fblog\u002Fsymbolic-descent-project":551,"\u002Fblog\u002Fsymbolic-descent-surround":789},[4,26],{"title":5,"path":6,"stem":7,"children":8,"page":25},"Projects","\u002Fprojects","projects",[9,13,17,21],{"title":10,"path":11,"stem":12},"AntiAgent","\u002Fprojects\u002Fantiagent","projects\u002Fantiagent",{"title":14,"path":15,"stem":16},"Cubist","\u002Fprojects\u002Fcubist","projects\u002Fcubist",{"title":18,"path":19,"stem":20},"Eliane","\u002Fprojects\u002Feliane","projects\u002Feliane",{"title":22,"path":23,"stem":24},"Genario","\u002Fprojects\u002Fgenario","projects\u002Fgenario",false,{"title":27,"path":28,"stem":29,"children":30,"page":25},"Blog","\u002Fblog","blog",[31,35],{"title":32,"path":33,"stem":34},"Symbolic descent: gradient descent, applied to rules instead of weights","\u002Fblog\u002Fsymbolic-descent","blog\u002Fsymbolic-descent",{"title":36,"path":37,"stem":38},"Laws from single experiences: an online symbolic world-model for ARC-AGI-3","\u002Fblog\u002Fsymbolic-world-model","blog\u002Fsymbolic-world-model",{"id":40,"title":32,"author":41,"body":44,"date":541,"description":542,"extension":543,"image":544,"meta":545,"minRead":546,"navigation":547,"path":33,"project":548,"seo":549,"stem":34,"__hash__":550},"blog\u002Fblog\u002Fsymbolic-descent.md",{"name":42,"description":43},"Louis Manhès","Founder & ML engineer",{"type":45,"value":46,"toc":530},"minimark",[47,66,69,79,84,103,203,216,220,227,239,283,290,294,309,316,320,327,337,343,349,355,359,362,365,369,423,430,434],[48,49,50,53,54,58,59,65],"p",{},[51,52,14],"a",{"href":15}," is an attempt to build a ",[55,56,57],"strong",{},"symbolic autotelic agent",": one that sets its own goals and grows an open-ended repertoire of skills, in the tradition of intrinsically motivated agents 
(",[51,60,64],{"href":61,"rel":62},"https:\u002F\u002Farxiv.org\u002Fabs\u002F2012.09830",[63],"nofollow","Colas et al., 2022","), but with every moving part readable. An agent like that needs a very particular kind of learner. It must learn from single experiences, because a one-life world grants no replays. It must not forget, because every skill it builds tomorrow stands on what it learned today. And it must know what it does not know, because its own ignorance is the map it explores by.",[48,67,68],{},"Gradient descent, which nudges a vector of weights down a loss one small step at a time, is a poor fit for all three. It needs many passes to fit, it overwrites old competence when trained on new data, and a weight vector cannot point to the place where it is ignorant.",[48,70,71,74,75,78],{},[55,72,73],{},"Symbolic descent"," keeps the shape of gradient descent and changes the object. Instead of a vector of weights, the thing being optimized is a small, readable ",[55,76,77],{},"theory",": a set of typed laws. The claim of this post is that you can keep the whole machinery of gradient descent (loss, regularizer, gradient, mini-batch) and get something interpretable, stable under continual learning, and far more data-efficient in return. Cubist uses this to learn a world-model for ARC-AGI-3, but the method is general.",[80,81,83],"h2",{"id":82},"the-correspondence","The correspondence",[48,85,86,87,91,92,95,96,99,100,102],{},"In parametric learning you minimize a loss ",[88,89,90],"code",{},"L(θ)"," by stepping ",[88,93,94],{},"θ ← θ - η ∇L(θ)",". A regularizer biases toward simpler ",[88,97,98],{},"θ",", a mini-batch estimates the gradient from a sample, and early stopping watches a held-out loss. Symbolic descent minimizes the same kind of objective, but ",[88,101,98],{}," is a structured discrete theory rather than a vector. 
Every familiar piece has a counterpart.",[104,105,106,119],"table",{},[107,108,109],"thead",{},[110,111,112,116],"tr",{},[113,114,115],"th",{},"gradient descent",[113,117,118],{},"symbolic descent",[120,121,122,131,139,147,155,163,171,179,187,195],"tbody",{},[110,123,124,128],{},[125,126,127],"td",{},"parameters: a weight vector",[125,129,130],{},"a theory: a set of typed laws",[110,132,133,136],{},[125,134,135],{},"loss",[125,137,138],{},"the data cost (bits to encode the residual)",[110,140,141,144],{},[125,142,143],{},"regularizer",[125,145,146],{},"the model cost (bits to write the laws; Occam, built in)",[110,148,149,152],{},[125,150,151],{},"the gradient",[125,153,154],{},"the labelled errors (miss, over-fire, true delta)",[110,156,157,160],{},[125,158,159],{},"the space of descent directions",[125,161,162],{},"the abduction set: every small program that explains the error",[110,164,165,168],{},[125,166,167],{},"a gradient step",[125,169,170],{},"a directed edit (generalize, specialize, or patch)",[110,172,173,176],{},[125,174,175],{},"a mini-batch",[125,177,178],{},"the revised laws' own support: every past transition they answer for",[110,180,181,184],{},[125,182,183],{},"early stopping on validation loss",[125,185,186],{},"held-out, predict-before-learn scoring",[110,188,189,192],{},[125,190,191],{},"weight decay",[125,193,194],{},"condensation: merging laws whenever the merged theory costs fewer bits",[110,196,197,200],{},[125,198,199],{},"a convex bowl",[125,201,202],{},"a combinatorial lattice with real local minima",[48,204,205,206,209,210,215],{},"The objective is a two-part description length, ",[88,207,208],{},"L(T) + L(D | T)",": the bits to write the theory down, plus the bits to encode the data given it, paid wherever the theory mispredicts (",[51,211,214],{"href":212,"rel":213},"https:\u002F\u002Fdoi.org\u002F10.1016\u002F0005-1098%2878%2990005-5",[63],"Rissanen, 1978","). 
The two terms are loss and regularizer at once, which is the quiet but important part. A law earns its place if and only if it saves more bits in the residual than it costs to write. There is no precision threshold, no minimum-support count, nothing to tune. The bits decide.",[80,217,219],{"id":218},"errors-are-the-gradient-abduction-is-the-direction-set","Errors are the gradient, abduction is the direction set",[48,221,222,223,226],{},"This is the load-bearing row of the table. A gradient tells you the direction in which a continuous loss falls fastest. In symbolic descent, the ",[55,224,225],{},"labelled prediction errors"," tell you exactly which edits can lower the description length. They do more than a gradient does, because they are labelled.",[48,228,229,230,233,234,238],{},"The mechanism is ",[55,231,232],{},"abduction",". For every observed change the model failed to explain, it enumerates every small expression that exactly computes that change: constants, attribute reads off the entity, reads off a unique neighbour, and one or two arithmetic combinations of these. This ",[235,236,237],"em",{},"abduction set"," is the space of possible explanations for the change, the discrete counterpart of the candidate descent directions. It turns learning into set algebra:",[240,241,242,253,277],"ul",{},[243,244,245,248,249,252],"li",{},[55,246,247],{},"Same regime = intersection."," Two changes belong to the same law when their abduction sets overlap. There is no clustering metric and no ",[88,250,251],{},"k"," to choose. If a small program explains both, they are one behaviour.",[243,254,255,258,259,262,263,266,267,270,271,276],{},[55,256,257],{},"Generalize = anti-unify."," Two laws merge by keeping their shared structure and lifting clashing constants into computed expressions. ",[88,260,261],{},"colour == 3"," and ",[88,264,265],{},"colour == 5"," become ",[88,268,269],{},"colour == the(touching).colour"," when that expression sits in both abduction sets. 
This is the bottom-clause generalization of inductive logic programming (",[51,272,275],{"href":273,"rel":274},"https:\u002F\u002Flink.springer.com\u002Farticle\u002F10.1007\u002FBF03037227",[63],"Muggleton, 1995","), run online.",[243,278,279,282],{},[55,280,281],{},"Memorize = the guaranteed floor."," The literal change is always in the abduction set, so the model can always patch a change it cannot yet name. This is deliberate memorization, honestly accounted: a patch is expensive in bits, which is precisely the pressure to replace it with structure later.",[48,284,285,286,289],{},"A worked step makes it concrete. Suppose a law moves every red entity three cells up, and it fires correctly. Now a green entity moves the same way under the same action, and no law covers it: one miss. The green entity's abduction set contains ",[88,287,288],{},"move := (0, -3)","; so does the red law's. Non-empty intersection: same regime. Anti-unification drops the colour condition, and the merged law covers both. The model cost falls because one condition is gone, the data cost falls because the green move is now explained, and no red entity is disturbed. One labelled miss has moved the theory exactly one step downhill.",[80,291,293],{"id":292},"weight-decay-that-actually-deletes","Weight decay that actually deletes",[48,295,296,297,300,301,304,305,308],{},"Gradient descent shrinks weights toward zero; symbolic descent has something better. ",[55,298,299],{},"Condensation"," sweeps the theory for pairs of laws that share an explanation and merges them whenever the merged theory costs fewer bits, best saving first, until nothing improves. Patches dissolve into laws as evidence accumulates, and the theory loses laws while gaining accuracy. 
In our ",[51,302,303],{"href":37},"benchmark"," you can watch it happen: on game ",[88,306,307],{},"ft09"," the law count falls from 15 to 14 across the run while held-out accuracy rises, and roughly one revision in three across all 25 games ends with fewer, broader laws than it started with.",[48,310,311],{},[312,313],"img",{"alt":314,"src":315},"Laws in the theory (bars) against held-out F1 (line) across the run, for three games. The count falling while accuracy rises is condensation digesting its own patches.","\u002Fprojects\u002Fcubist\u002Ftheory-metabolism.png",[80,317,319],{"id":318},"what-the-method-buys","What the method buys",[48,321,322,323,326],{},"The parallel is pretty, but the reason to care is what comes out of it. Each claim below is a measured number, not an aspiration (full protocol in the ",[51,324,325],{"href":37},"results post",").",[48,328,329,332,333,336],{},[55,330,331],{},"Continual learning without catastrophic forgetting."," Each revision touches exactly the laws implicated by the error, re-fit over their own support: every past transition they answer for. A law that is already correct produces no error, and the descent never disturbs it. Learning something new therefore has no way to silently break something old, the failure mode that has haunted connectionist learners since McCloskey & Cohen (1989). Measured: across ~2,700 revisions over 25 games, only 18 revisions lost any accuracy on their laws' own past, a ",[55,334,335],{},"0.7% regression rate",".",[48,338,339,342],{},[55,340,341],{},"Generalization instead of memorization, and you can read the margin."," Because the model cost is real bits, a law that merely stores what it has seen is expensive and loses to one that compresses. The objective actively prefers the general law over the lookup table, and the margin is a single number: the compression ratio, bits of data explained per bit of theory. 
Eight of 25 games end above 1.0, meaning genuine structure, topping out at 10.7×: a game's whole visible dynamics in about twenty falsifiable sentences. Games below 1.0 are memorization-heavy, and the number says so rather than hiding it.",[48,344,345,348],{},[55,346,347],{},"Data efficiency."," A single informative experience justifies a single edit, applied immediately: the model is exact on the frame it just erred on. There is no batch to accumulate and no epoch to re-run, which is exactly the resource a one-life, learn-while-playing setting denies a network.",[48,350,351,354],{},[55,352,353],{},"Interpretability and honest uncertainty."," The parameter is a list of laws you can read, so every prediction traces to the law that produced it, and every failure decomposes into the component that blocked it: a gate, a selector, a transform. And what no law explains is left unchanged rather than guessed, so the model's silence is a real coverage gap. That is the very signal an autotelic agent turns into a goal.",[80,356,358],{"id":357},"where-the-analogy-bites","Where the analogy bites",[48,360,361],{},"It is a lens, not an identity. There is no continuous gradient: the space is discrete, so \"the gradient\" is really the finite set of abducible edits and their cost differences, and the step is the argmin over that set. The terrain is a combinatorial lattice with real local minima, where a smooth convex loss has none, so a misleading early edit can trap the descent. The honest framing is directed discrete local search, but one whose direction set is constructed from the data by abduction, not sampled blindly the way evolutionary methods probe fitness.",[48,363,364],{},"The parametric toolbox still transfers. The model cost is already a complexity penalty, and richer priors slot in the same way. Beams, restarts, and annealing are drop-in optimizers. 
The replay reservoir, the past transitions the revised laws answer for, is the mini-batch, and held-out predict-before-learn scoring is early stopping.",[80,366,368],{"id":367},"related-work","Related work",[48,370,371,372,375,376,379,380,383,384,389,390,395,396,262,401,406,407,412,413,416,417,422],{},"Symbolic descent sits at the intersection of several traditions. ",[55,373,374],{},"Inductive logic programming"," supplies the bones: per-entity most-specific descriptions are Progol-style bottom clauses (",[51,377,275],{"href":273,"rel":378},[63],"), generalized by contrast against negatives. ",[55,381,382],{},"Theory induction"," framings share the laws-as-programs stance: the Apperception Engine (",[51,385,388],{"href":386,"rel":387},"https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.02227",[63],"Evans et al., 2020","), Schema Networks (",[51,391,394],{"href":392,"rel":393},"https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.04317",[63],"Kansky et al., 2017","), and the recent program-synthesis world-models ",[51,397,400],{"href":398,"rel":399},"https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.12275",[63],"WorldCoder",[51,402,405],{"href":403,"rel":404},"https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.10819",[63],"PoE-World",". The salient difference: those recent systems use an LLM to propose the programs, whereas symbolic descent learns fully online, without an LLM proposer, from single transitions. The same contrast holds against LLM coding-agent approaches to ARC-AGI-3 (",[51,408,411],{"href":409,"rel":410},"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.05138",[63],"Rodionov, 2026","). ",[55,414,415],{},"State-merging automata induction"," (RPNI: Oncina & García, 1992; ALERGIA: Carrasco & Oncina, 1994) is the direct ancestor of condensation: start from total memorization, merge while a description-length criterion pays. 
And ",[51,418,421],{"href":419,"rel":420},"https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.08381",[63],"DreamCoder","'s wake-sleep library learning is the closest relative of what comes next for Cubist: growing a repertoire of skills by the same compression pressure that grows the laws.",[48,424,425,426,429],{},"The point of all this is not a clever analogy. It is that an agent meant to learn continually, and to set its own goals, needs a learner that does not forget, generalizes from little data, and knows what it does not know. Symbolic descent is an attempt at exactly that learner. The ",[51,427,428],{"href":37},"companion post"," puts it to work as a world-model across all 25 public ARC-AGI-3 games and reports how well it does, curves, tables, and failures included.",[80,431,433],{"id":432},"references","References",[240,435,436,447,457,463,466,474,484,494,503,510,520],{},[243,437,438,439,443,444,336],{},"Muggleton (1995). ",[51,440,442],{"href":273,"rel":441},[63],"Inverse entailment and Progol",". ",[235,445,446],{},"New Generation Computing 13",[243,448,449,450,443,454,336],{},"Rissanen (1978). ",[51,451,453],{"href":212,"rel":452},[63],"Modeling by shortest data description",[235,455,456],{},"Automatica 14",[243,458,459,460,336],{},"McCloskey & Cohen (1989). Catastrophic interference in connectionist networks. ",[235,461,462],{},"Psychology of Learning and Motivation 24",[243,464,465],{},"Oncina & García (1992). Inferring regular languages in polynomial updated time (RPNI). Carrasco & Oncina (1994). Learning stochastic regular grammars by means of a state merging method (ALERGIA).",[243,467,468,469,473],{},"Evans et al. (2020). ",[51,470,472],{"href":386,"rel":471},[63],"Making sense of sensory input"," (the Apperception Engine).",[243,475,476,477,443,481,336],{},"Kansky et al. (2017). 
",[51,478,480],{"href":392,"rel":479},[63],"Schema Networks: zero-shot transfer with a generative causal model of intuitive physics",[235,482,483],{},"ICML",[243,485,486,487,443,491,336],{},"Tang, Key & Ellis (2024). ",[51,488,490],{"href":398,"rel":489},[63],"WorldCoder: building world models by writing code and interacting with the environment",[235,492,493],{},"NeurIPS",[243,495,496,497,443,501,336],{},"Piriyakulkij et al. (2025). ",[51,498,500],{"href":403,"rel":499},[63],"PoE-World: compositional world modeling with products of programmatic experts",[235,502,493],{},[243,504,505,506,336],{},"Rodionov (2026). ",[51,507,509],{"href":409,"rel":508},[63],"Executable world models for ARC-AGI-3 in the era of coding agents",[243,511,512,513,443,517,336],{},"Ellis et al. (2021). ",[51,514,516],{"href":419,"rel":515},[63],"DreamCoder: bootstrapping inductive program synthesis with wake-sleep library learning",[235,518,519],{},"PLDI",[243,521,522,523,443,527,336],{},"Colas, Karch, Sigaud & Oudeyer (2022). ",[51,524,526],{"href":61,"rel":525},[63],"Autotelic agents with intrinsically motivated goal-conditioned reinforcement learning: a short survey",[235,528,529],{},"JAIR 74",{"title":531,"searchDepth":532,"depth":532,"links":533},"",2,[534,535,536,537,538,539,540],{"id":82,"depth":532,"text":83},{"id":218,"depth":532,"text":219},{"id":292,"depth":532,"text":293},{"id":318,"depth":532,"text":319},{"id":357,"depth":532,"text":358},{"id":367,"depth":532,"text":368},{"id":432,"depth":532,"text":433},"2026-07-01","Symbolic descent keeps the shape of gradient descent but changes the object it optimizes, from a weight vector to a readable theory of laws. 
This post explains the parallel, the machinery that makes it work, and what it buys for continual learning, reasoning, and interpretability.","md","\u002Fprojects\u002Fcubist\u002Fperception.png",{},10,true,"cubist",{"title":32,"description":542},"Zu328ZPnzw_FV2NtHRhZGsYuVFdpK6XClxjbnH7ZWMA",{"id":552,"title":14,"body":553,"color":773,"date":774,"description":775,"extension":543,"featured":547,"icon":776,"image":544,"logo":777,"logoDark":777,"meta":778,"navigation":547,"order":779,"path":15,"seo":780,"status":781,"stem":16,"tagline":782,"tags":783,"url":777,"__hash__":788},"projects\u002Fprojects\u002Fcubist.md",{"type":45,"value":554,"toc":765},[555,559,588,593,597,600,606,612,618,621,630,633,636,640,655,659,662,668,678,682,685,709,712,714],[80,556,558],{"id":557},"the-mission","The mission",[48,560,561,562,565,566,569,570,573,574,579,580,579,585,326],{},"Watch a child in a new place. Nobody hands them a goal or a reward function. They poke at things, notice what surprises them, invent little challenges, and out of that self-directed play comes an ever-growing repertoire of skills. Developmental AI researchers call such a learner ",[55,563,564],{},"autotelic",", from the Greek ",[235,567,568],{},"auto"," (self) and ",[235,571,572],{},"telos"," (goal): an agent that invents, selects, and pursues its own goals, driven by curiosity rather than external reward (",[51,575,578],{"href":576,"rel":577},"https:\u002F\u002Fwww.frontiersin.org\u002Fjournals\u002Fneurorobotics\u002Farticles\u002F10.3389\u002Fneuro.12.006.2007\u002Ffull",[63],"Oudeyer & Kaplan, 2007","; ",[51,581,584],{"href":582,"rel":583},"https:\u002F\u002Fjmlr.org\u002Fpapers\u002Fv23\u002F21-0808.html",[63],"Forestier et al., 2022",[51,586,64],{"href":61,"rel":587},[63],[48,589,590,591,336],{},"Cubist's mission is to build one, with a twist. The autotelic agents of Oudeyer, Colas, and colleagues are usually powered by deep reinforcement learning. Cubist is symbolic all the way down. 
Its model of the world is a set of laws you can read. Its skills are small closed-loop programs. And the algorithm that improves both is a discrete cousin of gradient descent we call ",[55,592,118],{},[80,594,596],{"id":595},"the-bet-symbolic-not-gradient","The bet: symbolic, not gradient",[48,598,599],{},"Almost everything that learns at scale today learns by gradient descent: nudge billions of opaque weights down a loss. It works spectacularly well when data is plentiful and the world holds still. But an agent dropped into a new world, learning from a single unfolding life, needs three things that gradients make expensive.",[48,601,602,605],{},[55,603,604],{},"Continual learning."," A neural network trained on something new tends to silently overwrite what it knew, the classic problem of catastrophic forgetting (McCloskey & Cohen, 1989). A theory made of discrete laws does not have that failure mode. A law that is already correct produces no error, and a learner driven by errors never touches it. One experience is enough to add a law, and adding it breaks nothing.",[48,607,608,611],{},[55,609,610],{},"Reasoning."," An explicit theory is something you can do things with: query it, plan through it, spot the exact situations it does not cover. Every law is a falsifiable claim. The world either behaves as the law says, or the law gets revised. A weight vector predicts; a theory explains.",[48,613,614,617],{},[55,615,616],{},"Interpretability."," When a symbolic model fails, the failure has an address: this condition was too narrow, that rule missed an entity. 
When it succeeds, the model is its own documentation.",[48,619,620],{},"To make this concrete, here is a real theory Cubist learned on one of the benchmark games, in about eight seconds of play:",[622,623,628],"pre",{"className":624,"code":626,"language":627},[625],"language-text","# up key: the player (the only width-5 entity) moves 5 cells up\nACTION3: (self.width == 5) ⇒ move := (0, -5)\n\n# down key: the same player moves 5 cells down\nACTION4: (self.width == 5) ⇒ move := (0, 5)\n\n# right key: whatever sits on row 43 slides 5 cells right\nACTION2: (self.row == 43)  ⇒ move := (5, 0)\n\n# on every action, the entity with a direct neighbour (the life-bar)\n# shifts its colours one notch: time is running out\nany:     exists(nbrs(d1))  ⇒ recolor := (0,0,0,2, 0,0,0,0, 0,0,0,-2, 0,0,0,0)\n","text",[88,629,626],{"__ignoreMap":531},[48,631,632],{},"Four laws, and you know the game. Each one states when it fires, who it applies to, and what happens. When the world contradicts a law, that law and only that law is revised.",[48,634,635],{},"The honest counterpart: gradients win where perception is raw and data is abundant. The symbolic bet targets the opposite regime, low data, a single life, and a demand for explanations. That happens to be the regime children, robots in the field, and scientific discovery all live in.",[80,637,639],{"id":638},"the-benchmark-arc-agi-3","The benchmark: ARC-AGI-3",[48,641,642,643,648,649,654],{},"Betting on a research direction means picking a test you cannot fool. Ours is ",[51,644,647],{"href":645,"rel":646},"https:\u002F\u002Farcprize.org\u002Farc-agi\u002F3",[63],"ARC-AGI-3",", the interactive-reasoning benchmark from the ARC Prize Foundation. The agent is dropped into a grid-world game with no instructions, no stated goal, and a handful of abstract actions, and must figure out what the game is while playing it. Scoring emphasizes sample efficiency: solving with fewer actions beats solving with more. 
That is ",[51,650,653],{"href":651,"rel":652},"https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.01547",[63],"Chollet's definition of intelligence"," as skill-acquisition efficiency made operational, and a benchmark where the goal is never stated is exactly the right exam for an agent whose whole premise is inventing goals for itself.",[80,656,658],{"id":657},"where-it-stands","Where it stands",[48,660,661],{},"The world-model half is built and measured. Playing each of the 25 public ARC-AGI-3 games for a single 200-step life, with no pretraining, no gradients, and no game-specific tuning, Cubist learns each game's mechanics online as a small theory of laws, predicting every frame before learning from it. Held-out accuracy climbs from 0.46 in the first quarter of a run to 0.72 in the last, and 23 of 25 games end better than they started:",[48,663,664],{},[312,665],{"alt":666,"src":667},"Per-game held-out prediction accuracy (F1), first quartile of the run versus final quartile. 23 of 25 games improve within a single 200-step life.","\u002Fprojects\u002Fcubist\u002Flearning-progress.png",[48,669,670,671,673,674,677],{},"The ",[51,672,325],{"href":37}," covers the representation, the learning algorithm, every metric, and where the model breaks. The ",[51,675,676],{"href":33},"method post"," works through symbolic descent itself and the full parallel with gradient descent.",[80,679,681],{"id":680},"whats-next-the-autotelic-loop","What's next: the autotelic loop",[48,683,684],{},"A model that predicts well is necessary but not sufficient. Prediction is not control, and on its own the world-model solves no levels. The second half of the program closes the loop:",[240,686,687,693,703],{},[243,688,689,692],{},[55,690,691],{},"Gaps become goals."," The model is honest: it never asserts a change it cannot justify, so its coverage gaps are a precise map of what is still unknown. That map is the agent's intrinsic motivation. 
Act where the theory is blind.",[243,694,695,698,699,702],{},[55,696,697],{},"Skills as programs."," Behaviours are grown as closed-loop programs over the theory's own vocabulary, scored by imagining them forward through the world-model before spending a real action, and kept in an open-ended repertoire the way ",[51,700,421],{"href":419,"rel":701},[63]," grows a library of reusable abstractions.",[243,704,705,708],{},[55,706,707],{},"The loop."," Model the world, find what you cannot explain, invent a skill to probe it, and let the sharper model expose new gaps. Open-ended, self-directed, and readable at every step.",[48,710,711],{},"This is the part under active construction.",[80,713,433],{"id":432},[240,715,716,726,733,743,750,758],{},[243,717,718,719,443,723,336],{},"Oudeyer & Kaplan (2007). ",[51,720,722],{"href":576,"rel":721},[63],"What is intrinsic motivation? A typology of computational approaches",[235,724,725],{},"Frontiers in Neurorobotics",[243,727,522,728,443,731,336],{},[51,729,526],{"href":61,"rel":730},[63],[235,732,529],{},[243,734,735,736,443,740,336],{},"Forestier, Portelas, Mollard & Oudeyer (2022). ",[51,737,739],{"href":582,"rel":738},[63],"Intrinsically motivated goal exploration processes with automatic curriculum learning",[235,741,742],{},"JMLR 23(152)",[243,744,745,746,336],{},"Chollet (2019). ",[51,747,749],{"href":651,"rel":748},[63],"On the measure of intelligence",[243,751,752,753,336],{},"ARC Prize Foundation (2026). 
",[51,754,757],{"href":755,"rel":756},"https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.24621",[63],"ARC-AGI-3: a new challenge for frontier agentic intelligence",[243,759,512,760,443,763,336],{},[51,761,516],{"href":419,"rel":762},[63],[235,764,519],{},{"title":531,"searchDepth":532,"depth":532,"links":766},[767,768,769,770,771,772],{"id":557,"depth":532,"text":558},{"id":595,"depth":532,"text":596},{"id":638,"depth":532,"text":639},{"id":657,"depth":532,"text":658},{"id":680,"depth":532,"text":681},{"id":432,"depth":532,"text":433},"#d97706","2026-01-01","Cubist is a research program on symbolic autotelic agents, agents that teach themselves an open-ended repertoire of skills. Gradient descent is replaced by symbolic descent. World-models are learned as readable programs, skills as closed-loop programs, and everything is measured on ARC-AGI-3.","i-lucide-box",null,{},4,{"title":14,"description":775},"Research","Building a symbolic autotelic agent: an AI that sets its own goals and learns its world as readable laws",[784,73,785,786,647,787],"Autotelic agents","World-models","Open-endedness","Interpretability","8CmPek1aJkK0sokgX8TS8knzVuEB1afssi4xS9PwJNs",[777,790],{"title":36,"path":37,"stem":38,"description":791,"children":-1},"Cubist's world-model learns each ARC-AGI-3 game's mechanics as a small theory of symbolic laws, online, from single experiences, with no pretraining and no gradients. This post presents the representation, the learning algorithm, and an evaluation across all 25 public games.",1783074437902]