Discovered paper pair (Session 38). Detailed explanation not available.
Action-conditioned sequential prediction enables path integration and dynamic object-location binding for world modeling. Prediction accuracy improves over the course of a sequence through in-context learning. Structured representations with flexible binding emerge to support prediction: new bindings can be learned late in a sequence, and out-of-distribution bindings can be acquired through sequential exposure.
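To make "path integration" and "object-location binding" concrete, here is a minimal hand-coded sketch, assuming a 2D grid where each action is a unit displacement. It is not the paper's method: the paper describes representations that emerge in a learned sequence model, while this toy uses an explicit position counter and a lookup table. All names (`ACTIONS`, `ToyWorldModel`, `run`) are hypothetical. The position is tracked from actions alone (path integration), each observed object is bound to the current location, and predictions on revisits come from those bindings, so accuracy rises as the sequence goes on.

```python
from typing import Dict, List, Optional, Tuple

Pos = Tuple[int, int]

# Hypothetical action vocabulary: each action is a unit step on a grid.
ACTIONS: Dict[str, Pos] = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}


class ToyWorldModel:
    """Symbolic stand-in for a learned model: path integration plus an
    object-location binding table filled in as the sequence arrives."""

    def __init__(self) -> None:
        self.pos: Pos = (0, 0)              # position estimated from actions only
        self.bindings: Dict[Pos, str] = {}  # location -> object last seen there

    def step(self, action: str, observation: Optional[str]) -> Optional[str]:
        # Path integration: update the position estimate from the action alone.
        dx, dy = ACTIONS[action]
        self.pos = (self.pos[0] + dx, self.pos[1] + dy)
        # Predict before looking: recall what is bound to this location, if anything.
        prediction = self.bindings.get(self.pos)
        # Bind (or rebind) the observed object to the current location.
        # Any object name works, including ones never seen before.
        if observation is not None:
            self.bindings[self.pos] = observation
        return prediction


def run(sequence: List[Tuple[str, Optional[str]]]) -> List[Optional[str]]:
    """Feed (action, observation) pairs in order; return each step's prediction."""
    model = ToyWorldModel()
    return [model.step(action, obs) for action, obs in sequence]
```

For example, `run([("E", "cup"), ("E", "key"), ("W", "cup"), ("N", "zebra"), ("S", None), ("N", None)])` returns `[None, None, "cup", None, "cup", "zebra"]`. Early steps have nothing to recall. Later revisits are predicted correctly, including a novel object ("zebra") that is bound late in the sequence.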
Continuous latent actions serve as unified proxy actions, enabling transfer of interaction knowledge from unlabeled human videos to robot control. The world model learns diverse interactions from 44k hours of egocentric video. Post-training on small-scale robot data yields physics understanding and precise action controllability. The generative model supports teleoperation, policy evaluation, and planning.
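The idea of latent actions as a shared interface between video and robots can be illustrated with a minimal sketch under strong simplifying assumptions. Here a "latent action" is simply the change between consecutive observation vectors, which requires no action labels. The "world model" adds that change to the current observation. Robot commands are linked to the latent space through a linear adapter fit by least squares on a few paired robot transitions. In the described work, the latent-action encoder and the world model are learned generative models trained on video, and the paper's post-training procedure may differ from this adapter. Every function name below is hypothetical.

```python
from typing import List, Sequence

Vec = List[float]
Matrix = List[List[float]]


def infer_latent_action(obs: Sequence[float], next_obs: Sequence[float]) -> Vec:
    # Stand-in for a learned latent-action encoder: it needs only two
    # consecutive observations, so it also applies to unlabeled video.
    return [b - a for a, b in zip(obs, next_obs)]


def predict_next(obs: Sequence[float], latent: Sequence[float]) -> Vec:
    # Stand-in for the world model: next observation given a latent action.
    return [o + z for o, z in zip(obs, latent)]


def latent_from_action(w: Matrix, action: Sequence[float]) -> Vec:
    # Hypothetical adapter: map a robot command into the latent-action space.
    return [sum(w[i][j] * action[j] for j in range(len(action))) for i in range(len(w))]


def fit_action_adapter(actions: List[Vec], latents: List[Vec]) -> Matrix:
    """Least-squares 2x2 W with latent ~= W @ action, from a small paired set.
    Closed form: W = (sum z a^T) (sum a a^T)^-1."""
    ata = [[sum(a[i] * a[j] for a in actions) for j in range(2)] for i in range(2)]
    zta = [[sum(z[i] * a[j] for z, a in zip(latents, actions)) for j in range(2)]
           for i in range(2)]
    det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
    if abs(det) < 1e-12:
        raise ValueError("robot actions do not span both dimensions")
    inv = [[ata[1][1] / det, -ata[0][1] / det],
           [-ata[1][0] / det, ata[0][0] / det]]
    return [[sum(zta[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
```

With three robot transitions whose true effect is `[[2, 0], [0, 0.5]]`, the adapter recovers that matrix. Issuing the command `[2, 2]` from `[0, 0]` is then predicted to reach `[4, 1]`, which is the kind of action controllability a world model needs for teleoperation or planning.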