RoboPocket: Improve Robot Policies Instantly with Your Phone
Scaling imitation learning is fundamentally constrained by the efficiency of data collection. While handheld interfaces have emerged as a scalable solution for in-the-wild data acquisition, they predominantly operate in an open-loop manner: operators collect demonstrations blindly, without knowing the underlying policy's weaknesses, which leads to inefficient coverage of critical state distributions. Conversely, interactive methods like DAgger effectively address covariate shift but rely on physical robot execution, which is costly and difficult to scale. To reconcile this trade-off, we introduce RoboPocket, a portable system that enables Robot-Free Instant Policy Iteration using a single consumer smartphone. Its core innovation is a Remote Inference framework that visualizes the policy's predicted trajectory via Augmented Reality (AR) Visual Foresight. This immersive feedback allows collectors to proactively identify potential failures and focus data collection on the policy's weak regions without requiring a physical robot. Furthermore, we implement an asynchronous Online Finetuning pipeline that continuously updates the policy with incoming data, effectively closing the learning loop in minutes. Extensive experiments demonstrate that RoboPocket adheres to data scaling laws and doubles the data efficiency compared to offline scaling strategies, overcoming their long-standing efficiency bottleneck. Moreover, our instant iteration loop boosts sample efficiency by up to 2$\times$ in distributed environments with only a small number of interactive corrections per person. Project page and videos: https://robo-pocket.github.io.
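As a rough illustration of the asynchronous online finetuning idea described above, the following minimal sketch runs a background worker that consumes incoming demonstrations while the latest policy stays available for inference. It is not the RoboPocket implementation: the class `OnlineFinetuner`, its methods, and the one-dimensional linear "policy" are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. A real system would presumably finetune a neural policy on a server and stream predicted trajectories back to the phone.

```python
import queue
import threading


class OnlineFinetuner:
    """Hypothetical toy: a 1-D linear policy a = w * s, updated by SGD
    on (state, action) pairs as they arrive from data collectors."""

    def __init__(self, lr=0.1):
        self.w = 0.0            # policy parameter (assumed scalar for the sketch)
        self.version = 0        # number of updates applied so far
        self.lr = lr
        self._lock = threading.Lock()
        self._inbox = queue.Queue()
        # Training runs in the background so inference is never blocked
        # for longer than a single parameter update.
        self._worker = threading.Thread(target=self._train_loop, daemon=True)
        self._worker.start()

    def submit(self, state, action):
        """Called by a collector: enqueue one new demonstration sample."""
        self._inbox.put((state, action))

    def predict(self, state):
        """Remote-inference stand-in: act with the most recent parameters."""
        with self._lock:
            return self.w * state

    def _train_loop(self):
        while True:
            item = self._inbox.get()
            if item is None:    # sentinel: stop the worker
                break
            s, a = item
            with self._lock:
                # One SGD step on the squared error (w * s - a) ** 2.
                grad = 2.0 * (self.w * s - a) * s
                self.w -= self.lr * grad
                self.version += 1

    def close(self):
        """Process everything already queued, then stop the worker."""
        self._inbox.put(None)
        self._worker.join()


def run_demo():
    tuner = OnlineFinetuner(lr=0.1)
    # Corrections all say "for state 1.0 the right action is 2.0".
    for _ in range(200):
        tuner.submit(1.0, 2.0)
    tuner.close()
    return tuner
```

Calling `run_demo()` returns a tuner whose parameter `w` has converged toward 2.0, so `predict(3.0)` is close to 6.0. The design point the sketch tries to capture is the decoupling: collectors only enqueue data and query the current policy, while updates happen asynchronously in the background.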