
Deep Tech

Latest arXiv Paper Picks

RoboPocket: Improve Robot Policies Instantly with Your Phone

Junjie Fang, Wendi Chen, Han Xue 2026-03-05

Scaling imitation learning is fundamentally constrained by the efficiency of data collection. While handheld interfaces have emerged as a scalable solution for in-the-wild data acquisition, they predominantly operate in an open-loop manner: operators blindly collect demonstrations without knowing the underlying policy's weaknesses, leading to inefficient coverage of critical state distributions. Conversely, interactive methods like DAgger effectively address covariate shift but rely on physical robot execution, which is costly and difficult to scale. To reconcile this trade-off, we introduce RoboPocket, a portable system that enables Robot-Free Instant Policy Iteration using a single consumer smartphone. Its core innovation is a Remote Inference framework that visualizes the policy's predicted trajectory via Augmented Reality (AR) Visual Foresight. This immersive feedback allows collectors to proactively identify potential failures and focus data collection on the policy's weak regions without requiring a physical robot. Furthermore, we implement an asynchronous Online Finetuning pipeline that continuously updates the policy with incoming data, effectively closing the learning loop in minutes. Extensive experiments demonstrate that RoboPocket adheres to data scaling laws and doubles the data efficiency compared to offline scaling strategies, overcoming their long-standing efficiency bottleneck. Moreover, our instant iteration loop also boosts sample efficiency by up to 2$\times$ in distributed environments with a small number of interactive corrections per person. Project page and videos: https://robo-pocket.github.io.

POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation

Zeju Qiu, Lixin Liu, Adrian Weller 2026-03-05

Efficient and stable training of large language models (LLMs) remains a core challenge in modern machine learning systems. To address this challenge, Reparameterized Orthogonal Equivalence Training (POET), a spectrum-preserving framework that optimizes each weight matrix through orthogonal equivalence transformation, has been proposed. Although POET provides strong training stability, its original implementation incurs high memory consumption and computational overhead due to intensive matrix multiplications. To overcome these limitations, we introduce POET-X, a scalable and memory-efficient variant that performs orthogonal equivalence transformations with significantly reduced computational cost. POET-X maintains the generalization and stability benefits of POET while achieving substantial improvements in throughput and memory efficiency. In our experiments, POET-X enables the pretraining of billion-parameter LLMs on a single Nvidia H100 GPU, whereas standard optimizers such as AdamW run out of memory under the same settings.
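The phrase "spectrum-preserving" can be illustrated with a short derivation. This is only a sketch under an assumption the abstract does not spell out: that the transformed weight has the form $W' = R W P$, where $R$ and $P$ are orthogonal matrices (the symbols $W$, $W'$, $R$, and $P$ are illustrative labels, not notation taken from the abstract).

```latex
% Assumed form of an orthogonal equivalence transformation (illustrative):
%   W \in \mathbb{R}^{m \times n},\; R \in \mathbb{R}^{m \times m},\; P \in \mathbb{R}^{n \times n}
W' = R\,W\,P, \qquad R^\top R = I_m, \qquad P^\top P = I_n

% Gram matrix of the transformed weight:
W'^\top W' = P^\top W^\top R^\top R\, W P = P^\top \left( W^\top W \right) P

% Because P^{-1} = P^\top, the matrices W'^\top W' and W^\top W are similar,
% so they share eigenvalues, and therefore
\sigma_i(W') = \sigma_i(W) \quad \text{for all } i
```

Under this assumption, training $R$ and $P$ changes the singular vectors of the weight matrix but leaves its singular values untouched.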

The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

Shangwen Sun, Alfredo Canziani, Yann LeCun 2026-03-05

We study two recurring phenomena in Transformer language models: massive activations, in which a small number of tokens exhibit extreme outliers in a few channels, and attention sinks, in which certain tokens attract disproportionate attention mass regardless of semantic relevance. Prior work observes that these phenomena frequently co-occur and often involve the same tokens, but their functional roles and causal relationship remain unclear. Through systematic experiments, we show that the co-occurrence is largely an architectural artifact of modern Transformer design, and that the two phenomena serve related but distinct functions. Massive activations operate globally: they induce near-constant hidden representations that persist across layers, effectively functioning as implicit parameters of the model. Attention sinks operate locally: they modulate attention outputs across heads and bias individual heads toward short-range dependencies. We identify the pre-norm configuration as the key choice that enables the co-occurrence, and show that ablating it causes the two phenomena to decouple.

GitHub Trending

Recent Trending AI Projects

First Principles Thinking

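Break a complex problem down into its most basic elements, then rebuild the solution from the ground up. Reason from fundamental truths rather than from analogy or prior experience.

Example: When building batteries, Elon Musk did not accept the assumption that 'batteries are just expensive'. He analyzed the cost of the raw materials instead and found that the cost could be cut substantially.

— Aristotle / Elon Musk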

Occam's Razor

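Entities should not be multiplied beyond necessity. Among competing hypotheses, choose the one that makes the fewest assumptions. Complicated explanations often conceal errors.

Example: When you hear hoofbeats, think of horses first, not zebras, unless there is clear evidence of the rarer case.

— William of Ockham (14th century)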

Second-Order Thinking

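Consider not only the direct consequences of an action but also the chain reactions those consequences set off. Ask yourself: 'And then what? And after that?'

Example: A price-cut promotion increases short-term sales (first-order effect), but it may damage the brand image and trigger a price war (second-order effect).

— Howard Marks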

The only way to do great work is to love what you do.

— Steve Jobs

In the middle of difficulty lies opportunity.

— Albert Einstein

The best time to plant a tree was 20 years ago. The second best time is now.

— Chinese Proverb

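Without accumulating small steps, one cannot travel a thousand li; without gathering small streams, there can be no rivers or seas.

— Xunzi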

Read widely but select carefully; accumulate deeply but release sparingly.

— Su Shi

Example

Stay hungry, stay foolish

The people who are crazy enough to think

they can change the world

are the ones who do

Here's to the crazy ones

The misfits, the rebels

The troublemakers

The round pegs in the square holes

They're not fond of rules

And they have no respect for the status quo

You can quote them, disagree with them

Glorify or vilify them

But the only thing you can't do

is ignore them

Because they change things

They push the human race forward

algorithm

/ˈælɡəˌrɪðəm/

n. a well-defined procedure or set of rules for carrying out a computation

The sorting algorithm runs in O(n log n) time.

We need to optimize this algorithm for better performance.
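As one illustration of a sorting algorithm with O(n log n) running time, here is a minimal merge sort sketch. The function name `merge_sort` is chosen for this example only; the sentence above does not say which algorithm it refers to.

```python
def merge_sort(items):
    """Return a new sorted list using top-down merge sort (O(n log n) comparisons)."""
    if len(items) <= 1:
        return list(items)

    # Split in half and sort each half recursively.
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves in linear time.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # "<=" keeps equal elements in their original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

The list is halved about log n times, and each level of halving costs a linear-time merge, which is where the n log n bound comes from.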

recursion

/rɪˈkɜːrʒn/

n. the technique of defining or solving something in terms of smaller instances of itself

Recursion is a method in which the solution to a problem depends on solutions to smaller instances of the same problem.

Use recursion carefully to avoid a stack overflow.
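A small sketch of the two sentences above, using factorial as a stand-in problem (the function names are illustrative). The recursive version calls itself on a smaller instance; the iterative version computes the same value without growing the call stack.

```python
def factorial(n):
    """Recursive factorial: n! is defined through the smaller instance (n - 1)!."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n <= 1:  # base case stops the recursion
        return 1
    return n * factorial(n - 1)


def factorial_iterative(n):
    """Same result computed with a loop, so the call depth stays constant."""
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result
```

In CPython, very deep recursion raises `RecursionError` once the interpreter's recursion limit is exceeded, which is the language's guard against overflowing the stack.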

encapsulation

/ɪnˌkæpsjuˈleɪʃn/

n. the bundling of data with the methods that operate on it, with internal state hidden from outside code; also: packaging

Encapsulation hides the internal state of an object from the outside.

Good encapsulation leads to more maintainable code.
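A minimal sketch of encapsulation in Python, assuming a hypothetical `BankAccount` class. Python hides state by convention (a leading underscore) rather than by enforcement, so the read-only property is what keeps callers from assigning the balance directly.

```python
class BankAccount:
    """Keeps the balance internal; callers change it only through deposit()."""

    def __init__(self, opening_balance=0):
        self._balance = opening_balance  # leading underscore: internal by convention

    @property
    def balance(self):
        """Read-only view of the internal state (no setter is defined)."""
        return self._balance

    def deposit(self, amount):
        # The validation lives in one place because state is changed only here.
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount
```

Because every change goes through `deposit`, the validation rule can later be altered without touching any calling code.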

polymorphism

/ˌpɒliˈmɔːfɪzəm/

n. the ability of a single interface to work with objects of different types

Polymorphism allows objects of different classes to be treated as objects of a common superclass.

Method overriding is a common way to implement polymorphism.
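The following sketch shows polymorphism through method overriding. The `Shape`, `Circle`, and `Square` classes are hypothetical names for this example; `total_area` treats every object as a `Shape` and never checks its concrete class.

```python
import math


class Shape:
    def area(self):
        raise NotImplementedError("subclasses must override area()")


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):  # overrides Shape.area
        return math.pi * self.radius ** 2


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):  # overrides Shape.area
        return self.side ** 2


def total_area(shapes):
    """Works for any mix of Shape subclasses; each object supplies its own area()."""
    return sum(shape.area() for shape in shapes)
```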

inheritance

/ɪnˈherɪtəns/

n. the mechanism by which a class acquires the attributes and methods of another class; also: heredity

Inheritance enables new classes to reuse the attributes and methods of existing classes.

Multiple inheritance can lead to the diamond problem.
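To make the diamond problem concrete, here is a small sketch with hypothetical class names. `Bottom` inherits from both `Left` and `Right`, which share the ancestor `Base`, so there are two paths to `Base.describe`. Python resolves the ambiguity with its method resolution order (MRO); other languages handle it differently.

```python
class Base:
    def describe(self):
        return "Base"


class Left(Base):
    def describe(self):
        return "Left -> " + super().describe()


class Right(Base):
    def describe(self):
        return "Right -> " + super().describe()


class Bottom(Left, Right):  # two inheritance paths lead back to Base
    def describe(self):
        return "Bottom -> " + super().describe()


# Names of the classes in the order Python searches them for a method.
mro_names = [cls.__name__ for cls in Bottom.__mro__]
```

With cooperative `super()` calls, each class in the MRO runs exactly once, so `Base.describe` is not invoked twice even though it is reachable along both paths.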

abstraction

/æbˈstrækʃn/

n. the practice of exposing essential features while hiding unnecessary detail; also: extraction

Abstraction reduces complexity by hiding unnecessary details.

An abstract class cannot be instantiated directly.
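In Python, the second sentence can be demonstrated with the standard `abc` module. The `Storage` and `MemoryStorage` classes below are hypothetical names used only for this sketch.

```python
from abc import ABC, abstractmethod


class Storage(ABC):
    """Abstract interface: says what a storage backend does, not how."""

    @abstractmethod
    def read(self, key):
        ...

    @abstractmethod
    def write(self, key, value):
        ...


class MemoryStorage(Storage):
    """Concrete implementation backed by a dictionary."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data[key]

    def write(self, key, value):
        self._data[key] = value
```

Calling `Storage()` raises `TypeError` because the abstract methods are unimplemented, while `MemoryStorage()` can be instantiated normally.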

concurrency

/kənˈkʌrənsi/

n. the property of making progress on several tasks during overlapping time periods

Concurrency allows multiple tasks to run in overlapping time periods.

Handling concurrency correctly is crucial for multi-threaded applications.
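One common way to handle concurrency correctly in a multi-threaded program is to guard shared state with a lock. The sketch below uses Python's `threading` module; the function name and its parameters are illustrative.

```python
import threading


def count_with_lock(num_threads=4, increments=10_000):
    """Increment a shared counter from several threads, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            # "total += 1" is a read-modify-write sequence; the lock makes it atomic.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait for every worker to finish
    return total
```

Without the lock, two threads could read the same old value and one increment would be lost, so the final count would not be guaranteed.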

serialization

/ˌsɪəriəlaɪˈzeɪʃn/

n. the conversion of an object into a format that can be stored or transmitted

Serialization converts an object into a stream of bytes for storage or transmission.

JSON is a popular format for data serialization.
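A minimal round-trip sketch using Python's standard `json` module. The helper names `serialize` and `deserialize` and the sample record are made up for this example.

```python
import json


def serialize(obj):
    """Encode a JSON-compatible object as UTF-8 bytes."""
    return json.dumps(obj).encode("utf-8")


def deserialize(data):
    """Decode UTF-8 JSON bytes back into Python objects."""
    return json.loads(data.decode("utf-8"))


record = {"word": "serialization", "tags": ["cs", "vocabulary"], "count": 3}
payload = serialize(record)      # bytes, ready to write to a file or socket
restored = deserialize(payload)  # equal to the original record
```

JSON covers only basic types (objects, arrays, strings, numbers, booleans, null), so values such as dates or custom classes need an explicit conversion first.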

asynchronous

/eɪˈsɪŋkrənəs/

adj. not occurring at the same time; not blocking the caller while an operation completes

Asynchronous programming allows the program to continue executing while waiting for I/O.

Use async/await syntax for cleaner asynchronous code.
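A short sketch of async/await using Python's `asyncio`. Here `asyncio.sleep` stands in for real I/O, and the `fetch` coroutine and its arguments are hypothetical.

```python
import asyncio


async def fetch(name, delay):
    """Pretend to wait on I/O; other coroutines run while this one is suspended."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # Both waits overlap, so the total time is close to the longest delay, not the sum.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))


results = asyncio.run(main())
```

`asyncio.gather` returns results in the order the awaitables were passed in, regardless of which one finishes first.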

deprecated

/ˈdeprəkeɪtɪd/

adj. still available but discouraged, and typically scheduled for removal

This method is deprecated and will be removed in the next version.

Avoid using deprecated APIs in new projects.
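In Python, a function is usually marked as deprecated by emitting a `DeprecationWarning` from the standard `warnings` module. The functions `old_area` and `new_area` below are hypothetical.

```python
import warnings


def new_area(width, height):
    """Preferred replacement."""
    return width * height


def old_area(width, height):
    """Deprecated wrapper kept so existing callers continue to work."""
    warnings.warn(
        "old_area is deprecated; use new_area instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return new_area(width, height)
```

The old function keeps working during the transition period, and callers get a warning pointing at their own code so they can migrate before it is removed.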