
Deep Tech

Selected Recent ArXiv Papers

MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

Shaden Alshammari, Kevin Wen, Abrar Zainal 2026-04-20

Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing benchmarks are limited in size, language coverage, and task diversity. We introduce MathNet, a high-quality, large-scale, multimodal, and multilingual dataset of Olympiad-level math problems together with a benchmark for evaluating mathematical reasoning in generative models and mathematical retrieval in embedding-based systems. MathNet spans 47 countries, 17 languages, and two decades of competitions, comprising 30,676 expert-authored problems with solutions across diverse domains. In addition to the core dataset, we construct a retrieval benchmark consisting of mathematically equivalent and structurally similar problem pairs curated by human experts. MathNet supports three tasks: (i) Problem Solving, (ii) Math-Aware Retrieval, and (iii) Retrieval-Augmented Problem Solving. Experimental results show that even state-of-the-art reasoning models (78.4% for Gemini-3.1-Pro and 69.3% for GPT-5) remain challenged, while embedding models struggle to retrieve equivalent problems. We further show that retrieval-augmented generation performance is highly sensitive to retrieval quality; for example, DeepSeek-V3.2-Speciale achieves gains of up to 12%, obtaining the highest scores on the benchmark. MathNet provides the largest high-quality Olympiad dataset together with the first benchmark for evaluating mathematical problem retrieval, and we publicly release both the dataset and benchmark at https://mathnet.mit.edu.



Sessa: Selective State Space Attention

Liubomyr Horbatko 2026-04-20

Modern sequence models are dominated by Transformers, where self-attention mixes information from the visible context in an input-dependent way. However, when retrieval is not sharp and attention remains diffuse over an effective support $S_{\mathrm{eff}}(t)$, the influence of any individual token is diluted, typically scaling as $O(1/S_{\mathrm{eff}}(t))$ and reaching $O(1/\ell)$ for old tokens in full-prefix settings. Structured state-space models process sequences recurrently through an explicit feedback path; selective variants such as Mamba make this feedback input-dependent, yet when freeze time cannot be sustained over long intervals, their long-range sensitivity decays exponentially with lag. Existing architectures therefore either retrieve from the past in a single read or propagate information through a single feedback chain. We introduce Sessa, a decoder that places attention inside a feedback path, enabling recurrent many-path aggregation within a layer. Under stated assumptions, Sessa admits regimes with a power-law memory tail in lag $\ell$ of order $O(\ell^{-\beta})$ for $0<\beta<1$, which is asymptotically slower than $1/\ell$; moreover, this rate is tight in an explicit diffuse uniform-routing setting where the influence is $\Theta(\ell^{-\beta})$. Under the same conditions, only Sessa among the compared model classes realizes flexible selective retrieval, including non-decaying profiles. Empirically, under matched architectures and training budgets, Sessa achieves the strongest performance on our long-context benchmarks while remaining competitive with Transformer- and Mamba-style baselines on short-context language modeling.
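The $O(1/\ell)$ dilution claim can be seen in a toy softmax step (a generic illustration of diffuse attention, not the Sessa architecture): when all scores are flat, softmax spreads weight uniformly, so each of the $\ell$ prefix tokens contributes only $1/\ell$.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Diffuse retrieval: equal scores give uniform attention, so every
# token's influence scales as O(1/l) with prefix length l.
for l in (4, 16, 64):
    weights = softmax([0.0] * l)
    assert abs(weights[0] - 1.0 / l) < 1e-12
```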



Bounded Ratio Reinforcement Learning

Yunke Ao, Le Chen, Bruce D. Lee 2026-04-20

Proximal Policy Optimization (PPO) has become the predominant algorithm for on-policy reinforcement learning due to its scalability and empirical robustness across domains. However, there is a significant disconnect between the underlying foundations of trust region methods and the heuristic clipped objective used in PPO. In this paper, we bridge this gap by introducing the Bounded Ratio Reinforcement Learning (BRRL) framework. We formulate a novel regularized and constrained policy optimization problem and derive its analytical optimal solution. We prove that this solution ensures monotonic performance improvement. To handle parameterized policy classes, we develop a policy optimization algorithm called Bounded Policy Optimization (BPO) that minimizes an advantage-weighted divergence between the policy and the analytic optimal solution from BRRL. We further establish a lower bound on the expected performance of the resulting policy in terms of the BPO loss function. Notably, our framework also provides a new theoretical lens to interpret the success of the PPO loss, and connects trust region policy optimization and the Cross-Entropy Method (CEM). We additionally extend BPO to Group-relative BPO (GBPO) for LLM fine-tuning. Empirical evaluations of BPO across MuJoCo, Atari, and complex IsaacLab environments (e.g., Humanoid locomotion), and of GBPO for LLM fine-tuning tasks, demonstrate that BPO and GBPO generally match or outperform PPO and GRPO in stability and final performance.
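For reference, the "heuristic clipped objective" the abstract contrasts with trust-region theory is the standard PPO surrogate; a minimal per-sample sketch of that textbook form (not the paper's BRRL/BPO):

```python
def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate (to be maximized), one sample.

    ratio     -- pi_new(a|s) / pi_old(a|s), the policy probability ratio
    advantage -- advantage estimate A(s, a)
    eps       -- clip range; ratios outside [1-eps, 1+eps] earn no extra credit
    """
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)

# With positive advantage, pushing the ratio past 1+eps yields no further gain.
assert abs(ppo_clip_loss(1.5, 1.0) - 1.2) < 1e-12   # clipped at 1 + 0.2
assert abs(ppo_clip_loss(1.1, 1.0) - 1.1) < 1e-12   # inside the band, unclipped
```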



GitHub Trending

Recent Trending AI Projects

First Principles Thinking



Break a complex problem down into its most fundamental elements, then rebuild the solution from the ground up. Reason from basic truths rather than relying on analogy or prior experience.

Example: When building batteries, Elon Musk refused to accept the assumption that "batteries are just expensive." Instead, he analyzed the raw-material costs and found they could be reduced dramatically.

— Aristotle / Elon Musk

Occam's Razor



Entities should not be multiplied without necessity. Among competing hypotheses, choose the one with the fewest assumptions. Complicated explanations often conceal errors.

Example: When you hear hoofbeats, think of horses before zebras, unless there is clear evidence pointing to the rarer case.

— William of Ockham (14th century)

Second-Order Thinking



Consider not only the direct consequences of an action but also the chain reactions those consequences trigger. Ask yourself: "And then what? And then what after that?"

Example: A promotional price cut boosts short-term sales (first-order effect), but it may damage the brand and spark a price war (second-order effect).

— Howard Marks

The only way to do great work is to love what you do.


— Steve Jobs

In the middle of difficulty lies opportunity.


— Albert Einstein

The best time to plant a tree was 20 years ago. The second best time is now.


— Chinese Proverb

Without accumulating single steps, one cannot travel a thousand li; without accumulating small streams, one cannot form rivers and seas.

— Xunzi

Observe widely but take the essence; accumulate deeply, then release sparingly.

— Su Shi

Example

Stay hungry, stay foolish.

The people who are crazy enough to think they can change the world are the ones who do.

Here's to the crazy ones. The misfits. The rebels. The troublemakers. The round pegs in the square holes.

They're not fond of rules. And they have no respect for the status quo.

You can quote them, disagree with them, glorify or vilify them. But the only thing you can't do is ignore them.

Because they change things. They push the human race forward.

algorithm

/ˈælɡəˌrɪðəm/

n. algorithm; a rule or procedure for computation

The sorting algorithm runs in O(n log n) time complexity.

We need to optimize this algorithm for better performance.

recursion

/rɪˈkɜːrʒn/

n. recursion; a looping, self-referential process

Recursion is a technique where a solution depends on solutions to smaller instances of the same problem.

Be careful with recursion to avoid stack overflow.
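Both example sentences can be seen in one short sketch: a recursive factorial (the solution built from a smaller instance) plus Python's guard against runaway recursion.

```python
def factorial(n):
    """Classic recursion: the answer depends on a smaller instance of itself."""
    if n <= 1:          # base case stops the recursion
        return 1
    return n * factorial(n - 1)

assert factorial(5) == 120

# Without a reachable base case, depth grows until Python raises
# RecursionError rather than overflowing the C stack.
try:
    factorial(10**6)
except RecursionError:
    pass
```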

encapsulation

/ɪnˌkæpsjuˈleɪʃn/

n. encapsulation; wrapping

Encapsulation hides the internal state of an object from the outside.

Good encapsulation leads to more maintainable code.
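A minimal Python sketch of the idea (the `Account` class is illustrative, my own invention): internal state stays behind methods instead of being touched directly.

```python
class Account:
    """Encapsulation: the balance is internal; access goes through methods."""
    def __init__(self):
        self._balance = 0          # leading underscore marks internal state

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    @property
    def balance(self):             # read-only view of the hidden state
        return self._balance

acct = Account()
acct.deposit(50)
assert acct.balance == 50
```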

polymorphism

/ˌpɒliˈmɔːfɪzəm/

n. polymorphism

Polymorphism allows objects of different classes to be treated as objects of a common superclass.

Method overriding is a common way to implement polymorphism.
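A short sketch of polymorphism through method overriding (class names are illustrative): one call site handles different classes via their common superclass.

```python
class Shape:
    def area(self):
        raise NotImplementedError

class Square(Shape):
    def __init__(self, side):
        self.side = side
    def area(self):                       # overrides Shape.area
        return self.side ** 2

class Circle(Shape):
    def __init__(self, r):
        self.r = r
    def area(self):                       # same interface, different behavior
        return 3.14159 * self.r ** 2

shapes = [Square(2), Circle(1)]
areas = [s.area() for s in shapes]        # one call site, many classes
assert areas[0] == 4
```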

inheritance

/ɪnˈherɪtəns/

n. inheritance; heredity

Inheritance enables new classes to receive the properties of existing classes.

Multiple inheritance can lead to the diamond problem.
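The diamond problem from the second sentence, in miniature (toy class names): `D` inherits `A` along two paths, and Python resolves the ambiguity deterministically via the method resolution order (MRO).

```python
class A:
    def greet(self):
        return "A"

class B(A):
    def greet(self):
        return "B"

class C(A):
    def greet(self):
        return "C"

class D(B, C):      # diamond: D reaches A via both B and C
    pass

# The MRO (D -> B -> C -> A) decides which greet() wins.
assert D().greet() == "B"
assert [cls.__name__ for cls in D.__mro__] == ["D", "B", "C", "A", "object"]
```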

abstraction

/æbˈstrækʃn/

n. abstraction; extraction

Abstraction reduces complexity by hiding unnecessary details.

An abstract class cannot be instantiated directly.
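Both sentences in one sketch using the standard-library `abc` module (the `Storage` classes are illustrative): callers see only `save()`, and the abstract base refuses direct instantiation.

```python
from abc import ABC, abstractmethod

class Storage(ABC):
    """Abstraction: callers depend on save(), not on storage details."""
    @abstractmethod
    def save(self, data): ...

class MemoryStorage(Storage):
    def __init__(self):
        self.items = []
    def save(self, data):
        self.items.append(data)

try:
    Storage()                 # abstract class cannot be instantiated directly
except TypeError:
    pass

store = MemoryStorage()
store.save("x")
assert store.items == ["x"]
```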

concurrency

/kənˈkʌrənsi/

n. concurrency

Concurrency allows multiple tasks to run in overlapping time periods.

Handling concurrency correctly is crucial for multi-threaded applications.
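A small `threading` sketch of "handling concurrency correctly" (the worker and counts are illustrative): several threads run in overlapping time, and a lock keeps the shared counter consistent.

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:            # the lock keeps each increment atomic
            counter += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                  # overlapping execution, correct final count
assert counter == 4000
```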

serialization

/ˌsɪəriəlaɪˈzeɪʃn/

n. serialization

Serialization converts an object into a stream of bytes for storage.

JSON is a popular format for data serialization.
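The two sentences combined, using the standard-library `json` module (the sample record is made up): an object round-trips through a serialized string.

```python
import json

record = {"id": 7, "tags": ["math", "ml"]}
payload = json.dumps(record)      # object -> serialized text for storage
restored = json.loads(payload)    # serialized text -> object again
assert isinstance(payload, str)
assert restored == record         # lossless round trip
```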

asynchronous

/eɪˈsɪŋkrənəs/

adj. asynchronous

Asynchronous programming allows the program to continue executing while waiting for I/O.

Use async/await syntax for cleaner asynchronous code.
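A minimal async/await sketch with `asyncio` (the `fetch` coroutine stands in for real I/O): while one "request" waits, the other makes progress.

```python
import asyncio

async def fetch(name, delay):
    await asyncio.sleep(delay)     # yields control instead of blocking
    return name

async def main():
    # Both waits overlap, so total time is roughly the longer delay.
    return await asyncio.gather(fetch("a", 0.01), fetch("b", 0.02))

assert asyncio.run(main()) == ["a", "b"]
```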

deprecated

/ˈdeprəkeɪtɪd/

adj. deprecated; no longer recommended

This method is deprecated and will be removed in the next version.

Avoid using deprecated APIs in new projects.
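A sketch of how a deprecated function typically behaves, using the standard-library `warnings` module (`old_api`/`new_api` are hypothetical names): it still works, but warns callers to migrate before removal.

```python
import warnings

def new_api():
    return 42

def old_api():
    """Deprecated: warns callers so they migrate before removal."""
    warnings.warn("old_api is deprecated; use new_api instead",
                  DeprecationWarning, stacklevel=2)
    return new_api()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert old_api() == 42                  # still functional for now
assert issubclass(caught[0].category, DeprecationWarning)
```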