Deep Tech

Latest ArXiv Paper Picks

Modeling LLM Agent Reviewer Dynamics in Elo-Ranked Review System

Hsiang-Wei Huang, Junbin Lu, Kuang-Ming Chen 2026-01-13

In this work, we explore the Large Language Model (LLM) agent reviewer dynamics in an Elo-ranked review system using real-world conference paper submissions. Multiple LLM agent reviewers with different personas engage in multi-round review interactions moderated by an Area Chair. We compare a baseline setting with conditions that incorporate Elo ratings and reviewer memory. Our simulation results showcase several interesting findings, including how incorporating Elo improves Area Chair decision accuracy, as well as reviewers' adaptive review strategy that exploits our Elo system without improving review effort. Our code is available at https://github.com/hsiangwei0903/EloReview.


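For readers unfamiliar with Elo ratings, the sketch below shows the standard Elo update used in chess. It is only an illustration: the abstract does not say how the paper scores reviewers against each other, and the K-factor of 32, the 400-point scale, and the function names are assumptions taken from the conventional formulation, not from the EloReview code.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one pairwise comparison.

    score_a is 1.0 if A is judged better, 0.5 for a tie, 0.0 otherwise.
    k = 32 is a conventional default, assumed here for illustration.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    print(elo_update(1500.0, 1500.0, score_a=1.0))
```

Because the update is zero-sum, a rating can rise simply by being ranked above peers, which is one reason a rated system can be gamed without any extra effort.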

Motion Attribution for Video Generation

Xindi Wu, Despoina Paschalidou, Jun Gao 2026-01-13

Despite the rapid progress of video generation models, the role of data in influencing motion is poorly understood. We present Motive (MOTIon attribution for Video gEneration), a motion-centric, gradient-based data attribution framework that scales to modern, large, high-quality video datasets and models. We use this to study which fine-tuning clips improve or degrade temporal dynamics. Motive isolates temporal dynamics from static appearance via motion-weighted loss masks, yielding efficient and scalable motion-specific influence computation. On text-to-video models, Motive identifies clips that strongly affect motion and guides data curation that improves temporal consistency and physical plausibility. With Motive-selected high-influence data, our method improves both motion smoothness and dynamic degree on VBench, achieving a 74.1% human preference win rate compared with the pretrained base model. To our knowledge, this is the first framework to attribute motion rather than visual appearance in video generative models and to use it to curate fine-tuning data.



MemRec: Collaborative Memory-Augmented Agentic Recommender System

Weixin Chen, Yuhan Zhao, Jingyuan Huang 2026-01-13

The evolution of recommender systems has shifted preference storage from rating matrices and dense embeddings to semantic memory in the agentic era. Yet existing agents rely on isolated memory, overlooking crucial collaborative signals. Bridging this gap is hindered by the dual challenges of distilling vast graph contexts without overwhelming reasoning agents with cognitive load, and evolving the collaborative memory efficiently without incurring prohibitive computational costs. To address this, we propose MemRec, a framework that architecturally decouples reasoning from memory management to enable efficient collaborative augmentation. MemRec introduces a dedicated, cost-effective LM_Mem to manage a dynamic collaborative memory graph, serving synthesized, high-signal context to a downstream LLM_Rec. The framework operates via a practical pipeline featuring efficient retrieval and cost-effective asynchronous graph propagation that evolves memory in the background. Extensive experiments on four benchmarks demonstrate that MemRec achieves state-of-the-art performance. Furthermore, architectural analysis confirms its flexibility, establishing a new Pareto frontier that balances reasoning quality, cost, and privacy through support for diverse deployments, including local open-source models. Code: https://github.com/rutgerswiselab/memrec and Homepage: https://memrec.weixinchen.com



GitHub Trending

Recently Trending AI Projects

First Principles Thinking

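Break a complex problem down into its most basic elements, then rebuild the solution from the ground up. Reason from fundamental truths rather than from analogy or prior experience.

Example: when building batteries, Elon Musk did not accept the assumption that "batteries are just expensive". He analyzed the cost of the raw materials and found that the cost could be cut substantially.

— Aristotle / Elon Musk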

Occam's Razor

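Entities should not be multiplied beyond necessity. Among competing hypotheses, choose the simplest one, the one that makes the fewest assumptions. Complicated explanations often hide errors.

Example: when you hear hoofbeats, think of horses first, not zebras, unless there is clear evidence of the rarer case.

— William of Ockham (14th century)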

Second-Order Thinking

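Consider not only the direct consequences of an action but also the chain reactions those consequences set off. Ask yourself: "And then what? And after that?"

Example: a discount promotion increases short-term sales (first-order effect), but it may damage the brand image and trigger a price war (second-order effects).

— Howard Marks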

The only way to do great work is to love what you do.

— Steve Jobs

In the middle of difficulty lies opportunity.

— Albert Einstein

The best time to plant a tree was 20 years ago. The second best time is now.

— Chinese Proverb

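Without accumulating small steps, one cannot travel a thousand li; without gathering small streams, there can be no rivers or seas.

— Xunzi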

Read widely but take selectively; accumulate deeply but release sparingly.

— Su Shi

Example

Stay hungry, stay foolish

The people who are crazy enough to think

they can change the world

are the ones who do

Here's to the crazy ones

The misfits, the rebels

The troublemakers

The round pegs in the square holes

They're not fond of rules

And they have no respect for the status quo

You can quote them, disagree with them

Glorify or vilify them

But the only thing you can't do

is ignore them

Because they change things

They push the human race forward

algorithm

/ˈælɡəˌrɪðəm/

n. a step-by-step procedure for solving a problem or performing a computation

The sorting algorithm runs in O(n log n) time.

We need to optimize this algorithm for better performance.
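As one concrete instance of an O(n log n) sorting algorithm, here is a minimal merge sort sketch. The function name `merge_sort` is chosen for illustration; the entry above does not say which sorting algorithm it has in mind.

```python
def merge_sort(items):
    """Return a new sorted list. Halving gives log n levels, each merged in O(n)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


if __name__ == "__main__":
    print(merge_sort([5, 2, 9, 1, 5, 6]))
```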

recursion

/rɪˈkɜːrʒn/

n. defining or solving something in terms of smaller instances of itself

Recursion is a method where the solution depends on solutions to smaller instances of the same problem.

Be careful with recursion to avoid stack overflow.
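A small sketch of both sentences above, using factorial as an assumed example: the recursive version depends on the answer for `n - 1`, and the iterative version avoids deep call stacks. In CPython the default recursion limit is 1000 frames, and exceeding it raises `RecursionError`.

```python
def factorial(n: int) -> int:
    """Recursive factorial: the result for n depends on the result for n - 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n <= 1:  # base case stops the recursion
        return 1
    return n * factorial(n - 1)


def factorial_iterative(n: int) -> int:
    """Same result with a loop, so the call stack does not grow with n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


if __name__ == "__main__":
    print(factorial(5), factorial_iterative(5))
```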

encapsulation

/ɪnˌkæpsjuˈleɪʃn/

n. bundling data with the code that operates on it and hiding the internals; packaging

Encapsulation hides the internal state of an object from the outside.

Good encapsulation leads to more maintainable code.
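A minimal sketch of encapsulation in Python, using a hypothetical `Counter` class. Python enforces privacy only by convention (the leading underscore), so the read-only property is what actually keeps callers from overwriting the state.

```python
class Counter:
    """Counts events; the stored value can only change through increment()."""

    def __init__(self) -> None:
        self._count = 0  # leading underscore: internal by convention

    @property
    def count(self) -> int:
        """Read-only view of the internal state (no setter is defined)."""
        return self._count

    def increment(self) -> None:
        self._count += 1


if __name__ == "__main__":
    c = Counter()
    c.increment()
    print(c.count)
```

Assigning to `c.count` raises `AttributeError`, because the property has no setter.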

polymorphism

/ˌpɒliˈmɔːfɪzəm/

n. the ability of one interface to work with objects of different types

Polymorphism allows objects of different classes to be treated as objects of a common superclass.

Method overriding is a common way to implement polymorphism.
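The sketch below illustrates method overriding with hypothetical `Shape`, `Circle`, and `Square` classes: `total_area` only knows about the superclass, yet each object runs its own `area` implementation.

```python
import math


class Shape:
    def area(self) -> float:
        raise NotImplementedError("subclasses must override area()")


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:  # overrides Shape.area
        return math.pi * self.radius ** 2


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:  # overrides Shape.area
        return self.side ** 2


def total_area(shapes) -> float:
    """Treat every object as a Shape; the right area() is chosen at run time."""
    return sum(shape.area() for shape in shapes)


if __name__ == "__main__":
    print(total_area([Circle(1.0), Square(2.0)]))
```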

inheritance

/ɪnˈherɪtəns/

n. deriving a new class from an existing one; (biology) heredity

Inheritance enables new classes to inherit the properties of existing classes.

Multiple inheritance can lead to the diamond problem.
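To make the diamond problem concrete, the sketch below builds a diamond from four hypothetical classes. Python resolves the ambiguity with its method resolution order (MRO); other languages handle it differently, so this shows Python's behavior only.

```python
class Base:
    def who(self) -> str:
        return "Base"


class Left(Base):
    def who(self) -> str:
        return "Left"


class Right(Base):
    def who(self) -> str:
        return "Right"


class Bottom(Left, Right):
    """Inherits from both Left and Right, which share the ancestor Base."""


# The MRO lists the order in which Python searches classes for a method.
mro_names = [cls.__name__ for cls in Bottom.__mro__]

if __name__ == "__main__":
    print(mro_names)
    print(Bottom().who())
```

Swapping the base order to `class Bottom(Right, Left)` would make `who()` resolve to `Right` instead, which is why the order of base classes matters.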

abstraction

/æbˈstrækʃn/

n. exposing the essentials while hiding detail; extraction

Abstraction reduces complexity by hiding unnecessary details.

An abstract class cannot be instantiated directly.
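A minimal sketch using Python's `abc` module, with hypothetical `Storage` and `MemoryStorage` classes: instantiating the abstract class raises `TypeError`, while a subclass that implements every abstract method works normally.

```python
from abc import ABC, abstractmethod


class Storage(ABC):
    """Abstract interface: callers see read/write, not how data is kept."""

    @abstractmethod
    def read(self, key: str):
        ...

    @abstractmethod
    def write(self, key: str, value) -> None:
        ...


class MemoryStorage(Storage):
    """Concrete implementation backed by a plain dictionary."""

    def __init__(self) -> None:
        self._data = {}

    def read(self, key: str):
        return self._data.get(key)

    def write(self, key: str, value) -> None:
        self._data[key] = value


if __name__ == "__main__":
    store = MemoryStorage()
    store.write("answer", 42)
    print(store.read("answer"))
```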

concurrency

/kənˈkʌrənsi/

n. the property of multiple tasks making progress during overlapping time periods

Concurrency allows multiple tasks to run in overlapping time periods.

Handling concurrency correctly is crucial for multi-threaded applications.
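One common meaning of "handling concurrency correctly" is protecting shared state. The sketch below, with a hypothetical `run_workers` helper, has several threads increment one counter under a `threading.Lock` so that no update is lost.

```python
import threading


def run_workers(num_threads: int = 4, increments: int = 10_000) -> int:
    """Increment a shared counter from several threads and return the total."""
    counter = 0
    lock = threading.Lock()

    def work() -> None:
        nonlocal counter
        for _ in range(increments):
            # The lock makes the read-modify-write step atomic across threads.
            with lock:
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait for every worker before reading the result
    return counter


if __name__ == "__main__":
    print(run_workers())
```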

serialization

/ˌsɪəriəlaɪˈzeɪʃn/

n. converting data into a format that can be stored or transmitted

Serialization converts an object into a stream of bytes for storage.

JSON is a popular format for data serialization.
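A short round trip with Python's standard `json` module illustrates both sentences: the object becomes bytes that could be written to disk, and the bytes are turned back into an equal object. The `record` contents are made up for the example.

```python
import json

record = {"id": 7, "tags": ["a", "b"], "active": True}

# Serialize: Python object -> JSON text -> UTF-8 bytes ready for storage.
payload = json.dumps(record).encode("utf-8")

# Deserialize: bytes -> JSON text -> Python object.
restored = json.loads(payload.decode("utf-8"))

if __name__ == "__main__":
    print(payload)
    print(restored == record)
```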

asynchronous

/eɪˈsɪŋkrənəs/

adj. not synchronized; not blocking while waiting for an operation to finish

Asynchronous programming allows the program to continue executing while waiting for I/O.

Use async/await syntax for cleaner asynchronous code.
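A minimal `asyncio` sketch of the async/await syntax. The `fetch` coroutine is hypothetical and uses `asyncio.sleep` as a stand-in for real I/O; while one call is waiting, the event loop runs the other.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    """Pretend to wait on I/O; control returns to the event loop meanwhile."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list:
    # Both coroutines wait concurrently; gather returns results in call order.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))


if __name__ == "__main__":
    print(asyncio.run(main()))
```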

deprecated

/ˈdeprəkeɪtɪd/

adj. still available but discouraged and slated for removal

This method is deprecated and will be removed in the next version.

Avoid using deprecated APIs in new projects.
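In Python, one conventional way to mark a function as deprecated is to emit a `DeprecationWarning` from the standard `warnings` module. The function names `old_area` and `new_area` below are hypothetical.

```python
import warnings


def new_area(width: float, height: float) -> float:
    """The replacement API that new code should call."""
    return width * height


def old_area(width: float, height: float) -> float:
    """Deprecated wrapper kept so that existing callers keep working."""
    warnings.warn(
        "old_area() is deprecated; use new_area() instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this wrapper
    )
    return new_area(width, height)


if __name__ == "__main__":
    warnings.simplefilter("always")  # make sure the warning is shown
    print(old_area(2, 3))
```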