Modeling LLM Agent Reviewer Dynamics in Elo-Ranked Review System
In this work, we explore Large Language Model (LLM) agent reviewer dynamics in an Elo-ranked review system using real-world conference paper submissions. Multiple LLM agent reviewers with different personas engage in multi-round review interactions moderated by an Area Chair. We compare a baseline setting with conditions that incorporate Elo ratings and reviewer memory. Our simulation results reveal several notable findings, including that incorporating Elo improves Area Chair decision accuracy, and that reviewers adopt adaptive review strategies that exploit the Elo system without increasing review effort. Our code is available at https://github.com/hsiangwei0903/EloReview.
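The abstract references Elo ratings but does not define the update rule. Below is a minimal sketch of the standard Elo formula as it might apply to pairwise reviewer comparisons; the function name `elo_update`, the K-factor of 32, and the "win" semantics (e.g., a reviewer's recommendation matching the final decision while the other's does not) are illustrative assumptions, not the paper's stated method.

```python
def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated Elo ratings after one pairwise comparison.

    score_a is 1.0 if reviewer A "wins" the comparison, 0.0 if it
    loses, and 0.5 for a tie. The win criterion here (agreement with
    the final decision) is a hypothetical choice for illustration.
    """
    # Expected score of A under the standard Elo logistic model.
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Example: two reviewers start at a conventional baseline of 1500;
# reviewer A "wins" one comparison.
r1, r2 = elo_update(1500.0, 1500.0, score_a=1.0)
print(r1, r2)  # 1516.0, 1484.0
```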