Robustness Is a Function, Not a Number: A Factorized Comprehensive Study of OOD Robustness in Vision-Based Driving
Out-of-distribution (OOD) robustness in autonomous driving is often reduced to a single number, which hides what actually breaks a policy. We decompose environments along five axes: scene (rural/urban), season, weather, time of day (day/night), and agent mix. We then measure performance under controlled $k$-factor perturbations ($k \in \{0,1,2,3\}$). Using closed-loop control in VISTA, we benchmark FC, CNN, and ViT policies, train compact ViT heads on frozen foundation-model (FM) features, and vary in-distribution (ID) support in scale, diversity, and temporal context. Key findings: (1) ViT policies are markedly more OOD-robust than comparably sized CNN and FC policies, and FM features yield state-of-the-art success at a latency cost. (2) Naive temporal inputs (multi-frame) do not beat the best single-frame baseline. (3) The largest single-factor drops are rural $\rightarrow$ urban and day $\rightarrow$ night ($\sim 31\%$ each); actor swaps cost $\sim 10\%$ and moderate rain $\sim 7\%$; season shifts can be drastic, and combining a time flip with other changes degrades performance further. (4) FM-feature policies stay above $85\%$ under three simultaneous changes; non-FM single-frame policies take a large hit at the first shift, and all non-FM models fall below $50\%$ by three changes. (5) Interactions are non-additive: some pairings partially offset each other, whereas season-time combinations are especially harmful. (6) Training on winter/snow is most robust to single-factor shifts, while a rural+summer baseline gives the best overall OOD performance. (7) Scaling traces/views improves robustness ($+11.8$ points from $5$ to $14$ traces), yet targeted exposure to hard conditions can substitute for scale. (8) Using multiple ID environments broadens coverage and strengthens weak cases (urban OOD $60.6\% \rightarrow 70.1\%$) with a small ID drop; a single ID environment preserves peak performance, but only within a narrow domain. These results yield actionable design rules for OOD-robust driving policies.
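To make the factorized protocol concrete, the following is a minimal sketch of how $k$-factor perturbations of an ID environment could be enumerated, together with a simple additivity check for two-factor interactions. It is not the authors' code. The axis values are assumptions for illustration (the abstract only spells out rural/urban and day/night), and the names `AXES`, `ID_ENV`, `k_factor_shifts`, and `interaction` are hypothetical.

```python
from itertools import combinations, product

# Five axes from the abstract. The concrete values per axis are assumed
# for illustration; only rural/urban and day/night are stated explicitly.
AXES = {
    "scene": ["rural", "urban"],
    "season": ["summer", "fall", "winter", "spring"],
    "weather": ["dry", "rain"],
    "time": ["day", "night"],
    "actors": ["car", "other"],
}

# Hypothetical ID environment (rural + summer is one baseline named in the text).
ID_ENV = {
    "scene": "rural",
    "season": "summer",
    "weather": "dry",
    "time": "day",
    "actors": "car",
}


def k_factor_shifts(id_env, k, axes=AXES):
    """Yield every environment that differs from id_env on exactly k axes."""
    for changed in combinations(sorted(axes), k):
        # For each changed axis, consider every value except the ID one.
        alternatives = [[v for v in axes[a] if v != id_env[a]] for a in changed]
        for values in product(*alternatives):
            env = dict(id_env)
            env.update(zip(changed, values))
            yield env


def interaction(drop_a, drop_b, drop_ab):
    """Deviation from additivity for a two-factor shift, in percentage points.

    Positive: the combined shift hurts more than the two single drops summed.
    Negative: the two shifts partially offset each other.
    """
    return drop_ab - (drop_a + drop_b)


# Number of test environments at each perturbation order k = 0..3.
counts = {k: sum(1 for _ in k_factor_shifts(ID_ENV, k)) for k in range(4)}
print(counts)
```

With these assumed axis values, each order $k$ defines a distinct test set, so success rate can be reported as a function of $k$ and of which axes changed rather than as one pooled number. The `interaction` helper only expresses the additivity check; the abstract does not report the combined-shift drops needed to evaluate it.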