
Deep Tech

Curated latest ArXiv papers

Robustness Is a Function, Not a Number: A Factorized Comprehensive Study of OOD Robustness in Vision-Based Driving

Amir Mallak, Alaa Maalouf 2026-02-09

Out-of-distribution (OOD) robustness in autonomous driving is often reduced to a single number, hiding what breaks a policy. We decompose environments along five axes: scene (rural/urban), season, weather, time (day/night), and agent mix; and measure performance under controlled $k$-factor perturbations ($k \in \{0,1,2,3\}$). Using closed-loop control in VISTA, we benchmark FC, CNN, and ViT policies, train compact ViT heads on frozen foundation-model (FM) features, and vary ID support in scale, diversity, and temporal context. (1) ViT policies are markedly more OOD-robust than comparably sized CNN/FC, and FM features yield state-of-the-art success at a latency cost. (2) Naive temporal inputs (multi-frame) do not beat the best single-frame baseline. (3) The largest single-factor drops are rural $\rightarrow$ urban and day $\rightarrow$ night ($\sim 31\%$ each); actor swaps $\sim 10\%$, moderate rain $\sim 7\%$; season shifts can be drastic, and combining a time flip with other changes further degrades performance. (4) FM-feature policies stay above $85\%$ under three simultaneous changes; non-FM single-frame policies take a large first-shift hit, and all non-FM models fall below $50\%$ by three changes. (5) Interactions are non-additive: some pairings partially offset, whereas season-time combinations are especially harmful. (6) Training on winter/snow is most robust to single-factor shifts, while a rural+summer baseline gives the best overall OOD performance. (7) Scaling traces/views improves robustness ($+11.8$ points from $5$ to $14$ traces), yet targeted exposure to hard conditions can substitute for scale. (8) Using multiple ID environments broadens coverage and strengthens weak cases (urban OOD $60.6\% \rightarrow 70.1\%$) with a small ID drop; single-ID preserves peak performance but in a narrow domain. These results yield actionable design rules for OOD-robust driving policies.



CIC-Trap4Phish: A Unified Multi-Format Dataset for Phishing and Quishing Attachment Detection

Fatemeh Nejati, Mahdi Rabbani, Mansur Mirani 2026-02-09

Phishing attacks represent one of the primary attack methods used by cyber attackers. In many cases, attackers use deceptive emails along with malicious attachments to trick users into giving away sensitive information or installing malware, compromising entire systems. The flexibility of malicious email attachments makes them stand out as a preferred vector for attackers, as they can embed harmful content such as malware or malicious URLs inside standard document formats. Although phishing email defenses have improved considerably, attackers continue to abuse attachments, enabling malicious content to bypass security measures. Moreover, another challenge researchers face in training advanced models is the lack of a unified, comprehensive dataset covering the most prevalent data types. To address this gap, we generated CIC-Trap4Phish, a multi-format dataset containing both malicious and benign samples across five categories commonly used in phishing campaigns: Microsoft Word documents, Excel spreadsheets, PDF files, HTML pages, and QR code images. For the first four file types, an execution-free static feature pipeline was proposed, designed to capture structural, lexical, and metadata-based indicators without the need to open or execute files. Feature selection was performed using a combination of SHAP analysis and feature importance, yielding compact, discriminative feature subsets for each file type. The selected features were evaluated using lightweight machine learning models, including Random Forest, XGBoost, and Decision Tree. All models demonstrate high detection accuracy across formats. For QR code-based phishing (quishing), two complementary methods were implemented: image-based detection employing Convolutional Neural Networks (CNNs) and lexical analysis of decoded URLs using recent lightweight language models.



ArcFlow: Unleashing 2-Step Text-to-Image Generation via High-Precision Non-Linear Flow Distillation

Zihan Yang, Shuyuan Tu, Licheng Zhang 2026-02-09

Diffusion models have achieved remarkable generation quality, but they suffer from significant inference cost due to their reliance on multiple sequential denoising steps, motivating recent efforts to distill this inference process into a few-step regime. However, existing distillation methods typically approximate the teacher trajectory using linear shortcuts, which makes it difficult to match its constantly changing tangent directions as velocities evolve across timesteps, thereby leading to quality degradation. To address this limitation, we propose ArcFlow, a few-step distillation framework that explicitly employs non-linear flow trajectories to approximate pre-trained teacher trajectories. Concretely, ArcFlow parameterizes the velocity field underlying the inference trajectory as a mixture of continuous momentum processes. This enables ArcFlow to capture velocity evolution and extrapolate coherent velocities to form a continuous non-linear trajectory within each denoising step. Importantly, this parameterization admits an analytical integration of the non-linear trajectory, which circumvents numerical discretization errors and yields a high-precision approximation of the teacher trajectory. To train this parameterization into a few-step generator, we implement ArcFlow via trajectory distillation on pre-trained teacher models using lightweight adapters. This strategy ensures fast, stable convergence while preserving generative diversity and quality. Built on large-scale models (Qwen-Image-20B and FLUX.1-dev), ArcFlow fine-tunes less than 5% of the original parameters and achieves a 40x speedup with 2 NFEs over the original multi-step teachers without significant quality degradation. Experiments on benchmarks show the effectiveness of ArcFlow both qualitatively and quantitatively.



GitHub Trending

Recent trending AI projects

First Principles Thinking



Break a complex problem down into its most fundamental elements, then rebuild the solution from the ground up. Reason from basic truths rather than relying on analogy or received experience.

Example: when building batteries, Elon Musk did not accept the assumption that "batteries are just expensive"; instead he analyzed the raw-material cost of a battery and found it could be reduced dramatically.

— Aristotle / Elon Musk

Occam's Razor



Entities should not be multiplied beyond necessity. Among competing hypotheses, prefer the one with the fewest assumptions. Complicated explanations often conceal errors.

Example: when you hear hoofbeats, think horses first, not zebras, unless there is clear evidence of the rarer case.

— William of Ockham (14th century)

Second-Order Thinking



Consider not only the immediate consequences of an action but also the chain reactions those consequences trigger. Ask yourself: "And then what? And then what after that?"

Example: a price cut boosts short-term sales (the first-order effect), but may damage the brand and spark a price war (the second-order effect).

— Howard Marks

The only way to do great work is to love what you do.


— Steve Jobs

In the middle of difficulty lies opportunity.


— Albert Einstein

The best time to plant a tree was 20 years ago. The second best time is now.


— Chinese Proverb

Without accumulating single steps, there is no way to reach a thousand li; without accumulating small streams, there is no way to form rivers and seas.

— Xunzi

Observe broadly but take selectively; accumulate deeply and release sparingly.

— Su Shi

Example

Stay hungry, stay foolish.

The people who are crazy enough to think they can change the world are the ones who do.

Here's to the crazy ones. The misfits, the rebels, the troublemakers, the round pegs in the square holes. They're not fond of rules, and they have no respect for the status quo. You can quote them, disagree with them, glorify or vilify them. But the only thing you can't do is ignore them. Because they change things. They push the human race forward.

algorithm

/ˈælɡəˌrɪðəm/

n. algorithm; computational rule

The sorting algorithm runs in O(n log n) time complexity.

We need to optimize this algorithm for better performance.
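As an illustration of the card above, here is a minimal merge sort in Python, a classic O(n log n) sorting algorithm (the function name is ours, for illustration only):

```python
def merge_sort(items):
    """Sort a list in O(n log n) time: halve, sort each half, merge."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    # Merge the two sorted halves in a single linear pass.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```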

recursion

/rɪˈkɜːrʒn/

n. recursion; looping

Recursion is a method where the solution depends on solutions to smaller instances.

Be careful with recursion to avoid stack overflow.
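A minimal sketch of the idea: each call solves a smaller instance, and the base case is what keeps the call stack from overflowing (the function name is illustrative):

```python
def factorial(n):
    """Compute n! recursively."""
    # Base case: stops the recursion. Without it, Python would raise
    # RecursionError once the call-stack limit is exceeded.
    if n <= 1:
        return 1
    return n * factorial(n - 1)  # recursive case: a smaller instance
```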

encapsulation

/ɪnˌkæpsjuˈleɪʃn/

n. encapsulation; packaging

Encapsulation hides the internal state of an object from the outside.

Good encapsulation leads to more maintainable code.
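A small hypothetical example of the idea in Python: the balance is internal, and callers can only change it through methods that enforce the invariants:

```python
class BankAccount:
    """Encapsulation: internal state is reached only through methods."""

    def __init__(self):
        self._balance = 0  # leading underscore marks it as internal

    def deposit(self, amount):
        # The method guards the invariant; direct field access would not.
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    @property
    def balance(self):
        return self._balance  # read-only view of the internal state
```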

polymorphism

/ˌpɒliˈmɔːfɪzəm/

n. polymorphism

Polymorphism allows objects of different classes to be treated as objects of a common superclass.

Method overriding is a common way to implement polymorphism.
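A sketch of both points from the card, using made-up shape classes: each subclass overrides `area`, and the caller treats every object through the common superclass interface:

```python
class Shape:
    def area(self):
        raise NotImplementedError

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):  # overrides Shape.area
        return self.side * self.side

class Rectangle(Shape):
    def __init__(self, w, h):
        self.w, self.h = w, h

    def area(self):  # overrides Shape.area
        return self.w * self.h

# Polymorphism: each element is used through the common interface,
# and the override appropriate to its concrete class is dispatched.
total = sum(shape.area() for shape in [Square(2), Rectangle(3, 4)])
```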

inheritance

/ɪnˈherɪtəns/

n. inheritance; heredity

Inheritance enables new classes to receive the properties of existing classes.

Multiple inheritance can lead to the diamond problem.
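The diamond problem mentioned above can be shown with four tiny classes (names are arbitrary). `D` inherits `A` along two paths; Python disambiguates with the C3 method resolution order (MRO):

```python
class A:
    def who(self):
        return "A"

class B(A):
    def who(self):
        return "B"

class C(A):
    def who(self):
        return "C"

class D(B, C):  # the "diamond": D inherits A via both B and C
    pass

# Which who() does D get? Python's MRO walks D, B, C, A, object,
# so D().who() resolves to B.who rather than being ambiguous.
```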

abstraction

/æbˈstrækʃn/

n. abstraction; extraction

Abstraction reduces complexity by hiding unnecessary details.

An abstract class cannot be instantiated directly.
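Both statements on the card can be demonstrated with Python's `abc` module (the `Storage` interface here is invented for illustration):

```python
from abc import ABC, abstractmethod

class Storage(ABC):
    """Abstract interface: callers need not know how data is persisted."""

    @abstractmethod
    def save(self, record):
        ...

class MemoryStorage(Storage):
    def __init__(self):
        self.records = []

    def save(self, record):  # concrete implementation of the contract
        self.records.append(record)
```

Calling `Storage()` raises `TypeError`, since an abstract class cannot be instantiated directly; only concrete subclasses like `MemoryStorage` can.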

concurrency

/kənˈkʌrənsi/

n. concurrency

Concurrency allows multiple tasks to run in overlapping time periods.

Handling concurrency correctly is crucial for multi-threaded applications.
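A minimal multi-threaded sketch of why correct handling matters: four threads increment a shared counter, and the lock is what keeps interleaved updates from being lost:

```python
import threading

counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in range(10_000):
        # Without the lock, read-modify-write interleavings between
        # threads could silently drop increments.
        with lock:
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter is now exactly 4 * 10_000
```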

serialization

/ˌsɪəriəlaɪˈzeɪʃn/

n. serialization

Serialization converts an object into a stream of bytes for storage.

JSON is a popular format for data serialization.
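The round trip described above, using Python's standard `json` module (the sample dictionary is made up):

```python
import json

user = {"name": "Ada", "logins": 3}
encoded = json.dumps(user)     # object -> JSON text (serialization)
decoded = json.loads(encoded)  # JSON text -> object (deserialization)
```

The encoded string can be written to disk or sent over a network, then reconstructed losslessly on the other side.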

asynchronous

/eɪˈsɪŋkrənəs/

adj. asynchronous

Asynchronous programming allows the program to continue executing while waiting for I/O.

Use async/await syntax for cleaner asynchronous code.
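A minimal async/await sketch with Python's `asyncio` (the `fetch` coroutine stands in for real I/O):

```python
import asyncio

async def fetch(tag, delay):
    await asyncio.sleep(delay)  # stands in for I/O; yields control
    return tag

async def main():
    # gather() runs both coroutines concurrently, so the total wait
    # is roughly max(delays), not their sum.
    return await asyncio.gather(fetch("a", 0.01), fetch("b", 0.01))

results = asyncio.run(main())
```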

deprecated

/ˈdeprəkeɪtɪd/

adj. deprecated; not recommended

This method is deprecated and will be removed in the next version.

Avoid using deprecated APIs in new projects.
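One common way a library signals deprecation in Python is a `DeprecationWarning`; a hypothetical sketch (`old_api`/`new_api` are invented names):

```python
import warnings

def new_api():
    return "ok"

def old_api():
    # Warn the caller, then delegate to the replacement so existing
    # code keeps working until the method is removed.
    warnings.warn("old_api() is deprecated; use new_api() instead",
                  DeprecationWarning, stacklevel=2)
    return new_api()
```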