Publications by Matt Fredrikson

Preprint

SecCodePRM: A Process Reward Model for Code Security

2026
Yu W, Mangal R, Luo Y, Hu K, He J, Pasareanu CS, Fredrikson M

Preprint

A Mixture of Linear Corrections Generates Secure Code

2025
Yu W, Mangal R, Zhuo T, Fredrikson M, Pasareanu CS

Preprint

Adversarial Attacks on Robotic Vision Language Action Models

2025
Jones EK, Robey A, Zou A, Ravichandran Z, Pappas GJ, Hassani H, Fredrikson M, Kolter JZ

Preprint

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

2025
Andriushchenko M, Souly A, Dziemian M, Duenas D, Lin M, Wang J, Hendrycks D, Zou A, Kolter Z, Fredrikson M, Winsor E, Wynne J, Gal Y, Davies X

Conference

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

2025 • 13th International Conference on Learning Representations, ICLR 2025 • 18136-18171
Andriushchenko M, Souly A, Dziemian M, Duenas D, Lin M, Wang J, Hendrycks D, Zou A, Kolter Z, Fredrikson M, Gal Y, Davies X

Conference

Aligned LLMs Are Not Aligned Browser Agents

2025 • 13th International Conference on Learning Representations, ICLR 2025 • 62386-62407
Kumar P, Lau E, Vijayakumar S, Trinh T, Team SR, Chang E, Robinson V, Hendryx S, Zhou S, Fredrikson M, Yue S, Wang Z

Preprint

D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models

2025
Krishna S, Zou A, Gupta R, Jones EK, Winter N, Hendrycks D, Kolter JZ, Fredrikson M, Matsoukas S

Preprint

Evaluating Language Model Reasoning about Confidential Information

2025
Sam D, Robey A, Zou A, Fredrikson M, Kolter JZ

Conference

LLM Whisperer: An Inconspicuous Attack to Bias LLM Responses

2025 • Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, CHI 2025
Lin W, Gerchanovsky A, Akgul O, Bauer L, Fredrikson M, Wang Z

Preprint

LLM Whisperer: An Inconspicuous Attack to Bias LLM Responses

2025
Lin W, Gerchanovsky A, Akgul O, Bauer L, Fredrikson M, Wang Z

Conference

Prototype-Integrated Representation Learning for Novelty Detection

2025 • 2025 IEEE 24th International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom • 934-941
Vijayakumar S, Faloutsos C, Fredrikson M

Preprint

Representation Engineering: A Top-Down Approach to AI Transparency

2025
Zou A, Phan L, Chen S, Campbell J, Guo P, Ren R, Pan A, Yin X, Mazeika M, Dombrowski A-K, Goel S, Li N, Byun MJ, Wang Z, Mallen A, Basart S, Koyejo S, Song D, Fredrikson M, Kolter JZ, Hendrycks D

Preprint

Safety Pretraining: Toward the Next Generation of Safe AI

2025
Maini P, Goyal S, Sam D, Robey A, Savani Y, Jiang Y, Zou A, Fredrikson M, Lipton ZC, Kolter JZ

Preprint

Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition

2025
Zou A, Lin M, Jones E, Nowak M, Dziemian M, Winter N, Grattan A, Nathanael V, Croft A, Davies X, Patel J, Kirk R, Burnikell N, Gal Y, Hendrycks D, Kolter JZ, Fredrikson M

Preprint

Transferable Adversarial Attacks on Black-Box Vision-Language Models

2025
Hu K, Yu W, Zhang L, Robey A, Zou A, Xu C, Hu H, Fredrikson M

Conference

A Recipe for Improved Certifiable Robustness

2024 • 12th International Conference on Learning Representations, ICLR 2024
Hu K, Leino K, Wang Z, Fredrikson M

Preprint

A Recipe for Improved Certifiable Robustness

2024
Hu K, Leino K, Wang Z, Fredrikson M

Conference

Attacks and Defenses for Large Language Models on Coding Tasks

2024 • Proceedings / IEEE International Conference, Automated Software Engineering ; sponsored by IEEE Computer Society, NASA Ames Research Center, in cooperation with AAAI, ACM SIGART and SIGSOFT. IEEE International Automated Software Enginee... • 2268-2272
Zhang C, Wang Z, Zhao R, Mangal R, Fredrikson M, Jia L, Pasareanu CS

Conference

Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization

2024 • Advances in Neural Information Processing Systems
Hu K, Yu W, Li Y, Yao T, Li X, Liu W, Yu L, Shen Z, Chen K, Fredrikson M

Preprint

Improving Alignment and Robustness with Circuit Breakers

2024
Zou A, Phan L, Wang J, Duenas D, Lin M, Andriushchenko M, Wang R, Kolter Z, Fredrikson M, Hendrycks D

Conference

Improving Alignment and Robustness with Circuit Breakers

2024 • Advances in Neural Information Processing Systems
Zou A, Phan L, Wang J, Duenas D, Lin M, Andriushchenko M, Wang R, Kolter Z, Fredrikson M, Hendrycks D

Preprint

Is Certifying $\ell_p$ Robustness Still Worthwhile?

2023
Mangal R, Leino K, Wang Z, Hu K, Yu W, Pasareanu C, Datta A, Fredrikson M

Conference

On the Perils of Cascading Robust Classifiers

2023 • 11th International Conference on Learning Representations, ICLR 2023
Mangal R, Wang Z, Zhang C, Leino K, Păsăreanu C, Fredrikson M

Preprint

Transfer Attacks and Defenses for Large Language Models on Coding Tasks

2023
Zhang C, Wang Z, Mangal R, Fredrikson M, Jia L, Pasareanu C

Preprint

Universal and Transferable Adversarial Attacks on Aligned Language Models

2023
Zou A, Wang Z, Carlini N, Nasr M, Kolter JZ, Fredrikson M
