Doctoral Thesis Oral - Jiaming (Andy) Zou

— 1:30pm

Location:
In Person and Virtual - ET - Reddy Conference Room, Gates Hillman 4405 and Zoom

Speaker:
JIAMING (ANDY) ZOU, Ph.D. Candidate, Computer Science Department, Carnegie Mellon University
https://andyzoujm.github.io/

Improving Safety and Security of Generative Models

Recent advances in large language and multimodal models have enabled powerful new applications, but they also raise critical challenges in safety, robustness, and alignment. This thesis studies these challenges through three complementary research directions. First, we show that current alignment methods remain brittle by developing adversarial attacks that reliably bypass safeguards across text, multimodal, and embodied systems, demonstrating that alignment alone does not guarantee robustness. Second, we introduce evaluation frameworks and benchmarks that systematically measure safety failures in modern AI systems, revealing widespread vulnerabilities in deployed models and agents. Third, we propose methods to improve alignment and control, including representation-level interventions, circuit breakers, and safety pretraining, which significantly reduce attack success while preserving model capability. Together, these contributions advance our understanding of AI safety risks and provide practical tools for building safer and more trustworthy AI systems.

Thesis Committee
Zico Kolter (Co-chair)
Matt Fredrikson (Co-chair)
Graham Neubig
Nicholas Carlini (Anthropic)

In Person and Zoom Participation. See announcement.

For More Information:
matthewstewart@cmu.edu
