Computer Science Thesis Proposal

In Person and Virtual - ET - Traffic21 Classroom, Gates Hillman 6501 and Zoom

LESLIE ALISON RICE, Ph.D. Student, Computer Science Department, Carnegie Mellon University

Methods for robust training and evaluation of deep neural networks

As machine learning systems are deployed in real-world, potentially safety-critical, applications, researchers are increasingly interested in the robustness of these systems to perturbed inputs. A large body of work has focused on robustness to the worst-case perturbation in some set, termed adversarial robustness. Deep neural networks have been shown to lack adversarial robustness to small, human-imperceptible input perturbations, prompting studies of how to improve the adversarial robustness of these networks. Adversarial training, which involves training on worst-case perturbed inputs, has proven to be the dominant method for training empirically robust networks.

The problem of training adversarially robust networks remains quite challenging, due to the increased training cost and the trade-off between accuracy on unperturbed and perturbed inputs. We revisit some of the early adversarial training methods, namely the single-step fast gradient sign method (FGSM) and the subsequent multi-step projected gradient descent (PGD) method. First, we show that, contrary to previous belief, FGSM adversarial training, with certain modifications, can produce networks robust to PGD attacks at a much lower training cost than PGD adversarial training. Second, we show that, unlike in standard training of deep neural networks, overfitting to the training set during adversarial training can result in significantly worse test-time robustness. We show that, simply by early stopping adversarial training, PGD can perform just as well as the supposed algorithmic improvements introduced subsequently.

Due to the challenges of training for worst-case robustness, as well as a desire for practicality, researchers have also focused on average-case robustness, i.e., robustness to random perturbations, a notion which also underlies standard data augmentation strategies. We argue that a sliding scale between the extremes of worst-case and average-case robustness provides a valuable additional metric by which to gauge robustness, which we term intermediate robustness. We illustrate that each of these two extremes is naturally characterized by a (functional) q-norm over perturbation space, and propose a method for efficiently estimating the value of these norms using Markov chain Monte Carlo based path sampling. We propose to extend this work to study the intermediate robustness of large pretrained networks, e.g., foundation models, and to develop better methods for training according to intermediate robustness objectives.
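To make the two attack formulations above concrete, here is a minimal NumPy sketch of FGSM and PGD perturbations for a simple linear classifier under a logistic loss. This is an illustration only, not the thesis's implementation: the model, loss, and the use of random initialization as the FGSM modification are assumptions for the sketch.

```python
import numpy as np

def grad_loss(w, x, y):
    # Gradient of the logistic loss log(1 + exp(-y * w.x)) with respect
    # to the input x, for a linear classifier with weights w.
    z = y * (w @ x)
    return -y * w / (1.0 + np.exp(z))

def fgsm(w, x, y, eps, rng=None):
    # Single-step FGSM attack in the L-infinity ball of radius eps.
    # Optionally start from a random point in the ball (one modification
    # studied in the fast adversarial training literature), take one
    # signed gradient step, then project back onto the ball.
    delta = rng.uniform(-eps, eps, size=x.shape) if rng is not None else np.zeros_like(x)
    delta = delta + eps * np.sign(grad_loss(w, x + delta, y))
    return np.clip(delta, -eps, eps)

def pgd(w, x, y, eps, alpha, steps):
    # Multi-step PGD attack: repeated signed gradient steps of size alpha,
    # each followed by projection onto the eps-ball.
    delta = np.zeros_like(x)
    for _ in range(steps):
        delta = delta + alpha * np.sign(grad_loss(w, x + delta, y))
        delta = np.clip(delta, -eps, eps)
    return delta
```

Adversarial training then minimizes the loss at `x + delta` rather than at `x`; the single-step variant is cheaper per epoch because it needs one gradient computation per example instead of one per PGD step.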
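The functional q-norm view of the worst-case/average-case scale can be sketched with a plain Monte Carlo estimator: the q-norm of the loss over random perturbations, where q = 1 recovers average-case robustness and large q approaches the worst case. This simplification uses i.i.d. uniform samples rather than the MCMC-based path sampling the proposal describes, and the toy loss surface is hypothetical.

```python
import numpy as np

def toy_loss(delta):
    # Hypothetical per-perturbation loss surface (an assumption for the
    # sketch): a quadratic bowl over the perturbation delta.
    return 1.0 + np.sum(delta ** 2, axis=-1)

def q_norm_estimate(loss_fn, dim, eps, q, n_samples, seed=0):
    # Monte Carlo estimate of the functional q-norm of the loss over
    # uniform perturbations in the eps-ball:
    #   ||loss||_q = ( E_delta[ loss(delta)^q ] )^(1/q).
    rng = np.random.default_rng(seed)
    delta = rng.uniform(-eps, eps, size=(n_samples, dim))
    vals = loss_fn(delta)
    # Log-sum-exp trick keeps the estimate numerically stable for large q.
    logs = q * np.log(vals)
    m = logs.max()
    log_mean = m + np.log(np.mean(np.exp(logs - m)))
    return np.exp(log_mean / q)
```

By the power mean inequality the estimate is non-decreasing in q, interpolating between the mean loss and the maximum over the sampled perturbations; the naive estimator degrades for large q, which is the motivation for the more efficient path-sampling approach.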

Thesis Committee:

J. Zico Kolter (Chair)

Matt Fredrikson

Aditi Raghunathan

Nicholas Carlini (Google Brain)

Additional Information

In Person and Zoom Participation. See announcement.