Principles of Programming Seminar
In Person - Reddy Conference Room, Gates Hillman 4405
HAZEM TORFAH, Postdoctoral Researcher, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley
Synthesizing Pareto-Optimal Interpretations for Black-Box Models
We present a new multi-objective optimization approach for synthesizing interpretations that "explain" the behavior of black-box machine learning models. Constructing human-understandable interpretations for black-box models often requires balancing conflicting objectives. A simple interpretation may be easier to understand for humans while being less precise in its predictions vis-a-vis a complex interpretation. Existing methods for synthesizing interpretations use a single objective function and are often optimized for a single class of interpretations.
In contrast, we provide a more general and multi-objective synthesis framework that allows users to choose (1) the class of syntactic templates from which an interpretation should be synthesized, and (2) quantitative measures on both the correctness and explainability of an interpretation. For a given black-box, our approach yields a set of Pareto-optimal interpretations with respect to the correctness and explainability measures. We show that the underlying multi-objective optimization problem can be solved via a reduction to quantitative constraint solving, such as weighted maximum satisfiability.
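As a rough illustration of what Pareto-optimality means here, the sketch below filters a handful of candidate interpretations down to those not dominated on both measures. It is not the speaker's method, which reduces the problem to quantitative constraint solving. The candidate names, the scores, and the helper functions `dominates` and `pareto_front` are all hypothetical, and both measures are assumed to be "higher is better".

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical candidate interpretation with two scores (higher is better)."""
    name: str
    correctness: float
    explainability: float


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both measures and strictly better on one."""
    at_least_as_good = (a.correctness >= b.correctness
                        and a.explainability >= b.explainability)
    strictly_better = (a.correctness > b.correctness
                       or a.explainability > b.explainability)
    return at_least_as_good and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up scores for illustration only; they are not results from the talk.
candidates = [
    Candidate("depth-1 tree", 0.70, 0.90),
    Candidate("depth-2 tree", 0.82, 0.70),
    Candidate("depth-3 tree", 0.80, 0.50),  # worse than depth-2 on both measures
    Candidate("depth-4 tree", 0.91, 0.30),
]

front = pareto_front(candidates)
print([c.name for c in front])
```

In this toy setting the depth-3 tree drops out because the depth-2 tree beats it on both measures, while the other three each represent a different trade-off a user might prefer.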
To demonstrate the benefits of our approach, we have applied it to synthesize interpretations for black-box neural-network classifiers. Our experiments show that there often exists a rich and varied set of choices for interpretations that are missed by existing approaches.
Hazem Torfah is a postdoctoral researcher in the EECS Department at UC Berkeley. He received his doctoral degree in Computer Science in December 2019 from Saarland University, Germany. His research interests are the formal specification, verification, and synthesis of cyber-physical systems. In his Ph.D. work, Hazem developed a quantitative theory for reactive systems based on model counting. His doctoral thesis received the Dr.-Edward-Martin Dissertation award. He is one of the main developers of the RTLola monitoring framework, which has been integrated into the ARTIS fleet of unmanned aerial vehicles in close collaboration with the German Aerospace Center (DLR). Hazem's current focus is the development of quantitative methods for the explainability and runtime assurance of AI-based autonomous systems.