Theory Lunch Seminar - Zoe Xi
— 1:00pm
Location: In Person - Gates Hillman 8102
Speaker: Zoe Xi, Ph.D. Student, Electrical Engineering and Computer Science Department, Massachusetts Institute of Technology
https://zoe-xi.github.io/
As AI models continue to develop powerful capabilities, it becomes critical to verify that their outputs are aligned with our intentions. A recent line of work focuses on verification via debate, a model of interactive proofs in which two competing powerful provers (AI models) debate each other to convince a weak verifier (a human) of the correctness of their claims. However, debate assumes that the two AI models possess equal abilities and that one of them is truthful, which may not be realistic.
In this talk, I will present recent work on single-prover interactive proofs for AI safety. Prior results on single-prover interactive proofs do not immediately carry over to the AI safety setting because they do not work when the computation has access to an oracle, such as human judgment or an external database like the web. Our work presents doubly-efficient single-prover interactive proofs for oracle-aided computations (also known as relativizing proofs) in settings where (1) the computation is robust, in the sense that the output does not change if at most a small fraction of the answers to oracle queries are incorrect, or (2) the oracle is a low-degree polynomial. These results suggest that interactive verification is possible even without debate, under structured or noise-tolerant oracle access.
Based on joint work with Liyan Chen and Yael Tauman Kalai.
For More Information: hfleisch@andrew.cmu.edu