Jan Hoffmann, Computer Science Department Faculty, Carnegie Mellon

Jan Hoffmann

Assistant Professor

Office: 9105 Gates & Hillman Centers

Email: jhoffmann@cs.cmu.edu

My research mission is to discover beautiful mathematical ideas that have a real-world impact, shape the way programmers think, and help to create software that is more reliable, efficient, and secure.

I am particularly interested in quantitative properties of programs and their dependencies on the surrounding software and hardware environments. Examples include energy usage, execution time, probabilistic properties, and the amount of leaked secret or private information. Such properties are inextricably linked with every program and should serve as objective quality measures and guiding principles in programming. Despite their importance, there are still few practical tools that help programmers to reason about quantitative properties of their code.

A main goal of my research is to develop techniques and tools for quantitative analysis and verification that are compositional, efficient, and simple enough to be used by software developers without excessive training. Realizing this vision involves using techniques from formal methods, programming languages, constraint solving, analysis of algorithms, and design and implementation of systems. Two focus areas for applications of my work are probabilistic programming languages and programming languages for digital contracts.

Resource-Usage Analysis

Quantitative resource-usage analysis of software has inspired some of the most exciting research in computer science, ranging from systems (e.g., non-blocking algorithms and real-time operating systems) to theory (e.g., complexity theory and analysis of algorithms). Nevertheless, it is still challenging for software developers to understand the quantitative performance characteristics of their code: On the one hand, software systems are increasingly complex, concurrent, and distributed. On the other hand, developers rely more and more on off-the-shelf software libraries, and software is executed using cloud services that are black boxes with respect to resource usage. The lack of a precise understanding of the resource usage of programs is at the root of many software problems:
  • Performance bugs, as seen for example at the healthcare.gov website of the US government in fall 2013, are common and expensive.
  • Side channels make it easy for an attacker to obtain secret or private information.
  • Increasing complexity makes it difficult to correctly predict the timing of safety-critical real-time systems.
  • Stack overflows continue to be a serious and common problem in embedded systems.
  • Energy bugs regularly drain the battery of mobile devices and autonomous systems.
To address these problems, I work on new ways to model and analyze resource usage. For instance, I developed an automatic resource analysis system that derives tight worst-case bounds for functional programs. It is the first automatic analysis system that finds bounds for classical examples such as quick sort and insertion sort for lists of lists, longest common subsequence via dynamic programming, or Dijkstra's single-source shortest-path algorithm. The analysis is type-based and uses a novel polynomial amortized resource analysis that is highly compositional and can efficiently derive precise polynomial bounds for complex and realistic programs. This analysis system is implemented in the programming language Resource Aware ML.
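To make the kind of bound that such an analysis derives concrete, here is a hedged illustration (plain Python, not Resource Aware ML itself): insertion sort instrumented with a comparison counter, checked against the tight worst-case bound n*(n-1)/2 that a polynomial amortized analysis can establish symbolically.

```python
# Hypothetical illustration: insertion sort with an explicit cost counter.
# An analysis like the one described above derives the symbolic bound
# n*(n-1)/2 on comparisons without ever running the program.

def insertion_sort(xs):
    """Sort a list, counting element comparisons as the resource."""
    xs = list(xs)
    comparisons = 0
    for i in range(1, len(xs)):
        key = xs[i]
        j = i - 1
        while j >= 0:
            comparisons += 1          # one comparison per inner-loop test
            if xs[j] > key:
                xs[j + 1] = xs[j]     # shift larger element to the right
                j -= 1
            else:
                break
        xs[j + 1] = key
    return xs, comparisons

# Worst case: a reverse-sorted input forces the maximal number of comparisons.
n = 8
sorted_xs, cost = insertion_sort(list(range(n, 0, -1)))
assert sorted_xs == list(range(1, n + 1))
assert cost == n * (n - 1) // 2       # the derived bound is tight here
```

The point of a static analysis is that this quadratic bound is obtained symbolically, for all inputs, rather than by instrumenting and running the code as done here.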

Programming Languages for Digital Contracts

Programming digital contracts comes with unique challenges, which include expressing and enforcing protocols of interaction, controlling resource usage, and tracking linear assets. Resource (or "gas") usage of digital contracts is of particular importance for transparency and consensus. However, because existing contract languages are oblivious to resource usage, it is hard to predict the cost of executing a contract and to prevent denial-of-service vulnerabilities. Moreover, instead of centering contracts on their interactions with users, the high-level protocol of the intended interactions with a contract is buried in the implementation code, hampering understanding, formal reasoning, and trust. Finally, existing contract languages fail to enforce linearity of assets, endangering the validity of a contract when assets get duplicated or deleted.

Together with my collaborators at Carnegie Mellon, I work on the design of new programming languages that meet the domain-specific requirements of digital contracts. Our insights are that session types can express and enforce the interaction protocols underlying a contract and that automatic amortized resource analysis (AARA) can predict and enforce gas bounds of a contract. Recently, we have developed Nomos, a novel domain-specific programming language for implementing digital contracts. To express and enforce protocols, Nomos combines binary session types, AARA, and an ML-style functional language.
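To illustrate why static gas bounds matter, the following is a simplified, hypothetical sketch of gas metering in Python (it is not Nomos syntax): each contract operation deducts gas prepaid by the caller, and execution aborts when the budget is exhausted. A static bound of the kind AARA derives would let a caller prepay exactly enough gas up front.

```python
# Hypothetical gas-metering sketch: every primitive step of a contract
# operation is charged against a prepaid budget.

class OutOfGas(Exception):
    """Raised when a transaction exceeds its prepaid gas budget."""

class MeteredVM:
    def __init__(self, gas_budget):
        self.gas = gas_budget

    def charge(self, amount):
        if self.gas < amount:
            raise OutOfGas("gas budget exhausted")
        self.gas -= amount

    def run_transfer(self, balances, sender, receiver, amount):
        """Transfer assets, charging one gas unit per primitive step."""
        self.charge(1)                       # balance lookup
        if balances.get(sender, 0) < amount:
            raise ValueError("insufficient funds")
        self.charge(2)                       # two balance updates
        balances[sender] -= amount
        balances[receiver] = balances.get(receiver, 0) + amount
        return balances

# With a statically derived cost of 3 gas units, the prepaid budget is tight.
vm = MeteredVM(gas_budget=3)
balances = vm.run_transfer({"alice": 10, "bob": 0}, "alice", "bob", 4)
assert balances == {"alice": 6, "bob": 4}
assert vm.gas == 0
```

Without a static bound, callers must guess the budget: too little and the transaction aborts mid-execution, too much and an adversarial contract can burn the surplus.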

Analysis of Probabilistic Programs

Probabilistic reasoning is becoming increasingly important in verification. Probabilistic models such as Bayesian Networks and Markov Decision Processes are used to formalize and analyze systems and natural phenomena. For probabilistic and machine-learning algorithms, certain guarantees can only be given with high probability. Additionally, sound static analyses can improve the accuracy and efficiency of probabilistic inference. It turns out that there are close connections between (manual) quantitative reasoning methods for probabilistic programs and (automatic) resource analyses for deterministic programs. My research group is exploring these connections to develop a unified framework for quantitative verification that unites automatic and interactive techniques to reason about resource usage and probabilistic programs.

For example, my group and I have developed the first automatic analysis that derives symbolic bounds on the expected resource usage of probabilistic programs. In another line of work, my collaborators and I have developed more general automatic analysis tools for probabilistic programs that extend static analyses for deterministic programs. An application is solving the Bayesian inference problem for finite-state probabilistic programs featuring categorical random variables and loops.
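As a hedged illustration of what an expected-cost bound looks like (the code and names below are hypothetical, not my group's tools): a loop that costs one tick per iteration and continues with probability p has expected cost E satisfying E = 1 + p * E, i.e., E = 1 / (1 - p). This is the kind of symbolic bound such an analysis derives; a Monte Carlo simulation can confirm it empirically.

```python
# Hypothetical example: symbolic expected-cost bound for a probabilistic
# loop, checked against a Monte Carlo estimate.

import random

def geometric_loop(p, rng):
    """Run the loop, returning the number of ticks (iterations) consumed."""
    ticks = 0
    while True:
        ticks += 1                    # each iteration costs one resource unit
        if rng.random() >= p:         # stop with probability 1 - p
            return ticks

def expected_ticks(p):
    """Closed-form solution of the cost recurrence E = 1 + p * E."""
    return 1.0 / (1.0 - p)

rng = random.Random(42)
p = 0.5
samples = [geometric_loop(p, rng) for _ in range(100_000)]
empirical = sum(samples) / len(samples)
# The empirical mean should agree with the symbolic bound E = 2.
assert abs(empirical - expected_ticks(p)) < 0.05
```

The symbolic bound holds for every p < 1, whereas simulation only samples one parameter value at a time; deriving such closed forms automatically is what distinguishes these analyses.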