Dimitris Margaritis

Learning Bayesian Network Model Structure from Data
Degree Type: Ph.D. in Computer Science
Advisor(s): Sebastian Thrun
Graduated: May 2003

Abstract:

In this thesis I address the problem of determining the structure of directed statistical models, using the widely used class of Bayesian network models as a concrete vehicle for my ideas. The structure of a Bayesian network represents a set of conditional independence relations that hold in the domain. Learning the structure of the Bayesian network model that represents a domain can reveal insights into its underlying causal structure. It can also be used to predict quantities that are difficult, expensive, or unethical to measure, such as the probability of lung cancer, from other quantities that are easier to obtain. The contributions of this thesis include (a) an algorithm for determining the structure of a Bayesian network model from statistical independence statements; (b) a statistical independence test for continuous variables; and (c) a practical application of structure learning to a decision support problem, where a model learned from the database (most importantly its structure) is used in lieu of the database to yield fast approximate answers to count queries, surpassing in certain aspects other state-of-the-art approaches to the same problem.
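As an illustration of contribution (c), the following minimal Python sketch shows how a Bayesian network could stand in for a database when answering count queries: the estimated count is the number of records multiplied by the probability the model assigns to the query. This is not the thesis's algorithm. The two-variable network (Smoker -> Cancer), all probabilities, the record total, and the function names are hypothetical and chosen only for the example.

```python
# Hypothetical two-variable network Smoker -> Cancer.
# All numbers are invented for illustration; they do not come from the thesis.
N_RECORDS = 10_000  # assumed size of the original database

# P(Smoker)
P_SMOKER = {True: 0.3, False: 0.7}

# P(Cancer | Smoker), indexed as [smoker][cancer]
P_CANCER_GIVEN_SMOKER = {
    True: {True: 0.1, False: 0.9},
    False: {True: 0.01, False: 0.99},
}


def joint_probability(smoker, cancer):
    """Joint probability from the network factorization P(S) * P(C | S)."""
    return P_SMOKER[smoker] * P_CANCER_GIVEN_SMOKER[smoker][cancer]


def approximate_count(smoker=None, cancer=None):
    """Estimate how many records match the query without scanning any data.

    A value of None means the attribute is unconstrained, so it is summed out.
    """
    smoker_values = [True, False] if smoker is None else [smoker]
    cancer_values = [True, False] if cancer is None else [cancer]
    probability = sum(
        joint_probability(s, c) for s in smoker_values for c in cancer_values
    )
    return N_RECORDS * probability
```

For example, `approximate_count(smoker=True, cancer=True)` multiplies the record total by P(Smoker) * P(Cancer | Smoker). The answer is only as accurate as the learned structure and parameters, which is why the abstract stresses the role of the structure.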

Thesis Committee:
Sebastian Thrun (Chair)
Christos Faloutsos
Andrew W. Moore
Peter Spirtes
Gregory F. Cooper (University of Pittsburgh)

Randy Bryant, Head, Computer Science Department
James Morris, Dean, School of Computer Science

Keywords:
Bayesian networks, Bayesian network structure learning, continuous variable independence test, Markov blanket, causal discovery, DataCube approximation, database count queries

CMU-CS-03-153.pdf (1.05 MB) (126 pages)