Indrayana Rustandi
Predictive fMRI Analysis for Multiple Subjects and Multiple Studies

Degree Type: Ph.D. in Computer Science
Advisor(s): Tom M. Mitchell
Graduated: May 2010

Abstract:
In the context of predictive fMRI data analysis, the state of the art is to perform the analysis separately for each subject in a specific study. Because fMRI data have many more features than instances, this kind of analysis might produce suboptimal predictive models, since the data might not be sufficient to obtain accurate models. Based on findings in the cognitive neuroscience field, there is reason to believe that data from other subjects and from different but similar studies exhibit similar patterns of activations, implying that there is some potential for increasing the data available to train the predictive models by jointly analyzing data from multiple subjects and multiple studies. However, each subject's brain might still exhibit some variations in the activations compared to other subjects' brains, based on factors such as differences in anatomy, experience, or environment. A major challenge in doing predictive analysis of fMRI data from multiple subjects and multiple studies is having a model that can effectively account for these variations.

In this thesis, we propose two classes of methods for predictive fMRI analysis across multiple subjects and studies. The first class of methods is based on the hierarchical linear model, where we assume that different subjects (studies) can have different but still similar parameter values. However, this class of methods is still too restrictive in the sense that it requires the different fMRI datasets to be registered to a common brain, a step that might introduce distortions in the data. To remove this restriction, we propose a second class of methods based on the idea of common factors present in the fMRI data of different subjects and studies.
We consider learning these factors using principal components analysis and canonical correlation analysis. Applying these methods to two kinds of predictive tasks (predicting the cognitive states associated with some brain activations, and predicting the brain activations associated with some cognitive states), we show that we can indeed effectively combine fMRI data from multiple subjects and multiple studies and obtain significantly better accuracies compared to single-subject predictive models.

Thesis Committee:
Tom M. Mitchell (Chair)
Zoubin Ghahramani
Eric Xing
David Blei (Princeton University)

Jeannette Wing, Head, Computer Science Department
Randy Bryant, Dean, School of Computer Science