Computer Science 5th Year Master's Thesis Presentation

Monday, August 9, 2021 - 10:00am to 11:30am


Virtual Presentation - ET Remote Access - Zoom


ZHENDONG YUAN, Master's Student

Deep Learning-Based Data Augmentation for Breast Lesion Detection

Deep learning has become increasingly popular across a wide range of applications in the past few years. Improvements in hardware and model architectures have made it possible to train deeper and wider networks that achieve state-of-the-art (SOTA) performance in those applications. However, researchers still have to overcome several obstacles before they can produce a model that is useful in practice. One common obstacle is the data itself. The training data collected at a small hospital can be limited in quantity, and a pre-trained model taken from other hospitals can generalize poorly because of differences in the X-ray machines and in the environment in which the mammograms are taken. Moreover, since most mammograms come from patients who have no illness, the training data can be severely imbalanced between positive and negative cases. A model trained on such data can trivially achieve extremely high overall accuracy by predicting every case as normal, yet it has no practical value. Lesion and cancer detection, by contrast, requires predictions that are accurate for both positive and negative cases, resilient to noise, and consistent across data sources.
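The accuracy pitfall described above can be illustrated with a minimal sketch. The class counts (990 normal, 10 lesion) and the helper name `evaluate` are hypothetical and chosen only for illustration; they are not taken from the thesis dataset.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for binary labels (1 = lesion)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Sensitivity: fraction of true lesions that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: fraction of normal cases correctly labeled normal.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical imbalanced screening set: 990 normal cases, 10 lesion cases.
y_true = [0] * 990 + [1] * 10

# A trivial "model" that predicts every case as normal.
y_pred = [0] * len(y_true)

metrics = evaluate(y_true, y_pred)
print(metrics)
```

Under these assumed counts the trivial predictor reaches 99% accuracy while detecting none of the lesions, which is why per-class measures such as sensitivity and specificity are more informative than overall accuracy on imbalanced data.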

In this thesis, we provide workarounds for the issues mentioned above. Our experiments are based on the Pittsburgh mammogram dataset, which comprises 81,149 images collected from approximately 22,267 distinct patients. To deal with the limited dataset size and to provide localized explanations, we use a patch-based model for lesion classification. We extract normal patches from the breast tissue in images with a BIRADS score of 1. Lesion patches are extracted, using computer vision techniques, from the regions of interest (ROIs) labeled by radiologists in images with a BIRADS score of 0 or 2. We design our own techniques, based on deep learning-based SMOTE and GANs, to address the severe data imbalance, and we test them with a deep convolutional model similar to VGG16.
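For readers unfamiliar with SMOTE, the following is a minimal sketch of the classical interpolation step on plain feature vectors. It is not the deep learning-based variant developed in the thesis, whose details are not given in this abstract. The function name `smote_oversample`, its parameters, and the toy 2-D points are assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Classical SMOTE sketch: create synthetic minority samples by
    interpolating between a sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        # Pick a random point on the segment between base and neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy 2-D feature vectors standing in for minority-class (lesion) samples.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_oversample(minority, n_new=6)
print(len(synthetic), "synthetic samples")
```

Each synthetic sample lies on a line segment between two existing minority samples, so the minority class is enlarged without duplicating points exactly. For images, interpolating in a learned feature space is one plausible option, though the abstract does not say whether the thesis takes that approach.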

Thesis Committee:
Adam Perer (Chair)
Zachary Lipton

Additional Information

Zoom Participation. See announcement.
