Tuesday, December 7, 2021 - 1:00pm to 2:00pm
Location: Virtual Presentation - ET Remote Access - Zoom
Speaker: ANDERS ØLAND, Ph.D. Student http://andersoland.com/?page_id=9
K-Filtering: A Simple Method for Quantifying & Exploiting Data Redundancy for Deep Learning
Deep neural networks are data-hungry, and labeled data can be hard to come by. In addition, the training process is both time-consuming and costly. This calls for new methods that specifically address the efficiency of both the data collection process and the training itself. To this end, we introduce an efficient method for discovering redundancies in labeled data. Using a simple k-nearest neighbors approach, we filter away the examples that carry the least information about the decision boundary, i.e., the most redundant ones. On several standard benchmarks, we show that up to 60% of the training examples may be discarded beforehand without significantly hurting the final test accuracy. This reduces the cost of training, while also enabling more informed decisions when collecting labeled training data. Furthermore, we show that certain ill-conditioned learning problems may be impossible to solve unless a large number of redundant examples are removed from the dataset. Thus, our work provides useful new insights with potential applications to a wide range of tasks, such as hyperparameter optimization, architecture search, transfer learning, greedy layer-wise training, continual learning, and imbalanced classification. While our focus is on deep networks, our method applies to a large range of machine learning algorithms.
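The announcement does not spell out the filtering algorithm, but the core idea can be sketched: an example whose k nearest neighbors all share its own label lies far from the decision boundary and is treated as redundant. The sketch below is one plausible interpretation under that assumption (the function name `k_filter` and all details are illustrative, not the authors' actual implementation):

```python
import numpy as np

def k_filter(X, y, k=5):
    """Sketch of a k-NN redundancy filter: keep only examples that have
    at least one differently-labeled point among their k nearest
    neighbors (i.e., examples near the decision boundary).

    This is a hypothetical reading of the method described in the
    abstract, not the authors' code.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    # Pairwise squared Euclidean distances between all examples.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Exclude each point from its own neighbor list.
    np.fill_diagonal(d2, np.inf)
    keep = np.zeros(n, dtype=bool)
    for i in range(n):
        neighbors = np.argsort(d2[i])[:k]
        # Keep the example only if some neighbor disagrees with its label.
        keep[i] = np.any(y[neighbors] != y[i])
    return keep
```

On a toy dataset of two well-separated clusters plus a pair of close, oppositely-labeled points between them, the filter discards the cluster interiors and keeps only the boundary pair, which matches the abstract's claim that a large fraction of examples can be removed without touching the decision boundary.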
Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.
Zoom Participation. See announcement.