Shilpa Anna George

Low-Bandwidth Remote Sensing of Rare Events
Degree Type: Ph.D. in Computer Science
Advisor(s): Mahadev Satyanarayanan (Satya)
Graduated: May 2023

Abstract:

Remote sensing enables knowledge discovery from live data collected by unmanned probes. Planetary exploration, drone surveillance, and underwater sensing are three examples of domains in which remote sensing plays a central role. Near-real-time knowledge acquisition of a rare target during such missions is challenging because of three extremes: low bandwidth, novelty of the target, and class imbalance. We call learning under these extreme conditions Live Learning. This is a new capability at the intersection of edge computing and machine learning: it aims to learn a model for a rare target from unlabeled data captured on distributed probes that are reachable only over a low-bandwidth network.

The main contribution of this thesis is the design, implementation, and evaluation of Hawk, an interactive, model-agnostic live learning system that enables the discovery of rare, novel phenomena from a stream of extremely skewed, unlabeled visual data captured on weakly connected remote sensing probes. Hawk is designed to optimize the use of two critical resources: (a) the network bandwidth from the remote source to the human expert, and (b) the expert's labeling bandwidth. Live Learning embodies a new semi-supervised learning algorithm that trains models on the fly to discover instances of a target from very few initial labeled examples. We show the effectiveness of Hawk through extensive validation on three very demanding, publicly available datasets from the domains mentioned above. Each of these datasets was released within the past few years and has been used in recent ML research publications in its domain.
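To make the bandwidth constraint in (a) concrete, here is a minimal sketch of one way a probe might decide what to send: it ranks unlabeled tiles by the current model's score and transmits the highest-ranked ones until a per-interval byte budget runs out. The names `Tile` and `select_for_transmission`, and the scores and sizes used, are hypothetical illustrations under that assumption, not Hawk's actual interface or policy.

```python
from dataclasses import dataclass

@dataclass
class Tile:
    tile_id: str
    score: float      # current model's estimate that the tile contains the target
    size_bytes: int   # encoded size of the tile on the uplink

def select_for_transmission(tiles, budget_bytes):
    """Hypothetical triage step: greedily pick the highest-scoring tiles that
    fit in one interval's byte budget. Tiles that do not fit stay on the probe."""
    selected = []
    remaining = budget_bytes
    for tile in sorted(tiles, key=lambda t: t.score, reverse=True):
        if tile.size_bytes <= remaining:
            selected.append(tile)
            remaining -= tile.size_bytes
    return selected

# Hypothetical tiles scored on one probe during one transmission interval.
tiles = [
    Tile("a", 0.91, 20_000),
    Tile("b", 0.15, 18_000),
    Tile("c", 0.67, 25_000),
    Tile("d", 0.88, 30_000),
]
print([t.tile_id for t in select_for_transmission(tiles, 50_000)])  # ['a', 'd']
```

In this sketch the same ranking also limits what the expert must label in (b), because only transmitted tiles reach the human.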

Our experiments show that even at bandwidths as low as 12 kbps and a base rate of 0.1%, a team of 7 probes using Hawk is able to discover up to 87% of the event instances that could have been discovered using a brute-force model. Such a model is built with advance knowledge of all mission data, all of which must be transmitted and labeled. Our results show a 1.5X–2X improvement in recall when Live Learning in Hawk is combined with recent few-shot learning algorithms such as SnaTCHer. Our results also show how Diversity Sampling can further improve recall in Hawk.
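A back-of-the-envelope calculation shows why a 12 kbps link combined with a 0.1% base rate makes blind sampling impractical. The sketch below assumes, purely for illustration, that the 12 kbps figure is the capacity of a single probe's uplink and that each encoded image tile is about 20 KB; neither number is taken from the thesis.

```python
# Hypothetical parameters for a back-of-the-envelope estimate.
LINK_BITS_PER_SEC = 12_000   # 12 kbps, taking 1 kbps = 1000 bit/s (assumed per-probe link)
TILE_BYTES = 20_000          # assumed size of one encoded image tile
BASE_RATE = 0.001            # 0.1% of tiles contain the target

def tiles_per_hour(bits_per_sec, tile_bytes):
    """Whole tiles that fit through the link in one hour."""
    return bits_per_sec * 3600 // (8 * tile_bytes)

def expected_positives_if_random(n_tiles, base_rate):
    """Expected number of targets among n_tiles chosen uniformly at random."""
    return n_tiles * base_rate

n = tiles_per_hour(LINK_BITS_PER_SEC, TILE_BYTES)
print(n)  # 270
print(expected_positives_if_random(n, BASE_RATE))
```

Under these assumptions, sending randomly chosen tiles would surface a positive only about once every four hours (roughly 0.27 per hour), which is why choosing carefully what crosses the link matters.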

Thesis Committee:
Mahadev Satyanarayanan (Chair)
Deva Ramanan
Ameet Talwalkar
Padmanabhan Pillai (Intel Labs)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Keywords:
Edge Computing, Remote Sensing, Active Learning, Live Learning, Video Analytics, Training Data Creation

CMU-CS-23-104.pdf (8.46 MB, 110 pages)