Computer Science Thesis Proposal

— 2:30pm

Location:
In Person and Virtual - ET - Traffic21 Classroom, Gates Hillman 6501

Speaker:
MAGDALEN DOBSON MANOHAR, Ph.D. Student, Computer Science Department, Carnegie Mellon University
https://magdalendobson.github.io/


New Techniques for Parallelism and Concurrency in Nearest Neighbor Search

Nearest neighbor search in both high and low dimensions is an important problem in the field of computer science and beyond. In low dimensions, it is crucial for applications such as ray tracing, collision detection, and motion planning. In high dimensions, approximate nearest neighbor search is used for many applications handling machine-learning generated embeddings of various types of data, such as search engines, ads recommendation, and information retrieval. In this thesis proposal, we make progress on nearest neighbor search in both dimensional regimes by making contributions to parallelism, concurrency, and robustness to challenging distributions.

In low dimensions, we present the zd-tree, a data structure for nearest neighbor search with provable guarantees on search times, a scalable parallel build, and the ability to perform batch-dynamic updates in parallel. We also present the CLEANN-Tree (for Concurrent Linearizable Efficient Augmented Nearest Neighbor Search Tree), a generalization of the zd-tree which supports concurrent queries and updates using new work on versioned pointers and lock-free locks.

In high dimensions, we present ParlayANN, a library of scalable, lock-free, and deterministic graph-based nearest neighbor search algorithms. We also present in-progress work on several topics; the first is building a specialized vector database for nearest neighbor search with the goal of supporting linearizable and deterministic updates as well as snapshotting and crash recovery with performance on par with the best existing vector index. Finally, we present in-progress work on adapting graph-based nearest neighbor search algorithms for a particular kind of challenging dataset: namely, those designed with duplicate detection in mind.

Thesis Committee:

Guy E. Blelloch (Chair)
Phillip B. Gibbons
Andy Pavlo
Harsha Vardhan Simhadri (Microsoft Research)
Matthijs Douze (Meta AI Research)
 

Additional Information
In Person and Zoom Participation.  See announcement.

For More Information:
In Person and Virtual - ET


Add event to Google
Add event to iCal