Parallel Data Laboratory Talk - Pat Helland & Daniel May

— 1:00pm

Location:
Virtual Presentation - ET - Remote Access - Zoom

Speaker:
PAT HELLAND and DANIEL MAY , Salesforce

Yours, Mine, and Ours: Efficient Set Reconciliation in O(n log n) of the SET DIFFERENCE

We will discuss and explain a new algorithm that empowers efficient reconciliation of sets. Extremely large sets can be reconciled in O(n log n) of the SET DIFFERENCE, not the underlying size of the sets. The algorithm is a variant of erasure codes (familiar to the database community) and fountain codes (familiar to the data communications community). This opens new avenues for solutions based on repairing sets that do not even yet exist! When distributed systems agree in advance what items belong in a set, different participants can add items to the set effectively performing replica repair over future content. 

We will explain the set reconciliation algorithm presented at SIGCOMM 2024 in August within a paper titled "Practical Rateless Set Reconciliation" by Yang et al and how it can accomplish such efficiency. Many disparate research opportunities are opened by this algorithm including replica repair (faster than Merkle Trees), improved gossip protocols, scientific computations including detecting small differences in large genomes, management of cloud based control planes, and possibly even improvements to multi-phase protocols used for distributed systems. We hope to conclude with the audience brainstorming about even more possible applications. 

— 

Pat Helland has been building distributed systems and databases since 1978 at companies including Tandem, Microsoft, and Amazon. He is constantly curious about emerging trends in technology and their implications on systems. Pat has been working on database technology at Salesforce since 2012. 

Daniel May works at Salesforce and has been analyzing and improving the performance of complex systems for more than 7 years. He spends much of his spare time voraciously consuming technical papers, largely about distributed systems. 

Zoom Participation.  See announcement.

Event Website:
https://pdl.cmu.edu/talk-series/2025/061225.shtml


Add event to Google
Add event to iCal