Database Seminar - Andy Grove

September 30, 2024, 4:30pm - 5:30pm
Location: Virtual Presentation - ET - Remote Access - Zoom
Speaker: Andy Grove, Apache Arrow and Apache DataFusion PMC Member
https://www.linkedin.com/in/andygrove/

Accelerating Apache Spark workloads with Apache DataFusion Comet

Apache Spark is one of the most widely used distributed data analysis frameworks. However, its JVM-based and row-oriented query execution engine limits Spark's performance and scalability. In this talk, we will introduce DataFusion Comet, an accelerator for Apache Spark designed to improve the efficiency of Spark queries by translating them into native queries that leverage Apache Arrow and Apache DataFusion. We will explore the core architecture of Comet, explain how Spark plans are translated into native plans, and talk about some of the challenges of providing Spark compatibility.

Andy Grove is an Apache Arrow and Apache DataFusion PMC Member and the original creator of Apache DataFusion.

This talk is part of the Database Building Blocks Seminar.

Zoom Participation: See announcement.

Event Website: https://db.cs.cmu.edu/events/building-blocks-apache-datafusion-comet-andy-grove