Department of Computer Science Colloquium - University of Pittsburgh
February 6, 2019, 3:00pm - 5:00pm
Location: 5317 Sennott Square
Speakers: Deepak Majeti, Technical Lead; Anatoli Shein, Software Engineer
Separating Compute and Storage for Modern Big Data Analytics
Big Data analytical applications today analyze data from diverse information sources. Modern data analyses also involve many specialized compute pipelines, with data moving across different data lakes. It is therefore important for modern computational tools to handle diverse storage platforms such as HDFS, PureStorage, and S3. Further, hardware and software advances have given rise to private and public cloud services that provide orders of magnitude of elasticity and are capable of delivering real-time results. Computation tools must now also scale well to take advantage of this elasticity. These factors require the separation of compute and storage for modern Big Data analytics. However, separating compute and storage brings many challenges, including scaling quickly, hiding data transfer latency, optimizing locality, sharing system resources, synchronizing multiple distributed systems, and caching effectively. In this talk, we will present some of the advances being made at Vertica to handle these challenges.
Deepak Majeti is a technical lead at Vertica, where he leads the Vertica SQL on Hadoop product. He is also a PMC member of the Apache ORC Project and a committer for the Apache Parquet Project. Deepak's interests lie in combining the HPC and Big Data domains to build scalable, high-performance, and energy-efficient data analytics tools for modern computer architectures. Deepak holds a Ph.D. from Rice University.
Anatoli Shein is a software engineer at Vertica, where he works on Vertica's integration with the Hadoop ecosystem. He is a contributor to the Apache HDFS project with a focus on improving Libhdfs++, a native C++ client for fast access to HDFS data. His interests include big data analytics, stream processing, and distributed systems. Anatoli is a Ph.D. candidate at the University of Pittsburgh.
Faculty Host: Panos K. Chrysanthis
For More Information: panos@cs.pitt.edu