Department of Computer Science Colloquium - University of Pittsburgh

— 5:00pm

Location:
5317 - Sennott Square

Speaker:
DEEPAK MAJETI and ANATOLI SHEIN, Deepak Majeti - Technical Lead; Anatoli Shein - Software Engineer


Separating Compute and Storage for Modern Big Data Analytics

Big Data analytical applications today analyze data from diverse information sources. Modern data analyses also involve many specialized compute pipelines, with data moving across different data lakes. Therefore, it is important for modern computational tools to be able to handle diverse storage platforms such as HDFS, PureStorage, and S3. Further, hardware and software advances have given rise to (private/public) cloud services that provide orders-of-magnitude elasticity and can deliver real-time results. Computational tools must now additionally scale well to take advantage of this elasticity. These factors require the separation of compute and storage for modern Big Data analytics.

However, separating compute and storage poses many challenges, including scaling quickly, hiding data transfer latency, optimizing locality, sharing system resources, synchronizing multiple distributed systems, and caching effectively.

In this talk, we will present some of the advances being made at Vertica to handle the aforementioned modern data analytics challenges.

Deepak Majeti is a technical lead at Vertica, where he leads the Vertica SQL on Hadoop product. He is also a PMC member of the Apache ORC Project and a committer for the Apache Parquet Project. Deepak’s interests lie in combining the HPC and Big Data domains to build scalable, high-performance, and energy-efficient data analytics tools for modern computer architectures. Deepak holds a Ph.D. from Rice University.

Anatoli Shein is a software engineer at Vertica, where he works on Vertica's integration with the Hadoop ecosystem. He is a contributor to the Apache HDFS project with a focus on improving Libhdfs++, a native C++ client for fast access to HDFS data. His interests include big data analytics, stream processing, and distributed systems. Anatoli is a Ph.D. candidate at the University of Pittsburgh.

Faculty Host:  Panos K. Chrysanthis

For More Information:
panos@cs.pitt.edu