Colloquium - PlinyCompute: Connecting Programming, Computation and Storage for Big Data Analytics

Date: February 15, 2019 - 4:00pm to 5:00pm
Location: 118 MLH
Speaker: Jia Zou
Rice University | Department of Computer Science

Users want Big Data analytics systems that provide interactive-speed ad-hoc query processing and short training times for machine learning. However, the performance of existing systems often falls short. In this talk, I identify two reasons for this. First, such systems are heavily layered, with many separate software components working together: a distributed file system, an in-memory file system, the JVM, and the computational system itself. Communication across these layers leads to inefficiencies. Second, it is difficult to automatically optimize computations that reside in opaque user code, such as user-defined functions (UDFs).

In this talk, I will describe my work aimed at solving these problems. First, I will present a novel declarative programming interface, based on lambda calculus, that forces programmers to expose their intent and compiles into a standalone intermediate representation of computations that facilitates relational-style query optimization and automatic data placement. Second, I will describe a novel storage system that avoids the layering overhead by pushing down analytics computations and managing all analytics data, on disk and in memory, in a monolithic distributed system. Finally, I will describe my ongoing work and future research plans for building a novel D3 big data analytics platform that provides Declarative programming, Deterministic performance, and Dynamic interaction with edge devices, humans in the loop, and environment simulators.


Bio

Jia Zou is a Research Scientist in the Department of Computer Science at Rice University. Prior to joining Rice in 2015, she worked at IBM Research - China as a Research Staff Member. She received her Ph.D. degree from Tsinghua University, China. Her research investigates and builds high-performance, scalable systems for Big Data management and analytics, work that has led to an open-source system called PlinyCompute and to publications in top Big Data management venues, including VLDB and SIGMOD. She mentors undergraduate, graduate, and high school students in their research. She has served as a TPC member of Cluster 2018 and has reviewed more than 40 papers for journals including IEEE Transactions on Parallel and Distributed Systems (TPDS) and IEEE Transactions on Knowledge and Data Engineering (TKDE).