Computer Science Speaker Series Master Calendar

Back to Listing

Feifei Li: Towards Building Interactive and Online Analytical Systems

Event Type
Jiawei Han, Professor, Computer Science; Aditya Parameswaran, Assistant Professor, Computer Science
2405 Siebel Center
Nov 10, 2017   1:30 pm  
Feifei Li, Director of Graduate Studies, School Of Computing, Associate Professor, School Of Computing, University of Utah, Salt Lake City
Lisa Yanello
Originating Calendar
Computer Science Speakers Calendar

Supporting interactive queries and analytics over large data is a critical requirement in many data-driven applications. The classic external memory model based on IO optimizations no longer works well in the era of big data due to its high latency. Instead, newer systems (e.g., Spark, Impala) rely on in-memory computing over a cluster of commodity machines to offer scale-out interactive data analytics. 

In the context of large spatio-temporal data, this talk presents the Simba system that offers scalable and efficient in-memory analytics over a cluster. Simba extends the Spark SQL engine to support rich query and analytical semantics through both SQL and DataFrame API (e.g., spatial join, knn join, trajectories).

An effective query optimizer leveraging its indexing support and geometry-aware query optimization is designed. Furthermore, the system is able to provide online analytics that explores the accuracy-efficiency tradeoff through novel online aggregation techniques that support complex multi-way join queries and random sampling over joins.  Lastly, we will also present ongoing extensions to Simba that explores spatio-temporal learning and sentiment analysis over large data.

link for robots only