Building machine learning systems in the era of data-centric AI

Ce Zhang

Recent advances in machine learning systems have made it much easier to train ML models given a training set. However, this does not mean that the job of an MLDev or MLOps engineer has become any easier. As we sail past the era in which the major goal of ML platforms was to support the building of models, we may have to think of our next-generation ML platforms as something that supports the iteration of data. This is a challenging task, which requires us to take a holistic view of data quality, data management, and machine learning together. In this talk, I will discuss some of our thoughts in this space, illustrated by several recent results that we have obtained in data debugging and data cleaning for ML models, aimed at systematically enforcing their quality and trustworthiness.


Ce is an Assistant Professor in Computer Science at ETH Zurich. The mission of his research is to make machine learning techniques widely accessible, while being cost-efficient and trustworthy, to everyone who wants to use them to make our world a better place. He believes in a systems approach to enabling this goal, and his current research focuses on building next-generation machine learning platforms and systems that are data-centric, human-centric, and declaratively scalable. Before joining ETH, Ce finished his PhD at the University of Wisconsin-Madison and spent another year as a postdoctoral researcher at Stanford, both advised by Christopher Ré. His work has received recognitions such as the SIGMOD Best Paper Award, SIGMOD Research Highlight Award, Google Focused Research Award, and an ERC Grant, and has been featured and reported by Science, Nature, the Communications of the ACM, and various media outlets such as The Atlantic, WIRED, and Quanta Magazine.