Colloquium - Probabilistic models for large, noisy, and dynamic data

118 MLH
Jay Pujara
UC Santa Cruz | Data, Discovery and Decisions (D3)

We inhabit a vast, uncertain, and dynamic universe. To succeed in such an environment, artificial intelligence approaches must handle massive amounts of noisy, changing evidence. My research addresses the problems of building scalable, probabilistic models amenable to online updates. To illustrate the potential of such models, I present my work on knowledge graph identification, which jointly resolves the entities, attributes, and relationships in a knowledge graph by combining statistical NLP signals and semantic constraints. Using probabilistic soft logic, a statistical relational learning framework I helped develop, I demonstrate how knowledge graph identification can scale to millions of uncertain candidate facts and tens of millions of semantic dependencies in real-world data while achieving state-of-the-art performance. My work further extends this scalability by adopting a distributed computing approach, reducing the inference time of knowledge graph identification from two hours to ten minutes. Updating large, collective models like those used for knowledge graphs with new information poses a significant challenge. I develop a regret bound for probabilistic models and use this bound to motivate practical algorithms that support low-regret updates while improving inference time by more than 65%. Finally, I highlight several active projects in sustainability, bioinformatics, and mobile analytics that provide a promising foundation for future research.


Jay Pujara is a postdoctoral researcher at the University of California, Santa Cruz whose principal areas of research are machine learning, artificial intelligence, and data science. He completed his PhD at the University of Maryland, College Park and received his MS and BS from Carnegie Mellon University. Prior to his PhD, Jay spent six years at Yahoo! working on mail spam detection, user trust, and contextual mail experiences, and he has also worked at Google, LinkedIn, and Oracle. Jay is the author of over twenty peer-reviewed publications and has received three best paper awards for his work. He is a recognized authority on knowledge graphs: he has organized the Automatic Knowledge Base Construction (AKBC) workshop, recently presented a tutorial on knowledge graph construction, and had his work featured in AI Magazine.