CS Colloquium - Exploring the Adversarial Robustness of Language Models


Speaker

Muchao Ye

Abstract

Language models built on deep neural networks have achieved great success across many areas of artificial intelligence and play an increasingly vital role in consequential applications such as chatbots and smart healthcare. However, because deep neural networks are vulnerable to adversarial examples, concerns remain about applying them to safety-critical tasks. In this talk, I will present a series of methods for evaluating and certifying the adversarial robustness of language models, which is the key to resolving this conundrum.

First, I will introduce a new idea for conducting text adversarial attacks to evaluate the adversarial robustness of language models in the most realistic, hard-label setting: incorporating a pre-trained word embedding space as an optimization intermediate. Gradient-based optimization methods built on this idea overcome the inefficiency of existing approaches, and a deeper dive into this viewpoint shows that leveraging an estimated decision boundary further improves the quality of crafted adversarial examples. A rough, illustrative sketch of the embedding-space idea appears below. Second, I will discuss a unified certified robust training framework for enhancing the certified robustness of language models; it provides a stronger robustness guarantee by removing unnecessary modules and harnessing a novel decoupled regularization loss. Finally, I will conclude with an outlook on improving the adversarial robustness of multi-modal foundation models, applying them in healthcare for communication disorders, and building a secure learning paradigm for AI agents.
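For illustration only, here is a minimal sketch of the core idea named in the abstract: perturbing tokens in a pre-trained word embedding space and projecting each perturbed vector back to its nearest vocabulary word, observing only the model's hard label. This toy version substitutes simple random search for the talk's gradient-based optimization, and every name (embed, embedding_matrix, vocab, query_model) is a hypothetical placeholder, not the speaker's implementation.

```python
import numpy as np

def nearest_word(vec, embedding_matrix, vocab):
    """Project a continuous embedding vector back to the closest vocabulary word."""
    dists = np.linalg.norm(embedding_matrix - vec, axis=1)
    return vocab[int(np.argmin(dists))]

def hard_label_attack(tokens, embed, embedding_matrix, vocab, query_model,
                      steps=200, step_size=0.1):
    """Hard-label setting: only the model's predicted label is observable.

    Toy random search in embedding space; the talk's method instead uses
    gradient-based optimization with this space as an intermediate.
    """
    orig_label = query_model(tokens)
    embs = np.stack([embed[t] for t in tokens])  # map tokens to embedding vectors
    for _ in range(steps):
        # Perturb in the continuous embedding space, then snap back to words.
        noise = np.random.normal(scale=step_size, size=embs.shape)
        candidate = [nearest_word(v, embedding_matrix, vocab)
                     for v in embs + noise]
        if query_model(candidate) != orig_label:
            return candidate  # label flipped: an adversarial example
    return None  # no adversarial example found within the query budget
```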

Bio

Muchao Ye is a PhD candidate in the College of Information Sciences and Technology at the Pennsylvania State University, advised by Dr. Fenglong Ma. Before that, he obtained his Bachelor of Engineering degree in Information Engineering from South China University of Technology. His research interests lie at the intersection of AI, security, and healthcare, with a focus on improving AI safety from the perspective of adversarial robustness. His research has been published in top venues including NeurIPS, KDD, AAAI, ACL, and the Web Conference.

Wednesday, February 7, 2024 3:30pm to 4:30pm
MacLean Hall, Room 110
2 West Washington Street, Iowa City, IA 52240
Individuals with disabilities are encouraged to attend all University of Iowa–sponsored events. If you are a person with a disability who requires a reasonable accommodation in order to participate in this program, please contact the Computer Science Department in advance at 319-335-0713 or matthieu-biger@uiowa.edu.