Business Analytics Seminar Speaker

Muchao Ye

Title: Evaluating and Certifying the Adversarial Robustness of Neural Language Models

Abstract:

Language models (LMs) built on deep neural networks (DNNs) have achieved great success across many areas of artificial intelligence and play an increasingly vital role in applications such as chatbots and smart healthcare. Nonetheless, the vulnerability of DNNs to adversarial examples still threatens the deployment of neural LMs in safety-critical tasks. Specifically, an adversary can easily make a DNN change a correct prediction into an incorrect one by adding small perturbations to the original input text. In this talk, we identify key limitations in evaluating and certifying the adversarial robustness of neural LMs and bridge those gaps through efficient hard-label text adversarial attacks and unified certified robust training.

The first step in developing adversarially robust LMs is evaluating whether they are empirically robust against perturbed texts. The core technique in this evaluation pipeline is the text adversarial attack, which constructs texts that fool LMs and should ideally produce high-quality adversarial examples efficiently under realistic conditions. However, existing attacks in the realistic hard-label setting, where the attacker observes only the model's predicted label, rely on heuristic search and are consequently inefficient. To address this limitation, we introduce a series of hard-label adversarial attack methods that overcome the inefficiency by using a pre-trained word embedding space as an intermediate search space. A deeper study of this idea shows that exploiting estimated decision-boundary information further improves the quality of the crafted adversarial examples.
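To make the hard-label setting concrete, the following is a minimal sketch, not the speaker's actual method: a toy greedy attack that proposes word substitutions from a (hypothetical) pre-trained embedding space and queries the victim model only for its predicted label. The embeddings, the toy classifier, and all names here are illustrative stand-ins.

```python
# Minimal hard-label word-substitution attack sketch (illustrative only).
import numpy as np

# Hypothetical pre-trained word embeddings (in practice, e.g., GloVe vectors).
EMBEDDINGS = {
    "good": np.array([0.9, 0.1]), "great": np.array([0.85, 0.2]),
    "fine": np.array([0.7, 0.3]), "bad": np.array([-0.9, 0.1]),
    "movie": np.array([0.0, 1.0]), "plot": np.array([0.1, 0.9]),
}

def nearest_neighbors(word, k=3):
    """Candidate substitutes: nearest words in the embedding space (cosine)."""
    if word not in EMBEDDINGS:
        return []
    v = EMBEDDINGS[word]
    scored = [(np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u)), w)
              for w, u in EMBEDDINGS.items() if w != word]
    return [w for _, w in sorted(scored, reverse=True)[:k]]

def toy_classifier(tokens):
    """Stand-in for the victim LM: exposes only a hard label (1 = positive)."""
    score = sum(EMBEDDINGS.get(t, np.zeros(2))[0] for t in tokens)
    return int(score > 0.8)  # deliberately brittle decision boundary

def hard_label_attack(tokens, max_queries=100):
    """Greedy single-word substitution until the hard label flips or budget runs out."""
    original_label = toy_classifier(tokens)
    queries = 0
    for i, word in enumerate(tokens):
        for candidate in nearest_neighbors(word):
            if queries >= max_queries:
                return None
            trial = tokens[:i] + [candidate] + tokens[i + 1:]
            queries += 1
            if toy_classifier(trial) != original_label:  # prediction flipped
                return trial
    return None  # no adversarial example found within the query budget

print(hard_label_attack(["good", "movie"]))  # -> ['fine', 'movie'] in this toy setup
```

In this toy setup the attack flips the label by swapping "good" for its embedding neighbor "fine"; the actual methods discussed in the talk improve on such heuristic search by exploiting estimated decision-boundary information.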

The ultimate goal in constructing robust LMs is to obtain models for which adversarial examples provably do not exist, which can be achieved through certified robust training. Existing methods perform certified robust training either in the discrete input space or in the continuous latent feature space. We identify the structural gap between these pipelines and unify them in the word embedding space. This unification yields a stronger robustness guarantee by removing unnecessary components, namely bound-computation modules such as interval bound propagation, and by adopting a new decoupled regularization learning paradigm.
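As a rough illustration of what training in the embedding space looks like without a bound-computation module, here is a minimal PyTorch sketch under stated assumptions, not the talk's exact formulation: the classification loss is decoupled from a separate regularizer that penalizes output changes under bounded perturbations of the word embeddings. The model sizes, the perturbation radius EPSILON, and the weight LAMBDA_REG are all illustrative.

```python
# Sketch of embedding-space robust training with a decoupled regularizer
# (illustrative assumptions; no interval-bound-propagation module).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB_DIM, SEQ_LEN, NUM_CLASSES = 1000, 64, 8, 2
EPSILON, LAMBDA_REG = 0.1, 1.0  # perturbation radius and regularizer weight

embedding = nn.Embedding(VOCAB, EMB_DIM)
classifier = nn.Sequential(nn.Flatten(), nn.Linear(EMB_DIM * SEQ_LEN, 64),
                           nn.ReLU(), nn.Linear(64, NUM_CLASSES))
optimizer = torch.optim.Adam(list(embedding.parameters()) +
                             list(classifier.parameters()), lr=1e-3)

def training_step(token_ids, labels):
    """One step: clean cross-entropy plus an embedding-space robustness term."""
    emb = embedding(token_ids)                      # (batch, seq_len, emb_dim)
    logits = classifier(emb)
    clean_loss = F.cross_entropy(logits, labels)

    # Random perturbation on the L2 sphere of radius EPSILON around each
    # embedding, standing in for the perturbations a certificate must cover.
    delta = torch.randn_like(emb)
    delta = EPSILON * delta / delta.norm(dim=-1, keepdim=True).clamp_min(1e-12)
    perturbed_logits = classifier(emb + delta)

    # Decoupled regularizer: keep predictions stable under the perturbation.
    reg = F.kl_div(F.log_softmax(perturbed_logits, dim=-1),
                   F.softmax(logits, dim=-1).detach(), reduction="batchmean")

    loss = clean_loss + LAMBDA_REG * reg
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy batch of random length-8 token sequences.
ids = torch.randint(0, VOCAB, (16, SEQ_LEN))
labels = torch.randint(0, NUM_CLASSES, (16,))
print(training_step(ids, labels))
```

The point of the sketch is the structure of the objective, a clean loss plus a separate stability term computed directly on perturbed embeddings, rather than the specific regularizer, which in the talk is derived so that it supports a formal robustness certificate.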

Building on these contributions, we believe our findings will shed light on the adversarial robustness of multi-modal large language models and on the development of robust LMs for safety-critical applications.

Bio:

Muchao Ye is an Assistant Professor of Computer Science at the University of Iowa. He received his PhD from the College of Information Sciences and Technology at the Pennsylvania State University in 2024. Before that, he obtained his Bachelor of Engineering degree in Information Engineering at South China University of Technology. His research interests lie at the intersection of AI, security, and healthcare, with a focus on improving AI safety from the perspective of adversarial robustness. His work has been published in top venues including NeurIPS, KDD, AAAI, ACL, and the Web Conference.

Friday, September 20, 2024 11:00am to 12:00pm
Pappajohn Business Building
W401
21 East Market Street, Iowa City, IA 52245
Individuals with disabilities are encouraged to attend all University of Iowa–sponsored events. If you are a person with a disability who requires a reasonable accommodation in order to participate in this program, please contact Mengxiao Zhang in advance at 319-335-0858 or mengxiao-zhang@uiowa.edu.