Liu Yi

Lecture

Published: 2023-11-09

Report Title: Large Model Security Based on Prompt Engineering

Report time: November 9, 2023 15:00

Report location: Science Building b1002

Host: Zhao Yongxin

Report Summary:

As large models are widely adopted in fields such as healthcare, finance, entertainment, and education, the security challenges they bring are gradually emerging. Large models offer considerable convenience, but their security risks have also begun to draw attention from both industry and academia. This report examines a new and dangerous attack on large language models: the prompt injection attack, which uses specially crafted prompts to induce the model to produce unintended outputs. It also presents a detailed study of large model jailbreaking, the technical problem of how to evade a model's original alignment constraints. The report will share the latest empirical research on jailbreak prompts for large models and discuss how automated tools can be used to accelerate and improve this attack method.

About the Speaker:

Liu Yi is a doctoral student at the School of Computer Science and Engineering, Nanyang Technological University, supervised by Professor Liu Yang, an internationally renowned expert in software engineering. His research focuses mainly on large model security and software testing, and his work has been published at top international conferences such as USENIX, S&P, NDSS, ICSE, and ASE. In large model security, he completed an early empirical study of jailbreak prompts and designed an automated method for generating them. In prompt injection, he carried out early attacks on commercial applications that integrate large language models and designed an automated prompt injection attack tool. In software testing, he led the design of a RESTful API independent testing tool, which won the Huawei Cloud Top 10 Excellent Technical Cooperation Project Award. The tool has been applied to multiple product lines within Huawei and was invited for commercial testing by Huawei Cloud.