When Style Breaks Safety: Defending LLMs Against Superficial Style Alignment

Publication
The Fourteenth International Conference on Learning Representations
Yuxin Xiao

Yuxin Xiao is a Ph.D. candidate at MIT IDSS. His research focuses on building safe, robust, and trustworthy LLMs and advancing their reasoning and decision-making capabilities for healthcare and other high-stakes applications. Yuxin obtained his M.S. in Machine Learning at Carnegie Mellon University and his B.S. in Computer Science and B.S. in Statistics and Mathematics at the University of Illinois at Urbana-Champaign.
