When Style Breaks Safety: Defending LLMs Against Superficial Style Alignment

Yuxin Xiao, Sana Tonekaboni, Walter Gerych, Vinith Menon Suriyakumar, Marzyeh Ghassemi

2026

Type

Conference paper

Publication

The Fourteenth International Conference on Learning Representations

Yuxin Xiao

Yuxin Xiao is a Ph.D. candidate at MIT IDSS. His research focuses on building safe, robust, and trustworthy LLMs and advancing their reasoning and decision-making capabilities for healthcare and other high-stakes applications. Yuxin obtained his M.S. in Machine Learning at Carnegie Mellon University and his B.S. in Computer Science and B.S. in Statistics and Mathematics at the University of Illinois at Urbana-Champaign.

