When Style Breaks Safety: Defending LLMs Against Superficial Style Alignment

Publication
The Fourteenth International Conference on Learning Representations
Yuxin Xiao

Yuxin Xiao is a Ph.D. candidate at MIT IDSS. His research focuses on building safe, robust, and trustworthy LLMs and advancing their reasoning and decision-making capabilities for healthcare and other high-stakes applications. Yuxin obtained his M.S. in Machine Learning at Carnegie Mellon University and his B.S. in Computer Science and B.S. in Statistics and Mathematics at the University of Illinois at Urbana-Champaign.

Sana Tonekaboni

Sana Tonekaboni is a postdoctoral fellow at the Broad Institute of MIT and Harvard. Her research focuses on developing methods that integrate multimodal biomedical data to better understand human health. She is also interested in the challenges of deploying clinical ML in healthcare environments and in finding solutions for the effective and safe use of such tools in practice. Sana received her Ph.D. in computer science from the University of Toronto under the supervision of Dr. Anna Goldenberg, where she was an Apple Scholar in AI/ML and a CIHR Health System Impact Fellow.
