Weiliang Zhao

I'm a first-year PhD student at Columbia University, advised by Professor Junfeng Yang and Professor Zhou Yu. My research interests mainly include Alignment of LLMs and Mechanistic Interpretability.

I completed my Master’s in Computer Science in the Department of Computer Science at Columbia University, advised by Prof. Junfeng Yang and Prof. Chengzhi Mao.

I hold a BSc in Mathematics from the University of Edinburgh, where I was advised by Prof. Buark Buke.

📮Email / 🔗LinkedIn / 🎓Google Scholar / 📃CV

Preprints

Proactive Defense Against LLM Jailbreak
Weiliang Zhao, JinJun Peng, Daniel Ben-Levi, Junfeng Yang, Chengzhi Mao,
Under Review

A proactive defense framework that injects strategically crafted spurious outputs to mislead attackers’ optimization loops, prematurely collapsing multi-turn jailbreak searches and dramatically reducing LLM vulnerability.

Publications

Diversity Helps Jailbreak Large Language Models
Weiliang Zhao, Daniel Ben-Levi, Junfeng Yang, Chengzhi Mao,
NAACL, 2025, Oral
arXiv

A Generalised jailbreaking technique by encouraging higher levels of diversification and adjacent obfuscated prompting to evaluate the vulnerabilities of LLMs.

Learning to Rewrite: Generalized LLM-Generated Text Detection
Wei Hao, Ran Li , Weiliang Zhao, Junfeng Yang, Chengzhi Mao,
ACL, 2025
arXiv

We propose a method designed to enhance the detection of LLM-generated text by learning to rewrite more on LLM-generated inputs and less on human generated inputs.

Acknowledgement

I would like to acknowledge the Thinker Research Grants support from Thinking Machine.

Visitor Map

Feel free to steal this website's source code. Do not scrape the HTML from this page itself, as it includes analytics tags that you do not want on your own website — use the github code instead. Also, consider using Leonid Keselman's Jekyll fork of this page.