Dr. Haoran Li's research focuses on the privacy and security of large models, grounded in practical problems and frontier challenges. It mainly covers large-model jailbreaking, prompt injection, information leakage, and backdoor attacks and defenses. He works on context-based safety and privacy solutions that keep foundation models and agent applications secure and trustworthy while ensuring they comply with laws and regulations, platform policies, social norms, and personal preferences. This research provides theoretical foundations and technical support for improving the safety and trustworthiness of large models.

Representative publications:

[1] Wei Fan, Haoran Li*, Zheye Deng, Weiqi Wang, Yangqiu Song. GoldCoin: Grounding Large Language Models in Privacy Laws via Contextual Integrity Theory. Proceedings of EMNLP 2024. (Outstanding Paper Award)
[2] Haoran Li, Dadi Guo, Wei Fan, Mingshi Xu, Jie Huang, Fanpu Meng, Yangqiu Song. Multi-step Jailbreaking Privacy Attacks on ChatGPT. Findings of EMNLP 2023.
[3] Haoran Li, Dadi Guo, Donghao Li, Wei Fan, Qi Hu, Xin Liu, Chunkit Chan, Duanyi Yao, Yuan Yao, Yangqiu Song. PrivLM-Bench: A Multi-level Privacy Evaluation Benchmark for Language Models. Proceedings of ACL 2024. (Oral)
[4] Haoran Li, Yulin Chen, Zihao Zheng, Qi Hu, Chunkit Chan, Heshan Liu, Yangqiu Song. Simulate and Eliminate: Revoke Backdoors for Generative Large Language Models. Proceedings of AAAI 2025. (Oral)
[5] Haoran Li, Wenbin Hu, Huihao Jing, Yulin Chen, Qi Hu, Sirui Han, Tianshu Chu, Peizhao Hu, Yangqiu Song. PrivaCI-Bench: Evaluating Privacy with Contextual Integrity and Legal Compliance. Proceedings of ACL 2025.
[6] Haoran Li, Mingshi Xu, Yangqiu Song. Sentence Embedding Leaks More Information than You Expect: Generative Embedding Inversion Attack to Recover the Whole Sentence. Findings of ACL 2023.
[7] Haoran Li, Yangqiu Song, Lixin Fan. You Don't Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers' Private Personas. Proceedings of NAACL 2022. (Oral Presentation)
[8] Haoran Li, Wei Fan, Yulin Chen, Jiayang Cheng, Tianshu Chu, Xuebing Zhou, Peizhao Hu, Yangqiu Song. Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory. Proceedings of NAACL 2025. (Oral Presentation)