Hi, I’m Zhipeng Yang (杨致芃). I received my Bachelor’s degree in Robot Engineering from Southeast University, China. My research interests lie in Explainable and Trustworthy AI, with a particular focus on the Mechanistic Interpretability of Large Language Models (LLMs). I am especially interested in the following directions:

  • Planning and Reasoning in LLMs: As LLMs are increasingly used for decision-making in real-world agents, it is crucial to uncover their underlying reasoning mechanisms. Understanding how they plan and decompose tasks can help ensure the safety, transparency, and accountability of their decisions. My recent work includes: LLM-based Robot Task Planning (arXiv) and Internal Chain-of-Thought in LLMs (EMNLP 2025 Main, arXiv).
  • Safety and Behavioral Alignment of LLMs: Despite their impressive capabilities, LLMs remain prone to generating harmful content and misinformation, as well as issuing inappropriate refusals. I am currently working on a project addressing the phenomenon of Over-Refusal, where models reject benign prompts due to excessive safety alignment.

I entered the field of interpretability in the summer of 2024, motivated by a growing realization that the rapid development of deep learning has often proceeded blindly, with progress measured primarily by input–output performance metrics while the models themselves remain black boxes. My research is guided by the conviction that advancing interpretability is not only critical for the trustworthiness and safety of AI systems, but also indispensable for restoring the scientific rigor and transparency that should underlie the future of machine learning.

🔥 News

I am currently seeking PhD or collaboration opportunities. If you are interested, please feel free to contact me via email: <yangzp135@outlook.com>.

📝 Publications

* indicates equal contribution.

  • Yang, Z., Li, J., Xia, S., & Hu, X. Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025 Main). arXiv code
  • Wang, R.*, Yang, Z.*, Zhao, Z., Tong, X., Hong, Z., & Qian, K. LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots. In 2024 43rd Chinese Control Conference (CCC 2024). (Oral) arXiv
  • Yang, Z.*, Wang, R.*, Tan, Y., & Xie, L. MALT: Multi-scale Action Learning Transformer for Online Action Detection. In 2024 International Joint Conference on Neural Networks (IJCNN 2024). (Oral) arXiv

📖 Education

  • 2021.08 - 2025.06, Bachelor's in Robot Engineering, Southeast University.

💻 Internships

  • 2024.10 - 2025.05, HKUST GZ, supervised by Prof. Xuming Hu.
  • 2025.06 - present, MBZUAI, supervised by Prof. Lijie Hu.