Zhihong Shao 邵智宏

I’m a final-year Ph.D. student in the Conversational AI Group, Department of Computer Science and Technology, Tsinghua University. I’m fortunate to be advised by Prof. Minlie Huang.

My interests are in natural language processing and deep learning. I am particularly interested in how we can build a robust and scalable AI system that can leverage diverse skills (e.g., tool use and reasoning) to aggregate possibly-heterogeneous information and answer natural language questions precisely regardless of their complexity.

Research Highlights

LLM Multi-Step Reasoning & Tool Augmentation

  • Improve Math Reasoning with Tool Integration: ToRA (ToRA-34B is the first open-source TOOL-AUGMENTED LLM scoring over 50% on the competition-level MATH dataset);
  • Improve Math Reasoning via Math Training and RL: (i) Process-based Reward Model: Math-Shepherd for process supervision without human annotations; (ii) Math Training and RL: DeepSeekMath (DeepSeekMath 7B is the first open-source LLM scoring over 50% WITHOUT RELYING ON TOOLS on the competition-level MATH dataset);
  • Improve Formal Math Reasoning with Synthetic Data: DeepSeek-Prover, trained on formal math data synthesized by iterating auto-formalization and proof search, solves 50% of the problems in miniF2F-test;
  • Inference-Time Optimization: (i) Prompt Optimization: Synthetic Prompting for automatically synthesizing high-quality CoT demonstrations; (ii) Self-Correction based on Feedback from Tools: CRITIC, which shows that current LLMs struggle with intrinsic self-correction and proposes tool-aided correction for more stable improvements.

Publications

  • Peiyi Wang, Lei Li, Zhihong Shao, R.X. Xu, Damai Dai, Yifei Li, Deli Chen, Y. Wu, Zhifang Sui
    Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
    The Annual Meeting of the Association for Computational Linguistics (ACL), 2024.
    [Paper]

  • Jiaxin Wen, Ruiqi Zhong, Pei Ke, Zhihong Shao, Hongning Wang, Minlie Huang
    Assisting Humans For Scalable Oversight by Learning Decomposition From Human Feedback: A Case Study in Competitive Programming
    The Annual Meeting of the Association for Computational Linguistics (ACL), 2024.
    [Paper]

  • Zhihong Shao, Zhibin Gou, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, and Weizhu Chen
    ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
    International Conference on Learning Representations (ICLR), 2024.
    [Paper]/[Code]
    (ToRA-34B is the first open-source model that attains an accuracy over 50% on the competition-level MATH dataset)

  • Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, and Weizhu Chen
    Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy
    Findings of Empirical Methods in Natural Language Processing (Findings of EMNLP), 2023.
    [Paper]

  • Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, and Weizhu Chen
    CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
    International Conference on Learning Representations (ICLR), 2024.
    [Paper]/[Code]

  • Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, and Weizhu Chen
    Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models
    International Conference on Machine Learning (ICML), 2023.
    [Paper]

  • Zhihong Shao, Fei Huang, Minlie Huang
    Chaining Simultaneous Thoughts for Numerical Reasoning
    Findings of Empirical Methods in Natural Language Processing (Findings of EMNLP), 2022.
    [Paper]

  • Zhihong Shao, Minlie Huang
    Answering Open-Domain Multi-Answer Questions via a Recall-then-Verify Framework
    The Annual Meeting of the Association for Computational Linguistics (ACL), 2022.
    [Paper]/[Code]
    (Best QA system on the AmbigNQ leaderboard)

  • Zhihong Shao, Zhongqin Wu, Minlie Huang
    AdvExpander: Generating Natural Language Adversarial Examples by Expanding Text
    IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 30, pp. 1184-1196, 2022.
    [Paper]

  • Zhihong Shao, Lifeng Shang, Qun Liu, Minlie Huang
    A Mutual Information Maximization Approach for the Spurious Solution Problem in Weakly Supervised Question Answering
    The Annual Meeting of the Association for Computational Linguistics (ACL), 2021.
    [Paper]/[Code]

  • Zhihong Shao, Minlie Huang, Jiangtao Wen, Wenfei Xu, and Xiaoyan Zhu
    Long and Diverse Text Generation with Planning-based Hierarchical Variational Model
    Empirical Methods in Natural Language Processing (EMNLP), 2019.
    [Paper]/[Code]

Preprints

  • Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, DeepSeek-AI
    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
    arXiv abs/2406.11931, 2024.
    [Paper]

  • Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, Xiaodan Liang
    DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
    arXiv abs/2405.14333, 2024.
    [Paper]

  • DeepSeek-AI
    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
    arXiv abs/2405.04434, 2024.
    [Paper]

  • Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y.K. Li, Y. Wu, Daya Guo
    DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
    arXiv abs/2402.03300, 2024.
    [Paper]/[Code]

  • DeepSeek-AI
    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
    arXiv abs/2401.02954, 2024.
    [Paper]/[Code]

  • Fei Huang, Dazhen Wan, Zhihong Shao, Pei Ke, Jian Guan, Yilin Niu, Xiaoyan Zhu, and Minlie Huang
    CoTK: An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation
    arXiv abs/2002.00583, 2020.
    [Paper]/[Code]

Selected Honors and Awards

  • Lenovo Scholarship, Tsinghua University, 2023
  • 1st Prize, Comprehensive Scholarship, Tsinghua University, 2022
  • 2nd Prize, Comprehensive Scholarship, Tsinghua University, 2021
  • 3rd Prize, the National Final of “LAN QIAO CUP” C/C++ Group, 2018
  • China National Scholarship, 2017
  • 1st Prize, National College Students Mathematics Competition (non-math-major), 2016
  • China National Scholarship, 2016

Services

Reviewer/Program Committee: ACL, EMNLP, NLPCC, ARR

Teaching

I was a TA for the following undergraduate courses:

  • Artificial Neural Network (2019 Fall, 2020 Fall, 2021 Fall, 2022 Fall)
  • Object-Oriented Programming (2020 Spring, 2021 Spring, 2022 Spring)