Hao Hu



Technical Staff @ Moonshot AI
Ph.D. @ Tsinghua University

Email
GitHub
LinkedIn
Resume
Google Scholar

About

I am currently a Member of Technical Staff at Moonshot AI, focusing on computer-use and deep research agents. I obtained my Ph.D. from IIIS, Tsinghua University in 2024, where I had the honor of being advised by Prof. Chongjie Zhang and Prof. Yang Gao. During my Ph.D., I was fortunate to be a visiting scholar at Northwestern University, working with Prof. Zhaoran Wang.

My primary goal is to build intelligent and autonomous agents that liberate humans from tedious tasks and deepen our understanding of intelligence. To this end, my research focuses on designing deep reinforcement learning algorithms and developing agents powered by foundation models. Please feel free to contact me if you are interested in collaborating on building next-generation agents at Moonshot!

Publications

  1. OpenCUA: Open Foundations for Computer-Use Agents
    Xinyuan Wang*, Bowen Wang*, Dunjie Lu*, Junlin Yang*, Tianbao Xie*, Junli Wang*, Jiaqi Deng, Xiaole Guo, Yiheng Xu, Chen Henry Wu, Zhennan Shen, Zhuokai Li, Ryan Li, Xiaochuan Li, Junda Chen, Boyuan Zheng, Peihang Liu, Fangyu Lei, Ruisheng Cao, Yeqiao Fu, Dongchan Shi, Martin Shi, Jiarui Hu, Yuyan Wang, Jixuan Chen, Yuxiao Ye, Danyang Zhang, Hao Hu, Huarong Chen, Dikang Du, Zaida Zhou, Haotian Yao, Ziwei Chen, Qizheng Gu, Yipu Wang, Heng Wang, Diyi Yang, Victor Zhong, Flood Sung, Y. Charles, Zhilin Yang, Tao Yu
    Thirty-ninth Conference on Neural Information Processing Systems (NeurIPS), 2025
    PDF | Code
  2. Kimi K2: Open Agentic Intelligence
    Kimi Team
    Technical Report, 2025
    PDF | Blog
  3. Kimi K1.5: Scaling Reinforcement Learning with LLMs
    Kimi Team
    Technical Report, 2025
    PDF
  4. Reason for Future, Act for Now: A Principled Architecture for Autonomous LLM Agents
    Zhihan Liu*, Hao Hu*, Shenao Zhang*, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang
    Forty-first International Conference on Machine Learning (ICML), 2024
    NeurIPS Workshop on Foundation Models for Decision Making, 2023
    PDF | Code | Project Page
  5. CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries
    Ni Mu*, Hao Hu*, Xiao Hu, Yiqin Yang, Bo Xu, Qing-Shan Jia
    Thirty-ninth Conference on Neural Information Processing Systems (NeurIPS), 2025
    PDF | Code
  6. Fewer May Be Better: Enhancing Offline Reinforcement Learning with Reduced Dataset
    Yiqin Yang, Quanwei Wang, Chenghao Li, Hao Hu, Chengjie Wu, Yuhua Jiang, Dianyu Zhong, Ziyou Zhang, Qianchuan Zhao, Chongjie Zhang, Xu Bo
    Forty-second International Conference on Machine Learning (ICML), 2025
    PDF
  7. Episodic Novelty through Temporal Distance
    Hao Hu*, Yiqin Yang*, Jianing Ye, Chengjie Wu, Ziqing Mai, Yujing Hu, Tangjie Lv, Changjie Fan, Qianchuan Zhao, Chongjie Zhang
    Thirteenth International Conference on Learning Representations (ICLR), 2025
    PDF | Code
  8. Bayesian Design Principles for Offline-to-Online Reinforcement Learning
    Hao Hu*, Yiqin Yang*, Jianing Ye, Chengjie Wu, Ziqing Mai, Yujing Hu, Tangjie Lv, Changjie Fan, Qianchuan Zhao, Chongjie Zhang
    Forty-first International Conference on Machine Learning (ICML), 2024
    PDF | Code
  9. Planning, Fast and Slow: Online Reinforcement Learning with Action-Free Offline Data via Multiscale Planners
    Chengjie Wu*, Hao Hu*, Yiqin Yang, Ning Zhang, Chongjie Zhang
    Forty-first International Conference on Machine Learning (ICML), 2024
    PDF
  10. Stylized Offline Reinforcement Learning: Extracting Diverse High-Quality Behaviors from Heterogeneous Datasets
    Yihuan Mao, Chengjie Wu, Xi Chen, Hao Hu, Ji Jiang, Tianze Zhou, Tangjie Lv, Changjie Fan, Zhipeng Hu, Yi Wu, Yujing Hu, Chongjie Zhang
    Twelfth International Conference on Learning Representations (ICLR), 2024
    PDF
  11. One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration
    Zhihan Liu*, Miao Lu*, Wei Xiong*, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang
    Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS Spotlight), 2023
    PDF | Code
  12. Unsupervised Behavior Extraction via Random Intent Priors
    Hao Hu*, Yiqin Yang*, Jianing Ye, Ziqing Mai, Chongjie Zhang
    Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023
    PDF | Code
  13. What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL?
    Rui Yang, Yong Lin, Xiaoteng Ma, Hao Hu, Chongjie Zhang, Tong Zhang
    Eleventh International Conference on Learning Representations (ICLR), 2023
    PDF | Code
  14. The Provable Benefit of Unsupervised Data Sharing for Offline Reinforcement Learning
    Hao Hu*, Yiqin Yang*, Qianchuan Zhao, Chongjie Zhang
    Eleventh International Conference on Learning Representations (ICLR), 2023
    PDF | Code
  15. Flow to Control: Offline Reinforcement Learning with Lossless Primitive Discovery
    Yiqin Yang*, Hao Hu*, Wenzhe Li*, Siyuan Li, Jun Yang, Qianchuan Zhao, Chongjie Zhang
    Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023
    PDF | Code
  16. On the Role of Discount Factor in Offline Reinforcement Learning
    Hao Hu*, Yiqin Yang*, Qianchuan Zhao, Chongjie Zhang
    International Conference on Machine Learning (ICML), 2022
    PDF | Code
  17. Offline Reinforcement Learning with Value-based Episodic Memory
    Xiaoteng Ma*, Yiqin Yang*, Hao Hu*, Qihan Liu, Jun Yang, Chongjie Zhang, Qianchuan Zhao, Bin Liang
    Tenth International Conference on Learning Representations (ICLR), 2022
    PDF | Code
  18. On the Estimation Bias in Double Q-Learning
    Zhizhou Ren, Guangxiang Zhu, Hao Hu, Beining Han, Jianglun Chen, Chongjie Zhang
    Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS), 2021
    PDF | Code
  19. Generalizable Episodic Memory for Deep Reinforcement Learning
    Hao Hu, Jianing Ye, Zhizhou Ren, Guangxiang Zhu, Chongjie Zhang
    International Conference on Machine Learning (ICML), 2021
    PDF | Code
  20. MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration
    Jin Zhang*, Jianhao Wang*, Hao Hu, Tong Chen, Yingfeng Chen, Changjie Fan, Chongjie Zhang
    International Conference on Machine Learning (ICML), 2021
    PDF | Code
  21. Query-Efficient Offline Preference-Based Reinforcement Learning via In-Dataset Exploration
    Hao Hu*, Yiqin Yang*, Shuai Wang, Bo Liu, Yang Gao, Chongjie Zhang
    Under Review
    PDF

Experience

Selected Talks

Education