About me

Di He (贺笛) is currently a Senior Researcher in the Machine Learning Group at Microsoft Research Asia. He obtained his bachelor’s, master’s, and Ph.D. degrees from Peking University, advised by Liwei Wang.

Di’s main research interests include representation learning (with a focus on learning representations of structured data such as language and graphs), trustworthy machine learning, and deep learning optimization. The primary goal of his work is to develop efficient algorithms that capture accurate and robust features from data with deep neural networks. To this end, he focuses on building a deeper understanding of different neural network architectures, the practical scenarios they suit, and their optimization processes. Di has served on the program committees and senior program committees of top machine learning and artificial intelligence conferences, including ICML, NeurIPS, ICLR, AAAI, and IJCAI.

Publications (Full List)

[1] Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, Liwei Wang, Tie-Yan Liu, “Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding”, NeurIPS 2021

[2] Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu, “Do Transformers Really Perform Bad for Graph Representation?”, NeurIPS 2021 (Winner of KDD Cup 2021, Graph Prediction Track)

[3] Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, Tie-Yan Liu, Arnold Overwijk, “Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder”, EMNLP 2021

[4] Dinglan Peng, Shuxin Zheng, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu, “How could Neural Networks Understand Programs?”, ICML 2021

[5] Bohang Zhang, Tianle Cai, Zhou Lu, Di He, Liwei Wang, “Towards Certifying ℓ∞ Robustness using Neural Networks with ℓ∞-dist Neurons”, ICML 2021

[6] Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, Liwei Wang, “GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training”, ICML 2021

[7] Qiyu Wu, Chen Xing, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu, “Taking Notes on the Fly Helps Language Pre-training”, ICLR 2021

[8] Guolin Ke, Di He, Tie-Yan Liu, “Rethinking Positional Encoding in Language Pre-training”, ICLR 2021

[9] Mingqing Xiao, Shuxin Zheng, Chang Liu, Yaolong Wang, Di He, Guolin Ke, Jiang Bian, Zhouchen Lin, Tie-Yan Liu, “Invertible Image Rescaling”, ECCV 2020 (Oral)

[10] Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu, “On Layer Normalization in the Transformer Architecture”, ICML 2020

[11] Runtian Zhai, Chen Dan, Di He, Huan Zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, Liwei Wang, “MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius”, ICLR 2020

[12] Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu, “Incorporating BERT into Neural Machine Translation”, ICLR 2020

[13] Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Liwei Wang, “Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems”, NeurIPS 2019 Beyond First Order Methods in ML Workshop

[14] Yiping Lu, Zhuohan Li, Di He, Zhiqing Sun, Bin Dong, Tao Qin, Liwei Wang, Tie-Yan Liu, “Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View”, NeurIPS 2019 Machine Learning and the Physical Sciences Workshop

[15] Zhiqing Sun, Zhuohan Li, Haoqing Wang, Di He, Zi Lin, Zhihong Deng, “Fast Structured Decoding for Sequence Models”, NeurIPS 2019

[16] Runtian Zhai, Tianle Cai, Di He, Chen Dan, Kun He, John Hopcroft, Liwei Wang, “Adversarially Robust Generalization Just Requires More Unlabeled Data”, Preprint

[17] Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Tao Qin, Jianhuang Lai, Tie-Yan Liu, “Machine Translation With Weakly Paired Documents”, EMNLP 2019

[18] Zhuohan Li, Di He, Fei Tian, Tao Qin, Liwei Wang, Tie-Yan Liu, “Hint-based Training for Non-autoregressive Translation”, EMNLP 2019

[19] Linyuan Gong, Di He, Zhuohan Li, Tao Qin, Liwei Wang, Tie-Yan Liu, “Efficient Training of BERT by Progressively Stacking”, ICML 2019

[20] Chaoyu Guan, Xiting Wang, Quanshi Zhang, Runjin Chen, Di He, Xing Xie, “Towards a Deep and Unified Understanding of Deep Neural Models in NLP”, ICML 2019

[21] Jun Gao, Di He, Xu Tan, Tao Qin, Liwei Wang, Tie-Yan Liu, “Representation Degeneration Problem in Training Natural Language Generation Models”, ICLR 2019

[22] Chengyue Gong, Di He, Xu Tan, Tao Qin, Liwei Wang, Tie-Yan Liu, “FRAGE: Frequency-Agnostic Word Representation”, NeurIPS 2018

[23] Tianyu He, Xu Tan, Yingce Xia, Di He, Tao Qin, Zhibo Chen, Tie-Yan Liu, “Layer-wise Coordination Between Encoder and Decoder for Neural Machine Translation”, NeurIPS 2018

[24] Zhuohan Li, Di He, Fei Tian, Wei Chen, Tao Qin, Liwei Wang, Tie-Yan Liu, “Towards Binary-valued Gates for Robust LSTM Training”, ICML 2018

[25] Di He, Hanqing Lu, Yingce Xia, Tao Qin, Liwei Wang, Tie-Yan Liu, “Decoding with Value Networks for Neural Machine Translation”, NeurIPS 2017

[26] Di He, Yingce Xia, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, Wei-Ying Ma, “Dual Learning for Machine Translation”, NeurIPS 2016

[27] Wei Chen, Di He, Tie-Yan Liu, Tao Qin, Yixin Tao, Liwei Wang, “Generalized Second Price Auction with Probabilistic Broad Match”, ACM Conference on Economics and Computation (EC) 2014

[28] Di He, Wei Chen, Liwei Wang, Tie-Yan Liu, “A Game-Theoretic Machine Learning Approach for Revenue Maximization in Sponsored Search”, IJCAI 2013

[29] Yining Wang, Liwei Wang, Yuanzhi Li, Di He, Wei Chen, Tie-Yan Liu, “A Theoretical Analysis of NDCG Type Ranking Measures”, Conference on Learning Theory (COLT) 2013

Previously supervised undergraduates

Qizhe Xie (CMU)

Chengyue Gong (UT Austin)

Jun Gao (University of Toronto)

Zhuohan Li (UC Berkeley)

Zhiqing Sun (CMU)

Yiping Lu (Stanford)

Linyuan Gong (UC Berkeley)

Runtian Zhai (CMU)

Tianle Cai (Princeton)