Abstract: How to make deep learning fast is a key problem in current AI research. In this talk, I will approach it from two perspectives: models and algorithms. First, I will propose a model, LSTM-Jump, that can skip unimportant information in sequential data, mimicking the skimming behavior of human reading. Trained with an efficient reinforcement learning algorithm, this model can be several times faster than a vanilla LSTM at inference time. Then I will introduce a sequence encoding model that discards recurrent networks and thus fully supports parallel training and inference. Based on this technique, a new question-answering model, QANet, is proposed. Combined with a data augmentation approach based on cyclic translation, this model achieves No. 1 performance on the competitive Stanford Question Answering Dataset (SQuAD), while being several times faster than prevalent models. Lastly, I will talk about a general gradient normalization technique for efficient training of deep networks. This method not only alleviates the vanishing gradient problem, but also regularizes the model to achieve better generalization.
Bio: Adams Wei Yu is a Ph.D. candidate in the Machine Learning Department at Carnegie Mellon University, advised by Professors Jaime Carbonell and Alex Smola. His research interests lie in artificial intelligence, encompassing deep learning, large-scale optimization and natural language processing. The main theme of his research is to accelerate AI by designing efficient models and algorithms. His work has been published in various leading conferences and journals, including ICML, NIPS, ICLR, ACL, COLT, JMLR, AISTATS, AAAI and VLDB. One of his papers was selected as a finalist for the INFORMS 2014 Data Mining Best Student Paper award, and a paper he coauthored was nominated for Best Paper at ICME 2011. He is an NVIDIA PhD Fellow, Snap PhD Fellow, Siebel Scholar and CMU Presidential Fellow. He served as the Workflow Chair of AISTATS 2017. http://www.cs.cmu.edu/~weiyu/