Training Recurrent Neural Networks

Sutskever, Ilya. University of Toronto (Canada). ProQuest Dissertations Publishing, 2013. NS22066.

Abstract (summary)

Recurrent Neural Networks (RNNs) are powerful sequence models that were believed to be difficult to train, and as a result they were rarely used in machine learning applications. This thesis presents methods that overcome the difficulty of training RNNs, and applications of RNNs to challenging problems.

We first describe a new probabilistic sequence model that combines Restricted Boltzmann Machines and RNNs. The new model is more powerful than similar models while being less difficult to train.

Next, we present a new variant of the Hessian-free (HF) optimizer and show that it can train RNNs on tasks that have extreme long-range temporal dependencies, which were previously considered to be impossibly hard. We then apply HF to character-level language modelling and get excellent results.

We also apply HF to optimal control and obtain RNN control laws that can successfully operate under conditions of delayed feedback and unknown disturbances.

Finally, we describe a random parameter initialization scheme that allows gradient descent with momentum to train RNNs on problems with long-term dependencies. This directly contradicts widespread beliefs about the inability of first-order methods to do so, and suggests that previous attempts at training RNNs failed partly due to flaws in the random initialization.

Indexing (details)


Subject: Computer science
Classification: 0984: Computer science
Identifier / keyword: Applied sciences; HF optimizer; Hessian-free optimizer; RNNs; Recurrent Neural Networks; Restricted Boltzmann Machines; Temporal dependencies
Title: Training Recurrent Neural Networks
Author: Sutskever, Ilya
Number of pages: 149
Degree date: 2013
School code: 0779
Source: DAI-B 75/06(E), Dissertation Abstracts International
Place of publication: Ann Arbor
Country of publication: United States
ISBN: 978-0-499-22066-0
Advisor: Hinton, Geoffrey
University/institution: University of Toronto (Canada)
University location: Canada -- Ontario, CA
Degree: Ph.D.
Source type: Dissertation or Thesis
Language: English
Document type: Dissertation/Thesis
Dissertation/thesis number: NS22066
ProQuest document ID: 1501655550
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL: https://www.proquest.com/docview/1501655550