Representation Learning - greedy layer-wise unsupervised pretraining
Unsupervised learning played a key historical role in the revival of deep neural networks, enabling researchers for the first time to train a deep supervised network without requiring architectural specialization like convolution or recurrence. We call this procedure unsupervised pretraining, or more precisely, greedy layer-wise unsupervised pretraining. This procedure is a canonical example of how a representation learned for one task can sometimes be useful for another task.
Greedy layer-wise unsupervised pretraining relies on a single-layer representation learning algorithm such as an RBM, a single-layer autoencoder, a sparse coding model, or another model that learns latent representations. Each layer is pretrained using unsupervised learning, taking the output of the previous layer and producing as output a new representation of the data, whose distribution is hopefully simpler.
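As a concrete illustration (not from the original text), here is a minimal PyTorch sketch of one such single-layer learner, assuming a one-hidden-layer autoencoder trained with a reconstruction loss; the names OneLayerAutoencoder and pretrain_layer are hypothetical.

```python
import torch
import torch.nn as nn


class OneLayerAutoencoder(nn.Module):
    """Single-layer autoencoder: the encoder output is this layer's learned representation."""

    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
        self.decoder = nn.Linear(hidden_dim, in_dim)

    def forward(self, x):
        h = self.encoder(x)           # latent code produced by this layer
        return self.decoder(h), h     # reconstruction (unsupervised target) and code


def pretrain_layer(ae, batches, epochs=5, lr=1e-3):
    """Train one layer with a purely unsupervised reconstruction loss."""
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x in batches:             # x: output of the already-trained layers below
            recon, _ = ae(x)
            loss = loss_fn(recon, x)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return ae
```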
Greedy layer-wise training procedures based on unsupervised criteria have long been used to sidestep the difficulty of jointly training the layers of a deep neural net for a supervised task. This approach dates back at least as far as the Neocognitron. The deep learning renaissance of 2006 began with the discovery that this greedy learning procedure could be used to find a good initialization for a joint learning procedure over all the layers, and that this approach could be used to successfully train even fully connected architectures.
Prior to this discovery, only convolutional deep networks or networks whose depth resulted from recurrence were regarded as feasible to train. Today we know that greedy layer-wise pretraining is not required to train fully connected deep architectures, but the unsupervised pretraining approach was the first method to succeed.
Greedy layer-wise pretraining is called greedy because it is a greedy algorithm, meaning that it optimizes each piece of the solution independently, one piece at a time, rather than jointly optimizing all pieces. It is layer-wise because these independent pieces are the layers of the network. Specifically, greedy layer-wise pretraining proceeds one layer at a time, training the k-th layer while keeping the previous ones fixed. In particular, the lower layers are not adapted after the upper layers are introduced. It is called unsupervised because each layer is trained with an unsupervised representation learning algorithm. However, it is also called pretraining, because it is supposed to be only the first step before a joint training algorithm is applied to fine-tune all the layers together. In the context of a supervised learning task, it can be viewed as a regularizer and a form of parameter initialization. A sketch of the greedy loop follows below.
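Continuing the hypothetical sketch above, the greedy loop might look like the following: layer k is trained on the codes produced by the frozen layers below it, and it is not revisited once later layers are added. greedy_layerwise_pretrain and layer_sizes are illustrative names, and this toy version keeps all batches in memory for simplicity.

```python
import torch


def greedy_layerwise_pretrain(layer_sizes, raw_batches, epochs=5):
    """layer_sizes, e.g. [784, 512, 256]; raw_batches is a list of input tensors."""
    trained_encoders = []
    current_batches = list(raw_batches)                   # layer 1 sees the raw data
    for in_dim, hidden_dim in zip(layer_sizes[:-1], layer_sizes[1:]):
        ae = OneLayerAutoencoder(in_dim, hidden_dim)      # building block from the sketch above
        pretrain_layer(ae, current_batches, epochs=epochs)    # unsupervised training of layer k
        encoder = ae.encoder
        for p in encoder.parameters():                    # freeze: lower layers are not adapted later
            p.requires_grad_(False)
        trained_encoders.append(encoder)
        with torch.no_grad():                             # feed the new codes upward to train layer k+1
            current_batches = [encoder(x) for x in current_batches]
    return trained_encoders
```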
It is common to use the word "pretraining" to refer not only to the pretraining stage itself but also to the entire two-phase protocol that combines the pretraining phase and the supervised learning phase. The supervised learning phase may involve training a simple classifier on top of the features learned in the pretraining phase, or it may involve supervised fine-tuning of the entire network learned in the pretraining phase. No matter what kind of unsupervised learning algorithm or what model type is employed, in the vast majority of cases the overall training scheme is nearly the same. While the choice of unsupervised learning algorithm will obviously impact the details, most applications of unsupervised pretraining follow this basic protocol.
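A hedged sketch of that supervised phase, reusing the toy encoders from above: either only a new classifier head is trained on top of the frozen features, or the whole stack is unfrozen and fine-tuned end to end. build_classifier and supervised_phase are assumed names, not from the original post.

```python
import torch
import torch.nn as nn


def build_classifier(trained_encoders, feature_dim, n_classes, fine_tune_all=True):
    """Stack the pretrained encoders and put a new classifier head on top."""
    body = nn.Sequential(*trained_encoders)
    if fine_tune_all:
        for p in body.parameters():      # unfreeze: pretraining now acts as parameter initialization
            p.requires_grad_(True)
    return nn.Sequential(body, nn.Linear(feature_dim, n_classes))


def supervised_phase(model, labeled_batches, epochs=10, lr=1e-4):
    """Second phase: ordinary supervised training (of the head only, or of everything)."""
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in labeled_batches:     # labeled (input, class-index) pairs
            loss = loss_fn(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

In either case the overall two-phase protocol is the same; only how much of the pretrained body is unfrozen during the supervised phase changes.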