  Preface:
  This exercise practices how to train a network with two hidden layers, where each layer is trained with the sparse autoencoder idea; the two hidden layers are used to extract features from the input data. The task is handwritten digit recognition on MNIST, and the experiment content and steps follow the online tutorial. Once features have been extracted from the handwritten digit images, a softmax classifier is used to classify them. An introduction to MNIST can be found online, and the theory behind this post was covered in earlier posts in this series.
  Experiment basics:
  The procedure for training this deep network is roughly as follows:
  1. Use the raw input data as input and train the parameters of the first hidden layer (with the sparse autoencoder method), then use the trained parameters to compute the output of the first hidden layer.
  2. Use the output of step 1 as the input of a second network, and train the parameters of the second hidden layer in the same way.
  3. Use the output of step 2 as the input of a softmax multi-class classifier, and train the softmax classifier's parameters with the labels of the original data.
  4. Compute the loss function of the whole network (the two hidden layers plus the softmax classifier), together with the partial derivatives of that loss with respect to every parameter.
  5. Use the parameters from steps 1, 2 and 3 to initialize the whole deep network (two hidden layers plus one softmax output layer), then use the L-BFGS algorithm to iterate to a parameter setting near the minimum of the loss function above, and take that as the final, optimal set of parameters for the whole network.
  The training procedure above assumes a softmax classifier, whose loss function (and its gradient) can be written down in closed form. During parameter correction we can therefore treat all the layers as a single network, compute the loss of the entire network and its partial derivatives, and, once labelled data is available, use the pretrained parameters as the initial values and let an optimization algorithm solve for the parameters of the whole network. But what if the final classifier is not softmax, but something else such as an SVM or a random forest? The feature-extraction layers have already been pretrained and those parameters can still initialize the front of the network, but how do we fine-tune in that case? The labels are only used inside the later classifier, so we cannot compute a loss function for the whole system. Do we then have to treat the final output of the first n layers as equivalent to the input of the first layer (i.e. a multi-layer sparse autoencoder)? I have not figured this out yet; hopefully it will become clear later.
  A few points worth noting about training deep networks (assuming two hidden layers):
  When pretraining with sparse autoencoders, the output of each hidden layer has to be computed in turn; if a softmax classifier is used afterwards, the output of the last hidden layer is likewise needed as the softmax input when training the softmax parameters.
  As the previous point shows, the classifier's parameters are pretrained before parameter correction; during parameter correction (fine-tuning) all the hidden layers are treated as one single network, so each iteration updates the parameters of every layer at once.
  In practice the first hidden layer takes the longest to train, since the parameter matrix to learn is 200*784 (not counting the b terms). Training the second hidden layer is faster, mainly because only a 200*200 matrix has to be learned, a much smaller number of parameters. Training the softmax classifier is faster still, because it has even fewer parameters and its cost and gradient formulas are simpler than those of the two hidden layers. Fine-tuning the whole network at the end takes about as long as training the second hidden layer.
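  To make that comparison concrete, here is a minimal sketch (not part of the exercise code; the variable names simply mirror those used below) that tallies the encoder weight-matrix sizes behind each stage:

% Encoder weight-matrix sizes trained at each stage (bias terms not counted;
% each sparse autoencoder additionally learns a decoder matrix of the same size).
inputSize = 28 * 28;  hiddenSizeL1 = 200;  hiddenSizeL2 = 200;  numClasses = 10;
nL1      = hiddenSizeL1 * inputSize;        % 200 * 784 = 156,800
nL2      = hiddenSizeL2 * hiddenSizeL1;     % 200 * 200 =  40,000
nSoftmax = numClasses   * hiddenSizeL2;     %  10 * 200 =   2,000
fprintf('layer 1: %d, layer 2: %d, softmax: %d weights\n', nL1, nL2, nSoftmax);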
  Some functions used in the program:
  [params, netconfig] = stack2params(stack)
  Converts the parameters of the stacked network (possibly several parameter sets) into a single vector params, which makes it convenient to apply various optimization algorithms. netconfig stores related information about the network: netconfig.inputsize is the number of nodes in the input layer, and the elements of netconfig.layersizes give the number of nodes in each hidden layer.
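  The starter code ships the real implementation of stack2params; the following is only a rough sketch of the packing it performs (hence the _sketch suffix), assuming each layer is a struct with fields w and b:

function [params, netconfig] = stack2params_sketch(stack)
% Flatten a cell array of layers (each a struct with fields w and b) into one
% column vector, and record the sizes needed to unpack that vector later.
params = [];
netconfig.inputsize  = size(stack{1}.w, 2);              % number of input-layer nodes
netconfig.layersizes = {};
for d = 1:numel(stack)
    params = [params ; stack{d}.w(:) ; stack{d}.b(:)];   %#ok<AGROW>
    netconfig.layersizes{end+1} = size(stack{d}.w, 1);   % nodes in hidden layer d
end
end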
  [ cost, grad ] = stackedAECost(theta, inputSize, hiddenSize, numClasses, netconfig, lambda, data, labels)
  This function computes the loss of the whole network and the partial derivative of that loss with respect to every parameter. The loss is a single real number, computed from the softmax classifier: all that is needed are the label values and the values of the softmax output layer. The partial derivatives, on the other hand, are many — one per parameter — and the parameters include not only those of the hidden layers but also those of the softmax layer. The softmax part of the gradient follows directly from its formula, while the deep-network part is obtained by backpropagation (first compute the error term of each layer, then use it to compute the gradients of the parameters w and b).
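  Before fine-tuning with this cost it is worth checking the analytic gradient numerically. The sketch below assumes computeNumericalGradient.m from the earlier exercises is on the path; the tiny sizes and the dbg* variable names are made up purely for the check:

% Sanity-check stackedAECost on a tiny random network (runs in seconds).
dbgData   = rand(8, 5);                 % 5 examples of dimension 8
dbgLabels = [1; 2; 3; 4; 2];            % 4 classes
dbgStack  = cell(2, 1);
dbgStack{1}.w = 0.1 * randn(3, 8);  dbgStack{1}.b = zeros(3, 1);
dbgStack{2}.w = 0.1 * randn(3, 3);  dbgStack{2}.b = zeros(3, 1);
[dbgParams, dbgConfig] = stack2params(dbgStack);
dbgTheta = [0.005 * randn(4 * 3, 1) ; dbgParams];    % softmax parameters come first
[~, grad] = stackedAECost(dbgTheta, 8, 3, 4, dbgConfig, 1e-4, dbgData, dbgLabels);
numGrad = computeNumericalGradient(@(p) stackedAECost(p, 8, 3, 4, dbgConfig, ...
    1e-4, dbgData, dbgLabels), dbgTheta);
disp(norm(numGrad - grad) / norm(numGrad + grad));   % should be very small (~1e-9)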
  stack = params2stack(params, netconfig)
  The inverse of the function above: it unpacks a parameter vector back into the layered structure of the deep network.
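  A matching sketch of the unpacking direction (again not the shipped implementation, and assuming the same packing order as the stack2params sketch above):

function stack = params2stack_sketch(params, netconfig)
% Unpack a flat parameter vector back into the {w, b} cell array,
% using the sizes recorded in netconfig by the packing step.
depth = numel(netconfig.layersizes);
stack = cell(depth, 1);
prevSize = netconfig.inputsize;
pos = 1;
for d = 1:depth
    layerSize = netconfig.layersizes{d};
    wlen = layerSize * prevSize;
    stack{d}.w = reshape(params(pos : pos + wlen - 1), layerSize, prevSize);
    pos = pos + wlen;
    stack{d}.b = params(pos : pos + layerSize - 1);
    pos = pos + layerSize;
    prevSize = layerSize;
end
end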
  [pred] = stackedAEPredict(theta, inputSize, hiddenSize, numClasses, netconfig, data)
  This function predicts, for the input data, which output class each example belongs to. theta holds the parameters of the whole network (including the classifier part), numClasses is the number of classes to predict, and netconfig describes the network structure.
  [h, array] = display_network(A, opt_normalize, opt_graycolor, cols, opt_colmajor)
  This function displays the matrix A; each column of A must be one weight vector, and its length must be a perfect square. The function shows each column of A as a small patch image; how many patches there are and how they are laid out is decided automatically inside the function.
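  For reference, this is how it is typically called on the first-layer weights in this exercise (a usage sketch; W1 is the hiddenSizeL1-by-inputSize encoder matrix extracted from sae1OptTheta, exactly as in the code further below):

% Each column passed to display_network must be one weight vector whose length
% is a perfect square (here 784 = 28^2), hence the transpose of the 200x784 matrix.
W1 = reshape(sae1OptTheta(1:hiddenSizeL1 * inputSize), hiddenSizeL1, inputSize);
display_network(W1');     % shows 200 patches of 28x28 pixels each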
  MATLAB built-in functions:
  struct:
  s = struct(...) creates a structure array s.
  nargout:
  Gives the number of output arguments requested from the function.
  save:
  For example, save('saves/step2.mat', 'sae1OptTheta') requires the directory saves to exist under the current directory; otherwise the call fails.
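  A few lines tying these three built-ins together (a small sketch, not taken from the exercise code):

s = struct('w', randn(3, 4), 'b', zeros(3, 1));   % create a 1x1 struct array s
if ~exist('saves', 'dir'), mkdir('saves'); end    % save() fails if the folder is missing
save('saves/demo.mat', 's');
% nargout inside a function reports how many outputs the caller requested:
%   function [cost, grad] = f(x)
%       cost = x .^ 2;
%       if nargout > 1, grad = 2 * x; end
%   end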
  Experimental results:
  The features learned by the first hidden layer are shown below:
  I do not know how to display the features of the second hidden layer: each of its units corresponds to a 200-dimensional vector, and display_network cannot show them directly, since it can only display features whose dimension is a perfect square. Should the 200 dimensions be arranged as 20*10, or as 16*25? I am curious how the many deep learning papers display their second-layer networks, and which factorization of 200 would be representative. Undecided, so the second layer is not shown here; truncating to the first 196 of the 200 dimensions and calling display_network shows nothing recognizable:
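  One heuristic that can be tried (offered here only as a sketch, not as the standard recipe) is to project the second-layer weights back into pixel space by composing them with the first-layer weights, so that each layer-2 unit is drawn as the 784-pixel pattern it responds to most strongly under a linear approximation; W11 and W12 below are the same matrices extracted in the exercise code:

% Linear back-projection of each layer-2 unit into the 28x28 input space.
W11 = reshape(sae1OptTheta(1:hiddenSizeL1 * inputSize), hiddenSizeL1, inputSize);
W12 = reshape(sae2OptTheta(1:hiddenSizeL2 * hiddenSizeL1), hiddenSizeL2, hiddenSizeL1);
display_network((W12 * W11)');    % 200 columns of length 784, so it can be displayed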
  Recognition accuracy before fine-tuning the network parameters:
  Before Finetuning Test Accuracy: 92.190%
  Recognition accuracy after fine-tuning the network parameters:
  After Finetuning Test Accuracy: 97.670%
  Main parts of the experiment code, with comments:
  stackedAEExercise.m:
%% CS294A/CS294W Stacked Autoencoder Exercise
%  Instructions
%  ------------
%  This file contains code that helps you get started on the
%  stacked autoencoder exercise. You will need to complete code in
%  stackedAECost.m
%  You will also need to have implemented sparseAutoencoderCost.m and
%  softmaxCost.m from previous exercises. You will need the initializeParameters.m,
%  loadMNISTImages.m, and loadMNISTLabels.m files from previous exercises.
%  For the purpose of completing the assignment, you do not need to
%  change the code in this file.
%%======================================================================
%% STEP 0: Here we provide the relevant parameters values that will
%  allow your sparse autoencoder to get good filters; you do not need to
%  change the parameters below.
DISPLAY = true;
inputSize = 28 * 28;
numClasses = 10;
hiddenSizeL1 = 200;    % Layer 1 Hidden Size
hiddenSizeL2 = 200;    % Layer 2 Hidden Size
sparsityParam = 0.1;   % desired average activation of the hidden units
                       % (this was denoted by the Greek letter rho, which looks
                       %  like a lower-case "p", in the lecture notes)
lambda = 3e-3;         % weight decay parameter
beta = 3;              % weight of sparsity penalty term
%%======================================================================
%% STEP 1: Load data from the MNIST database
%  This loads our training data from the MNIST database files.
% Load MNIST database files
trainData = loadMNISTImages('train-images.idx3-ubyte');
trainLabels = loadMNISTLabels('train-labels.idx1-ubyte');
trainLabels(trainLabels == 0) = 10; % Remap 0 to 10 since our labels need to start from 1
%%======================================================================
%% STEP 2: Train the first sparse autoencoder
%  This trains the first sparse autoencoder on the unlabelled STL training
%  images.
%  If you've correctly implemented sparseAutoencoderCost.m, you don't need
%  to change anything here.

%  Randomly initialize the parameters
sae1Theta = initializeParameters(hiddenSizeL1, inputSize);

%% ---------------------- YOUR CODE HERE ---------------------------------
%  Instructions: Train the first layer sparse autoencoder, this layer has
%                a hidden size of "hiddenSizeL1".
%                You should store the optimal parameters in sae1OptTheta.
addpath minFunc/;
options.Method = 'lbfgs';
options.maxIter = 400;
options.display = 'on';
[sae1OptTheta, cost] = minFunc(@(p) sparseAutoencoderCost(p, ...
    inputSize, hiddenSizeL1, lambda, sparsityParam, beta, trainData), ...
    sae1Theta, options);                % train the first-layer network parameters
save('saves/step2.mat', 'sae1OptTheta');

if DISPLAY
    W1 = reshape(sae1OptTheta(1:hiddenSizeL1 * inputSize), hiddenSizeL1, inputSize);
    display_network(W1');
end
% -------------------------------------------------------------------------
%%======================================================================
%% STEP 2: Train the second sparse autoencoder
%  This trains the second sparse autoencoder on the first autoencoder
%  features.
%  If you've correctly implemented sparseAutoencoderCost.m, you don't need
%  to change anything here.

[sae1Features] = feedForwardAutoencoder(sae1OptTheta, hiddenSizeL1, ...
    inputSize, trainData);

%  Randomly initialize the parameters
sae2Theta = initializeParameters(hiddenSizeL2, hiddenSizeL1);

%% ---------------------- YOUR CODE HERE ---------------------------------
%  Instructions: Train the second layer sparse autoencoder, this layer has
%                a hidden size of "hiddenSizeL2" and an input size of
%                "hiddenSizeL1".
%                You should store the optimal parameters in sae2OptTheta.
[sae2OptTheta, cost] = minFunc(@(p) sparseAutoencoderCost(p, ...
    hiddenSizeL1, hiddenSizeL2, lambda, sparsityParam, beta, sae1Features), ...
    sae2Theta, options);                % train the second-layer network parameters
save('saves/step3.mat', 'sae2OptTheta');

if DISPLAY
    W11 = reshape(sae1OptTheta(1:hiddenSizeL1 * inputSize), hiddenSizeL1, inputSize);
    W12 = reshape(sae2OptTheta(1:hiddenSizeL2 * hiddenSizeL1), hiddenSizeL2, hiddenSizeL1);
    % TODO(zellyn): figure out how to display a 2-level network
    display_network(log(W11' ./ (1 - W11')) * W12');
    W12_temp = W12(1:196, 1:196);
    display_network(W12_temp');
end
% -------------------------------------------------------------------------
%%======================================================================
%% STEP 3: Train the softmax classifier
%  This trains the softmax classifier on the second autoencoder features.
%  If you've correctly implemented softmaxCost.m, you don't need
%  to change anything here.

[sae2Features] = feedForwardAutoencoder(sae2OptTheta, hiddenSizeL2, ...
    hiddenSizeL1, sae1Features);

%  Randomly initialize the parameters
saeSoftmaxTheta = 0.005 * randn(hiddenSizeL2 * numClasses, 1);

%% ---------------------- YOUR CODE HERE ---------------------------------
%  Instructions: Train the softmax classifier, the classifier takes in
%                input of dimension "hiddenSizeL2" corresponding to the
%                hidden layer size of the 2nd layer.
%                You should store the optimal parameters in saeSoftmaxOptTheta.
%  NOTE: If you used softmaxTrain to complete this part of the exercise,
%        set saeSoftmaxOptTheta = softmaxModel.optTheta(:);
softmaxLambda = 1e-4;
numClasses = 10;
softoptions = struct;
softoptions.maxIter = 400;
softmaxModel = softmaxTrain(hiddenSizeL2, numClasses, softmaxLambda, ...
    sae2Features, trainLabels, softoptions);
saeSoftmaxOptTheta = softmaxModel.optTheta(:);
save('saves/step4.mat', 'saeSoftmaxOptTheta');
% -------------------------------------------------------------------------
%%======================================================================
%% STEP 5: Finetune softmax model
% Implement the stackedAECost to give the combined cost of the whole model
% then run this cell.

% Initialize the stack using the parameters learned
stack = cell(2,1);
% sae1OptTheta and sae2OptTheta also contain the reconstruction (decoder) weights
% of each sparse autoencoder; only the encoder weights and biases are copied here.
stack{1}.w = reshape(sae1OptTheta(1:hiddenSizeL1*inputSize), ...
    hiddenSizeL1, inputSize);
stack{1}.b = sae1OptTheta(2*hiddenSizeL1*inputSize+1:2*hiddenSizeL1*inputSize+hiddenSizeL1);
stack{2}.w = reshape(sae2OptTheta(1:hiddenSizeL2*hiddenSizeL1), ...
    hiddenSizeL2, hiddenSizeL1);
stack{2}.b = sae2OptTheta(2*hiddenSizeL2*hiddenSizeL1+1:2*hiddenSizeL2*hiddenSizeL1+hiddenSizeL2);

% Initialize the parameters for the deep model
[stackparams, netconfig] = stack2params(stack);
% stackedAETheta is a single vector holding the parameters of the whole network,
% including the classifier part, whose parameters come first.
stackedAETheta = [ saeSoftmaxOptTheta ; stackparams ];

%% ---------------------- YOUR CODE HERE ---------------------------------
%  Instructions: Train the deep network, hidden size here refers to the
%                dimension of the input to the classifier, which corresponds
%                to "hiddenSizeL2".
[stackedAEOptTheta, cost] = minFunc(@(p) stackedAECost(p, inputSize, hiddenSizeL2, ...
    numClasses, netconfig, lambda, trainData, trainLabels), ...
    stackedAETheta, options);           % fine-tune the whole network's parameters
save('saves/step5.mat', 'stackedAEOptTheta');

if DISPLAY
    optStack = params2stack(stackedAEOptTheta(hiddenSizeL2*numClasses+1:end), netconfig);
    W11 = optStack{1}.w;
    W12 = optStack{2}.w;
    % TODO(zellyn): figure out how to display a 2-level network
    % display_network(log(1 ./ (1-W11')) * W12');
end
% -------------------------------------------------------------------------
%%======================================================================
%% STEP 6: Test
%  Instructions: You will need to complete the code in stackedAEPredict.m
%                before running this part of the code.
% Get labelled test images
% Note that we apply the same kind of preprocessing as the training set
testData = loadMNISTImages('t10k-images.idx3-ubyte');
testLabels = loadMNISTLabels('t10k-labels.idx1-ubyte');
testLabels(testLabels == 0) = 10; % Remap 0 to 10
[pred] = stackedAEPredict(stackedAETheta, inputSize, hiddenSizeL2, ...
numClasses, netconfig, testData);
acc = mean(testLabels(:) == pred(:));
fprintf('Before Finetuning Test Accuracy: %0.3f%%\n', acc * 100);
[pred] = stackedAEPredict(stackedAEOptTheta, inputSize, hiddenSizeL2, ...
numClasses, netconfig, testData);
acc = mean(testLabels(:) == pred(:));
fprintf('After Finetuning Test Accuracy: %0.3f%%\n', acc * 100);
% Accuracy is the proportion of correctly classified images
% The results for our implementation were:
% Before Finetuning Test Accuracy: 87.7%
% After Finetuning Test Accuracy:
% If your values are too low (accuracy less than 95%), you should check
% your code for errors, and make sure you are training on the
% entire data set of 60000 28x28 training images
% (unless you modified the loading code, this should be the case)
  stackedAECost.m:
function [ cost, grad ] = stackedAECost(theta, inputSize, hiddenSize, ...
numClasses, netconfig, ...
lambda, data, labels)
% stackedAECost: Takes a trained softmaxTheta and a training data set with labels,
% and returns cost and gradient using a stacked autoencoder model. Used for
% finetuning.
% theta: trained weights from the autoencoder
% visibleSize: the number of input units
% hiddenSize:  the number of hidden units *at the 2nd layer*
% numClasses:  the number of categories
% netconfig:   the network configuration of the stack
% lambda:      the weight regularization penalty
% data: Our matrix containing the training data as columns.  So, data(:,i) is the i-th training example.
% labels: A vector containing labels, where labels(i) is the label for the
% i-th training example
%% Unroll softmaxTheta parameter
% We first extract the part which compute the softmax gradient
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);
% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);
% You will need to compute the following gradients
softmaxThetaGrad = zeros(size(softmaxTheta));
stackgrad = cell(size(stack));
for d = 1:numel(stack)
    stackgrad{d}.w = zeros(size(stack{d}.w));
    stackgrad{d}.b = zeros(size(stack{d}.b));
end

cost = 0; % You need to compute this
% You might find these variables useful
M = size(data, 2);
groundTruth = full(sparse(labels, 1:M, 1));
%% --------------------------- YOUR CODE HERE -----------------------------
% Instructions: Compute the cost function and gradient vector for
%               the stacked autoencoder.
%
%               You are given a stack variable which is a cell-array of
%               the weights and biases for every layer. In particular, you
%               can refer to the weights of Layer d, using stack{d}.w and
%               the biases using stack{d}.b . To get the total number of
%               layers, you can use numel(stack).
%
%               The last layer of the network is connected to the softmax
%               classification layer, softmaxTheta.
%
%               You should compute the gradients for the softmaxTheta,
%               storing that in softmaxThetaGrad. Similarly, you should
%               compute the gradients for each layer in the stack, storing
%               the gradients in stackgrad{d}.w and stackgrad{d}.b
%               Note that the size of the matrices in stackgrad should
%               match exactly that of the size of the matrices in stack.
depth = numel(stack);
z = cell(depth+1, 1);
a = cell(depth+1, 1);
a{1} = data;                                  % the raw data acts as the first "activation"

% forward pass through the stacked encoder layers
for layer = 1:depth
    z{layer+1} = stack{layer}.w * a{layer} + repmat(stack{layer}.b, [1, size(a{layer}, 2)]);
    a{layer+1} = sigmoid(z{layer+1});
end

% softmax layer on top of the last hidden activation
M = softmaxTheta * a{depth+1};
M = bsxfun(@minus, M, max(M));                % subtract the max for numerical stability
p = bsxfun(@rdivide, exp(M), sum(exp(M)));

cost = -1/numClasses * groundTruth(:)' * log(p(:)) + lambda/2 * sum(softmaxTheta(:) .^ 2);
softmaxThetaGrad = -1/numClasses * (groundTruth - p) * a{depth+1}' + lambda * softmaxTheta;

% backpropagate the softmax error through the hidden layers
d = cell(depth+1);
d{depth+1} = -(softmaxTheta' * (groundTruth - p)) .* a{depth+1} .* (1 - a{depth+1});
for layer = depth:-1:2
    d{layer} = (stack{layer}.w' * d{layer+1}) .* a{layer} .* (1 - a{layer});
end
for layer = depth:-1:1
    stackgrad{layer}.w = (1/numClasses) * d{layer+1} * a{layer}';
    stackgrad{layer}.b = (1/numClasses) * sum(d{layer+1}, 2);
end
% -------------------------------------------------------------------------
%% Roll gradient vector
grad = [softmaxThetaGrad(:) ; stack2params(stackgrad)];
end

% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
  stackedAEPredict.m:
function [pred] = stackedAEPredict(theta, inputSize, hiddenSize, numClasses, netconfig, data)
% stackedAEPredict: Takes a trained theta and a test data set,
% and returns the predicted labels for each example.
% theta: trained weights from the autoencoder
% visibleSize: the number of input units
% hiddenSize:  the number of hidden units *at the 2nd layer*
% numClasses:  the number of categories
% data: Our matrix containing the training data as columns.  So, data(:,i) is the i-th training example.
% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y(c) | x(i)).
%% Unroll theta parameter
% We first extract the part which compute the softmax gradient
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);
% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);
%% ---------- YOUR CODE HERE --------------------------------------
% Instructions: Compute pred using theta assuming that the labels start from 1.
depth = numel(stack);
z = cell(depth+1, 1);
a = cell(depth+1, 1);
a{1} = data;                                  % the raw data acts as the first "activation"

% forward pass through the stacked encoder layers
for layer = 1:depth
    z{layer+1} = stack{layer}.w * a{layer} + repmat(stack{layer}.b, [1, size(a{layer}, 2)]);
    a{layer+1} = sigmoid(z{layer+1});
end

[~, pred] = max(softmaxTheta * a{depth+1});   % pick the class with the largest softmax output
% -----------------------------------------------------------
end

% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
  References:
Essentials | Autoencoder Basics + Code Examples
Clearly, deep learning is going to have a significant impact on our society. Pramod Chandrayan, founder and CEO of Mobibit, recently published an article on codeburst.io introducing the basics and types of autoencoders, together with code examples.

When the human brain cooperates with deep learning machines: before we start demystifying deep networks, let us first define deep learning. As I understand it, "deep learning is an advanced machine learning technique in which there are multiple abstraction layers communicating with each other, each layer deeply connected to the previous one and making decisions based on the output fed to it by that layer." Investopedia defines deep learning as "a subset of machine learning in artificial intelligence (AI) that has networks capable of learning unsupervised from data that is unstructured or unlabeled; also known as deep neural learning or deep neural network."

Today we take a closer look at how unsupervised pretrained networks (UPNs) work. These unsupervised learning networks can be further divided into: autoencoders, deep belief networks (DBNs), and generative adversarial networks (GANs).

An autoencoder is a neural network with three layers: an input layer, a hidden (encoding) layer, and a decoding layer. The goal of the network is to reconstruct its input, so that its hidden layer learns a good representation of that input. An autoencoder neural network is an unsupervised machine learning algorithm that applies backpropagation, setting the target values equal to the inputs: the autoencoder is trained to copy its input to its output. Internally, it has a hidden layer that describes a code used to represent the input. The autoencoder aims to learn a function h(x) ≈ x; in other words, it learns an approximation to the identity function so that the output x̂ is close to the input x.

Autoencoders belong to the neural network family, but they are also closely related to PCA (principal component analysis). Some key facts about autoencoders: they are unsupervised machine learning algorithms similar to PCA; they minimize the same objective function as PCA; they are neural networks; and the network's target output is its own input. Although autoencoders are similar to PCA, they are much more flexible: during encoding an autoencoder can represent both linear and nonlinear transformations, whereas PCA can only perform linear ones. Because they are expressed as networks, autoencoders can be used as layers to build deep learning networks.

Types of autoencoders: 1. denoising autoencoder; 2. sparse autoencoder; 3. variational autoencoder (VAE); 4. contractive autoencoder (CAE).

A. Denoising autoencoder. This is the most basic kind: it randomly corrupts part of the input to avoid the risk of learning the identity function, so that the autoencoder has to recover, or denoise, the input. The technique can be used to obtain a good representation of the input, meaning one that can be recovered robustly from a corrupted input and used to reconstruct the corresponding clean input. The idea behind denoising autoencoders is simple: to force the hidden layer to discover more robust features, and to keep it from simply learning the identity, we train the autoencoder to reconstruct the input from a corrupted version of it. The amount of noise applied to the input is given as a percentage; in general 30% (0.3) works well, but if you have very little data you may need to add more noise.

Stacked denoising autoencoder (SdA): a denoising autoencoder with unsupervised layer-wise pretraining, in which each layer is pretrained to perform feature selection and extraction on the output of the layer below it, followed by a supervised fine-tuning stage. An SdA simply fuses many denoising autoencoders together. Once the first k layers are trained, we can train the (k+1)-th layer, because we can now compute the code, or latent representation, from the layers below. Once all layers are pretrained, the network enters a second stage called fine-tuning, where supervised learning is used to minimize the prediction error on a supervised task; the whole network is then trained the way a multilayer perceptron is trained. At this stage only the encoding parts of the autoencoders are considered, and since the target classes are now used during training, the stage is supervised.

Explaining the SdA with a code example. This section comes from deeplearning.net (a very good reference for anyone trying to understand deep learning), which explains the stacked denoising autoencoder with a worked example. A stacked denoising autoencoder can be viewed in two ways: as a list of autoencoders, or as a multilayer perceptron (MLP). During pretraining we use the first view, treating the model as a list of autoencoders and training each autoencoder separately; in the second stage of training we use the second view. The two views are tied together because the autoencoders share parameters with the sigmoid layers of the MLP, and the latent representations computed by the MLP's intermediate layers are used as input to the autoencoders.
class SdA(object):
    """Stacked denoising auto-encoder class (SdA)

    A stacked denoising autoencoder model is obtained by stacking several
    dAs. The hidden layer of the dA at layer `i` becomes the input of
    the dA at layer `i+1`. The first layer dA gets as input the input of
    the SdA, and the hidden layer of the last dA represents the output.
    Note that after pretraining, the SdA is dealt with as a normal MLP,
    the dAs are only used to initialize the weights.
    """

    def __init__(self, numpy_rng, theano_rng=None, n_ins=784,
                 hidden_layers_sizes=[500, 500], n_outs=10,
                 corruption_levels=[0.1, 0.1]):
        """ This class is made to support a variable number of layers.

        :type numpy_rng: numpy.random.RandomState
        :param numpy_rng: numpy random number generator used to draw initial weights
        :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
        :param theano_rng: Theano random generator; if None is given one is
                           generated based on a seed drawn from `rng`
        :type n_ins: int
        :param n_ins: dimension of the input to the sdA
        :type hidden_layers_sizes: list of ints
        :param hidden_layers_sizes: intermediate layers size, must contain
                                    at least one value
        :type n_outs: int
        :param n_outs: dimension of the output of the network
        :type corruption_levels: list of float
        :param corruption_levels: amount of corruption to use for each layer
        """
        self.sigmoid_layers = []
        self.dA_layers = []
        self.params = []
        self.n_layers = len(hidden_layers_sizes)

        assert self.n_layers > 0

        if not theano_rng:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
        # allocate symbolic variables for the data
        self.x = T.matrix('x')   # the data is presented as rasterized images
        self.y = T.ivector('y')  # the labels are presented as a 1D vector of [int] labels

self.sigmoid_layers will store the sigmoid layers of the MLP view, while self.dA_layers will store the denoising autoencoders associated with those MLP layers. Next we build n_layers sigmoid layers and n_layers denoising autoencoders, where n_layers is the depth of our model. We use the HiddenLayer class introduced with the multilayer perceptron, with one modification: we replace the tanh non-linearity with the logistic function. We chain the sigmoid layers to build an MLP, and we construct each denoising autoencoder so that its encoding part shares its weight matrix and bias with the corresponding sigmoid layer.
        for i in range(self.n_layers):
            # construct the sigmoidal layer
            # the size of the input is either the number of hidden units of
            # the layer below or the input size if we are on the first layer
            if i == 0:
                input_size = n_ins
            else:
                input_size = hidden_layers_sizes[i - 1]

            # the input to this layer is either the activation of the hidden
            # layer below or the input of the SdA if you are on the first layer
            if i == 0:
                layer_input = self.x
            else:
                layer_input = self.sigmoid_layers[-1].output

            sigmoid_layer = HiddenLayer(rng=numpy_rng,
                                        input=layer_input,
                                        n_in=input_size,
                                        n_out=hidden_layers_sizes[i],
                                        activation=T.nnet.sigmoid)
            # add the layer to our list of layers
            self.sigmoid_layers.append(sigmoid_layer)
            # it's arguably a philosophical question...
            # but we are going to only declare that the parameters of the
            # sigmoid_layers are parameters of the StackedDAA
            # the visible biases in the dA are parameters of those
            # dA, but not the SdA
            self.params.extend(sigmoid_layer.params)

            # Construct a denoising autoencoder that shares weights with this layer
            dA_layer = dA(numpy_rng=numpy_rng,
                          theano_rng=theano_rng,
                          input=layer_input,
                          n_visible=input_size,
                          n_hidden=hidden_layers_sizes[i],
                          W=sigmoid_layer.W,
                          bhid=sigmoid_layer.b)
            self.dA_layers.append(dA_layer)

Now we only need to add a logistic layer on top of these sigmoid layers to obtain an MLP. We will use the LogisticRegression class introduced when classifying MNIST digits with logistic regression.

        # We now need to add a logistic layer on top of the MLP
        self.logLayer = LogisticRegression(
            input=self.sigmoid_layers[-1].output,
            n_in=hidden_layers_sizes[-1],
            n_out=n_outs)
        self.params.extend(self.logLayer.params)
        # construct a function that implements one step of finetuning
        # compute the cost for the second phase of training,
        # defined as the negative log likelihood
        self.finetune_cost = self.logLayer.negative_log_likelihood(self.y)
        # compute the gradients with respect to the model parameters
        # symbolic variable that points to the number of errors made on the
        # minibatch given by self.x and self.y
        self.errors = self.logLayer.errors(self.y)

The SdA class also provides a method that generates training functions for the denoising autoencoders in its layers. They are returned as a list, where element i is a function implementing one step of training for the dA that corresponds to layer i.
    def pretraining_functions(self, train_set_x, batch_size):
        ''' Generates a list of functions, each of them implementing one
        step in training the dA corresponding to the layer with same index.
        The function will require as input the minibatch index, and to train
        a dA you just need to iterate, calling the corresponding function on
        all minibatch indexes.

        :type train_set_x: theano.tensor.TensorType
        :param train_set_x: Shared variable that contains all datapoints used
                            for training the dA
        :type batch_size: int
        :param batch_size: size of a [mini]batch
        :type learning_rate: float
        :param learning_rate: learning rate used during training for any of
                              the dA layers
        '''
        index = T.lscalar('index')  # index to a [mini]batch

To be able to change the corruption level or the learning rate during training, we associate them with Theano variables.

        corruption_level = T.scalar('corruption')  # % of corruption to use
        learning_rate = T.scalar('lr')             # learning rate to use
        # beginning of a batch, given `index`
        batch_begin = index * batch_size
        # ending of a batch given `index`
        batch_end = batch_begin + batch_size

        pretrain_fns = []
        for dA in self.dA_layers:
            # get the cost and the updates list
            cost, updates = dA.get_cost_updates(corruption_level, learning_rate)
            # compile the theano function
            fn = theano.function(
                inputs=[index,
                        theano.In(corruption_level, value=0.2),
                        theano.In(learning_rate, value=0.1)],
                outputs=cost,
                updates=updates,
                givens={self.x: train_set_x[batch_begin: batch_end]})
            # append `fn` to the list of functions
            pretrain_fns.append(fn)
        return pretrain_fns

Any function pretrain_fns[i] can now take index as an argument and, optionally, corruption (the corruption level) or lr (the learning rate). Note that these parameter names are the names given to the Theano variables when they were constructed, not the names of the Python variables (learning_rate or corruption_level); keep this in mind when working with Theano. The methods that build the functions needed during fine-tuning (train_fn, valid_score and test_score) are constructed in the same way.
    def build_finetune_functions(self, datasets, batch_size, learning_rate):
        '''Generates a function `train` that implements one step of
        finetuning, a function `validate` that computes the error on
        a batch from the validation set, and a function `test` that
        computes the error on a batch from the testing set

        :type datasets: list of pairs of theano.tensor.TensorType
        :param datasets: It is a list that has to contain three pairs, `train`,
                         `valid`, `test` in this order, where each pair
                         is formed of two Theano variables, one for the
                         datapoints, the other for the labels
        :param batch_size: size of a minibatch
        :param learning_rate: learning rate used during finetune stage
        '''
        (train_set_x, train_set_y) = datasets[0]
        (valid_set_x, valid_set_y) = datasets[1]
        (test_set_x, test_set_y) = datasets[2]

        # compute number of minibatches for training, validation and testing
        n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] // batch_size
        n_test_batches = test_set_x.get_value(borrow=True).shape[0] // batch_size

        index = T.lscalar('index')  # index to a [mini]batch

        gparams = T.grad(self.finetune_cost, self.params)

        # compute list of fine-tuning updates
        updates = [(param, param - gparam * learning_rate)
                   for param, gparam in zip(self.params, gparams)]

        train_fn = theano.function(
            inputs=[index],
            outputs=self.finetune_cost,
            updates=updates,
            givens={self.x: train_set_x[index * batch_size: (index + 1) * batch_size],
                    self.y: train_set_y[index * batch_size: (index + 1) * batch_size]},
            name='train')

        test_score_i = theano.function(
            [index], self.errors,
            givens={self.x: test_set_x[index * batch_size: (index + 1) * batch_size],
                    self.y: test_set_y[index * batch_size: (index + 1) * batch_size]},
            name='test')

        valid_score_i = theano.function(
            [index], self.errors,
            givens={self.x: valid_set_x[index * batch_size: (index + 1) * batch_size],
                    self.y: valid_set_y[index * batch_size: (index + 1) * batch_size]},
            name='valid')

        # Create a function that scans the entire validation set
        def valid_score():
            return [valid_score_i(i) for i in range(n_valid_batches)]

        # Create a function that scans the entire test set
        def test_score():
            return [test_score_i(i) for i in range(n_test_batches)]

        return train_fn, valid_score, test_score

Note that valid_score and test_score are not Theano functions but Python functions that loop over the whole validation set and the whole test set, producing a list of losses over those sets.

Summary: the few lines of code below construct a stacked denoising autoencoder:
numpy_rng = numpy.random.RandomState(89677)
print('... building the model')
# construct the stacked denoising autoencoder class
sda = SdA(numpy_rng=numpy_rng, n_ins=28 * 28,
          hidden_layers_sizes=[1000, 1000, 1000],
          n_outs=10)

The network is trained in two stages: layer-wise pretraining followed by fine-tuning. For the pretraining stage we loop over all the layers of the network; for each layer we use the compiled Theano function that implements an SGD step optimizing the weights so as to reduce that layer's reconstruction cost. This function is run on the training set for a fixed number of epochs given by pretraining_epochs.
#########################
# PRETRAINING THE MODEL #
#########################
print('... getting the pretraining functions')
pretraining_fns = sda.pretraining_functions(train_set_x=train_set_x,
                                            batch_size=batch_size)

print('... pre-training the model')
start_time = timeit.default_timer()
## Pre-train layer-wise
corruption_levels = [.1, .2, .3]
for i in range(sda.n_layers):
    # go through pretraining epochs
    for epoch in range(pretraining_epochs):
        # go through the training set
        c = []
        for batch_index in range(n_train_batches):
            c.append(pretraining_fns[i](index=batch_index,
                                        corruption=corruption_levels[i],
                                        lr=pretrain_lr))
        print('Pre-training layer %i, epoch %d, cost %f' %
              (i, epoch, numpy.mean(c, dtype='float64')))
end_time = timeit.default_timer()
print(('The pretraining code for file ' + os.path.split(__file__)[1] +
       ' ran for %.2fm') % ((end_time - start_time) / 60.))

The fine-tuning loop is very similar to the fine-tuning loop of the multilayer perceptron; the only difference is that it uses the functions given by build_finetune_functions.

Running the code. The code can be run with the following Python CLI call:

python code/SdA.py

By default the code runs 15 pretraining epochs for each layer, with a batch size of 1. The corruption level for the first layer is 0.1, for the second 0.2 and for the third 0.3. The pretraining learning rate is 0.001 and the fine-tuning learning rate is 0.1. Pretraining takes 585.01 minutes, an average of 13 minutes per epoch. Fine-tuning is completed after 36 epochs in 444.2 minutes, an average of 12.34 minutes per epoch. The final validation score is 1.39% and the test score is 1.3%. These results were obtained on a machine with an Intel Xeon E5430 @ 2.66GHz CPU and single-threaded GotoBLAS.