How to tune parameters when training a CNN with Caffe Faster R-CNN

Source: WebSpider crawl  Time: 2016-07-30 02:01  Tags: caffe rcnn

DIY Deep Learning for Vision: A Tutorial with Caffe (talk notes)
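The talk ran from 1:00 to 2:00 a.m. Beijing time on December 14, and the speaker was Evan Shelhamer, one of the core members of the Caffe team. It was my first time joining a video conference this way, and it worked really well.
After the talk the video and the slides were shared (the links will be added once the e-mail arrives).
I had heard of Caffe but never used it, so I gave it a trial run before the talk. Caffe is quick to install and to get started with. The Protobuf-style layer definitions are intuitive, so modifying a model or adjusting an algorithm becomes easy: it mostly comes down to editing a configuration file. I also found a tutorial slide deck the team had put on Google Docs (since mirrored inside the firewall), and later realized that the slides for this talk were a revision of it.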
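The talk covered two main topics:
an introduction to machine learning and deep learning, including several classic deep learning models;
Caffe's strengths (modularity, speed, community support), its basic structure (network definitions, layer definitions, Blobs), its usage (configuring loss functions, optimization methods and shared weights in a model, worked examples, parameter-tuning tips), and its future directions (CPU/GPU parallelization, Pythonification, Fully Convolutional Networks).
Below are screenshots from the talk with some notes of my own; for first-hand material, see the links shared after the talk.
The first slide comes from the project's online demo: enter an image URL and it recognizes the object category.
On the left are shallow features, where objects of all categories are jumbled together; on the right are deep features, where some categories separate fairly clearly. In particular, the three animal categories dog, bird and invertebrate lie close to one another, and the inanimate categories building, vehicle and commodity lie close to one another, which shows how powerful deep features are.
In addition, within a deep architecture the activation of a hidden neuron may be tied to a specific object category: some neurons respond to portraits, others to digits or buildings, and the bottom row shows responses to camera flashes (or similar things, such as a shiny forehead...).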
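Among Caffe's strengths, the modular and easily expressed network structure is obvious, and the community resources are just as strong, as the next two slides show.
Caffe's Reference Models are available for academic use, for example AlexNet, R-CNN and CaffeNet, and include the model definition, the optimization settings and the pretrained weights.
There are also user-contributed models available for reference, such as VGG and Network-in-Network.
Caffe supports a rich range of model structures, including DAGs, Weight Sharing and Siamese Networks.
Networks and layers are defined in protobuf style.
A Layer refers to weights and biases; you can define the number of connections, the weight initialization method, and so on.
A Blob is a four-dimensional data structure that holds the values at the nodes as well as the model parameters, and it can be transferred between CPU and GPU programmatically.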
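As a rough illustration of the four-dimensional layout, a blob of shape (N, C, H, W) can be thought of as one flat array stored in row-major order. The helper name `blob_offset` is made up for this sketch; it only reproduces the usual index arithmetic and is not Caffe's own code.

```python
def blob_offset(shape, n, c, h, w):
    """Flat index of element (n, c, h, w) in a row-major (N, C, H, W) array."""
    N, C, H, W = shape
    assert 0 <= n < N and 0 <= c < C and 0 <= h < H and 0 <= w < W
    return ((n * C + c) * H + h) * W + w


shape = (10, 3, 227, 227)  # a batch of 10 three-channel 227x227 images
total = shape[0] * shape[1] * shape[2] * shape[3]
first = blob_offset(shape, 0, 0, 0, 0)
last = blob_offset(shape, 9, 2, 226, 226)
print(first, last, total)
```

The last element lands at index `total - 1`, which is a quick sanity check that the four indices cover the flat array exactly once.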
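Besides the model definition, training also needs a configuration file that specifies the optimization strategy.
The general steps for training with Caffe are:
data preprocessing;
model definition;
solver (optimization strategy) definition.
Two examples were given here, both easy to follow.
The parameter-tuning part focused on a model-transfer example: the parameters of a model already trained on one task are used as the initial parameter values of a new task's model, which is then trained.
Training generally proceeds from shallow to deep while the learning rate is gradually lowered, so that some properties of the pretrained parameters are preserved.
The talk then went into the concepts and configuration of Loss, Solver, DAG and Weight Sharing.
On the same model, different Solvers perform differently.
A typical deep learning model is a linear chain of layers, such as LeNet, whereas Caffe also supports DAG-shaped models.
Caffe's near-term directions: CPU/GPU parallelization, Pythonification, Fully Convolutional Networks, and so on.
The Caffe team; hats off to my senior labmate Yangqing Jia...
References.
In the spoken Q&A, Evan mentioned that a team at UCB is developing a Scala interface, though it is still experimental; the Caffe team is considering working with UCB's AMP team to extend Caffe to the Spark computing platform; beyond the CPU/GPU computation already supported, they are also considering OpenCL support; as for Theano and Torch, everyone is encouraged to try them and compare...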
The written Q&A follows; the answers are from Yangqing Jia.
Q: Is the pre-trained model available for download to accelerate our work on other kinds of images?
A: FYI - for pretrained models that we release, please refer to the model zoo page here: http://caffe.berkeleyvision.org/model_zoo.html
Q: Android platform?
A: People have asked about android/ios platforms. In principle this is possible since the code is purely in C++, but of course some engineering efforts are needed to write makefiles like Android.mk for this. Our bandwidth is limited and we are focusing on the research part, but we welcome pull requests on github if you write one (and we thank you in advance)! Also, kindly check out the blog post by my colleague Pete Warden about our efforts on running with Jetson TK1: //how-to-run-the-caffe-deep-learning-vision-library-on-nvidias-jetson-mobile-gpu-board/
Q: Can you discuss status and/or considerations for adding OpenCL support (and so be vendor neutral, as opposed to NVIDIA CUDA)?
A: In terms of using OpenCL - it has been under discussion for a while, but we are kind of short-staffed so we focus more on the research side - we welcome contributions from open-source communities of course, please join us at github :)
Q: Do you have any online examples of unsupervised losses?
A: For unsupervised losses and training there is a bundled example of an MNIST autoencoder.
I will "steal" one slide from the deck as the summary of this post.
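Fine-tuning on your own dataset in Caffe
Following the referenced instructions, I fine-tuned a network in Caffe on my own dataset. The whole process is recorded below.
I. Prepare the dataset
1. Prepare the raw dataset
The dataset consists of a training set and a test set. Under the caffe-windows root directory I created an lp folder containing two folders, train and val. The train folder in turn contains pos_train and neg_train, which hold the positive and negative training samples (500 images of 256*256 each). The val folder holds 100 images.
2. Create the label txt files
In this step we label the images in the dataset and save the labels to a txt file. Since I am more familiar with MATLAB, I wrote the following MATLAB program: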
pos_folder = 'your path';   % folder holding the positive samples
neg_folder = 'your path';   % folder holding the negative samples
image_width = 120;          % width of an image patch
image_height = 34;          % height of an image patch
pixel_total = image_width * image_height * 3;  % number of values per image (3 channels)
pos = dir(pos_folder);
[pos_row, pos_line] = size(pos);
neg = dir(neg_folder);
[neg_row, neg_line] = size(neg);
fid = fopen('train.txt', 'w');
% entries 1 and 2 returned by dir are '.' and '..', so start at 3
for i = 3:pos_row
    d = strcat('train_pos/', pos(i).name);   % path relative to the image root
    fprintf(fid, '%s %d\n', d, 1);           % positive samples get label 1
end
for i = 3:neg_row
    d = strcat('train_neg/', neg(i).name);
    % negative samples get label 2 as in the original program; note that
    % Caffe expects labels in the range 0 .. num_output-1
    fprintf(fid, '%s %d\n', d, 2);
end
fclose(fid);
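The program above generates train.txt, in which every line holds an image path followed by its label.
val.txt can be produced in the same way.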
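For readers without MATLAB, the same kind of label file can be written with a few lines of Python. This is only a sketch: the function name `write_label_file`, the folder names in the example call and the choice of labels 0 and 1 are my own assumptions (0-based labels are used because a two-output classifier expects labels 0 and 1).

```python
import os


def write_label_file(image_root, class_dirs, out_path):
    """Write one "relative/path label" line per image file.

    class_dirs maps a sub-folder of image_root to its integer label.
    Returns the number of lines written.
    """
    count = 0
    with open(out_path, "w") as f:
        for sub_dir, label in sorted(class_dirs.items()):
            folder = os.path.join(image_root, sub_dir)
            for name in sorted(os.listdir(folder)):
                # skip anything that is not a regular file
                if os.path.isfile(os.path.join(folder, name)):
                    f.write("%s/%s %d\n" % (sub_dir, name, label))
                    count += 1
    return count


# Example call (paths are placeholders, not taken from a real setup):
# write_label_file("data/lp/train", {"pos_train": 1, "neg_train": 0}, "train.txt")
```

The sub-folder prefix written on each line has to match the folder layout under the image root that is later passed to the conversion tool.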
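3. Convert the dataset to leveldb format with Caffe
Open maincaller.cpp and compile convert_imageset.cpp.
Then create a convert.bat file in the Caffe root directory with the following content:
SET GLOG_logtostderr=1
bin\MainCaller.exe data/lp/train/ data/lp/train.txt data/lp/mtrainlb 0
Double-clicking convert.bat generates the mtrainlb folder, which holds the data in leveldb format.
II. Fine-tune the network
First compile fine_tune.cpp to generate maincaller.exe.
I used the imagenet network structure directly, so I copied imagenet_train.prototxt, imagenet_val.prototxt and imagenet_solver.prototxt from examples/imagenet and modified them.
The main changes to imagenet_train.prototxt and imagenet_val.prototxt are:
source: mtrainlb
the number of outputs of fc8 is changed to 2
Next, create a finetune.bat file in the same way, with the following content:
copy ..\..\bin\MainCaller.exe ..\..\bin\fintune.exe
SET GLOG_logtostderr=1
"../../bin/fintune.exe" imagenet_solver.prototxt caffe_reference_imagenet_model
Double-click it to run. At first it reported an error:
a message that the handle of MANIFEST-000007 in the mtrainldb folder was invalid,
and then the program crashed.
I spent a long time on this problem before a careful look at the console output revealed the cause:
in my setup the train and val prototxt files both used the same leveldb, so the database was opened a second time and that failed. I made a copy of mtrainldb, renamed it mtrainldb2, and changed the source in the val prototxt to mtrainldb2, which fixed it.
After that, running the fine-tuning program starts the fine-tuning. I have little experience with CNN training so far; next I will look into CNN training tricks, parameters, and so on.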
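The workaround described above, giving the validation net its own copy of the database directory, can be scripted. The sketch below uses only the Python standard library; the function name `duplicate_db` is hypothetical and the directory names in the example call are the ones mentioned in the text.

```python
import os
import shutil


def duplicate_db(src_dir, dst_dir):
    """Copy a database directory so that two readers do not open the same files."""
    if os.path.exists(dst_dir):
        raise ValueError("destination already exists: %s" % dst_dir)
    shutil.copytree(src_dir, dst_dir)
    return sorted(os.listdir(dst_dir))


# Example call (directory names taken from the text):
# duplicate_db("mtrainldb", "mtrainldb2")
```

Copying is only done while no process has the database open; the copy is then referenced as the source in the validation prototxt.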
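Caffe example notes 1: CaffeNet from training to classification, parameter and feature visualization, and fine-tuning
This post has the following main parts:
1. Training from the command line
2. Classification and feature visualization with pycaffe
3. Fine-tuning, applying CaffeNet to image style prediction
1 Training CaffeNet on your own dataset
Main references:
the official site;
the site that provided the dataset and the basis for part 1.
Main steps:
1. Prepare the dataset
2. Label the dataset
3. Create the data in lmdb format
4. Compute the mean
5. Set up the network and the solver
6. Run the solver
The imagenet dataset is too large and the 840M graphics card in my computer is too weak, so I chose the dataset from the second reference. Its training set has 1000 images in 10 classes and its validation set has 200 images. The original author has already put the labels in the corresponding txt files, so steps 1-2 above can be skipped.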
1.1 Create the lmdb
Create the lmdb from the dataset.
This uses examples/imagenet/create_imagenet.sh, in which the paths and the resize option need to be changed. To keep the number of changes small, I did not create a new folder but reused the original imagenet folder, and placed both train.txt and val.txt in /data/ilsvrc12:
TRAIN_DATA_ROOT=/home/beatree/caffe-rc3/examples/imagenet/train/
VAL_DATA_ROOT=/home/beatree/caffe-rc3/examples/imagenet/val/
RESIZE=true
Note what the paths below mean:
echo "Creating train lmdb..."
GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/train.txt \
    $EXAMPLE/ilsvrc12_train_lmdb
The main tool used here is convert_imageset from tools.
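1.2 Compute the mean
The model requires the mean to be subtracted from every image, so we need the mean of the training set. Run ./examples/imagenet/make_imagenet_mean.sh to create the mean file (a binaryproto). If you created new paths earlier, the paths in this sh file must be changed as well.
The main statement here is:
$TOOLS/compute_image_mean $EXAMPLE/ilsvrc12_train_lmdb \
    $DATA/imagenet_mean.binaryproto
If it prints "Check failed: size_in_datum == data_size () Incorrect data field size", the images were not resized to a common size in the previous step.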
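1.3 Set up the network and the solver
This uses the original network settings, train_val.prototxt and solver.prototxt, found under models/bvlc_reference_caffenet/. The training and validation networks are almost identical and are told apart with include { phase: TRAIN } or include { phase: TEST }. The two differences are:
mirror: true # difference 1: the training set randomly mirrors the input image
crop_size: 227
mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
source: "examples/imagenet/ilsvrc12_train_lmdb" # difference 2: the data source differs
batch_size: 32 # the original value is large and a weak graphics card runs out of memory, so I changed it to 32; adjust as needed (the validation and training settings differ too)
backend: LMDB
The output layers differ as well: during training the loss is needed for backpropagation, while validation does not need it.
Changes to solver.prototxt:
net: "/home/beatree/caffe-rc3/examples/imagenet/train_val.prototxt" # location of the network configuration
test_iter: 4 # 4 test batches of 50 images each, 200 images in total
test_interval: 300 # test every 300 iterations
base_lr: 0.01 # base learning rate; with so little data, 0.01 makes the loss drop too fast, so it could be changed to 0.001 (I did not change it here)
lr_policy: "step" # the learning rate changes in steps
gamma: 0.1 # factor applied to the learning rate at each step
stepsize: 300
display: 20 # display every 20 iterations
max_iter: 1200 # 1200 iterations in total
momentum: 0.9
weight_decay: 0.0005
snapshot: 600 # save a snapshot every 600 iterations
snapshot_prefix: "/home/beatree/caffe-rc3/examples/imagenet/" # where snapshots are stored
Training with this configuration gave an accuracy of only a little over 0.2; the creator of the dataset ran 12000 iterations and reached an accuracy of 0.5.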
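To see what the solver settings above imply, the "step" policy can be tabulated by hand: the learning rate is base_lr multiplied by gamma once for every completed stepsize iterations. The sketch below uses the values from the solver file; the function name `step_lr` is my own, and the test batch size of 50 is the one stated in the test_iter comment.

```python
def step_lr(iteration, base_lr=0.01, gamma=0.1, stepsize=300):
    """Learning rate under the "step" policy: base_lr * gamma ** floor(iter / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)


max_iter = 1200
schedule = [(it, step_lr(it)) for it in range(0, max_iter, 300)]
for it, lr in schedule:
    print("iter %4d  lr %.0e" % (it, lr))

# test_iter * test batch size should cover the validation set once
test_iter, test_batch_size = 4, 50
images_per_test = test_iter * test_batch_size
```

With these numbers the rate has already dropped by a factor of 1000 by iteration 900, which is worth keeping in mind when choosing stepsize for a short run.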
1.5.1 Kill a running caffe process:
kill -9 <PID>
1.5.2 Check GPU usage
nvidia-smi -l
(NVIDIA System Management Interface)
1.5.3 Check timing
./build/tools/caffe time
My timing results:
Average Forward pass: 3490.86 ms.
Average Backward pass: 5666.73 ms.
Average Forward-Backward: 9157.66 ms.
Total Time: 457883 ms.
1.5.4 Resume training
If training is interrupted by a power cut or some other problem, we can resume from a previously saved state by adding the --snapshot argument.
Training then continues from the snapshot, for example from iteration 600.
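As a small sketch of what such a resume command looks like, the helper below assembles the argument list without running anything. The helper name `resume_command` and the paths in the example are illustrative assumptions; the file-name pattern `<snapshot_prefix>_iter_<N>.solverstate` is how I understand Caffe names its solver states, so check it against the files actually written to your snapshot directory.

```python
def resume_command(solver_path, snapshot_prefix, iteration,
                   caffe_bin="./build/tools/caffe"):
    """Build the argument list for resuming training from a saved solver state."""
    # Assumed naming: <snapshot_prefix>_iter_<N>.solverstate
    state = "%s_iter_%d.solverstate" % (snapshot_prefix, iteration)
    return [caffe_bin, "train",
            "--solver=%s" % solver_path,
            "--snapshot=%s" % state]


cmd = resume_command("models/bvlc_reference_caffenet/solver.prototxt",
                     "models/bvlc_reference_caffenet/caffenet_train", 600)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.call` once the paths point at real files.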
1.5.5 Extract features with C++
When everything necessary is in place:
./build/tools/extract_features.bin models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel examples/_temp/imagenet_val.prototxt fc7 examples/_temp/features 10 leveldb
The features are stored to LevelDB examples/_temp/features.
1.5.6 Classify with C++
To learn the C++ side, it is worth reading the code in tools/caffe.cpp.
The classification command is:
./build/examples/cpp_classification/classification.bin \
    models/bvlc_reference_caffenet/deploy.prototxt \
    models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel \
    data/ilsvrc12/imagenet_mean.binaryproto \
    data/ilsvrc12/synset_words.txt \
    examples/images/cat.jpg
2 Classification with pycaffe
2.1 import
First set up the environment:
# set up Python environment: numpy for numerical routines, and matplotlib for plotting
import numpy as np
import matplotlib.pyplot as plt
# display plots in this notebook
# (IPython no longer takes the pylab startup option, so the magic below is needed; see
# http://ipython.org/ipython-doc/stable/interactive/reference.html#plotting-with-matplotlib)
# To start IPython with matplotlib support, use the --matplotlib switch. If IPython is already running, you can run the %matplotlib magic. If no arguments are given, IPython will automatically detect your choice of matplotlib backend. You can also request a specific backend with %matplotlib backend, where backend must be one of: 'tk', 'qt', 'wx', 'gtk', 'osx'. In the web notebook and Qt console, 'inline' is also a valid backend value, which produces static figures inlined inside the application window instead of matplotlib's interactive figures that live in separate windows.
%matplotlib inline
# set display defaults
# (about rcParams: http://matplotlib.org/api/matplotlib_configuration_api.html#matplotlib.rcParams)
plt.rcParams['figure.figsize'] = (10, 10)        # large images
plt.rcParams['image.interpolation'] = 'nearest'  # don't interpolate: show square pixels
plt.rcParams['image.cmap'] = 'gray'  # use grayscale output rather than a (potentially misleading) color heatmap
# if caffe is not on the Python path it will not be found, so add it before importing
import sys
caffe_root = 'your path'  # the caffe root directory, ending with '/'
sys.path.insert(0, caffe_root + 'python')
import caffe
Next, download the model. Since the data we used at the start was not imagenet, we now simply download a trained model. Your Python may lack yaml, which can be installed with pip (in a terminal):
sudo apt-get install python-pip
pip install pyyaml
./scripts/download_model_binary.py /home/beatree/caffe-rc3/models/bvlc_reference_caffenet
2.2 Load the model
caffe.set_mode_cpu()
model_def = '/home/beatree/caffe-rc3/models/bvlc_reference_caffenet/deploy.prototxt'
model_weights = '/home/beatree/caffe-rc3/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'
net = caffe.Net(model_def,      # the structure of the model
                model_weights,  # the trained weights
                caffe.TEST)     # test mode
mu = np.load('/home/beatree/caffe-rc3/python/caffe/imagenet/ilsvrc_2012_mean.npy')
mu = mu.mean(1).mean(1)  # average over all pixels of each channel
Before this averaging, mu is a 3-D array of per-pixel means; the printed excerpt shows values around 110 to 111 (for example 110.9342804, 110.5134201, 110.6951828).
The averaging gives the mean of each BGR channel:
print 'mean-subtracted values:', zip('BGR', mu)
mean-subtracted values: [('B', 104.9), ('G', 116.67), ('R', 122.6)]
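What `mu.mean(1).mean(1)` does can be shown on a toy example without NumPy: a (channels, height, width) mean image is collapsed to one number per channel. The numbers below are made up and the function name `channel_means` is hypothetical.

```python
def channel_means(mean_image):
    """Average a (channels, height, width) nested list over all pixels of each channel."""
    means = []
    for channel in mean_image:
        values = [v for row in channel for v in row]
        means.append(sum(values) / float(len(values)))
    return means


# Toy 3-channel, 2x2 "mean image" in BGR order (made-up numbers)
toy = [[[100.0, 102.0], [104.0, 106.0]],
       [[110.0, 112.0], [114.0, 116.0]],
       [[120.0, 122.0], [124.0, 126.0]]]
bgr = channel_means(toy)
print(list(zip("BGR", bgr)))
```

Averaging first over one spatial axis and then over the other, as the NumPy call does, gives the same per-channel result because every row has the same length.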
An image loaded for display with matplotlib has pixel values in [0, 1], the layout [height, width, channels], and RGB channel order.
Caffe, in contrast, expects pixel values in [0, 255], the layout [channels, height, width], and BGR channel order, so a conversion is needed. caffe.io.Transformer does this; use help() for details. Its methods include:
preprocess(self, in_, data)
set_channel_swap(self, in_, order)
set_input_scale(self, in_, scale)
set_mean(self, in_, mean)
set_raw_scale(self, in_, scale)
set_transpose(self, in_, order)
# create transformer for the input called 'data'
# (net.blobs['data'].data.shape is (10, 3, 227, 227))
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))     # move image channels to outermost dimension
transformer.set_mean('data', mu)               # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255)         # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0))  # swap channels from RGB to BGR
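To make the effect of these four settings concrete, here is a pure-Python toy version applied to a single pixel. It is a sketch, not the Transformer itself: the function name `preprocess`, the mean values and the order of the steps (transpose, channel swap, raw scale, mean subtraction) reflect my understanding of how the settings are applied and should be checked against caffe/io.py.

```python
def preprocess(image_hwc, mean_bgr, raw_scale=255.0):
    """Toy version of the steps configured on the transformer.

    image_hwc: nested list [height][width][3], RGB values in [0, 1].
    Returns a nested list [3][height][width], BGR, scaled to [0, 255]
    with the per-channel mean subtracted.
    """
    height, width = len(image_hwc), len(image_hwc[0])
    # 1. transpose (2, 0, 1): H x W x C  ->  C x H x W
    chw = [[[image_hwc[y][x][c] for x in range(width)]
            for y in range(height)] for c in range(3)]
    # 2. channel swap (2, 1, 0): RGB -> BGR
    bgr = [chw[2], chw[1], chw[0]]
    # 3. raw scale to [0, 255], then 4. per-channel mean subtraction
    return [[[v * raw_scale - mean_bgr[c] for v in row] for row in bgr[c]]
            for c in range(3)]


pixel = [[[1.0, 0.5, 0.0]]]  # one pixel: R=1.0, G=0.5, B=0.0
out = preprocess(pixel, mean_bgr=[104.0, 117.0, 123.0])  # made-up rounded means
print(out)
```

The blue value ends up first and the mean is subtracted on the [0, 255] scale, which is why the mean has to be given in BGR order.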
2.3 Classification on the CPU
We are now ready to classify. The step below that changes the input size can be skipped; the batch size is set to 50 only for demonstration, since we actually classify a single image.
# set the size of the input (we can skip this if we're happy
# with the default; we can also change it later, e.g., for different batch sizes)
net.blobs['data'].reshape(50,        # batch size
                          3,         # 3-channel (BGR) images
                          227, 227)  # image size is 227x227
image = caffe.io.load_image('path/to/images/cat.jpg')
transformed_image = transformer.preprocess('data', image)
plt.imshow(image)
This shows a cute kitten. Next, let's see whether the model also thinks it is a cat:
net.blobs['data'].data[...] = transformed_image
output = net.forward()
output_prob = output['prob'][0]
print 'predicted class is:', output_prob.argmax(), output_prob[output_prob.argmax()]
Result:
predicted class is: 281 0.312436
That is, class 281 is the most likely, with a probability of 0.312436.
Is class 281 a cat? Let's look further:
labels_file = caffe_root + 'data/ilsvrc12/synset_words.txt'
labels = np.loadtxt(labels_file, str, delimiter='\t')
print 'output label:', labels[output_prob.argmax()]
The answer is:
n tabby, tabby cat
It even got the coat pattern right. Next, let's look at the prediction in more detail:
top_inds = output_prob.argsort()[::-1][:5]  # reverse sort and take five largest items
print 'probabilities and labels:'
zip(output_prob[top_inds], labels[top_inds])
The result is:
[(0., 'n tabby, tabby cat'),
(0.2379715, 'n tiger cat'),
(0., 'n Egyptian cat'),
(0., 'n red fox, Vulpes vulpes'),
(0., 'n lynx, catamount')]
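The `argsort()[::-1][:5]` idiom sorts the indices by probability, reverses them so the largest comes first, and keeps the first five. A plain-Python equivalent, with made-up scores and labels and a hypothetical function name `top_k`, looks like this:

```python
def top_k(probs, labels, k=5):
    """Return the k (probability, label) pairs with the highest probability."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(probs[i], labels[i]) for i in order[:k]]


probs = [0.05, 0.60, 0.10, 0.25]  # made-up scores
labels = ["dog", "tabby", "fox", "tiger cat"]
best = top_k(probs, labels, k=2)
print(best)
```

The first entry of the result is always the argmax class, so it agrees with the single prediction printed earlier.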
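2.4 Comparison with the GPU
Now compare GPU and CPU performance.
First, the time of one forward pass (batch size 50) on the CPU:
net.forward()
%timeit net.forward()
%timeit chooses the number of runs automatically and reports the average running time. My result was 1 loops, best of 3: 5.29 s per loop, against 1.42 s on the official site, quite a gap.
Next, the GPU running time:
caffe.set_device(0)
caffe.set_mode_gpu()
net.forward()
%timeit net.forward()
1 loops, best of 3: 507 ms per loop (70.2 ms on the official site), a lot slower.
2.5 Inspecting intermediate outputs
First, look at the structure of the network and the output shape of each layer, which has the form (batch_size, channel_dim, height, width):
# for each layer, show the output shape
for layer_name, blob in net.blobs.iteritems():
    print layer_name + '\t' + str(blob.data.shape)
The result is: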
data (50, 3, 227, 227)
conv1 (50, 96, 55, 55)
pool1 (50, 96, 27, 27)
norm1 (50, 96, 27, 27)
conv2 (50, 256, 27, 27)
pool2 (50, 256, 13, 13)
norm2 (50, 256, 13, 13)
conv3 (50, 384, 13, 13)
conv4 (50, 384, 13, 13)
conv5 (50, 256, 13, 13)
pool5 (50, 256, 6, 6)
fc6 (50, 4096)
fc7 (50, 4096)
fc8 (50, 1000)
prob (50, 1000)
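Now look at the parameters, available through net.params. The weights have shape (output_channels, input_channels, filter_height, filter_width) and the biases have a single dimension (output_channels,):
for layer_name, param in net.params.iteritems():
    # param[0] holds the weights and param[1] the biases
    print layer_name + '\t' + str(param[0].data.shape), str(param[1].data.shape)
conv1 (96, 3, 11, 11) (96,)
conv2 (256, 48, 5, 5) (256,)
conv3 (384, 256, 3, 3) (384,)
conv4 (384, 192, 3, 3) (384,)
conv5 (256, 192, 3, 3) (256,)
fc6 (4096, 9216) (4096,)
fc7 (4096, 4096) (4096,)
fc8 (1000, 4096) (1000,)
As shown, only the convolutional and fully connected layers have parameters.
Now that we have the parameters, let's take a first look at how to read CaffeNet.
First, how the first layer, conv1, changes its input
(image from a blog)
This step is easy to follow: the weights have shape (96, 3, 11, 11).
But why does the second convolutional layer have shape (256, 48, 5, 5)? Because it has an extra group option, which cs231n does not mention. Here group=2 splits the inputs and outputs into two groups, so each filter sees 96/2=48 input channels.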
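The shapes listed above can be reproduced with a little arithmetic. In the sketch below the kernel, stride and padding values are the usual CaffeNet settings as I remember them (they are not given in the text, so treat them as assumptions), and both function names are made up for the example.

```python
def out_size(in_size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window (exact-fit case)."""
    return (in_size + 2 * pad - kernel) // stride + 1


def conv_weight_shape(num_output, in_channels, kernel, group=1):
    """Weight shape (out, in / group, kh, kw): each filter sees one group's inputs."""
    assert in_channels % group == 0 and num_output % group == 0
    return (num_output, in_channels // group, kernel, kernel)


# Assumed layer settings: conv1 11x11 stride 4, pools 3x3 stride 2, conv2 5x5 pad 2
conv1 = out_size(227, kernel=11, stride=4)
pool1 = out_size(conv1, kernel=3, stride=2)
conv2 = out_size(pool1, kernel=5, pad=2)
pool2 = out_size(conv2, kernel=3, stride=2)
w_conv1 = conv_weight_shape(96, 3, 11)
w_conv2 = conv_weight_shape(256, 96, 5, group=2)
print(conv1, pool1, conv2, pool2, w_conv1, w_conv2)
```

The same helper explains the 192 in conv4 and conv5: their 384 input channels are also split into two groups.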
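Data-flow diagram of the fully connected layer fc6:
(this is a slide from Tel Aviv University)
Next comes visualization. First define a helper function, to be reused later, that visualizes the parameters and outputs of each layer:
def vis_square(data):
    """Take an array of shape (n, height, width) or (n, height, width, 3)
    and visualize each (height, width) thing in a grid of size approx. sqrt(n) by sqrt(n)"""
    # normalize the data to [0, 1] for display
    data = (data - data.min()) / (data.max() - data.min())
    # the grid is n x n, where n is the smallest integer with n * n >= number of filters
    n = int(np.ceil(np.sqrt(data.shape[0])))
    # pad with empty tiles up to n * n, and add one pixel of space after each tile
    padding = (((0, n ** 2 - data.shape[0]),
                (0, 1), (0, 1))
               + ((0, 0),) * (data.ndim - 3))
    data = np.pad(data, padding, mode='constant', constant_values=0)
    # tile the filters into one image
    data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])
    plt.imshow(data)
    plt.axis('off')
Taking conv1 as an example, let's see how the reshape works:
filters = net.params['conv1'][0].data
vis_square(filters.transpose(0, 2, 3, 1))
Result:
These are the conv1 weights. Their original shape is (96, 3, 11, 11): there are 96 output channels and each filter has size 11 x 11 x 3 (note the trailing 3). Sliding each filter over the input (convolution) yields one output map, 96 in total. (The figure below is wrong; please see the official site for the correct one.)
Before entering vis_square the array is transposed to (96, 11, 11, 3).
Inside vis_square the padding is ((0, 4), (0, 1), (0, 1), (0, 0)), so after padding the shape is (100, 12, 12, 3); each 12 now includes a one-pixel border. The first reshape gives (10, 10, 12, 12, 3), which turns the single row of 100 images into a grid. The transpose (0, 2, 1, 3, 4) then gives (10, 12, 10, 12, 3), and the second reshape gives (120, 120, 3).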
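The shape bookkeeping described above can be traced without NumPy. The sketch below only computes the shapes that the pad, reshape and transpose steps would produce for a 4-D input; the function name `grid_shapes` is made up for the illustration.

```python
import math


def grid_shapes(n_filters, height, width, channels):
    """Trace the array shapes produced while tiling filters into one image."""
    n = int(math.ceil(math.sqrt(n_filters)))         # the grid is n x n
    padded = (n * n, height + 1, width + 1, channels)  # extra tiles plus a 1-pixel border
    tiled = (n, n) + padded[1:]                       # first reshape
    # transpose (0, 2, 1, 3, 4): interleave grid rows with pixel rows
    swapped = (tiled[0], tiled[2], tiled[1], tiled[3], tiled[4])
    image = (swapped[0] * swapped[1], swapped[2] * swapped[3], channels)
    return n, padded, tiled, swapped, image


n, padded, tiled, swapped, image = grid_shapes(96, 11, 11, 3)
print(n, padded, tiled, swapped, image)
```

For 96 filters the grid has 100 cells, so four of them are padding; for 36 feature maps the 6 x 6 grid is filled exactly.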
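Next, show the features output by the first layer's filters:
feat = net.blobs['conv1'].data[0, :36]
vis_square(feat)
If all 96 maps are taken, the result is as below: the dividing lines in the middle are gone. Why?
The same method can be used to look at the outputs of the other layers.
The output of a fully connected layer is better shown as a histogram:
feat = net.blobs['fc6'].data[0]
plt.subplot(2, 1, 1)
plt.plot(feat.flat)
plt.subplot(2, 1, 2)
_ = plt.hist(feat.flat[feat.flat > 0], bins=100)  # each bin counts the values that fall in one interval
Output of the classification result:
feat = net.blobs['prob'].data[0]
plt.figure(figsize=(15, 3))
plt.plot(feat.flat)
That is roughly it; we can now classify our own images and look at the results.
Main steps of the classification code:
1. Import the packages
2. Set the display defaults
3. Set the compute mode: set_mode_cpu() / set_mode_gpu()
4. Load the model: net = caffe.Net(, , caffe.TEST)
5. Set up the transformer (including loading the mean)
6. Set the input size for classification (batch size, etc.)
7. Load and convert the image (io.load_image('path'), transformer.preprocess)
8. net.blobs['data'].data[...] = transformed_image
9. Forward pass: output = net.forward()
10. output_prob = output['prob'][0]
11. Load synset_words.txt (np.loadtxt(, , ))
12. Print the classification result: output_prob.argsort()[::-1][:5]
13. Show each layer's output: net.blobs.iteritems()
14. Show each layer's parameters: net.params.iteritems()
15. For visualization, note the use of pad, reshape and transpose
16. net.params['name'][0].data
17. net.blobs['name'].data[0, :36]
18. net.blobs['prob'].data[0]  # every image has its own output, hence the [0]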
3 Fine-tuning
Now we will fine-tune the model we trained above on a different dataset to predict image style. We have 80000 images to train on. There will be some changes:
1. We change the name of the last layer from fc8 to fc8_flickr in our prototxt, so it begins training with random weights.
2. We decrease base_lr and boost the lr_mult on the newly introduced layer.
3. We set stepsize to a lower value, so the learning rate goes down faster.
4. Accordingly, in solver.prototxt base_lr is 0.001 instead of 0.01, and stepsize is 20000 instead of 100000.
3.1 cmdcaffe
3.1.1 Download dataset & model
We will only download 2000 images.
.. --- -- --
We have already downloaded the model in the previous step.
3.1.2 Fine-tune
Let's look at some information in the new train_val.prototxt:
1. ImageData layer
layer {
  name: "data"
  type: "ImageData"
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "yourpath.binaryproto"
  }
  image_data_param {
    batch_size:
    new_height:
    new_width:
  }
}
In addition, a regularization layer was added.
In the fc8_flickr layer the lr_mult values are 10 and 20.
./build/tools/caffe train -solver models/finetune_flickr_style/solver.prototxt -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel -gpu 0
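The reason for boosting lr_mult on the new layer becomes clearer with numbers: the rate actually applied to a parameter is the solver's base_lr scaled by that parameter's lr_mult. The sketch below uses the base_lr of 0.001 and the multipliers 10 and 20 quoted above; the multiplier of 1 for the pretrained layers is an assumed typical value, and the function name is hypothetical.

```python
def effective_lr(base_lr, lr_mult):
    """Per-parameter learning rate: the solver's base_lr scaled by the layer's lr_mult."""
    return base_lr * lr_mult


base_lr = 0.001                                  # fine-tuning solver value quoted above
fc8_weight_lr = effective_lr(base_lr, 10)        # new layer, weights
fc8_bias_lr = effective_lr(base_lr, 20)          # new layer, biases
pretrained_weight_lr = effective_lr(base_lr, 1)  # lr_mult of 1 is an assumed value
print(fc8_weight_lr, fc8_bias_lr, pretrained_weight_lr)
```

Under these assumptions the randomly initialized layer learns ten times faster than the pretrained ones, which change only slowly and so keep most of what they learned before.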
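3.2 pycaffe
Some functions used in Python:
import tempfile
image = np.around()
image = np.require(image, dtype=np.uint8)
assert os.path.exists(weights)  # assertion: raises an error if the path does not exist
In this part, the complete network and solver are defined in an IPython notebook, and the fine-tuned model is compared with a model trained from scratch. The code is more detailed; since the next blog post describes it fairly carefully, I will not repeat it here, but it is still well worth working through the official example once.
3.3 Main steps
3.3.1 Download the CaffeNet model and the Flickr data
weights = '…..caffemodel'
3.3.2 Defining and running the nets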
def caffenet():
    # outline only: "..." marks parts left out of these notes
    n = caffe.NetSpec()
    n.data = data
    n.conv1, n.relu1 = ...
    ...
    if train:
        n.drop6 = fc7input = L.Dropout(n.relu6, in_place=True)
    else:
        fc7input = n.relu6
    ...  # the same if/else pattern follows for the next layer
    fc8 = L.InnerProduct(fc8input, num_output=num_classes, param=learned_param)
    n.__setattr__(classifier_name, fc8)
    if not train:
        n.probs = L.Softmax(fc8)
    if label is not None:
        n.label = label
        n.loss = L.SoftmaxWithLoss(fc8, n.label)
        n.acc = L.Accuracy(fc8, n.label)
    # write the net definition to a temporary file and return its name
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(str(n.to_proto()))
        return f.name
3.3.3 Dummy data for the ImageNet net
data = L.DummyData(shape=dict(dim=[1,3,227,227]))
imagenet_net_filename = caffenet(data, train=False)
imagenet_net = caffe.Net(imagenet_net_filename, weights, caffe.TEST)
3.3.4 style_net
It has the same architecture as CaffeNet, but with differences in the input and output:
def style_net(train=True, learn_all=False, subset=None):
    if subset is None:
        subset = 'train' if train else 'test'
    source = 'path/%s.txt' % subset
    transform_param = dict(mirror=train, crop_size=227, mean_file='path/xx.binaryproto')
    style_data, style_label = L.ImageData(transform_param=transform_param, source=source, batch_size=..., new_height=..., new_width=..., ntop=2)
    return caffenet(data=style_data, label=style_label, train=train, num_classes=20, classifier_name='fc8_flickr', learn_all=learn_all)
3.3.5 Compare untrained_style_net and imagenet_net
3.3.6 Training the style classifier
from caffe.proto import caffe_pb2
def solver():
    s = caffe_pb2.SolverParameter()
    s.train_net = train_net_path
    if test_net_path is not None:
        ...
    # write the solver definition to a temporary file and return its name
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(str(s))
        return f.name
build/tools/caffe train \
    -solver models/path/solver.prototxt \
    -weights /path/.caffemodel \
    -gpu 0
def run_solvers():
    for it in range(niter):
        for name, s in solvers:
            s.step(1)  # run a single training step
            loss[name][it], acc[name][it] = (s.net.blobs[b].data.copy() for b in blobs)
        if it % disp_interval == 0 or it + 1 == niter:
            ...  # print ...
    # save the learned weights of every solver
    weight_dir = tempfile.mkdtemp()
    weights = {}
    for name, s in solvers:
        weights[name] = os.path.join(weight_dir, filename)
        s.net.save(weights[name])
    return loss, acc, weights
3.3.7 Compare the effect of pretraining
The pretrained version has one extra step: style_solver.net.copy_from(weights)
3.3.8 End-to-end fine-tuning for style
learn_all=True