OpenCV traincascade source analysis: how long does training take?

1. First, why train this .xml file?
It has many uses: the AdaBoost algorithm is trained on some data, and the typical application is face detection, which this post uses to walk through the training workflow. Face detection can also be done with opencv_haartraining, but according to others' experience online (I have not tried it myself), opencv_haartraining has many problems, such as mediocre detection results and a low detection rate. In practice, opencv_traincascade trains better models; it supports both Haar and LBP features, and makes it easy to add other feature types. Compared with Haar features, LBP features are integer-valued, so both training and detection are several times faster than with Haar. The detection accuracy of both LBP and Haar depends on the quality of the training data and the training parameters, and it is possible to train an LBP classifier that is as accurate as a Haar one.
LBP features are used in what follows.
Step 1: For face detection you need positive and negative sample images. Positive samples are faces; negative samples are non-faces, e.g. backgrounds. Resize all positive samples to the same size to simplify the later steps (I have not tried leaving them at different sizes; it might work, but it would complicate things, and I don't know of a simple workaround). You can write a script for this, or use software; I tried Meitu Xiuxiu, which works and can batch-process, which is very handy.
Step 2: With the samples ready, write the image paths into two text files: pos.txt for positives and neg.txt for negatives. Note that these paths are relative to the working directory in the command prompt. If an image lives at d:\picture\pos\<name> and the prompt's working directory is d:\picture, then the txt file only needs pos\<name>. After each image path, append `1` followed by the object rectangle. A quick way to collect the file names is to run `dir /b > pos.txt` in the command prompt, which writes every file name in the current directory into pos.txt; try it once and it becomes obvious.
For example, for a 30×30 image the line ends with `1 0 0 30 30` (one object, at x=0, y=0, width 30, height 30).
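The description-file step above can be scripted. Below is a minimal sketch (the folder and file names are assumptions, not from the original post): it writes one pos.txt line per image, assuming every positive sample was already resized to the same width and height and the face fills the whole frame.

```python
import os

def write_pos_txt(pos_dir, w, h, out_path="pos.txt"):
    """Write one description line per image: <path> 1 0 0 <w> <h>.
    Assumes each image contains one object filling the whole frame."""
    with open(out_path, "w") as f:
        for name in sorted(os.listdir(pos_dir)):
            if name.lower().endswith((".jpg", ".png", ".bmp")):
                # forward slashes also work for OpenCV tools on Windows
                f.write(f"{pos_dir}/{name} 1 0 0 {w} {h}\n")

# e.g. write_pos_txt("pos", 30, 30), run from the folder above pos/
```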
Create the vec file: put opencv_createsamples.exe and opencv_traincascade.exe in the directory above the image folders, then run opencv_createsamples.exe from that directory with a cmd command like the one shown.
Here -vec names the output vec file, -info gives the positive sample description file, -bg the negative sample description file, -w and -h the sample width and height, and -num the number of positive samples. After the command finishes, a pos.vec file is generated in the current directory.
So this executable's job is to generate the .vec file. What is that file for?
Answer: opencv_createsamples prepares positive training samples and test data. It generates positive sample data in a format that both opencv_haartraining and opencv_traincascade accept; its output is a file with the .vec extension that stores the images in binary form. In short, it packages the samples into the binary container that the training tools consume in the steps below.
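A common createsamples mistake is a -num that doesn't match the description file. The sketch below (file names are the ones used above; the helper itself is mine) derives -num from pos.txt so the two always agree, and only shells out if the tool is actually installed and running is requested.

```python
import shutil
import subprocess

def create_vec(info="pos.txt", vec="pos.vec", bg="neg.txt",
               w=30, h=30, run=False):
    """Build (and optionally run) the opencv_createsamples command,
    deriving -num from the description file so they always agree."""
    num = sum(1 for line in open(info) if line.strip())
    cmd = ["opencv_createsamples", "-vec", vec, "-info", info, "-bg", bg,
           "-num", str(num), "-w", str(w), "-h", str(h)]
    if run and shutil.which(cmd[0]):  # run only if requested and on PATH
        subprocess.run(cmd, check=True)
    return cmd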
Train with opencv_traincascade.exe
First create a folder named dt in the current directory to hold the generated .xml files.
Then run the following cmd command in the current directory:
D:\> opencv_traincascade.exe -data dt -vec pos.vec -bg neg/neg.txt -numPos 67 -numNeg 300 -numStages 16 -precalcValBufSize 200 -precalcIdxBufSize 1000 -featureType LBP -w 100 -h 50
Here -data sets the output directory, -numPos the number of positive samples per stage, -numNeg the number of negatives, and -numStages the number of stages to train.
For details on every parameter, what each one means, and a more thorough walkthrough of the training steps, see http://blog.csdn.net/liulina603/article/details/8598681
Things to watch out for during training:
http://blog.csdn.net/zwlq1314521/article/details/9789897
OpenCV traincascade trainer: source code notes
After running the command "opencv_traincascade -data data -vec pos/pos.vec -bg neg/neg.txt -numPos 433 -numNeg 5449 -numStages 1 -w 30 -h 30 -minHitRate 0.7 -maxFalseAlarmRate 0.02":
1. If you see the error "Parameters can not be written, because file data/params.xml can not be opened", you need to create the data folder manually.
2. For the error "Train dataset for temp stage can not be filled. Branch training terminated. Cascade classifier can't be trained. Check the used training parameters.", rewrite every image path inside neg.txt as an absolute path, and also move neg.txt into the directory you run the command from.
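The absolute-path fix can be automated. A sketch (assuming neg.txt holds one image path per line, as described above):

```python
import os

def make_absolute(neg_txt="neg.txt"):
    """Rewrite each line of neg.txt as an absolute path, the fix for the
    'Train dataset for temp stage can not be filled' error above."""
    with open(neg_txt) as f:
        lines = [l.strip() for l in f if l.strip()]
    with open(neg_txt, "w") as f:
        for l in lines:
            f.write(os.path.abspath(l) + "\n")
```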
Partial excerpt:
Ok, downloading and inspecting your data, I have already found tons of problems.
When inspecting your Training_Negativas.txt file, I see the structure Images/UMD_001.jpg, which is asking for trouble. Start by changing those to absolute paths, so that the software reading the file will always find the exact image; for example it could be /data/Images/UMD_001.jpg. Relative paths always generate problems. The same goes for Training.txt, which has the same problem, but somehow you seem to have trained the *.vec file, so that part may have gone fine. The data inside Training.txt is separated using tabs, while it is stated that data should be separated by spaces. If you do this with tabs, I am afraid your vec file may actually be filled with rubbish. More of a tip: avoid capitals in filenames; if you forget them somewhere, some OS will not handle your data correctly. The file training.txt has 500 entries but the folder has only 490 images. Where are the other 10?
Could you supply the missing 10 images so I can perform tests to see if it works when fixing all this?
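The two data problems called out above (tab separators, and listed images that don't exist) are easy to check mechanically. A sketch; the function name is mine, not from the thread, and as a simplification images are looked up by basename inside one folder:

```python
import os

def check_annotations(info_txt, img_dir):
    """Flag tab-separated lines and entries whose image file is missing."""
    problems = []
    for i, line in enumerate(open(info_txt), 1):
        if "\t" in line:
            problems.append(f"line {i}: tab separator (use spaces)")
        fields = line.split()
        if fields and not os.path.exists(
                os.path.join(img_dir, os.path.basename(fields[0]))):
            problems.append(f"line {i}: missing image {fields[0]}")
    return problems
```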
After changing the data structure as follows [WILL ADD HERE LATER], I first ran the opencv_createsamples tool, with the following result.
Then I ran the training command like this
opencv_traincascade -data cascade/ -vec positivas.vec -bg negativas.txt -numPos 400 -numNeg 2500 -numStages 15 -w 200 -h 50 -featureType LBP -precalcValBufSize 4048 -precalcIdxBufSize 4048 -numThreads 24
Note that I increased the memory consumption, because your system can take more than the standard 1 GB per buffer, AND I set the number of threads to take advantage of that.
Training starts for me and features are being evaluated. However due to the amount of unique features and the size of the training samples this will take long...
Looking at the memory consumption, this data uses a full 8.6 GB, so you might want to lower those buffers to ensure that no swapping happens, which would cripple the system.
Will update once a first stage is successfully trained, and will increase memory to speed the process up.
I increased my buffers to 8 GB each; since I have 32 GB available, using both fully would lead to a maximum allowed memory usage of 16 GB. Looking at memory, it now sits around 13 GB, the space needed to represent all the multiscale features calculated for a single training stage.
I am guessing this is one of the main reasons why your program is running extremely slowly! I would suggest reducing the dimensions of your model to something like -w 100 -h 25 for a start, which will reduce the memory footprint drastically. Otherwise it will indeed take ages to train.
Using this memory you can see that weak classifiers and stages are being constructed
Training finished on this 32 GB RAM, 24-core system in about an hour and a half. The resulting model has 7 stages, at which point it reaches satisfactory performance given the training parameters. The model and the data structure I used can be found at the link in the original post.
Finally, I used the model to produce detection output, generated by this command in C++ code:
cascade.detectMultiScale( frame_gray, objects, reject_levels, level_weights, 1.1, 1, 0, Size(), Size(), true );
If I then filter out all detections with a certainty of at least 0, you get the detections stored inside results.txt. The format is: filename #detections x1 y1 w1 h1 score1 x2 y2 w2 h2 score2 ... xN yN wN hN scoreN. For now the #detections value is still wrong, because I post-filter on the certainty score, but I will update the dropbox file when I have fixed this.
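The results.txt line format described here can be mirrored with a small helper (a sketch; the function is mine, not from the post):

```python
def format_result_line(filename, detections):
    """Build one results.txt line in the format described above:
    filename #detections x1 y1 w1 h1 score1 ...
    where detections = [((x, y, w, h), score), ...]."""
    parts = [filename, str(len(detections))]
    for (x, y, w, h), score in detections:
        parts += [str(x), str(y), str(w), str(h), str(score)]
    return " ".join(parts)
```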
I didn't visualise the detections on the training data yet; I will add those too once it works.
Added visualisations with their score. Basically, if you take the best-hitting detection per image, you keep the number plate. Better detectors will require better and more training data.
Good luck with the application!
As promised, the best hits.
3. For the error "segmentation fault 11", increase the number of positive and negative samples; make the negative sample set 2-4 times the size of the positive set.
4. A tip from 11/8/23, posted on a forum:
a. opencv_traincascade. opencv_haartraining is an obsolete utility and will probably be removed from trunk (actually better to do it ASAP, because it confuses people). opencv_traincascade is the newer version and it has 2 important features: it supports LBP features and it supports multi-threading. So, we should forget about opencv_haartraining and switch completely to opencv_traincascade.
b. LBP. In short, LBP has one significant advantage over Haar: speed. Both training and detection are much faster; you will see the difference. At the same time, LBP may have slightly lower detection quality, but if you do the training well (a good training dataset first of all), you can get almost the same quality. In any case we recommend working with LBP features during training. Later you can run the training process on the final training set with Haar features and compare the results. But the most important step is to create a proper training dataset, and it is better to use LBP for that.
c. Multi-threading. If you want to enable multi-threading in OpenCV, you should build it from sources with Intel TBB support. OpenMP has been removed completely from OpenCV. To compile everything from sources, just look at … and then follow … .
After that you should see opencv_traincascade using several cores during training.
5. Error: Bad argument (Can not get new positive sample. The most possible reason is insufficient count of samples in given vec-file).
A partial excerpt follows:
First of all, I have to note that you copied my formula description incompletely. I wrote in that issue: "S is a count of samples from the vec-file that can be recognized as background right away". With the partial description of S from the question, the formula does not make sense at all :)
As for the document you asked about: I don't remember writing this formula anywhere except that issue. The formula is not from any paper, of course; it simply follows from how the traincascade application selects a set of positive samples to train each stage of a cascade classifier. OK, I'll describe my formula in more detail, as you ask.
numPose: the count of positive samples used to train each stage (do not confuse it with the count of all samples in the vec-file!).
numStages: the number of stages the cascade classifier will have after training.
minHitRate: a training constraint for each stage, meaning the following. Suppose a positive-sample subset of size numPose was selected to train the current i-th stage (i is a zero-based index). After that stage is trained, at least minHitRate * numPose samples from this subset have to pass it, i.e. the current cascade classifier with i+1 stages has to recognize this fraction (minHitRate) of the selected samples as positive.
If some positive samples (falseNegativeCount of them) from the set of size numPose used to train the i-th stage are recognized as negative (i.e. background) after the stage training, then the numPose - falseNegativeCount correctly recognized positive samples are kept to train the (i+1)-th stage, and falseNegativeCount new positive samples (unused before) are selected from the vec-file to get a set of size numPose again.
One more important note: to train the next (i+1)-th stage, we select only samples that pass the current cascade classifier with i+1 stages.
Now we are ready to derive the formula. For the 0-stage training we just take numPose positive samples from the vec-file. In the worst case, (1 - minHitRate) * numPose of these samples are recognized as negative by the cascade with the 0-stage only. So, to get a training set of positive samples for the 1-stage training, we have to select (1 - minHitRate) * numPose new samples from the vec-file that are recognized as positive by the cascade with the 0-stage only. During this selection, some new positive samples from the vec-file can be recognized as background right away by the current cascade, and we skip such samples. The count of skipped samples depends on your vec-file (how varied its samples are) and on the other training parameters. By analogy, for each i-th stage training (i = 1, ..., numStages-1), in the worst case we have to select (1 - minHitRate) * numPose new positive samples, and several positive samples will be skipped in the process. As a result, to train all stages we need numPose + (numStages - 1) * (1 - minHitRate) * numPose + S positive samples, where S is the count of all the skipped samples from the vec-file (over all stages).
Of course this formula does not give the exact vec-file size (sample count) we must have, because S depends on the quality of the vec-file samples. But it estimates that size and explains where it comes from. I hope so, anyway :) If you disagree or have more questions, please let me know.
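The formula is straightforward to evaluate. A sketch using the symbols above; S must still be guessed, since it depends on the vec-file quality:

```python
def min_vec_samples(num_pos, num_stages, min_hit_rate, skipped=0):
    """Estimated lower bound on positives needed in the vec-file:
    numPose + (numStages - 1) * (1 - minHitRate) * numPose + S."""
    return num_pos + (num_stages - 1) * (1 - min_hit_rate) * num_pos + skipped
```

For example, with numPose = 1000, numStages = 20, and minHitRate = 0.999, this gives about 1019 positives before counting any skipped samples.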
r8913 was a link to SVN revision 8913, but we migrated to git. The corresponding fix is in OpenCV >= 2.4.2. And yes, 2.4.3 is the current version (you can always check this).
6. After training 1 or 2 stages, it kept reporting "Train dataset for temp stage can not be filled. Branch training terminated.", so I went back to the formula:
vec-file sample count >= (numPos + (numStages - 1) * (1 - minHitRate) * numPos) + S
As item 5 explains, the vec-file sample count is the total number of positives we need. Note that S is the number of positive samples from the vec-file that the cascade skips (recognizes as background right away), not the negative count. With this in mind, recompute numPos (and numNeg, if negatives are also running out) according to the formula above.
To be continued...
Looking for an analysis of the HOG-feature code in OpenCV's traincascade
Has anyone read the HOG-feature code in OpenCV's traincascade? From the literature I learned that L2-Hys is the best normalization for HOG feature vectors, but reading the OpenCV training source, it looks like it uses L1-norm. I'm worried I misread it, so I'd like to ask: which normalization function does the OpenCV source actually use, and how does the operator() function get called? Single-stepping in the debugger, I couldn't see where it is invoked. Any help would be much appreciated!
Same here. I ran into the same problem and don't know which function it uses for normalization.
Do you know now?
I still think OpenCV uses L1-norm normalization.
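For reference, the two normalization schemes under discussion look like this in NumPy. This is a sketch of the textbook definitions, not the OpenCV source:

```python
import numpy as np

def l1_normalize(v, eps=1e-7):
    """L1-norm: divide by the sum of absolute values."""
    return v / (np.abs(v).sum() + eps)

def l2_hys_normalize(v, clip=0.2, eps=1e-7):
    """L2-Hys: L2-normalize, clip large components at `clip`, renormalize."""
    v = v / (np.sqrt((v ** 2).sum()) + eps)
    v = np.clip(v, 0, clip)
    return v / (np.sqrt((v ** 2).sum()) + eps)
```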
traincascade in OpenCV

1. Why traincascade? Since OpenCV no longer updates the haartraining project, and haartraining has already been removed in the 3.0 beta, moving to traincascade is inevitable. Besides Haar features, the traincascade project also includes LBP and HOG features, so it is more adaptable.

2. Sample preparation. As in the previous haartraining post, use a script plus "opencv_createsamplesd.exe" to generate the three files "pos.txt", "neg.txt" and "pos.vec".

3. Training. Essentially the same as haartraining. You can choose between Haar and LBP features: -featureType {HAAR (default), LBP}, where HAAR means Haar-like features and LBP means local binary patterns. Note that -numPos is the number of positive samples per stage; it must not equal the total positive count. Set it to roughly 80%-90% of the positives, otherwise you get the error: Traincascade Error: Bad argument (Can not get new positive sample. The most possible reason is insufficient count of samples in given vec-file.)

The final command parameters looked like this:
-data xml -vec pos.vec -bg neg.txt -numPos 251 -numStages 5 -minhitrate 0.999 -maxfalsealarm 0.5 -featureType LBP -w 48 -h 64

Before compiling and running, enable OpenMP to make full use of a multi-core CPU: project -> properties -> C++ -> language -> OpenMP support.

Also watch for overfitting during training, and reduce the number of stages if it occurs. The precision of your cascade is shown by the acceptanceRatio of the last stage; if you get an acceptanceRatio like 7.83885e-07, your cascade is probably overtrained and won't find anything, so train fewer stages. (Note: HR = hit rate, FA = false alarm.)

If you get the error "Train dataset for temp stage can not be filled. Branch training terminated.", the cause is an insufficient number of negative samples: add more negatives and update the negative folder given to the -bg parameter.
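The numPos advice above (about 80%-90% of the vec-file positives) and the final parameter set can be wrapped into one helper. A sketch; the 0.85 fraction is my assumption within the stated range, the file names are the ones used above, and the flag casing follows the OpenCV documentation (-minHitRate, -maxFalseAlarmRate):

```python
def traincascade_cmd(total_pos, num_neg, stages=5, w=48, h=64,
                     min_hit_rate=0.999, max_false_alarm=0.5):
    """Assemble the opencv_traincascade command from the text above,
    setting -numPos to ~85% of the vec-file positives as recommended."""
    num_pos = int(total_pos * 0.85)
    return ["opencv_traincascade", "-data", "xml", "-vec", "pos.vec",
            "-bg", "neg.txt", "-numPos", str(num_pos),
            "-numNeg", str(num_neg), "-numStages", str(stages),
            "-minHitRate", str(min_hit_rate),
            "-maxFalseAlarmRate", str(max_false_alarm),
            "-featureType", "LBP", "-w", str(w), "-h", str(h)]
```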