
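Wallfacer Series: Logistic Regression

What is logistic regression?

Logistic regression is a generalized linear regression model, so it has much in common with multiple linear regression. The two share essentially the same model form: both start from the linear combination $w^\top x + b$. Logistic regression passes this value through a function $L$ to obtain $h = L(w^\top x + b)$, and then decides the value of the dependent variable by comparing $h$ with $1 - h$. If $L$ is the logistic function, the model is logistic regression; if $L$ is a polynomial function, it is polynomial regression.

Applicability conditions:
1. The dependent variable is a binary categorical variable or the incidence rate of an event, and it is numerically coded. Note that indicators involving repeated counts are not suitable for logistic regression.
2. The residuals and the dependent variable follow a binomial distribution. The binomial distribution corresponds to categorical variables, so it is not normal; estimation and testing therefore use maximum likelihood rather than least squares.
3. The independent variables are linearly related to the logit (log-odds) of the probability.
4. The observations are independent of one another.

Model characteristics:
Pros: low computational cost, easy to understand and implement.
Cons: prone to underfitting, and classification accuracy may be modest.
Applicable data types: numeric and nominal.

Hypothesis Function

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^\top x}} \qquad (1)$$

Here $\theta$ is the vector of parameters (coefficients; the $w$ and $b$ above stacked into one vector, with $x_0 = 1$) and $x$ is the vector of independent variables. By its nature, the output lies between 0 and 1.

Sigmoid Function / Logistic Function

Formula (1) is the sigmoid function. Its value can be read as the probability that $y = 1$ given the data $x$ and the parameters $\theta$.

Decision Boundary

Definition: the line that separates the area where y = 0 from the area where y = 1. In general, predict $y = 1$ when $h_\theta(x) \ge 0.5$ and $y = 0$ when $h_\theta(x) < 0.5$; that is, any point scoring above 0.5 is assigned to class 1 and any point below 0.5 to class 0.

Cost Function

The goal is to make the predictions of the hypothesis function as close as possible to the true values. The cost function measures how close they are: the closer the predictions, the smaller its value.

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Bigl[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\bigl(1 - h_\theta(x^{(i)})\bigr)\Bigr]$$

Because this function is convex, gradient descent can be used to reach the optimal solution.

Gradient Descent

Gradient descent is a first-order optimization algorithm, also commonly called steepest descent. To find a local minimum of a function, it iteratively steps from the current point, by a set step size, in the direction opposite to the gradient (or an approximate gradient) at that point. Stepping in the direction of the gradient instead approaches a local maximum; that procedure is called gradient ascent. In short, the gradient is used to reach the optimum. The algorithm is to repeat the following until convergence, updating all $\theta_j$ simultaneously:

$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)}$$

Multi-class classification: one-vs-all

Logistic regression can also be used for multi-class classification. The idea is as follows. Suppose the label set A contains the labels a0, a1, a2, ..., an. For each label ai we train one logistic regression classifier: when training the classifier for ai, treat ai as one class and all labels other than ai as the other class, so the whole data set effectively has only two labels, ai and "everything else". The remaining steps are the same as training an ordinary logistic regression classifier. At prediction time, feed the query point into the sigmoid function of every classifier and assign the label whose classifier gives the highest value.

Ways to avoid overfitting

Reduce redundant features: either select by hand which features to keep, or use an algorithm to select features.

Regularization: keep all features, but add a regularization term to the cost function, namely the sum to the right of the plus sign in the formula below.

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Bigl[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\bigl(1 - h_\theta(x^{(i)})\bigr)\Bigr] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

$\lambda$ is called the regularization parameter, and it should be neither too large nor too small: too large a value causes underfitting, too small a value may allow overfitting, and a well-chosen value gives a good fit. The index j in the regularization term starts from 1 because $\theta_0$ does not need to be penalized. The factor 2 in the denominator 2m is there only to simplify the derivative and has no other meaning. When every feature is at least somewhat useful, dropping features would be a waste, and regularization is the better fit. The gradient descent update then becomes:

$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha\Bigl[\frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)} + \frac{\lambda}{m}\theta_j\Bigr], \qquad j = 1, \dots, n$$

Logistic regression and the maximum entropy model

Logistic regression and the maximum entropy model are both log-linear models. They are usually learned by maximum likelihood estimation or regularized maximum likelihood estimation, and learning can be formalized as an unconstrained optimization problem. Besides gradient descent, improved iterative scaling and quasi-Newton methods can be used to solve it.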
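The hypothesis, the regularized cost function, and the gradient descent update described above can be put together in a short C++ sketch. The function names (`sigmoid`, `hypothesis`, `cost`, `gradientStep`) and the convention that `x[0]` is the constant 1 are assumptions made for illustration and are not part of the original article.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sigmoid (logistic) function: maps any real z into (0, 1).
double sigmoid(double z) {
    return 1.0 / (1.0 + std::exp(-z));
}

// Hypothesis h_theta(x) = sigmoid(theta^T x).
// Assumption: x[0] is the constant 1, so theta[0] acts as the intercept.
double hypothesis(const std::vector<double>& theta, const std::vector<double>& x) {
    double z = 0.0;
    for (std::size_t j = 0; j < theta.size(); ++j) z += theta[j] * x[j];
    return sigmoid(z);
}

// Regularized cost: cross-entropy averaged over the m examples plus
// (lambda / 2m) * sum over j >= 1 of theta_j^2. theta[0] is not penalized.
double cost(const std::vector<std::vector<double>>& X, const std::vector<double>& y,
            const std::vector<double>& theta, double lambda) {
    const double m = static_cast<double>(X.size());
    double loss = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        const double h = hypothesis(theta, X[i]);
        loss += -y[i] * std::log(h) - (1.0 - y[i]) * std::log(1.0 - h);
    }
    double penalty = 0.0;
    for (std::size_t j = 1; j < theta.size(); ++j) penalty += theta[j] * theta[j];
    return loss / m + lambda / (2.0 * m) * penalty;
}

// One batch gradient descent step. The full gradient is computed first,
// so every theta_j is updated simultaneously from the same old theta.
void gradientStep(const std::vector<std::vector<double>>& X, const std::vector<double>& y,
                  std::vector<double>& theta, double alpha, double lambda) {
    const double m = static_cast<double>(X.size());
    std::vector<double> grad(theta.size(), 0.0);
    for (std::size_t i = 0; i < X.size(); ++i) {
        const double err = hypothesis(theta, X[i]) - y[i];
        for (std::size_t j = 0; j < theta.size(); ++j) grad[j] += err * X[i][j];
    }
    for (std::size_t j = 0; j < theta.size(); ++j) {
        double g = grad[j] / m;
        if (j >= 1) g += lambda / m * theta[j];  // the intercept is left unregularized
        theta[j] -= alpha * g;
    }
}
```

Calling `gradientStep` in a loop until `cost` stops decreasing gives a basic trainer. Setting `lambda` to 0 recovers the unregularized update.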
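Before going into the theory of logistic regression, we start with the LR classifier, that is, the Logistic Regression Classifier.
For an event that occurs with probability $p$, the ratio $p / (1 - p)$ is called the odds of the event (the odds of experiencing an event), abbreviated as odds.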
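As a small illustration of the odds, the sketch below computes the odds, their logarithm (the logit), and the inverse mapping back to a probability. The function names are assumptions chosen for this example.

```cpp
#include <cassert>
#include <cmath>

// Odds of an event with probability p (0 < p < 1): p / (1 - p).
double odds(double p) { return p / (1.0 - p); }

// Logit (log-odds): the natural logarithm of the odds.
double logit(double p) { return std::log(odds(p)); }

// The sigmoid function inverts the logit, mapping log-odds back to a probability.
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }
```

For instance, an event with probability 0.8 has odds of 4 to 1, and an event with probability 0.5 has log-odds of 0.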
1 0 0 1 0 1
0 0 1 2 0 0
1 0 0 1 1 0
0 0 0 0 1 0
0 0 1 0 0 0
0 0 1 0 1 0
0 0 1 2 1 0
1 0 0 0 0 0
0 0 1 0 1 0
1 0 1 0 0 0
0 0 1 0 1 0
0 0 1 0 0 0
0 0 1 0 1 0
1 0 0 1 0 0
1 0 0 0 1 0
2 0 0 0 1 0
1 0 0 2 1 0
2 0 0 0 1 0
2 0 1 0 0 0
0 0 1 0 1 0
0 0 1 2 0 0
0 0 0 0 0 0
0 0 1 0 1 0
1 0 1 0 1 1
0 0 1 2 1 0
1 0 1 0 0 0
0 0 1 0 0 0
0 0 0 2 0 0
1 0 0 0 1 0
2 0 1 0 0 0
2 0 1 1 1 0
1 0 1 1 0 0
1 0 1 2 0 0
1 0 0 1 1 0
0 0 0 0 1 0
1 1 0 0 1 0
1 0 1 2 1 0
0 0 0 0 1 0
0 0 1 0 0 0
1 0 1 1 1 0
1 0 1 0 1 0
2 0 1 2 0 0
0 0 1 2 1 0
0 0 1 0 1 0
2 0 1 0 1 0
0 0 1 0 1 0
1 0 0 0 0 0
1 0 0 0 1 0
0 0 0 0 1 0
0 0 1 2 1 0
0 1 1 0 0 0
0 1 0 0 1 0
2 1 0 0 0 0
2 1 0 0 0 0
1 1 0 2 0 0
1 1 0 0 0 1
0 1 0 0 0 0
2 1 0 0 1 0
0 1 0 0 1 0
2 1 0 2 1 0
2 1 0 2 1 0
1 1 0 2 1 0
0 1 0 0 0 1
2 1 1 0 1 0
2 1 0 1 1 0
1 1 0 0 0 1
2 1 0 0 0 0
1 1 0 0 1 0
1 1 0 0 0 0
2 1 0 1 1 0
1 1 0 0 1 0
1 0 1 1 0 1
2 1 0 1 1 0
0 1 0 0 1 0
1 0 1 0 0 0
0 0 1 0 0 1
1 0 0 0 0 0
0 0 0 2 1 0
1 0 1 2 0 1
1 0 0 1 1 0
2 0 1 2 1 0
2 0 0 0 1 0
1 0 0 1 1 0
1 0 1 0 1 0
0 0 1 0 0 0
1 0 0 2 1 0
2 0 1 1 1 0
0 0 1 0 1 0
0 0 0 0 1 0
2 0 0 1 0 1
0 0 1 0 0 0
0 0 0 0 0 0
1 0 1 1 1 1
2 0 1 0 1 0
0 0 0 0 0 0
1 0 1 0 1 0
0 0 0 0 1 0
0 0 0 2 0 0
0 0 0 0 0 0
0 0 1 2 0 0
0 0 1 0 1 0
0 0 1 0 0 1
0 0 0 2 1 0
1 0 1 1 1 0
1 0 0 1 1 0
0 0 1 0 1 0
1 0 0 0 0 0
1 0 1 0 1 0
2 0 0 0 1 0
1 0 0 0 1 0
2 0 0 1 1 0
0 0 1 2 1 0
1 0 1 2 0 0
0 0 1 2 1 0
1 0 0 0 0 0
0 0 1 0 1 0
0 0 0 1 1 0
1 0 0 0 1 0
2 0 0 1 1 0
1 0 0 1 1 0
1 0 1 0 0 0
1 1 0 1 1 0
2 1 0 0 1 0
0 1 0 0 0 0
1 1 0 1 0 1
1 1 0 2 1 0
0 1 0 0 0 0
1 1 0 2 0 0
0 1 0 0 1 0
1 1 0 0 1 1
1 1 0 2 1 0
1 0 0 2 1 0
2 1 1 1 1 0
0 1 0 0 1 0
0 1 0 0 1 0
2 1 0 0 0 1
1 1 0 2 1 0
1 1 0 0 1 0
1 1 1 0 0 0
2 1 0 2 1 0
2 1 1 1 0 0
0 1 0 0 1 0
1 1 0 2 1 0
0 1 0 0 1 0
1 1 0 1 1 0
0 1 0 0 1 0
0 1 0 0 0 0
1 1 0 0 0 0
1 1 0 2 1 0
1 1 0 0 0 0
0 1 1 2 0 0
2 1 0 0 1 0
2 0 1 0 0 1
0 0 1 0 1 0
1 0 1 0 0 0
0 0 1 2 1 0
0 0 1 0 0 0
1 0 1 0 1 0
0 0 1 0 1 0
0 0 1 0 1 0
1 0 1 0 1 0
0 0 0 0 0 1
0 0 1 2 1 0
0 0 1 0 1 0
0 0 1 0 1 0
0 0 1 0 0 0
0 0 1 0 0 1
0 0 1 2 1 0
2 0 1 2 1 0
0 0 1 0 1 0
0 0 1 0 1 0
0 0 1 0 1 0
1 0 0 0 0 0
2 0 1 1 1 0
0 0 1 0 0 1
1 0 1 0 0 0
1 0 1 1 1 0
1 0 1 1 0 0
0 0 1 0 0 0
1 0 1 1 1 0
1 0 1 2 0 0
2 0 0 0 1 0
0 0 1 0 0 1
0 0 1 0 1 0
0 0 1 0 1 0
1 0 1 0 0 0
0 0 1 0 0 0
2 0 1 1 0 0
0 0 1 2 0 0
1 0 0 1 1 1
0 0 0 0 1 0
0 0 0 0 0 1
0 0 1 0 1 0
2 0 1 2 1 0
1 0 0 1 0 0
0 0 1 0 0 0
2 0 0 1 1 1
0 0 1 0 0 0
0 0 1 0 1 0
2 0 1 0 1 0
0 0 1 0 1 0
2 0 0 0 1 0
1 0 1 0 1 0
1 0 0 0 1 0
0 0 1 0 0 1
2 0 0 0 0 0
2 0 0 1 1 0
0 0 1 0 1 0
0 0 0 0 1 0
2 0 1 0 0 0
1 0 1 0 1 0
0 0 0 0 1 0
1 0 1 0 1 0
0 0 1 0 0 0
1 0 1 0 1 0
1 0 1 0 1 0
1 0 1 0 1 0
0 0 1 2 0 0
2 0 1 0 1 1
0 0 1 0 1 0
0 0 1 2 1 0
0 0 0 0 0 0
0 0 1 0 1 0
1 0 1 0 1 0
0 0 1 0 1 0
1 0 1 0 0 0
0 0 1 0 1 0
0 0 1 0 0 0
1 0 1 0 0 0
0 0 1 0 1 0
0 0 1 0 1 0
1 0 0 0 1 0
0 0 1 0 0 0
0 0 0 0 1 0
1 0 1 1 1 0
0 0 0 2 0 0
0 0 1 0 1 0
0 0 1 0 1 0
0 0 1 0 1 0
1 0 0 1 1 0
2 0 0 0 1 0
1 0 0 0 0 0
2 0 0 2 1 0
0 0 1 2 1 0
1 0 1 0 0 1
0 0 1 2 1 0
0 0 1 2 1 0
0 0 1 0 1 0
1 0 1 2 1 0
0 0 0 2 0 0
1 0 0 0 0 0
0 0 0 2 1 0
0 0 1 0 1 0
2 0 0 0 1 0
1 0 0 0 0 0
1 0 0 1 1 0
1 0 1 1 1 0
1 0 1 0 1 1
0 0 1 0 1 0
1 1 0 2 1 0
1 1 0 1 0 0
2 1 0 2 1 0
1 1 1 0 0 0
0 1 1 0 0 0
0 1 1 0 0 1
0 1 0 0 1 0
1 1 1 0 0 0
1 1 1 0 1 0
0 1 0 0 1 0
0 1 1 0 0 1
1 1 1 1 1 0
1 1 0 2 1 0
0 1 0 2 0 0
1 1 0 2 1 0
0 0 1 2 1 0
2 1 1 1 1 0
0 1 0 0 1 0
0 0 1 0 1 0
2 1 0 1 1 0
0 1 0 0 1 0
1 1 0 0 0 0
1 1 0 0 1 0
0 1 0 0 0 0
0 1 1 0 0 0
2 1 0 0 1 0
2 1 0 0 0 0
1 1 0 0 1 0
2 1 0 1 1 0
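#include <iostream>
#include <string.h>
#include <fstream>
#include <string>
#include <stdio.h>
using namespace std;

int main() {
    string filename = "data.in";
    ifstream file(filename.c_str());
    char s[1024];
    // Read the file line by line and parse three integers from each line.
    while(file.getline(s, 1024)) {
        int x,y,z;
        if(sscanf(s,"%d %d %d",&x,&y,&z) != 3) continue;  // skip malformed lines
        cout<<x<<" "<<y<<" "<<z<<endl;
    }
    return 0;
}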
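Once each line has been read, its values can be extracted and fed to the program as input. A gradient ascent implementation of logistic regression follows.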
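#include <iostream>
#include <string.h>
#include <fstream>
#include <stdio.h>
#include <math.h>
#include <vector>
#include <string>

#define Type double
#define Vector vector
using namespace std;

// One sample: feature vector x (x[0] is the constant 1 for the intercept) and label y.
struct Data {
    Vector<Type> x;
    Type y;
};

// Read samples from a text file with seven numeric fields per line.
// Assumption: the first field is a record ID (skipped), the next five are
// features, and the last is the 0/1 label. The lines that fill tmp.x were
// missing from the listing and are reconstructed on that basis.
void PreProcessData(Vector<Data>& data, string path) {
    string filename = path;
    ifstream file(filename.c_str());
    char s[1024];
    while(file.getline(s, 1024)) {
        Data tmp;
        Type x1, x2, x3, x4, x5, x6, x7;
        if(sscanf(s,"%lf %lf %lf %lf %lf %lf %lf", &x1, &x2, &x3, &x4, &x5, &x6, &x7) != 7)
            continue;  // skip malformed lines
        tmp.x.push_back(1);
        tmp.x.push_back(x2);
        tmp.x.push_back(x3);
        tmp.x.push_back(x4);
        tmp.x.push_back(x5);
        tmp.x.push_back(x6);
        tmp.y = x7;
        data.push_back(tmp);
    }
}

// Load the training data and start from w = 0.
void Init(Vector<Data> &data, Vector<Type> &w) {
    w.clear();
    data.clear();
    PreProcessData(data, "TrainData.txt");
    if(data.empty()) return;
    for(int i = 0; i < data[0].x.size(); i++)
        w.push_back(0);
}

// Linear score w^T x.
Type WX(const Data& data, const Vector<Type>& w) {
    Type ans = 0;
    for(int i = 0; i < w.size(); i++)
        ans += w[i] * data.x[i];
    return ans;
}

// P(y = 1 | x), written as 1 / (1 + exp(-x)) so that a large x does not produce inf / inf.
Type Sigmoid(const Data& data, const Vector<Type>& w) {
    Type x = WX(data, w);
    return 1.0 / (1.0 + exp(-x));
}

// Log-likelihood L(w) = sum_i [ y_i * w^T x_i - log(1 + exp(w^T x_i)) ].
Type Lw(const Vector<Data>& data, Vector<Type> w) {
    Type ans = 0;
    for(int i = 0; i < data.size(); i++) {
        Type x = WX(data[i], w);
        ans += data[i].y * x - log(1 + exp(x));
    }
    return ans;
}

// One gradient ascent pass. Note that w[i] is updated in place, so later
// coordinates are computed with the already-updated earlier ones.
void Gradient(const Vector<Data>& data, Vector<Type> &w, Type alpha) {
    for(int i = 0; i < w.size(); i++) {
        Type tmp = 0;
        for(int j = 0; j < data.size(); j++)
            tmp += alpha * data[j].x[i] * (data[j].y - Sigmoid(data[j], w));
        w[i] += tmp;
    }
}

// Print the progress of one iteration.
void Display(int cnt, Type objLw, Type newLw, Vector<Type> w) {
    cout<<"Iteration "<<cnt<<": objLw = "<<objLw
        <<"  difference between the two iterations: "<<(newLw - objLw)<<endl;
    cout<<"Parameters w: ";
    for(int i = 0; i < w.size(); i++)
        cout<<w[i]<<" ";
    cout<<endl<<endl;
}

// Iterate until the log-likelihood changes by no more than delta.
void Logistic(const Vector<Data>& data, Vector<Type> &w) {
    int cnt = 0;
    Type alpha = 0.1;
    Type delta = 0.00001;
    Type objLw = Lw(data, w);
    Gradient(data, w, alpha);
    Type newLw = Lw(data, w);
    while(fabs(newLw - objLw) > delta) {
        objLw = newLw;
        Gradient(data, w, alpha);
        newLw = Lw(data, w);
        cnt++;
        Display(cnt,objLw,newLw, w);
    }
}

// Classify every instance in the test file with the learned w.
void Separator(Vector<Type> w) {
    Vector<Data> data;
    PreProcessData(data, "TestData.txt");
    for(int i = 0; i < data.size(); i++) {
        Type x = WX(data[i], w);
        Type p1 = 1.0 / (1.0 + exp(-x));
        Type p0 = 1 - p1;
        cout<<"Instance: ";
        for(int j = 0; j < data[i].x.size(); j++)
            cout<<data[i].x[j]<<" ";
        cout<<"predicted class: ";
        if(p1 >= p0) cout<<1<<endl;
        else cout<<0<<endl;
    }
}

int main() {
    Vector<Type> w;
    Vector<Data> data;
    Init(data, w);
    Logistic(data, w);
    Separator(w);
    return 0;
}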
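The main steps of logistic regression:
1. Find the prediction function (the h function, or hypothesis).
2. Construct the loss function (the J function).
3. Minimize the loss function to obtain the regression coefficients θ.

Common methods for the minimization in step 3:
1. Gradient descent
2. Newton's method
3. Quasi-Newton methods (the BFGS and L-BFGS algorithms)

Of these, stochastic gradient descent and L-BFGS are already implemented in Spark MLlib; gradient descent is the simplest and the easiest to understand.

Why does the LR model choose the sigmoid function in particular? Logistic regression is not a regression problem but a binary classification problem: the dependent variable is either 0 or 1, so it is natural to assume that it follows a Bernoulli distribution. Writing the Bernoulli distribution in exponential-family form and solving for the success probability in terms of the natural parameter yields exactly the sigmoid function.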
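The link between the Bernoulli distribution and the sigmoid can be made concrete with a short derivation. This is the standard exponential-family argument, written with $\phi = P(y = 1)$ and natural parameter $\eta$; both symbols are notation assumed for this sketch.

```latex
\begin{aligned}
P(y;\phi) &= \phi^{y}(1-\phi)^{1-y} \\
          &= \exp\bigl(y\log\phi + (1-y)\log(1-\phi)\bigr) \\
          &= \exp\Bigl(y\,\log\frac{\phi}{1-\phi} + \log(1-\phi)\Bigr), \\[4pt]
\eta &= \log\frac{\phi}{1-\phi}
\quad\Longrightarrow\quad
\phi = \frac{1}{1+e^{-\eta}} .
\end{aligned}
```

Taking the natural parameter to be linear in the features, $\eta = \theta^\top x$, gives $h_\theta(x) = 1 / (1 + e^{-\theta^\top x})$.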
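Here too we are looking for a minimum, which is the same idea as a loss function, so we take the J(θ) above as the loss function of logistic regression. In Andrew Ng's course the expression is multiplied by a factor of 1/m; I suspect this is simply to stay consistent with the loss function of linear regression.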
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.{LogisticRegressionWithLBFGS, LogisticRegressionModel}
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.util.MLUtils

// Load training data in LIBSVM format (sc is an existing SparkContext).
val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

// Split data into training (60%) and test (40%).
val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0).cache()
val test = splits(1)

// Train the model. The listing omitted the call that fits it; run(training) is restored here.
val model = new LogisticRegressionWithLBFGS()
  .run(training)

// Predict on the test set and pair each prediction with its true label.
val predictionAndLabels = test.map { case LabeledPoint(label, features) =>
  val prediction = model.predict(features)
  (prediction, label)
}

// Evaluate. `precision` is the Spark 1.x name; newer releases expose `accuracy` instead.
val metrics = new MulticlassMetrics(predictionAndLabels)
val precision = metrics.precision
println("Precision = " + precision)

// Save and reload the model.
model.save(sc, "myModelPath")
val sameModel = LogisticRegressionModel.load(sc, "myModelPath")
/**
 * Train a logistic regression model given an RDD of (label, features) pairs. We run a fixed
 * number of iterations of gradient descent using the specified step size. Each iteration uses
 * `miniBatchFraction` fraction of the data to calculate the gradient. The weights used in
 * gradient descent are initialized using the initial weights provided.
 * NOTE: Labels used in Logistic Regression should be {0, 1}
 *
 * @param input RDD of (label, array of features) pairs.
 * @param numIterations Number of iterations of gradient descent to run.
 * @param stepSize Step size to be used for each iteration of gradient descent.
 * @param miniBatchFraction Fraction of data to be used per iteration.
 * @param initialWeights Initial set of weights to be used. Array should be equal in size to
 *        the number of features in the data.
 */
def train(
    input: RDD[LabeledPoint],
    numIterations: Int,
    stepSize: Double,
    miniBatchFraction: Double,
    initialWeights: Vector): LogisticRegressionModel = {
  new LogisticRegressionWithSGD(stepSize, numIterations, 0.0, miniBatchFraction)
    .run(input, initialWeights)
}
def run(input: RDD[LabeledPoint], initialWeights: Vector): M = {
  if (numFeatures < 0) {
    numFeatures = input.map(_.features.size).first()
  }

  if (input.getStorageLevel == StorageLevel.NONE) {
    logWarning("The input data is not directly cached, which may hurt performance if its"
      + " parent RDDs are also uncached.")
  }

  // Check the data properties before running the optimizer.
  if (validateData && !validators.forall(func => func(input))) {
    throw new SparkException("Input validation failed.")
  }

  /**
   * Scaling columns to unit variance as a heuristic to reduce the condition number:
   * During the optimization process, the convergence (rate) depends on the condition number of
   * the training dataset. Scaling the variables often reduces this condition number
   * heuristically, thus improving the convergence rate. Without reducing the condition number,
   * some training datasets mixing the columns with different scales may not be able to converge.
   * GLMNET and LIBSVM packages perform the scaling to reduce the condition number, and return
   * the weights in the original scale.
   * See page 9 in http://cran.r-project.org/web/packages/glmnet/glmnet.pdf
   * Here, if useFeatureScaling is enabled, we will standardize the training features by dividing
   * the variance of each column (without subtracting the mean), and train the model in the
   * scaled space. Then we transform the coefficients from the scaled space to the original scale
   * as GLMNET and LIBSVM do.
   * Currently, it's only enabled in LogisticRegressionWithLBFGS
   */
  val scaler = if (useFeatureScaling) {
    new StandardScaler(withStd = true, withMean = false).fit(input.map(_.features))
  } else {
    null
  }

  // Add a constant 1.0 feature for the intercept when addIntercept is set.
  val data =
    if (addIntercept) {
      if (useFeatureScaling) {
        input.map(lp => (lp.label, appendBias(scaler.transform(lp.features)))).cache()
      } else {
        input.map(lp => (lp.label, appendBias(lp.features))).cache()
      }
    } else {
      if (useFeatureScaling) {
        input.map(lp => (lp.label, scaler.transform(lp.features))).cache()
      } else {
        input.map(lp => (lp.label, lp.features))
      }
    }

  /**
   * TODO: For better convergence, in logistic regression, the intercepts should be computed
   * from the prior probability distribution of the outcomes; for linear regression,
   * the intercept should be set as the average of response.
   */
  val initialWeightsWithIntercept = if (addIntercept && numOfLinearPredictor == 1) {
    appendBias(initialWeights)
  } else {
    /** If `numOfLinearPredictor > 1`, initialWeights already contains intercepts. */
    initialWeights
  }

  val weightsWithIntercept = optimizer.optimize(data, initialWeightsWithIntercept)

  // ... (this excerpt omits the code that splits weightsWithIntercept into `weights` and
  // `intercept` and maps them back to the original feature scale)

  createModel(weights, intercept)
}
def runMiniBatchSGD(
    data: RDD[(Double, Vector)],
    gradient: Gradient,
    updater: Updater,
    stepSize: Double,
    numIterations: Int,
    regParam: Double,
    miniBatchFraction: Double,
    initialWeights: Vector,
    convergenceTol: Double): (Vector, Array[Double]) = {

  val stochasticLossHistory = new ArrayBuffer[Double](numIterations)
  // Previous and current weights, used for the convergence check.
  var previousWeights: Option[Vector] = None
  var currentWeights: Option[Vector] = None

  // ... (this excerpt omits the sanity checks on the input size)

  var weights = Vectors.dense(initialWeights.toArray)
  val n = weights.size

  /**
   * For the first iteration, the regVal will be initialized as sum of weight squares
   * if it's L2 updater; for L1 updater, the same logic is followed.
   */
  var regVal = updater.compute(
    weights, Vectors.zeros(weights.size), 0, 1, regParam)._2

  var converged = false
  var i = 1
  while (!converged && i <= numIterations) {
    val bcWeights = data.context.broadcast(weights)
    // Sample a mini-batch and sum the gradients and losses over it.
    val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
      .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
        seqOp = (c, v) => {
          // c: (gradient, loss, count), v: (label, features)
          val l = gradient.compute(v._2, v._1, bcWeights.value, Vectors.fromBreeze(c._1))
          (c._1, c._2 + l, c._3 + 1)
        },
        combOp = (c1, c2) => {
          (c1._1 += c2._1, c1._2 + c2._2, c1._3 + c2._3)
        })

    if (miniBatchSize > 0) {
      /**
       * lossSum is computed using the weights from the previous iteration
       * and regVal is the regularization value computed in the previous iteration as well.
       */
      stochasticLossHistory.append(lossSum / miniBatchSize + regVal)
      val update = updater.compute(
        weights, Vectors.fromBreeze(gradientSum / miniBatchSize.toDouble),
        stepSize, i, regParam)
      weights = update._1
      regVal = update._2

      previousWeights = currentWeights
      currentWeights = Some(weights)
      if (previousWeights != None && currentWeights != None) {
        converged = isConverged(previousWeights.get,
          currentWeights.get, convergenceTol)
      }
    } else {
      logWarning(s"Iteration ($i/$numIterations). The size of sampled batch is zero")
    }
    i += 1
  }

  logInfo("GradientDescent.runMiniBatchSGD finished. Last 10 stochastic losses %s".format(
    stochasticLossHistory.takeRight(10).mkString(", ")))

  (weights, stochasticLossHistory.toArray)
}
/**
 * For Binary Logistic Regression.
 * Although the loss and gradient calculation for multinomial one is more generalized,
 * and multinomial one can also be used in binary case, we still implement a specialized
 * binary version for performance reason.
 */
val margin = -1.0 * dot(data, weights)
val multiplier = (1.0 / (1.0 + math.exp(margin))) - label
axpy(multiplier, data, cumGradient)
if (label > 0) {
  // Equivalent to log(1 + exp(margin)) but more numerically stable.
  MLUtils.log1pExp(margin)
} else {
  MLUtils.log1pExp(margin) - margin
}
Here margin is −θᵀx and multiplier is hθ(xi) − yi. The axpy call computes (hθ(xi) − yi)·xi and adds it to cumGradient.
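To make the roles of `margin`, `multiplier`, and `axpy` concrete, here is a minimal C++ sketch of the same per-example computation. It is an illustration and not Spark's code: the function names `log1pExp` and `binaryLogisticGradient` and the use of `std::vector` are assumptions of this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable log(1 + exp(x)): avoids overflow of exp(x) for large x.
double log1pExp(double x) {
    if (x > 0) return x + std::log1p(std::exp(-x));
    return std::log1p(std::exp(x));
}

// Adds the gradient of one example to cumGradient and returns that example's loss.
double binaryLogisticGradient(const std::vector<double>& x, double label,
                              const std::vector<double>& theta,
                              std::vector<double>& cumGradient) {
    double dot = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) dot += x[j] * theta[j];
    const double margin = -dot;                                        // -theta^T x
    const double multiplier = 1.0 / (1.0 + std::exp(margin)) - label;  // h_theta(x) - y
    for (std::size_t j = 0; j < x.size(); ++j)
        cumGradient[j] += multiplier * x[j];                           // the axpy step
    // Loss is -log(h) for label 1 and -log(1 - h) for label 0.
    return label > 0 ? log1pExp(margin) : log1pExp(margin) - margin;
}
```

The two loss branches follow from h = 1 / (1 + exp(margin)): −log(h) equals log(1 + exp(margin)), and −log(1 − h) equals log(1 + exp(margin)) − margin.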



