
FAQ: Complete or quasi-complete separation and some strategies for dealing with it
What is complete or quasi-complete separation in logistic/probit regression, and
how do we deal with it?
Occasionally when running a logistic/probit
regression we run into the problem of so-called complete separation or
quasi-complete separation. On this page, we will discuss what complete or
quasi-complete separation is and how to deal with the problem when it occurs.
Notice that the example data sets used on this page are extremely small. They are
for the purpose of illustration only.
What is complete separation and what do some of the most commonly used
software packages do when it happens?
Complete separation happens when the outcome variable separates a predictor
variable or a combination of predictor variables completely.
Albert and Anderson (1984) define this
as the situation in which "there is a vector α that correctly allocates all observations to their group." Below is a small example.
In this example, Y is the outcome variable, and X1 and X2 are predictor variables. We can see that
observations with Y = 0 all have values of X1 <= 3 and observations with
Y = 1 all have values of X1 > 3.
In other words, Y separates X1 perfectly. The other way to see it is
that X1 predicts Y perfectly, since X1 <= 3 corresponds to Y
= 0 and X1 > 3 corresponds to Y = 1. By chance, we have found a perfect predictor X1 for
the outcome variable Y. In terms of predicted probabilities, we have Prob(Y
= 1 | X1 <= 3) = 0 and Prob(Y = 1 | X1 > 3) = 1, without the need for estimating a model at all.
Complete separation or perfect prediction can occur for several reasons. One common
example is when using several categorical variables whose categories are coded by indicators.
For example, if one is studying an age-related disease (present/absent) and age is one of the
predictors, there may be subgroups (e.g., women over 55) all of whom have the disease.
Complete separation also may occur if there is a coding error or if you mistakenly included
another version of the outcome as a predictor. For example, we might have dichotomized a
continuous variable X into a binary variable Y and then wanted to study the
relationship between Y and some predictor variables. If we included X as a
predictor variable, we would run into the problem of perfect prediction, since
by definition, Y separates X completely. Another possible scenario for
complete separation is a very small sample size. In our
example data above, there is no structural reason why Y has to be 0 when X1 is
<= 3. If the sample were large enough, we would probably have some observations
with Y = 1 and X1 <= 3, breaking up the complete separation of X1.
What happens when we try to fit a logistic
or a probit regression model of Y on X1 and X2? Mathematically, the maximum
likelihood estimate for X1 does not exist. In particular, with this example, the
larger the coefficient for X1, the larger the likelihood. In other words, the
coefficient for X1 should be as large as it can be, which would be infinity!
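To see this numerically, consider a minimal R sketch (placing the decision boundary at X1 = 4 via the intercept is our own choice for illustration): the log likelihood keeps rising toward 0 as the coefficient on X1 grows, so no finite maximizer exists.

# A sketch: the log likelihood increases without bound as the
# coefficient on X1 grows, so no finite MLE exists.
y  <- c(0,0,0,0,1,1,1,1)
x1 <- c(1,2,3,3,5,6,10,11)
loglik <- function(b0, b1) {
  p <- plogis(b0 + b1 * x1)                # fitted probabilities
  sum(y * log(p) + (1 - y) * log(1 - p))   # Bernoulli log likelihood
}
for (b1 in c(1, 5, 10, 20)) {
  # tie the intercept to b1 so the boundary sits at X1 = 4,
  # halfway between the two perfectly separated groups
  cat("b1 =", b1, " log likelihood =", loglik(-4 * b1, b1), "\n")
}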
In terms of the behavior of statistical software packages, below is what SAS (version
9.2), SPSS (version 18), Stata
(version 11) and R (version 2.11.1) do when we run the model on the sample data. We present these results in the
hope that some understanding of how logistic/probit regression
behaves in our familiar software package might help us identify
the problem of complete separation more efficiently.
data t1;  /* dataset name assumed; the quasi-complete example below uses t2 */
  input Y X1 X2;
  cards;
0  1  3
0  2  2
0  3 -1
0  3 -1
1  5  2
1  6  4
1 10  1
1 11  0
;
run;

proc logistic data = t1 descending;  /* descending: model Prob(Y=1) */
  model y = x1 x2;
run;
(some output omitted)

Model Convergence Status

Complete separation of data points detected.

WARNING: The maximum likelihood estimate does not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning.
         Results shown are based on the last maximum likelihood iteration.
         Validity of the model fit is questionable.

Model Fit Statistics
(table omitted)

WARNING: The validity of the model fit is questionable.

Testing Global Null Hypothesis: BETA=0
(likelihood ratio and other test statistics omitted)

Analysis of Maximum Likelihood Estimates
(parameter estimates omitted; the numbers from the last iteration are not reliable)
We can see that the first relevant message is that SAS detected complete
separation of the data points, and it then gives further warnings indicating
that the maximum likelihood estimate does not exist, yet continues to finish the
computation. Also notice that SAS does not tell us which variable or
variables are completely separated by the outcome variable, and that the
parameter estimate reported for X1 is incorrect.
data list list / Y X1 X2.
begin data.
0  1  3
0  2  2
0  3 -1
0  3 -1
1  5  2
1  6  4
1 10  1
1 11  0
end data.

logistic regression variables Y
  /method = enter X1 X2.
Logistic Regression

|-----------------------------------------------------------------------------------------|
|The parameter covariance matrix cannot be computed. Remaining statistics will be omitted.|
|-----------------------------------------------------------------------------------------|

(some output omitted)

Block 1: Method = Enter

Model Summary
(the -2 log likelihood and the Cox & Snell and Nagelkerke R-square values are omitted)
a. Estimation terminated at iteration number 20 because a perfect fit is detected. This solution is not unique.
We see that SPSS detects the perfect fit and immediately stops the rest of the
computation. It does not provide any parameter estimates, nor does it give us
any further information on which set of variables produces the perfect fit.
clear
input Y X1 X2
0  1  3
0  2  2
0  3 -1
0  3 -1
1  5  2
1  6  4
1 10  1
1 11  0
end

logit Y X1 X2

outcome = X1 > 3 predicts data perfectly
We see that Stata detects the perfect prediction by X1 and stops the computation
immediately.
y  <- c(0,0,0,0,1,1,1,1)
x1 <- c(1,2,3,3,5,6,10,11)
x2 <- c(3,2,-1,-1,2,4,1,0)
m1 <- glm(y ~ x1 + x2, family = binomial)
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(m1)

Call:
glm(formula = y ~ x1 + x2, family = binomial)

Deviance Residuals:
(all essentially zero; values such as -2.107e-08, -1.404e-05 and -2.522e-06)

Coefficients:
(coefficient table omitted; the estimate for x1 is very large and its
standard error is far larger)

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1.1090e+01  on 7  degrees of freedom
Residual deviance: 4.5454e-10  on 5  degrees of freedom

Number of Fisher Scoring iterations: 24
The only warning message R gives comes right after fitting the logistic model:
"fitted probabilities numerically 0 or 1 occurred".
Combining this piece of information with the parameter estimate for x1 being
really large (greater than 15), we suspect a problem of complete or quasi-complete separation. The standard errors
for the parameter estimates are also far too large, which
usually indicates a convergence issue or some degree of data separation.
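As a quick follow-up check, we can flag the offending observations ourselves. Below is a minimal base-R sketch, assuming the same tolerance that glm.fit uses internally when it issues this warning:

# flag fitted probabilities that are numerically 0 or 1
eps <- 10 * .Machine$double.eps   # assumed: the cutoff behind glm.fit's warning
p <- fitted(m1)
which(p < eps | p > 1 - eps)      # indices of the offending observations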
What is quasi-complete separation and what do some of the most commonly used
software packages do when it happens?
Quasi-complete separation in a logistic/probit regression happens when the outcome
variable separates a predictor variable or a combination of predictor variables
only to a certain degree. Here is an example.
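Y   X1   X2
0    1    3
0    2    0
0    3   -1
0    3    4
1    3    1
1    4    0
1    5    2
1    6    7
1   10    3
1   11    4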
Notice that the outcome variable Y separates the predictor variable X1 pretty
well, except for values of X1 equal to 3. In other words, X1 predicts Y perfectly
when X1 < 3 (Y = 0) or X1 > 3 (Y = 1), leaving only the cases with X1 = 3 as cases with
uncertainty. In terms of expected probabilities, we have Prob(Y = 1 |
X1 < 3) = 0 and Prob(Y = 1 | X1 > 3) = 1, so nothing needs to be estimated, except for Prob(Y =
1 | X1 = 3).
What happens when we try to fit a logistic or a probit regression model of Y on X1 and X2
using the data above? It turns out that the maximum likelihood estimate for X1
still does not exist: as before, the larger the parameter for X1, the larger
the likelihood. In practice,
a value of 15 or larger does not make much difference, since such values all
correspond to a predicted probability of essentially 1. Statistical
software packages differ in how they deal with the issue of quasi-complete
separation. Below is what each of SAS, SPSS, Stata
and R does with our sample data and the logistic regression model of Y on X1 and
X2. Again, we present these results in the
hope that some understanding of how logistic/probit regression
behaves within our familiar software package might help us identify
the problem of separation more efficiently.
data t2;
  input Y X1 X2;
  cards;
0  1  3
0  2  0
0  3 -1
0  3  4
1  3  1
1  4  0
1  5  2
1  6  7
1 10  3
1 11  4
;
run;

proc logistic data = t2 descending;
  model y = x1 x2;
run;
(some output omitted)

Response Profile

Probability modeled is Y=1.

Model Convergence Status

Quasi-complete separation of data points detected.

WARNING: The maximum likelihood estimate may not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning.
         Results shown are based on the last maximum likelihood iteration.
         Validity of the model fit is questionable.

Model Fit Statistics
(table omitted)

WARNING: The validity of the model fit is questionable.

Testing Global Null Hypothesis: BETA=0
(likelihood ratio and other test statistics omitted)

Analysis of Maximum Likelihood Estimates
(parameter estimates omitted; the numbers from the last iteration are not reliable)
We see that SAS used all 10 observations and gave warnings at various
points. It informed us that it detected quasi-complete separation of the data
points. It is worth noting that neither the parameter estimate for X1 nor
the one for the intercept means much at all.
clear
input y x1 x2
0  1  3
0  2  0
0  3 -1
0  3  4
1  3  1
1  4  0
1  5  2
1  6  7
1 10  3
1 11  4
end

logit y x1 x2

note: outcome = x1 > 3 predicts data perfectly except for x1 == 3 subsample:
      x1 dropped and 7 obs not used
Iteration 0:   log likelihood = -1.9095425
Iteration 1:   log likelihood = -1.8896311
Iteration 2:   log likelihood = -1.8895913
Iteration 3:   log likelihood = -1.8895913
Logistic regression                               Number of obs   =          3
                                                  LR chi2(1)      =       0.04
                                                  Prob > chi2     =     0.8417
Log likelihood = -1.8895913

(coefficient table with 95% conf. intervals omitted; with x1 dropped, only x2
and the constant remain)
Stata detected that there was quasi-complete separation and informed us which
predictor variable was part of the issue: it tells us that x1
predicts the data perfectly except when x1 = 3. It therefore drops all the cases
in which x1 predicts the outcome variable perfectly, keeping only the three
observations with x1 = 3. Since x1 is then a constant (= 3) in this subsample, it
is also dropped from the analysis.
data list list / y x1 x2.
begin data.
0  1  3
0  2  0
0  3 -1
0  3  4
1  3  1
1  4  0
1  5  2
1  6  7
1 10  3
1 11  4
end data.

logistic regression variables y
  /method = enter x1 x2.
(some output omitted)

Block 1: Method = Enter

Model Summary
(the -2 log likelihood and the Cox & Snell and Nagelkerke R-square values are omitted)
a. Estimation terminated at iteration number 20 because maximum iterations has been reached. Final solution cannot be found.

Classification Table(a)
(table omitted)
a. The cut value is .500

Variables in the Equation

|-------|--------|-------|----|--|----|-------|
|       |        |B      |Wald|df|Sig.|Exp(B) |
|-------|--------|-------|----|--|----|-------|
|Step 1a|x1      |17.923 |.000|1 |.997|6.082E7|
|       |x2      |-.121  |.039|1 |.843|.886   |
|       |Constant|-54.313|.000|1 |.997|.000   |
|-------|--------|-------|----|--|----|-------|
(standard errors omitted in this excerpt; they are extremely large)
a. Variable(s) entered on step 1: x1, x2.
SPSS iterated up to the default maximum number of iterations, could not
reach a solution, and thus stopped the iteration process. It did not tell us
anything about quasi-complete separation, so it is up to us to figure out why
the computation did not converge. One obvious piece of evidence in this example is the
large magnitude of the
parameter estimate for x1: it is really large and its standard error is even
larger. Based on this piece of evidence, we should look at the relationship
between the outcome variable y and x1. For instance, we can take a look at
the crosstabulation of x1 by y as follows.
crosstabs
  /tables = x1 by y.

x1 * y Crosstabulation

                 y
              .00   1.00   Total
x1    1.00      1      0      1
      2.00      1      0      1
      3.00      2      1      3
      4.00      0      1      1
      5.00      0      1      1
      6.00      0      1      1
     10.00      0      1      1
     11.00      0      1      1
Total           4      6     10
The visual inspection reveals that there is a problem of quasi-complete
separation involving x1. In practice, this process of identifying the issue could be very
lengthy, since there may
be multiple predictor variables involved.
y  <- c(0,0,0,0,1,1,1,1,1,1)
x1 <- c(1,2,3,3,3,4,5,6,10,11)
x2 <- c(3,0,-1,4,1,0,2,7,3,4)
m1 <- glm(y ~ x1 + x2, family = binomial)
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(m1)

(some output omitted)

Call:
glm(formula = y ~ x1 + x2, family = binomial)

Deviance Residuals:
(listing omitted; values include -1.004e+00 and -5.538e-05)

Coefficients:
(coefficient table omitted; the estimate for x1 is very large and its
standard error is even larger)
The only warning we get from R comes right after the glm command, about
fitted probabilities being numerically 0 or 1. From the parameter estimates we can see
that the coefficient for x1 is very large and its standard error is even larger,
an indication that the model might have an issue with x1. Based on this piece
of evidence, we should look at the relationship between the outcome variable y
and x1 descriptively, as shown below. Visual inspection tells us that there is a problem of
quasi-complete separation involving variable x1.
table(x1, y)
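Running this on the quasi-complete-separation data gives the following crosstab, in which y = 0 and y = 1 overlap only at x1 = 3:

    y
x1   0 1
  1  1 0
  2  1 0
  3  2 1
  4  0 1
  5  0 1
  6  0 1
  10 0 1
  11 0 1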
What are the techniques for dealing with complete separation or quasi-complete separation?
Now that we have some understanding of what complete or quasi-complete separation
is, an immediate question is what techniques are available for dealing with it.
We will give a brief, general description of a few techniques for dealing
with the issue, with illustrative sample code in SAS. Note that these techniques
may be available in other packages as well, for example via Stata's user-written
firthlogit command. Let's say that the
predictor variable involved in complete or quasi-complete separation is called X.
In the case of complete separation, first make sure that the outcome variable
is not simply a dichotomized version of a variable already in the model.
If it is quasi-complete separation, the easiest strategy is the "do nothing" strategy, because the maximum likelihood estimates for the other predictor variables are
still valid. The drawback is that we do not get any reasonable estimate for
the variable X that actually predicts the outcome variable effectively.
This strategy does not work well for the situation of complete separation.
Another simple strategy is to not include X in the model. The problem is
that this leads to biased estimates for the other predictor variables in
the model, so it is not a recommended strategy.
Possibly we might be able to collapse some categories of X, if X is a categorical variable
and if it makes substantive sense to do so.
The exact method is a good strategy when the data set is small and the model
is not very large. Below is sample code in SAS.
proc logistic data = t2 descending;
  model y = x1 x2;
  exact x1 / estimate = both;  /* ESTIMATE=BOTH assumed here: requests both
                                  parameter and odds-ratio estimates */
run;
Firth logistic regression is another good strategy; it uses a penalized
maximum likelihood estimation method.
Firth bias-correction is considered an ideal solution to the separation issue
for logistic regression. For more information on logistic regression using
Firth bias-correction, we refer our readers to the article by Georg
Heinze and Michael Schemper listed in the references.
proc logistic data = t2 descending;
  model y = x1 x2 / firth;
run;
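For readers working in R, a rough analogue (a sketch, not the SAS code above) is the logistf package, which implements Firth's penalized-likelihood logistic regression:

# install.packages("logistf")  # if not already installed
library(logistf)
dat <- data.frame(y  = c(0,0,0,0,1,1,1,1,1,1),
                  x1 = c(1,2,3,3,3,4,5,6,10,11),
                  x2 = c(3,0,-1,4,1,0,2,7,3,4))
f1 <- logistf(y ~ x1 + x2, data = dat)
summary(f1)   # finite estimates and penalized-likelihood confidence intervals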
A Bayesian method can be used when we have some additional (prior) information on the
parameter estimates of the predictor variables.
data myprior;
  input _type_ $ Intercept x1 x2;
  cards;
Var   1 100 100
Mean  0   1   2
;
run;

proc genmod descending data = t2;
  model y = x1 x2 / dist = binomial link = logit;
  bayes seed = 34367 plots = all nbi = 2000 nmc = 10000
        coeffprior = normal(input = myprior);
  ods output PosteriorSample = P;
run;
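A rough R analogue (a sketch, not the SAS approach above) is the bayesglm function in the arm package, which fits the same logistic model with weakly informative default priors that keep the estimates finite under separation:

# install.packages("arm")  # if not already installed
library(arm)
dat <- data.frame(y  = c(0,0,0,0,1,1,1,1,1,1),
                  x1 = c(1,2,3,3,3,4,5,6,10,11),
                  x2 = c(3,0,-1,4,1,0,2,7,3,4))
# default priors: independent Cauchy(0, 2.5) on the rescaled coefficients
b1 <- bayesglm(y ~ x1 + x2, family = binomial, data = dat)
display(b1)   # finite, regularized coefficient estimates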
References

Albert, A. and Anderson, J. A. (1984). On the existence of maximum likelihood estimates in logistic regression models. Biometrika, 71(1), 1-10.

Allison, P. (2008). Convergence failures in logistic regression. SAS Global Forum 2008.

Derr, R. E. Performing exact logistic regression with the SAS System. SAS Institute Inc.

Heinze, G. and Schemper, M. (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine, 21, 2409-2419.
Thanks to Maureen Lahiff for suggestions to improve this page.
The content of this web site should not be construed as an endorsement
of any particular web site, book, or software product by the
University of California.
