R

R:具有family =“二項式”和“重量”規範的glm函數

  • March 12, 2015

我對重量如何glmfamily="binomial". 據我了解,glmwith 的可能性family = "binomial"指定如下: $$ f(y) = {n\choose{ny}} p^{ny} (1-p)^{n(1-y)} = \exp \left(n \left[ y \log \frac{p}{1-p} - \left(-\log (1-p)\right) \right] + \log {n \choose ny}\right) $$ 在哪裡 $ y $ 是“觀察到的成功的比例”和 $ n $ 是已知的試驗次數。

在我的理解中,成功的概率 $ p $ 用一些線性係數參數化 $ \beta $ 作為 $ p=p(\beta) $ 並glm具有family = "binomial"搜索功能: $$ \textrm{arg}\max_{\beta} \sum_i \log f(y_i). $$ 那麼這個優化問題可以簡化為:

$$ \textrm{arg}\max_{\beta} \sum_i \log f(y_i)= \textrm{arg}\max_{\beta} \sum_i n_i \left[ y_i \log \frac{p(\beta)}{1-p(\beta)} - \left(-\log (1-p(\beta))\right) \right] + \log {n_i \choose n_iy_i}\

\textrm{arg}\max_{\beta} \sum_i n_i \left[ y_i \log \frac{p(\beta)}{1-p(\beta)} - \left(-\log (1-p(\beta))\right) \right] \ $$

因此,如果我們讓 $ n_i^*=n_ic $ 對所有人 $ i=1,…,N $ 對於一些常數 $ c $ ,那麼它也必須是真的: $$ \textrm{arg}\max_{\beta} \sum_i \log f(y_i)

\textrm{arg}\max_{\beta} \sum_i n^*_i \left[ y_i \log \frac{p(\beta)}{1-p(\beta)} - \left(-\log (1-p(\beta))\right) \right] \ $$ 由此,我認為試驗次數的縮放 $ n_i $ 常數不影響最大似然估計 $ \beta $ 給定成功的比例 $ y_i $ .

的幫助文件glm說:

"For a binomial GLM prior weights are used to 
give the number of trials when the response is 
the proportion of successes" 

因此,我預計重量的縮放不會影響估計的 $ \beta $ 給定成功的比例作為響應。但是,以下兩個代碼返回不同的係數值:

Y <- c(1,0,0,0) ## proportion of observed success
w <- 1:length(Y) ## weight= the number of trials
glm(Y~1,weights=w,family=binomial)

這產生:

Call:  glm(formula = Y ~ 1, family =  
           "binomial", weights = w)

Coefficients:
(Intercept)  
     -2.197     

而如果我將所有權重乘以 1000,估計的係數是不同的:

glm(Y~1,weights=w*1000,family=binomial)

Call:  glm(formula = Y ~ 1, family = binomial,  
           weights = w * 1000)

Coefficients:
(Intercept)  
   -3.153e+15  

即使權重適度縮放,我也看到了許多其他類似的示例。這裡發生了什麼?

您的示例僅導致 R 中的捨入誤差。大權重在glm. 確實,w按幾乎任何較小的數字(例如 100)進行縮放都會導致與未縮放的w.

如果您希望使用 weights 參數獲得更可靠的行為,請嘗試使用包中的svyglm函數survey

看這裡:

   > svyglm(Y~1, design=svydesign(ids=~1, weights=~w, data=data.frame(w=w*1000, Y=Y)), family=binomial)
Independent Sampling design (with replacement)
svydesign(ids = ~1, weights = ~w, data = data.frame(w = w * 1000, 
   Y = Y))

Call:  svyglm(formula = Y ~ 1, design = svydesign(ids = ~1, weights = ~w2, 
   data = data.frame(w2 = w * 1000, Y = Y)), family = binomial)

Coefficients:
(Intercept)  
    -2.197  

Degrees of Freedom: 3 Total (i.e. Null);  3 Residual
Null Deviance:      2.601 
Residual Deviance: 2.601    AIC: 2.843

引用自:https://stats.stackexchange.com/questions/141412

comments powered by Disqus