Poisson regression with strong patterns in the residuals

May 11, 2017

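I am running this Poisson regression and I am seeing very strong patterns in both the deviance and the Pearson residuals. What would be a suitable way to correct the model?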

Here is the head of my data:

> head(bob_poisson_aggreg)
 ticketCount artistVotes capacity ticketsRemain
1         120        1168      169            49
2          21        4365      379           358
3         153        3710     2352          2199
4         158        8766      615           457
5          25         622       50            25
6         314        7700      700           386
 bob_poisson_mean_aggreg.artistRating
1                                 4.57
2                                 4.67
3                                 4.90
4                                 4.49
5                                 4.38
6                                 4.42

Here is the model I ran:

mod_poi_1 <- glm(ticketCount ~. , family = poisson , data = bob_poisson_aggreg)
summary(mod_poi_1)
Call:
glm(formula = ticketCount ~ ., family = poisson, data = bob_poisson_aggreg)

Deviance Residuals: 
    Min        1Q    Median        3Q       Max  
-10.5927   -2.5578    0.1436    2.0250    7.7396  

Coefficients:
                                      Estimate Std. Error  z value
(Intercept)                           2.699e+00  1.260e-01   21.418
artistVotes                          -1.124e-05  4.252e-07  -26.435
capacity                              8.464e-03  7.584e-05  111.604
ticketsRemain                        -8.449e-03  8.109e-05 -104.188
bob_poisson_mean_aggreg.artistRating  1.914e-01  2.823e-02    6.781
                                    Pr(>|z|)    
(Intercept)                           < 2e-16 ***
artistVotes                           < 2e-16 ***
capacity                              < 2e-16 ***
ticketsRemain                         < 2e-16 ***
bob_poisson_mean_aggreg.artistRating 1.19e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

   Null deviance: 18688.0  on 274  degrees of freedom
Residual deviance:  3388.1  on 270  degrees of freedom
AIC: 4972

Number of Fisher Scoring iterations: 4

I am not even sure a Poisson model is appropriate here, but if it is, how should I proceed? Should I be transforming the response variable, "ticketCount"?

Is there some general procedure to follow?

I would greatly appreciate any insights or references!

I would not start by transforming the response variable (DV).

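I would first consider whether you have the right link function, or whether some of the x's (the independent variables, IVs) should be transformed.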

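If you expect ticketCount to be proportional to some of those predictors (I certainly would), you might want to use an identity link, or enter the logs of the relevant predictors, possibly as offsets. The choice depends on whether you see the IVs as relating to the response additively or multiplicatively on the untransformed ticketCount scale.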
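To make the three options concrete, here is a minimal sketch on simulated data (not the asker's data). The names `fit_loglog`, `fit_offset` and `fit_identity` are made up for illustration, and the true mean is assumed to be exactly proportional to a single predictor, E(Y) = 33x.

```r
# Simulated data (illustrative only): the true mean is proportional to x.
set.seed(1)
x <- runif(1000, 1, 20)
y <- rpois(1000, 33 * x)

# Multiplicative view, exponent left free:
#   log E(Y) = b0 + b1 * log(x), i.e. E(Y) = exp(b0) * x^b1
fit_loglog <- glm(y ~ log(x), family = poisson)

# Multiplicative view, exponent fixed at 1 through an offset:
#   log E(Y) = b0 + log(x), i.e. E(Y) = exp(b0) * x
fit_offset <- glm(y ~ 1 + offset(log(x)), family = poisson)

# Additive view, identity link:
#   E(Y) = b0 + b1 * x
fit_identity <- glm(y ~ x, family = poisson(link = "identity"))

coef(fit_loglog)        # slope on log(x) should be near 1
exp(coef(fit_offset))   # proportionality constant, should be near 33
coef(fit_identity)      # slope on x should be near 33
```

With one predictor and exact proportionality all three describe the same mean. They diverge once several predictors are involved: the identity link adds their effects, while log predictors under a log link multiply them.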

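There are other things you could consider as well, but thinking carefully about how the DV relates to the IVs is essential to choosing a good model.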
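The answer does not say which other things it has in mind. One plausible candidate, offered here only as an assumption, is overdispersion, since the summary in the question reports a residual deviance of 3388.1 on 270 degrees of freedom. The sketch below computes that ratio and then shows, on simulated counts that really are Poisson, that a wrong link alone can inflate the same ratio. The helper `dev_ratio` is a hypothetical name.

```r
# Figures copied from the summary output in the question
resid_dev <- 3388.1
resid_df  <- 270
ratio_question <- resid_dev / resid_df   # roughly 12.5

# Simulated counts that are genuinely Poisson, with mean proportional to x
set.seed(2)
x <- runif(1000, 1, 20)
y <- rpois(1000, 33 * x)

# Hypothetical helper: residual deviance per residual degree of freedom
dev_ratio <- function(fit) deviance(fit) / df.residual(fit)

ratio_wrong_link <- dev_ratio(glm(y ~ x, family = poisson))
ratio_right_link <- dev_ratio(glm(y ~ x, family = poisson(link = "identity")))

c(question   = ratio_question,
  wrong_link = ratio_wrong_link,
  right_link = ratio_right_link)
```

A ratio far above 1 is therefore not, by itself, evidence that the Poisson variance assumption is wrong. It seems sensible to sort out the mean structure first and look at dispersion afterwards.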

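Here is an example with simulated Poisson data, where the simulation used an identity link (that is, the mean is linear in the predictor, and in this case actually proportional to it) while the fit used the default log link: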

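This is quite consistent with what you are seeing. When I saw your plots, my first thought was "probably needs an identity link", and after looking at your variable names that seemed to make a lot of sense. I would probably try the identity link first.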
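The variable names can be checked against the six rows that `head()` prints in the question. The sketch below retypes those rows by hand (the data frame name `bob_head` is made up) and tests whether ticketCount equals capacity minus ticketsRemain. It says nothing about the remaining rows of the data, which the post does not show.

```r
# The six rows printed by head() in the question, retyped by hand
bob_head <- data.frame(
  ticketCount   = c(120, 21, 153, 158, 25, 314),
  capacity      = c(169, 379, 2352, 615, 50, 700),
  ticketsRemain = c(49, 358, 2199, 457, 25, 386)
)

# Tickets sold, if the names mean what they appear to mean
sold <- bob_head$capacity - bob_head$ticketsRemain

all_match <- all(sold == bob_head$ticketCount)
all_match   # TRUE for these six rows
```

If the same identity holds across the full data set, the response is an additive function of two of the predictors on its original scale, which is the situation an identity link describes and a log link does not.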

Here is what I get when I use the two solutions I proposed:

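You can see that both deal with the lack of fit, but there is some heteroskedasticity in the case where I fitted a different link from the one I used to generate the data (which is to be expected).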
# example code included as requested
# (assumes we have already used `par` to get a 1x2 grid for plots like above)

# generate data
x <- runif(1000, 1, 20)
y <- rpois(1000, 33 * x)

# fit (incorrect in this case) log-link function
pfit <- glm(y ~ x, family = poisson)

# first plot
plot(x, y)
plot(pfit, which = 1)  # shows bowed appearance

# identity link (correct in this case)
#  generalizes to additive in predictors
pfiti <- glm(y ~ x, family = poisson(link = "identity"))

# log link, log predictor, suits *multiplicative* model
#  and general case is in powers - could fit a model like
#   E(Y) = a . X1 . X2 . X3^b3 . X4^b4 using offsets
pfitl <- glm(y ~ log(x), family = poisson)

# second plot
plot(pfiti, which = 1)
plot(pfitl, which = 1)
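The bowed shape in the residual plot can also be summarised as a number. This is a rough sketch and not part of the original answer: the helper `curvature_r2` is a hypothetical name, and regressing the residuals on a quadratic in the linear predictor is just one convenient way to measure curvature. The seed is arbitrary.

```r
# Same simulation as above, with a seed so the numbers are reproducible
set.seed(42)
x <- runif(1000, 1, 20)
y <- rpois(1000, 33 * x)

pfit  <- glm(y ~ x, family = poisson)                     # log link (wrong here)
pfiti <- glm(y ~ x, family = poisson(link = "identity"))  # identity link

# Hypothetical helper: share of the variance in the deviance residuals
# that a quadratic in the linear predictor accounts for.
curvature_r2 <- function(fit) {
  r   <- residuals(fit, type = "deviance")
  eta <- predict(fit, type = "link")
  summary(lm(r ~ poly(eta, 2)))$r.squared
}

c(log_link      = curvature_r2(pfit),
  identity_link = curvature_r2(pfiti))
```

A value well away from zero says the residuals still carry systematic structure in the mean, and a value near zero says the bow has gone. The plots remain the better diagnostic, since they also show changes in spread that this number ignores.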

Source: https://stats.stackexchange.com/questions/279067
