    > head(bob_poisson_aggreg)
      ticketCount artistVotes capacity ticketsRemain
    1         120        1168      169            49
    2          21        4365      379           358
    3         153        3710     2352          2199
    4         158        8766      615           457
    5          25         622       50            25
    6         314        7700      700           386
      bob_poisson_mean_aggreg.artistRating
    1                                 4.57
    2                                 4.67
    3                                 4.90
    4                                 4.49
    5                                 4.38
    6                                 4.42
    mod_poi_1 <- glm(ticketCount ~ ., family = poisson, data = bob_poisson_aggreg)
    summary(mod_poi_1)

    Call:
    glm(formula = ticketCount ~ ., family = poisson, data = bob_poisson_aggreg)

    Deviance Residuals:
         Min        1Q    Median        3Q       Max
    -10.5927   -2.5578    0.1436    2.0250    7.7396

    Coefficients:
                                           Estimate Std. Error  z value Pr(>|z|)
    (Intercept)                           2.699e+00  1.260e-01   21.418  < 2e-16 ***
    artistVotes                          -1.124e-05  4.252e-07  -26.435  < 2e-16 ***
    capacity                              8.464e-03  7.584e-05  111.604  < 2e-16 ***
    ticketsRemain                        -8.449e-03  8.109e-05 -104.188  < 2e-16 ***
    bob_poisson_mean_aggreg.artistRating  1.914e-01  2.823e-02    6.781 1.19e-11 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for poisson family taken to be 1)

        Null deviance: 18688.0  on 274  degrees of freedom
    Residual deviance:  3388.1  on 270  degrees of freedom
    AIC: 4972

    Number of Fisher Scoring iterations: 4
I wouldn't start by transforming the response (DV).
I'd first consider whether you have the right link function, or whether some of the x's (IVs) should be transformed.
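If you expect ticketCount to be proportional to some of those predictors (I certainly would), you might either use an identity link or put in the logs of the relevant predictors, possibly as offsets; the choice depends on whether you see the IVs as relating to the response additively or multiplicatively on the untransformed ticketCount scale.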
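To make the additive/multiplicative distinction concrete, here is a sketch of the mean structure each choice implies for a Poisson GLM. The symbols $x_1$, $x_2$ and the $\beta$'s are generic placeholders of mine, not fitted quantities from the question's data.

```latex
\begin{align*}
\text{identity link (additive):} \quad
  & E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{log link, logged predictors (multiplicative):} \quad
  & \log E(Y) = \beta_0 + \beta_1 \log x_1 + \beta_2 \log x_2
    \;\Longleftrightarrow\; E(Y) = e^{\beta_0}\, x_1^{\beta_1}\, x_2^{\beta_2} \\
\text{same, with } \log x_1 \text{ as an offset:} \quad
  & \log E(Y) = \beta_0 + \log x_1 + \beta_2 \log x_2
    \;\Longleftrightarrow\; E(Y) = e^{\beta_0}\, x_1\, x_2^{\beta_2}
\end{align*}
```

Putting $\log x_1$ in as an offset fixes its coefficient at 1, which is exactly the statement that the mean is proportional to $x_1$.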
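There are other things you might also consider, but careful thought about the relationships between the DV and the IVs is essential to choosing a good model.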
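Here's an example with simulated Poisson data, where the simulation used an identity link (i.e. the mean is linear in x, and in this case actually proportional to it) while the fit used the default log link: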
    # example code included as requested
    # (assumes we have already used `par` to get 1x2 grid for plots like above)

    # generate data
    x = runif(1000, 1, 20)
    y = rpois(1000, 33*x)

    # fit (incorrect in this case) log-link function
    pfit = glm(y ~ x, family = poisson)

    # first plot
    plot(x, y)
    plot(pfit, which = 1)   # shows bowed appearance

    # identity link (correct in this case)
    # generalizes to additive in predictors
    pfiti = glm(y ~ x, family = poisson(link = "identity"))

    # log link, log predictor, suits *multiplicative* model
    # and general case is in powers - could fit a model like
    # E(Y) = a . X1 . X2 . X3^b3 . X4^b4 using offsets
    pfitl = glm(y ~ log(x), family = poisson)

    # second plot
    plot(pfiti, which = 1)
    plot(pfitl, which = 1)
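The comment about offsets in the code above can be sketched as follows. This is a minimal illustration on simulated data, not a model for the question's data set: the names `x1`, `x2`, `a`, `b2` and `ofit` are hypothetical, and the true values are made up so that the estimates can be checked against them.

```r
# Sketch only: simulated data with made-up names and parameter values.
set.seed(1)
n  <- 1000
x1 <- runif(n, 1, 20)   # predictor the mean is assumed proportional to
x2 <- runif(n, 1, 5)    # predictor entering with an unknown power
a  <- 3                 # true constant of proportionality (assumed)
b2 <- 0.5               # true power on x2 (assumed)

# Multiplicative mean: E(Y) = a * x1 * x2^b2
y <- rpois(n, a * x1 * x2^b2)

# offset(log(x1)) fixes the coefficient of log(x1) at exactly 1,
# while the coefficient of log(x2) is estimated from the data.
ofit <- glm(y ~ log(x2) + offset(log(x1)), family = poisson)

# The intercept estimates log(a); the slope on log(x2) estimates b2.
est <- coef(ofit)
print(exp(est[["(Intercept)"]]))  # should be close to a
print(est[["log(x2)"]])           # should be close to b2
```

If proportionality to `x1` is only a guess, replacing `offset(log(x1))` with a plain `log(x1)` term lets you check whether its estimated coefficient is near 1.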