比較 R 中 GLM 後的因子水平

May 29, 2013

以下是關於我的情況的一點背景：我的數據是指捕食者成功吃掉的獵物數量。由於每次試驗的獵物數量有限（25 個可用），我有一個“樣本”列表示可用獵物的數量（因此，每次試驗 25 個），另一個稱為“計數”，它是成功的數量（吃了多少獵物）。我的分析基於 R 書中關於比例數據的示例（第 578 頁）。解釋變量是溫度（4 個水平，我將其視為因素）和捕食者的性別（顯然，男性或女性）。所以我最終得到了這個模型：
model <- glm(y ~ Temperature+Sex+Temperature*Sex, 
         data=predator, family=quasibinomial) 
在得到偏差分析表後，事實證明溫度和性別（但不是交互作用）對獵物的消耗有顯著影響。現在，我的問題是：我需要知道哪些溫度不同，即我必須將這 4 個溫度相互比較。如果我有一個線性模型，我會使用該TukeyHSD函數，但因為我使用的是 GLM，所以我不能。我一直在查看 MASS 包並嘗試設置對比度矩陣，但由於某種原因它不起作用。有什麼建議或參考嗎？

這是我從我的模型中得到的摘要，如果這有助於使其更清晰……
y <- cbind(data$Count, data$Sample-data$Count)
model <- glm(y ~ Temperature+Sex+Temperature*Sex, 
            data=predator, family=quasibinomial) 
> summary(model)

# Call:
# glm(formula = y ~ Temperature + Sex + Temperature * Sex, 
      family=quasibinomial, data=data)

# Deviance Residuals: 
# Min 1Q Median 3Q Max 
# -3.7926 -1.4308 -0.3098 0.9438 3.6831 
   
# Coefficients:
# Estimate Std. Error t value Pr(>|t|) 
# (Intercept) -1.6094 0.2672 -6.024 3.86e-08 ***
# Temperature8 0.3438 0.3594 0.957 0.3414 
# Temperature11 -1.0296 0.4803 -2.144 0.0348 * 
# Temperature15 -1.2669 0.5174 -2.449 0.0163 * 
# SexMale 0.3822 0.3577 1.069 0.2882 
# Temperature8:SexMale -0.2152 0.4884 -0.441 0.6606 
# Temperature11:SexMale 0.4136 0.6093 0.679 0.4990 
# Temperature15:SexMale 0.4370 0.6503 0.672 0.5033 
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
   
# (Dispersion parameter for quasibinomial family taken to be 2.97372) 
# Null deviance: 384.54 on 95 degrees of freedom
# Residual deviance: 289.45 on 88 degrees of freedom
# AIC: NA 
# Number of Fisher Scoring iterations: 5

安妮，我會簡短地解釋一下如何進行這種多重比較。為什麼這在您的特定情況下不起作用，我不知道；抱歉。

但通常，您可以使用multcomppackage 和 function來做到這一點glht。這是一個例子：
mydata      <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)
my.mod      <- glm(admit~gre+gpa*rank, data=mydata, family=quasibinomial)

summary(my.mod)
# 
# Coefficients:
# Estimate Std. Error t value Pr(>|t|) 
# (Intercept) -4.985768 2.498395 -1.996 0.0467 *
# gre 0.002287 0.001110 2.060 0.0400 *
# gpa 1.089088 0.731319 1.489 0.1372 
# rank2 0.503294 2.982966 0.169 0.8661 
# rank3 0.450796 3.266665 0.138 0.8903 
# rank4 -1.508472 4.202000 -0.359 0.7198 
# gpa:rank2 -0.342951 0.864575 -0.397 0.6918 
# gpa:rank3 -0.515245 0.935922 -0.551 0.5823 
# gpa:rank4 -0.009246 1.220757 -0.008 0.9940 
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
如果您想計算rank使用 Tukey 的 HSD 之間的成對比較，您可以這樣做：
library(multcomp)
summary(glht(my.mod, mcp(rank="Tukey")))
# 
# Simultaneous Tests for General Linear Hypotheses
# 
# Multiple Comparisons of Means: Tukey Contrasts
# 
# Fit: glm(formula = admit ~ gre + gpa * rank, family = quasibinomial, data = mydata) 
# 
# Linear Hypotheses:
# Estimate Std. Error z value Pr(>|z|)
# 2 - 1 == 0 0.5033 2.9830 0.169 0.998
# 3 - 1 == 0 0.4508 3.2667 0.138 0.999
# 4 - 1 == 0 -1.5085 4.2020 -0.359 0.984
# 3 - 2 == 0 -0.0525 2.6880 -0.020 1.000
# 4 - 2 == 0 -2.0118 3.7540 -0.536 0.949
# 4 - 3 == 0 -1.9593 3.9972 -0.490 0.960
# (Adjusted p values reported -- single-step method)
# 
# Warning message:
# In mcp2matrix(model, linfct = linfct) :
# covariate interactions found -- default contrast might be inappropriate
所有成對比較都與 -價值。警告：這些比較只涉及主效應。相互作用被忽略。如果存在交互，將給出警告（如上面的輸出所示）。有關在存在交互時如何繼續進行的更廣泛的教程，請參閱這些額外的multcomp 示例。

注意：正如@gung 在評論中指出的那樣，您應該 - 只要有可能 - 將溫度作為連續變量而不是分類變量。關於交互：您可以執行似然比檢驗來檢查交互項是否顯著改善了模型擬合。在您的情況下，代碼看起來像這樣：
# Original model
model <- glm(y ~ Temperature+Sex+Temperature*Sex, data=predator, family=quasibinomial) 

# Model without an interaction
model2 <- glm(y ~ Temperature+Sex data=predator, family=quasibinomial) 

# Likelihood ratio test
anova(model, model2, test="LRT")
如果此測試不顯著，您可以從模型中刪除交互。也許glht會工作？

引用自：https://stats.stackexchange.com/questions/60352

比較 R 中 GLM 後的因子水平

相關問答

年齡部分作為連續變量，部分作為分類變量

為什麼我們使用術語“人口”而不是“數據生成過程”？

參考請求：單位根理論書籍

為什麼我們在線性回歸中建模噪聲而不是邏輯回歸？

時間序列預測：從 ARIMA 到 LSTM

計量經濟學家將如何回答 Chen 和 Pearl (2013) 提出的反對意見和建議？