
How do random-effect levels with only 1 observation affect a generalized linear mixed model?

  • October 27, 2016

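I have a dataset in which the variable I want to use as a random effect has only one observation for some of its levels. Based on the answers to earlier questions, I gather that this is fine in principle.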

Can I fit a mixed model with subjects that only have 1 observation?

Random intercept model - one measurement per subject

However, in the second link, the first answer states:

"...assuming you are not using a generalized linear mixed model (GLMM), in which case issues of overdispersion come into play"

I am considering a GLMM, but I don't really understand how random-effect levels with a single observation would affect the model.


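Here is an example of one of the models I am trying to fit. I study birds, and I want to model the effects of population and season on the number of stops during migration. I want to use individual as a random effect, because for some individuals I have up to 5 years of data.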

library(dplyr)
library(lme4)
pop <- as.character(c("BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "NU", "NU", "NU", "NU", "NU", "NU", "NU", "NU", "NU", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA"))
id <- "2 2 4 4 7 7 9 9 10 10 84367 84367 84367 84368 84368 84368 84368 84368 84368 84369 84369 33073 33073 33073 33073 33073 33073 33073 33073 33073 80149 80149 80149 80150 80150 80150 57140 57141 126674 126677 126678 126680 137152 137152 137157 115925 115925 115925 115925 115925 115925 115925 115925 115926 115926 115926 115926 115926 115926 115927 115928 115929 115929 115929 115930 115930 115930 115930 115931 115931 115931 115932 115932 115932"
id <- strsplit(id, " ")
id <- as.numeric(unlist(id))
year <- "2014 2015 2014 2015 2014 2015 2014 2015 2014 2015 2009 2010 2010 2009 2010 2010 2011 2011 2012 2009 2010 2009 2009 2010 2010 2011 2011 2012 2012 2013 2008 2008 2009 2008 2008 2009 2008 2008 2013 2013 2013 2013 2014 2015 2014 2012 2013 2013 2014 2014 2015 2015 2016 2012 2013 2013 2014 2014 2015 2013 2012 2012 2013 2013 2012 2013 2013 2014 2013 2014 2014 2013 2014 2014"
year <- strsplit(year, " ")
year <- as.numeric(unlist(year))
season <- as.character(c("fall", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "fall", "fall", "spring", "fall", "fall", "spring", "fall", "spring", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "spring", "fall", "spring", "spring", "fall", "spring", "spring", "fall", "fall", "fall", "fall", "fall", "fall", "fall", "spring", "fall", "fall", "fall", "spring", "fall", "spring", "fall", "spring", "spring", "fall", "fall", "spring", "fall", "spring", "spring", "fall", "fall", "fall", "fall", "spring", "fall", "fall", "spring", "spring","fall", "fall", "spring", "fall", "fall", "spring"))
stops <- "0 0 0 0 0 0 1 0 2 1 1 0 0 3 2 0 1 1 0 1 1 2 0 1 0 2 0 4 0 0 2 1 1 2 5 2 1 0 9 6 2 3 4 7 2 0 0 0 0 0 2 0 0 1 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0"
stops <- strsplit(stops, " ")
stops <- as.numeric(unlist(stops))

stopdata <- data.frame(pop = pop, id = id, year = year, season = season, stops = stops, stringsAsFactors = FALSE)


# Count observations (rows) per individual; keep stopdata as a plain data frame
summary1 <- summarise(group_by(stopdata, pop, id), n.obs = n())
table(summary1$n.obs)

There are 27 individuals: 9 have a single observation and 18 have 2 to 9 observations.
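The counts above can be cross-checked in base R without dplyr. This sketch copies only the `id` vector from the example data; `obs_per_id`, `n_single` and the other names are illustrative.

```r
# Individual IDs copied from the example data above (one entry per row)
id <- scan(quiet = TRUE, text = "2 2 4 4 7 7 9 9 10 10 84367 84367 84367 84368 84368 84368 84368 84368 84368 84369 84369 33073 33073 33073 33073 33073 33073 33073 33073 33073 80149 80149 80149 80150 80150 80150 57140 57141 126674 126677 126678 126680 137152 137152 137157 115925 115925 115925 115925 115925 115925 115925 115925 115926 115926 115926 115926 115926 115926 115927 115928 115929 115929 115929 115930 115930 115930 115930 115931 115931 115931 115932 115932 115932")

obs_per_id    <- table(id)             # number of rows per individual
n_individuals <- length(obs_per_id)    # levels of the random effect
n_single      <- sum(obs_per_id == 1)  # individuals seen only once
n_multi       <- sum(obs_per_id >= 2)  # individuals with repeated records

table(obs_per_id)                      # distribution of records per individual
```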

What should I be concerned about if a third of the random-effect levels have only one observation?


I have been considering:

Option 1: a GLMM, as described above

stops.glmm <- glmer(stops ~ pop + season + (1|id), data=stopdata, family = poisson)

Option 2: a weighted generalized linear model (GLM), using the mean for individuals with multiple observations

aggfun <- function(data, idvars = c("pop", "season", "id"), response) {
  # select the id variables, year, and the response; drop rows with missing values
  sub1 <- na.omit(data[, c(idvars, "year", response)])
  # mean response for each pop/season/id combination (averaged over years)
  agg1 <- aggregate(sub1[response], by = sub1[idvars], FUN = mean)
  # number of records (years) contributing to each mean
  aggn <- aggregate(sub1[response], by = sub1[idvars], FUN = length)
  # rename the count column by name rather than by position
  names(aggn)[names(aggn) == response] <- "n"
  # join the means and the counts on the id variables
  merge(agg1, aggn, by = idvars)
}


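# Create the weighted dataset
stops.weight <- aggfun(data = stopdata, response = "stops")
# Poisson needs integer counts, so the per-individual means are rounded (this discards information)
stops.weight$stops <- round(stops.weight$stops)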

#Weighted GLM
stops.glm <- glm(stops~pop + season, data=stops.weight, family = poisson, weights = n)
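The overdispersion concern in the quoted answer can be checked roughly before choosing between the two options. This sketch rebuilds the example data in base R (`pop` is reconstructed with `rep()` from the number of rows per population in the question), fits a plain Poisson GLM that ignores `id`, and computes the usual Pearson dispersion ratio. Values well above 1 hint at overdispersion. It is a heuristic rather than a formal test, and ignoring the repeated measures on individuals can itself inflate the ratio.

```r
# Rebuild the example data from the question (74 rows, in the original order)
pop <- rep(c("BF", "MA", "NU", "SA"), times = c(21, 15, 9, 29))
season <- scan(what = "", quiet = TRUE, text = "
  fall spring fall spring fall spring fall spring fall spring
  fall fall spring fall fall spring fall spring spring fall
  spring fall spring fall spring fall spring fall spring spring
  fall spring spring fall spring spring fall fall fall fall
  fall fall fall spring fall fall fall spring fall spring
  fall spring spring fall fall spring fall spring spring fall
  fall fall fall spring fall fall spring spring fall fall
  spring fall fall spring")
stops <- scan(quiet = TRUE, text = "0 0 0 0 0 0 1 0 2 1 1 0 0 3 2 0 1 1 0 1 1 2 0 1 0 2 0 4 0 0 2 1 1 2 5 2 1 0 9 6 2 3 4 7 2 0 0 0 0 0 2 0 0 1 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0")
d <- data.frame(pop = pop, season = season, stops = stops)

# Plain Poisson GLM that ignores the repeated measures on individuals
fit <- glm(stops ~ pop + season, family = poisson, data = d)

# Pearson dispersion ratio: close to 1 if the Poisson variance assumption holds
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
dispersion
```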

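In general, you have an identifiability problem. A linear model that assigns a random effect to a parameter with only one measurement cannot distinguish between the random effect and the residual variability.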

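A typical linear mixed-effects model looks like this:

$$y_{ij} = \beta + \eta_i + \epsilon_{ij}, \qquad \eta_i \sim N(0, \sigma_\eta^2), \qquad \epsilon_{ij} \sim N(0, \sigma_\epsilon^2)$$

where $\beta$ is the fixed effect, $\eta_i$ is the random effect for level $i$, and $\epsilon_{ij}$ is the residual variability of measurement $j$ at that level. When a level of the random effect has only one observation, it is hard to distinguish $\eta_i$ from $\epsilon_{ij}$. You will (usually) fit a variance or standard deviation to $\eta$ and to $\epsilon$, so with only one measurement per individual you cannot be confident that you have accurate estimates of $\sigma_\eta^2$ and $\sigma_\epsilon^2$ separately, but the estimate of their sum ($\sigma_\eta^2 + \sigma_\epsilon^2$) should be relatively robust.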
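A small simulation illustrates the identifiability argument for the linear (Gaussian) case. It is a sketch under assumed values: `n_id`, `beta`, the variance splits and the helper `sim_lmm` are all hypothetical. With one measurement per individual, two different splits of the same total variance produce data with the same spread. With two measurements per individual, a simple method-of-moments (one-way ANOVA) estimator can separate the components; this estimator is used here only for transparency and is not what `lme4` does internally.

```r
set.seed(42)
n_id <- 20000  # hypothetical number of individuals (levels of the random effect)
beta <- 1      # hypothetical fixed effect (an intercept here)

# Simulate y_ij = beta + eta_i + eps_ij with n_per_id measurements per individual
sim_lmm <- function(var_eta, var_eps, n_per_id) {
  eta <- rnorm(n_id, mean = 0, sd = sqrt(var_eta))             # random effects
  id  <- rep(seq_len(n_id), each = n_per_id)                    # individual index
  eps <- rnorm(n_id * n_per_id, mean = 0, sd = sqrt(var_eps))   # residuals
  data.frame(id = id, y = beta + eta[id] + eps)
}

# One measurement per individual: two splits with the same total variance
v_a <- var(sim_lmm(var_eta = 1.0, var_eps = 1.0, n_per_id = 1)$y)
v_b <- var(sim_lmm(var_eta = 1.5, var_eps = 0.5, n_per_id = 1)$y)
c(v_a, v_b)  # both estimate var_eta + var_eps = 2; the split is not recoverable

# Two measurements per individual: method-of-moments estimates of each component
d <- sim_lmm(var_eta = 1.5, var_eps = 0.5, n_per_id = 2)
var_eps_hat <- mean(tapply(d$y, d$id, var))      # within-individual variance
id_means    <- tapply(d$y, d$id, mean)            # var = var_eta + var_eps / 2
var_eta_hat <- var(id_means) - var_eps_hat / 2    # between-individual variance
c(var_eta_hat, var_eps_hat)
```

The within-individual spread is what pins down the residual variance, which is why individuals with repeated records, like the 18 in the question, carry the information needed to split the total.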

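As for a practical answer: if about a third of your individuals have only one observation each, you are probably fine overall. The rest of the individuals should provide reasonable estimates of the random-effect and residual variances, and the single-observation individuals should be minor contributors overall. On the other hand, if all individuals sharing a particular fixed effect had only a single measurement each (in your example, perhaps an entire population; maybe for you that means a species), you would have much less confidence in the results.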
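The caveat above can be checked directly by tabulating, for each population, how many individuals have only one record. This base-R sketch copies the `id` vector from the question and rebuilds `pop` with `rep()` from the number of rows per population; `per_id`, `n_ids` and `n_single` are illustrative names. In this dataset NU comes close to the situation the answer warns about: 7 of its 8 individuals have a single record.

```r
# Rebuild pop and id from the question (74 rows, in the original order)
pop <- rep(c("BF", "MA", "NU", "SA"), times = c(21, 15, 9, 29))
id  <- scan(quiet = TRUE, text = "2 2 4 4 7 7 9 9 10 10 84367 84367 84367 84368 84368 84368 84368 84368 84368 84369 84369 33073 33073 33073 33073 33073 33073 33073 33073 33073 80149 80149 80149 80150 80150 80150 57140 57141 126674 126677 126678 126680 137152 137152 137157 115925 115925 115925 115925 115925 115925 115925 115925 115926 115926 115926 115926 115926 115926 115927 115928 115929 115929 115929 115930 115930 115930 115930 115931 115931 115931 115932 115932 115932")

# One row per individual: its population and its number of records (Freq)
per_id <- as.data.frame(table(pop = pop, id = id), stringsAsFactors = FALSE)
per_id <- per_id[per_id$Freq > 0, ]

# Individuals per population, and how many of them were observed only once
n_ids    <- table(factor(per_id$pop))
n_single <- table(factor(per_id$pop[per_id$Freq == 1], levels = names(n_ids)))
cbind(individuals = n_ids, single_obs = n_single)
```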

Source: https://stats.stackexchange.com/questions/242821
