R

為什麼我在 R 中得到 OLS 和 GLS 相同的結果?

  • May 20, 2013

當我運行此代碼時:

require(nlme)

a <- matrix(c(1,3,5,7,4,5,6,4,7,8,9))

b <- matrix(c(3,5,6,2,4,6,7,8,7,8,9))

res <- lm(a ~ b)

print(summary(res))

res_gls <- gls(a ~ b)

print(summary(res_gls))

我得到相同的係數和相同的係數統計顯著性:

Loading required package: nlme

Call:
lm(formula = a ~ b)

Residuals:
   Min      1Q  Median      3Q     Max 
-2.7361 -1.1348 -0.2955  1.2463  3.8234 

Coefficients:
           Estimate Std. Error t value Pr(>|t|)  
(Intercept)   2.0576     1.8732   1.098   0.3005  
b             0.5595     0.2986   1.874   0.0937 .
---
Signif. codes:  0***0.001**0.01*0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 2.088 on 9 degrees of freedom
Multiple R-squared: 0.2807, Adjusted R-squared: 0.2007 
F-statistic: 3.512 on 1 and 9 DF,  p-value: 0.09371 

Generalized least squares fit by REML
 Model: a ~ b 
 Data: NULL 
     AIC      BIC    logLik
 51.0801 51.67177 -22.54005

Coefficients:
               Value Std.Error  t-value p-value
(Intercept) 2.0576208 1.8731573 1.098477  0.3005
b           0.5594796 0.2985566 1.873948  0.0937

Correlation: 
 (Intr)
b -0.942

Standardized residuals:
      Min         Q1        Med         Q3        Max 
-1.3104006 -0.5434780 -0.1415446  0.5968911  1.8311781 

Residual standard error: 2.087956 
Degrees of freedom: 11 total; 9 residual

為什麼會這樣?在什麼情況下 OL​​S 估計與 GLS 估計相同?

**您得到了相同的結果,因為您沒有在gls函數中指定特殊的方差或相關結構。**如果沒有這些選項,GLS 的行為就像 OLS。GLS 模型優於正態回歸的優點是能夠指定相關結構(選項correlation)或允許殘差方差不同(選項weights)。讓我用一個例子來說明這一點。

library(nlme)

set.seed(1500)

x <- rnorm(10000,100,12) # generate x with arbitrary values

y1 <- 10 + 15*x + rnorm(10000,0,5) # the first half of the dataset

y2 <-  -2 - 5*x + rnorm(10000,0,15) # the 2nd half of the data set with 3 times larger residual SD (15 vs. 5)

y <- c(y1, y2)
x.new <- c(x, x)

dummy.var <- c(rep(0, length(y1)), rep(1, length(y2))) # dummy variable to distinguish the first half of the dataset (y1) from the second (y2)

# Calculate a normal regression model 

lm.mod <- lm(y~x.new*dummy.var)

summary(lm.mod)

Coefficients:
                Estimate Std. Error   t value Pr(>|t|)    
(Intercept)      10.27215    0.94237    10.900   <2e-16 ***
x.new            14.99691    0.00935  1603.886   <2e-16 ***
dummy.var       -12.07076    1.33272    -9.057   <2e-16 ***
x.new:dummy.var -19.99891    0.01322 -1512.387   <2e-16 ***

# Calculate a GLS without any options

gls.mod.1 <- gls(y~x.new*dummy.var)

summary(gls.mod.1)

Coefficients:
                   Value Std.Error    t-value p-value
(Intercept)      10.27215 0.9423749    10.9003       0
x.new            14.99691 0.0093504  1603.8857       0
dummy.var       -12.07076 1.3327194    -9.0572       0
x.new:dummy.var -19.99891 0.0132234 -1512.3868       0

# GLS again, but allowing different residual variance for y1 and y2

gls.mod.2 <- gls(y~x.new*dummy.var, weights=varIdent(form=~1|dummy.var))

summary(gls.mod.2)

Parameter estimates:
      0        1 
1.000000 2.962565 

Coefficients:
                   Value Std.Error   t-value p-value
(Intercept)      10.27215 0.4262268    24.100       0
x.new            14.99691 0.0042291  3546.144       0
dummy.var       -12.07076 1.3327202    -9.057       0
x.new:dummy.var -19.99891 0.0132234 -1512.386       0

# Perform a likelihood ratio test

anova(gls.mod.1, gls.mod.2)

         Model df      AIC      BIC    logLik   Test  L.Ratio p-value
gls.mod.1     1  5 153319.4 153358.9 -76654.69                        
gls.mod.2     2  6 143307.2 143354.6 -71647.61 1 vs 2 10014.15  <.0001

第一個 GLS 模型 ( gls.mod.1) 和正態線性回歸模型 ( lm.mod) 產生完全相同的結果。允許不同殘差標準差 ( gls.mod.2) 的 GLS 模型估計殘差 SD比我們在生成數據時指定y2的殘差 SD 大約大 3 倍。y1回歸係數實際上是相同的,但標準誤差發生了變化。似然比檢驗(和 AIC)表明,具有不同殘差的 GLS 模型 ( gls.mod.2) 比正態模型 (lm.modgls.mod.1) 更適合數據。


方差和相關結構gls

gls您可以在函數和選項中指定多個方差結構weights。請參閱此處獲取列表。有關該選項的相關結構列表,correlation請參見此處

引用自:https://stats.stackexchange.com/questions/59561

comments powered by Disqus