使用中心變量分層回歸分析的交互項？我們應該以哪些變量為中心？

July 20, 2013

我正在運行分層回歸分析，我有一些小疑問：

我們是否使用中心變量計算交互項？

除了因變量，我們是否必須將數據集中的所有連續變量居中？

當我們必須記錄一些變量時（因為它們的 sd 遠高於它們的平均值），我們是否將剛剛記錄的變量或初始變量居中？

例如：變量“營業額”—> 記錄營業額（因為與平均值相比，sd 太高）—> Centered_Turnover？

或者直接是 Turnover –> Centered_Turnover （我們使用這個）

謝謝！！

您應該將交互中涉及的術語居中以減少共線性，例如

set.seed(10204)
x1 <- rnorm(1000, 10, 1)
x2 <- rnorm(1000, 10, 1)
y <- x1 + rnorm(1000, 5, 5)  + x2*rnorm(1000) + x1*x2*rnorm(1000) 

x1cent <- x1 - mean(x1)
x2cent <- x2 - mean(x2)
x1x2cent <- x1cent*x2cent

m1 <- lm(y ~ x1 + x2 + x1*x2)
m2 <- lm(y ~ x1cent + x2cent + x1cent*x2cent)

summary(m1)
summary(m2)

輸出：

> summary(m1)

Call:
lm(formula = y ~ x1 + x2 + x1 * x2)

Residuals:
   Min      1Q  Median      3Q     Max 
-344.62  -66.29   -1.44   66.05  392.22 

Coefficients:
           Estimate Std. Error t value Pr(>|t|)
(Intercept)  193.333    335.281   0.577    0.564
x1           -15.830     33.719  -0.469    0.639
x2           -14.065     33.567  -0.419    0.675
x1:x2          1.179      3.375   0.349    0.727

Residual standard error: 101.3 on 996 degrees of freedom
Multiple R-squared:  0.002363,  Adjusted R-squared:  -0.0006416 
F-statistic: 0.7865 on 3 and 996 DF,  p-value: 0.5015

> summary(m2)

Call:
lm(formula = y ~ x1cent + x2cent + x1cent * x2cent)

Residuals:
   Min      1Q  Median      3Q     Max 
-344.62  -66.29   -1.44   66.05  392.22 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)     12.513      3.203   3.907 9.99e-05 ***
x1cent          -4.106      3.186  -1.289    0.198    
x2cent          -2.291      3.198  -0.716    0.474    
x1cent:x2cent    1.179      3.375   0.349    0.727    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 101.3 on 996 degrees of freedom
Multiple R-squared:  0.002363,  Adjusted R-squared:  -0.0006416 
F-statistic: 0.7865 on 3 and 996 DF,  p-value: 0.5015


library(perturb)
colldiag(m1)
colldiag(m2)

是否以其他變量為中心取決於您；將不參與交互的變量居中（而不是標準化）將改變截距的含義，但不會改變其他事物，例如

x1 <- rnorm(1000, 10, 1)
x2 <- x1 - mean(x1)
y <- x1 + rnorm(1000, 5, 5) 
m1 <- lm(y ~ x1)
m2 <- lm(y ~ x2)

summary(m1)
summary(m2)

輸出：

> summary(m1)

Call:
lm(formula = y ~ x1)

Residuals:
    Min       1Q   Median       3Q      Max 
-16.5288  -3.3348   0.0946   3.4293  14.0678 

Coefficients:
           Estimate Std. Error t value Pr(>|t|)    
(Intercept)   6.5412     1.6003   4.087 4.71e-05 ***
x1            0.8548     0.1591   5.373 9.63e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.082 on 998 degrees of freedom
Multiple R-squared:  0.02812,   Adjusted R-squared:  0.02714 
F-statistic: 28.87 on 1 and 998 DF,  p-value: 9.629e-08

> summary(m2)

Call:
lm(formula = y ~ x2)

Residuals:
    Min       1Q   Median       3Q      Max 
-16.5288  -3.3348   0.0946   3.4293  14.0678 

Coefficients:
           Estimate Std. Error t value Pr(>|t|)    
(Intercept)  15.0965     0.1607  93.931  < 2e-16 ***
x2            0.8548     0.1591   5.373 9.63e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.082 on 998 degrees of freedom
Multiple R-squared:  0.02812,   Adjusted R-squared:  0.02714 
F-statistic: 28.87 on 1 and 998 DF,  p-value: 9.629e-08

但是你應該記錄變量的日誌，因為這樣做是有意義的，或者因為模型的殘差表明你應該這樣做，而不是因為它們有很多可變性。回歸不對變量的分佈做出假設，而是對殘差的分佈做出假設。

引用自：https://stats.stackexchange.com/questions/64949

使用中心變量分層回歸分析的交互項？我們應該以哪些變量為中心？

相關問答

為什麼多重共線性與相關性不同？

多重共線性會增加每個協變量的 beta 方差還是僅增加那些共線的協變量？

多重共線性導致的模型不穩定性究竟是什麼？

變量和共線性的標準化

多重共線性和預測性能

(multi)collinear/colinear的由來和拼寫