Interaction
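Interaction terms with centered variables in hierarchical regression analysis? Which variables should we center?
I am running a hierarchical regression analysis and I have a few small doubts:
- Do we compute the interaction terms using the centered variables?
- Do we have to center all the continuous variables in the dataset, apart from the dependent variable?
- When we have to log-transform some variables (because their sd is much higher than their mean), do we center the variable that has just been logged, or the original one?
For example: variable "Turnover" -> logged Turnover (because the sd is too high compared to the mean) -> Centered_Turnover?
Or directly Turnover -> Centered_Turnover (this is what we are using)
Thanks!!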
You should center the terms involved in the interaction to reduce collinearity, for example:
    set.seed(10204)

    # Two independent predictors with mean 10 and sd 1
    x1 <- rnorm(1000, 10, 1)
    x2 <- rnorm(1000, 10, 1)
    y <- x1 + rnorm(1000, 5, 5) + x2*rnorm(1000) + x1*x2*rnorm(1000)

    # Center each predictor at its sample mean
    x1cent <- x1 - mean(x1)
    x2cent <- x2 - mean(x2)
    x1x2cent <- x1cent*x2cent

    # Same model, uncentered vs. centered
    m1 <- lm(y ~ x1 + x2 + x1*x2)
    m2 <- lm(y ~ x1cent + x2cent + x1cent*x2cent)

    summary(m1)
    summary(m2)
Output:

    > summary(m1)

    Call:
    lm(formula = y ~ x1 + x2 + x1 * x2)

    Residuals:
        Min      1Q  Median      3Q     Max
    -344.62  -66.29   -1.44   66.05  392.22

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  193.333    335.281   0.577    0.564
    x1           -15.830     33.719  -0.469    0.639
    x2           -14.065     33.567  -0.419    0.675
    x1:x2          1.179      3.375   0.349    0.727

    Residual standard error: 101.3 on 996 degrees of freedom
    Multiple R-squared:  0.002363,  Adjusted R-squared:  -0.0006416
    F-statistic: 0.7865 on 3 and 996 DF,  p-value: 0.5015

    > summary(m2)

    Call:
    lm(formula = y ~ x1cent + x2cent + x1cent * x2cent)

    Residuals:
        Min      1Q  Median      3Q     Max
    -344.62  -66.29   -1.44   66.05  392.22

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept)     12.513      3.203   3.907 9.99e-05 ***
    x1cent          -4.106      3.186  -1.289    0.198
    x2cent          -2.291      3.198  -0.716    0.474
    x1cent:x2cent    1.179      3.375   0.349    0.727
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 101.3 on 996 degrees of freedom
    Multiple R-squared:  0.002363,  Adjusted R-squared:  -0.0006416
    F-statistic: 0.7865 on 3 and 996 DF,  p-value: 0.5015

    # Collinearity diagnostics for both models
    library(perturb)
    colldiag(m1)
    colldiag(m2)
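The drop in collinearity can also be checked without any extra package. The following is only a sketch under assumptions, not part of the original answer: it re-simulates predictors with the same distributions as above, and the names `cor_raw`, `cor_cent`, `vif_raw` and `vif_cent` are made up for this illustration. It compares how strongly the product term is related to the main-effect terms before and after centering.

```r
set.seed(10204)
x1 <- rnorm(1000, 10, 1)
x2 <- rnorm(1000, 10, 1)

# Center each predictor at its sample mean
x1cent <- x1 - mean(x1)
x2cent <- x2 - mean(x2)

# Product (interaction) terms, uncentered and centered
prod_raw  <- x1 * x2
prod_cent <- x1cent * x2cent

# Correlation between one main-effect term and the product term
cor_raw  <- cor(x1, prod_raw)
cor_cent <- cor(x1cent, prod_cent)

# Variance inflation factor of the product term, computed by hand:
# regress the product on the main effects and use 1 / (1 - R^2)
vif_raw  <- 1 / (1 - summary(lm(prod_raw ~ x1 + x2))$r.squared)
vif_cent <- 1 / (1 - summary(lm(prod_cent ~ x1cent + x2cent))$r.squared)

c(cor_raw = cor_raw, cor_cent = cor_cent)
c(vif_raw = vif_raw, vif_cent = vif_cent)
```

With predictors whose means are far from zero, the uncentered product should be strongly correlated with its components and have a large variance inflation factor, while the centered versions should be close to uncorrelated. This is consistent with the much smaller standard errors for `x1cent` and `x2cent` in the output above, even though the interaction coefficient and the overall fit are identical.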
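Whether you center the other variables is up to you; centering (as opposed to standardizing) a variable that is not involved in an interaction will change the meaning of the intercept, but nothing else, for example: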
    x1 <- rnorm(1000, 10, 1)
    x2 <- x1 - mean(x1)
    y <- x1 + rnorm(1000, 5, 5)

    m1 <- lm(y ~ x1)
    m2 <- lm(y ~ x2)

    summary(m1)
    summary(m2)
Output:

    > summary(m1)

    Call:
    lm(formula = y ~ x1)

    Residuals:
         Min       1Q   Median       3Q      Max
    -16.5288  -3.3348   0.0946   3.4293  14.0678

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   6.5412     1.6003   4.087 4.71e-05 ***
    x1            0.8548     0.1591   5.373 9.63e-08 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 5.082 on 998 degrees of freedom
    Multiple R-squared:  0.02812,  Adjusted R-squared:  0.02714
    F-statistic: 28.87 on 1 and 998 DF,  p-value: 9.629e-08

    > summary(m2)

    Call:
    lm(formula = y ~ x2)

    Residuals:
         Min       1Q   Median       3Q      Max
    -16.5288  -3.3348   0.0946   3.4293  14.0678

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  15.0965     0.1607  93.931  < 2e-16 ***
    x2            0.8548     0.1591   5.373 9.63e-08 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 5.082 on 998 degrees of freedom
    Multiple R-squared:  0.02812,  Adjusted R-squared:  0.02714
    F-statistic: 28.87 on 1 and 998 DF,  p-value: 9.629e-08
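To make the change in the intercept's meaning concrete, here is a small sketch (an illustration only; the seed and the names `fit_raw`, `fit_cent` and `pred_at_mean` are assumptions, not from the original post). After centering, the intercept is the fitted value at the mean of the predictor, which for a least-squares fit with an intercept is also the mean of the outcome.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
x  <- rnorm(1000, 10, 1)
xc <- x - mean(x)             # centered copy of the predictor
y  <- x + rnorm(1000, 5, 5)

fit_raw  <- lm(y ~ x)
fit_cent <- lm(y ~ xc)

# The slope is the same in both fits
slope_same <- all.equal(unname(coef(fit_raw)["x"]),
                        unname(coef(fit_cent)["xc"]))

# The centered intercept equals the raw model's prediction at mean(x) ...
pred_at_mean <- predict(fit_raw, newdata = data.frame(x = mean(x)))
intercept_is_pred <- all.equal(unname(coef(fit_cent)["(Intercept)"]),
                               unname(pred_at_mean))

# ... which is also the sample mean of y
intercept_is_ybar <- all.equal(unname(coef(fit_cent)["(Intercept)"]), mean(y))

c(slope_same, intercept_is_pred, intercept_is_ybar)
```

In the uncentered fit, by contrast, the intercept is the predicted outcome at a predictor value of zero, which here lies far outside the observed range.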
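But you should take logs of variables because it makes sense to do so, or because the residuals from the model indicate that you should, not because the variables have a lot of variability. Regression makes no assumptions about the distribution of the variables; it makes assumptions about the distribution of the residuals.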
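A short simulation can illustrate that last point. This is a sketch under assumptions rather than anything from the original post: the data-generating model, the seed, and the helper `skewness` are all hypothetical. The predictor is strongly right-skewed, with a population sd larger than its mean, yet the residuals are well behaved because the errors are normal by construction.

```r
set.seed(42)  # arbitrary seed for reproducibility
n <- 1000

# Lognormal predictor: heavily right-skewed, population sd > population mean
x <- rlnorm(n, meanlog = 0, sdlog = 1)

# Outcome is linear in x on the original scale, with normal errors
y <- 2 + 3 * x + rnorm(n)

fit <- lm(y ~ x)

# Hypothetical helper: simple moment-based sample skewness
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_x     <- skewness(x)           # large: the predictor is very skewed
skew_resid <- skewness(resid(fit))  # near zero: the residuals are not

c(skew_x = skew_x, skew_resid = skew_resid)
```

In practice, one would look at the residual plots of the fitted model (for instance `plot(fit, which = 1:2)` in base R) before deciding whether a transformation is warranted.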