Estimating $b_1 x_1 + b_2 x_2$ instead of $b_1 x_1 + b_2 x_2 + b_3 x_3$

May 11, 2013

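I have a theoretical economic model, as follows:

$$y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + u$$

So theory says that there are three factors, $x_1$, $x_2$, and $x_3$, for estimating $y$.

Now I have the real data, and I need to estimate $b_1$, $b_2$, and $b_3$. The problem is that the real data set only contains data for $x_1$ and $x_2$; there are no data for $x_3$. So the model I can actually fit is:

$$y = a + b_1 x_1 + b_2 x_2 + u$$

Is it OK to estimate this model?

Do I lose anything by estimating it?

If I estimate $b_1$ and $b_2$, then where does the $b_3 x_3$ term go?

Is it accounted for by the error term $u$?

We would like to assume that $x_3$ is not correlated with $x_1$ and $x_2$.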

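The issue you need to worry about is called endogeneity. More specifically, it depends on whether $x_3$ is correlated in the population with $x_1$ or $x_2$. If it is, then the associated $b$s will be biased. That is because OLS regression forces the residuals, $u_i$, to be uncorrelated with your covariates, the $x_j$s. However, your residuals are composed of some irreducible randomness, $\varepsilon_i$, and the unobserved (but relevant) variable, $x_3$, which by stipulation is correlated with $x_1$ and/or $x_2$. On the other hand, if both $x_1$ and $x_2$ are uncorrelated with $x_3$ in the population, then their $b$s won't be biased by this (they may well be biased by something else, of course). One way econometricians try to deal with this issue is by using instrumental variables.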
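
The mechanism described above can be written out as a short derivation. This is a sketch under stated assumptions rather than part of the original answer: the symbols $\delta_0$, $\delta_1$, $\delta_2$, and $v$ are introduced here for illustration, the intercept is written $a$, the noise $\varepsilon$ is taken to be uncorrelated with $x_1$ and $x_2$, and the relationship between $x_3$ and the included variables is assumed to be linear. Write the omitted variable as a population regression on the included ones, with $v$ uncorrelated with $x_1$ and $x_2$, and substitute it into the true model:

$$
\begin{aligned}
x_3 &= \delta_0 + \delta_1 x_1 + \delta_2 x_2 + v \\
y   &= a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \varepsilon \\
    &= (a + b_3\delta_0) + (b_1 + b_3\delta_1)\,x_1 + (b_2 + b_3\delta_2)\,x_2 + (\varepsilon + b_3 v)
\end{aligned}
$$

The composite error $\varepsilon + b_3 v$ is uncorrelated with $x_1$ and $x_2$, so under these assumptions OLS on the two-variable model is centered on $b_j + b_3\delta_j$ rather than on $b_j$. The bias in each coefficient is $b_3\delta_j$, which vanishes exactly when $\delta_j = 0$, that is, when $x_3$ is unrelated to $x_j$ after accounting for the other included variable.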

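For the sake of greater clarity, I've written a quick simulation in R that demonstrates that the sampling distribution of $b_2$ is unbiased / centered on the true value of $\beta_2$ when $x_2$ is uncorrelated with $x_3$. In the second run, however, note that $x_3$ is uncorrelated with $x_1$, but not with $x_2$. Not coincidentally, $b_1$ is unbiased, but $b_2$ is biased.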

library(MASS)                          # you'll need this package below
N     = 100                            # this is how much data we'll use
beta0 = -71                            # these are the true values of the
beta1 = .84                            # parameters
beta2 = .64
beta3 = .34

############## uncorrelated version

b0VectU = vector(length=10000)         # these will store the parameter
b1VectU = vector(length=10000)         # estimates
b2VectU = vector(length=10000)
set.seed(7508)                         # this makes the simulation reproducible

for(i in 1:10000){                     # we'll do this 10k times
 x1 = rnorm(N)
 x2 = rnorm(N)                        # these variables are uncorrelated
 x3 = rnorm(N)
 y  = beta0 + beta1*x1 + beta2*x2 + beta3*x3 + rnorm(N)
 mod = lm(y~x1+x2)                    # note all 3 variables are relevant
                                      # but the model omits x3
 b0VectU[i] = coef(mod)[1]            # here I'm storing the estimates
 b1VectU[i] = coef(mod)[2]
 b2VectU[i] = coef(mod)[3]
}
mean(b0VectU)  # [1] -71.00005 # all 3 of these are centered on
mean(b1VectU)  # [1] 0.8399306 # the true values, i.e., are unbiased
mean(b2VectU)  # [1] 0.6398391 # e.g., 0.6398 is approximately .64

############## correlated version

r23 = .7                               # this will be the correlation in the
b0VectC = vector(length=10000)         # population between x2 & x3
b1VectC = vector(length=10000)
b2VectC = vector(length=10000)
set.seed(2734)

for(i in 1:10000){
 x1 = rnorm(N)
 X  = mvrnorm(N, mu=c(0,0), Sigma=rbind(c(  1, r23),
                                        c(r23,   1)))
 x2 = X[,1]
 x3 = X[,2]                           # x3 is correlated w/ x2, but not x1
 y  = beta0 + beta1*x1 + beta2*x2 + beta3*x3 + rnorm(N)
                                      # once again, all 3 variables are relevant
 mod = lm(y~x1+x2)                    # but the model omits x3
 b0VectC[i] = coef(mod)[1]
 b1VectC[i] = coef(mod)[2]            # we store the estimates again
 b2VectC[i] = coef(mod)[3]
}
mean(b0VectC)  # [1] -70.99916 # the 1st 2 are unbiased
mean(b1VectC)  # [1] 0.8409656 # but the sampling dist of b2 is biased
mean(b2VectC)  # [1] 0.8784184 # .88 not equal to .64
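
The size of the bias in the correlated run can be checked by hand. As a sketch, assuming the standard omitted-variable result that, when $x_1$ is independent of both $x_2$ and $x_3$, the slope on $x_2$ in the two-variable regression is centered on $\beta_2 + \beta_3\,\mathrm{Cov}(x_2, x_3)/\mathrm{Var}(x_2)$, plug in the population values used in the simulation ($\beta_1 = .84$, $\beta_2 = .64$, $\beta_3 = .34$, $\mathrm{Cov}(x_2, x_3) = .7$, $\mathrm{Var}(x_2) = 1$, $\mathrm{Cov}(x_1, x_3) = 0$):

$$
\begin{aligned}
E[b_2] &= \beta_2 + \beta_3\,\frac{\mathrm{Cov}(x_2, x_3)}{\mathrm{Var}(x_2)} = 0.64 + 0.34 \times \frac{0.7}{1} = 0.878 \\
E[b_1] &= \beta_1 + \beta_3 \times 0 = 0.84
\end{aligned}
$$

These values agree with the simulated means of 0.8784 for `b2VectC` and 0.8410 for `b1VectC`.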

Source: https://stats.stackexchange.com/questions/58709
