Regression

Estimating $b_1 x_1+b_2 x_2$ instead of $b_1 x_1+b_2 x_2+b_3 x_3$

  • May 11, 2013

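I have a theoretical economic model, as follows:

$$y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + u$$

So theory says that $x_1$, $x_2$, and $x_3$ are the factors for estimating $y$.

Now I have real data and need to estimate $b_1$, $b_2$, and $b_3$. The problem is that the real data set contains data only for $x_1$ and $x_2$; there are no data for $x_3$. So the model I can actually fit is:

$$y = a + b_1 x_1 + b_2 x_2 + u$$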

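  • Is it OK to estimate this model?
  • Do I lose anything by estimating it?
  • If I do estimate $b_1$ and $b_2$, where does the $b_3 x_3$ term go?
  • Is it accounted for by the error term $u$?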

We would like to assume that $x_3$ is not correlated with $x_1$ and $x_2$.

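The issue you need to worry about is called endogeneity. More specifically, it depends on whether $x_3$ is correlated in the population with $x_1$ or $x_2$. If it is, then the associated $b_j$s will be biased. That is because OLS regression forces the residuals, $u_i$, to be uncorrelated with your covariates, the $x_j$s. However, your residuals are composed of some irreducible randomness, $\varepsilon_i$, and the unobserved (but relevant) variable, $x_3$, which by stipulation is correlated with $x_1$ and/or $x_2$. To keep the residuals uncorrelated with the covariates, the fit loads part of the effect of $x_3$ onto the coefficient of whichever covariate is correlated with it. On the other hand, if both $x_1$ and $x_2$ are uncorrelated with $x_3$ in the population, then their $b$s won't be biased by this (they may well be biased by something else, of course). One way econometricians try to deal with this issue is by using instrumental variables.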
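The size of the bias can be written down with the standard omitted-variable algebra. This is a sketch under stated assumptions: $a$ denotes the intercept, the true error $\varepsilon$ is uncorrelated with all three regressors, and the coefficients $\delta_0, \delta_1, \delta_2$ and error $v$ of the auxiliary projection are notation introduced here, not taken from the original post.

```latex
\begin{align*}
y   &= a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \varepsilon
      && \text{(true model)} \\
x_3 &= \delta_0 + \delta_1 x_1 + \delta_2 x_2 + v
      && \text{(population projection of } x_3 \text{ on } x_1, x_2\text{)} \\
y   &= (a + b_3\delta_0) + (b_1 + b_3\delta_1)\,x_1 + (b_2 + b_3\delta_2)\,x_2
       + (b_3 v + \varepsilon)
      && \text{(substituting for } x_3\text{)} \\
\hat b_1 &\to b_1 + b_3\delta_1, \qquad
\hat b_2 \to b_2 + b_3\delta_2
      && \text{(what OLS on the short model recovers)}
\end{align*}
```

If $x_3$ is uncorrelated with both $x_1$ and $x_2$, then $\delta_1 = \delta_2 = 0$ and neither slope is affected; the $b_3 x_3$ term simply becomes part of the error. In the correlated run of the simulation that follows, all variables have unit variance, $x_1$ is independent of the others, and the correlation between $x_2$ and $x_3$ is .7, so $\delta_1 = 0$ and $\delta_2 = .7$. The estimate of $b_2$ should then center near $.64 + .34 \times .7 = .878$, which agrees with the simulated mean of about .88.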

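For the sake of greater clarity, I've written a quick simulation in R that demonstrates that the sampling distribution of $b_2$ is unbiased / centered on the true value of $\beta_2$ when $x_2$ is uncorrelated with $x_3$. In the second run, however, note that $x_3$ is uncorrelated with $x_1$, but not with $x_2$. Not coincidentally, $b_1$ is unbiased, but $b_2$ is biased.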

library(MASS)                          # you'll need this package below
N     = 100                            # this is how much data we'll use
beta0 = -71                            # these are the true values of the
beta1 = .84                            # parameters
beta2 = .64
beta3 = .34

############## uncorrelated version

b0VectU = vector(length=10000)         # these will store the parameter
b1VectU = vector(length=10000)         # estimates
b2VectU = vector(length=10000)
set.seed(7508)                         # this makes the simulation reproducible

for(i in 1:10000){                     # we'll do this 10k times
 x1 = rnorm(N)
 x2 = rnorm(N)                        # these variables are uncorrelated
 x3 = rnorm(N)
 y  = beta0 + beta1*x1 + beta2*x2 + beta3*x3 + rnorm(N)
 mod = lm(y~x1+x2)                    # note all 3 variables are relevant
                                      # but the model omits x3
 b0VectU[i] = coef(mod)[1]            # here I'm storing the estimates
 b1VectU[i] = coef(mod)[2]
 b2VectU[i] = coef(mod)[3]
}
mean(b0VectU)  # [1] -71.00005 # all 3 of these are centered on
mean(b1VectU)  # [1] 0.8399306 # the true values / are unbiased
mean(b2VectU)  # [1] 0.6398391 # e.g., .64 is approximately .64

############## correlated version

r23 = .7                               # this will be the correlation in the
b0VectC = vector(length=10000)         # population between x2 & x3
b1VectC = vector(length=10000)
b2VectC = vector(length=10000)
set.seed(2734)

for(i in 1:10000){
 x1 = rnorm(N)
 X  = mvrnorm(N, mu=c(0,0), Sigma=rbind(c(  1, r23),
                                        c(r23,   1)))
 x2 = X[,1]
 x3 = X[,2]                           # x3 is correlated w/ x2, but not x1
 y  = beta0 + beta1*x1 + beta2*x2 + beta3*x3 + rnorm(N)
                                      # once again, all 3 variables are relevant
 mod = lm(y~x1+x2)                    # but the model omits x3
 b0VectC[i] = coef(mod)[1]
 b1VectC[i] = coef(mod)[2]            # we store the estimates again
 b2VectC[i] = coef(mod)[3]
}
mean(b0VectC)  # [1] -70.99916 # the 1st 2 are unbiased
mean(b1VectC)  # [1] 0.8409656 # but the sampling dist of b2 is biased
mean(b2VectC)  # [1] 0.8784184 # .88 is not equal to .64
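
The instrumental-variables idea mentioned above can be illustrated with a hand-rolled two-stage least squares fit. This is only a sketch under assumptions that are not part of the original simulation: it invents a hypothetical instrument `z` that influences `x2` but is unrelated to the omitted `x3` and to the noise, and the coefficients used to build `x2` are arbitrary. It uses a single large sample instead of 10k replications.

```r
# Illustrative 2SLS sketch. The instrument z and the way x2 is constructed
# are assumptions made for this example only.
set.seed(1)
N     = 10000                          # one large sample instead of many small ones
beta0 = -71                            # same true parameter values as above
beta1 = .84
beta2 = .64
beta3 = .34

x1 = rnorm(N)
x3 = rnorm(N)                          # the variable we pretend is unobserved
z  = rnorm(N)                          # hypothetical instrument, independent of x3
x2 = .7*x3 + .7*z + rnorm(N, sd=.2)    # x2 is correlated with both x3 and z
y  = beta0 + beta1*x1 + beta2*x2 + beta3*x3 + rnorm(N)

b2_ols = coef(lm(y ~ x1 + x2))["x2"]   # omits x3, so b2 absorbs part of beta3

stage1 = lm(x2 ~ x1 + z)               # 1st stage: x2 on the exogenous variables
x2hat  = fitted(stage1)                # the part of x2 explained by x1 and z
stage2 = lm(y ~ x1 + x2hat)            # 2nd stage: x2hat stands in for x2
b2_iv  = coef(stage2)["x2hat"]

c(ols = unname(b2_ols), iv = unname(b2_iv))   # compare both against beta2 = .64
```

Under these assumptions the OLS estimate should sit well above .64 while the two-stage estimate should land close to it. The standard errors reported by the second `lm` call are not valid for inference, because they ignore the first-stage estimation; a dedicated routine such as `ivreg` from the AER package handles that correctly. In practice the hard part is finding a variable that plausibly satisfies the instrument conditions, which cannot be verified from the data alone.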

Source: https://stats.stackexchange.com/questions/58709
