Regression
Estimating $b_1x_1 + b_2x_2$ instead of $b_1x_1 + b_2x_2 + b_3x_3$
I have a theoretical economic model as follows:

$$y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + u$$
So the theory says that $y$ is determined by three factors, $x_1$, $x_2$, and $x_3$.
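Now I have real data, and I need to estimate $b_1$, $b_2$, and $b_3$. The problem is that the real dataset contains only $x_1$ and $x_2$; there are no data on $x_3$. So the model I can actually fit is:

$$y = a + b_1 x_1 + b_2 x_2 + u$$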
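- Is it OK to estimate this model?
- Do I lose anything by estimating it?
- If I estimate $b_1$ and $b_2$, where does the $b_3x_3$ term go?
- Is it accounted for by the error term $u$?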
We would like to assume that $x_3$ is uncorrelated with $x_1$ and $x_2$.
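The issue you need to worry about is called endogeneity. More specifically, it depends on whether $x_3$ is correlated in the population with $x_1$ or $x_2$. If it is, the associated $b_j$s will be biased. That is because OLS regression forces the residuals, $u_i$, to be uncorrelated with your covariates, the $x_j$s. However, your residuals are composed of some irreducible randomness, $\varepsilon_i$, plus the contribution of the unobserved (but relevant) variable, $x_{3i}$, which by stipulation is correlated with $x_1$ and/or $x_2$. On the other hand, if both $x_1$ and $x_2$ are uncorrelated with $x_3$ in the population, then their $b_j$s won't be biased by this (they may well be biased by something else, of course). One way econometricians try to deal with this problem is to use instrumental variables.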
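The mechanism can be written out with the standard textbook omitted-variable-bias formula. This is a sketch, assuming the full model $y = a + b_1x_1 + b_2x_2 + b_3x_3 + \varepsilon$ with $\varepsilon$ uncorrelated with all three regressors. Let $x_3 = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + v$ be the population linear projection of $x_3$ on the included regressors, so $v$ is uncorrelated with $x_1$ and $x_2$ by construction:

$$
\begin{aligned}
y &= a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \varepsilon \\
  &= (a + b_3\delta_0) + (b_1 + b_3\delta_1)\,x_1 + (b_2 + b_3\delta_2)\,x_2 + (b_3 v + \varepsilon)
\end{aligned}
\qquad\Longrightarrow\qquad
\operatorname{plim}\hat b_j = b_j + b_3\,\delta_j,\quad j = 1, 2.
$$

So the $b_3x_3$ term does end up in the error, but whatever part of it is linearly predictable from $x_1$ and $x_2$ is absorbed into their slopes. The bias $b_3\delta_j$ vanishes only if $b_3 = 0$ or $\delta_j = 0$. If $x_3$ is uncorrelated with both $x_1$ and $x_2$, then $\delta_1 = \delta_2 = 0$: the slopes are estimated consistently, the intercept absorbs $b_3\,E[x_3]$, and the cost is a larger error variance, $\sigma^2_\varepsilon + b_3^2\operatorname{Var}(x_3)$, hence noisier estimates.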
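To make this clearer, I wrote a quick simulation in R demonstrating that the sampling distribution of $b_2$ is unbiased (centered on the true value of $\beta_2$) when $x_2$ is uncorrelated with $x_3$. In the second run, however, note that $x_3$ is uncorrelated with $x_1$ but not with $x_2$. Not coincidentally, $b_1$ is unbiased, but $b_2$ is biased.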
```r
library(MASS)  # you'll need this package below (for mvrnorm)

N     = 100    # this is how much data we'll use
beta0 = -71    # these are the true values of the
beta1 = .84    # parameters
beta2 = .64
beta3 = .34

############## uncorrelated version
b0VectU = vector(length=10000)  # these will store the parameter
b1VectU = vector(length=10000)  # estimates
b2VectU = vector(length=10000)
set.seed(7508)  # this makes the simulation reproducible
for(i in 1:10000){  # we'll do this 10k times
  x1 = rnorm(N)
  x2 = rnorm(N)     # these variables are uncorrelated
  x3 = rnorm(N)
  y  = beta0 + beta1*x1 + beta2*x2 + beta3*x3 + rnorm(N)
  mod = lm(y~x1+x2) # note all 3 variables are relevant,
                    # but the model omits x3
  b0VectU[i] = coef(mod)[1]  # here I'm storing the estimates
  b1VectU[i] = coef(mod)[2]
  b2VectU[i] = coef(mod)[3]
}
mean(b0VectU)  # [1] -71.00005  # all 3 of these are centered on the
mean(b1VectU)  # [1] 0.8399306  # true values / are unbiased
mean(b2VectU)  # [1] 0.6398391  # e.g., .64 = .64

############## correlated version
r23 = .7                        # this will be the correlation in the
b0VectC = vector(length=10000)  # population between x2 & x3
b1VectC = vector(length=10000)
b2VectC = vector(length=10000)
set.seed(2734)
for(i in 1:10000){
  x1 = rnorm(N)
  X  = mvrnorm(N, mu=c(0,0), Sigma=rbind(c(  1, r23),
                                         c(r23,   1)))
  x2 = X[,1]
  x3 = X[,2]        # x3 is correlated w/ x2, but not x1
  y  = beta0 + beta1*x1 + beta2*x2 + beta3*x3 + rnorm(N)
                    # once again, all 3 variables are relevant
  mod = lm(y~x1+x2) # but the model omits x3
  b0VectC[i] = coef(mod)[1]
  b1VectC[i] = coef(mod)[2]  # we store the estimates again
  b2VectC[i] = coef(mod)[3]
}
mean(b0VectC)  # [1] -70.99916  # the 1st 2 are unbiased,
mean(b1VectC)  # [1] 0.8409656  # but the sampling dist of b2 is biased
mean(b2VectC)  # [1] 0.8784184  # .88 not equal to .64
```
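As a quick cross-check, the omitted-variable-bias formula $\operatorname{plim}\hat b_j = \beta_j + \beta_3\delta_j$ predicts where the correlated-version means should land. This minimal sketch assumes the population setup of the second simulation: unit-variance regressors, $x_1$ independent of $(x_2, x_3)$, and $\operatorname{corr}(x_2, x_3) = 0.7$. The names `Sxx`, `Sx3`, `delta`, and `plim` are illustrative only.

```r
# True parameter values and correlation from the simulation above
beta1 = .84
beta2 = .64
beta3 = .34
r23   = .7

# Population covariance matrix of the included regressors (x1, x2):
# unit variances and x1 independent of x2, so it is the 2 x 2 identity
Sxx = diag(2)
# Population covariances of x1 and x2 with the omitted x3
Sx3 = c(0, r23)

# Slopes from projecting x3 onto (x1, x2): delta = Sxx^{-1} %*% Sx3
delta = solve(Sxx, Sx3)

# Omitted-variable-bias formula: probability limits of the short-regression slopes
plim = c(beta1, beta2) + beta3 * delta
plim  # approximately 0.84 and 0.878
```

The predicted value for $b_2$ is $0.64 + 0.34 \times 0.7 = 0.878$, which agrees with the simulated mean of 0.8784 above, while $b_1$ stays at 0.84 because $x_1$ is uncorrelated with $x_3$.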