95% t-distribution CI does not achieve 95% coverage

July 28, 2021

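I am simulating 95% confidence intervals for samples drawn from a normal distribution. Since the data are normal, I reasoned that 95% confidence should translate into a 95% coverage probability. However, I get roughly 94%. Specifically, I draw 1000 samples of size n=10, build a CI from each, and estimate the coverage probability; I then repeat this 1000 times to get a CI for the coverage probability itself. My 5-sigma CI for the coverage probability comes out as ~(0.9384, 0.9408). Is there a statistical reason for this, or am I doing something wrong?

Here is my simulation code: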

   import numpy as np
   import scipy.stats as stats

   def CI_coverage(alpha, dist, n, n_samples):
       ''' creates n_samples samples of size n
       creates a 1-alpha confidence interval for each
       computes the fraction of those that contain mu '''
       # get samples
       samples = np.stack([dist.rvs(size=n) for i in range(n_samples)])
       
       # summary stats
       mu = dist.mean()
       xbar = samples.mean(axis=1)
       s = samples.std(axis=1)
       
       # compute CIs... note that xbar, s, CI_low, CI_high are arrays size n_samples
       t = stats.t.ppf(1 - alpha/2, n-1)
       interval_width = t * s / np.sqrt(n)
       CI_low = xbar - interval_width
       CI_high = xbar + interval_width
       
       coverage_p = np.sum(np.logical_and(CI_low < mu, mu < CI_high)) / samples.shape[0]
       return coverage_p

   mu = 1
   sigma = 0.5
   norm_dist = stats.norm(loc=mu, scale=sigma)

   n = 10
   n_samples = 1000
   n_CI_samples = 1000
   # compute the empirical coverage probability many times
   CI_coverages = [CI_coverage(0.05, norm_dist, n, n_samples) for i in range(n_CI_samples)]

   # use this to get a CI for the coverage probabilities
   CI_c_mean = np.mean(CI_coverages)
   CI_c_std = np.std(CI_coverages)

   print(CI_c_mean - 5*CI_c_std / np.sqrt(n_CI_samples), CI_c_mean + 5*CI_c_std / np.sqrt(n_CI_samples))
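
As a rough cross-check on how narrow that interval should be, here is a minimal sketch (not part of the original post) that treats all n_samples * n_CI_samples intervals as one pooled binomial experiment. The names `nominal`, `n_total`, `se`, and `half_width` are illustrative assumptions, and the calculation assumes the true coverage equals the nominal 95%.

```python
import numpy as np

n_samples = 1000      # intervals per coverage estimate, as in the post
n_CI_samples = 1000   # number of coverage estimates, as in the post
nominal = 0.95        # assumed true coverage for this check

# Pool every simulated interval into a single binomial experiment
n_total = n_samples * n_CI_samples

# Binomial standard error of the pooled coverage estimate
se = np.sqrt(nominal * (1 - nominal) / n_total)

# Half-width of a 5-sigma band around the nominal coverage
half_width = 5 * se
print(se, half_width)
```

Under these assumptions the 5-sigma band around 0.95 is only about a tenth of a percentage point wide on each side, so an interval of ~(0.9384, 0.9408) is unlikely to be Monte Carlo noise.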

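Per @whuber's comment, np.std() defaults to ddof=0, which divides by n and so gives a biased estimate of the sample standard deviation, whereas the t interval assumes the n-1 divisor. Fortunately, the function lets you correct this by setting the delta degrees of freedom through its ddof parameter:

   s = samples.std(axis=1, ddof=1)

With this fix, the coverage probability is consistent with the expected 95%: (0.9485, 0.9508).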
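
The size of the shortfall can also be checked without simulation. The sketch below is an illustration rather than part of the original answer: it relies on the fact that the ddof=0 standard deviation equals the ddof=1 value multiplied by sqrt((n-1)/n), so using it shrinks the effective t critical value by the same factor. The names `shrink`, `coverage_ddof0`, and `coverage_ddof1` are assumptions introduced here.

```python
import numpy as np
import scipy.stats as stats

alpha = 0.05
n = 10

# Two-sided critical value of the t distribution with n-1 degrees of freedom
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)

# np.std with ddof=0 equals the ddof=1 estimate times sqrt((n-1)/n),
# so the interval behaves as if the critical value were t_crit * shrink
shrink = np.sqrt((n - 1) / n)

# Exact coverage for normal data: P(|T| < effective critical value)
coverage_ddof0 = 2 * stats.t.cdf(t_crit * shrink, n - 1) - 1
coverage_ddof1 = 2 * stats.t.cdf(t_crit, n - 1) - 1
print(coverage_ddof0, coverage_ddof1)
```

For n=10 the ddof=0 coverage comes out at roughly 0.94, in line with the simulated interval in the question, while the ddof=1 coverage is 0.95 by construction.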

Source: https://stats.stackexchange.com/questions/536315
