T-Test
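Confidence intervals for a CDF
I am trying to determine whether there is a statistically significant difference between the cumulative distribution curves shown in the figure below.
It is simple to run a $ t $-test on the means of these distributions. But I would also like to see whether the treatment affects the more extreme values of the distribution. For example, if the means were the same but the 85th percentiles differed, I would be interested.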
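The 95% confidence interval for the mean is roughly $ \bar{x} \pm 1.96 \sigma_{\bar{x}} $, where $ \sigma_{\bar{x}} $ is the standard error of the mean. But using the same variance at every level of the CDF does not feel right, especially when the empirical distributions are far from normal.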
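The intuition that one variance cannot serve every level of the CDF can be made concrete with two standard large-sample results. This is a sketch under the assumption of $ n $ i.i.d. observations with CDF $ F $ and a density $ f $ that is positive at the quantile of interest; the symbols $ \hat F_n $ (the ECDF), $ \xi_\tau $ (the true $ \tau $-th quantile), and $ \hat\xi_\tau $ (its sample counterpart) are notation introduced only for this illustration.

```latex
\operatorname{Var}\bigl[\hat F_n(x)\bigr] = \frac{F(x)\,\bigl(1 - F(x)\bigr)}{n},
\qquad
\sqrt{n}\,\bigl(\hat\xi_\tau - \xi_\tau\bigr) \xrightarrow{d}
N\!\left(0,\ \frac{\tau(1-\tau)}{f(\xi_\tau)^{2}}\right)
```

Both variances change with the level. The first is largest where $ F(x) = 0.5 $ and shrinks toward the tails. The second grows wherever the density is small, which is often the case at extreme quantiles such as the 85th percentile.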
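You can do something along these lines with simultaneous quantile regression on a set of dummy variables corresponding to the 4 groups. This lets you run tests and build confidence intervals that compare the coefficients describing the different quantiles you care about.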
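Written out, the model behind this approach might look as follows. This is a sketch that assumes group 1 is the omitted base category and that the quantiles of interest are the 25th, 50th, and 75th percentiles; the symbols $ y $ (outcome), $ g $ (group), and $ \beta_j(\tau) $ are notation introduced here, not taken from the original.

```latex
Q_{\tau}(y \mid g) = \beta_0(\tau) + \sum_{j=2}^{4} \beta_j(\tau)\,\mathbf{1}\{g = j\},
\qquad \tau \in \{0.25,\ 0.50,\ 0.75\}

% All four groups share the same tau-th quantile:
H_0^{\text{all}}:\ \beta_2(\tau) = \beta_3(\tau) = \beta_4(\tau) = 0

% Groups 2-4 share the same tau-th quantile (group 1 unrestricted):
H_0^{2\text{-}4}:\ \beta_2(\tau) = \beta_3(\tau) = \beta_4(\tau)
```

Here $ \beta_0(\tau) $ is the $ \tau $-th quantile of group 1 and $ \beta_j(\tau) $ is the shift of group $ j $ relative to group 1 at that quantile. Because the quantiles are estimated simultaneously, either hypothesis can be tested jointly across several values of $ \tau $.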
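Here is a toy example in which we cannot reject the joint null that the 25th, 50th, and 75th percentiles of car price are equal across MPG groups (p-value of 0.374). Note that the test as written only constrains the coefficients for groups 2-4 to be equal to one another within each quantile; group 1 is the omitted base category: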
    . sysuse auto, clear
    (1978 Automobile Data)

    . xtile mpg_quartile = mpg, nq(4)

    . distplot price, over(mpg_quartile) legend(rows(1)) ylab(.25 .5 .75, angle(0) grid) xlab(#10, grid) ///
    >         plotregion(fcolor(white) lcolor(white)) graphregion(fcolor(white) lcolor(white))

    .
    . sqreg price i.mpg_quart, quantile(.25 .5 .75) reps(500)
    (fitting base model)

    Bootstrap replications (500)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100
    ..................................................   150
    ..................................................   200
    ..................................................   250
    ..................................................   300
    ..................................................   350
    ..................................................   400
    ..................................................   450
    ..................................................   500

    Simultaneous quantile regression                    Number of obs =         74
      bootstrap(500) SEs                                .25 Pseudo R2 =     0.0909
                                                        .50 Pseudo R2 =     0.1228
                                                        .75 Pseudo R2 =     0.2639

    ------------------------------------------------------------------------------
                 |              Bootstrap
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    q25          |
    mpg_quartile |
              2  |      -1297   528.3106    -2.45   0.017    -2350.682   -243.3178
              3  |      -1192   447.9346    -2.66   0.010    -2085.377   -298.6225
              4  |      -1484   458.6527    -3.24   0.002    -2398.754   -569.2459
                 |
           _cons |       5379   414.9198    12.96   0.000     4551.468    6206.532
    -------------+----------------------------------------------------------------
    q50          |
    mpg_quartile |
              2  |      -1442   1253.755    -1.15   0.254    -3942.535    1058.535
              3  |      -1086   1414.436    -0.77   0.445    -3907.004    1735.004
              4  |      -1776   1232.862    -1.44   0.154    -4234.867    682.8667
                 |
           _cons |       6165   1221.461     5.05   0.000     3728.873    8601.127
    -------------+----------------------------------------------------------------
    q75          |
    mpg_quartile |
              2  |      -6213   1591.987    -3.90   0.000    -9388.118   -3037.882
              3  |      -4535   1847.591    -2.45   0.017    -8219.904   -850.0963
              4  |      -6796   1592.095    -4.27   0.000    -9971.334   -3620.666
                 |
           _cons |      11385   1556.486     7.31   0.000     8280.686    14489.31
    ------------------------------------------------------------------------------

    . test ///
    >         ([q25]2.mpg_quart=[q25]3.mpg_quart=[q25]4.mpg_quart) ///
    >         ([q50]2.mpg_quart=[q50]3.mpg_quart=[q50]4.mpg_quart) ///
    >         ([q75]2.mpg_quart=[q75]3.mpg_quart=[q75]4.mpg_quart)

     ( 1)  [q25]2.mpg_quartile - [q25]3.mpg_quartile = 0
     ( 2)  [q25]2.mpg_quartile - [q25]4.mpg_quartile = 0
     ( 3)  [q50]2.mpg_quartile - [q50]3.mpg_quartile = 0
     ( 4)  [q50]2.mpg_quartile - [q50]4.mpg_quartile = 0
     ( 5)  [q75]2.mpg_quartile - [q75]3.mpg_quartile = 0
     ( 6)  [q75]2.mpg_quartile - [q75]4.mpg_quartile = 0

           F(  6,    70) =    1.10
                Prob > F =    0.3740
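The ECDFs, drawn by the distplot command above, look like this:
In the plot, the visible gaps at the 3 quantiles appear to be mainly between group 1 and groups 2-4. This is also not a lot of data, so failing to reject with a formal test is perhaps not surprising given the "micronumerosity."
Interestingly, the Kruskal-Wallis test rejects the hypothesis that the 4 groups come from the same population: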
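For readers without Stata, the idea of putting a confidence interval on a difference in one quantile between two groups can be sketched with a plain percentile bootstrap. This is a simpler, swapped-in technique and not what sqreg does internally; the function names `quantile` and `bootstrap_quantile_diff` are hypothetical, and the data are simulated rather than taken from the auto dataset.

```python
import random


def quantile(values, q):
    """Empirical quantile by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def bootstrap_quantile_diff(a, b, q, reps=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for quantile_q(b) - quantile_q(a)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Resample each group independently, with replacement.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(quantile(resampled_b, q) - quantile(resampled_a, q))
    point = quantile(b, q) - quantile(a, q)
    # The CI endpoints are the alpha/2 and 1 - alpha/2 quantiles of the
    # bootstrap distribution of the difference.
    return point, quantile(diffs, alpha / 2), quantile(diffs, 1 - alpha / 2)


# Simulated data (NOT the auto dataset): both groups have the same
# population median, but `treated` has a heavier upper tail.
data_rng = random.Random(42)
control = [data_rng.lognormvariate(0.0, 0.5) for _ in range(200)]
treated = [data_rng.lognormvariate(0.0, 0.8) for _ in range(200)]

point, lower, upper = bootstrap_quantile_diff(control, treated, q=0.85)
print(f"85th percentile difference: {point:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

With small groups, such as the 11 to 27 cars per MPG quartile here, an interval of this kind tends to be wide, which is one way to see why a formal test may fail to reject.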
    . kwallis price , by(mpg_quartile)

    Kruskal-Wallis equality-of-populations rank test

      +---------------------------+
      | mpg_qu~e | Obs | Rank Sum |
      |----------+-----+----------|
      |        1 |  27 |  1397.00 |
      |        2 |  11 |   286.00 |
      |        3 |  22 |   798.00 |
      |        4 |  14 |   294.00 |
      +---------------------------+

    chi-squared =    23.297 with 3 d.f.
    probability =     0.0001

    chi-squared with ties =    23.297 with 3 d.f.
    probability =     0.0001
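The reported statistic can be checked by hand from the table. The sketch below applies the standard Kruskal-Wallis formula without tie correction, $ H = \frac{12}{N(N+1)} \sum_j R_j^2 / n_j - 3(N+1) $, to the group sizes and rank sums printed above. The helper `chi2_sf_3df` is a hypothetical name; it uses the closed-form upper tail of a chi-squared distribution with 3 degrees of freedom.

```python
import math

# Group sizes and rank sums copied from the kwallis table above.
obs = [27, 11, 22, 14]
rank_sums = [1397.0, 286.0, 798.0, 294.0]

n_total = sum(obs)  # 74 cars in total
# Sanity check: the ranks 1..N must add up to N(N+1)/2.
assert sum(rank_sums) == n_total * (n_total + 1) / 2

# H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), no tie correction.
h_stat = (12.0 / (n_total * (n_total + 1))
          * sum(r * r / n for r, n in zip(rank_sums, obs))
          - 3 * (n_total + 1))


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-squared variable with 3 d.f."""
    z = x / 2.0
    return math.erfc(math.sqrt(z)) + 2.0 * math.sqrt(z / math.pi) * math.exp(-z)


p_value = chi2_sf_3df(h_stat)
print(f"H = {h_stat:.3f}, p = {p_value:.2g}")  # H = 23.297
```

The computed statistic reproduces the 23.297 in the output, and the p-value is far below 0.001, consistent with the small probability Stata reports.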