

  • December 19, 2019



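It is simple to run a $ t $-test on the means of these distributions. But I would also like to see whether the treatment affects the more extreme values of the distribution. For example, if the means were the same but the 85th percentiles differed, I would be interested.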

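A 95% confidence interval for the mean is roughly $ \bar{x} \pm 1.96 \sigma_x $, where $ \sigma_x $ stands for the standard error of the mean. But using the same variance at every level of the CDF does not feel right, especially when the empirical distribution is far from normal.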
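
One way to make that intuition concrete is a standard large-sample result for sample quantiles. It is a sketch under assumptions that the question does not state: i.i.d. draws from a distribution whose density $ f $ is positive and continuous at the $ p $-th quantile $ q_p $.

$$ \sqrt{n}\,(\hat{q}_p - q_p) \xrightarrow{d} N\!\left(0,\ \frac{p(1-p)}{f(q_p)^2}\right) $$

The variance depends on both $ p $ and the density at $ q_p $, so it changes from one level of the CDF to the next and tends to grow in the tails, where $ f(q_p) $ is small.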

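You can do something like this with a simultaneous quantile regression on a set of dummies corresponding to the 4 groups. This lets you run tests and build confidence intervals that compare the coefficients describing the different quantiles you care about.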
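
In symbols, the model being fit can be sketched as follows. The notation is introduced here for illustration and is not part of the original answer: $ g $ is the group index, group 1 is the omitted category, and $ \tau $ ranges over the quantiles of interest.

$$ Q_\tau(y \mid g) = \alpha_\tau + \sum_{k=2}^{4} \beta_{\tau,k}\,\mathbf{1}[g = k] $$

Under this parameterization $ \alpha_\tau $ is the $ \tau $-th quantile of the outcome in group 1, and $ \beta_{\tau,k} $ is the gap between group $ k $ and group 1 at that quantile. Estimating all the quantiles together with a bootstrap gives a joint covariance matrix, which is what makes tests across quantiles possible.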

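Here is a toy example using the 25th, 50th, and 75th percentiles of car price by MPG group. Group 1 is the omitted base category, so the test as written compares groups 2-4 with one another, and we cannot reject the joint null that their three percentiles are equal (p-value of 0.374):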

. sysuse auto, clear
(1978 Automobile Data)

. xtile mpg_quartile = mpg, nq(4)

. distplot price, over(mpg_quartile) legend(rows(1)) ylab(.25 .5 .75, angle(0) grid) xlab(#10, grid) ///
> plotregion(fcolor(white) lcolor(white)) graphregion(fcolor(white) lcolor(white))

. sqreg price i.mpg_quart, quantile(.25 .5 .75) reps(500)
(fitting base model)

Bootstrap replications (500)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
..................................................    50
..................................................   100
..................................................   150
..................................................   200
..................................................   250
..................................................   300
..................................................   350
..................................................   400
..................................................   450
..................................................   500

Simultaneous quantile regression                    Number of obs =         74
 bootstrap(500) SEs                                .25 Pseudo R2 =     0.0909
                                                   .50 Pseudo R2 =     0.1228
                                                   .75 Pseudo R2 =     0.2639

            |              Bootstrap
      price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
q25          |
mpg_quartile |
         2  |      -1297   528.3106    -2.45   0.017    -2350.682   -243.3178
         3  |      -1192   447.9346    -2.66   0.010    -2085.377   -298.6225
         4  |      -1484   458.6527    -3.24   0.002    -2398.754   -569.2459
      _cons |       5379   414.9198    12.96   0.000     4551.468    6206.532
q50          |
mpg_quartile |
         2  |      -1442   1253.755    -1.15   0.254    -3942.535    1058.535
         3  |      -1086   1414.436    -0.77   0.445    -3907.004    1735.004
         4  |      -1776   1232.862    -1.44   0.154    -4234.867    682.8667
      _cons |       6165   1221.461     5.05   0.000     3728.873    8601.127
q75          |
mpg_quartile |
         2  |      -6213   1591.987    -3.90   0.000    -9388.118   -3037.882
         3  |      -4535   1847.591    -2.45   0.017    -8219.904   -850.0963
         4  |      -6796   1592.095    -4.27   0.000    -9971.334   -3620.666
      _cons |      11385   1556.486     7.31   0.000     8280.686    14489.31

. test ///
> ([q25]2.mpg_quart=[q25]3.mpg_quart=[q25]4.mpg_quart) ///
> ([q50]2.mpg_quart=[q50]3.mpg_quart=[q50]4.mpg_quart) ///
> ([q75]2.mpg_quart=[q75]3.mpg_quart=[q75]4.mpg_quart)

( 1)  [q25]2.mpg_quartile - [q25]3.mpg_quartile = 0
( 2)  [q25]2.mpg_quartile - [q25]4.mpg_quartile = 0
( 3)  [q50]2.mpg_quartile - [q50]3.mpg_quartile = 0
( 4)  [q50]2.mpg_quartile - [q50]4.mpg_quartile = 0
( 5)  [q75]2.mpg_quartile - [q75]3.mpg_quartile = 0
( 6)  [q75]2.mpg_quartile - [q75]4.mpg_quartile = 0

      F(  6,    70) =    1.10
           Prob > F =    0.3740

The ECDFs produced by distplot look like this:


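For the three quantiles in the plot, there do appear to be large differences between group 1 and groups 2-4. Still, this is not much data, so failing to reject with a formal test is perhaps not surprising given the "micronumerosity".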
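
The test shown earlier sets the group 2, 3, and 4 coefficients equal to one another within each quantile, which leaves group 1 out of the comparison. A minimal sketch of the stricter null, that all four groups share the same three percentiles, tests whether every group coefficient is zero. The seed is an arbitrary assumption added only to make the bootstrap reproducible, and the `///` continuations assume the code is run from a do-file.

```stata
* Sketch: joint test that all four MPG groups share the same 25th, 50th,
* and 75th percentiles of price (group 1 is the omitted base category).
sysuse auto, clear
xtile mpg_quartile = mpg, nq(4)

* Assumption: arbitrary seed, chosen only so the bootstrap SEs are reproducible.
set seed 12345
sqreg price i.mpg_quartile, quantile(.25 .5 .75) reps(500)

* Nine restrictions: each group coefficient is zero in each quantile equation.
test ([q25]2.mpg_quartile = 0) ([q25]3.mpg_quartile = 0) ([q25]4.mpg_quartile = 0) ///
     ([q50]2.mpg_quartile = 0) ([q50]3.mpg_quartile = 0) ([q50]4.mpg_quartile = 0) ///
     ([q75]2.mpg_quartile = 0) ([q75]3.mpg_quartile = 0) ([q75]4.mpg_quartile = 0)
```

Because the standard errors are bootstrapped, the F statistic and p-value will vary with the seed and the number of replications.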

Interestingly, the Kruskal-Wallis test rejects the hypothesis that the 4 groups come from the same population:

. kwallis price , by(mpg_quartile)

Kruskal-Wallis equality-of-populations rank test

 | mpg_qu~e | Obs | Rank Sum |
 |        1 |  27 |  1397.00 |
 |        2 |  11 |   286.00 |
 |        3 |  22 |   798.00 |
 |        4 |  14 |   294.00 |

chi-squared =    23.297 with 3 d.f.
probability =     0.0001

chi-squared with ties =    23.297 with 3 d.f.
probability =     0.0001
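
Returning to the 85th percentile mentioned in the question, the same approach carries over by changing the quantile list. The following is a sketch on the same toy data, not part of the original answer. The seed is again an arbitrary assumption, and with groups this small the 85th percentile is estimated from very few observations, so the output should be read as illustrative only.

```stata
* Sketch: compare groups at the median and at the 85th percentile of price.
sysuse auto, clear
xtile mpg_quartile = mpg, nq(4)

* Assumption: arbitrary seed, chosen only so the bootstrap SEs are reproducible.
set seed 12345
sqreg price i.mpg_quartile, quantile(.5 .85) reps(500)

* Do groups 2-4 differ from group 1 (the base category) at the 85th percentile?
test ([q85]2.mpg_quartile = 0) ([q85]3.mpg_quartile = 0) ([q85]4.mpg_quartile = 0)

* Is the gap between group 4 and group 1 the same at the median
* and at the 85th percentile?
test [q50]4.mpg_quartile = [q85]4.mpg_quartile
```

The second test is the kind of cross-quantile comparison that the simultaneous fit is meant for: a rejection would suggest that the group difference is not a pure location shift.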

