
Why does the cooks.distance() function not detect an obvious outlier?

  • March 30, 2020

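I have the following plot:

[Figure: scatterplot of VeDBA.X16 against VeDBA.V13AP]

I want to detect outliers so that I can remove them. I used the following code to detect and remove them: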

model <- lm(VeDBA.X16 ~ VeDBA.V13AP, data = data)
cooksD <- cooks.distance(model)
n <- nrow(data)
influential_obs <- as.numeric(names(cooksD)[(cooksD >= (4/n))])
data_removed <- data[-influential_obs, ]

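Afterwards, I plotted the relationship VeDBA.X16 ~ VeDBA.V13AP again and got:

[Figure: the same scatterplot after removing the flagged observations; the point in the upper left corner is still present]

I don't understand why the point in the upper left corner was not detected as an influential observation.

Does anyone know why? What is wrong?

I am adding the data frame in case anyone wants to play with it: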

data <- read.table(header=TRUE, text="
VeDBA.V13AP VeDBA.X16
1 1.05710000 0.010506809
2 0.05125333 0.007041587
3 0.04183176 0.004206603
4 1.13878500 0.027639652
5 2.64275000 0.013776637
6 0.06406667 0.004456838
7 0.34596000 0.016776946
8 0.26199895 0.008196768
9 0.13749692 0.004797024
10 0.16378783 0.007629357
11 0.15307357 0.007445059
12 0.41879368 0.012262484
13 0.09404071 0.005716708
14 0.61134385 0.016408606
15 0.75038083 0.025541278
16 0.11364870 0.006871933
17 0.14415000 0.005706591
18 0.19220000 0.007479061
19 0.48050000 0.016482735
20 1.55682000 0.047469326
21 1.07774370 0.046832280
22 0.92398370 0.037246095
23 0.81547714 0.035567557
24 0.98637040 0.037141908
25 0.78802000 0.035064956
26 0.88662696 0.036234356
27 0.45459478 0.023970053
28 0.62964720 0.023842654
29 1.44680207 0.037573007
30 0.97925900 0.027740006
31 0.55590154 0.020548773
32 0.54259538 0.021703898
33 0.54056250 0.022075685
34 0.81723440 0.029084513
35 0.86490000 0.031344493
36 0.77456600 0.026346886
37 0.58380750 0.022517729
38 0.15496125 0.009267116
39 0.56561714 0.022334537
40 0.25236696 0.008732695
41 0.62705250 0.022085656
42 0.68281579 0.021273556
43 0.84568000 0.027041421
44 1.52898414 0.035010257
45 1.86250952 0.035567753
46 1.33730737 0.035620930
47 1.01074588 0.043873363
48 0.46937263 0.021688041
49 0.73849154 0.026023571
50 0.51765867 0.018627751
51 0.95878231 0.031115668
52 1.46663385 0.034777541
53 1.54817100 0.039113216
54 0.92773462 0.023049018
55 1.52078250 0.044672864
56 0.76584308 0.022634866
57 1.54144400 0.025643141
58 1.80393429 0.052413459
59 2.24713833 0.069581385
60 2.48899000 0.066594268
61 2.01249417 0.056004661
62 2.50194261 0.062224496
63 1.66445200 0.040947821
64 0.93537333 0.024518350
65 1.40306000 0.037986642
66 0.88228952 0.028022503
67 0.88892500 0.029557877
68 1.15935040 0.030355095
69 0.87643200 0.027532811
70 1.80898640 0.048795144
71 1.52991200 0.047450421
72 1.52914320 0.045071540
73 1.15177630 0.039557676
74 0.89151231 0.028493862
75 0.99245091 0.032069627
76 0.91871600 0.030344117
77 1.08814769 0.030066112
78 1.45431333 0.039615550
79 1.25083760 0.042281118
80 1.03263818 0.028800153
81 1.54094261 0.048479181
82 0.61671130 0.022140652
83 0.78432385 0.023690456
84 1.88561929 0.040069812
85 1.14651478 0.038654436
86 1.09909926 0.032120256
87 1.52799000 0.045198773
88 1.26852000 0.051710038
89 0.04420600 0.004244429
90 0.78252857 0.010350877
91 2.40250000 0.007627478
92 0.81685000 0.025706513
93 1.58565000 0.022370744
94 1.00091846 0.035563172
95 1.08400800 0.028789636
96 0.14415000 0.009583793
97 0.28830000 0.008425635
98 0.30752000 0.006911184
99 1.56730364 0.042121191
100 1.59969538 0.046112747
101 0.88320476 0.027019363
102 1.36862417 0.038251774
103 1.34236526 0.041124148
104 1.02250400 0.020569667
105 0.80724000 0.034182504
106 1.34365273 0.049881018
107 1.60967500 0.049401549
108 1.25955067 0.033999752
109 1.45248286 0.036653933
110 1.64971667 0.046624228
111 1.64715400 0.039119497
112 0.65604267 0.023550171
113 0.69466571 0.025732193
114 1.02862593 0.029998028
115 1.05517800 0.027092325
116 1.56643000 0.047986329
117 2.04430909 0.042543590
118 2.25130267 0.057967486
119 2.16331778 0.056144585
120 1.73172200 0.034704222
121 1.17416727 0.033823668
123 1.23599385 0.019911027
124 1.26240455 0.024325601
125 1.43749583 0.031187043
126 0.81765083 0.017482967
127 1.26160080 0.038708743
128 1.45184923 0.028197584
129 1.62302222 0.038035325
130 1.63517846 0.036308297
131 1.25341857 0.026853402
132 1.21326250 0.024963006
133 1.37926381 0.027407291
134 1.08785200 0.025043999
135 1.85190353 0.049342683
136 1.57315700 0.037928747
137 1.38753615 0.035929334
138 1.37253412 0.070932068
139 1.93374556 0.079448030
140 1.36675556 0.043796092
141 1.83688286 0.046827803
142 2.30365429 0.060091698
143 0.90334000 0.013686740
144 1.40306000 0.062339730
145 0.10763200 0.007276758
146 0.26995364 0.011711775
147 0.51719273 0.041075330
148 0.38440000 0.047655380
150 0.27388500 0.009328511
152 2.24874000 0.026286529
153 0.13454000 0.008932444
154 0.32674000 0.008322436
155 0.17298000 0.012461044
156 1.41074800 0.017904123
157 0.67630375 0.028226427
158 0.93697500 0.032562799
159 1.16789765 0.040773114
160 0.95347913 0.027703995
161 0.77520667 0.018606375
162 0.44526333 0.013713823
163 0.64579200 0.025421117
164 1.14166800 0.015551190
165 0.82774133 0.020122139
166 0.59453867 0.016039311
167 0.93697500 0.023504283
168 1.19035867 0.030955743
169 1.29254500 0.028927788
170 1.12215231 0.026397060
171 0.86329833 0.026113615
172 0.91012353 0.022546978
173 0.81925250 0.023960353
174 0.91935667 0.021455930
175 1.33739167 0.033023607
177 1.50877000 0.020575355
178 0.69992833 0.015537802
179 1.03908125 0.022008474
180 0.50132167 0.013699175
181 1.06840588 0.037788411
182 0.57852200 0.016731543
183 1.28182615 0.037938165
184 1.05406526 0.022118990
185 0.52534667 0.014135345
186 0.77627444 0.020491684
187 0.87771333 0.029328708
188 0.85288750 0.026760620
189 0.27548667 0.015437148
190 0.90718400 0.017377823
191 0.63151429 0.018436981
192 0.95379250 0.016484012
193 1.34426941 0.039659989
194 0.89052667 0.021256031
195 0.57660000 0.013969105
196 0.37959500 0.013734296
197 0.12644737 0.006813713
198 0.38920500 0.014058205
199 1.14118750 0.031947582
200 0.76181091 0.023178526
201 0.36838333 0.013728367
202 0.79675636 0.017578227
203 0.95986941 0.022741177
204 1.10427636 0.029618188
205 1.10087889 0.028094766
206 1.32844118 0.038065181
207 0.75975529 0.026631957
208 1.31056375 0.039171698
209 1.05421700 0.033062699
210 1.11668200 0.035669894
211 1.00136200 0.021974308
212 1.39345000 0.031930357
213 1.24209250 0.033808273
214 1.27591231 0.032722364
215 0.76330857 0.029296236
216 0.99944000 0.025064766
217 1.18377727 0.037457588
218 1.34641158 0.042945194
219 0.88171750 0.027451418
220 0.87400421 0.025875555
221 0.88924533 0.028533462
222 0.92135875 0.031501937
223 1.33491636 0.026283906
224 1.25891000 0.032072474
225 0.87330875 0.030152562
226 0.05200706 0.011501541
227 0.04509308 0.003706931
228 0.05470308 0.003600341
229 0.04345391 0.003583414
230 0.37731895 0.018747462
231 2.23839077 0.077033317
232 1.93968240 0.060249362
233 1.04668917 0.036998997
234 1.95607182 0.064847042
235 1.26935565 0.040054101
236 2.15264000 0.068858710
237 2.16819905 0.073791011
238 2.29563680 0.066262406
239 2.54456087 0.061088788
240 2.21314741 0.050528380
241 2.16402963 0.049389097
242 2.10870857 0.048441677
243 2.07652880 0.047867801
244 1.76824000 0.040602190
245 1.41766720 0.028873391
246 1.33787913 0.028314367
247 1.26691833 0.026998519
248 1.70009636 0.061441406
249 1.59372240 0.035966262
250 0.51355840 0.012283114
251 1.57254545 0.035230538
252 1.14221714 0.030293795
253 0.80163417 0.020106771
254 0.98196727 0.029185467
255 1.06094400 0.025831982
256 1.85566000 0.051023837
257 1.07204889 0.037846351
258 0.82820727 0.027304010
259 0.62290273 0.020949164
260 1.57254545 0.048940718
261 1.72908815 0.051882741
262 0.08008333 0.078916880
263 0.33085857 0.015886360
264 0.92015750 0.015088982
265 0.48656947 0.010595939
266 0.52662800 0.013675643
267 0.18739500 0.008559706
268 0.04522353 0.004203519
269 0.05018556 0.003620940
270 0.31101455 0.004023402
271 0.05087647 0.003122690
272 0.04349789 0.003071765
273 0.09085818 0.005381695
274 0.17734818 0.005037032
275 0.46021222 0.013599517
276 0.11228526 0.012151023
277 0.10318105 0.011409172
278 0.20463647 0.013198637
279 0.19898353 0.018187459
280 0.05539882 0.005360736
281 0.36325800 0.010929157
282 0.04036200 0.006140402
283 0.31713000 0.009615104
284 0.04805000 0.003554966
285 0.11633158 0.009562843
286 0.05598870 0.004136161
287 0.62009789 0.023749987
288 1.11476000 0.042056791
")

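This is just a simple programming error: the row numbers do not correspond to the row names. For example, row number 258, which contains the outlier, has the row name 262: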

> data[258,]
   VeDBA.V13AP  VeDBA.X16
262  0.08008333 0.07891688
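
The same kind of mismatch can be reproduced on a small made-up data frame. This is only an illustrative sketch: the names toy, flagged_name, wrong_idx and right_idx are hypothetical and do not come from the question.

```r
# Made-up data frame whose row names skip "3", as happens after rows are filtered out.
toy <- data.frame(x = c(1, 2, 3, 4, 5), y = c(1.1, 2.0, 9.0, 4.1, 5.0))
rownames(toy) <- c("1", "2", "4", "5", "6")

flagged_name <- "4"                                   # row *name* of the odd point (y = 9)
wrong_idx <- as.numeric(flagged_name)                 # the name reused as a row *number*
right_idx <- which(rownames(toy) == flagged_name)     # the actual position of that row

toy[wrong_idx, ]   # selects the row named "5", not the flagged one
toy[right_idx, ]   # selects the row named "4"
```

Indexing with a number always means position, so any gap in the row names shifts every later row.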

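In your code, you convert the row names to numbers and then use those numbers as if they were row numbers. If you use the row names directly (i.e. without as.numeric()), or extract the row numbers instead, everything works as it should. So either of these options gives a correct index for selecting rows (I prefer the second, because row numbers also work with negative indexing as in data[-influential_obs, ], whereas character row names cannot be negated):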

influential_obs <- names(cooksD)[(cooksD >= (4/n))] # Row names
influential_obs <- which(cooksD >= 4/n)             # Row numbers
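
To see how each option can be used to actually drop the flagged rows, here is a minimal sketch on made-up data. The data frame toy and the variable names are assumptions for illustration only; the 4/n cutoff is the one used in the question.

```r
# Made-up data; row names deliberately skip "3", so names and positions differ.
toy <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8),
                  y = c(1.0, 2.1, 2.9, 4.2, 5.0, 6.1, 20.0, 8.0))
rownames(toy) <- c("1", "2", "4", "5", "6", "7", "8", "9")

model  <- lm(y ~ x, data = toy)
cooksD <- cooks.distance(model)
n      <- nrow(toy)

# Option 1: row names (character). A character vector cannot be negated,
# so rows are dropped with a logical test instead of toy[-idx, ].
flagged_names   <- names(cooksD)[cooksD >= 4 / n]
removed_by_name <- toy[!(rownames(toy) %in% flagged_names), ]

# Option 2: row positions (integer), which can be negated directly.
# The length check matters: toy[-integer(0), ] would return zero rows.
flagged_pos    <- which(cooksD >= 4 / n)
removed_by_pos <- if (length(flagged_pos) > 0) toy[-flagged_pos, ] else toy

# Both approaches should keep the same rows.
identical(rownames(removed_by_name), rownames(removed_by_pos))
```

Note that the positions from which() line up with the rows of the data only if lm() did not drop any rows, for example because of missing values.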

Here are the points that your Cook's distance "test" considers outliers:

plot(VeDBA.X16 ~ VeDBA.V13AP, data = data)
points(VeDBA.X16 ~ VeDBA.V13AP, data = data[influential_obs,], col="red", pch=19)

[Figure: scatterplot with the flagged outliers highlighted in red]

Source: https://stats.stackexchange.com/questions/457496
