Why doesn't the R function cooks.distance() detect an obvious outlier?
I have the following plot:
I want to detect the outliers so that I can remove them. I applied the following code to detect and remove them:
model <- lm(VeDBA.X16 ~ VeDBA.V13AP, data = data)
cooksD <- cooks.distance(model)
n <- nrow(data)
influential_obs <- as.numeric(names(cooksD)[(cooksD >= (4/n))])
data_removed <- data[-influential_obs, ]
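Afterwards, I plotted the relationship VeDBA.X16 ~ VeDBA.V13AP again, and I don't understand why the point in the upper-left corner was not detected as an influential observation.
Does anyone know why? What is wrong?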
I'm adding the data frame in case anyone wants to play with it:
data <- read.table(header=TRUE, text="
VeDBA.V13AP VeDBA.X16
1 1.05710000 0.010506809
2 0.05125333 0.007041587
3 0.04183176 0.004206603
4 1.13878500 0.027639652
5 2.64275000 0.013776637
6 0.06406667 0.004456838
7 0.34596000 0.016776946
8 0.26199895 0.008196768
9 0.13749692 0.004797024
10 0.16378783 0.007629357
11 0.15307357 0.007445059
12 0.41879368 0.012262484
13 0.09404071 0.005716708
14 0.61134385 0.016408606
15 0.75038083 0.025541278
16 0.11364870 0.006871933
17 0.14415000 0.005706591
18 0.19220000 0.007479061
19 0.48050000 0.016482735
20 1.55682000 0.047469326
21 1.07774370 0.046832280
22 0.92398370 0.037246095
23 0.81547714 0.035567557
24 0.98637040 0.037141908
25 0.78802000 0.035064956
26 0.88662696 0.036234356
27 0.45459478 0.023970053
28 0.62964720 0.023842654
29 1.44680207 0.037573007
30 0.97925900 0.027740006
31 0.55590154 0.020548773
32 0.54259538 0.021703898
33 0.54056250 0.022075685
34 0.81723440 0.029084513
35 0.86490000 0.031344493
36 0.77456600 0.026346886
37 0.58380750 0.022517729
38 0.15496125 0.009267116
39 0.56561714 0.022334537
40 0.25236696 0.008732695
41 0.62705250 0.022085656
42 0.68281579 0.021273556
43 0.84568000 0.027041421
44 1.52898414 0.035010257
45 1.86250952 0.035567753
46 1.33730737 0.035620930
47 1.01074588 0.043873363
48 0.46937263 0.021688041
49 0.73849154 0.026023571
50 0.51765867 0.018627751
51 0.95878231 0.031115668
52 1.46663385 0.034777541
53 1.54817100 0.039113216
54 0.92773462 0.023049018
55 1.52078250 0.044672864
56 0.76584308 0.022634866
57 1.54144400 0.025643141
58 1.80393429 0.052413459
59 2.24713833 0.069581385
60 2.48899000 0.066594268
61 2.01249417 0.056004661
62 2.50194261 0.062224496
63 1.66445200 0.040947821
64 0.93537333 0.024518350
65 1.40306000 0.037986642
66 0.88228952 0.028022503
67 0.88892500 0.029557877
68 1.15935040 0.030355095
69 0.87643200 0.027532811
70 1.80898640 0.048795144
71 1.52991200 0.047450421
72 1.52914320 0.045071540
73 1.15177630 0.039557676
74 0.89151231 0.028493862
75 0.99245091 0.032069627
76 0.91871600 0.030344117
77 1.08814769 0.030066112
78 1.45431333 0.039615550
79 1.25083760 0.042281118
80 1.03263818 0.028800153
81 1.54094261 0.048479181
82 0.61671130 0.022140652
83 0.78432385 0.023690456
84 1.88561929 0.040069812
85 1.14651478 0.038654436
86 1.09909926 0.032120256
87 1.52799000 0.045198773
88 1.26852000 0.051710038
89 0.04420600 0.004244429
90 0.78252857 0.010350877
91 2.40250000 0.007627478
92 0.81685000 0.025706513
93 1.58565000 0.022370744
94 1.00091846 0.035563172
95 1.08400800 0.028789636
96 0.14415000 0.009583793
97 0.28830000 0.008425635
98 0.30752000 0.006911184
99 1.56730364 0.042121191
100 1.59969538 0.046112747
101 0.88320476 0.027019363
102 1.36862417 0.038251774
103 1.34236526 0.041124148
104 1.02250400 0.020569667
105 0.80724000 0.034182504
106 1.34365273 0.049881018
107 1.60967500 0.049401549
108 1.25955067 0.033999752
109 1.45248286 0.036653933
110 1.64971667 0.046624228
111 1.64715400 0.039119497
112 0.65604267 0.023550171
113 0.69466571 0.025732193
114 1.02862593 0.029998028
115 1.05517800 0.027092325
116 1.56643000 0.047986329
117 2.04430909 0.042543590
118 2.25130267 0.057967486
119 2.16331778 0.056144585
120 1.73172200 0.034704222
121 1.17416727 0.033823668
123 1.23599385 0.019911027
124 1.26240455 0.024325601
125 1.43749583 0.031187043
126 0.81765083 0.017482967
127 1.26160080 0.038708743
128 1.45184923 0.028197584
129 1.62302222 0.038035325
130 1.63517846 0.036308297
131 1.25341857 0.026853402
132 1.21326250 0.024963006
133 1.37926381 0.027407291
134 1.08785200 0.025043999
135 1.85190353 0.049342683
136 1.57315700 0.037928747
137 1.38753615 0.035929334
138 1.37253412 0.070932068
139 1.93374556 0.079448030
140 1.36675556 0.043796092
141 1.83688286 0.046827803
142 2.30365429 0.060091698
143 0.90334000 0.013686740
144 1.40306000 0.062339730
145 0.10763200 0.007276758
146 0.26995364 0.011711775
147 0.51719273 0.041075330
148 0.38440000 0.047655380
150 0.27388500 0.009328511
152 2.24874000 0.026286529
153 0.13454000 0.008932444
154 0.32674000 0.008322436
155 0.17298000 0.012461044
156 1.41074800 0.017904123
157 0.67630375 0.028226427
158 0.93697500 0.032562799
159 1.16789765 0.040773114
160 0.95347913 0.027703995
161 0.77520667 0.018606375
162 0.44526333 0.013713823
163 0.64579200 0.025421117
164 1.14166800 0.015551190
165 0.82774133 0.020122139
166 0.59453867 0.016039311
167 0.93697500 0.023504283
168 1.19035867 0.030955743
169 1.29254500 0.028927788
170 1.12215231 0.026397060
171 0.86329833 0.026113615
172 0.91012353 0.022546978
173 0.81925250 0.023960353
174 0.91935667 0.021455930
175 1.33739167 0.033023607
177 1.50877000 0.020575355
178 0.69992833 0.015537802
179 1.03908125 0.022008474
180 0.50132167 0.013699175
181 1.06840588 0.037788411
182 0.57852200 0.016731543
183 1.28182615 0.037938165
184 1.05406526 0.022118990
185 0.52534667 0.014135345
186 0.77627444 0.020491684
187 0.87771333 0.029328708
188 0.85288750 0.026760620
189 0.27548667 0.015437148
190 0.90718400 0.017377823
191 0.63151429 0.018436981
192 0.95379250 0.016484012
193 1.34426941 0.039659989
194 0.89052667 0.021256031
195 0.57660000 0.013969105
196 0.37959500 0.013734296
197 0.12644737 0.006813713
198 0.38920500 0.014058205
199 1.14118750 0.031947582
200 0.76181091 0.023178526
201 0.36838333 0.013728367
202 0.79675636 0.017578227
203 0.95986941 0.022741177
204 1.10427636 0.029618188
205 1.10087889 0.028094766
206 1.32844118 0.038065181
207 0.75975529 0.026631957
208 1.31056375 0.039171698
209 1.05421700 0.033062699
210 1.11668200 0.035669894
211 1.00136200 0.021974308
212 1.39345000 0.031930357
213 1.24209250 0.033808273
214 1.27591231 0.032722364
215 0.76330857 0.029296236
216 0.99944000 0.025064766
217 1.18377727 0.037457588
218 1.34641158 0.042945194
219 0.88171750 0.027451418
220 0.87400421 0.025875555
221 0.88924533 0.028533462
222 0.92135875 0.031501937
223 1.33491636 0.026283906
224 1.25891000 0.032072474
225 0.87330875 0.030152562
226 0.05200706 0.011501541
227 0.04509308 0.003706931
228 0.05470308 0.003600341
229 0.04345391 0.003583414
230 0.37731895 0.018747462
231 2.23839077 0.077033317
232 1.93968240 0.060249362
233 1.04668917 0.036998997
234 1.95607182 0.064847042
235 1.26935565 0.040054101
236 2.15264000 0.068858710
237 2.16819905 0.073791011
238 2.29563680 0.066262406
239 2.54456087 0.061088788
240 2.21314741 0.050528380
241 2.16402963 0.049389097
242 2.10870857 0.048441677
243 2.07652880 0.047867801
244 1.76824000 0.040602190
245 1.41766720 0.028873391
246 1.33787913 0.028314367
247 1.26691833 0.026998519
248 1.70009636 0.061441406
249 1.59372240 0.035966262
250 0.51355840 0.012283114
251 1.57254545 0.035230538
252 1.14221714 0.030293795
253 0.80163417 0.020106771
254 0.98196727 0.029185467
255 1.06094400 0.025831982
256 1.85566000 0.051023837
257 1.07204889 0.037846351
258 0.82820727 0.027304010
259 0.62290273 0.020949164
260 1.57254545 0.048940718
261 1.72908815 0.051882741
262 0.08008333 0.078916880
263 0.33085857 0.015886360
264 0.92015750 0.015088982
265 0.48656947 0.010595939
266 0.52662800 0.013675643
267 0.18739500 0.008559706
268 0.04522353 0.004203519
269 0.05018556 0.003620940
270 0.31101455 0.004023402
271 0.05087647 0.003122690
272 0.04349789 0.003071765
273 0.09085818 0.005381695
274 0.17734818 0.005037032
275 0.46021222 0.013599517
276 0.11228526 0.012151023
277 0.10318105 0.011409172
278 0.20463647 0.013198637
279 0.19898353 0.018187459
280 0.05539882 0.005360736
281 0.36325800 0.010929157
282 0.04036200 0.006140402
283 0.31713000 0.009615104
284 0.04805000 0.003554966
285 0.11633158 0.009562843
286 0.05598870 0.004136161
287 0.62009789 0.023749987
288 1.11476000 0.042056791
")
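This is just a simple programming error: the row numbers do not correspond to the row names. For example, row 258, which contains the outlier, has the row name 262: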
> data[258,]
    VeDBA.V13AP  VeDBA.X16
262  0.08008333 0.07891688
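The mismatch is easy to reproduce on a small, made-up data frame whose row names have a gap, just as the question's data does (row names 122, 149, 151 and 176 are absent). The toy data frame `df` below is an illustrative assumption; it shows that turning a row name into a number and using it as an index selects a different row, while `match()` converts a name into its actual position:

```r
# Hypothetical toy data: the row named "2" was deleted earlier, so the
# row names (1, 3, 4, 5) no longer match the row positions (1, 2, 3, 4).
df <- data.frame(x = c(10, 30, 40, 50), row.names = c("1", "3", "4", "5"))

idx <- "4"                 # a row name, as returned by names(cooksD)

df[idx, "x"]               # by name: the row named "4" -> 40
df[as.numeric(idx), "x"]   # by position: the 4th row (named "5") -> 50
match(idx, rownames(df))   # position of the row named "4" -> 3
```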
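In your code, you convert the row names to numbers and then use those numbers as if they were row numbers. If you use the row names directly (i.e. without as.numeric()) or extract the row numbers instead, the correct rows are selected. So either of these options will work (I prefer the second):

influential_obs <- names(cooksD)[(cooksD >= (4/n))]   # Row names
influential_obs <- which(cooksD >= 4/n)               # Row numbers

Only the row-number version can be used with the minus sign in data[-influential_obs, ], though: a character vector of row names cannot be negated, so removing rows by name needs something like data[!rownames(data) %in% influential_obs, ].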
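Both options can be compared side by side on made-up data. The data frame `d`, the seed and the planted outlier are illustrative assumptions, not the question's data. The sketch also guards against a pitfall the question's removal line shares: if no observation reaches the cutoff, negative indexing with an empty vector returns zero rows instead of all of them.

```r
# Made-up example data (an assumption for illustration): a noisy straight
# line, one planted outlier, and two deleted rows so the row names have gaps.
set.seed(1)
d <- data.frame(x = 1:20, y = 1:20 + rnorm(20, sd = 0.5))
d$y[3] <- 25
d <- d[-c(5, 9), ]

fit <- lm(y ~ x, data = d)
cd  <- cooks.distance(fit)
cutoff <- 4 / nrow(d)

# Option 1: row names. A character vector cannot be negated,
# so drop the flagged rows with %in% instead of a minus sign.
bad_names <- names(cd)[cd >= cutoff]
clean_by_name <- d[!rownames(d) %in% bad_names, ]

# Option 2: row positions. Negative indexing works, but guard the
# empty case: d[-integer(0), ] would return zero rows, not all of them.
bad_pos <- which(cd >= cutoff)
clean_by_pos <- if (length(bad_pos) > 0) d[-bad_pos, ] else d

identical(rownames(clean_by_name), rownames(clean_by_pos))   # TRUE
```

Both routes keep exactly the same rows here; the positions from which() line up with the rows of `d` because the model was fitted on every row.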
Here are the points that your Cook's distance "test" flags as outliers:
plot(VeDBA.X16 ~ VeDBA.V13AP, data = data)
points(VeDBA.X16 ~ VeDBA.V13AP, data = data[influential_obs, ],
       col = "red", pch = 19)
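The steps can also be wrapped in a small helper. `drop_influential()` below is a hypothetical name, not part of any package. It is a sketch under one extra assumption: rows that lm() leaves out because of missing values (the default na.action omits them) should be kept. To handle that, it matches rows by name and takes n as the number of observations actually used in the fit rather than nrow(data).

```r
# drop_influential() is a hypothetical helper (not from any package).
# It fits lm(), flags observations whose Cook's distance is at or above
# k / n, and returns `data` without the flagged rows.
drop_influential <- function(formula, data, k = 4) {
  fit <- lm(formula, data = data)
  cd  <- cooks.distance(fit)
  n   <- length(cd)                   # observations actually used in the fit
  flagged <- names(cd)[cd >= k / n]   # row names of influential points
  data[!rownames(data) %in% flagged, , drop = FALSE]
}

# Usage on made-up data with a missing value, an outlier and a gap
# in the row names (all assumptions for illustration).
set.seed(2)
d <- data.frame(x = 1:30, y = 2 * (1:30) + rnorm(30))
d$y[4]  <- NA     # lm() omits this row with the default na.action
d$y[25] <- 200    # planted outlier
d <- d[-10, ]     # row "10" removed, so names and positions differ

res <- drop_influential(y ~ x, d)
"25" %in% rownames(res)   # FALSE: the outlier was removed
"4"  %in% rownames(res)   # TRUE: the NA row was not in the fit, so it is kept
```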