R
R中時間序列的ACF的“手動”計算 - 接近但不完全
我試圖了解在時間序列中計算 ACF 值背後的“機制”。作為“自我驗證練習” [注意:我更新了此鏈接中的代碼以反映已接受答案中的信息],我不關注代碼的優雅,我故意使用循環。問題是,雖然我得到的值與 生成的值接近
acf()
,但它們並不相等,而且在某些時候,並不是那麼接近。有什麼概念上的誤解嗎?正如上面鏈接的在線註釋中所記,數據是根據這個在線示例生成的。
set.seed(0) # To reproduce results x = seq(pi, 10 * pi, 0.1) # Creating time-series as sin function. y = 0.1 * x + sin(x) + rnorm(x) # Time-series sinusoidal points with noise. y = ts(y, start = 1800) # Labeling time axis. model = lm(y ~ I(1801:2083)) # Detrending (0.1 * x) st.y = y - predict(model) # Final de-trended ts (st.y) ACF = 0 # Empty vector to capture the auto-correlations. ACF[1] = cor(st.y, st.y) # The first entry in the ACF is the cor with itself. for(i in 1:24){ # Took 24 points to parallel the output of `acf()` lag = st.y[-c(1:i)] # Introducing lags in the stationary ts. clipped.y = st.y[1:length(lag)] # Compensating by reducing length of ts. ACF[i + 1] = cor(clipped.y, lag) # Storing each correlation. } w = acf(st.y) # Extracting values of acf without plotting. all.equal(ACF, as.vector(w$acf)) # Checking for equality in values. # Pretty close to manual calc: Mean relative difference: 0.03611463"
為了對相對輸出有一個切實的了解,這是手動計算的自相關的樣子:
1.0000000 0.3195564 0.3345448 0.2877745 0.2783382 0.2949996 ... ... -0.1262182 -0.1795683 -0.1941921 -0.1352814 -0.2153883 -0.3423663
與 R
acf()
函數相反:1.0000000 0.3187104 0.3329545 0.2857004 0.2745302 0.2907426 ... ... -0.1144625 -0.1621018 -0.1737770 -0.1203673 -0.1924761 -0.3069342
正如所建議的那樣,這很可能與以下行中的調用中的代碼相比得到了解釋
acf()
:
acf <- .Call(C_acf, x, lag.max, type == "correlation")
如何
C_acf
訪問此功能?
我與 PACF 有類似的問題,我認為這是相關的。PACF 值的代碼在這裡。[注意:在這種情況下,我懷疑它實際上是一個四捨五入的差異]。
顯然
cor
並acf
沒有使用相同的除數。acf
使用觀察次數作為除數,可以如下所示重現。具體可以C_acf
在文件中找到代碼R-3.3.2/src/library/stats/src/filter.c
(proceduresacf
andacf0
):set.seed(0) # To reproduce results x = seq(pi, 10 * pi, 0.1) # Creating time-series as sin function. y = 0.1 * x + sin(x) + rnorm(x) # Time-series sinusoidal points with noise. y = ts(y, start = 1800) # Labeling time axis. model = lm(y ~ I(1801:2083)) # Detrending (0.1 * x) st.y = y - predict(model) lag.max <- 24 ydm <- st.y - mean(st.y) n <- length(ydm) ACF <- rep(NA, lag.max+1) for (tau in seq(0, lag.max)) ACF[tau+1] <- sum(ydm * lag(ydm, -tau)) / n ACF <- ACF/ACF[1] names(ACF) <- paste0("lag", seq.int(0, lag.max)) ACF head(ACF) # lag0 lag1 lag2 lag3 lag4 lag5 # 1.0000000 0.3187104 0.3329545 0.2857004 0.2745302 0.2907426 tail(ACF) # lag19 lag20 lag21 lag22 lag23 lag24 # -0.1144625 -0.1621018 -0.1737770 -0.1203673 -0.1924761 -0.3069342 all.equal(ACF, acf(st.y, lag.max=lag.max, plot=FALSE)$acf[,,1], check.names=FALSE) # [1] TRUE