R

R中時間序列的ACF的“手動”計算 - 接近但不完全

  • January 2, 2017

我試圖了解在時間序列中計算 ACF 值背後的“機制”。作為“自我驗證練習” [注意:我更新了此鏈接中的代碼以反映已接受答案中的信息],我不關注代碼的優雅,我故意使用循環。問題是,雖然我得到的值與 生成的值接近acf(),但它們並不相等,而且在某些時候,並不是那麼接近。有什麼概念上的誤解嗎?

正如上面鏈接的在線註釋中所記,數據是根據這個在線示例生成的。

set.seed(0)                          # To reproduce results
x = seq(pi, 10 * pi, 0.1)            # Creating time-series as sin function.
y = 0.1 * x + sin(x) + rnorm(x)      # Time-series sinusoidal points with noise.
y = ts(y, start = 1800)              # Labeling time axis.
model = lm(y ~ I(1801:2083))         # Detrending (0.1 * x)
st.y = y - predict(model)            # Final de-trended ts (st.y)
ACF = 0                              # Empty vector to capture the auto-correlations.
ACF[1] = cor(st.y, st.y)             # The first entry in the ACF is the cor with itself.
for(i in 1:24){                      # Took 24 points to parallel the output of `acf()`
 lag = st.y[-c(1:i)]                # Introducing lags in the stationary ts.
 clipped.y = st.y[1:length(lag)]    # Compensating by reducing length of ts.
 ACF[i + 1] = cor(clipped.y, lag)   # Storing each correlation.
}
w = acf(st.y)                        # Extracting values of acf without plotting.
all.equal(ACF, as.vector(w$acf))     # Checking for equality in values.
# Pretty close to manual calc: Mean relative difference: 0.03611463"

為了對相對輸出有一個切實的了解,這是手動計算的自相關的樣子:

1.0000000 0.3195564 0.3345448 0.2877745 0.2783382 0.2949996 ... 
... -0.1262182 -0.1795683 -0.1941921 -0.1352814 -0.2153883 -0.3423663

與 Racf()函數相反:

1.0000000 0.3187104 0.3329545 0.2857004 0.2745302 0.2907426 ...
... -0.1144625 -0.1621018 -0.1737770 -0.1203673 -0.1924761 -0.3069342

正如所建議的那樣,這很可能與以下行中的調用中的代碼相比得到了解釋acf()

acf <- .Call(C_acf, x, lag.max, type == "correlation")

如何C_acf訪問此功能?


我與 PACF 有類似的問題,我認為這是相關的。PACF 值的代碼在這裡。[注意:在這種情況下,我懷疑它實際上是一個四捨五入的差異]。

顯然coracf沒有使用相同的除數。acf使用觀察次數作為除數,可以如下所示重現。具體可以C_acf在文件中找到代碼R-3.3.2/src/library/stats/src/filter.c(procedures acfand acf0):

set.seed(0)                          # To reproduce results
x = seq(pi, 10 * pi, 0.1)            # Creating time-series as sin function.
y = 0.1 * x + sin(x) + rnorm(x)      # Time-series sinusoidal points with noise.
y = ts(y, start = 1800)              # Labeling time axis.
model = lm(y ~ I(1801:2083))         # Detrending (0.1 * x)
st.y = y - predict(model)  
lag.max <- 24
ydm <- st.y - mean(st.y)
n <- length(ydm)
ACF <- rep(NA, lag.max+1)
for (tau in seq(0, lag.max))
 ACF[tau+1] <- sum(ydm * lag(ydm, -tau)) / n
ACF <- ACF/ACF[1]
names(ACF) <- paste0("lag", seq.int(0, lag.max))
ACF
head(ACF)
# lag0 lag1 lag2 lag3 lag4 lag5 
# 1.0000000 0.3187104 0.3329545 0.2857004 0.2745302 0.2907426 
tail(ACF)
# lag19 lag20 lag21 lag22 lag23 lag24 
# -0.1144625 -0.1621018 -0.1737770 -0.1203673 -0.1924761 -0.3069342 
all.equal(ACF, acf(st.y, lag.max=lag.max, plot=FALSE)$acf[,,1], check.names=FALSE)
# [1] TRUE

引用自:https://stats.stackexchange.com/questions/254227

comments powered by Disqus