R
使用模擬進行重要性採樣的覆蓋率低於預期
我試圖回答Evaluate integral with Importance sampling method in R的問題。基本上,用戶需要計算
使用指數分佈作為重要性分佈
並找到這可以更好地近似積分(它是
self-study
)。我將問題改寫為對平均值的評估的超過: 積分就是.因此,讓成為的pdf, 然後讓: 現在的目標是估計
使用重要性抽樣。我在 R 中進行了模擬:
# clear the environment and set the seed for reproducibility rm(list=ls()) gc() graphics.off() set.seed(1) # function to be integrated f <- function(x){ 1 / (cos(x)^2+x^2) } # importance sampling importance.sampling <- function(lambda, f, B){ x <- rexp(B, lambda) f(x) / dexp(x, lambda)*dunif(x, 0, pi) } # mean value of f mu.num <- integrate(f,0,pi)$value/pi # initialize code means <- 0 sigmas <- 0 error <- 0 CI.min <- 0 CI.max <- 0 CI.covers.parameter <- FALSE # set a value for lambda: we will repeat importance sampling N times to verify # coverage N <- 100 lambda <- rep(20,N) # set the sample size for importance sampling B <- 10^4 # - estimate the mean value of f using importance sampling, N times # - compute a confidence interval for the mean each time # - CI.covers.parameter is set to TRUE if the estimated confidence # interval contains the mean value computed by integrate, otherwise # is set to FALSE j <- 0 for(i in lambda){ I <- importance.sampling(i, f, B) j <- j + 1 mu <- mean(I) std <- sd(I) lower.CB <- mu - 1.96*std/sqrt(B) upper.CB <- mu + 1.96*std/sqrt(B) means[j] <- mu sigmas[j] <- std error[j] <- abs(mu-mu.num) CI.min[j] <- lower.CB CI.max[j] <- upper.CB CI.covers.parameter[j] <- lower.CB < mu.num & mu.num < upper.CB } # build a dataframe in case you want to have a look at the results for each run df <- data.frame(lambda, means, sigmas, error, CI.min, CI.max, CI.covers.parameter) # so, what's the coverage? mean(CI.covers.parameter) # [1] 0.19
該代碼基本上是重要性採樣的簡單實現,遵循此處使用的符號。然後重複重要性採樣多次估計,並且每次檢查 95% 區間是否覆蓋實際平均值。
如您所見,對於實際覆蓋率僅為 0.19。並且越來越到值,例如沒有幫助(覆蓋範圍更小,0.15)。為什麼會這樣?
重要性抽樣對重要性分佈的選擇非常敏感。既然你選擇了,您繪製的樣本
rexp
的平均值為有方差. 這是你得到的分佈但是,您要評估的積分從 0 到. 所以你想使用一個這給了你這樣的範圍。我用.
使用我將能夠探索0到的完整積分空間, 並且似乎只有幾次平局將被浪費。現在我重新運行你的代碼,只改變.
# clear the environment and set the seed for reproducibility rm(list=ls()) gc() graphics.off() set.seed(1) # function to be integrated f <- function(x){ 1 / (cos(x)^2+x^2) } # importance sampling importance.sampling <- function(lambda, f, B){ x <- rexp(B, lambda) f(x) / dexp(x, lambda)*dunif(x, 0, pi) } # mean value of f mu.num <- integrate(f,0,pi)$value/pi # initialize code means <- 0 sigmas <- 0 error <- 0 CI.min <- 0 CI.max <- 0 CI.covers.parameter <- FALSE # set a value for lambda: we will repeat importance sampling N times to verify # coverage N <- 100 lambda <- rep(1,N) # set the sample size for importance sampling B <- 10^4 # - estimate the mean value of f using importance sampling, N times # - compute a confidence interval for the mean each time # - CI.covers.parameter is set to TRUE if the estimated confidence # interval contains the mean value computed by integrate, otherwise # is set to FALSE j <- 0 for(i in lambda){ I <- importance.sampling(i, f, B) j <- j + 1 mu <- mean(I) std <- sd(I) lower.CB <- mu - 1.96*std/sqrt(B) upper.CB <- mu + 1.96*std/sqrt(B) means[j] <- mu sigmas[j] <- std error[j] <- abs(mu-mu.num) CI.min[j] <- lower.CB CI.max[j] <- upper.CB CI.covers.parameter[j] <- lower.CB < mu.num & mu.num < upper.CB } # build a dataframe in case you want to have a look at the results for each run df <- data.frame(lambda, means, sigmas, error, CI.min, CI.max, CI.covers.parameter) # so, what's the coverage? mean(CI.covers.parameter) #[1] .95
如果你玩,你會看到,如果你把它做得很小(.00001)或很大,覆蓋概率會很差。
編輯 - - - -
關於一旦你離開,覆蓋概率就會降低到,這只是一個隨機事件,基於您使用的事實複製。覆蓋概率的置信區間是,
所以你不能真的說增加 顯著降低覆蓋概率。
事實上,在你的代碼中為同一個種子,改變到,然後與,覆蓋概率為 0.123,並且覆蓋概率為.
現在,0.123 附近的置信區間為
因此,現在有了複製,您會發現覆蓋概率顯著增加。