Importance sampling with simulation: lower coverage than expected

February 8, 2017

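I was trying to answer the question "Evaluate integral with Importance sampling method in R". Basically, the user needs to compute

$$\int_0^{\pi} \frac{1}{\cos^2 x + x^2}\,dx$$

using the exponential distribution as the importance distribution,

$$q(x) = \lambda e^{-\lambda x},$$

and to find the value of $\lambda$ that gives the better approximation to the integral (it is self-study). I recast the problem as the evaluation of the mean value $\mu$ of $f(x)$ over $[0, \pi]$: the integral is then just $\pi\mu$.

Thus, let $p(x)$ be the pdf of $X \sim \mathcal{U}(0, \pi)$, and let $Y = f(X)$: the goal now is to estimate

$$\mu = \mathbb{E}[Y] = \mathbb{E}[f(X)] = \int_0^{\pi} \frac{1}{\cos^2 x + x^2}\,\frac{1}{\pi}\,dx$$

using importance sampling. I performed a simulation in R: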

# clear the environment and set the seed for reproducibility
rm(list=ls())
gc()
graphics.off()
set.seed(1)

# function to be integrated
f <- function(x){
   1 / (cos(x)^2+x^2)
}

# importance sampling
importance.sampling <- function(lambda, f, B){
   x <- rexp(B, lambda) 
   f(x) / dexp(x, lambda)*dunif(x, 0, pi)
}

# mean value of f
mu.num <- integrate(f,0,pi)$value/pi

# initialize code
means  <- 0
sigmas <- 0
error  <- 0
CI.min <- 0
CI.max <- 0
CI.covers.parameter <- FALSE

# set a value for lambda: we will repeat importance sampling N times to verify
# coverage
N <- 100
lambda <- rep(20,N)

# set the sample size for importance sampling
B <- 10^4

# - estimate the mean value of f using importance sampling, N times
# - compute a confidence interval for the mean each time
# - CI.covers.parameter is set to TRUE if the estimated confidence 
# interval contains the mean value computed by integrate, otherwise
# is set to FALSE
j <- 0
for(i in lambda){
   I <- importance.sampling(i, f, B)
   j <- j + 1
   mu <- mean(I)
   std <- sd(I)
   lower.CB <- mu - 1.96*std/sqrt(B)  
   upper.CB <- mu + 1.96*std/sqrt(B)  
   means[j] <- mu
   sigmas[j] <- std
   error[j] <- abs(mu-mu.num)
   CI.min[j] <- lower.CB
   CI.max[j] <- upper.CB
   CI.covers.parameter[j] <- lower.CB < mu.num & mu.num < upper.CB
}

# build a dataframe in case you want to have a look at the results for each run
df <- data.frame(lambda, means, sigmas, error, CI.min, CI.max, CI.covers.parameter)

# so, what's the coverage?
mean(CI.covers.parameter)
# [1] 0.19

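The code is basically a straightforward implementation of importance sampling, following the notation used here. The importance sampling is then repeated $N$ times to obtain multiple estimates of $\mu$, and each time a check is made on whether the 95% interval covers the actual mean.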
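Written out, with $x_1, \dots, x_B$ drawn from $q$, each run of the loop computes the quantities below. The symbols $w_i$ and $s_w$ are shorthand introduced here for illustration, not notation from the post:

$$w_i = \frac{f(x_i)\,p(x_i)}{q(x_i)}, \qquad \hat{\mu} = \frac{1}{B}\sum_{i=1}^{B} w_i, \qquad \hat{\mu} \pm 1.96\,\frac{s_w}{\sqrt{B}},$$

where $s_w$ is the sample standard deviation of the $w_i$. The estimator is unbiased because $\mathbb{E}_q\left[f(X)\,p(X)/q(X)\right] = \int f(x)\,p(x)\,dx = \mu$ as long as $q(x) > 0$ wherever $f(x)\,p(x) \neq 0$. The interval, however, relies on the normal approximation for $\hat{\mu}$ being adequate at the given $B$.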

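As you can see, for $\lambda = 20$ the actual coverage is just 0.19. Increasing $B$ to larger values does not help (the coverage is even smaller, 0.15). Why is this happening?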

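Importance sampling is quite sensitive to the choice of the importance distribution. Since you chose $\lambda = 20$, the samples you draw with rexp have mean $1/20$ and variance $1/400$. That is the distribution you are sampling from.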

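However, the integral you want to evaluate runs from 0 to $\pi \approx 3.14$. So you want a $\lambda$ that gives you draws over that range. I use $\lambda = 1$.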
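To make the comparison concrete, the following sketch tabulates where draws from the two exponential distributions land relative to the integration range. It uses only base R; the variable names are illustrative.

```r
# Rates compared in the discussion
lambdas <- c(20, 1)

# Mean and variance of an Exp(lambda) draw: 1/lambda and 1/lambda^2
exp.mean <- 1 / lambdas
exp.var  <- 1 / lambdas^2

# Probability that a draw lands inside the integration range [0, pi]
p.inside <- pexp(pi, rate = lambdas)

# Probability that a draw lands in the upper half of the range, (pi/2, pi];
# upper tails are used to avoid cancellation when the values are tiny
p.upper.half <- pexp(pi / 2, rate = lambdas, lower.tail = FALSE) -
  pexp(pi, rate = lambdas, lower.tail = FALSE)

summary.table <- data.frame(lambda = lambdas, mean = exp.mean, var = exp.var,
                            p.inside = p.inside, p.upper.half = p.upper.half)
print(summary.table)
```

With $\lambda = 20$ the upper half of $[0, \pi]$ is essentially never visited, while with $\lambda = 1$ a sizeable share of the draws falls there and only a fraction $e^{-\pi} \approx 0.043$ exceeds $\pi$.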

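With $\lambda = 1$ I can explore the full integration range from 0 to $\pi$, and it seems that only a few draws above $\pi$ will be wasted. Now I rerun your code, changing only $\lambda = 1$:
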
# clear the environment and set the seed for reproducibility
rm(list=ls())
gc()
graphics.off()
set.seed(1)

# function to be integrated
f <- function(x){
 1 / (cos(x)^2+x^2)
}

# importance sampling
importance.sampling <- function(lambda, f, B){
 x <- rexp(B, lambda) 
 f(x) / dexp(x, lambda)*dunif(x, 0, pi)
}

# mean value of f
mu.num <- integrate(f,0,pi)$value/pi

# initialize code
means  <- 0
sigmas <- 0
error  <- 0
CI.min <- 0
CI.max <- 0
CI.covers.parameter <- FALSE

# set a value for lambda: we will repeat importance sampling N times to verify
# coverage
N <- 100
lambda <- rep(1,N)

# set the sample size for importance sampling
B <- 10^4

# - estimate the mean value of f using importance sampling, N times
# - compute a confidence interval for the mean each time
# - CI.covers.parameter is set to TRUE if the estimated confidence 
# interval contains the mean value computed by integrate, otherwise
# is set to FALSE
j <- 0
for(i in lambda){
 I <- importance.sampling(i, f, B)
 j <- j + 1
 mu <- mean(I)
 std <- sd(I)
 lower.CB <- mu - 1.96*std/sqrt(B)  
 upper.CB <- mu + 1.96*std/sqrt(B)  
 means[j] <- mu
 sigmas[j] <- std
 error[j] <- abs(mu-mu.num)
 CI.min[j] <- lower.CB
 CI.max[j] <- upper.CB
 CI.covers.parameter[j] <- lower.CB < mu.num & mu.num < upper.CB
}

# build a dataframe in case you want to have a look at the results for each run
df <- data.frame(lambda, means, sigmas, error, CI.min, CI.max, CI.covers.parameter)

# so, what's the coverage?
mean(CI.covers.parameter)
#[1] .95
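
If you play around with $\lambda$, you will see that if you make it very small (0.00001) or very large, the coverage probabilities will be poor.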
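One way to try this is to wrap the simulation in a function and evaluate it on a small grid of rates. The sketch below reuses f and the interval check from the code above. The helper name coverage and the particular grid of $\lambda$ values are illustrative choices, not part of the original post.

```r
# Function to be integrated (same as in the question)
f <- function(x) 1 / (cos(x)^2 + x^2)

# Reference value of the mean of f over [0, pi]
mu.num <- integrate(f, 0, pi)$value / pi

# Hypothetical helper: empirical coverage of the nominal 95% interval
# over N independent importance sampling runs of size B
coverage <- function(lambda, B = 10^4, N = 100) {
  covers <- replicate(N, {
    x <- rexp(B, lambda)
    I <- f(x) / dexp(x, lambda) * dunif(x, 0, pi)
    half.width <- 1.96 * sd(I) / sqrt(B)
    abs(mean(I) - mu.num) < half.width
  })
  mean(covers)
}

set.seed(1)
lambdas <- c(1e-5, 0.5, 1, 5, 20)   # illustrative grid of rates
cov.by.lambda <- sapply(lambdas, coverage)
names(cov.by.lambda) <- lambdas
print(cov.by.lambda)
```

The exact numbers depend on the seed and on $N$, so they should be read together with their own sampling error.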

Edit - - - -

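Regarding the coverage probability decreasing once you go from $B = 10^4$ to a larger $B$: that is just a random occurrence, due to the fact that you use $N = 100$ replications. The confidence interval for the coverage probability at $B = 10^4$ is

$$0.19 \pm 1.96\sqrt{\frac{0.19\,(1 - 0.19)}{100}} = 0.19 \pm 0.0769 = (0.1131,\ 0.2669),$$

so you cannot really say that increasing $B$ significantly lowers the coverage probability.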
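The interval for an estimated coverage can be checked with a few lines of R. This is a minimal sketch of the normal-approximation (Wald) interval for a proportion; the helper name wald.ci is hypothetical.

```r
# Hypothetical helper: normal-approximation (Wald) interval for a proportion
# p.hat estimated from n independent replications
wald.ci <- function(p.hat, n, z = 1.96) {
  half.width <- z * sqrt(p.hat * (1 - p.hat) / n)
  c(lower = p.hat - half.width, upper = p.hat + half.width)
}

# Coverage of 0.19 estimated from N = 100 replications
ci <- wald.ci(0.19, 100)
print(round(ci, 4))
```

This gives roughly (0.113, 0.267), and the coverage of 0.15 reported for the larger $B$ lies inside it.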

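In fact, in your code with the same seed, increase the number of replications $N$: the coverage probability at the smaller $B$ is then 0.123, and the coverage probability at the larger $B$ is higher.

Now, the confidence interval around 0.123 is

$$0.123 \pm 1.96\sqrt{\frac{0.123\,(1 - 0.123)}{N}},$$

where $N$ is the increased number of replications.

Thus, with more replications, you find that the coverage probability significantly increases.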

Source: https://stats.stackexchange.com/questions/260688
