R

使用模擬進行重要性採樣的覆蓋率低於預期

  • February 8, 2017

我試圖回答Evaluate integral with Importance sampling method in R的問題。基本上,用戶需要計算

使用指數分佈作為重要性分佈

並找到這可以更好地近似積分(它是self-study)。我將問題改寫為對平均值的評估的超過: 積分就是.

因此,讓成為的pdf, 然後讓: 現在的目標是估計

使用重要性抽樣。我在 R 中進行了模擬:

# clear the environment and set the seed for reproducibility
rm(list=ls())
gc()
graphics.off()
set.seed(1)

# function to be integrated
f <- function(x){
   1 / (cos(x)^2+x^2)
}

# importance sampling
importance.sampling <- function(lambda, f, B){
   x <- rexp(B, lambda) 
   f(x) / dexp(x, lambda)*dunif(x, 0, pi)
}

# mean value of f
mu.num <- integrate(f,0,pi)$value/pi

# initialize code
means  <- 0
sigmas <- 0
error  <- 0
CI.min <- 0
CI.max <- 0
CI.covers.parameter <- FALSE

# set a value for lambda: we will repeat importance sampling N times to verify
# coverage
N <- 100
lambda <- rep(20,N)

# set the sample size for importance sampling
B <- 10^4

# - estimate the mean value of f using importance sampling, N times
# - compute a confidence interval for the mean each time
# - CI.covers.parameter is set to TRUE if the estimated confidence 
# interval contains the mean value computed by integrate, otherwise
# is set to FALSE
j <- 0
for(i in lambda){
   I <- importance.sampling(i, f, B)
   j <- j + 1
   mu <- mean(I)
   std <- sd(I)
   lower.CB <- mu - 1.96*std/sqrt(B)  
   upper.CB <- mu + 1.96*std/sqrt(B)  
   means[j] <- mu
   sigmas[j] <- std
   error[j] <- abs(mu-mu.num)
   CI.min[j] <- lower.CB
   CI.max[j] <- upper.CB
   CI.covers.parameter[j] <- lower.CB < mu.num & mu.num < upper.CB
}

# build a dataframe in case you want to have a look at the results for each run
df <- data.frame(lambda, means, sigmas, error, CI.min, CI.max, CI.covers.parameter)

# so, what's the coverage?
mean(CI.covers.parameter)
# [1] 0.19

該代碼基本上是重要性採樣的簡單實現,遵循此處使用的符號。然後重複重要性採樣多次估計,並且每次檢查 95% 區間是否覆蓋實際平均值。

如您所見,對於實際覆蓋率僅為 0.19。並且越來越到值,例如沒有幫助(覆蓋範圍更小,0.15)。為什麼會這樣?

重要性抽樣對重要性分佈的選擇非常敏感。既然你選擇了,您繪製的樣本rexp的平均值為有方差. 這是你得到的分佈

在此處輸入圖像描述

但是,您要評估的積分從 0 到. 所以你想使用一個這給了你這樣的範圍。我用.

在此處輸入圖像描述

使用我將能夠探索0到的完整積分空間, 並且似乎只有幾次平局將被浪費。現在我重新運行你的代碼,只改變.

# clear the environment and set the seed for reproducibility
rm(list=ls())
gc()
graphics.off()
set.seed(1)

# function to be integrated
f <- function(x){
 1 / (cos(x)^2+x^2)
}

# importance sampling
importance.sampling <- function(lambda, f, B){
 x <- rexp(B, lambda) 
 f(x) / dexp(x, lambda)*dunif(x, 0, pi)
}

# mean value of f
mu.num <- integrate(f,0,pi)$value/pi

# initialize code
means  <- 0
sigmas <- 0
error  <- 0
CI.min <- 0
CI.max <- 0
CI.covers.parameter <- FALSE

# set a value for lambda: we will repeat importance sampling N times to verify
# coverage
N <- 100
lambda <- rep(1,N)

# set the sample size for importance sampling
B <- 10^4

# - estimate the mean value of f using importance sampling, N times
# - compute a confidence interval for the mean each time
# - CI.covers.parameter is set to TRUE if the estimated confidence 
# interval contains the mean value computed by integrate, otherwise
# is set to FALSE
j <- 0
for(i in lambda){
 I <- importance.sampling(i, f, B)
 j <- j + 1
 mu <- mean(I)
 std <- sd(I)
 lower.CB <- mu - 1.96*std/sqrt(B)  
 upper.CB <- mu + 1.96*std/sqrt(B)  
 means[j] <- mu
 sigmas[j] <- std
 error[j] <- abs(mu-mu.num)
 CI.min[j] <- lower.CB
 CI.max[j] <- upper.CB
 CI.covers.parameter[j] <- lower.CB < mu.num & mu.num < upper.CB
}

# build a dataframe in case you want to have a look at the results for each run
df <- data.frame(lambda, means, sigmas, error, CI.min, CI.max, CI.covers.parameter)

# so, what's the coverage?
mean(CI.covers.parameter)
#[1] .95

如果你玩,你會看到,如果你把它做得很小(.00001)或很大,覆蓋概率會很差。

編輯 - - - -

關於一旦你離開,覆蓋概率就會降低到,這只是一個隨機事件,基於您使用的事實複製。覆蓋概率的置信區間是,

所以你不能真的說增加 顯著降低覆蓋概率。

事實上,在你的代碼中為同一個種子,改變到,然後與,覆蓋概率為 0.123,並且覆蓋概率為.

現在,0.123 附近的置信區間為

因此,現在有了複製,您會發現覆蓋概率顯著增加。

引用自:https://stats.stackexchange.com/questions/260688

comments powered by Disqus