15 Simulating 95% confidence intervals

16 Load Libraries

library(tidyverse)

17 Simulating 95% frequentist confidence intervals

A good explanation here:

A 95% confidence interval is constructed such that if the model assumptions are correct and if you were to hypothetically repeat the experiment or sampling many many times, 95% of the intervals constructed would contain the true value of the parameter.

In my own words: a 95% confidence interval comes from a procedure that, assuming the model is correct, produces intervals containing the true parameter in 95% of repeated experiments.

Let’s build intuition for what this means:

1. Simulate a dataset with a known true parameter, and repeat this many times
2. Calculate a 95% confidence interval for each dataset, say with a t-test
3. Determine how often the true parameter (which we set in step 1) lies between the lower and upper confidence limits
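
Before running the full simulation, it may help to see what the t-test interval actually is. The sketch below is only an illustration: the seed and the names `se`, `t_crit`, `manual_ci`, and `ttest_ci` are assumptions introduced here, not part of the original analysis. It builds the interval by hand as the sample mean plus or minus a t critical value times the standard error, and compares it with what `t.test()` reports.

```r
set.seed(1)  # arbitrary seed, only so this sketch is repeatable

n <- 100
x <- rnorm(n, mean = 10, sd = 1)

# t-interval by hand: sample mean +/- critical value * standard error
se <- sd(x) / sqrt(n)
t_crit <- qt(0.975, df = n - 1)
manual_ci <- mean(x) + c(-1, 1) * t_crit * se

# interval reported by t.test() (default conf.level = 0.95)
ttest_ci <- as.vector(t.test(x)$conf.int)

# the two should agree up to floating-point tolerance
all.equal(manual_ci, ttest_ci)
```

The interval depends on the sample through `mean(x)` and `sd(x)`, which is why it moves from one simulated dataset to the next while the true mean stays fixed.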

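# step 1: simulation settings
sim <- 10000   # number of simulated datasets
n <- 100       # observations per dataset
# sample one dataset from a normal distribution with mean 10 and sd 1
x <- rnorm(n, mean = 10, sd = 1)
# fit a t-test; grab the lower and upper confidence limits
# as.vector(c(t.test(x)$conf.int, t.test(x)$estimate))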


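## now repeat across all sim datasets

# a for loop is probably the simplest approach
# preallocate a sim x 2 matrix for the lower and upper limits
# d <- tibble(lower = rep(0, sim), upper = rep(0, sim), mean = rep(0, sim))
d <- array(0, dim = c(sim, 2))

for (i in seq_len(sim)) {
  x <- rnorm(n, mean = 10, sd = 1)
  d[i, ] <- as.vector(t.test(x)$conf.int)
}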


#head(d)
d<-data.frame(d)
names(d)<-c("lower","upper")
knitr::kable(head(d))

lower	upper
9.860686	10.22568
9.782582	10.20433
9.824811	10.21975
9.907613	10.36693
9.774117	10.17793
9.747729	10.14650

# flag intervals that contain the true value of 10 (out = 1 means covered),
# then compute the proportion of intervals that cover it
d <- d |>
  mutate(out = 1 * (lower < 10 & upper > 10))
mean(d$out)

[1] 0.946
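
The observed coverage of 0.946 is not exactly 0.95 because it is itself an estimate from a finite number of simulations. A rough way to judge the gap is the Monte Carlo standard error of a proportion. The sketch below plugs in the values reported above (0.946 from 10000 simulations); the variable names are introduced here for illustration, and the normal approximation is an assumption.

```r
# values taken from the simulation output above
p_hat <- 0.946   # observed coverage
sims  <- 10000   # number of simulated datasets

# Monte Carlo standard error of an estimated proportion
mc_se <- sqrt(p_hat * (1 - p_hat) / sims)

# approximate 95% range for the true coverage (normal approximation)
mc_range <- p_hat + c(-1, 1) * qnorm(0.975) * mc_se

round(mc_se, 4)
round(mc_range, 4)
```

The standard error is about 0.002, so the nominal 0.95 falls inside the approximate range and the simulation is consistent with the interval having its advertised coverage.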

# cases where the confidence interval does not include the true parameter
d |>
  filter(out == 0) |>
  head()

      lower     upper out
1  9.628113  9.997948   0
2 10.022016 10.348628   0
3  9.629094  9.989865   0
4  9.516649  9.916219   0
5  9.579483  9.978175   0
6  9.619746  9.995147   0
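
The first few misses happen to fall mostly below 10, but that is just the order in which they appear. For a two-sided t-interval one would expect the misses to split roughly evenly, about 2.5% on each side. The sketch below reruns a simulation of the same shape to check this; the seed and the names `ci`, `too_low`, and `too_high` are assumptions made for this example, so the numbers will not match the run above exactly.

```r
set.seed(2)  # arbitrary seed, only so this sketch is repeatable

sim <- 10000
n <- 100
true_mean <- 10

# one row per simulated dataset: lower and upper confidence limits
ci <- t(replicate(
  sim,
  as.vector(t.test(rnorm(n, mean = true_mean, sd = 1))$conf.int)
))

# share of intervals lying entirely below / entirely above the true mean
too_low  <- mean(ci[, 2] < true_mean)
too_high <- mean(ci[, 1] > true_mean)

c(too_low = too_low, too_high = too_high)
```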

Additional notes:

Cementing the interpretation: a single 95% CI computed from a single sample does not mean that the population mean lies in that particular interval with 95% probability. The true mean is fixed, so a given interval either contains it or it does not. The 95% describes the procedure: if you were to repeat the experiment many times and calculate the interval on each of the samples, then 95% of those intervals would contain the true population mean.
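
One way to check that the percentage is a property of the procedure is to change the confidence level and see whether the coverage follows. The sketch below is a minimal illustration, not part of the original analysis: `simulate_coverage()` is a hypothetical helper written for this example, and the seed, the number of simulations, and the chosen levels are arbitrary assumptions. It relies on the `conf.level` argument of `t.test()`.

```r
# hypothetical helper: proportion of intervals that contain the true mean
simulate_coverage <- function(level, sims = 2000, n = 100, true_mean = 10) {
  covered <- replicate(sims, {
    x <- rnorm(n, mean = true_mean, sd = 1)
    ci <- t.test(x, conf.level = level)$conf.int
    ci[1] < true_mean && ci[2] > true_mean
  })
  mean(covered)
}

set.seed(3)  # arbitrary seed, only so this sketch is repeatable

conf_levels <- c(0.80, 0.90, 0.95, 0.99)
coverage <- sapply(conf_levels, simulate_coverage)

data.frame(level = conf_levels, coverage = coverage)
```

With only 2000 simulations per level the estimates are noisy, but each coverage value should land near its nominal level.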

18 Session info

sessionInfo()

R version 4.5.0 (2025-04-11 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.9.4 forcats_1.0.0   stringr_1.5.1   dplyr_1.1.4    
 [5] purrr_1.0.4     readr_2.1.5     tidyr_1.3.1     tibble_3.2.1   
 [9] ggplot2_3.5.2   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] gtable_0.3.6      jsonlite_2.0.0    compiler_4.5.0    tidyselect_1.2.1 
 [5] scales_1.3.0      yaml_2.3.10       fastmap_1.2.0     R6_2.6.1         
 [9] generics_0.1.3    knitr_1.50        htmlwidgets_1.6.4 munsell_0.5.1    
[13] pillar_1.10.2     tzdb_0.5.0        rlang_1.1.6       stringi_1.8.7    
[17] xfun_0.52         timechange_0.3.0  cli_3.6.4         withr_3.0.2      
[21] magrittr_2.0.3    digest_0.6.37     grid_4.5.0        hms_1.1.3        
[25] lifecycle_1.0.4   vctrs_0.6.5       evaluate_1.0.3    glue_1.8.0       
[29] colorspace_2.1-1  rmarkdown_2.29    tools_4.5.0       pkgconfig_2.0.3  
[33] htmltools_0.5.8.1