15 Simulating 95% confidence intervals
16 Load Libraries
library(tidyverse)
17 Simulating 95% frequentist confidence intervals
A good explanation here:
A 95% confidence interval is constructed such that if the model assumptions are correct and if you were to hypothetically repeat the experiment or sampling many many times, 95% of the intervals constructed would contain the true value of the parameter.
In my own words: a 95% confidence interval comes from a procedure that, assuming the model is correct, produces intervals containing the true parameter in 95% of repeated experiments.
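The same statement can be written as a coverage probability. This is a sketch in notation not used in the original: $\theta$ is the fixed true parameter, $X$ is the random sample, and $L(X)$ and $U(X)$ are the lower and upper limits computed from it. The probability is taken over repeated samples, not over $\theta$, and the equality holds exactly only when the model assumptions hold.

```latex
$$
P_\theta\bigl( L(X) \le \theta \le U(X) \bigr) = 0.95
$$
```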
Let’s build intuition for what this means:
- Simulate a dataset with a known true parameter, and repeat this many times
- Calculate a 95% confidence interval for each dataset with, say, a t-test
- Count how often the true parameter (which we set in step 1) falls between the lower and upper confidence limits
# 1) simulate data
# number of simulated datasets
sim <- 10000
# dataset size
n <- 100
# sample data with mean = 10, sd = 1
x <- rnorm(n, mean = 10, sd = 1)
# fit t.test; grab the lower and upper confidence limits
# as.vector(c(t.test(x)$conf.int, t.test(x)$estimate))
# now repeat across all sim datasets
# a for loop is probably simplest
# prepare a matrix to hold the lower and upper limits
# d <- tibble(lower = rep(0, sim), upper = rep(0, sim), mean = rep(0, sim))
d <- array(0, dim = c(sim, 2))
for (i in seq_len(sim)) {
  x <- rnorm(n, mean = 10, sd = 1)
  d[i, ] <- as.vector(t.test(x)$conf.int)
}
# head(d)
d <- data.frame(d)
names(d) <- c("lower", "upper")
knitr::kable(head(d))

| lower | upper |
|---|---|
| 9.860686 | 10.22568 |
| 9.782582 | 10.20433 |
| 9.824811 | 10.21975 |
| 9.907613 | 10.36693 |
| 9.774117 | 10.17793 |
| 9.747729 | 10.14650 |
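To see what these limits are, it can help to rebuild one interval by hand. For a one-sample t-test, the 95% interval is the sample mean plus or minus the 0.975 t quantile (with n - 1 degrees of freedom) times the standard error. The sketch below is not part of the original analysis: the seed and the names `manual_ci` and `ttest_ci` are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, assumed here only for reproducibility
n <- 100
x <- rnorm(n, mean = 10, sd = 1)

# manual 95% t interval: mean +/- t quantile * standard error
se <- sd(x) / sqrt(n)
manual_ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

# interval reported by t.test() for the same data
ttest_ci <- as.vector(t.test(x)$conf.int)

# the two should agree up to floating-point tolerance
all.equal(manual_ci, ttest_ci)
```

Because the limits depend on `mean(x)` and `sd(x)`, they change with every simulated dataset, which is why each row of the table differs.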
# flag intervals that contain the true value of 10, then compute the proportion that do
d <- d |>
  mutate(out = 1 * (lower < 10 & upper > 10))
mean(d$out)
[1] 0.946
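The observed coverage of 0.946 is not exactly 0.95 because it is itself an estimate from a finite number of simulations. A rough check, assuming the simulated datasets are independent, treats the coverage as a binomial proportion and computes its Monte Carlo standard error. The names `nominal`, `observed`, `mc_se`, and `z` are introduced here for illustration.

```r
sim <- 10000       # number of simulated datasets, as in the text
nominal <- 0.95    # nominal coverage
observed <- 0.946  # coverage reported above

# standard error of a proportion estimated from sim independent draws
mc_se <- sqrt(nominal * (1 - nominal) / sim)

# distance between observed and nominal coverage in standard-error units
z <- (observed - nominal) / mc_se

c(mc_se = mc_se, z = z)
```

The standard error is about 0.0022, so 0.946 sits within two standard errors of 0.95, which is consistent with simulation noise.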
# cases where the confidence interval does not include the true parameter
d |>
  filter(out == 0) |>
  head()
  lower upper out
1 9.628113 9.997948 0
2 10.022016 10.348628 0
3 9.629094 9.989865 0
4 9.516649 9.916219 0
5 9.579483 9.978175 0
6 9.619746 9.995147 0
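An interval can miss in two ways: it can lie entirely below the true mean (upper limit under 10) or entirely above it (lower limit over 10). Because the normal distribution is symmetric, the misses should split roughly evenly, about 2.5% on each side. The self-contained sketch below reruns a smaller simulation to check this. The seed, the smaller `sim`, and the names `ci`, `miss_low`, `miss_high`, and `covered` are assumptions for illustration, not the original code.

```r
set.seed(2)  # arbitrary seed, assumed here only for reproducibility
sim <- 2000  # fewer simulations than in the text, to keep the sketch quick
n <- 100
mu <- 10     # true mean

# each replicate returns c(lower, upper); transpose to get one row per dataset
ci <- t(replicate(sim, as.vector(t.test(rnorm(n, mean = mu, sd = 1))$conf.int)))

miss_low  <- mean(ci[, 2] < mu)               # whole interval below the true mean
miss_high <- mean(ci[, 1] > mu)               # whole interval above the true mean
covered   <- mean(ci[, 1] < mu & ci[, 2] > mu)  # interval contains the true mean

c(miss_low = miss_low, miss_high = miss_high, covered = covered)
```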
Additional notes:
Cementing the interpretation: a single 95% CI computed from a single sample does not mean that the population mean lies in that particular interval with 95% probability. Rather, if you repeated the experiment many times and calculated the interval on each of the samples, about 95% of those intervals would contain the true population mean.
18 Session info
sessionInfo()
R version 4.5.0 (2025-04-11 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)
Matrix products: default
LAPACK version 3.12.1
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
[5] purrr_1.0.4 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
[9] ggplot2_3.5.2 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] gtable_0.3.6 jsonlite_2.0.0 compiler_4.5.0 tidyselect_1.2.1
[5] scales_1.3.0 yaml_2.3.10 fastmap_1.2.0 R6_2.6.1
[9] generics_0.1.3 knitr_1.50 htmlwidgets_1.6.4 munsell_0.5.1
[13] pillar_1.10.2 tzdb_0.5.0 rlang_1.1.6 stringi_1.8.7
[17] xfun_0.52 timechange_0.3.0 cli_3.6.4 withr_3.0.2
[21] magrittr_2.0.3 digest_0.6.37 grid_4.5.0 hms_1.1.3
[25] lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.3 glue_1.8.0
[29] colorspace_2.1-1 rmarkdown_2.29 tools_4.5.0 pkgconfig_2.0.3
[33] htmltools_0.5.8.1