── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(marginaleffects)library(gtsummary)
15 Demonstrating non-collapsibility of odds ratios
Non-collapsibility refers to the fact that the conditional and marginal OR do not match. The conditional and marginal OR estimate agree in identity link regression model, so it is collapsible.
Simulate data first: outcome, treatment, and one continuous covariate
########----##n <-5000# 1) Simulate covariates (examples)x1 <-rnorm(n, mean =0, sd =1) # continuoustreatment <-rbinom(n, size =1, prob =0.5) # binary indicator, # 2) Specify true effects (log-odds scale). Include factor via contrasts.# For factors, use model.matrix or include as factor in a data.frame and formula.dat <-data.frame(x1, treatment)# Build design matrix with contrasts automaticallyX <-model.matrix(~ x1+treatment, data = dat) # includes intercept and two dummies for x4# Coefficients (choose your truth)beta <-c("(Intercept)"=-2.0, # baseline log-odds"x1"=0.8,"treatment"=-0.7)# 3) Linear predictor and probabilitieslinpred <-as.numeric(X %*% beta)p <-1/ (1+exp(-linpred)) # inverse-logit# 4) Simulate binary outcomey <-rbinom(n, size =1, prob = p)dat|>cbind(y)|>count(treatment,y)
treatment y n
1 0 0 2137
2 0 1 381
3 1 0 2277
4 1 1 205
#(213/2317)-(380/2090)# Inspect prevalence and fit back a model to check recovery#mean(y)mod1<-glm(y ~ x1 + treatment, data = dat, family =binomial())mod1|>tbl_regression(exponentiate =TRUE)
Characteristic
OR
95% CI
p-value
x1
2.34
2.12, 2.59
<0.001
treatment
0.47
0.39, 0.56
<0.001
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
mod2<-glm(y ~ treatment, data = dat, family =binomial())mod2|>tbl_regression(exponentiate =TRUE)
Characteristic
OR
95% CI
p-value
treatment
0.50
0.42, 0.60
<0.001
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
The regression coefficients for x3 (binary treatment effect) differ. (The true OR is 0.5)
17 Let’s simulate more datasets and measure the OR and probability difference (ATE)
n <-5000simdatORG<-function(n){# 1) Simulate covariates (examples)x1 <-rnorm(n, mean =0, sd =1) # continuoustreatment <-rbinom(n, size =1, prob =0.5) # binary indicator, # 2) Specify true effects (log-odds scale). Include factor via contrasts.# For factors, use model.matrix or include as factor in a data.frame and formula.dat <-data.frame(x1, treatment)# Build design matrix with contrasts automaticallyX <-model.matrix(~ x1+treatment, data = dat) # includes intercept and two dummies for x4# Coefficients (choose your truth)beta <-c("(Intercept)"=-2.0, # baseline log-odds"x1"=0.8,"treatment"=-0.7)# 3) Linear predictor and probabilitieslinpred <-as.numeric(X %*% beta)p <-1/ (1+exp(-linpred)) # inverse-logit# 4) Simulate binary outcomey <-rbinom(n, size =1, prob = p)# Inspect prevalence and fit back a model to check recovery#mean(y)mod1<-glm(y ~ x1 + treatment, data = dat, family =binomial())#mod1$coefficients[3]mod2<-glm(y ~ treatment, data = dat, family =binomial())#mod2$coefficients[2]##g-computg1<-avg_comparisons(mod1,newdata=dat,variables="treatment")#g1$estimateg2<-avg_comparisons(mod2,newdata=dat,variables="treatment")#g2$estimateouttbl<-tibble(condbiasOR=exp(mod1$coefficients[3]),margbiasOR=exp(mod2$coefficients[2]),cmm=condbiasOR-margbiasOR,condbiasg=g1$estimate,margbiasg=g2$estimate,atecmm=condbiasg-margbiasg)return(outtbl)}#simulationsim0.1<-tibble(n=1:1000)|>group_by(n)|>do(out=simdatORG(n=1000))|>unnest(out)#sim0.1comptbl<-sim0.1|>select(-n)|>tbl_summary(label =list(condbiasOR~"Conditional OR",margbiasOR~"Marginal OR",condbiasg~"Conditional ATE",margbiasg~"Marginal ATE",cmm~"Condition-Marginal OR",atecmm~"Condition-Marginal ATE"))|>as_gt()comptbl
Across 1,000 simulations, the OR differ between a conditional OR vs marginal OR, but ATE remains the same between models (multivariable vs marginal model).