The 2026 ACIC Data Challenge
Introduction
After a multiyear hiatus, we are excited to announce the return of the ACIC Data Challenge. This year the challenge will focus on the complications that arise in situations with multiple treatments. This will help us to understand how causal inference approaches handle the competing demands of settings with a wide variety of estimands. What’s the best approach to avoiding overconfidence in situations involving multiple, dependent comparisons? Can the same method yield strong results for both ATEs and subgroup effects? What are the tradeoffs between accurately estimating the effects of different treatments versus determining which of those treatments is most efficacious? We hope to address questions like these and more!
In addition, a key goal this year is to create a more inclusive competition by lowering the barriers to entry with regard to computing power and level of automation. To that end, our datasets will all be reasonably small and we provide an optional submission track for only 18 representative datasets.
Submissions will be due on April 20, 2026. We will announce the results at ACIC in May!
Data Background
For the Data Challenge, we have created 9000 different datasets, each of which was drawn from a distinct large population. These populations differ not only in terms of the treatment assignment probabilities but also in terms of the size of the treatment effects and the complexity/heterogeneity/functional form of the (conditional) average treatment effect functions. However, you may assume that treatment was assigned completely at random within each population, so the standard assumptions of ignorability and overlap are satisfied. You may also assume that the Stable Unit Treatment Value Assumption (SUTVA) is satisfied. Across all populations, you may assume that \(a\) represents the control or “business as usual” condition and that \(b, c, d, e\) represent different treatment arms.
Each dataset contains \(n\) triplets \((\boldsymbol{\mathbf{x}}_{i}, z_{i}, y_{i})\) of \(p\) covariates \(\boldsymbol{\mathbf{X}}\), treatment indicator \(Z \in \{a,b,c,d,e\},\) and outcome \(Y.\) You may assume that these triplets were drawn from a much larger population of size \(N \gg n.\)
Covariates
There are continuous, binary, and nominal covariates. You may assume that the distributions of the covariates are the same across all populations considered in this challenge.
Target Estimands
Formally, each individual in the population has five potential outcomes \(Y_{i}(a), Y_{i}(b), Y_{i}(c), Y_{i}(d), Y_{i}(e)\), out of which exactly one is observed: \(y_{i} = Y_{i}(z_{i}).\) We will be interested in estimating four types of average treatment effects. First, we would like to estimate for each observed individual in the sample \(i = 1, \ldots, n,\) their individual-level conditional average treatment effect for each \(z \in \{b,c,d,e\}\): \[ \textrm{iCATE}(\boldsymbol{\mathbf{x}}_{i}, z) = \mathbb{E}[Y_{i}(z) - Y_{i}(a) \vert \boldsymbol{\mathbf{X}} = \boldsymbol{\mathbf{x}}_{i}]. \] The second type of effect is the sample-level conditional average treatment effect for each \(z \in \{b,c,d,e\}.\) \[ \textrm{sCATE}(z) = n^{-1}\sum_{i = 1}^{n}{\mathbb{E}[Y_{i}(z) - Y_{i}(a) \vert \boldsymbol{\mathbf{X}} = \boldsymbol{\mathbf{x}}_{i}]}. \]
The third type of effect is the subgroup-specific sample-level conditional average treatment effect within subgroups defined by the value of \(X_{12} \in \{0,1\}\) for each \(z \in \{b,c,d,e\}\): \[ \textrm{subCATE}(z,x) = \left[\sum_{i = 1}^{n}{\mathbf{1}(x_{i,12}= x)}\right]^{-1} \times \sum_{i=1}^{n}{\left[\mathbb{E}[Y_{i}(z) - Y_{i}(a) \vert \boldsymbol{\mathbf{X}} = \boldsymbol{\mathbf{x}}_{i}] \times \mathbf{1}(x_{i,12} = x)\right]}. \] The fourth effect is the population average treatment effect for each \(z \in \{b,c,d,e\}\): \[ \textrm{PATE}(z) = N^{-1}\sum_{i = 1}^{N}{Y_{i}(z) - Y_{i}(a)}. \]
In addition to computing point estimates and pointwise uncertainty intervals for each of these effects, we are interested in determining which treatment is most effective relative to \(a\), where positive treatment effect values are considered better/more effective than negative values. Specifically, we wish to compute the most effective treatment for
- Each individual \(i = 1, \ldots, n\) in the sample: \(\textrm{argmax}_{z}\, \textrm{iCATE}(\boldsymbol{\mathbf{x}}_{i}, z).\)
- On average, across the observed sample: \(\textrm{argmax}_{z}\, \textrm{sCATE}(z).\)
- On average, across the observed sample, within each subgroup defined by \(X_{12} \in \{0,1\}\): \(\textrm{argmax}_{z}\, \textrm{subCATE}(z,x).\)
- On average, across the whole population: \(\textrm{argmax}_{z} \textrm{PATE}(z).\)
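To make the relationships among these estimands concrete, here is a toy numerical sketch. All of the iCATE values, the subgroup indicator, and the sample size below are made up purely for illustration; the point is that sCATE and subCATE are (subgroup) averages of the iCATEs, and each "best" arm is an argmax at the corresponding level of aggregation.

```python
import numpy as np

# Hypothetical iCATE point estimates for n = 4 individuals and the four
# active arms b, c, d, e (rows: individuals, columns: arms).
arms = ["b", "c", "d", "e"]
icate = np.array([
    [0.5, 1.2, -0.3, 0.8],
    [0.1, 0.9,  0.4, 1.1],
    [0.7, 0.2,  0.6, 0.3],
    [0.4, 1.0,  0.5, 0.9],
])
x12 = np.array([0, 0, 1, 1])  # hypothetical subgroup indicator X_12

# sCATE(z): average the iCATEs over the observed sample.
scate = icate.mean(axis=0)

# subCATE(z, x): average the iCATEs within each subgroup defined by X_12.
subcate = {x: icate[x12 == x].mean(axis=0) for x in (0, 1)}

# "Best" arms: argmax over z at each level of aggregation.
best_icate = [arms[j] for j in icate.argmax(axis=1)]  # per individual
best_scate = arms[scate.argmax()]                     # whole sample
```

With these made-up numbers, arm c is best on average across the sample even though it is not the best arm for every individual, which is exactly the tension between the individual-level and aggregate estimands.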
Registration & Competition Timing
To participate in the competition, at least one member of every team needs to be a member of the Society for Causal Inference (see here for details on how to become a member of SCI). Please register for the challenge by April 1, 2026 using this form. Registration will provide you access to the data and allow us to assign you a team number (up to 5 submissions per team).
We will host two submission tracks:
- Curated: this track focuses on 18 datasets that are representative of the challenges created for this competition.
- The Works: Submit results for all 9000 datasets.
Preparing Your Submission
For each dataset you analyze, you should create a separate CSV file containing estimates and, in the case of the \(\textrm{iCATE}, \textrm{sCATE},\) \(\textrm{subCATE},\) and \(\textrm{PATE}\) evaluations, uncertainty intervals. Refer to the following sections for details about how to name and organize these files. For all estimands and datasets, you will need to include the following information in the submission filename:
- dataID: dataset ID. Each dataset is stored in a file named something like "data_dataID.csv", where dataID is a 4-digit number running from 0001 to 9000.
- teamID: a unique ID number assigned to each submission team. Team numbers will be assigned no later than April 1, 2026 by the organizers.
- submissionID: if you enter multiple submissions, you will need to include a numeric ID so that we can tell your submissions apart. See below for more details about multiple submissions.
Please create a single compressed .zip or .tar.gz archive containing all your submission files, a README file containing a short, narrative description of your solution, and reproducible code. The README can be a plain-text, (R)Markdown, Quarto, or PDF file. Submissions without a README will not be considered for discussion at the ACIC meeting or in any follow-up article describing the competition. Your compressed archive should be named teamID_submissionID.zip or teamID_submissionID.tar.gz.
\(\textrm{iCATE}\) Evaluations
For each dataset, submit a single CSV file containing point estimates and uncertainty intervals for the individual conditional average treatment effect functions evaluated at each observed covariate vector. These outputs should be arranged in “long” format with the following five named columns:
- “ID”: the observation identifier (numeric)
- “z”: treatment arm (i.e., “b”, “c”, “d”, or “e”) that is being contrasted with “a”, the business as usual/control condition (character).
- “Estimate”: the point estimate for \(\textrm{iCATE}(\boldsymbol{\mathbf{x}}_{\textrm{ID}}, \textrm{z}).\)
- “L95”: the lower bound for a 95% uncertainty interval for the corresponding CATE evaluation
- “U95”: the upper bound for a 95% uncertainty interval for the corresponding CATE evaluation
The CSV file should be named "iCATE_dataID_teamID_submissionID.csv". If the dataset contains \(n\) observations, this file should contain \(4n + 1\) rows, where the first row contains the header/column names.
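As a sketch of the expected layout, the Python snippet below assembles a long-format table with the five required columns. The estimates and the ±1 interval bounds are placeholders for illustration only, not the output of any real method, and n = 2 is chosen just to keep the example small.

```python
import csv
import io

# Placeholder iCATE output for n = 2 individuals; replace `est` and the
# interval bounds with your method's actual estimates.
rows = []
for i in (1, 2):                      # observation IDs
    for z in ("b", "c", "d", "e"):    # active treatment arms
        est = 0.0                     # placeholder point estimate
        rows.append({"ID": i, "z": z, "Estimate": est,
                     "L95": est - 1.0, "U95": est + 1.0})

# Write in the required long format; in a real submission, write to
# "iCATE_dataID_teamID_submissionID.csv" instead of an in-memory buffer.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["ID", "z", "Estimate", "L95", "U95"])
writer.writeheader()
writer.writerows(rows)

# With n observations the file has 4n + 1 lines: the header plus one row
# per (individual, active arm) pair.
n_lines = buf.getvalue().count("\n")
```

Here n = 2, so the buffer holds 4(2) + 1 = 9 lines.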
\(\textrm{sCATE}\) Evaluations
For each dataset, submit a single CSV file containing point estimates and uncertainty intervals for the sample average conditional average treatment effect functions. These outputs should be arranged in “long” format with the following four named columns:
- “z”: treatment arm (i.e., “b”, “c”, “d”, or “e”) that is being contrasted with “a”, the business as usual/control condition (character).
- “Estimate”: the point estimate for \(\textrm{sCATE}(\textrm{z})\) (numeric).
- “L95”: the lower bound for a 95% uncertainty interval for the corresponding CATE evaluation (numeric).
- “U95”: the upper bound for a 95% uncertainty interval for the corresponding CATE evaluation (numeric).
The CSV file should be named "sCATE_dataID_teamID_submissionID.csv" and should contain 5 rows (including the header/column names).
\(\textrm{subCATE}\) Evaluations
For each dataset, submit a single CSV file containing point estimates and uncertainty intervals for the average treatment effect within subgroups defined by \(X_{12} \in \{0,1\}.\) These outputs should be arranged in “long” format with the following named columns:
- “z”: treatment arm (i.e., “b”, “c”, “d”, or “e”) that is being contrasted with “a”, the business as usual/control condition (character).
- “x”: the value of \(X_{12}\) (i.e., 0 or 1) (numeric)
- “Estimate”: the point estimate for \(\textrm{subCATE}(\textrm{z}, x)\) (numeric).
- “L95”: the lower bound for a 95% uncertainty interval for the corresponding subCATE evaluation (numeric).
- “U95”: the upper bound for a 95% uncertainty interval for the corresponding subCATE evaluation (numeric).
The CSV file should be named "subCATE_dataID_teamID_submissionID.csv" and should only have 9 rows (including the header/column names).
\(\textrm{PATE}\) Evaluations
For each dataset, submit a single CSV file containing point estimates and uncertainty intervals for the population average treatment effect. These outputs should be arranged into the following columns:
- “z”: treatment arm (i.e., “b”, “c”, “d”, or “e”) that is being contrasted with “a”, the business as usual/control condition (character)
- “Estimate”: the point estimate for \(\textrm{PATE}(\textrm{z}).\) (numeric)
- “L95”: the lower bound for a 95% uncertainty interval for the corresponding PATE evaluation (numeric)
- “U95”: the upper bound for a 95% uncertainty interval for the corresponding PATE evaluation (numeric)
The CSV file should be named "PATE_dataID_teamID_submissionID.csv" and should only have 5 rows (including the header/column names).
Best \(\textrm{iCATE}\)
For each dataset, submit a single CSV file identifying the treatment arm that is most effective, on average, relative to \(a\) for each observed individual. These outputs should be arranged in “long” format with the following columns:
- “ID”: the observation identifier (numeric)
- “best_z”: treatment arm \(z\) (i.e., “b”, “c”, “d”, or “e”) that maximizes \(\textrm{iCATE}(\boldsymbol{\mathbf{x}}_{\textrm{ID}}, z)\) (character)
The CSV file should be named "BEST_iCATE_dataID_teamID_submissionID.csv". The dataset contains \(n\) observations, the submission file should have \(n+1\) rows (including the header/column names).
Best \(\textrm{sCATE}\)
For each dataset, submit a single CSV file identifying the treatment arm that is most effective relative to \(a\) on average across the observed sample. These outputs should be arranged into the following columns:
- “best_z”: treatment arm \(z\) (i.e., “b”, “c”, “d”, “e”) that maximizes \(\textrm{sCATE}(z)\) (character).
The CSV file should be named "BEST_sCATE_dataID_teamID_submissionID.csv" and should contain 2 rows (including the header/column names).
Best \(\textrm{subCATE}\)
For each dataset, submit a single CSV file identifying the treatment arm that is most effective relative to \(a\) within subgroups of the sample defined by \(X_{12}.\) These outputs should be arranged in “long” format with the following columns:
- “x”: the value of \(X_{12}\) (i.e., 0 or 1) (numeric).
- “best_z”: treatment arm \(z\) (i.e., “b”, “c”, “d”, or “e”) that maximizes \(\textrm{subCATE}(z,x)\) (character).
The CSV file should be named "BEST_subCATE_dataID_teamID_submissionID.csv" and should include 3 rows (including the header/column names).
Best \(\textrm{PATE}\)
For each dataset, submit a single CSV file identifying the treatment arm that is most effective relative to \(a\) averaged across the whole population. These outputs should be arranged in “long” format with the following columns:
- “best_z”: treatment arm \(z\) (i.e., “b”, “c”, “d”, or “e”) that maximizes \(\textrm{PATE}(z)\) (character)
The CSV file should be named "BEST_PATE_dataID_teamID_submissionID.csv" and should include 2 rows (including the header/column names).
Multiple Submissions
Each team is welcome to make up to 5 different submissions. Please be sure you explain how these submissions differ in your submitted README files.
Frequently Asked Questions
What evaluation criteria will be used?
We will compute relatively standard metrics (e.g., uncertainty interval coverage and width, bias and squared error for point estimates, etc.).
How will you construct a single ranking of methods/submissions?
Since different methods perform differently on different tasks, we will not attempt to construct a single ranking of submissions. Instead, we will report how each submission performs according to each evaluation metric.
Can we assume that each dataset was drawn uniformly at random from its respective population?
Yes, each dataset was drawn uniformly from its respective population.
Do we need to compute all estimands for all of the datasets?
Our hope is that you submit estimates for all estimands for all datasets! We understand, however, that (i) certain methods may not be designed for estimating certain estimands and (ii) it may be prohibitive to analyze all the datasets. So, please submit what you can and include a short explanation in the README of your submission.
Are the individual datasets drawn from distinct populations or one large population?
Each dataset is drawn from its own respective population. However, the marginal distribution of \(X\) is the same across all populations and, because the treatment is randomized, the marginal distribution of \(Z\) is the same across all populations. Only the conditional distributions of \(Y \vert X , Z\) differ across populations.
Are there any mediators or post-outcome variables?
No. You may assume that all \(X\)’s were measured before treatment was assigned and administered.
Are all the \(X_{j}\)’s confounders?
Recall that the treatment \(Z\) was randomly assigned. So, there are no confounders. However, some of the \(X\)’s may modify the effect of the treatment relative to the business as usual/control condition.
Are there any restrictions on generative AI usage?
We welcome the use of generative AI for this challenge! We only ask that you provide (i) a description of how you used generative AI in your submission README and (ii) reproducible analysis code.
Will you provide feedback on individual submissions?
No. After the submission window closes on April 20, we will evaluate each submission and report the results of the Data Challenge at the ACIC Meeting in May 2026.
Contact
If you have any questions, please do not hesitate to email us at acic2026datachallenge@gmail.com.