Beyond A/B testing: Using AI to uncover what really works
How causal inference and machine learning can unlock smarter corporate decision-making


As companies adopt artificial intelligence tools, increasing both the volume and the personalization of their offers, a central challenge remains:
“How do you sort out which AI tool or AI agent is the best one?” asked Professor Alexandre Belloni, the Westgate Distinguished Professor of Decision Sciences at Duke University’s Fuqua School of Business.
“How do you understand the value added of a new customer experience, or the return on investment of a new ad?”
According to Belloni, the answer lies in testing and measurement. He shared his insights in a recent talk on Fuqua’s LinkedIn page.
To determine the causal effect of a decision or tool, companies routinely use A/B testing: comparing outcomes reveals whether a new approach is working.
“With A/B testing, you can assess failure quickly,” Belloni said. “You don’t need to change your whole system without knowing if it actually works. You can delay the decision until you know it’s a good one.”
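At its core, a randomized A/B test reduces to comparing mean outcomes between two randomly assigned groups. A minimal sketch with simulated conversion data (the rates and sample sizes are illustrative assumptions, not figures from the talk):

```python
import random
import statistics

random.seed(0)

# Simulated conversion outcomes (illustrative rates): the control group
# sees the existing experience, the variant group sees the new one.
control = [1 if random.random() < 0.10 else 0 for _ in range(5000)]
variant = [1 if random.random() < 0.12 else 0 for _ in range(5000)]

# Because assignment was random, the causal estimate is simply the
# difference in mean outcomes between the two groups.
estimated_lift = statistics.mean(variant) - statistics.mean(control)
```

In practice one would pair this difference with a significance test and a power calculation before acting on it, but randomization is what makes the simple comparison causal.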
Successful A/B tests abound and have a long history. Belloni cited the 2008 Obama presidential campaign, which used A/B testing to find optimal ad combinations, significantly increasing sign-ups and donations. In another example, a travel agency randomly assigned some employees the option to work from home to measure productivity impacts—finding highly heterogeneous results.
But while still the gold standard, A/B testing isn’t always feasible, Belloni noted.
One limitation arises when participants can opt in or out of the treatment, which skews the results. Consider, for example, a company inviting customers to sign up for a credit card.
Suppose you are interested in assessing the effect of these credit cards on customers' purchases. Since participants in the treatment group can reject the offer, measuring the effect a card has on purchases becomes difficult because the group who accepted is self-selected, Belloni said.
“The group of customers accepting the offer might be quite different from the overall group,” Belloni explained. “So you don’t have a fair comparison with the control group.”
“This selection bias can significantly distort results,” he said.
A second challenge comes from interference between groups, where individuals influence one another.
Belloni pointed to vaccination programs, which try to assess infection rates in treatment groups who received the vaccine versus control groups who did not. In this example, unvaccinated individuals in the control group who happen to be surrounded by vaccinated people might show a lower infection risk simply because the vaccinated people around them shield them from exposure, interfering with the results measured for the untreated group.
A similar problem arises with ride-hailing platforms, where an incentive provided to drivers in one geographical area may affect the volume of rides of drivers in different areas, skewing the experiment’s findings.
This is where artificial intelligence—and specifically machine learning—can help with causal inference, Belloni said.
Machine learning models can estimate each customer’s likelihood to accept an offer. By reweighting data based on these tendencies, companies can adjust for selection bias and create fairer comparisons, he said.
By predicting which customers are most likely to accept the credit card offer, the company can isolate the true impact of the credit card itself—rather than simply measuring the behavior of customers who were already inclined to spend more.
“We can build a machine learning model to assess that propensity for us. That allows you to undo the unfair comparison and measure causality. We can even improve our estimates by combining these propensity estimates with machine learning estimates of the impact itself,” he said.
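The reweighting idea, known as inverse propensity weighting, can be illustrated with simulated data. In this sketch a single binary trait drives both offer acceptance and baseline spending, the "propensity model" is just the acceptance rate within each stratum (a stand-in for the richer machine learning model Belloni describes), and all numbers are invented for illustration:

```python
import random

random.seed(1)

# Simulated customers: one binary covariate (say, being a prior high
# spender) drives both offer acceptance and baseline purchases, so a
# naive acceptor-vs-rejector comparison is confounded.
TRUE_EFFECT = 10.0  # assumed causal lift in purchases from holding the card
data = []
for _ in range(20000):
    high = random.random() < 0.4
    accepts = random.random() < (0.7 if high else 0.2)  # self-selection
    base = 100.0 if high else 50.0
    purchases = base + (TRUE_EFFECT if accepts else 0.0) + random.gauss(0, 5)
    data.append((high, accepts, purchases))

# Naive comparison: acceptors vs. non-acceptors (biased by self-selection).
acc = [y for _, a, y in data if a]
rej = [y for _, a, y in data if not a]
naive = sum(acc) / len(acc) - sum(rej) / len(rej)

# Propensity "model": acceptance rate within each covariate stratum.
prop = {}
for h in (True, False):
    grp = [a for hh, a, _ in data if hh == h]
    prop[h] = sum(grp) / len(grp)

# Inverse propensity weighting: reweight each group so it resembles the
# full customer population, undoing the unfair comparison.
tn = td = cn = cd = 0.0
for h, a, y in data:
    p = prop[h]
    if a:
        tn += y / p; td += 1.0 / p
    else:
        cn += y / (1.0 - p); cd += 1.0 / (1.0 - p)
ipw_effect = tn / td - cn / cd
```

Here the naive comparison badly overstates the card’s effect, while the reweighted estimate recovers roughly the true lift. Combining such propensity weights with a machine-learning model of the outcome itself, as Belloni mentions, leads to the doubly robust estimators used in practice.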
Causal inference can help organizations in complex, real-world environments, but might require designing more sophisticated randomized experiments, Belloni said.
“For example, in many business settings it is very practical to monitor users at different points in time,” he said. “In such cases, we can switch users between treatment and control at different times to mitigate network effects or interference.”
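One such design is the switchback experiment: rather than splitting users or regions at the same moment, the whole system alternates between treatment and control in time blocks, so treated and untreated units never interact within a period. A sketch, with an invented schedule, ride volumes, and incentive effect:

```python
import random

random.seed(2)

# Switchback design: the whole market alternates between treatment and
# control in time blocks, so drivers are never split within a period.
PERIODS, BLOCK = 48, 4  # e.g., 48 hours, switching every 4 hours
schedule = [(t // BLOCK) % 2 == 0 for t in range(PERIODS)]  # True = treated

rides = []
for t, treated in enumerate(schedule):
    base = 100 + 20 * (t % 24 >= 8)  # crude time-of-day demand pattern
    lift = 15 if treated else 0      # assumed effect of the incentive
    rides.append(base + lift + random.gauss(0, 5))

# With an alternating schedule, treated and control periods are balanced
# over the day, so a simple difference in means estimates the effect.
n_treated = sum(schedule)
treated_mean = sum(r for r, s in zip(rides, schedule) if s) / n_treated
control_mean = sum(r for r, s in zip(rides, schedule) if not s) / (PERIODS - n_treated)
effect = treated_mean - control_mean
```

Choosing the block length involves a trade-off: shorter blocks balance out time trends better, while longer blocks reduce carryover from one period into the next.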
And while tech giants have long invested in these tools and many other techniques, even smaller companies can benefit, he added.
“Some companies starting A/B testing today don’t need a fine-tuned AI system to gain an advantage,” he said. “These tools can be relatively inexpensive and still add significant value—even for smaller businesses.”
First Published: Nov 26, 2025, 13:41