How Eppo accelerates experimentation for thousands of customers, with Ryan Lucht

August 12, 2025
From tripling a startup’s revenue overnight to evangelising experimentation at Eppo by Datadog, Ryan Lucht has built deep expertise in the field. He’s now diving into the decisions and culture shifts that help teams make better use of their data, no matter their size or stage.

Evangelism, insights, and innovation at Eppo by Datadog

Can you elaborate on your role at Eppo and Datadog, particularly regarding evangelism and the initiatives you lead?

Much of my work at Eppo and Datadog has been helping both customers and non-customers navigate and adopt experimentation within their organisations. My initial pitch to Eppo was that while they had built an incredible tool, many barriers to fully adopting experimentation can’t be solved by tooling – they’re cultural in nature. So my work involves researching, writing and speaking about the challenges in scaling experimentation, amplifying the stories of leaders who have successfully overcome them, connecting smart folks in the space, and supporting other work towards growing experimentation globally.

We publish “Outperform,” which is a podcast, blog, and now a 300-page print magazine being shipped out soon. It features content from across the experimentation space, not just Eppo/Datadog, with authors from companies like Booking.com, Twitch, Canva, and Groupon, sharing their success stories and insights. There’s no “one size fits all” advice for this stuff, so hearing peer stories and building connections is crucial.

We’re also co-producing this year’s edition of the EXL conference in Austin this October with Speero, an experimentation-led growth consulting firm. It’s unique in its blend of topics. Many conferences today focus on “what to test,” attracting marketers, UX researchers, and product managers discussing hypotheses. Separately, there are “platform” conferences for experimentation leaders from large tech companies like Netflix and Amazon, who focus on underlying methodology and scaling support for diverse use cases rather than individual hypotheses. There has been no space where these two sides interact. EXL aims to bring both groups together to learn from each other: platform people need to understand end-users and use cases, and marketers, designers, and product folks need to understand how to run good experiments and increase velocity. The conference will feature short talks, but most of the time will be dedicated to curated small-group discussions led by about 50 moderators who are successful experimentation leaders from various industries and company sizes. It’s primarily about peer-to-peer connection rather than passive listening.

What are some of the most interesting or surprising use cases you have encountered with Eppo customers across different industries?

There are tons of interesting stories, though I’d need to anonymise some details. Eppo customers span a wide range of industries, including software companies, dog food delivery, banks, insurance companies, greeting card manufacturers, and various physical product and e-commerce businesses. A key advantage of modern experimentation tools like Eppo is the ability to experiment on anything, not just websites. For example, one retailer’s finance department wanted to renegotiate their agreements with the brands they stocked. They ran a series of experiments by excluding certain brands from the shopping experience to observe if customers would substitute those brands or avoid purchasing altogether if their preferred brand was unavailable. This allowed them to understand each brand’s market power and use that information to renegotiate deals.

Companies have also tested the impact of billboard campaigns using Eppo. While you cannot A/B test who sees a billboard, you can use quasi-experimental designs, part of a suite of tools referred to as “incrementality tests” in marketing. There are also advanced experiment designs like “switchbacks”: in two-sided marketplaces like Uber, running A/B tests is challenging due to interference effects between drivers and riders. In such cases, “switchback tests” are used, where units of time are randomised instead. Uber might switch a treatment on and off hundreds of times over a two-week period, then normalise and compare the performance of those time periods. Careful consideration is needed, for example, to ensure rush hour windows are distributed evenly between control and treatment groups. Other customers like Perplexity are using Eppo to test and orchestrate the matching of various AI models to use cases within their product and userbase. There’s never a boring day.
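
To make the switchback idea concrete, here is a minimal sketch of a stratified switchback schedule in Python. It is an illustration under assumed parameters, not Eppo’s implementation: each hour-long window over a two-week test is randomised to treatment or control, and the randomisation is stratified by hour of day so that, for example, rush-hour windows land evenly in both arms.

```python
import random
from collections import defaultdict

def assign_switchbacks(num_days: int = 14, window_hours: int = 1, seed: int = 42) -> dict:
    """Randomise (day, hour) time windows to treatment/control, stratified
    by hour of day so each hour's windows are split evenly across arms."""
    rng = random.Random(seed)
    windows_by_hour = defaultdict(list)
    for day in range(num_days):
        for hour in range(0, 24, window_hours):
            windows_by_hour[hour].append((day, hour))

    assignment = {}
    for windows in windows_by_hour.values():
        rng.shuffle(windows)
        half = len(windows) // 2
        for window in windows[:half]:
            assignment[window] = "treatment"
        for window in windows[half:]:
            assignment[window] = "control"
    return assignment

schedule = assign_switchbacks()
# The 8am window is treatment on some days and control on others.
print([schedule[(day, 8)] for day in range(5)])
```

Analysis then compares the normalised performance of treatment windows against control windows, rather than comparing users.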

Driving experimentation velocity and culture

How do you advise teams to cultivate the necessary culture and structures to increase their experimentation velocity, moving from a few experiments to thousands per year?

Velocity is primarily driven by two factors: adoption and efficiency. Trustworthiness is another important consideration, but it is more of a foundational prerequisite, so let us take it as a given.

Adoption, meaning how many people across the organisation are interested in and actually running experiments, is determined by both cultural and technical considerations. If we want the marketing team to run experiments, we need to provide them with a way to do so that does not require significant engineering resources, as marketers typically are not in the product’s codebase, nor do they often have dedicated engineering headcount. Then culturally, some teams readily embrace experimentation, while others are very resistant. Even in large tech companies, the growth story of experimentation often involves an early adopter team that built a beachhead and served as an example for the rest of the organisation. For instance, at Microsoft, the culture of experimentation wasn’t just magically ingrained from the start. When Ronny Kohavi joined Microsoft from Amazon, he had to work with various teams to find adopters, and eventually Bing became that exemplar or beachhead. It was only when Satya Nadella, formerly head of Bing, became CEO of all of Microsoft that experimentation became a mandate across the company, because he knew its importance firsthand.

How about efficiency? What measures can organisations take to significantly reduce the cost per experiment, and how does this impact overall experimentation efforts?

The other crucial factor for velocity is efficiency, which I encourage leaders to consider as “cost per experiment”. Jeff Bezos highlighted this idea as early as the mid-2000s, stating that “the key is to reduce the cost per experiment” as close to zero as possible. This requires both infrastructure and effective ways of working. Technically, launching an experiment needs to be easy and low friction, but it also needs to be simple in terms of approvals and man-hours spent. If an experiment requires three meetings for approval, or if most people are told they cannot try something, it becomes very expensive in terms of time. Amazon has a cultural tenet called “the institutional yes,” meaning if you want to try something, the default answer is yes, go try it. The onus is on those who wish to argue against trying something to make that case.

Regarding the make-up of cost per experiment, we can break it down into a few components. A very small slice is “compute,” the actual bill for computing power used to crunch numbers in a data warehouse. Then there are infrastructure costs, such as an Eppo license or headcount for an in-house platform team. This is a larger chunk, but it amortises across all experiments, so the more experiments you run, the lower the incremental impact per experiment.

The bulk of the cost comprises all the steps for each individual experiment: generating ideas, conducting research to form hypotheses, writing experiment plans, developing code or designs for treatments, and meetings to analyse results and decide next steps. To reduce these costs, AI is becoming capable of writing code for experiments, especially code that does not need to live forever or be highly performant. Figma is also working on AI tools for design. You can also reduce discussion and analysis costs by pre-registering decisions, as in medical trials: by deciding upfront what actions to take based on potential outcomes, you can skip post-experiment cycles and reduce bias. The cost per experiment continues to decrease annually as the tech stack advances, complemented by necessary cultural changes. There is also a hidden opportunity cost: the experiments you could have run if each one were cheaper, and the delay in rolling out winning ideas.
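
To illustrate pre-registration, a decision plan can be as simple as a table of outcomes mapped to actions, committed to before launch. The outcome labels and actions below are hypothetical, just to show the shape of the idea:

```python
# A hypothetical pre-registered decision plan, written down before launch.
DECISION_PLAN = {
    ("primary_win", "guardrails_ok"):       "ship treatment to 100%",
    ("primary_win", "guardrail_breached"):  "hold and investigate the guardrail",
    ("primary_flat", "guardrails_ok"):      "revert; archive the learnings",
    ("primary_loss", "guardrails_ok"):      "revert treatment",
    ("primary_loss", "guardrail_breached"): "revert treatment immediately",
}

def decide(primary_result: str, guardrail_result: str) -> str:
    """Look up the pre-committed action; anything unanticipated escalates."""
    return DECISION_PLAN.get((primary_result, guardrail_result),
                             "escalate to data science review")

print(decide("primary_win", "guardrails_ok"))  # -> ship treatment to 100%
```

Because the action is decided before the data comes in, there is no post-hoc debate to pay for and less room for motivated reasoning.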


The foundations of experimentation success

Could you share your journey into the experimentation field and what initially drove your passion for it?

My start in experiments was a little over a decade ago at a small e-learning startup: a job interview training company with a strong content machine, but very few conversions to its paid app. As the second full-time hire and a marketer, I quickly realised that running experiments on all the traffic our content brought in was the most powerful way to grow the business. At the time, Optimizely cost $30 a month, which gives you a sense of how long ago it was. We grew the business exponentially. One experiment tripled our revenue essentially overnight: it turns out that people motivated to seek job interview training were willing to pay much more than we were charging. That got me hooked on experimentation.

When it was time for my next opportunity, I joined Cro Metrics, a boutique consulting firm and a large Optimizely partner. Their whole focus was experimentation, and we helped companies like Zillow, DoorDash, and Uber understand how to leverage this. Big tech companies had been doing it for some time, but the rest of the world was just starting to figure it out. It was while I was at Cro Metrics that I met the Eppo team, saw an early pitch deck, and loved the vision. It was only a matter of time until I joined Eppo (and now Datadog).

Is there still a significant gap in experimentation practices between large tech companies and mid-tier companies today?

Yes, the gap is very real, but it is shrinking. The largest experimentation companies, such as Microsoft, Amazon, Netflix, and Google, regularly report running tens of thousands or even hundreds of thousands of experiments annually. Very few companies operate at that scale. The disparity stems from both tooling and infrastructure; you need specific infrastructure to support experimentation at such a scale. I chose to join Eppo partly because I believed they were developing a product that would enable a broader range of companies, not just the tech giants, to achieve that level of experimentation.

However, the gap is also cultural. At companies running so many experiments, a significant portion are automated, such as canary tests for new code deployments. This is different from the marketing space, where we might manually build and launch tests for new homepage copy or onboarding flows. While it is still not an easy mountain to climb, if a company has the cultural aspiration and invests in the correct infrastructure, they can make so much progress. Many Eppo by Datadog customers today are running thousands, even tens of thousands, of experiments annually, which was not a possibility for most companies five to seven years ago.

Advancements, the future, and personal experimentation

What are the most significant technical improvements in the experimentation domain over recent years, and how do you foresee AI shaping its future?

AI isn’t fundamentally changing the underlying mathematics of experimentation. There is a lot of “AI washing” in the MarTech space, where traditional machine learning techniques are rebranded as AI for sales purposes. One of the most significant innovations in recent years has “simply” been the development of experimentation platforms that operate natively within a customer’s data warehouse. When I was an Optimizely customer and partner, I spent considerable time resolving data discrepancies between Optimizely’s reports and a company’s internal source of truth or Google Analytics account. This often led to leadership distrusting the experiment’s results. Now, by running experiments directly against metrics already defined in a company’s data warehouse, without data egress, such discrepancies are totally eliminated (not to mention the benefits to privacy and security). It also allows experimentation on metrics that have nothing to do with a website, or that sit far downstream of the test itself. Eppo was the first product on the market to build this; then Statsig built their own warehouse-native tool, and Optimizely acquired a company called Netspring to build similar capabilities.

Other advancements have come from a new generation of experimentation tools like Eppo by Datadog or the open-source platform GrowthBook making sophisticated statistical tools available to everyone. For example, Eppo was the first commercial platform to offer CUPED, a variance reduction technique developed by Microsoft, which can significantly slash the sample size required for experiments when prior user information is available. Other vendors quickly followed suit, and now CUPED is available from four or five different providers. Although this makes it harder for us to compete in the market, it’s great news – I genuinely want these powerful tools to be democratised.
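
For intuition on what CUPED does: it adjusts each unit’s in-experiment metric Y using a pre-experiment covariate X, as Y' = Y - theta * (X - mean(X)) with theta = Cov(X, Y) / Var(X), which keeps the mean intact while stripping out variance explained by pre-existing differences between users. A minimal NumPy sketch on synthetic data (an illustration of the technique, not Eppo’s implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
pre = rng.normal(100, 20, n)             # pre-experiment covariate, e.g. prior 30-day spend
post = 0.8 * pre + rng.normal(0, 10, n)  # in-experiment metric, correlated with `pre`

theta = np.cov(pre, post, ddof=1)[0, 1] / np.var(pre, ddof=1)
adjusted = post - theta * (pre - pre.mean())  # same mean as `post`, much lower variance

print(f"variance reduction: {1 - adjusted.var() / post.var():.0%}")  # roughly 70% here
```

The stronger the correlation between the pre-experiment covariate and the metric, the larger the variance reduction, and the smaller the sample you need to detect the same effect.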

Looking ahead, I believe AI’s main impact will be in reducing the “cost per experiment” further. This includes things like AI writing code for experimental treatments, cleaning up old feature flags for engineering teams, or even initially wrapping code in feature flags. Historically, marketing-focused testing tools struggled because they wrote code “on top of” existing code, executing it in the user’s browser. This was prone to breaking, especially with dynamic frameworks like React, where CSS selectors change with deployments. If AI could help marketers write code that integrates directly into their codebase instead of being detached, it would be a major fix. I anticipate significant progress on this front within the next one to two years.
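
As a sketch of what “wrapping code in a feature flag” means when the flag lives in the codebase rather than being patched into the browser, here is a hypothetical, vendor-neutral gate with deterministic bucketing (not Eppo’s or Datadog’s SDK; the function names are invented for illustration):

```python
import hashlib

def is_enabled(flag: str, user_id: str, rollout_pct: int = 50) -> bool:
    """Deterministically bucket a (flag, user) pair so each user always
    sees the same variant for a given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < rollout_pct

def checkout_page(user_id: str) -> str:
    # Treatment and control paths live side by side in the codebase,
    # gated by the flag, instead of CSS-selector patches in the browser.
    if is_enabled("new-checkout-flow", user_id):
        return "render new checkout"   # treatment
    return "render legacy checkout"    # control

print(checkout_page("user-123"))
```

Code like this survives redeployments because it doesn’t depend on the rendered page’s structure, which is exactly why helping marketers generate it directly would fix the fragility of browser-side testing tools.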

If we were to talk three years from now, what would be the biggest transformation in the experimentation landscape?

I think three years might be slightly optimistic for the true, “agentic” version of this, but ideally three years from now, anyone within a company could launch an experiment with a simple prompt. AI would assist with designs and code. An experimentation team would have defined protocols for how experiments should be run, including metrics and statistical guardrails, which would be automated for the end-user. Decisions would be pre-registered, meaning if the primary metric improves and no guardrails are triggered, the change would automatically ship. AI could also automatically remove feature flag code. A data scientist might simply have an inbox of tests to review and approve. By combining AI assistants with governance tools for statisticians and data scientists, we could unlock further scale in experimentation.

On a personal note, the Eppo team is currently building a whole new feature flagging and experimentation product inside Datadog, taking everything we’ve learned and combining it with some incredible capabilities our new home enables. Datadog has over 30,000 customers, and within months they’ll have access to this new product; feature flags and experiments will be coupled with all the observability, reliability, and performance data those companies already collect and monitor in Datadog. For example, when launching an experiment, you could instantly see real-world performance data, like website latency, to understand whether the measured impact is due to your change or whether you inadvertently introduced a performance issue. Having all this data in one place will be highly beneficial.

We hear you run personal experiments; what is the most interesting or unusual experiment you have conducted on yourself?

I run personal experiments all the time. One rather unusual set of experiments involved nootropic supplements: vitamins and supplements intended to enhance focus, energy, or alertness. I randomised when I took these supplements and tracked metrics such as screen time for specific apps using tools like RescueTime, and even used EEG headbands to measure brainwave prevalence. I wanted to see whether taking these supplements changed my brainwave production, and whether I was spending my time on productive work or merely browsing the internet. I did not find conclusive results, which led me to stop taking some very expensive pills. (Perhaps I could design those experiments better in the future… but I’ve generally stopped believing in magic pills. That movie “Limitless” was all too fictional.)

I’m generally a big fan of “citizen science.” Sometimes this involves experiments: for example, I recently came across “the big mouth taping trial,” where people are trying out taping their mouths shut at night to promote nasal breathing. Participants are running self-experiments with mouth taping, randomising it, and collating their results to form a broader randomised controlled trial. Love it. Sometimes it’s just longitudinal studies, like the Dog Aging Project (a study on canine longevity) that we participate in with our 12lb rescue shih tzu, Captain.

What are you reading right now?

I’m always reading 5+ books at once. Right now I’m enjoying revisiting Experimentation Matters by Stefan Thomke – his lesser-known book from 2003 that explores innovation and product development via experimentation long before A/B testing became popular.

What’s your favorite app?

MacroFactor is probably the app I open the most since I use it to track my diet, so I’m logging meals 5 times a day or so. It’s a super well-built app and the scientific thinking behind its algorithm is fantastic.

Which three tools would you take to a new job?

Boring answers, but:
Slack, because I won’t work at a company that uses Microsoft Teams.
Motion, which I rely on for managing tasks and my calendar.
Evernote, my personal favorite for note-taking. Notion never really worked for me.

Where do you find meaning in life?

In trying new things. I’m driven by learning and pushing my limits. That mindset is why experimentation feels like the perfect fit for me. It’s hard to imagine doing anything else.

You can find more information about Eppo here and contact Ryan Lucht here.
