Unlike traditional black-box MMMs, Robyn democratizes measurement, enabling brands to build their own models without relying on expensive third-party solutions. In this conversation, Gufeng shares how Robyn came to life, why Meta made it open-source, and how it fits into the evolving world of marketing attribution.
From Hackathon Project to Industry Standard: The Story Behind Robyn
You didn’t start your career as a data scientist. How did you end up leading an open-source MMM project at Meta?
I actually started my career in journalism. I studied it, worked as a journalist, and spent time writing articles and shooting photos. Over time, I moved into marketing analytics. My first deep dive into data came when I joined a marketing agency in Berlin, working on web analytics, newsletter campaigns, and Google Analytics. That’s where I developed my first real understanding of measurement.
Eventually, I moved to Zalando, where I transitioned into more technical work. That’s when I really started coding—at 33, a bit late for the field, but it was the right time to learn. At Zalando, I worked on geo experiments, statistical attribution and media mix modeling (MMM), which set the foundation for what I’d later do at Meta.
How did the idea for Robyn emerge at Meta?
Robyn started as a hackathon project. At the time, a lot of marketing science work was focused on experiments and attribution, and both approaches had limitations. With increasing privacy constraints, deterministic tracking was becoming unreliable, so MMM was making a comeback.
We built the first version of Robyn to semi-automate and standardize MMM methodology, making it more accessible to teams inside Meta. The initial prototype was rough, but we saw the potential. Over time, the project gained momentum and eventually became a fully supported open-source project.
Why did Meta choose to open-source Robyn instead of keeping it internal?
Marketing Mix Modeling relies on cross-channel data, including data from platforms that compete with Meta. There was no way advertisers would share that full dataset directly with us, for example their TV spend, Google Ads budget, and offline marketing data. So if we wanted to contribute to the cross-media measurement space, we had to go open-source.
By open-sourcing Robyn, we made it possible for advertisers and agencies to use an MMM framework they could trust, without worrying about platform bias. It also helped standardize MMM methodologies, bringing more transparency to how measurement works.
What impact did Apple’s App Tracking Transparency (ATT) policy have on Robyn’s adoption?
Robyn was first beta-released in 2020, but adoption really took off after Apple’s ATT policy in 2021. That change made it harder to track user-level conversions, so advertisers started to turn to alternative ways to measure effectiveness. MMM, which works with aggregated data rather than user-level data, became more attractive.
With deterministic tracking getting weaker, some companies started looking at on-platform incrementality testing and MMM as key components of their measurement stack. Robyn was already available as an open-source solution, so it naturally gained traction during that shift.
How does Robyn compare to other MMM solutions on the market?
We’re very glad to see the diversifying and flourishing trend in the open source MMM field. Robyn is now considered one of the “big three” open-source MMM frameworks, alongside Meridian and PyMC. This trend pushes more standardization and transparency into the industry. What used to be a secretive, expensive service is now accessible to everyone.
At its core, Robyn is designed to balance automation with flexibility. It standardizes key modeling processes while giving users the ability to fine-tune and select from a multitude of candidate results. That’s why it’s been adopted by brands and agencies, with some SaaS providers building their own solutions on top of it.
The Technical Deep Dive: Robyn’s Approach, Challenges, and Methodology
What makes Robyn different?
Marketing Mix Modeling (MMM) has been around for decades, but traditional models often suffer from subjectivity and inconsistency. Analysts have too much flexibility in defining their models: selecting variables, determining lag effects, and handling saturation manually. Two analysts working on the same dataset can produce entirely different results.
Robyn was designed to reduce analyst bias and improve model consistency. It uses multi-objective optimization, meaning it doesn’t focus on just one goal, such as predicting sales accurately. It also optimizes toward non-extreme attribution, ensuring that media channels receive interpretable credit. With Pareto optimality across multiple objectives, Robyn generates a set of optimized candidate results instead of just one, allowing users to make informed model selection decisions at the outcome level.
Can you explain how optimization works in Robyn?
Common regressions are single-objective, optimizing only for predictive accuracy—how well the model estimates future sales based on past data. We see MMM as a real-life multi-objective problem. It needs to balance multiple goals:
- Prediction accuracy – The model should accurately estimate total sales, conversions or any responses.
- Fair credit allocation – The model should assign interpretable credit to each media channel.
- Calibration accuracy – Optionally, the model should accurately predict the results of causal experiments.
The challenge is that a model with perfect predictive accuracy could still assign nonsensical attributions to different media channels. Robyn solves this by using multi-objective optimization, which means it doesn’t produce just one model—it generates a Pareto frontier of models that balance prediction accuracy and attribution fairness.
This means marketers don’t get a single “correct” model but a set of plausible models, each representing a trade-off between accuracy and interpretability. This forces advertisers to make informed choices rather than blindly trusting a single output.
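The Pareto idea can be shown with a toy sketch. This is not Robyn’s code; the candidate scores are made up, though the two objectives mirror Robyn’s prediction-error and attribution-plausibility metrics (lower is better for both). A candidate model survives only if no other candidate beats it on both objectives at once:

```python
# Toy model selection on two objectives: prediction error (e.g. NRMSE) and an
# attribution-plausibility error (Robyn calls its version DECOMP.RSSD).
# A candidate is Pareto-optimal if no other candidate is at least as good on
# both objectives and strictly better on one.

def pareto_front(candidates):
    """Return the candidates not dominated on both objectives."""
    front = []
    for i, (err_i, rssd_i) in enumerate(candidates):
        dominated = any(
            err_j <= err_i and rssd_j <= rssd_i and (err_j, rssd_j) != (err_i, rssd_i)
            for j, (err_j, rssd_j) in enumerate(candidates) if j != i
        )
        if not dominated:
            front.append((err_i, rssd_i))
    return front

# (prediction error, attribution error) for five hypothetical candidate models
candidates = [(0.10, 0.40), (0.12, 0.20), (0.20, 0.10), (0.11, 0.35), (0.25, 0.30)]
print(sorted(pareto_front(candidates)))  # (0.25, 0.30) is dominated and drops out
```

The surviving candidates form the frontier the user chooses from: moving along it trades prediction accuracy against attribution plausibility.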
Robyn also integrates lift test data. How does that work?
A major limitation of MMM is that it relies purely on observational data. To introduce causality into an observational model, Robyn allows injection of Meta lift studies or any experimental results—randomized controlled experiments that measure real-world campaign impact.
When a company runs an incrementality test, the result can be fed into Robyn as an additional objective function. This guides the model toward causal results and ensures its media-effectiveness estimates align with experimental data as closely as possible.
For example, if a lift study finds that Meta ads drive 100 incremental conversions in week 5, Robyn can adjust its credit assignment accordingly.
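A hedged sketch of that idea (the function and field names here are hypothetical, not Robyn’s API): score a model by the relative gap between the conversions it attributes to a channel and the experiment’s measured lift over the test window, and use that gap as an extra objective to minimize.

```python
# Illustrative calibration error: compare the model's implied incremental
# conversions over the lift-test window with the experimentally measured lift.

def calibration_error(model_decomp, lift_result):
    """Relative gap between modeled and measured incremental conversions
    for one channel over the lift-test window."""
    modeled = sum(
        model_decomp[lift_result["channel"]][t]
        for t in range(lift_result["start_week"], lift_result["end_week"] + 1)
    )
    return abs(modeled - lift_result["measured_lift"]) / lift_result["measured_lift"]

# The model attributes 90 conversions to Meta ads in week 5;
# the lift study measured 100 incremental conversions.
decomp = {"meta_ads": {5: 90.0}}
lift = {"channel": "meta_ads", "start_week": 5, "end_week": 5, "measured_lift": 100.0}
print(calibration_error(decomp, lift))  # 0.1, i.e. a 10% calibration gap
```

Minimizing this gap alongside the other objectives is what pulls the model’s attribution toward the experimental ground truth.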
What challenges did you face when designing Robyn?
Aggregated marketing datasets are often inter-correlated, because advertisers tend to plan “flights” on multiple channels at the same time. We often observe similar peaks and dips across different channel spends. This is known as multicollinearity, and it can cause unstable convergence, overfitting, and uninterpretable credit allocation. Robyn uses the Ridge estimator to counter this, while the multi-objective optimizer naturally reduces overfitting.
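A minimal sketch of the Ridge point (illustrative only, not Robyn’s implementation, and the synthetic data is invented): two channels with near-identical flighting produce collinear columns, and the Ridge penalty shrinks the coefficient vector toward a stable solution.

```python
# Closed-form Ridge on two nearly collinear channel-spend series.
import numpy as np

rng = np.random.default_rng(0)
n = 104  # two years of weekly data
tv = rng.gamma(2.0, 50.0, n)
social = tv * 0.8 + rng.normal(0, 2.0, n)  # near-duplicate flighting pattern
X = np.column_stack([tv, social])
y = 0.5 * tv + 0.4 * social + rng.normal(0, 10.0, n)

def ridge(X, y, lam):
    """Closed-form Ridge estimate: (X'X + lam*I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)      # lam = 0 is plain least squares
beta_ridge = ridge(X, y, 100.0)  # penalized estimate
print(beta_ols, beta_ridge)      # the Ridge coefficients have smaller norm
```

The norm of the Ridge estimate is non-increasing in the penalty, which is exactly the stabilizing effect that tames collinear channels.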
Another challenge is endogeneity: predictors correlated with the model’s error term, which prevents causal conclusions. Another angle on this is omitted-variable bias. In theory, a response has infinitely many potential predictors, so we can always assume some relevant variables are left out. On top of this, there are confounding variables. In short, determining the causal structure of the model is very challenging in real life. Robyn’s answer is to calibrate the MMM with experimental results.
Assessing the project’s impact is also challenging. Robyn is open-source, which means we don’t know who’s using it or how. Unlike a closed tool with built-in user tracking, we have to rely on community engagement and industry feedback to measure success.
What’s the biggest misconception people have about using Robyn?
Some marketers believe that an MMM produces a single “true” answer, and many tools promise exactly that. Robyn is quite different, because we believe we don’t have all the answers.
Robyn produces multiple optimal results by design, following the multi-objective principle mentioned before. Some users find this confusing at first, because it’s not obvious how to pick the final result. That’s true: model selection is hard, and as a “white label” solution provider we can’t pretend to know it better than the advertiser. We make this challenge very obvious by keeping the model selection process transparent.
Robyn’s workflow design forces marketers and agencies to reevaluate their hypotheses, interrogate their input data, and gather more business and media context to support decision making. We also recommend communicating more with media teams to gain execution insights.
The Competitive Landscape & Adoption of Robyn
Robyn is one of the biggest open-source MMM frameworks today. How did it reach this level of adoption?
Robyn was released at the right time. Before 2021, MMM was a rather niche solution used by larger advertisers with in-house analytics teams or expensive consulting firms. But when privacy regulations started breaking deterministic identity matching, advertisers had to look for alternatives.
MMM, which doesn’t rely on user-level tracking, became a natural choice. At that moment, Robyn was already open-sourced, relatively stable, and tested by some advertisers. It offered a lower-cost and scalable way to implement in-house MMM.
Early adopters validated its effectiveness, which led more advertisers, agencies, SaaS companies, and third-party vendors to adopt it. We can also observe that some of Robyn’s components appear in other open-source solutions, a sign of industry standardization. I think it’s fair to say that Robyn has contributed to the emergence of a new open-source measurement ecosystem. We’re very proud of this.
How does Robyn compare to other MMM solutions?
We’re often asked why Robyn doesn’t use a Bayesian framework. The Bayesian framework has seen rising popularity for business implementations because of attractive properties like the Bayesian prior as a native model-calibration feature and the intuitive interpretation of the Bayesian credible interval. It’s a good choice for MMM.
Regarding the “Bayesian vs. frequentist” debate, however, I want to point out that these two frameworks are more like two sides of the same coin than competitors. For example, frequentist Ridge regression is equivalent to a Bayesian regression with a normal (Gaussian) prior. This equivalence is academically well studied. If well specified, Robyn is able to reproduce results from other frameworks.
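The equivalence mentioned here can be stated compactly. The following is a standard textbook identity, not Robyn-specific notation: the Ridge estimate coincides with the Bayesian maximum-a-posteriori (MAP) estimate under a Gaussian likelihood and a zero-mean Gaussian prior.

```latex
% Ridge objective:
\hat\beta_{\text{ridge}} = \arg\min_{\beta}\; \lVert y - X\beta \rVert^2 + \lambda \lVert \beta \rVert^2

% MAP estimate under y \mid \beta \sim \mathcal{N}(X\beta,\, \sigma^2 I)
% with prior \beta \sim \mathcal{N}(0,\, \tau^2 I):
\hat\beta_{\text{MAP}} = \arg\min_{\beta}\; \frac{\lVert y - X\beta \rVert^2}{2\sigma^2} + \frac{\lVert \beta \rVert^2}{2\tau^2}

% The two coincide when \lambda = \sigma^2 / \tau^2.
```

So the Ridge penalty strength plays exactly the role of the prior’s precision relative to the noise variance, which is why the two framings can reproduce each other’s results.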
What types of companies are using Robyn?
Robyn is widely used across industries. Advertisers use it to power their in-house MMM capabilities. We’ve seen large and small agencies and consultancies use it to enrich their own measurement services. Some SaaS providers have built their solutions on top of Robyn, and some academics use it for research purposes.
Open source and vendors are often not mutually exclusive. We’ve seen companies work with agencies and vendors on implementation as well as ongoing modeling. In the end, open-source tools do require in-house investment, and service providers remain an attractive option for many.
Since Robyn is open-source, how does Meta track its success?
That’s one of the biggest challenges in open-source projects. Besides download numbers, we have no visibility into users unless they actively approach us and share. Some users asked us if there’s any “back door” to pass the data back to Meta. The answer is no.
The main usage signals come from community engagement—GitHub activity, industry discussions, and direct feedback from advertisers. There are also indirect signs, like when companies mention Robyn in conference talks or case studies.
At the end of the day, Robyn isn’t Meta’s monetization model and will never be. It’s about standardizing marketing measurement, making MMM more accessible, and ensuring brands have reliable ways to measure cross-channel effectiveness.
Future of Marketing Measurement
What are the biggest trends you see right now in measurement?
There’s a strong focus on causality across the industry. Causal inference is a spectrum, with randomized controlled experiments as the gold standard. When experiments are well designed and executed, with large enough sample sizes, proper randomization, and a strong treatment, they provide the highest level of reliability on the causal-inference ladder. Where RCTs are impossible, there are other options, such as quasi-experiments and graph-based causal exploration techniques, that introduce some level of causality.
Building on this, triangulation between MMM, multi-touch attribution (MTA), and experiments is becoming the holy grail of modern measurement, combining the strengths of different methodologies instead of relying on a single approach. Each method has its limitations, but together they offer a more balanced view of marketing impact.
Another big challenge that needs more attention is MMM’s deep-dive capability, in both recency and granularity. Even with the latest advancements, MMM is still far from delivering insights as granular as attribution users are used to. Quite a few commercial vendors claim to specialize in this, but I lack visibility into the details to form an informed opinion on how reliable those methodologies are. I expect more innovation in this direction.
How does Robyn fit into the shift toward causality?
Robyn already offers a lift-calibration feature. Recently, we introduced a new feature called the curve calibrator, which allows calibration of the entire saturation curve instead of just a point estimate. The first use case is calibrating response saturation with reach and frequency data, and we’ll expand it in the future. With this, we’re enabling calibration for advertisers who cannot run experiments for various reasons.
In the long run, we believe model calibration should move beyond point estimates of channel effects. All estimates in an MMM system should be calibratable, including the full saturation curve, the marginal returns along the curve, adstock, and more, not just the beta coefficients. This would foster a more mature view of the model architecture.
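The difference between point and curve calibration can be illustrated with a Hill-type saturation curve, a functional form Robyn uses for saturation (the parameter values below are invented for illustration): two curves can agree at a single spend level yet imply very different marginal returns everywhere else, so pinning down one point leaves the shape unconstrained.

```python
# Hill-type saturation: response(x) = x^a / (x^a + g^a), in [0, 1).

def hill(x, alpha, gamma):
    return x**alpha / (x**alpha + gamma**alpha)

# Two different curves that agree at spend x = 100 (same point estimate)...
a = hill(100, alpha=2.0, gamma=100.0)
b = hill(100, alpha=0.5, gamma=100.0)
print(a, b)  # both exactly 0.5

# ...but imply very different responses (and marginal returns) at higher spend.
print(hill(300, 2.0, 100.0), hill(300, 0.5, 100.0))  # ~0.90 vs ~0.63
```

Calibrating the full curve, rather than one point on it, is what rules out the implausible shapes that happen to pass through the point estimate.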
What about long-term impact? Can MMM capture that?
Long-term effects are one of the hardest things to measure. If you look at a brand like Coca-Cola, they could stop advertising for a year and still keep stable baseline sales, because their brand awareness and mental availability are already established, while their shelf positions and physical availability are prominent.
From a consumer perspective, you become a loyal customer when you purchase organically, without interacting with ads. Loyal customers are the foundation of baseline sales. If we ran a lift test on loyal customers, the incrementality should be close to zero, because they buy anyway.
In other words, measuring long-term effect means quantifying advertising’s ability to find new customers and convert them into loyal ones. This is extremely hard with common MMM approaches, and there is still a lot of work to be done to solve this problem.
Privacy concerns are also shaping the future of measurement. What role do privacy-enhancing technologies (PETs) play in this space?
Privacy-enhancing technologies (PETs) are another important piece, addressing sensitive industries and geographies with strict privacy policies. We’re seeing TEE- and MPC-based solutions emerge that will enable very exciting use cases, such as measurement and optimization run inside a data cleanroom where no user-level information is revealed to any party. I expect these technologies to keep gaining ground.
What’s the most important takeaway for marketers trying to adapt to this new measurement landscape?
Embrace the diversity of measurement. There is no single perfect measurement method—every approach has trade-offs. The best strategy is to use multiple sources of truth, validate models with experiments, and be willing to adjust based on new data.
Robyn, for example, doesn’t give a single definitive answer. It provides a set of models that reflect different trade-offs, and it’s up to the advertiser to make the final decision based on their business context. Instead of chasing an illusion of precision, marketers should think probabilistically and embrace a mix of methodologies.
Any advice for brands trying to build their own MMM models?
Start simple, be skeptical and iterate. No one can specify a model perfectly from day one. Get the basics right, interrogate your data, and then refine the model over time. The goal isn’t to build the most complex model—it’s to build one that provides useful and actionable insights.