Marketing analysts!
Are you sure you’re deriving the right insights from your data?
A lesser-known statistical fallacy, Simpson’s Paradox, suggests you might not be!
Let me explain why ignoring this paradox can hurt your marketing analytics.
Consider this simple scenario -
Your firm has launched a campaign aimed at achieving high click-through rates (CTR). The marketing manager pulls the first week’s report and segments the data by gender.
According to this data, males have a 100% higher CTR than females (2% versus 1%). So, the team concludes that more budget should be allocated to males.
However, this could be a huge mistake because other variables aren’t being considered.
For instance, let’s segment this data by age.
The male group still has an aggregate CTR of 2% (versus 1% for females). Yet, this chart shows that the marketing team should increase its spending on females in the 18-25 age cohort.
Thus, in this case, age is the confounding variable.
That’s Simpson’s Paradox for you, and ignoring it can be a fatal mistake.
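The chart’s exact figures aren’t reproduced here, so the Python sketch below uses made-up impression and click counts (hypothetical, not the campaign’s real data) chosen only to match the pattern described: males at 2% CTR overall versus 1% for females, while females aged 18-25 out-click males of the same age. The helper name `ctr_by` is likewise invented for illustration.

```python
# Hypothetical campaign data (illustrative only, not real figures):
# (gender, age_band) -> (clicks, impressions)
campaign = {
    ("male", "18-25"): (10, 2_000),
    ("male", "26+"): (190, 8_000),
    ("female", "18-25"): (96, 8_000),
    ("female", "26+"): (4, 2_000),
}


def ctr_by(data, key_fn):
    """Pool clicks and impressions under key_fn(segment), then compute CTR per key."""
    totals = {}
    for segment, (clicks, impressions) in data.items():
        key = key_fn(segment)
        c, i = totals.get(key, (0, 0))
        totals[key] = (c + clicks, i + impressions)
    return {key: c / i for key, (c, i) in totals.items()}


by_gender = ctr_by(campaign, lambda seg: seg[0])   # gender only
by_gender_age = ctr_by(campaign, lambda seg: seg)  # gender x age band

for gender, rate in by_gender.items():
    print(f"{gender:<6} overall CTR {rate:.2%}")
for (gender, age), rate in sorted(by_gender_age.items()):
    print(f"{gender:<6} {age:<5} CTR {rate:.2%}")
```

In this made-up data, the 18-25 female CTR (1.2%) is more than double the male CTR in the same band (0.5%), even though males lead overall.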
Lessons?
- You shouldn’t over-rely on broad audience segments. Work with granular, robust audiences; otherwise, you leave yourself susceptible to Simpson’s paradox.
- Work with your team to identify confounding variables (segment characteristics, uneven sample sizes, etc.); otherwise, such variables will muddle the insights drawn from the data.
As an analyst, you cannot afford to ignore this paradox. So, let’s understand a little more about this paradox and how it can impact marketing analytics.
What Is Simpson’s Paradox?
Simpson’s paradox is a statistical phenomenon in which a trend that appears within several groups of data disappears or reverses when the groups are combined.
It happens when a confounding variable influences the relationship between two variables, so the grouped view and the combined view lead to contradictory conclusions.
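A compact way to see why the reversal is possible (a general identity, not tied to any dataset in this post): a group’s overall rate is a weighted average of its segment rates, where each weight is the share of that group’s volume falling in the segment.

```latex
p_A = \sum_{k} w_{A,k}\, p_{A,k}, \qquad
p_B = \sum_{k} w_{B,k}\, p_{B,k}, \qquad
\sum_{k} w_{A,k} = \sum_{k} w_{B,k} = 1
```

Because the weights $w_{A,k}$ and $w_{B,k}$ can differ, $p_{A,k} > p_{B,k}$ in every segment $k$ does not guarantee $p_A > p_B$. If group A’s volume sits mostly in low-rate segments and group B’s sits mostly in high-rate segments, the overall comparison can flip. When both groups share the same weights, $p_A - p_B = \sum_k w_k\,(p_{A,k} - p_{B,k})$, so the reversal cannot happen.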
The visual below shows how the trend changes when all the data is combined.
Here, the suspected gender bias at UC Berkeley is one of the most famous examples that comes to my mind.
For its fall 1973 intake, UC Berkeley’s graduate school admitted roughly 44% of male applicants and 35% of female applicants. The gap raised concerns about gender discrimination. However, this wasn’t a case of discrimination.
The university turned to statistician Peter Bickel to analyze the data, and his team found a few surprising insights.
Bickel’s team found statistically significant gender bias favoring women in 4 of 6 departments, and no significant gender bias in the remaining 2.
In general, women tended to apply to departments that admitted a smaller share of their applicants.
Simply put, they applied to more competitive departments with low admission rates, for instance, the English department.
On the other hand, men applied to less competitive departments with high admission rates, for instance, the engineering department.
This hidden variable (the choice of department) skewed the overall admission percentages, producing a trend in the combined data that did not hold within the individual departments.
Thus, the conclusion flipped once the data was analyzed department by department.
That’s how Simpson’s paradox can make decision-making tough.
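To make the mechanism concrete, here is a small Python sketch with hypothetical admissions counts for two made-up departments (these are illustrative numbers, not Berkeley’s actual data). It computes admission rates per department and then after pooling the departments together.

```python
# Hypothetical admissions counts (illustrative, not Berkeley's actual data):
# department -> gender -> (admitted, applicants)
admissions = {
    "Engineering": {"men": (480, 800), "women": (65, 100)},
    "English": {"men": (40, 200), "women": (225, 900)},
}


def department_rates(data):
    """Admission rate for each gender within each department."""
    return {
        dept: {g: admitted / applied for g, (admitted, applied) in by_gender.items()}
        for dept, by_gender in data.items()
    }


def pooled_rates(data):
    """Admission rate for each gender after ignoring departments entirely."""
    totals = {}
    for by_gender in data.values():
        for gender, (admitted, applied) in by_gender.items():
            a, n = totals.get(gender, (0, 0))
            totals[gender] = (a + admitted, n + applied)
    return {gender: a / n for gender, (a, n) in totals.items()}


by_dept = department_rates(admissions)
pooled = pooled_rates(admissions)

for dept, rates in by_dept.items():
    print(f"{dept:<12} men {rates['men']:.0%}  women {rates['women']:.0%}")
print(f"{'All':<12} men {pooled['men']:.0%}  women {pooled['women']:.0%}")
```

Women are admitted at a higher rate in both made-up departments, yet the pooled rate favors men (52% vs. 29%) because 90% of the female applicants applied to the more competitive department.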
How Does It Affect Data Analytics?
Let’s understand how Simpson’s paradox can affect your analytical endeavors while performing routine tasks.
- Deceptive Performance Metrics: Simpson’s paradox can distort performance metrics in SaaS marketing.
For instance, when evaluating customer engagement rates for different feature sets of your SaaS product, overall engagement may seem higher for the basic features.
But digging deeper into the data, you may find that within each customer segment, premium features have higher engagement rates, leading to better retention. This shows how Simpson’s paradox can lead to inaccurate assumptions and, in turn, poor decisions.
- Masked Customer Segments: Failing to account for customer segments can make you overlook crucial insights.
For instance, evaluating freemium-to-premium conversion rates without considering B2B customer characteristics like their industry or firm size can lead to inaccurate outcomes.
On the other hand, by segmenting the entire dataset, you may find that your SaaS product resonates best with medium-sized businesses, even if their conversion rate initially seems poor.
- Distorted Customer Behavior: Simpson’s paradox can contribute to misinterpreting customer behavior trends.
Suppose you notice that customers who engage less frequently with your SaaS product often have high churn rates. However, you might see the opposite picture when you segment the data by another factor, such as plan type or customer tenure.
- Poor Pricing Strategy: Simpson's paradox can mislead you into adopting an ineffective pricing strategy.
For instance, analyzing pricing plans by average revenue per user (ARPU) across all customers may suggest that a lower-priced plan is more successful.
However, segmenting the data by customer type or usage pattern might reveal that a higher-priced plan appeals to enterprise clients and generates substantial revenue.
- Missed Personalization Opportunities: Neglecting customer segments can lead to missed personalization opportunities.
For instance, monitoring email open rates across all customers may indicate that certain campaigns perform better overall.
However, segmenting the data by customer behavior or interests can unveil specific campaigns that resonate uniquely with various segments, enabling targeted personalization for improved engagement and conversions.
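To check whether a headline comparison, like the engagement example above, is driven by segment mix, one common technique is direct standardization: recompute each group’s rate using a single shared segment mix. The sketch below uses hypothetical engagement counts and made-up segment names ("SMB", "Enterprise"). It takes the pooled user base as the reference mix, which is itself an assumption; a different reference mix gives different standardized values.

```python
# Hypothetical counts: feature tier -> customer segment -> (engaged_users, users)
engagement = {
    "basic": {"SMB": (5_400, 9_000), "Enterprise": (300, 1_000)},
    "premium": {"SMB": (650, 1_000), "Enterprise": (3_150, 9_000)},
}


def crude_rate(by_segment):
    """Overall engagement rate, weighted by this tier's own segment mix."""
    engaged = sum(e for e, _ in by_segment.values())
    users = sum(u for _, u in by_segment.values())
    return engaged / users


def standardized_rate(by_segment, reference_mix):
    """Segment rates weighted by a shared reference mix (shares sum to 1)."""
    return sum(
        share * by_segment[seg][0] / by_segment[seg][1]
        for seg, share in reference_mix.items()
    )


# Reference mix: all users across tiers, pooled (an assumption; other mixes are valid).
pooled_users = {}
for by_segment in engagement.values():
    for seg, (_, users) in by_segment.items():
        pooled_users[seg] = pooled_users.get(seg, 0) + users
total_users = sum(pooled_users.values())
reference_mix = {seg: n / total_users for seg, n in pooled_users.items()}

crude = {tier: crude_rate(s) for tier, s in engagement.items()}
standardized = {tier: standardized_rate(s, reference_mix) for tier, s in engagement.items()}

for tier in engagement:
    print(f"{tier:<8} crude {crude[tier]:.1%}  standardized {standardized[tier]:.1%}")
```

With these made-up numbers, basic looks stronger on the crude rate (57% vs. 38%), but once both tiers are weighted by the same mix, premium comes out ahead (50% vs. 45%), matching what the segment-level rates show.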
By acknowledging Simpson’s paradox, you can unearth hidden patterns, optimize your business strategies, and tailor them to specific customer segments.
It also helps keep your marketing team, sales reps, and other stakeholders on the same page, driving data excellence.
The result? High conversions, customer satisfaction, and revenue growth.
Detecting Simpson’s Paradox
Here’s a systematic approach to detecting and addressing Simpson’s paradox.
- Analyze Individual Segments: Calculate key performance metrics for individual customer segments. This can involve comparing conversion rates, revenue, etc.
- Evaluate the Overall Performance: Combine the data from all customer segments and calculate the overall performance.
- Compare the Outcomes: Compare the metrics calculated for each customer segment with the overall performance. Look for inconsistencies, such as segments where the direction of a difference contradicts the overall result.
- Scrutinize the Impact of Segment Size: Assess the sizes of the segments and consider their influence on the overall outcomes. Large data segments can have more impact on the overall result, potentially masking the actual relationship between variables.
- Check for Confounding Factors: Look for factors that could contribute to the paradox, such as varying sample sizes, biased sampling, or differences in segment characteristics.
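The steps above can be sketched in Python. This is a minimal illustration under stated assumptions: each row is a (segment, group, successes, trials) tuple, the function name `detect_simpson_reversal` and the landing-page data are made up for this post, and the function only flags a strict reversal (every segment disagrees with the overall result). Partial disagreements, like the 18-25 cohort in the opening CTR example, still call for a look at the per-segment rates it returns.

```python
from collections import defaultdict


def _rate(successes, trials):
    # A group with no trials in a segment gets 0.0; treat such segments with care.
    return successes / trials if trials else 0.0


def _sign(x, y):
    """+1 if x > y, -1 if x < y, 0 if equal."""
    return (x > y) - (x < y)


def detect_simpson_reversal(rows, group_a, group_b):
    """Compare two groups segment by segment and overall.

    rows: iterable of (segment, group, successes, trials) tuples.
    Flags a reversal when the overall gap points one way and the gap
    in every segment points the other way.
    """
    seg = defaultdict(lambda: {group_a: [0, 0], group_b: [0, 0]})
    totals = {group_a: [0, 0], group_b: [0, 0]}
    for segment, group, successes, trials in rows:
        if group not in totals:
            continue  # only the two groups being compared are tallied
        for bucket in (seg[segment][group], totals[group]):
            bucket[0] += successes
            bucket[1] += trials

    # Step 1: metric for each individual segment
    segment_rates = {
        s: {g: _rate(*counts) for g, counts in groups.items()}
        for s, groups in seg.items()
    }
    # Step 2: overall metric after pooling every segment
    overall = {g: _rate(*counts) for g, counts in totals.items()}
    # Step 3: direction of the gap in each segment vs. overall
    overall_dir = _sign(overall[group_a], overall[group_b])
    segment_dirs = {s: _sign(r[group_a], r[group_b]) for s, r in segment_rates.items()}
    reversal = overall_dir != 0 and all(d == -overall_dir for d in segment_dirs.values())
    # Step 4: each segment's share of each group's volume (trials)
    shares = {
        s: {g: _rate(groups[g][1], totals[g][1]) for g in groups}
        for s, groups in seg.items()
    }
    return {
        "segment_rates": segment_rates,
        "overall": overall,
        "segment_dirs": segment_dirs,
        "shares": shares,
        "reversal": reversal,
    }


# Hypothetical landing-page test: Page B wins on mobile and on desktop,
# but Page A's traffic is concentrated on high-converting desktop.
result = detect_simpson_reversal(
    [
        ("Mobile", "Page A", 20, 1_000),
        ("Mobile", "Page B", 120, 5_000),
        ("Desktop", "Page A", 450, 5_000),
        ("Desktop", "Page B", 95, 1_000),
    ],
    "Page A",
    "Page B",
)
print(result["overall"], result["reversal"])
```

The `shares` entry covers the segment-size step: here most of Page A’s traffic is on desktop and most of Page B’s is on mobile, which is what lets Page A lead overall while losing in both segments.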
Here’s another example for you!
Enough of the ‘business’ examples 😃
I hope you like baseball!
Because I’m going to show how Simpson’s Paradox makes it tough to determine who’s the better hitter: Derek Jeter or David Justice.
Check out their batting averages for 1995 and 1996.
As per this chart, David Justice had a higher batting average than Derek Jeter in both 1995 and 1996 (marked in RED).
However, Derek Jeter had a higher batting average over the two years combined (marked in BLUE).
Here is an explanation of this phenomenon -
- Both players had higher batting averages in 1996 than in 1995
- When we look at the at-bats -
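- Jeter had many more at-bats in 1996, his stronger year (183 hits in 582 at-bats)
- Justice had many more at-bats in 1995, his weaker year (104 hits in 411 at-bats)
- Hence, Jeter’s combined average leans on his strong season and Justice’s on his weak one, so Jeter shows the higher batting average in aggregate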
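The arithmetic behind the flip is easy to check in Python. The 183/582 and 104/411 figures come from the list above; the other two seasons use the commonly cited totals of 12 hits in 48 at-bats for Jeter in 1995 and 45 hits in 140 at-bats for Justice in 1996.

```python
# (hits, at_bats) per season
stats = {
    "Derek Jeter": {1995: (12, 48), 1996: (183, 582)},
    "David Justice": {1995: (104, 411), 1996: (45, 140)},
}


def summarize(seasons):
    """Return per-season batting averages and the combined two-season average."""
    per_season = {year: hits / at_bats for year, (hits, at_bats) in seasons.items()}
    total_hits = sum(hits for hits, _ in seasons.values())
    total_at_bats = sum(at_bats for _, at_bats in seasons.values())
    return per_season, total_hits / total_at_bats


results = {player: summarize(seasons) for player, seasons in stats.items()}

for player, (per_season, combined) in results.items():
    seasons_text = "  ".join(f"{year}: {avg:.3f}" for year, avg in per_season.items())
    print(f"{player:<14} {seasons_text}  combined: {combined:.3f}")
```

Jeter’s combined average (.310) is pulled toward his strong 1996 season, while Justice’s (.270) is pulled toward his weaker 1995 season: the same weighted-average effect as in the earlier examples.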
So, who is a better hitter?
In this case, going by the aggregate leads to the conclusion that Jeter is the better hitter over the two years.
But one thing is clear: Simpson’s Paradox makes it tough to draw conclusions from data, especially when the data tells opposing stories.
Sometimes the disaggregated view is the better basis for a decision, and sometimes the aggregated view is. In this post, we’ve seen examples of both.
How to Avoid Making Decisions Based on Faulty Analysis
Here’s how to avoid making decisions based on faulty analysis and ensure well-informed decision-making.
- Define Specific Goals: Outline the goals you want to achieve through the analysis before diving into the data. Understand the insights you need to derive and how they will support your goals.
- Leverage Multiple Data Sources: Gather data from multiple sources to gain a 360-degree view of the marketing landscape. Combine data like market research, customer surveys, and more to mitigate the risk of biased decision-making. Ensure the data is accurate, complete, and reliable.
- Segment Your Data Effectively: Segment the marketing and other business-related data to capture nuances and variations within your target audience. Evaluate data by relevant demographic, behavioral, or psychographic segments to identify customer behavior and trends.
- Consider Sample Size: Be cautious about drawing conclusions from small samples, as they may produce unreliable results.
- Verify and Validate the Outcomes: Leverage A/B testing or statistical methods to validate your outcomes. Use advanced analytics tools to support more accurate decision-making.
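For the statistical-methods step, a two-proportion z-test is one standard way to check whether a difference in conversion rates is larger than chance alone would explain. The sketch below uses only the Python standard library and hypothetical A/B counts; the function name is made up for illustration. In practice you would likely use a statistics package, and in Simpson-style situations you would run the comparison within each segment (or use a stratified test such as Cochran-Mantel-Haenszel) rather than only on pooled data.

```python
import math


def two_proportion_z_test(successes_a, trials_a, successes_b, trials_b):
    """Two-sided z-test for a difference between two proportions (pooled standard error)."""
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the z-test is undefined")
    z = (p_a - p_b) / se
    # Standard normal CDF via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical A/B results: variant A ahead in both, very different sample sizes
z_large, p_large = two_proportion_z_test(120, 2_400, 90, 2_400)
z_small, p_small = two_proportion_z_test(6, 100, 4, 100)

print(f"2,400 per arm: z = {z_large:.2f}, p = {p_large:.3f}")
print(f"  100 per arm: z = {z_small:.2f}, p = {p_small:.3f}")
```

With these made-up counts, 5% vs. 3.75% on 2,400 users per arm clears the conventional 0.05 threshold, while 6% vs. 4% on 100 users per arm does not, which puts the sample-size caution above into numbers.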
Role of Advanced Analytics Tools
Marketing analysts have long relied on traditional business intelligence (BI) tools as their “go-to tool” for data analysis.
However, when it comes to the deceptive nature of Simpson’s paradox, relying on BI tools can lead to disastrous business outcomes.
It can derail your decision-making, resulting in poor marketing strategies.
That's where Revlitix, the all-in-one advanced analytics platform, can help!
Revlitix offers powerful features to visualize data effectively and efficiently for addressing this challenge head-on.
With Revlitix, you can gain granular insights into your marketing performance across various channels, campaigns, and customer segments.
With built-in prescriptive and predictive analytics, the tool lets you dig beneath the surface and unearth inconsistencies or unusual customer trends that might indicate the presence of Simpson's paradox.
Furthermore, Revlitix offers “Alerts,” a real-time monitoring feature to inform you of sudden shifts or emerging market trends.
This can empower you to address potential paradoxical situations proactively.
What’s more?
Revlitix -
- Allows you to build customizable, relevant, user-friendly dashboards. These are designed and vetted by experts with more than 15 years of experience.
By providing tailored visualizations, Revlitix enables you to unearth hidden market patterns and trends, ensuring that Simpson’s paradox doesn’t go unnoticed.
- Offers 100+ pre-designed, compelling drag-and-drop dashboards with a 30-second set-up - Not kidding! Leverage the dashboard to share accurate reports with all the stakeholders, leaving no room for the deception of Simpson’s paradox.
- Effortlessly integrates with key Martech platforms, thus streamlining business operations and enhancing data management.
This integration capability helps analyze data across different channels and campaigns, thus empowering you to detect and address Simpson’s paradox.
- Saves coding effort. Unlike other solutions, it does not require you to write code. Revlitix is a totally zero-code platform, not low-code!
This lets you focus on data analysis instead of getting caught up in coding complexities, so you can delve deep into the data and identify anomalies associated with Simpson’s paradox.
Summing Up
Simpson’s paradox highlights the danger of oversimplifying complex truths. It shows us that data is a finite-dimensional representation of a larger, more complex domain. The art lies in looking beyond the data and using (or developing) methods and tools to uncover hidden truths.
We are confident that this guide has given you an in-depth understanding of complex data relationships and patterns, and of how not to be misled by inconsistencies.
Use the information shared in this post to make informed decisions by taking the most appropriate data viewpoints into account. Further, embrace advanced analytics tools like Revlitix to better navigate the complexities of Simpson’s paradox.