Zach Jackson

GA4 isn’t so much a facelifted version of Universal Analytics as it is a complete anatomical reassembly. While this shift was necessary to futureproof Google’s analytics services in the face of evolving privacy regulation, it has been nonetheless jarring for UA users tasked with learning the ins and outs of a tool so dissimilar to one relied on for so long.

To make matters worse, analysts realised that valuable historical and new data was vanishing from their GA4 reports before their very eyes — a veritable ‘now you see it… now you don’t’ scenario. But just like the subject of a classic disappearing act, our data doesn’t actually dematerialise into non-existence; oftentimes, it’s merely tucked in Google’s sleeve after some sleight of digital hand.

In today’s article, we’ll be explaining the logic behind your missing data in GA4, how you might be able to retrieve it, and how to prevent it from slipping through your fingers, moving forward.

GA4 set-up errors

If you’re missing data, and you’re not sure why, it’s important to check GA4 has been set up correctly before pointing fingers at its various idiosyncrasies.

Data layers

The data layer is the structure that passes data from a web page to tools like GA4, and human error in it is common. Typically, a developer will build the data layer, but miscommunication between the site owner and the developer can still lead to gaps in GA4 reports.

Google Tag Manager

Google Tag Manager is another portal through which data has to travel to reach GA4. Its job is to grab relevant data from the data layer and pass it on in a form that’s beneficial to you in reports, but if it hasn’t been set up correctly, it may not pull the data you need.

Be sure to double-check that all the events you want to track have been specified in Google Tag Manager.
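One way to run that double-check is to compare the events you expect against what the data layer actually pushes. The snippet below is a minimal sketch in Python, assuming you have copied the data layer pushes from a test session (for example, from Google Tag Manager's preview mode) into a list of dictionaries. The event names and the `missing_events` helper are hypothetical and only there for illustration.

```python
from typing import Iterable


def missing_events(expected: Iterable[str], pushes: list[dict]) -> list[str]:
    """Return the expected event names that never appear in the data layer pushes."""
    seen = {push.get("event") for push in pushes}
    # Keep the order of the expected list so the output is easy to scan.
    return [name for name in expected if name not in seen]


# Hypothetical pushes copied from a test browsing session.
pushes = [
    {"event": "gtm.js"},
    {"event": "view_item", "ecommerce": {"items": [{"item_id": "SKU_1"}]}},
    {"event": "add_to_cart", "ecommerce": {"items": [{"item_id": "SKU_1"}]}},
]
expected = ["view_item", "add_to_cart", "begin_checkout", "purchase"]

print(missing_events(expected, pushes))  # ['begin_checkout', 'purchase']
```

Anything the helper returns is an event that either never fired during the test session or was pushed under a different name.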

Incorrect tag & event names

Universal Analytics gave us a little more freedom when creating and labelling custom events, but for GA4, Google published a list of recommended event names and parameters (mostly for e-commerce).

Not following these conventions won’t stop your custom event data from reaching GA4, but it won’t appear in the reports built around the recommended events. To resolve this, amend the event names in your Google Tag Manager tags, and don’t forget that GA4 event names are case-sensitive: one slip and your data won’t pass through to those reports.
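To illustrate how a single capital letter or hyphen can derail a name, here is a rough Python sketch that flags near-misses. It assumes the set below matches Google's recommended e-commerce event names at the time of writing, so check it against the current documentation before relying on it. The `check_event_name` helper is hypothetical.

```python
# Assumed list of GA4 recommended e-commerce event names (verify against Google's docs).
RECOMMENDED_ECOMMERCE_EVENTS = {
    "add_payment_info", "add_shipping_info", "add_to_cart", "add_to_wishlist",
    "begin_checkout", "purchase", "refund", "remove_from_cart", "select_item",
    "select_promotion", "view_cart", "view_item", "view_item_list",
    "view_promotion",
}


def _normalise(name: str) -> str:
    """Lowercase the name and unify separators so near-misses can be matched."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def check_event_name(name: str) -> str:
    """Classify a name as 'ok', 'near-miss: use <name>' or 'custom'."""
    if name in RECOMMENDED_ECOMMERCE_EVENTS:
        return "ok"  # exact, case-sensitive match
    normalised = _normalise(name)
    if normalised in RECOMMENDED_ECOMMERCE_EVENTS:
        return f"near-miss: use {normalised}"
    return "custom"


for tag_name in ["purchase", "Add_To_Cart", "view-item", "newsletter_signup"]:
    print(tag_name, "->", check_event_name(tag_name))
# purchase -> ok
# Add_To_Cart -> near-miss: use add_to_cart
# view-item -> near-miss: use view_item
# newsletter_signup -> custom
```

A "custom" result isn't an error in itself; it simply means the event won't be treated as one of the recommended e-commerce events.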

Why your report data is present but… off

One of the most prevalent issues reported about GA4, aside from data going AWOL, is numbers in reports that don’t seem to match the raw data or what people expect to see.

There are a couple of reasons for this.

Behavioural modelling

Behavioural modelling isn’t a cause of missing data, but a response. Its purpose? To fill some of the blanks in your reports, using machine learning.

This advanced branch of artificial intelligence is employed to process ‘observable data’: data gathered from fully consenting users whom Google believes to be similar to users who do not give tracking consent.

Through analysing observable data, it can provide predictions about the actions of non-consenting users and present them as data in the report. 

GA4’s behavioural modelling facilities are highly advanced, capable of delivering surprisingly accurate data, but there will always be an element of doubt to contend with when Google makes these educated assumptions.

Data blending

Data blending in GA4 comes into play when the analytics platform detects discrepancies or data gaps that could arise from factors such as thresholding or incomplete data sampling (more on these later). The system intelligently combines and augments data from different sources to provide a more comprehensive view of user behaviour and website performance.

However, it's important to note that while data blending offers significant benefits, there are reasons why some users may have reservations about it. Some common concerns include:

Data accuracy: Users who prefer raw, unaltered data may be sceptical about data blending, as it involves merging data from multiple sources, potentially impacting the accuracy of the insights generated.

Transparency: Data blending, while enhancing data completeness, can sometimes obscure the origin of data or make it less transparent. Users who prioritise clear and straightforward data reporting may find this challenging.

Loss of granularity: Data blending often involves aggregating data. This aggregation can result in the loss of granularity in the data, which may not be ideal for users who require highly detailed insights.

Why your data is missing in GA4

Let’s put the mystery of your disappearing GA4 data to bed!

Data retention

Google Analytics 4 significantly differs from Universal Analytics when it comes to data retention. This policy shift illustrates Google’s efforts to prioritise user privacy and data protection in their analytics services moving forward. Data can’t be misused or stolen if it doesn’t exist, so as far as Google’s concerned — the quicker it goes, the better.

Under these revised policies, user-level and conversion data is retained for 2 months by default, or for a maximum of 14 months if you extend the period in GA4’s settings. The same is true for event data if you’re a standard user, but 360 users can choose to extend retention to 26, 38, or even 50 months. By contrast, Universal Analytics permitted indefinite data retention.

If you were unaware of the shortened retention policy when you migrated to GA4, you were at significant risk of losing vast amounts of valuable data. Sadly, GA4 retention periods are veritable event horizons, so if this was the case for you, that data really is gone.

We’d simply advise that you increase the retention period to 14 months as soon as possible. Here’s how:

  • Admin / Data Settings / Data Retention / 14 Months / Save
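To get a feel for what the two settings mean in practice, the sketch below estimates the oldest date still covered by a given retention period. It is an approximation under a stated assumption: it simply steps back whole calendar months from today, and GA4's own deletion schedule may draw the boundary slightly differently. The helper names are hypothetical.

```python
import calendar
from datetime import date


def subtract_months(day: date, months: int) -> date:
    """Step back a number of calendar months, clamping to the month's last day."""
    index = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(index, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(day.day, last_day))


def retention_cutoff(today: date, retention_months: int) -> date:
    """Approximate the oldest date still covered by the retention setting."""
    return subtract_months(today, retention_months)


today = date(2024, 6, 15)  # fixed date so the example is reproducible
print(retention_cutoff(today, 2))   # 2024-04-15
print(retention_cutoff(today, 14))  # 2023-04-15
```

With the default setting, anything older than roughly two months is out of reach in this example; the 14-month setting pushes that boundary back a full year.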

Thresholding

Thresholding in Google Analytics 4 refers to the practice of withholding rows from a report or exploration when the number of users behind them is too small, which is most noticeable on small or low-traffic websites and apps.

This process of withholding data ensures that individual users cannot be identified in the reports and that the data presented is statistically meaningful. But, needless to say, data suddenly up and disappearing has raised some concerns.

Minimum thresholds in GA4

The criteria for applying thresholding are generally proprietary information. In Google Analytics 4, each metric may have its own unique minimum threshold, dependent on various factors, including the size of your account, the specific metric, the level of granularity in your analysis (e.g., daily, weekly, or monthly reports), and the volume of data being processed.

It’s also possible that Google Analytics dynamically determines these thresholds to ensure efficient processing and reporting of data whilst protecting user privacy. However, the general thresholding range for common metrics appears to extend from the mid-30s to around 50.
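The effect can be pictured with a few lines of Python. This is only a sketch: the threshold of 50 is an assumption taken from the upper end of the range mentioned above, the real values are not published, and the rows are made up.

```python
ASSUMED_THRESHOLD = 50  # assumption: Google does not publish the real values


def apply_threshold(rows: list[dict], threshold: int = ASSUMED_THRESHOLD) -> list[dict]:
    """Keep only the rows whose user count meets the threshold."""
    return [row for row in rows if row["users"] >= threshold]


# Hypothetical report rows.
rows = [
    {"city": "London", "users": 412},
    {"city": "Leeds", "users": 57},
    {"city": "Bath", "users": 31},
    {"city": "Ely", "users": 4},
]

visible = apply_threshold(rows)
hidden_users = sum(r["users"] for r in rows) - sum(r["users"] for r in visible)

print([r["city"] for r in visible])  # ['London', 'Leeds']
print(hidden_users)                  # 35
```

The two smallest rows drop out of the report entirely, which is why totals can look lower than expected when thresholding has been applied.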

Thresholding solutions: retrieve your missing data!

To tackle the challenges posed by data thresholding in Google Analytics 4 (GA4), there are several measures you can implement:

  • Adjust the date range: One of the most straightforward methods is to broaden the date range of your analysis. By extending your time frame, you'll push data sets above the aggregation threshold.
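  • Export to BigQuery: Unlike GA4 reports, the BigQuery export isn’t affected by thresholding. The export won’t happen until you link your property to BigQuery yourself, and it only covers data collected after the link is in place, but once it’s running you’ll likely find your missing data sets there.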
  • Use aggregated data: Rather than relying solely on raw, detailed data, consider working with aggregated data. Aggregation helps in reducing noise and irregularities, making it easier to spot significant trends and patterns.
  • Use device-based reporting: Reliant on client ID or app instance ID, device-based reporting does not trigger thresholding.
  • Disable Google Signals: To improve data quality, eliminate as much personally identifiable information (PII) from your reports as possible. This can be achieved by disabling Google Signals, thus cutting sensitive demographic data from the equation.
  • Change metric to ‘Total Users’: Instead of ‘New Users’, consider using the ‘Total Users’ metric. This modification offers a broader view of user engagement, bolstering event numbers beyond the thresholds applied in GA4.
  • Optimise data visualisation: Transform your reports into line charts and maximise the number of lines per dimension. Then, when analysing specific points within the report, you’ll see the small numbers that, otherwise, would be removed by thresholding.
  • Leverage custom reports: Create custom reports that include multiple dimensions or metrics to work with data at a higher level of granularity.
  • Use segmentation: Apply segments to your data to focus on specific subsets of your audience, which may have larger data volumes.
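The first tip, broadening the date range, works because more distinct users fall into each row. The sketch below shows the idea with made-up user IDs and an assumed threshold of 50; neither reflects real GA4 internals.

```python
ASSUMED_THRESHOLD = 50  # assumption: the real cut-off is not published

# Hypothetical user IDs seen for one report row on three consecutive days.
daily_users = {
    "2024-06-01": {f"u{i}" for i in range(0, 30)},   # 30 users
    "2024-06-02": {f"u{i}" for i in range(20, 55)},  # 35 users, 10 of them returning
    "2024-06-03": {f"u{i}" for i in range(50, 80)},  # 30 users, 5 of them returning
}


def users_in_range(days: list[str]) -> int:
    """Count distinct users across the selected days."""
    return len(set().union(*(daily_users[d] for d in days)))


for day in daily_users:
    count = users_in_range([day])
    print(day, count, "shown" if count >= ASSUMED_THRESHOLD else "withheld")

total = users_in_range(list(daily_users))
print("3-day range", total, "shown" if total >= ASSUMED_THRESHOLD else "withheld")
# Each single day is withheld; the 3-day range has 80 distinct users and is shown.
```

Note that users are counted once across the range, so the combined figure is lower than the sum of the daily figures.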

First-party cookies & permissions

GA4 has been touted as a "cookieless" analytics solution due to its move away from third-party cookies for data collection. Yet, GA4 isn't entirely cookieless, as it relies heavily on first-party cookies to gather data, and even though first-party cookies are less invasive tracking devices, certain dimensions and metrics may still require user consent for full data accuracy:

  • Client ID: GA4 uses a client ID, stored in a first-party cookie, to recognise the same browser or device across sessions. In situations where cookie consent is not provided, this client ID may be restricted, potentially affecting user attribution and tracking across sessions.
  • User-ID: If you utilise the User-ID feature in GA4 to track users, collecting this data may necessitate user consent in regions with stringent data privacy regulations.
  • Remarketing and advertising features: Some dimensions and metrics related to remarketing and advertising require user consent. This includes information on advertising campaigns and user interactions with ads.
  • Demographics and interests: Information about user demographics and interests may be gathered through cookies or other tracking methods. In regions with strict data privacy rules, gathering this data might require user consent.
  • E-commerce transactions: E-commerce data, including information about purchases, products viewed, and shopping cart behaviour could involve the use of cookies for accurate tracking. Consent might be needed in certain cases.
  • Custom dimensions and metrics: Any custom dimensions and metrics created to capture specific user behaviours or interactions may also require consent if they are reliant on cookie data.

It's important to note that the need for cookie consent largely depends on the privacy regulations in your region and your specific data collection practices. Ensuring compliance with privacy laws and obtaining necessary consents while using GA4 is crucial for maintaining data accuracy while respecting user privacy rights.

Metrics least affected by cookie consent

The good news is that the metrics listed below are often largely unaffected by cookie consent. However, this ultimately still depends on a number of factors, such as cookie consent mode setting and other privacy restrictions.

  • Page views: The total number of page views on your website or app is counted independently of cookie permissions. GA4 tracks page views based on user interactions with your content.
  • Sessions: Sessions represent periods of user engagement with your website or app. They are not tied to individual users' consent settings and are counted based on user interactions within a defined time frame.
  • Bounce rate: The bounce rate, which in GA4 measures the percentage of sessions that were not engaged, is determined by user engagement on your website or app and does not rely on cookies.
  • Session duration: The duration of user sessions is calculated based on user interactions and time spent on your site, irrespective of cookie permissions.
  • Event tracking: Events, such as button clicks, form submissions, or video views, are tracked independently of cookies. These metrics are based on user interactions and are not influenced by cookie consent.
  • Traffic sources: Information about where your traffic is coming from, such as direct, organic search, or referral sources, is determined by the source of the incoming traffic, not by cookies.
  • User location: GA4 can determine the geographic location of users based on their IP addresses, which does not require cookie consent for tracking.
  • Device and browser data: Metrics related to the devices and browsers used by visitors, such as device type, operating system, and browser version, are collected based on user agent information, which is not reliant on cookies.
  • User engagement: Metrics related to user engagement, including scroll tracking and time on page, are determined by user interactions on the site and are not tied to cookie permissions.
  • User flow and behaviour: Information about user flow through your site, such as the pages users visit or the sequence of interactions, is tracked based on user behaviour and is not directly influenced by cookie consent.

In most cases of missing data due to matters of user privacy, GA4 will use machine learning and behavioural modelling to fill in the gaps based on similar user and activity data. The data generated can be incredibly helpful, but it’s important to note that an abstraction has taken place.

Dimension / metric incompatibility

To ensure optimal reporting performance under the new data model, Google has introduced a plethora of incompatibilities between GA4 dimensions and metrics. Pairing these irreconcilable fields in a query will result in one of two things:

  • The incompatible dimensions or metrics are greyed out, meaning you will not be able to click on them.
  • You’re met with an empty report/exploration labelled ‘No data available’, ‘No data for this combination of segments, values, filters, and date range’, or ‘Incompatible request’. 

The two primary reasons behind incompatibility in GA4 are:

  • Paired data is stored separately
  • Data visualisation or analysis technique is restrictive

Let’s examine these factors individually!

Data stored separately

Google has recategorised many dimensions and metrics on a fundamental level, which means they are now too remote from one another in purpose to effectively synergise and deliver valuable insights.

The three impacted dimension umbrellas are:

  • Item-scoped
  • Attribution
  • Query string

See how this reshuffle has impacted the compatibility of individual dimensions and metrics in the tables below.

Side Note: Google has renamed some of the metrics in GA4. We use the new identifiers in the following compatibility tables but have included the original UA identifier where applicable.

Attribution dimensions & event-scoped metrics

The attribution dimensions in the first table are incompatible with the event-scoped metrics listed in the second table:

| Dimension ID | Dimension Display Name |
| --- | --- |
| campaignId | Event campaign ID |
| campaignName | Event campaign name |
| defaultChannelGrouping | Conversion default channel grouping |
| googleAdsAccountName | Event Google Ads account name |
| googleAdsAdGroupID | Google Ads ad group ID |
| googleAdsAdGroupName | Google Ads ad group name |
| googleAdsAdNetworkType | Event ad network type |
| googleAdsCampaignID | Google Ads campaign ID |
| googleAdsCampaignName | Google Ads campaign name |
| googleAdsCampaignType | Google Ads campaign type |
| googleAdsCreativeId | Google Ads creative ID |
| googleAdsCustomerId | Google Ads customer ID |
| googleAdsKeyword | Google Ads keyword text |
| googleAdsQuery | Google Ads query |
| medium | Event medium |
| source | Event source |
| sourcePlatform | Source platform |

 

| Metric ID | Metric Display Name |
| --- | --- |
| adUnitExposure | Ad unit exposure |
| addToCarts | Add-to-carts |
| averagePurchaseRevenue | Average purchase revenue |
| checkouts | Checkouts |
| ecommercePurchases | Purchases |
| eventCount | Event count |
| eventsPerUser | Events per user |
| eventsPerSession | Events per session |
| firstTimePurchasersPerNewUser | First-time purchasers per new user |
| itemListClickEvents [corresponds with UA itemListClicks metric] | Item list click events |
| itemListViewEvents [corresponds with UA itemListViews metric] | Item list view events |
| itemViewEvents [corresponds with UA itemViews metric] | Item view events |
| newUsers | New users |
| promotionClicks [corresponds with UA itemPromotionClicks metric] | Promotion clicks |
| promotionViews [corresponds with UA itemPromotionViews metric] | Promotion views |
| publisherAdClicks | Publisher ad clicks |
| publisherAdImpressions | Publisher ad impressions |
| screenPageViews | Views |
| screenPageViewsPerSession | Views per session |
| shippingAmount | Shipping amount |
| taxAmount | Tax amount |
| transactions | Transactions |
| transactionsPerPurchaser | Transactions per purchaser |
| userEngagementDuration | Total user engagement duration (sec) |
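If you build queries programmatically, a simple lookup can catch these clashes before you hit an empty report. The Python sketch below uses only a small excerpt of the two lists above rather than the full set, and the `incompatible_pairs` helper is hypothetical, not part of any Google library.

```python
# Excerpt of the attribution dimensions listed above (not the full list).
ATTRIBUTION_DIMENSIONS = {"campaignId", "campaignName", "medium", "source", "sourcePlatform"}

# Excerpt of the event-scoped metrics listed above (not the full list).
EVENT_SCOPED_METRICS = {"addToCarts", "eventCount", "newUsers", "screenPageViews", "transactions"}


def incompatible_pairs(dimensions: list[str], metrics: list[str]) -> list[tuple[str, str]]:
    """Return every (dimension, metric) pair that clashes according to the excerpts."""
    return [
        (dimension, metric)
        for dimension in dimensions
        if dimension in ATTRIBUTION_DIMENSIONS
        for metric in metrics
        if metric in EVENT_SCOPED_METRICS
    ]


print(incompatible_pairs(["campaignName", "country"], ["eventCount", "sessions"]))
# [('campaignName', 'eventCount')]
```

An empty list means none of the requested pairs appears in the excerpts; it is not a guarantee that GA4 will accept the query.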

Query string dimensions

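The query string dimensions in the first table below are only compatible with the fields in the table that follows. Note that, despite its ‘Metric’ headings, that second table lists several dimensions (such as country, date, and eventName) alongside the metrics: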

| Dimension ID | Dimension Display Name |
| --- | --- |
| pagePath | Page path |
| unifiedPageScreen | Page path, query string, and screen class |
| pageLocation | Page location |
| fullPageUrl | Full page URL |
| landingPage | Landing page |

 

| Metric ID | Metric Display Name |
| --- | --- |
| activeUsers | Active users |
| conversions | Conversions |
| country | Country |
| countryIsoCode | Country code |
| date | Date |
| engagedSessions | Engaged sessions |
| eventCount | Event count |
| eventName | Event name |
| fullPageUrl | Full page URL |
| hostName | Host name |
| pageLocation | Page location |
| pagePath | Page path |
| pageTitle | Page title |
| platform | Platform |
| screenPageViews | Views |
| sessions | Sessions |
| totalUsers | Total users |
| unifiedPageScreen | Page path, query string, and screen class |
| userEngagementDuration | Total user engagement duration (sec) |
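Because this rule is an allowlist rather than a blocklist, the check runs the other way round: once a query string dimension is in play, everything else in the request has to come from the second list. A minimal Python sketch, assuming the two lists above are complete and using a hypothetical `disallowed_fields` helper:

```python
QUERY_STRING_DIMENSIONS = {
    "pagePath", "unifiedPageScreen", "pageLocation", "fullPageUrl", "landingPage",
}

# Fields the article lists as compatible with the query string dimensions.
ALLOWED_FIELDS = {
    "activeUsers", "conversions", "country", "countryIsoCode", "date",
    "engagedSessions", "eventCount", "eventName", "fullPageUrl", "hostName",
    "pageLocation", "pagePath", "pageTitle", "platform", "screenPageViews",
    "sessions", "totalUsers", "unifiedPageScreen", "userEngagementDuration",
}


def disallowed_fields(requested: list[str]) -> list[str]:
    """Return requested fields that clash with a query string dimension in the request."""
    if not any(field in QUERY_STRING_DIMENSIONS for field in requested):
        return []  # the allowlist only applies when a query string dimension is used
    return [
        field
        for field in requested
        if field not in ALLOWED_FIELDS and field not in QUERY_STRING_DIMENSIONS
    ]


print(disallowed_fields(["pagePath", "sessions", "addToCarts", "eventName"]))
# ['addToCarts']
```

Dropping the flagged fields, or swapping the query string dimension for another one, should leave a request that matches the compatibility lists above.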

Item-scoped dimensions & event-scoped metrics

The item-scoped dimensions in the table below are incompatible with the event-scoped metrics in the table that follows:

| Dimension ID | Dimension Display Name |
| --- | --- |
| itemAffiliation | Item affiliation |
| itemBrand | Item brand |
| itemCategory | Item category |
| itemCategory2 | Item category 2 |
| itemCategory3 | Item category 3 |
| itemCategory4 | Item category 4 |
| itemCategory5 | Item category 5 |
| itemID | Item ID |
| itemListID | Item list ID |
| itemListName | Item list name |
| itemName | Item name |
| itemPromotionCreativeName | Item promotion creative name |
| itemPromotionID | Item promotion ID |
| itemPromotionName | Item promotion name |
| itemVariant | Item variant |

 

| Metric ID | Metric Display Name |
| --- | --- |
| adUnitExposure | Ad unit exposure |
| addToCarts | Add-to-carts |
| averagePurchaseRevenue | Average purchase revenue |
| checkouts | Checkouts |
| conversions | Conversions |
| ecommercePurchases | Purchases |
| eventCount | Event count |
| eventsPerUser | Events per user |
| eventValue | Event value |
| eventsPerSession | Events per session |
| firstTimePurchasersPerNewUser | First-time purchasers per new user |
| itemListClickEvents [corresponds with UA itemListClicks metric] | Item list click events |
| itemListViewEvents [corresponds with UA itemListViews metric] | Item list view events |
| itemsPurchased [corresponds with UA itemPurchaseQuantity metric] | Items purchased |
| itemRevenue | Item revenue |
| itemViewEvents [corresponds with UA itemViews metric] | Item view events |
| newUsers | New users |
| promotionClicks [corresponds with UA itemPromotionClicks metric] | Promotion clicks |
| promotionViews [corresponds with UA itemPromotionViews metric] | Promotion views |
| publisherAdClicks | Publisher ad clicks |
| publisherAdImpressions | Publisher ad impressions |
| purchaseRevenue | Purchase revenue |
| screenPageViews | Views |
| screenPageViewsPerSession | Views per session |
| shippingAmount | Shipping amount |
| taxAmount | Tax amount |
| totalAdRevenue | Ad revenue |
| totalRevenue | Revenue |
| transactions | Transactions |
| transactionsPerPurchaser | Transactions per purchaser |
| userEngagementDuration | Total user engagement duration (sec) |

Restrictive visualisation or analysis technique

Much like GA4’s AI features, Explorations help you to extract insights from your reports quickly. They equip you with highly refined data visualisation and analysis techniques, but each technique applies a specific and often quite restrictive data model, meaning there’s a price to pay when using such an advanced analysis tool.

To fully understand why this is the case, let’s review how Exploration reports differ from traditional reports.

Traditional reports vs. Explorations

Standard reporting in Google Analytics 4 represents the traditional approach to analysing your website or app's data. It consists of predefined, structured reports that offer insights into standard metrics and dimensions, providing a broad overview of your data. 

These reports are typically easy to access and require minimal customisation. They include familiar sections like "Overview" and "Events," offering an instant snapshot of key performance indicators, such as user sessions, page views, and event tracking.

However, standard reporting has its limitations. It may not always provide the depth of insight needed to answer nuanced questions about user behaviour — and it may not be as flexible in terms of data exploration.

Explorations, on the other hand, represent a more advanced way of delving into your data in GA4. They allow users to create custom reports and explore data sets in a manner that suits their unique analytical needs.

In other words — Exploration reporting allows you to ask and answer specific questions about your data by choosing the metrics and dimensions that matter most to you. But herein lies the crux of the matter.

Such flexibility can sometimes result in combinations that are not compatible or meaningful. Furthermore, custom data sources linked to your Analytics account may introduce data elements that don't align perfectly with the standard GA4 data structure, potentially leading to compatibility issues.

Recommended dimension / metric pairings

If you’re having trouble mastering the new compatibility rules in GA4, the following dimension / metric combinations are all compatible and are the least likely to trigger thresholding:

  • Page Path (dimension) and Page Views (metric)
  • Traffic Source (dimension) and Sessions (metric)
  • Event Name (dimension) and Event Count (metric)
  • Device Category (dimension) and Bounce Rate (metric)
  • User Location (dimension) and Sessions (metric)
  • User Engagement (dimension) and Session Duration (metric)
  • Traffic Source (dimension) and Conversions (metric, the GA4 counterpart to UA’s Goal Completions)
  • Device Category (dimension) and User Engagement (metric)
  • Event Name (dimension) and Event Value (metric)
  • User Location (dimension) and User Language (metric)

Data sampling

When you have a substantial amount of data to analyse, Google Analytics may employ sampling, which means it works with a subset of the data rather than the entire dataset. This can be especially helpful for preserving system performance and response times, but one or more of the following may occur:

  • Incomplete data representation: Sampling means that only a portion of your data is considered for analysis. As a result, some data points may be left out.
  • Variability in results: Sampling introduces variability into your results. When you run the same report multiple times, you might get slightly different results due to the random nature of the sampling process.
  • Distorted metrics: When you're dealing with a sample of data, calculated metrics or ratios might appear differently compared to the unsampled data.
  • Impact on long-tail data: Data sampling can disproportionately affect long-tail data, which includes less common events or dimensions. These elements may be underrepresented in the sampled data, making it appear as if they're missing or less significant than they actually are.
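The variability and long-tail effects are easy to reproduce on a toy dataset. The sketch below uses plain random sampling as a stand-in, since GA4's actual sampling method isn't described here; the event names and the 10% rate are assumptions chosen for illustration.

```python
import random
from collections import Counter


def sampled_estimate(events: list[str], rate: float, seed: int) -> Counter:
    """Estimate event counts from a random sample, scaled back up by 1/rate."""
    rng = random.Random(seed)  # seeded so each run of the example is repeatable
    sample = rng.sample(events, k=round(len(events) * rate))
    scale = round(1 / rate)
    return Counter({name: count * scale for name, count in Counter(sample).items()})


# Hypothetical dataset: one very common event and one long-tail event.
events = ["page_view"] * 9_995 + ["newsletter_signup"] * 5

print("true counts:", dict(Counter(events)))
for seed in (1, 2, 3):
    print(f"sample with seed {seed}:", dict(sampled_estimate(events, rate=0.1, seed=seed)))
```

Each seed stands in for a separate run of the same report. The estimate for the common event barely moves, while the rare event can only ever be reported as 0, 10, 20 and so on, which is how long-tail values end up looking absent or inflated.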

(not set)

In Google Analytics 4, ‘(not set)’ is a placeholder value that appears in your reports when GA4 has no information for a specific dimension.

The problem is, in a system as complex as GA4, finding the root of this information gap can be tricky.

Sometimes, it’s simply the result of a general error, such as forgetting to link your Google Ads account property, not realising that auto-tagging is disabled, or making mistakes when assembling UTM parameters of manually tagged destination URLs. 

However, there may just as well be a more specific cause:

  • Page Title dimension: If you see (not set) in the Page Title dimension, it means there are pages on your website that do not have a defined page title in your tracking data.
  • Event Label dimension: (not set) appearing in the Event Label dimension means that no specific label was assigned when an event was triggered.
  • Session Source / Medium dimension: If the automatically collected event session_start is absent, a report will show (not set) for the Session Source/Medium dimension.
  • Landing Page dimension: Without a page_view event, (not set) will show up for a Landing Page dimension.
  • Content Group dimension: Content Group dimensions are not compatible with automatically collected events, i.e., session_start or first_visit. When paired, (not set) will appear to illustrate the conflict.
  • Custom-defined User ID dimension: Google recommends never setting User ID or other high-cardinality dimensions (which is to say, dimensions with lots of unique values) as custom dimensions. The data science behind this advice is quite complex, but it boils down to this: more values mean a bigger job for GA4, in terms of both data gathering and the rows required to house it. The result will most likely be that your less frequent values are moved into an ‘(other)’ row, but (not set) is also a possibility.
  • Custom parameters: It can take GA4 up to 24 hours to fully metabolise a newly registered custom event. Until this period has elapsed, custom parameters may display as (not set).
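When you aren't sure which dimension to investigate first, it can help to measure how much of each one is (not set). The sketch below assumes you have exported some report rows into a list of dictionaries; the field names, sample values, and `not_set_share` helper are hypothetical.

```python
NOT_SET = "(not set)"


def not_set_share(rows: list[dict]) -> dict[str, float]:
    """Return, per dimension, the fraction of rows whose value is '(not set)'."""
    if not rows:
        return {}
    shares = {}
    for dimension in rows[0]:
        hits = sum(1 for row in rows if row.get(dimension) == NOT_SET)
        shares[dimension] = hits / len(rows)
    return shares


# Hypothetical exported rows.
rows = [
    {"pageTitle": "Home", "sessionSourceMedium": "google / organic"},
    {"pageTitle": "(not set)", "sessionSourceMedium": "(not set)"},
    {"pageTitle": "Pricing", "sessionSourceMedium": "(not set)"},
    {"pageTitle": "Blog", "sessionSourceMedium": "newsletter / email"},
]

print(not_set_share(rows))  # {'pageTitle': 0.25, 'sessionSourceMedium': 0.5}
```

The dimension with the largest share is usually the best place to start working through the causes listed above.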

Final Thoughts

From data layer errors and tag misconfigurations to the enigma of thresholding and the complexities of data blending — we've explored the many causes of data gaps in GA4.

Missing data can be confusing, but, as discussed here today, there are strategies and best practices to recoup missing data and to stop it from disappearing moving forward.

If you find yourself grappling with GA4 and require expert guidance to master its quirks and optimise your digital marketing efforts, contact us today.