Understanding GA4 BigQuery Medium, Source, and Campaign Data
Whether you want to build custom multi-touch attribution models or recreate the Default Channel Grouping using Google Analytics 4 (GA4) data in BigQuery, first you need to know where to find the data.
However, working with GA4 Medium, Source, and Campaign parameters in BigQuery can be challenging.
This article shows how to extract these attribution fields from your GA4 BigQuery export.
Medium, Source, and Campaign are the three most common traffic source tags in Google Analytics. They are a subset of the campaign tags that Google automatically extracts from the UTM parameters in the URL and adds to events. You can read more about URL tagging and get the full list of parameters here.
User vs Session Attribution in GA4
The GA4 export to BigQuery provides attribution data at three levels: user, session, and event.
- User-level data: Reflects how the user first discovered your website or app.
- Session-level data: Indicates how individual sessions were initiated.
- Event-level data: Shows the source of individual page views. This data point is relevant when two or more pages in the same session have different sources.
Event-level data can provide interesting analysis of sessions with multiple sources, typically paid and organic sources in the same session. However, event-level data is harder to work with and usually needs to be summarized at a session level.
User-level data is simply the session data for the first session the user had on your website, sometimes called first-click attribution. Session-level data is equivalent to last-click attribution. Read about Google’s scope definition here.
This distinction is crucial for understanding which data you are working with, so you can correctly identify both user acquisition and immediate traffic drivers.
By analyzing the three types of attribution data, you can fully assess your marketing effectiveness.
Attribution Data in Events Table
The availability and location of session traffic source data in BigQuery have evolved over time. The table below summarizes the fields where you can find the data and when the data first appeared in the BigQuery export.
event_params: The event_params field in page_view events contains event-level traffic source information. To use it, you need to extract the values with the UNNEST() function. To the best of my knowledge, this data has been available since the beginning of the GA4 export to BigQuery.
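As a minimal sketch of the UNNEST() approach, the query below pulls the three keys out of event_params for page_view events. The table path `my-project.analytics_123456789.events_*` and the date range are placeholders, not values from this article; replace them with your own export dataset.

```sql
-- Sketch: event-level medium, source, and campaign from page_view events.
-- The project, dataset, and date range are hypothetical placeholders.
SELECT
  event_timestamp,
  user_pseudo_id,
  -- Each scalar subquery unnests event_params and keeps one key's string value.
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'campaign') AS campaign
FROM `my-project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
  AND event_name = 'page_view';
```

Using one scalar subquery per key keeps a single output row per event, which is usually easier to work with than a cross join against the unnested parameters.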
collected_traffic_source: Starting in June 2023, Google introduced the collected_traffic_source field, which holds the same event-scoped data as the event_params field in the page_view event but does not require unnesting. Starting in November 2023, the origin of the first page view of the session was also added to the session_start event, creating the first easily accessed session-scoped attribution source.
session_traffic_source_last_click.manual_campaign: Starting July 16, 2024, Google added the source for the first event in the session, which is the true session-scoped attribution source.
Dates | Session/Events Attribution Data | User Attribution Data |
---|---|---|
No date restriction | event_name = 'page_view'; event_params.value.string_value for event_params.key = 'medium', 'source', or 'campaign' | traffic_source.medium, traffic_source.source, traffic_source.name (campaign name) |
Starting June 4, 2023 | event_name = 'page_view'; collected_traffic_source.manual_medium, collected_traffic_source.manual_source, collected_traffic_source.manual_campaign_name | |
Starting November 1, 2023 | event_name = 'session_start'; collected_traffic_source.manual_medium, collected_traffic_source.manual_source, collected_traffic_source.manual_campaign_name | |
Starting July 16, 2024 | event_name = 'session_start'; session_traffic_source_last_click.manual_campaign.medium, session_traffic_source_last_click.manual_campaign.source, session_traffic_source_last_click.manual_campaign.campaign_name | |
This many fields may make your head spin, so here is a helpful link to understand GA4 data: Google Analytics BigQuery export schema.
For data after November 1, 2023, both user and session attribution information can be extracted from session_start events.
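One way to write such a query is sketched below. It reads session-level fields from collected_traffic_source and user-level fields from traffic_source on session_start events. The table path and date range are placeholders, and the output column names (session_medium, user_medium, and so on) are illustrative choices rather than anything defined by the export.

```sql
-- Sketch: user and session attribution from session_start events.
-- The project, dataset, and date range are hypothetical placeholders.
SELECT
  user_pseudo_id,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
  -- Session-level attribution (how this session started)
  collected_traffic_source.manual_medium AS session_medium,
  collected_traffic_source.manual_source AS session_source,
  collected_traffic_source.manual_campaign_name AS session_campaign,
  -- User-level attribution (how the user was first acquired)
  traffic_source.medium AS user_medium,
  traffic_source.source AS user_source,
  traffic_source.name AS user_campaign
FROM `my-project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20231101' AND '20231130'
  AND event_name = 'session_start';
```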
Applying Event-Scoped Attribution Data to All Events in the Session
One challenge with GA4 BigQuery data is that traffic source information is typically available only in the session_start event or the first page_view. Other events in the session have no attribution source, so for correct reporting you need to add the session-scoped attribution information to all events. To achieve this:
- Summarize the session source information using window functions.
- Apply the summarized data to all events within the session.
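The two steps above can be sketched with a window function. This example assumes a session is identified by user_pseudo_id plus the ga_session_id event parameter, and it takes the first non-null value of each field within the session. Both are assumptions made for illustration, as are the table path and the window name session_window.

```sql
-- Sketch: copy the first non-null traffic source in a session to every event.
-- The project, dataset, and date range are hypothetical placeholders.
WITH events AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
    event_timestamp,
    event_name,
    collected_traffic_source.manual_medium AS medium,
    collected_traffic_source.manual_source AS source,
    collected_traffic_source.manual_campaign_name AS campaign
  FROM `my-project.analytics_123456789.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
)
SELECT
  user_pseudo_id,
  ga_session_id,
  event_timestamp,
  event_name,
  -- IGNORE NULLS skips events that carry no attribution data.
  FIRST_VALUE(medium IGNORE NULLS) OVER session_window AS session_medium,
  FIRST_VALUE(source IGNORE NULLS) OVER session_window AS session_source,
  FIRST_VALUE(campaign IGNORE NULLS) OVER session_window AS session_campaign
FROM events
WINDOW session_window AS (
  PARTITION BY user_pseudo_id, ga_session_id
  ORDER BY event_timestamp
  -- The full-partition frame lets events before the first tagged event see it too.
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);
```

Because each field is resolved independently, a session whose first tagged event has a source but no campaign could pick up the campaign from a later event. If that matters for your reporting, select all three fields from the same event instead.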
User Traffic Source Data
User-level attribution information is available in the traffic_source record, which includes:
- Medium
- Source
- Campaign name (referred to as “name” in the data)
This data is valuable for understanding how users initially discovered your site, regardless of their current session’s source. This data is available for all user events, and it needs no additional processing.
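Since these fields need no unnesting, a user acquisition summary can be a plain aggregation. The sketch below counts distinct users by first-touch medium, source, and campaign. The table path and date range are placeholders.

```sql
-- Sketch: users by first-touch (user-level) traffic source.
-- The project, dataset, and date range are hypothetical placeholders.
SELECT
  traffic_source.medium AS user_medium,
  traffic_source.source AS user_source,
  traffic_source.name AS user_campaign,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM `my-project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY user_medium, user_source, user_campaign
ORDER BY users DESC;
```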
Validation Against Google Analytics 4 UI/API, Including Looker Studio
When working with GA4 BigQuery data, it’s essential to validate your findings against the GA4 UI or API:
- User-level traffic data generally matches between BigQuery and the UI.
- Session-level validation is more challenging due to limitations in the BigQuery export.
Challenges in session-level validation:
- Google does not share all of the attribution information in the raw data export, which makes the Direct channel appear larger in BigQuery data than in the GA4 UI or Looker Studio.
- Up to 15% of sessions may not have page views, making attribution difficult before November 1, 2023.
- Privacy consent mode can result in null user and session IDs, complicating attribution.
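To see how much the last two issues affect your own property, you could measure them directly. The sketch below counts sessions with no page_view event and events with a null user or session ID. As before, the table path and date range are placeholders, and keying a session on user_pseudo_id plus ga_session_id is an assumption.

```sql
-- Sketch: size of the "no page view" and "null ID" problems in one date range.
-- The project, dataset, and date range are hypothetical placeholders.
WITH events AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
    event_name
  FROM `my-project.analytics_123456789.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
),
sessions AS (
  -- Only events with both IDs can be grouped into sessions.
  SELECT
    user_pseudo_id,
    ga_session_id,
    COUNTIF(event_name = 'page_view') AS page_views
  FROM events
  WHERE user_pseudo_id IS NOT NULL
    AND ga_session_id IS NOT NULL
  GROUP BY user_pseudo_id, ga_session_id
)
SELECT
  (SELECT COUNT(*) FROM sessions) AS sessions,
  (SELECT COUNTIF(page_views = 0) FROM sessions) AS sessions_without_page_view,
  (SELECT SAFE_DIVIDE(COUNTIF(page_views = 0), COUNT(*)) FROM sessions) AS share_without_page_view,
  (SELECT COUNTIF(user_pseudo_id IS NULL OR ga_session_id IS NULL) FROM events) AS events_with_null_ids;
```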
Best Practices for Working with GA4 BigQuery Medium, Source, and Campaign Parameters
To effectively use GA4 BigQuery Medium, Source, and Campaign parameters:
- Regularly check the GA4 BigQuery export schema and update your code to accommodate changes.
- Use a combination of user-level and session-level data for comprehensive analysis.
- Validate and track data quality to identify changes in the data.
- Create a Channel Grouping user-defined function to aggregate and simplify attribution information.
- Consider creating summary tables to improve query performance and simplify analysis.
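A channel grouping function could start as a temporary SQL UDF like the sketch below. The function name channel_group and its rules are illustrative only. They are a simplified example and not Google's Default Channel Grouping definition, so adjust the conditions to your own tagging conventions.

```sql
-- Sketch: a simplified, hypothetical channel grouping UDF.
-- These rules are illustrative and do not reproduce Google's official grouping.
CREATE TEMP FUNCTION channel_group(source STRING, medium STRING)
RETURNS STRING
AS (
  CASE
    WHEN source IS NULL AND medium IS NULL THEN 'Direct'
    WHEN source = '(direct)' AND medium IN ('(none)', '(not set)') THEN 'Direct'
    WHEN medium = 'organic' THEN 'Organic Search'
    WHEN REGEXP_CONTAINS(medium, r'^(cpc|ppc|paid.*)$') THEN 'Paid'
    WHEN medium = 'email' THEN 'Email'
    WHEN medium = 'referral' THEN 'Referral'
    ELSE 'Other'
  END
);

-- Example call with literal values.
SELECT channel_group('google', 'organic') AS channel;
```

A temporary function only lives for the duration of the query job. To reuse the logic across queries and summary tables, the same body can be stored as a persistent function in a dataset.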
Understanding and effectively using GA4 BigQuery Medium, Source, and Campaign data is crucial for accurate marketing attribution and analysis. By mastering the techniques outlined in this article, you’ll be better equipped to extract valuable insights from your GA4 data, create custom Marketing Attribution Models, and inform decisions resulting in better marketing strategies.
by Tanya Zyabkina
Tanya Zyabkina has over 15 years of experience leading analytics functions for multiple Fortune 500 companies in retail, telecom, and higher education. She works as the Director of Marketing Performance Analytics for The Ohio State University. Go Bucks!