Pros and Cons of BigQuery GA4 Export
Find out if building reporting based on BigQuery Google Analytics data is for you by understanding its advantages and drawbacks
The goal of this article is to summarize the pros and cons of using BigQuery Google Analytics 4 (GA4) data based on my own experience developing a data warehouse for a large organization.
I hope it will help you decide on the best path to build your own Google Analytics reporting.
Benefits of Using BigQuery GA4 Data
The benefits of using BigQuery GA4 data are impressive. Nothing rivals the power and flexibility of using raw data to build your own custom solution.
1. Access Raw Event Data
BigQuery allows you to access the raw Google Analytics data, exported by Google into your own BigQuery database. You can send virtually any data from the data layer or a UTM parameter to GA4, and then access that specific record in BigQuery.
This is an important factor if you have registered users and would like to access their User ID to join it with other user data for reporting. In that case, BigQuery data export is the way to go.
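To make the idea concrete, here is a minimal sketch of that User ID join, done with plain Python dicts rather than SQL (in BigQuery it would be a JOIN between the export table and your CRM table). All sample data and field names below are invented for illustration.

```python
# Hypothetical CRM lookup table keyed by User ID.
crm = {"u42": {"segment": "enterprise"}, "u77": {"segment": "smb"}}

# Hypothetical GA4 events carrying the same User ID.
events = [
    {"user_id": "u42", "event_name": "purchase"},
    {"user_id": "u99", "event_name": "purchase"},  # no matching CRM record
]

# Left-join each event to its CRM segment; unmatched users get None.
joined = [
    {**e, "segment": crm.get(e["user_id"], {}).get("segment")}
    for e in events
]

print(joined[0]["segment"], joined[1]["segment"])  # enterprise None
```

The same pattern scales up in the warehouse: events without a CRM match survive the join with a NULL segment instead of being dropped.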
2. Integrate with Other Data
To combine your GA4 data with additional sources, such as eCommerce sales or CRM email activity, the best practice is to load and process all of this data in a single data warehouse.
BigQuery integrates well with other services, either directly or using connectors.
3. Avoid GA4 Quotas in Looker Studio
It’s easy to create your GA4 dashboards in Looker Studio, but organizations with multiple users quickly find out about hourly and daily API request limits. This is a common reason to consider switching to BigQuery for data processing.
Creating a cloud data warehouse in BigQuery solves the problem. Not only does the data warehouse offer unrestricted access to your data, but it is also a cheaper alternative to investing in a paid Google Analytics tier (GA360).
4. Fully Custom Reporting
Using GA4 data in BigQuery allows for fully customized reports. Analysts have the flexibility to define their own metrics and dimensions and, if those don't work out, to modify them later and recalculate historical data.
Examples include tracking users across multiple hostnames, analyzing purchase behavior within 180 days of ad response, or creating custom attribution channels.
This customization enhances the depth and relevance of insights from user behavior and marketing effectiveness.
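As one example of such a custom dimension, here is a hedged sketch of a custom channel-grouping rule written in Python. In a real warehouse this logic would live in SQL, and the source/medium values and channel names below are assumptions, not Google's definitions.

```python
def custom_channel(source: str, medium: str) -> str:
    """Map a session's source/medium pair to a custom channel name."""
    if medium in ("cpc", "ppc", "paidsearch"):
        return "Paid Search"
    if medium == "organic":
        return "Organic Search"
    if medium == "email":
        return "Email"
    if source in ("facebook", "instagram", "linkedin"):
        return "Social"
    return "Other"

print(custom_channel("google", "cpc"))       # Paid Search
print(custom_channel("newsletter", "email")) # Email
```

Because the rule is yours, you can change it at any time and re-run it against the full event history, which is exactly what the default GA4 reports don't let you do.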
5. Keep Your Old Data Longer
Under the current GA4 data retention policy, Google retains event-level data for at most 14 months.
One of the most effective ways to expand this window is to export the data into BigQuery and transform it into your own data warehouse. Storage is cheap, and you can save millions of events spanning a long period of time for a few dollars a month.
6. More Precise Reporting
Using GA4 export to BigQuery offers a significant advantage by avoiding the cardinality and sampling limitations inherent in GA4 reports.
This export ensures that all data is transferred to BigQuery, providing access to unsampled, complete datasets. Consequently, users benefit from more precise and reliable data analysis, enabling deeper insights without the constraints of data truncation or approximation.
7. Advanced Analytics
The crown jewel of using GA4 data in BigQuery is advanced analytics.
The raw event data is suitable for building out analytical capabilities such as longitudinal analysis, which examines user behavior over extended periods. Measuring true incremental attribution or creating propensity models allows for precise measurement of marketing effectiveness and prediction of user actions.
By leveraging BigQuery, businesses can exploit these advanced methods to drive strategic decisions based on more sophisticated approaches.
Challenges of Using BigQuery GA4 Data
While the advantages of using BigQuery are plentiful, it does not come without challenges. To make sure you don’t run into deal breakers, I identified the most consequential drawbacks.
1. History Starts at First Export
The BigQuery GA4 export doesn’t backfill data prior to its setup. This means you cannot access data from before the export was turned on, creating a gap in historical analysis.
This can be problematic for businesses needing in-depth trend analysis or data reviews from the past.
To address this, it’s important to start the BigQuery export as early as possible when using GA4, to ensure a full dataset for analysis in the future.
If historical data is crucial, consider backfilling GA4 BigQuery data with GA4 API export. However, keep in mind that this alternative may not offer the same comprehensive analysis capabilities as a complete dataset in BigQuery.
2. Requires Data Engineering and SQL Skills
Organizations investing in BigQuery reporting must allocate resources for writing SQL queries and for building and maintaining an automated, optimized data warehouse.
The process begins with the GA4 schema, which is complex. Advanced data skills are necessary to unnest, validate, and clean the data.
Unnesting important array data, such as event_params, is the first challenge that analysts working with this data face. Then comes the lack of standard metrics, as many dimensions, like channel grouping, have to be constructed from scratch.
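To show what that unnesting step looks like, here is a small Python sketch that mimics the GA4 event_params array structure (in BigQuery itself this is done with UNNEST in SQL). The sample event below is invented, but the nested key/value shape mirrors the export schema.

```python
def get_param(event, key):
    """Return the value of a named parameter from a GA4-style event."""
    for param in event["event_params"]:
        if param["key"] == key:
            # GA4 stores the value in one of several typed fields,
            # e.g. string_value or int_value.
            v = param["value"]
            return v.get("string_value") or v.get("int_value")
    return None

event = {
    "event_name": "page_view",
    "event_params": [
        {"key": "page_location", "value": {"string_value": "https://example.com/"}},
        {"key": "ga_session_id", "value": {"int_value": 1700000000}},
    ],
}

print(get_param(event, "page_location"))  # https://example.com/
print(get_param(event, "ga_session_id")) # 1700000000
```

Every dimension you want to report on, even something as basic as the page URL, has to be pulled out of this nested structure first.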
3. Data Won't Match Other GA4 Sources
A crucial part of validating GA4 data in BigQuery involves comparing it with GA4 data from Looker Studio (API) or the UI. However, because Google does not export complete session attribution information to BigQuery, session channels in BigQuery might not align with those in other sources.
From my extensive experience, the count of sessions and users in BigQuery often differs from that in Looker Studio reports.
You can read more about this phenomenon and Google's explanation of it here.
4. Null Values for Private Users
Organizations that activate Consent Mode in GA4 are up against another challenge. Private records in GA4, appearing with NULL session and user IDs, require careful handling.
When developing a BigQuery data warehouse, analysts and data engineers have to make a decision about whether and how to include these records, as they can impact session and user counts. It’s critical to avoid aggregating these events into a single ‘NULL session’ in summaries, which could skew the results.
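The difference between the naive and the careful approach can be shown with a few invented rows. This is only a sketch of one defensible choice (reporting consented sessions and private events separately), not the only correct way to handle Consent Mode data.

```python
# Invented sample: two consented users plus two private (consented-out) events.
rows = [
    {"user_pseudo_id": "A", "ga_session_id": 111},
    {"user_pseudo_id": "A", "ga_session_id": 111},
    {"user_pseudo_id": "B", "ga_session_id": 222},
    {"user_pseudo_id": None, "ga_session_id": None},  # private user
    {"user_pseudo_id": None, "ga_session_id": None},  # another private user
]

# Naive distinct count: all private events collapse into one "NULL session".
naive = len({(r["user_pseudo_id"], r["ga_session_id"]) for r in rows})

# Careful count: consented sessions and private events reported separately.
consented = len({(r["user_pseudo_id"], r["ga_session_id"])
                 for r in rows if r["ga_session_id"] is not None})
private_events = sum(1 for r in rows if r["ga_session_id"] is None)

print(naive, consented, private_events)  # 3 2 2
```

The naive count undercounts private activity (two separate private users look like one session), which is exactly the skew described above.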
5. Data Quirks
Naturally, the raw GA4 data is not without quirks, with the biggest one being sessions without pageviews (8-15% of all session starts). Depending on the timeframe for your data, you also may or may not have session attribution information for these records.
You have to carefully consider the implications of including or excluding these records from your summaries and reporting. There are benefits and drawbacks to both.
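Before deciding, you first have to find these sessions. Here is a minimal Python sketch of flagging sessions that have a session_start but no page_view; the events are invented, and in the warehouse the same logic would be a GROUP BY with a conditional count in SQL.

```python
# Invented event stream: session 2 starts but never records a page_view.
events = [
    {"ga_session_id": 1, "event_name": "session_start"},
    {"ga_session_id": 1, "event_name": "page_view"},
    {"ga_session_id": 2, "event_name": "session_start"},
    {"ga_session_id": 3, "event_name": "session_start"},
    {"ga_session_id": 3, "event_name": "page_view"},
]

# Collect the set of event names seen in each session.
sessions = {}
for e in events:
    sessions.setdefault(e["ga_session_id"], set()).add(e["event_name"])

# Sessions that started but never viewed a page.
no_pageview = [sid for sid, names in sessions.items() if "page_view" not in names]
print(no_pageview)  # [2]
```

Once flagged, these sessions can be excluded, included, or reported as their own bucket, depending on which trade-off you choose.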
6. Your Data Won't Add Up
Some metric breakdowns are not additive. For example, the number of sessions by page needs to be deduped, and the number of users by time period does not equal the sum of users in each period. This increases computational complexity for the data warehouse to handle these metrics properly.
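A tiny invented example makes the non-additivity concrete: a user active in two months is counted once in each month but only once for the whole period.

```python
# Invented visit log: (user, month) pairs; user1 is active in both months.
visits = [
    ("user1", "2024-01"), ("user2", "2024-01"),
    ("user1", "2024-02"), ("user3", "2024-02"),
]

# Distinct users per month.
monthly = {}
for user, month in visits:
    monthly.setdefault(month, set()).add(user)

sum_of_monthly = sum(len(users) for users in monthly.values())  # 2 + 2 = 4
total_users = len({user for user, _ in visits})                 # 3 distinct users

print(sum_of_monthly, total_users)  # 4 3
```

This is why a warehouse has to recompute distinct counts at every reporting grain instead of summing pre-aggregated rows.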
7. Need to Keep Up-to-date
Google is still changing the GA4 export data schema.
In June 2023, they added collected_traffic_source to the export, and in November 2023 Google started populating medium, source, and campaign in the session_start event_params variable.
If you build your reporting on BigQuery GA4 data as it is now, you may have to update the queries in the future as the data schema changes.
Conclusions
Deciding to use BigQuery for Google Analytics 4 data requires careful consideration of its pros and cons. Organizations need to assess these factors based on their specific data needs and capabilities.
It is crucial to recognize that using BigQuery for GA4 reporting involves not just strategic gains but also demands dedicated technical and analytical skills.
This is why it is important to consider the cost of BigQuery itself as well as warehouse development and maintenance. These financial aspects are covered in a separate article, which provides a detailed explanation of BigQuery expenses.
Ultimately, moving to BigQuery for GA4 data can take your company to the next level, as long as it aligns with your broader organizational goals and readiness for technical challenges.
by Tanya Zyabkina
Tanya Zyabkina has over 15 years of experience leading analytics functions for multiple Fortune 500 companies in the retail, telecom, and higher education industries. She works as the Director of Marketing Performance Analytics for The Ohio State University. Go Bucks!