Two ways to UNNEST GA4 data in BigQuery
Google Analytics 4 (GA4) BigQuery export data is optimized for storage, not usage, and thus can be confusing and difficult to analyze. Array fields are particularly challenging.
A simple explanation of the array data type is they are small tables inside a field value. Like tables, they have the same fields and contain multiple records per array.
As of June 2024, there were three array data type fields in the standard GA4 export:
Field Name
Data Type
event_params
ARRAY
<STRUCT
<key STRING,
value STRUCT
<string_value STRING,
int_value INT64,
float_value FLOAT64,
double_value FLOAT64>>>
user_properties
ARRAY
<STRUCT
<key STRING,
value STRUCT
<string_value STRING,
int_value INT64,
float_value FLOAT64,
double_value FLOAT64,
set_timestamp_micros INT64>>>
items
ARRAY
<STRUCT
<item_id STRING,
item_name STRING,
item_brand STRING,
…,
item_params STRUCT
<key STRING,
value STRUCT
<string_value STRING,
int_value INT64,
float_value FLOAT64,
double_value FLOAT64>>>>
event_params
field appears in the BigQuery table preview. You can see multiple rows per event, with KEY being the event parameter name and the value stored in string_value
and int_value
fields. UNNEST GA4 Data Before Using
Most traffic analysis needs an understanding of event parameters (event_params), which have to be unnested before usage. The event_params
array has two main elements – key and value, and the value is comprised of four subtypes: string, integer, float, and double.
Event parameter keys are different depending on the event, and examples include ga_session_id
, ga_session_number
, page_location
, page_title
. The values for the keys are stored in the value STRUCT based on type, for example ga_session_id
stores the value in value.int_value
.
You can’t access the values in the BigQuery array field directly, therefore, you have to unnest them before using them.
Two Ways to UNNEST GA4 event_params in BigQuery
There are two main ways to unnest an array in BigQuery, which I call a cartesian and an inline unnest.
This is a basic summary of the two ways:
Let’s review the two ways to unnest BigQuery data in more detail.
Cartesian UNNEST
Cartesian unnest is the equivalent of joining the event data with the array “tables” in the fields. Since it is the equivalent of a cartesian join, it produces multiple rows for every event.
This is an example of how you can unnest the <code>event_params field</code>, and it will give you a good idea of what the cartesian unnest looks like.
Cartesian UNNEST GA4 Examples and Use Cases
Cartesian unnest produces multiple rows for each event, so it should not be used where having a single row per event is expected. However, if you want to explore the nested parameters themselves, it is a great way to get the full list.
The code below gives you an inventory of all nested event_params in GA4 events data. I find this code useful for understanding which parameters need to be unnested for analysis.
Limitations of Cartesian UNNEST
Because cartesian unnest produces multiple rows per event, the output often needs to be filtered for the key before it can be used. Some keys, like ga_session_id
or page_location
are present in every event, others may not be.
Thus, when you filter on a less common key, it is important to remember that you may be undercounting the total number of events in the query. The same may happen when a parameter is missing by accident.
Alternatively, if you have duplicate keys in an array, you will duplicate the main record event after filtering for the key. I have yet to see this scenario in GA4 BigQuery data, but duplicate keys may be present in other data. However, cartesian unnest is the only way to unnest arrays with duplicate keys.
Inline UNNEST
Inline unnest pivots the array by the keys and inserts the values to the same record. Therefore, the inline unnest does not duplicate records.
Inline UNNEST GA4 Examples and Use Cases
Inline unnesting transforms array elements into columns, allowing you to access specific event parameters without duplicating records. The inline unnest provides an efficient method when analyzing the specific parameters within the context of the entire event, such as a session ID or page location.
For example, the following query pulls specific parameters, such as ga_session_id
, page_location
, source
, and medium
, and places them as separate columns within the same record:
This approach is useful when you are interested in analyzing events with a specific set of parameters. Unlike cartesian unnesting, inline unnesting ensures that each event remains a single, unified row in the resulting dataset.
Limitations of Inline UNNEST
Inline unnesting requires prior knowledge of the keys you wish to unnest. If you need to explore or discover keys, you may need to revert to cartesian unnesting for initial analysis.
Additionally, inline unnesting cannot handle arrays with duplicate keys. If an array contains duplicate keys, BigQuery will return an error. Hence, inline unnesting is only appropriate if you are confident that each key appears at most once per array.
The Bottom Line: If You Need to UNNEST BigQuery Data, Use Inline UNNEST
Based on my experience building a GA4 BigQuery data warehouse, choose inline unnest for all applications other than data discovery. In his Medium article, Guillaume Wagner agrees.
Happy unnesting!
by Tanya Zyabkina
Tanya Zyabkina has over 15 years of experience leading analytics functions for multiple Fortune 500 companies in the retail, telecom, and higher education. She works as the Director of Marketing Performance Analytics for the The Ohio State University. Go Bucks!