Two ways to UNNEST GA4 data in BigQuery

Google Analytics 4 (GA4) BigQuery export data is optimized for storage, not usage, and thus can be confusing and difficult to analyze. Array fields are particularly challenging.

A simple explanation of the array data type is they are small tables inside a field value. Like tables, they have the same fields and contain multiple records per array.

As of June 2024, there were three array data type fields in the standard GA4 export:

Field Name

Data Type

event_params

ARRAY
  <STRUCT
    <key STRING,
      value STRUCT
       <string_value STRING,
         int_value INT64,
         float_value FLOAT64,
         double_value FLOAT64>>>

user_properties

ARRAY
  <STRUCT
    <key STRING,
      value STRUCT
       <string_value STRING,
         int_value INT64,
         float_value FLOAT64,
         double_value FLOAT64,
         set_timestamp_micros INT64>>>

items

ARRAY
  <STRUCT
    <item_id STRING,
      item_name STRING,
      item_brand STRING,
      …,
      item_params STRUCT
        <key STRING,
          value STRUCT
           <string_value STRING,
             int_value INT64,
             float_value FLOAT64,
             double_value FLOAT64>>>>

This is how the event_params field appears in the BigQuery table preview. You can see multiple rows per event, with KEY being the event parameter name and the value stored in string_value and int_value fields.
event_param field in GA4 data export in Bigquery preview. Field names key, value.string_value, value.int_value are highlighted.

UNNEST GA4 Data Before Using

Most traffic analysis needs an understanding of event parameters (event_params), which have to be unnested before usage. The event_params array has two main elements – key and value, and the value is comprised of four subtypes: string, integer, float, and double.

Event parameter keys are different depending on the event, and examples include ga_session_id, ga_session_number, page_location, page_title. The values for the keys are stored in the value STRUCT based on type, for example ga_session_id stores the value in value.int_value.

You can’t access the values in the BigQuery array field directly, therefore, you have to unnest them before using them.

Two Ways to UNNEST GA4 event_params in BigQuery

There are two main ways to unnest an array in BigQuery, which I call a cartesian and an inline unnest.

This is a basic summary of the two ways:    

Infographic illustrating how cartesian and inline unnest work. Cartesian unnest results in multiple records per event and standard columns. Inline unnest pivots nested fields into new columns with 'key' as the column name and str or int value.

Let’s review the two ways to unnest BigQuery data in more detail.

Cartesian UNNEST

Cartesian unnest is the equivalent of joining the event data with the array “tables” in the fields. Since it is the equivalent of a cartesian join, it produces multiple rows for every event.

This is an example of how you can unnest the <code>event_params field</code>, and it will give you a good idea of what the cartesian unnest looks like.

SELECT
  timestamp_micros(event_timestamp) AS event_time,
  event_name,
  user_pseudo_id,
  key,
  value.string_value,
  value.int_value,
  value.float_value,
  value.double_value
FROM `project-000000.analytics_00000.events_yyyymmdd`, UNNEST(event_params);

Cartesian UNNEST GA4 Examples and Use Cases

Cartesian unnest produces multiple rows for each event, so it should not be used where having a single row per event is expected. However, if you want to explore the nested parameters themselves, it is a great way to get the full list.

The code below gives you an inventory of all nested event_params in GA4 events data. I find this code useful for understanding which parameters need to be unnested for analysis.

SELECT
  key, COUNT(*) as record_count
FROM
  `project-000000.analytics_00000.events_*`, UNNEST(event_params)
GROUP BY
  key
ORDER BY record_count desc;

Limitations of Cartesian UNNEST

Because cartesian unnest produces multiple rows per event, the output often needs to be filtered for the key before it can be used. Some keys, like ga_session_id or page_location are present in every event, others may not be.

Thus, when you filter on a less common key, it is important to remember that you may be undercounting the total number of events in the query. The same may happen when a parameter is missing by accident.

Alternatively, if you have duplicate keys in an array, you will duplicate the main record event after filtering for the key. I have yet to see this scenario in GA4 BigQuery data, but duplicate keys may be present in other data. However, cartesian unnest is the only way to unnest arrays with duplicate keys.

Inline UNNEST

Inline unnest pivots the array by the keys and inserts the values to the same record. Therefore, the inline unnest does not duplicate records.

Inline UNNEST GA4 Examples and Use Cases

Inline unnesting transforms array elements into columns, allowing you to access specific event parameters without duplicating records. The inline unnest provides an efficient method when analyzing the specific parameters within the context of the entire event, such as a session ID or page location.

For example, the following query pulls specific parameters, such as ga_session_id, page_location, source, and medium, and places them as separate columns within the same record:

SELECT
  user_pseudo_id,
  stream_id,
  TIMESTAMP_MICROS(event_timestamp) AS event_timestamp,
  event_name,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = ‘ga_session_id’) AS ga_session_id,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = ‘page_location’) AS page_location,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = ‘source’) AS utm_source,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = ‘medium’) AS utm_medium,
  device.category AS device,
FROM
  `project-000000.analytics_00000.events_*`;

This approach is useful when you are interested in analyzing events with a specific set of parameters. Unlike cartesian unnesting, inline unnesting ensures that each event remains a single, unified row in the resulting dataset.

Limitations of Inline UNNEST

Inline unnesting requires prior knowledge of the keys you wish to unnest. If you need to explore or discover keys, you may need to revert to cartesian unnesting for initial analysis.

Additionally, inline unnesting cannot handle arrays with duplicate keys. If an array contains duplicate keys, BigQuery will return an error. Hence, inline unnesting is only appropriate if you are confident that each key appears at most once per array.

The Bottom Line: If You Need to UNNEST BigQuery Data, Use Inline UNNEST

Based on my experience building a GA4 BigQuery data warehouse, choose inline unnest for all applications other than data discovery. In his Medium article, Guillaume Wagner agrees.

Happy unnesting!

by Tanya Zyabkina

Tanya Zyabkina has over 15 years of experience leading analytics functions for multiple Fortune 500 companies in the retail, telecom, and higher education. She works as the Director of Marketing Performance Analytics for the The Ohio State University. Go Bucks!