GA4 & BigQuery: A SQL Model for a Base Events Table (How to Unnest Fields)

GA4 & BigQuery: A SQL Model for a Base Events Table (How to Unnest Fields)

Introduction to GA4 Public Dataset in BigQuery

In this post, we’ll be using the GA4 public dataset in BigQuery. You will need a Google Cloud Platform account to access this.

Using the sample dataset, you’ll see date sharded tables that look like this:

What the GA4 Public Dataset Looks Like in BigQuery

Each day’s worth of data exists in its own date sharded table.

Clicking on the events_ (92) table, and going to the “Details” tab, you’ll see the name of one of the shards on the top with a dropdown showing the years and dates of the table where data is available for.

Details Section of Google’s Public GA4 Dataset’s Date Sharded Table

If you click on the Preview tab, you’ll see a preview of the data showing all events that have happened on the corresponding website for the selected date:

Preview of events table for date January 1, 2021

If you want to run a query for all the tables, you’d have to run a wildcard query (*) like this:

select * from `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`

Which is short for:

select * from `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210101`
union all
select * from `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210102`
union all
select * from `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210103`
…

If you want a cheatsheet for how to query different date ranges for this, click here.

Nested Fields in GA4 Dataset in BigQuery

The raw data export gives you a schema that looks like the below.

Any field that is nested will be underneath a field name where Type = Record and Mode = Repeated.

Examples of nested fields in GA4 dataset in BigQuery

If you go to preview the data, you’ll see the nested fields in separate cells in each individual row (Example: row 1 has multiple event_params.key).

Preview of a nested field in GA4 data in BigQuery

Error Trying to SELECT a Nested Field in BigQuery

Let’s say you want to get the event_params.key field. If you run the below query:

select event_params.key 
from `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`

You get an error:

Error: Cannot access field key on a value with type
ARRAY<STRUCT<key STRING, value STRUCT<string_value STRING, int_value INT64, float_value FLOAT64, ...>>> at [2:21]

This is because you need to unnest the field.

Run the following query:

select e.key 
from `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`, unnest(event_params) as e

And you’ll get a list of all the event keys in the table.

Results of a SQL query after unnesting event_params

Examples of Unnesting Fields

Now, let’s say we want to unnest multiple fields and get their values all in the same table.

Firstly, if you notice, each event_params.key has a corresponding value that’s either one of the following:

  • a string value
  • an int value (integer)
  • a float value
  • a double value

You’ll see the value of each key in the column that fits their data type, for example all page_location has a string_value, so you’ll see their value in the string_value column, and the rest of the “value” columns will be null.

So, when we unnest multiple fields, we can run a query like the following:

select
  (select value.int_value from unnest(event_params) where key='ga_session_id') as session_id
  , (select value.string_value from unnest(event_params) where key='gclid') as gclid
  , (select value.string_value from unnest(event_params) where key='page_location') as page_location
  , (select value.string_value from unnest(event_params) where key='page_referrer') as page_referrer
  , (select value.string_value from unnest(event_params) where key='content') as content
  , (select value.string_value from unnest(event_params) where key='term') as term
from `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`

Any field that is nested, you’ll need to follow the pattern below.

SQL Template for Unnesting Fields

SELECT value.<data_type_value> FROM UNNEST(event_params) WHERE key='whatever_key_you_want') as name_of_column

The query above will unnest all of those fields and allow us to query from it (you can query queries), without needing to unnest anything.

Now, the query above will make no sense and have no meaning, because we do not know which rows belong to which event, what the timestamp of those events are, etc.

To make the query more meaningful, we can add more fields to it.

The SQL Model For A base_events Table

SELECT
  event_date
  , event_timestamp
  , event_name
  , geo.country
  , geo.region
  , (SELECT value.int_value FROM UNNEST(event_params) WHERE key='ga_session_id') as session_id
  , user_pseudo_id as user_id
  , (SELECT value.string_value FROM UNNEST(event_params) WHERE key='gclid') as gclid
  , (SELECT value.string_value FROM UNNEST(event_params) WHERE key='page_location') as page_location
  , (SELECT value.string_value FROM UNNEST(event_params) WHERE key='page_referrer') as page_referrer
  , traffic_source.medium as medium
  , traffic_source.source as source
  , traffic_source.name as campaign
  , (SELECT value.string_value FROM UNNEST(event_params) WHERE key='content') as content
  , (SELECT value.string_value FROM UNNEST(event_params) WHERE key='term') as term
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`

Now this query will give us closer to what we need to generate something called a base_events table.

Here, each row represents one single event that has happened on the website.

Bonus: Add a Session Key to The base_events Table

If you’ve read this post, you’ll know that GA4’s field ga_session_id is not unique by itself. It’s only unique for each individual user_pseudo_id.

So if you want to get a session key that is unique regardless of user id, you would need to concatenate ga_session_id with user_pseudo_id in order to get a truly unique session key.

To add that session key to your base_events table, you can use a CTE (Common Table Expression) in the SQL model:

WITH main as (
SELECT
  PARSE_DATE('%Y%m%d', event_date) as event_date
  , event_timestamp
  , event_name
  , geo.country
  , geo.region
  , (SELECT value.int_value FROM UNNEST(event_params) WHERE key='ga_session_id') as session_id
  , user_pseudo_id as user_id
  , (SELECT value.string_value FROM UNNEST(event_params) WHERE key='gclid') as gclid
  , (SELECT value.string_value FROM UNNEST(event_params) WHERE key='page_location') as page_location
  , (SELECT value.string_value FROM UNNEST(event_params) WHERE key='page_referrer') as page_referrer
  , traffic_source.medium as medium
  , traffic_source.source as source
  , traffic_source.name as campaign
  , (SELECT value.string_value FROM UNNEST(event_params) WHERE key='content') as content
  , (SELECT value.string_value FROM UNNEST(event_params) WHERE key='term') as term
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
)
	
, add_session_key as (
  SELECT *,
    CONCAT(user_id, CAST(session_id as STRING)) as session_key
  FROM main
)
	
SELECT * FROM add_session_key

Now that you have your base_events model, you can save it as a view to use as a foundation for other models (like a sessions dimension table and a sessions fact table).

How to Save a SQL Query as a View in BigQuery
How to Export Your Universal Analytics (GA3) Data the Fastest & Easiest Way [w/ Free Templates Included]
A Beginner Data Engineer Project: Extract Random Jokes via API with Python (Part 2)
A Beginner Data Engineer Project: Extract Random Jokes via API with Python (Part 1)
What Are Python Lambda Functions?
How to Make a BigQuery Real-Time MTD Job Cost Monitoring Dashboard (Part 1): Setting Up the Data
3 Things You Need to Get Paid More
How to Connect BigQuery with dbt
How to Model GA4 Data Easier in BigQuery with this Helpful Tool
When to Use IF NULL THEN 0 (How to Handle Null Values in SQL)
GA4 & BigQuery: A SQL Model for a Base Events Table (How to Unnest Fields)

Related Posts