How to Model GA4 Data Easier in BigQuery with this Helpful Tool

How to Model GA4 Data Easier in BigQuery with this Helpful Tool

Issues with modeling data strictly from inside BigQuery:

What I found while modeling GA4 data in BigQuery was:

  1. You can’t easily see model lineage – Many times I have SQL views saved they often reference more than one other SQL view or table. When I want to see the lineage of which models lead to what, I would have to mentally visualize or manually draw it out in LucidChart.
  2. SQL files often get unwieldy – I cannot tell you the number of times I’ve had 15+ SQL file editors opened up in my BigQuery browser. When you have to start clicking that left or right scroll arrows to get to your other SQL files, that’s when you know you have a problem.
  3. Lack of version control – A lot of times I want to see previous versions of my work or a query I wrote a few weeks ago. I usually have to check the project history tab and search for what I remember to pull it up.

Enter the solution…

What is dbt?

On their website, it says;

dbt is a SQL-first transformation workflow that lets teams quickly and collaboratively deploy analytics code following software engineering best practices like modularity, portability, CI/CD, and documentation.

It’s not just a tool… it’s a new workflow and framework for transforming data in the cloud.

The new workflow is also helping give rise to a new niche in the careers market: the analytics engineer.

A search on indeed.com shows 2x more jobs for “analytics engineer” than “data engineer” now.

2,577 jobs for data engineer
5,359 jobs for analytics engineer

Some Results from Using dbt

  • jetBlue completely redesigned their entire data approach with 6-8 hour data infrastructure maintenance windows reduced to 0.
  • Nasdaq does 100-125 billion transactions per day and now uses dbt for faster turnaround times, easier user onboarding, and extracting insights from previously inaccessible data points. – source
  • Hubspot was using Airflow to help solve data transformation – but now they use dbt.
  • Red Ventures spends 80% less time on data processing jobs and serves 30% more clients without increasing IT head count.
  • SafetyCulture went from -20 to +69 eNPS and 2x increase in new customer retention.
  • “80% reduction in data engineering hours and 95% reduction in time to onboard new employees” – Aktify
  • “600% decrease in time to actionable data” – Spoton

Features and Benefits of dbt

1. Debug your work easier with modularity

Modularity is a quality and best practice in software engineering that allows your work to be broken down into smaller parts and recombined to form a system. The fact that you can keep smaller parts of the system separated and recombined allows you to debug easier and build faster and iteratively.

Opposite of a modular system would be a monolithic system. Imagine having just one table in your database that holds ALL of your data from different sources like Google Analytics, Google Ads, Bing Ads, Google Business Profiles, etc. That would be a nightmare.

2. Model data anywhere from any computer (portability)

The SQL you write and the files you save can be easily ported over, run, and used on different operating systems and computers. Since dbt has a web-based IDE (Integrated Development Environment), you can essentially work on one computer, jump to another, and resume your work with little bumps in the road.

3. Constantly monitor your data’s integrity with tests

Super important, because if a stakeholder notices an issue in the data in their report, that means it’s already too late for you. You need to test your data for accuracy, correctness, and completeness even before it reaches the reporting layer (Tableau, Excel sheet, Google Data Studio, Power BI).

Having to do this for every single report and every single visualization manually is a nightmare. dbt allows you to one-time build testing scripts and have it run on a recurring basis so you can keep monitoring your data’s integrity.

4. Communicate better and have more visibility with auto-generated documentation

Automatically generate documentation with the click of a button (literally) to help onboard new users easily. The automatic generation is possible because the documentation system is built on top of the inline code you write to build the models.

5. Clearly see data lineage with auto-generated DAGs

Auto-generate Direct Acyclic Graphs to show the lineage of your data models.

Example of a DAG in dbt

6. Reuse pieces of code with macros

Using Jinja (a templating language), you can create pieces of code that can be reused multiple times in different places.

Example: let’s say you want to pull a SUM(CASE WHEN .... THEN .... END) aggregation on multiple columns. Instead of writing out all of it, you can write a jinja template.

A SQL query using a jinja template:

-- /models/order_payment_method_amounts.sql

{% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %}

select
    order_id,
    {% for payment_method in payment_methods %}
    sum(case when payment_method = '{{payment_method}}' then amount end) as {{payment_method}}_amount,
    {% endfor %}
    sum(amount) as total_amount
from app_data.payments
group by 1

The above code will be equivalent to:

-- /models/order_payment_method_amounts.sql

select
    order_id,
    sum(case when payment_method = 'bank_transfer' then amount end) as bank_transfer_amount,
    sum(case when payment_method = 'credit_card' then amount end) as credit_card_amount,
    sum(case when payment_method = 'gift_card' then amount end) as gift_card_amount,
    sum(amount) as total_amount
from app_data.payments
group by 1

7. Reduce the need to write boilerplate DML and DDL language with materializations

Do you want (or need) your models to be fully refreshed or only incrementally loaded? Do you need them to show up as tables or views?

Materialization covers all of these and dbt makes it easy for you by taking care of it – no need to write boilerplate DML (Data Manipulation Language) and DDL (Data Definition Language).

You can focus on engineering the logic instead of being a database administrator.

Best Part is: it’s Free!

As of the time of this writing, you can start building models with a free account seat for one developer.

How to Export Your Universal Analytics (GA3) Data the Fastest & Easiest Way [w/ Free Templates Included]
A Beginner Data Engineer Project: Extract Random Jokes via API with Python (Part 2)
A Beginner Data Engineer Project: Extract Random Jokes via API with Python (Part 1)
What Are Python Lambda Functions?
How to Make a BigQuery Real-Time MTD Job Cost Monitoring Dashboard (Part 1): Setting Up the Data
3 Things You Need to Get Paid More
How to Connect BigQuery with dbt
How to Model GA4 Data Easier in BigQuery with this Helpful Tool
When to Use IF NULL THEN 0 (How to Handle Null Values in SQL)
GA4 & BigQuery: A SQL Model for a Base Events Table (How to Unnest Fields)

Related Posts