A couple of questions I’ve always had are:
- How are nulls handled and calculated when performing aggregate queries?
- Should I use IFNULL THEN 0, and if so, when should I be using it?
Why is this relevant?
When working with Google Analytics data, many times you will come across your conversion events in their own separate column.
What if you’re tracking multiple conversion types and you want to sum up all of them?
If you only implemented conversion tracking starting from a specific date, you will only have that data from that date onwards, and any values for that conversion event before that date will be null.
So when you add multiple conversion column events together, sometimes you will get null values for dates that shouldn’t be null.
Here is how to handle that.
The Experiment
So I created a sample dataset and ran some queries on it to see what would happen in different scenarios.
The Dataset
Run the following sample-data-generating query and you’ll get a table that looks like below.
SELECT 1 as row_num, 4 as metric1, 1 as metric2, 0 as metric3
UNION ALL
SELECT 2 as row_num, 10 as metric1, 0 as metric2, null as metric3
UNION ALL
SELECT 3 as row_num, 1 as metric1, 2 as metric2, 3 as metric3
UNION ALL
SELECT 4 as row_num, 0 as metric1, 0 as metric2, 0 as metric3
UNION ALL
SELECT 5 as row_num, null as metric1, null as metric2, null as metric3
Couple notes about dataset:
- Included some zeros in different columns and rows to see what would happen when performing SUM aggregations and addition (+) operations
- Included some nulls to see what would happen with those as well
When running the test queries, you can either save this dataset as a view, as a table, or use it in a CTE (Common Table Expression) with the test queries. Below, I use CTE.
Test Query #1: SUM Aggregate for Each Column
In this first query, we perform a sum aggregation on columns: metric1, metric2, and metric3.
SELECT
SUM(metric1) as sum_metric1
, SUM(metric2) as sum_metric2
, SUM(metric3) as sum_metric3
FROM t1
Results:
For metric1 column: SUM(metric1) = 15
For metric2 column: SUM(metric2) = 3
For metric3 column: SUM(metric3) = 3
Conclusion:
SUM function in SQL treats nulls like zero.
Test Query #2: AVG Aggregate for Each Column
SELECT
AVG(metric1) as avg_metric1
, AVG(metric2) as avg_metric2
, AVG(metric3) as avg_metric3
FROM t1
Results:
For metric1 column: AVG(metric1) = 3.75
For metric2 column: AVG(metric2) = 0.75
For metric3 column: AVG(metric3) = 1
Conclusion:
AVG function in SQL does not include null in the calculation.
Test Query #3: COUNT Aggregate for Each Column
SELECT
COUNT(metric1) as count_metric1
, COUNT(metric2) as count_metric2
, COUNT(metric3) as count_metric3
FROM t1
Results:
For metric1 column: COUNT(metric1) = 4
For metric2 column: COUNT(metric2) = 4
For metric3 column: COUNT(metric3) = 3
Conclusion:
COUNT function in SQL does not include null in the calculation.
Test Query #4: Addition (+) Operator for Each Row
SELECT *,
metric1 + metric2 + metric3 as added_values
FROM t1
Results:
In this query, we can see the output is not 1 row but 5 rows (same as the original dataset).
When we use an addition operator to add columns together, we get a new value for every row.
Row 1: 4 + 1 + 0 = 5
Row 2: 10 + 0 + null = null
Row 3: 1 + 2 + 3 = 6
Row 4: 0 + 0 + 0 = 0
Row 5: null + null + null = null
Conclusion:
Using the addition operator (+) to add multiple columns together in SQL does include null in the calculation.
If any one value in the row is a null, the result of that calculation for that row will be null.
Test Query #5: Using IFNULL with Addition Operator (+)
SELECT *,
IFNULL(metric1,0) + IFNULL(metric2,0) + IFNULL(metric3,0) as result
FROM t1
Results:
In query #4, we saw that if any one value in any column is a null, the resulting calculation would be a null.
What if we didn’t want a null as the calculation? What if we wanted to sum of the values for multiple columns, even if a single value is null?
Then… we treat nulls as if they were zeroes.
You use that with IFNULL(column_name, whatever_integer_you_want_to_replace_the_null_with)
.
Example: IFNULL(metric1, 0) says…. If metric1 is null, replace it with 0.
Row 1: 4 + 1 + 0 = 5
Row 2: 10 + 0 + null = 10
Row 3: 1 + 2 + 3 = 6
Row 4: 0 + 0 + 0 = 0
Row 5: null + null + null = 0
Conclusion:
Using the addition operator (+) to add multiple columns together in SQL AND using IFNULL THEN 0 function will replace nulls with 0.
Test Query #6: Using Aggregate Function COUNT & Addition Operator (+) at the Same Time
SELECT COUNT(metric1 + metric2 + metric3) as result
FROM t1
Results:
In this query, whatever is inside the parenthesis of the COUNT function (or any other aggregate function) will be calculated first.
Only after that will the COUNT function run.
Step 1: Calculate metric1 + metric2 + metric3 first.
Result is:
Row 1: 4 + 1 + 0 = 5
Row 2: 10 + 0 + null = null
Row 3: 1 + 2 + 3 = 6
Row 4: 0 + 0 + 0 = 0
Row 5: null + null + null = null
Step 2: Run COUNT on added_values
column.
COUNT( added_values ) = 3
Conclusion:
Using an aggregate function and the an operator at the same time will include nulls in the calculation.
Test Query #7: Using Aggregate Function COUNT & Addition Operator (+) at the Same Time (While Handling Nulls)
If you want to count all of the values regardless if they’re null or not, you can wrap a COUNT function around all the IFNULL functions for each column.
SELECT
COUNT(IFNULL(metric1,0) + IFNULL(metric2,0) + IFNULL(metric3,0)) as result
FROM t1
Results:
COUNT( added_values ) = 5
Conclusion:
Use IFNULL to replace nulls with 0s in any operation.
Shortcut for counting all rows regardless of null:
SELECT COUNT(*) FROM table
Takeaway
When you want to add multiple columns together like:
SELECT conv_1 + conv_2 + conv_3 as total_conversions FROM table
It is safe to wrap each column in an IFNULL function, in order to treat nulls as zeroes.
SELECT IFNULL(conv_1, 0) + IFNULL(conv_2, 0) + IFNULL(conv_3, 0) as total_conversions FROM table
And if you want to sum up all the values in a single column, you can just use the sum aggregation function SUM() (or any other type of aggregation function), and nulls will automatically be accounted for.
SELECT SUM(conv_1) as conversions
FROM table
WHERE date BETWEEN '2022-01-01' AND '2022-01-31'