Let’s say you are running a business. You have just on-boarded an (expensive) new agency for acquiring customers. After a month, you want to decide if you should continue using them. You do this by comparing the average revenue of customers brought in by them agency, to average revenue of customers from your existing channels. How many customers from that agency do you need, to make an informed decision?
[Read More]
QUALIFY clause is now available in BigQuery
When browsing the BigQuery docs, I discovered that BigQuery now supports the QUALIFY
-clause. At time of writing, it’s in preview, but you can already try it out.
The QUALIFY
-clause lets you filter on the output of analytic functions. This allows me to scratch an itch I’ve had for a while:
For each distinct value in column X, find the row with max value in column Y
A common pattern is: “For each distinct value in column X, find the row where column Y takes it’s maximum value”. You can accomplish this by using ROW_NUMBER() OVER (PARTITION BY X ORDER BY Y DESC) AS rank
.
An example of this would be to find the date at which each of a set of countries reported the largest number of new confirmed covid-19 cases. We’ll be using the
bigquery-public-data.covid19_open_data
dataset.