When browsing the BigQuery docs, I discovered that BigQuery now supports the QUALIFY
-clause. At time of writing, it’s in preview, but you can already try it out.
The QUALIFY
-clause lets you filter on the output of analytic functions. This allows me to scratch an itch I’ve had for a while:
For each distinct value in column X, find the row with max value in column Y
A common pattern is: “For each distinct value in column X, find the row where column Y takes it’s maximum value”. You can accomplish this by using ROW_NUMBER() OVER (PARTITION BY X ORDER BY Y DESC) AS rank
.
An example of this would be to find the date at which each of a set of countries reported the largest number of new confirmed covid-19 cases. We’ll be using the
bigquery-public-data.covid19_open_data
dataset.