There is an order of operations when it comes to MySQL’s ORDER BY and GROUP BY functions. It’s something that is intrinsically understood but often difficult to know how and when to correctly apply it. In this article I’ll attempt to explain by examples.
Transactional-style Tables
I identify these tables as those in which there are multiple rows representing similar or duplicate data with an auto-incrementing primary key being used to distinguish them. Some examples are where every user’s ID that views an article is recorded; another is recording movie tickets and quantities sold. Understanding when items change states and having a historical context are typically the main reasons for creating tables like these.
Last Or First in Group
Suppose you were an avid golfer who frequented a favorite golf course and you wanted to determine if your swing was getting better over time. Well, the main reason you love this golf course is because it has a technological system which can record any hole number you’re at and how far you’ve hit your golf ball to get it to the next hole. You ask the staff to give you a data dump of your games, and they give you a table which looks like this:

In order to see if and how your game has improved over time we could decide to look at this data in a few different ways. Let’s start by assuming you were interested in simply seeing what your max distance was at each hole at the golf course. To do that, we’d use this query:
SELECT
hole,
MAX(DISTANCE) AS max_distance
FROM golf_games
GROUP BY hole
ORDER BY max_distance DESC
https://www.db-fiddle.com/f/h9eYThQ54AmkDZidWzcgzB/0
Here we can see that at 4 of the holes you managed to get to the (hypothetical) maximum distance between holes (think of them as hole-in-ones). You’re obviously happy to see this data because it means you’re a great hypothetical golfer! But what if you wanted to know the first day you actually hit these hole-in-ones?
The MAX function isn’t useful in this scenario for two reasons. The first is that we are grouping by hole, and MySQL is looking for the maximum distance value within that group. Everything else is disregarded, and so essentially it cannot “see” other values from within the same row other than the hole value it belongs to. The second reason is because MySQL (from 5.7 onwards) by default runs in a mode called ONLY_FULL_GROUP_BY and will throw an error if we try to run this:
SELECT
hole,
MAX(DISTANCE) AS max_distance,
created_at AS 'date'
FROM golf_games
GROUP BY hole
ORDER BY max_distance DESC
To answer this question with SQL, let’s first think about how we would achieve this if we were looking at the entire dataset by hand. First, we need to list out all the holes at the course (17 in total, since hole 18 is when the game ends). Then we’d need to keep track of all the distances per hole, per date. Of those distances we need to see which one ranks first and then pick the corresponding date.
If you were to try and translate what I just described above into SQL, you’d probably have something like this:
SELECT
hole,
DATE(created_at) AS 'date',
distance
FROM golf_games
ORDER BY distance DESC # Make sure highest distance is first in list
GROUP BY hole # Get unique list of holes
If you ran this, you’d see that it results in an error (and also see that it’s one of the reason why I’m writing this article!). ORDER BY has to come after GROUP BY. But if we GROUP BY first, we run into the same issue above with the MAX function and, depending on your MySQL version, ONLY_FULL_GROUP_BY mode . Even if we turned that mode off, the grouping mechanism would disregard the ORDER BY directive until all the data is grouped first, and then sort on that grouped data. So what can we do?
Well, we realize now that the data must be sorted first before it is grouped; that’s a simple query:
SELECT
id,
hole,
distance,
DATE(created_at) AS 'date'
FROM golf_games
ORDER BY
distance DESC,
hole,
DATE(created_at)
https://www.db-fiddle.com/f/h9eYThQ54AmkDZidWzcgzB/0
Looking at the data in this way, it’s pretty simple to go down the list and pick out the first row we see for every hole because that would be the first date where you hit your maximum distance. We can ignore the subsequent rows for the same hole-date combination. So essentially, we need to have the data collapsed so that only the first row belonging to the group is left remaining.
To have MySQL do this for us though, we need to utilize the results from the query above. This can be achieved by creating what’s known as a temporary table!
A temporary table is automatically created whenever the results of a query are used within another query. That inner query is known as a sub-query. So the results of a sub-query (which would automatically become a temporary table) can be queried again by the query immediately above it.
Using the query above to create a sorted temp table, the final query becomes:
# MySQL Version <= 5.6
SELECT
sorted_temp_table.hole,
sorted_temp_table.distance,
sorted_temp_table.date
FROM (
SELECT
hole,
distance,
DATE(created_at) AS 'date'
FROM golf_games
ORDER BY
distance DESC,
hole,
DATE(created_at)
) AS sorted_temp_table
GROUP BY sorted_temp_table.hole
https://www.db-fiddle.com/f/h9eYThQ54AmkDZidWzcgzB/0
In order to use this query in MySQL 5.7 and onwards, you’d need some slight modifications:
SELECT
sorted_temp_table.hole,
ANY_VALUE(sorted_temp_table.distance) AS distance,
ANY_VALUE(sorted_temp_table.date) AS 'date'
FROM (
SELECT
(@row_number := @row_number + 1) AS virtual_id,
hole,
distance,
DATE(created_at) AS 'date'
FROM golf_games,
(SELECT @row_number := 0) AS row_table
ORDER BY
distance DESC,
hole,
DATE(created_at)
) AS sorted_temp_table
GROUP BY sorted_temp_table.hole
ORDER BY sorted_temp_table.hole
https://www.db-fiddle.com/f/h9eYThQ54AmkDZidWzcgzB/0
In MySQL 8, it’s slightly simpler:
# MySQL Version >= 8
SELECT
sorted_temp_table.hole,
ANY_VALUE(sorted_temp_table.distance) AS distance,
ANY_VALUE(sorted_temp_table.date) AS 'date'
FROM (
SELECT
ROW_NUMBER() OVER (
PARTITION BY hole
ORDER BY
distance DESC,
hole,
DATE(created_at)
) AS virtual_id,
hole,
distance,
DATE(created_at) AS 'date'
FROM golf_games
) AS sorted_temp_table
GROUP BY sorted_temp_table.hole
ORDER BY sorted_temp_table.hole
https://www.db-fiddle.com/f/h9eYThQ54AmkDZidWzcgzB/0
It is important to create a virtual_id column for our sorted temporary table. This enables MySQL to understand that it is a functionally dependent table. Then when MySQL proceeds to collapse the rows based on our GROUP BY sorted_temp_table.hole, the ANY_VALUE function we provided will tell it to pick the first available row for the group.
As always, I hope you found this article helpful! Please let me know what you think with comments, questions, or concerns.
Also, please follow me on Twitter!


Leave a Reply