Breaking into Amazon by passing SQL interview - Customer Orders
- Christina

- Jun 12, 2022
- 4 min read
Updated: May 8, 2024
Regularly challenging oneself is a habit I'm trying to build. So what better first post for this blog than starting right away. Today, I will be solving a SQL exercise coming from a technical interview at Amazon.
As a Data Engineer, SQL is my mother tongue. I use it every day for a lot of different tasks. Like any language, practice is the key the keep fluency and accuracy.
Our question today is: Identify the customers who placed more than three transactions each in both 2019 and 2020. The database looks like the following:

The expected output is a single column containing the list of customers' names.
I am going to build a SQL script that is going to run directly on the database to get the expected result.
Let's begin coding!
A quick mention about Interview Query for SQL
For a couple of years, I've subscribed to Interview Query (this post is not sponsored by them). It's an awesome learning platform to prepare for technical interviews. From Amazon to Google, they have a huge database of questions and exercises for you to use. This topic came from their platform and they also provide the database for you to use which is very convenient. You can directly run the queries and see the end result.
I just wanted to share with you the platform as they are doing an amazing job for supporting the tech community. Now let's go back to our main problem.
Let's get started for real this time!
Step 1: Let's keep the transactions of 2019 and 2020
The volume of data is a relevant value to have. But why? Because I need to have a high-level knowledge of the size of my database. I'm not thinking right away about optimization but these numbers will give me the right instinct about the answer. If the volume is incoherent with this then I know that I must have made a mistake somewhere. To know the number of transactions per year, use the following query:
SELECT YEAR(created_at) as year, count(*) as nb_transactions
FROM transactions
WHERE YEAR(created_at) IN (2019,2020)
GROUP BY YEAR(created_at)
ORDER BY YEAR(created_at) ASC;Step 2: Let's focus on 2019 and retrieve the customer
Let's observe closely how many transactions our customers did in 2019. To do so, let's add the customer's name (because it is better than an ID) and count how many transactions there are for each customer.
SELECT name, count(*) as nb_transactions
FROM transactions as t
INNER JOIN users as u ON t.user_id = u.id
WHERE YEAR(created_at) = 2019
GROUP BY name
ORDER BY count(*) ASC;Now let's focus on the second part of the question. Because the logic is the same for each year, you can copy/paste the query of 2019 and change the year.
-- Situation in 2019
SELECT name
FROM transactions as t
INNER JOIN users as u ON t.user_id = u.id
WHERE YEAR(created_at) = 2019
GROUP BY name, YEAR(created_at)
HAVING COUNT(*) > 3
ORDER BY name ASC;
-- Situation in 2020
SELECT name
FROM transactions as t
INNER JOIN users as u ON t.user_id = u.id
WHERE YEAR(created_at) = 2020
GROUP BY name, YEAR(created_at)
HAVING COUNT(*) > 3
ORDER BY name ASC;Each query will give me the list of customers who did more than three transactions. Below find the results based on my data sample (I ran them directly on a made-up database but if you use InterviewQuery the result is going to be different). If you have yours, you can skip this part.
For 2019, we find the following result:
Cammy Whitelaw
Darby Rayman
Erminia Cowterd
Everlyn Kembley
Jewell Karim
Ronnie Simper
For 2020, we find the following result:
Darby Rayman
Erminia Cowterd
Huberto Bradnum
Now the ultimate step consists of selecting only the customers who appear in both results. So based on my example, the final answer should be:
Darby Rayman
Erminia Cowterd
Step 3: Identify the customers and answer the question
We have several options to store the results. I'm assuming that we only need the immediate answer so I'm building a one-shot query for this one. Otherwise, I would have built a view for making that information available and reusable.
-- List of customers with more than 3 transactions in both 2019 and 2020
SELECT cust_2019.name as name
FROM (
-- Situation in 2019
SELECT name --, count(*) as nb_transactions
FROM transactions as t
INNER JOIN users as u ON t.user_id = u.id
WHERE YEAR(created_at) = 2019
GROUP BY name, YEAR(created_at)
HAVING COUNT(*) > 3
-- ORDER BY name ASC;
) as cust_2019
INNER JOIN (
-- Situation in 2020
SELECT name --, count(*) as nb_transactions
FROM transactions as t
INNER JOIN users as u ON t.user_id = u.id
WHERE YEAR(created_at) = 2020
GROUP BY name, YEAR(created_at)
HAVING COUNT(*) > 3
-- ORDER BY name ASC;
) AS cust_2020
ON cust_2019.name = cust_2020.name;Final step: Review and optimize
Awesome! We answered the question but let's watch closely our query again. Maybe doing the JOIN statement so soon is not relevant. Based on the same logic, we can work with the table transactions and manipulate the user_id instead of the name. So a possible alternative could be the following:
SELECT u.name as customer_name
FROM (
-- Situation in 2019
SELECT user_id
FROM transactions
WHERE YEAR(created_at) = 2019
GROUP BY user_id, YEAR(created_at)
HAVING COUNT(*) > 3
) as cust_2019
INNER JOIN (
-- Situation in 2020
SELECT user_id
FROM transactions
WHERE YEAR(created_at) = 2020
GROUP BY user_id, YEAR(created_at)
HAVING COUNT(*) > 3
) AS cust_2020 ON cust_2019.user_id = cust_2020.user_id
INNER JOIN users as u ON cust_2019.user_id = u.id ;The final result is the list of customers below. Congratulation, you've finished this exercise given by Amazon.
CUSTOMER |
Darby Rayman |
Erminia Cowterd |
See you at the next practice!
source: direct link