SQL Advanced Queries: JOINs, Subqueries, and Window Functions

This is Part 2 of Chapter 4, continuing from the SQL basics.

In Part 1 we covered how to pull data from one table. Filter it, sort it, count it. But real databases have many tables. Customers in one, orders in another, products in a third. The interesting stuff happens when you combine them.

That is what this part is about: joins, subqueries, window functions, and some practical advice on writing clean SQL.

JOINs: combining tables

Remember our two tables? The Customer table has four customers (IDs 1 through 4). The Orders table has four orders, but order 104 has customer_id = 5, which does not exist in the customer table. This mismatch is what makes join examples interesting.

INNER JOIN

The most common join. It returns only rows where there is a match in both tables.

SELECT
    orders.order_id,
    customers.customer_name,
    orders.total_amount
FROM orders
INNER JOIN customers
    ON orders.customer_id = customers.customer_id;

Result: you get three rows. Ada Herbert (orders 101 and 102) and Grace Turing (order 103). Alan Hopper and Margaret Bill do not appear because they have no orders. Order 104 does not appear because customer 5 does not exist.

Here is how it works: the database looks at every combination of rows from both tables and keeps only the ones where customer_id matches on both sides. No match, no row.

LEFT JOIN

Returns all rows from the left (first) table, plus matching rows from the right table. If there is no match, you get NULLs.

SELECT
    customers.customer_id,
    customers.customer_name,
    orders.order_id,
    orders.total_amount
FROM customers
LEFT JOIN orders
    ON customers.customer_id = orders.customer_id;

Now you see all four customers. Ada and Grace have their orders. Alan Hopper and Margaret Bill show up with NULL for order_id and total_amount. They are in the customer table but have never ordered anything.

This is useful when you want to find inactive customers or spot missing data. “Show me everyone, even if they have not done anything yet.”

RIGHT JOIN

The mirror image of LEFT JOIN. All rows from the right (second) table, matched rows from the left.

SELECT
    customers.customer_name,
    orders.order_id,
    orders.total_amount
FROM customers
RIGHT JOIN orders
    ON customers.customer_id = orders.customer_id;

Now order 104 shows up with NULL for customer_name. The order exists, but the customer does not. This can happen after data migrations, when records get lost or were never recorded properly.

Honestly, RIGHT JOINs are rare in practice. Most people just flip the table order and use a LEFT JOIN. But it is good to know they exist.

FULL OUTER JOIN

Returns everything from both tables. Matches where possible, NULLs where not.

SELECT
    customers.customer_name,
    orders.order_id,
    orders.total_amount
FROM customers
FULL OUTER JOIN orders
    ON customers.customer_id = orders.customer_id;

You get all customers (including those with no orders) and all orders (including those with no matching customer). This is your “show me the complete picture” join. Good for data quality checks and finding mismatches between tables.

Subqueries

Sometimes you need the answer to one question before you can ask another. Who has the highest order total? First, find the max. Then, find who placed it.

SELECT
    customers.customer_name,
    orders.order_id,
    orders.total_amount
FROM orders
JOIN customers
    ON orders.customer_id = customers.customer_id
WHERE orders.total_amount = (
    SELECT MAX(total_amount)
    FROM orders
);

The inner query finds the maximum total_amount. The outer query uses that result to filter. Grace Turing, order 103, 400 dollars.

Subqueries are queries nested inside other queries. They work. But they can get messy fast.

CTEs: a cleaner way

Common Table Expressions (CTEs) solve the readability problem. Instead of nesting queries inside each other, you define a named temporary result at the top, then reference it below.

WITH avg_order AS (
    SELECT AVG(total_amount) AS average_amount
    FROM orders
)
SELECT *
FROM orders, avg_order
WHERE orders.total_amount >= avg_order.average_amount;

The WITH clause creates avg_order as a temporary result. Then you use it like a regular table. No nesting, easy to read, easy to debug.

Here is the thing: if you need the same calculation in multiple places, a CTE saves you from repeating yourself. Write the logic once, reference it by name. Your future self will thank you.

Set operations

These combine results from two separate queries.

UNION merges two result sets and removes duplicates:

SELECT customer_id FROM customers
UNION
SELECT customer_id FROM orders;

You get IDs 1 through 5. Customer IDs from both tables, no duplicates.

UNION ALL does the same but keeps duplicates. Faster because it skips the dedup step.

INTERSECT returns only rows that appear in both queries:

SELECT customer_id FROM customers
INTERSECT
SELECT customer_id FROM orders;

Result: 1 and 2. These are customers who actually placed orders.

EXCEPT returns rows from the first query that are not in the second:

SELECT customer_id FROM customers
EXCEPT
SELECT customer_id FROM orders;

Result: 3 and 4. Customers who exist but never ordered.

One rule: both queries must have the same number of columns with compatible data types.

Window functions

This is where SQL gets powerful. Window functions do calculations across a set of rows related to the current row, but without collapsing everything into groups like GROUP BY does. You keep every row and add a calculated column.

Ranking functions

The book uses an employee sales table:

employeeregionsales
DavidWest900
EveWest800
FrankWest800
GraceWest700
BobEast600
AliceEast500
CharlieEast500
HannahEast400

ROW_NUMBER() gives each row a unique number within its group:

SELECT
  employee, region, sales,
  ROW_NUMBER() OVER (PARTITION BY region ORDER BY sales DESC) AS row_num
FROM sales;

Eve and Frank both have 800 in sales, but they get different row numbers (2 and 3). ROW_NUMBER always assigns unique numbers, even for ties.

Let me break down the syntax. OVER() defines the window. PARTITION BY region means “restart the numbering for each region.” ORDER BY sales DESC means “highest sales first.”

RANK() handles ties differently. Eve and Frank both get rank 2, and then Grace gets rank 4. Rank 3 is skipped. Like a sports tournament where two people tie for second place.

DENSE_RANK() also handles ties, but does not skip. Eve and Frank both get rank 2, Grace gets rank 3. No gaps.

When to use which? ROW_NUMBER when you need unique positions. RANK when ties should skip numbers. DENSE_RANK when you want continuous ranking despite ties.

Aggregate window functions

You can use SUM(), AVG(), and other aggregates as window functions. The difference from regular GROUP BY is that you keep every row.

SELECT
  employee, region, sales,
  SUM(sales) OVER (PARTITION BY region) AS total_sales_region,
  AVG(sales) OVER (PARTITION BY region) AS avg_sales_region
FROM sales;

Every row in the West region shows total_sales_region = 3200 and avg_sales_region = 800. You can instantly compare each employee to their regional average without losing any detail.

LAG and LEAD

These let you peek at previous or future rows.

-- Previous month's sales
SELECT
  month, sales,
  LAG(sales) OVER (ORDER BY month) AS prev_month_sales
FROM sales;

-- Next month's sales
SELECT
  month, sales,
  LEAD(sales) OVER (ORDER BY month) AS next_month_sales
FROM sales;

LAG looks backward. LEAD looks forward. The first row of LAG is NULL (nothing before it). The last row of LEAD is NULL (nothing after it). These are great for trend analysis. “Did sales go up or down compared to last month?”

Best practices

The chapter ends with practical advice. Here is what I found most useful:

Avoid SELECT *. Only pull the columns you need. It is faster and makes your intent clear.

Filter early. Apply WHERE conditions before joins and aggregations. The less data you process, the faster your query runs.

Use CTEs for complex logic. Break things into named steps. Your queries become readable and maintainable.

Capitalize SQL keywords. Write SELECT, not select. Write FROM, not from. It is a convention that makes queries easier to scan.

Test on real data. A query that works on four rows might crawl on four million. Check your execution plans.

Setting up a practice lab

The book walks through setting up SQL Server Express and SQL Server Management Studio (SSMS). You create the sample tables, insert the data, and run the queries yourself. If you are following along with the book, I recommend doing this. Reading SQL is one thing. Writing it and seeing results is how it actually sticks.

The sample code is also available on GitHub at github.com/Sommie09/sql-chapter-examples.

My take

This second half of Chapter 4 is where the book earns its keep. Joins and window functions are the tools that separate beginners from people who can actually work with data. Nwokwu explains them clearly, with consistent examples and good visual analogies.

If you learn nothing else from this chapter, learn INNER JOIN, LEFT JOIN, and how to read a window function. Those three things will cover 80% of the SQL you write as a data engineer.

The CTE section is also worth your time. Once you start writing CTEs instead of nested subqueries, you will never go back. Your SQL becomes something other people can actually read.

Next up: Chapter 5 on data modeling and ER diagrams. We go from writing queries to designing the tables themselves.


This is part 6 of 18 in my retelling of “Data Engineering for Beginners” by Chisom Nwokwu. See all posts in this series.

| < Previous: SQL Basics | Next: Data Modeling & ER Diagrams > |

About

About BookGrill.net

BookGrill.net is a technology book review site for developers, engineers, and anyone who builds things with code. We cover books on software engineering, AI and machine learning, cybersecurity, systems design, and the culture of technology.

Know More