SQL Basics: SELECT, WHERE, and Aggregate Functions

This is Part 1 of Chapter 4. Part 2 covers joins and advanced queries.

Chapter 4 is where Nwokwu puts SQL in your hands. No more theory. You write queries, you get results, you learn by doing. If Chapter 3 was about understanding what databases are, this chapter is about talking to them.

SQL stands for Structured Query Language. It is the language you use to communicate with relational databases. Ask questions, get answers. Insert data, update data, delete data. Every data engineer writes SQL. Every single day.

The book uses two simple tables throughout the chapter, and I think that is a smart move. You see the same data over and over, so you can focus on what each query does instead of trying to understand new data every time.

Customer table:

customer_idcustomer_namestatus
1Ada HerbertGold
2Grace TuringGold
3Alan HopperBronze
4Margaret BillSilver

Orders table:

order_idcustomer_idtotal_amountshipping_countrypayment_methodorder_date
1011200.0USACredit2025-01-02
1021100.0UKCredit2025-02-15
1032400.0USACoupon2025-01-30
1045150.0CanadaCreditNULL

Notice that order 104 has customer_id = 5, but there is no customer 5 in the customer table. That is intentional. It comes up later when we talk about joins.

SELECT: getting data out

The most basic SQL command. You tell the database what columns you want and from which table.

-- Give me everything
SELECT * FROM orders;

-- Give me just these two columns
SELECT order_id, total_amount FROM orders;

The * means “all columns.” It works, but in real life, you should name the columns you actually need. Pulling everything is wasteful when your table has 50 columns and you only care about two.

You can also rename columns in the output with AS:

SELECT order_id, total_amount AS amount_paid
FROM orders;

Now the result shows amount_paid instead of total_amount. The actual table does not change. It is just a display name, an alias.

WHERE: filtering rows

Most of the time you do not want every row. You want specific ones. That is what WHERE does.

-- Orders over 100 dollars
SELECT order_id, total_amount
FROM orders
WHERE total_amount > 100;

-- Orders shipped to the USA
SELECT order_id, shipping_country
FROM orders
WHERE shipping_country = 'USA';

Simple. You state your condition, and the database gives you only the rows that match.

Comparison operators

The usual suspects work in WHERE clauses:

OperatorMeaning
=Equal to
!= or <>Not equal to
>Greater than
<Less than
>=Greater than or equal
<=Less than or equal

Nothing surprising here. If you have written any code before, these feel familiar.

Pattern matching with LIKE

Sometimes you do not know the exact value. You know part of it. LIKE lets you search with wildcards.

-- Payment methods starting with "Credit"
SELECT order_id, payment_method
FROM orders
WHERE payment_method LIKE 'Credit%';

The % matches zero or more characters. The _ matches exactly one character. So '_da' would match any three-letter string ending in “da.”

IN and BETWEEN

Instead of writing status = 'Gold' OR status = 'Bronze', you can use IN:

SELECT customer_id, customer_name, status
FROM customer
WHERE status IN ('Gold', 'Bronze');

Cleaner, easier to read.

BETWEEN filters a range, inclusive on both ends:

-- Orders between 100 and 300 dollars
SELECT order_id, customer_id, total_amount
FROM orders
WHERE total_amount BETWEEN 100 AND 300;

Works great for dates too:

SELECT order_id, total_amount, order_date
FROM orders
WHERE order_date BETWEEN '2025-01-01' AND '2025-01-31';

Logical operators: AND, OR, NOT

You combine conditions with these.

-- Credit orders in the USA
SELECT * FROM orders
WHERE payment_method = 'Credit' AND shipping_country = 'USA';

-- Orders from USA or Canada
SELECT * FROM orders
WHERE shipping_country = 'USA' OR shipping_country = 'Canada';

-- Orders that are NOT Credit
SELECT * FROM orders
WHERE NOT payment_method = 'Credit';

AND means both conditions must be true. OR means at least one. NOT flips the condition.

Dealing with NULL

In SQL, NULL means the value is missing. Not zero, not empty string. Missing. You cannot compare it with =. You have to use IS NULL or IS NOT NULL.

-- Orders that have not shipped yet
SELECT order_id, shipped_date
FROM orders
WHERE shipped_date IS NULL;

-- Orders that have shipped
SELECT order_id, shipped_date
FROM orders
WHERE shipped_date IS NOT NULL;

This trips up beginners all the time. WHERE shipped_date = NULL does not work. The correct way is always IS NULL.

Sorting and limiting

ORDER BY sorts your results. Default is ascending. Add DESC for descending.

-- Cheapest orders first
SELECT order_id, total_amount
FROM orders
ORDER BY total_amount ASC;

-- Most recent shipments first
SELECT order_id, shipped_date
FROM orders
ORDER BY shipped_date DESC;

To limit how many rows come back, use LIMIT (MySQL, PostgreSQL) or TOP (SQL Server):

-- Top 2 highest orders
SELECT order_id, total_amount
FROM orders
ORDER BY total_amount DESC
LIMIT 2;

Aggregate functions

These do math across rows and return a single value. Five of them:

-- How many orders?
SELECT COUNT(*) AS number_of_orders FROM orders;

-- Total revenue?
SELECT SUM(total_amount) AS total_revenue FROM orders;

-- Average order value?
SELECT AVG(total_amount) AS average_order_value FROM orders;

-- Highest order?
SELECT MAX(total_amount) AS highest_order FROM orders;

-- Earliest order?
SELECT MIN(order_date) AS first_order_date FROM orders;

Nothing complicated. But these become very powerful when you combine them with GROUP BY.

GROUP BY and HAVING

GROUP BY groups rows that share a value and then applies an aggregate to each group.

-- Orders per country
SELECT shipping_country, COUNT(*) AS order_count
FROM orders
GROUP BY shipping_country;

-- Revenue per customer
SELECT customer_id, SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id;

Here is the thing: every column in your SELECT that is not inside an aggregate function must appear in GROUP BY. If you forget this, you get an error.

HAVING is like WHERE, but it filters after grouping. You cannot use aggregate functions in WHERE, so you use HAVING instead.

-- Only countries with more than 5 orders
SELECT shipping_country, COUNT(*) AS order_count
FROM orders
GROUP BY shipping_country
HAVING COUNT(*) > 5;

-- Only customers who spent over 200
SELECT customer_id, SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(total_amount) > 200;

Quick rule of thumb: WHERE filters individual rows before grouping. HAVING filters groups after grouping.

My take

This first half of Chapter 4 is well structured. Nwokwu builds from the simplest query (SELECT *) to aggregate functions and grouping. Each concept adds one piece on top of the last. The consistent use of the same two tables means you spend your brain cycles learning SQL, not re-reading data definitions.

If you are brand new to SQL, this section gives you enough to start exploring real databases. You can filter, sort, count, and summarize. That covers a surprising amount of real-world work.

Next up: joins, subqueries, and window functions. That is where SQL gets really interesting.


This is part 5 of 18 in my retelling of “Data Engineering for Beginners” by Chisom Nwokwu. See all posts in this series.

| < Previous: Database Fundamentals | Next: SQL Advanced Queries > |

About

About BookGrill.net

BookGrill.net is a technology book review site for developers, engineers, and anyone who builds things with code. We cover books on software engineering, AI and machine learning, cybersecurity, systems design, and the culture of technology.

Know More