Database Fundamentals: SQL, NoSQL, and ACID
Chapter 3 is where things get real. You stop talking about data in the abstract and start working with the thing that actually holds it: databases. If you plan to do any data engineering at all, this is where your daily life begins.
Nwokwu opens by saying that databases are organized collections of data that allow for efficient storage, retrieval, and management. That is the textbook answer. Here is the practical one: a database is the place where your app keeps everything it needs to remember. User accounts, orders, logs, messages. Without a database, your application has amnesia.
The building blocks
Before you pick any database type, you need to know four terms. They show up everywhere.
Rows are single records. One customer, one order, one event. Columns define what kind of info each record stores: name, email, timestamp. Schema is the blueprint, the rules for what goes where and what types are allowed. And keys are how you find and connect records. A customer ID is a key. It is unique. It does not change. It lets you say “this order belongs to that customer.”
That is really it. These four concepts carry you through both SQL and NoSQL worlds.
Relational databases (SQL)
Relational databases store data in tables. Rows and columns. If you have ever used a spreadsheet, you already get the layout. Each table represents an entity: customers, products, orders.
Here is the thing that makes relational databases special: relationships. You do not dump everything into one giant table. You split it into separate tables and connect them with keys.
A primary key uniquely identifies each row. Think CustomerID = 1. No duplicates, no nulls, no changes over time.
A foreign key is a reference from one table to another. Your Orders table has a CustomerID column that points back to the Customer table. That link is a relationship. It says “order 101 was placed by customer 1.”
Sometimes a single column is not enough to be unique. Two students can share the same ID if they are in different courses. So you combine StudentID + CourseID into a composite primary key. Together they are unique, even if each one alone is not.
The book covers SQL commands in four groups:
- DDL (Data Definition Language):
CREATE,ALTER,DROP,TRUNCATE. These build and modify the structure. - DML (Data Manipulation Language):
SELECT,INSERT,UPDATE,DELETE. These work with the actual data. - DCL (Data Control Language):
GRANT,REVOKE. These control who can do what. - TCL (Transaction Control Language):
COMMIT,ROLLBACK,SAVEPOINT. These manage transactions.
A quick example. Creating a table:
CREATE TABLE Customer (
CustomerID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Email VARCHAR(100)
);
Adding a row:
INSERT INTO Customer (CustomerID, FirstName, LastName, Email)
VALUES (1, 'John', 'Doe', '[email protected]');
Nothing fancy. But this is the bread and butter of data engineering.
ACID: the trust contract
Nwokwu uses a Black Friday sale as an example, and it is a good one. Two customers grab the last item at the same time. Without safeguards, both transactions go through. You oversell. Someone gets charged for nothing. Your database is now lying.
ACID prevents this. Four properties:
Atomicity: all or nothing. If a checkout has two steps (deduct inventory, charge payment) and one fails, the whole thing rolls back. You do not end up with half a transaction.
Consistency: the database goes from one valid state to another. You set a rule like “stock cannot be negative.” If a transaction would break that rule, it fails.
Isolation: two transactions running at the same time do not step on each other. If two people try to buy the last concert ticket, only one wins. The database keeps them separate.
Durability: once committed, it is permanent. Even if the server crashes right after, the data survives. Databases use Write-Ahead Logging for this. Changes hit the log file on disk before they hit the database itself.
If your app handles money, medical records, or anything where wrong data causes real harm, you want ACID.
NoSQL databases
Here is where it gets interesting. Not all data fits neatly into rows and columns. Around the mid-1990s, with the rise of the internet, people needed databases that could handle JSON documents, user-generated content, social graphs, and sensor data. NoSQL, meaning “Not only SQL,” was the answer.
The book covers four types:
Document databases (like MongoDB) store data as JSON-like documents. Each document can have different fields. One product has color and size. Another has weight and voltage. No need to force them into the same table schema. Great for product catalogs, user profiles, and anything semi-structured.
Key-value databases (like Redis) are the simplest. You have a key, you have a value. That is it. Lightning fast for caching, session storage, and shopping carts. But do not expect complex queries. You look things up by key, and that is about it.
Graph databases (like Neo4j) store data as nodes and edges. Nodes are entities (people, places). Edges are relationships (follows, friends with, purchased). Perfect for social networks, recommendation engines, and fraud detection. If your data is all about connections, graph databases are the way to go.
Column-family databases (like Cassandra) store data by columns instead of rows. If you only need to read one column out of a million rows, this is very efficient. Common for time-series data, IoT logs, and analytics dashboards.
Each type has trade-offs. Document databases can get messy with large, deeply nested documents. Key-value stores cannot do complex joins. Graph databases struggle with data that has no relationships. Columnar databases are not great for transactions. Pick the one that fits your problem.
Choosing between SQL and NoSQL
The book gives a practical framework here, and I think it is one of the more useful parts of the chapter. Five questions to ask yourself:
- Is your data structured? If yes, go relational. If it is flexible or changes often, consider NoSQL.
- Does your data have lots of relationships? Relational databases handle joins well. NoSQL works better when each record is self-contained.
- How fast do you need to move? Prototyping? NoSQL is less setup. Building for the long term with critical data? Relational gives you integrity.
- How do you query? Need complex, ad hoc analysis? SQL is powerful and portable. NoSQL queries tend to be simpler and more limited.
- How do you scale? NoSQL scales horizontally (add more servers). Relational scales vertically (make one server bigger). If you expect millions of users, NoSQL has the edge.
But here is the thing the book gets right: it is not either/or. Many real systems use both. You might store transactions in PostgreSQL and cache hot data in Redis. You might keep your user profiles in MongoDB and your analytics in a columnar store.
The right answer depends on your data, your use case, and how your application will grow. There is no single correct choice. Just the correct choice for your situation.
My take
This is a solid chapter. Nwokwu covers a lot of ground without making it feel overwhelming. The ACID section is well explained, and the NoSQL breakdown gives you enough to understand when each type makes sense.
If you are completely new to databases, this chapter gives you the vocabulary and the mental models. If you have been around databases for a while, it is a good refresher on the fundamentals, especially the decision framework at the end.
Next up: we finally start writing SQL queries. Chapter 4 is where you get your hands dirty.
This is part 4 of 18 in my retelling of “Data Engineering for Beginners” by Chisom Nwokwu. See all posts in this series.
| < Previous: Intro to Data Engineering | Next: SQL Basics > |