Data Modeling and ER Diagrams - Data Engineering for Beginners (Ch.5 Part 1)
This is Part 1 of Chapter 5. Part 2 covers normalization and design best practices.
Chapter 5 of Data Engineering for Beginners by Chisom Nwokwu is about database design. And honestly, this is where things start to feel real. The previous chapters gave us SQL and database basics. Now we are drawing blueprints.
Here is the thing about database design: if you get it wrong early, every query you write later pays the price. Slow lookups, duplicate data, confusing joins. All of it traces back to bad design. Nwokwu uses a nice phrase here: a well-designed database is the backbone of any reliable system. I agree.
What Is Data Modeling?
Data modeling is the process of planning how your data will be organized, connected, and stored. Think of it as a blueprint for a building. Before you start pouring concrete (creating tables), you need a plan.
The book uses a bakery example. Imagine three customers named Emily show up in your system. One is “Emily J,” another is “Johnson, Emily,” and a third is “emily_j01.” Which one ordered the cupcakes? Nobody knows.
The fix is simple: give every customer a unique CustomerID and every order a unique OrderID. Now you look up orders by ID, not by name. No confusion.
This connects back to the ACID principles from earlier chapters. Good data models enforce things like foreign key relationships, so you cannot create an order for a customer that does not exist. Atomicity, consistency, isolation, durability. They all benefit from clean structure.
A solid data model should:
- Map real-world things (customers, orders, products) to database structures
- Create clear relationships between tables
- Stay flexible enough to change when the business changes
Three Types of Data Models
Nwokwu breaks data modeling into three levels. Each one zooms in closer.
Conceptual Data Model
This is the big-picture view. No technical details, no column types, no indexes. Just the main entities and how they relate.
For an online store, a conceptual model might say:
- A Customer places Orders
- An Order contains Products
- Each Order links to a Payment
Business rules live here too. Like “every customer must have a payment method” or “orders can’t process without valid payment.” This level is for stakeholders and architects to agree on what the system should track.
Logical Data Model
Now we add detail. Each entity gets specific attributes.
- Customer:
CustomerID,Name,Email - Order:
OrderID,Date,TotalAmount - Product:
ProductID,Name,Price
This is the bridge between “what the business wants” and “what the engineers will build.” You define the attributes, the relationships, and the primary keys. But you still have not picked a specific database technology.
Physical Data Model
This is where it gets real. You pick PostgreSQL (or MySQL, or whatever), and you define exact column types, constraints, and indexes.
Something like:
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(100) UNIQUE,
age INT CHECK (age > 18)
);
Here is where data type choices matter. The book goes into this nicely:
INTfor whole numbers, good for IDs (up to about 2 billion)BIGINTwhen you need bigger numbers (up to 9 quintillion, takes 8 bytes)DECIMALfor money. Never useFLOATfor money. Rounding errors will haunt youVARCHAR(n)for variable-length text. If you store “yes” in aVARCHAR(100), it only uses 3 charactersCHAR(n)for fixed-length. “yes” inCHAR(10)still takes 10 characters, padded with spacesTIMESTAMPfor dates with time zones,DATEfor just the date
Small choices here have big effects at scale. The book makes a good point: picking the wrong type for a transaction ID (say INT instead of BIGINT) can crash your system when you outgrow it.
Entities, Attributes, and Relationships
These are the building blocks of any data model.
Entities are the nouns. Customer, Product, Order, Student, Teacher. Each one becomes a table. Use singular names. Customer, not Customers.
Attributes are the columns. Name, Email, Price. Keep them atomic. That means no stuffing two values into one column. FullName should probably be FirstName and LastName.
Relationships define how entities connect. A customer places orders. A student enrolls in courses. In the database, these connections are built with primary keys and foreign keys.
Cardinality
Cardinality is about how many. How many instances of one entity can connect to another?
One-to-One (1:1): Each employee has exactly one security badge. One passport per person. Not super common, but it happens.
One-to-Many (1:N): One customer can place many orders, but each order belongs to one customer. This is the most common type you will see.
Many-to-Many (M:N): Students enroll in multiple courses, and each course has multiple students. In a database, you usually handle this with a junction table (a table in the middle that links both sides).
Getting cardinality right is important. If you model something as one-to-one when it should be one-to-many, you will end up restructuring tables later. Ask yourself: “Can this thing have more than one of those?” If yes, it is probably one-to-many or many-to-many.
Entity Relationship Diagrams (ERDs)
An ERD is just a picture of your logical model. Boxes for entities, lines for relationships, symbols for cardinality. It is the diagram you draw on a whiteboard (or in a tool like draw.io) before you write any SQL.
The book walks through an ERD for an online shopping system with these entities:
- Category (one-to-many with Products)
- Products (one-to-many with Seller)
- Customer (one-to-many with Shopping Order, one-to-many with Deliveries)
- Shopping Order (belongs to one Customer)
- Deliveries (belongs to one Customer)
- Payment (belongs to one Category)
- Transaction Reports (aggregates data from multiple tables)
Each box in the ERD shows the entity name, its attributes, and the primary/foreign keys. The lines between boxes show how they connect and what the cardinality is.
Here is how it works in practice. The Customer table has a CustomerID as its primary key. The Shopping Order table has a CustomerID as a foreign key. That foreign key is what creates the relationship. One customer, many orders.
If I were teaching this to a junior developer, I would say: draw the ERD first, always. It saves you from building the wrong thing. And it is a lot easier to move boxes around on a diagram than to refactor a production database.
My Take
This chapter is well organized. Nwokwu builds up from the abstract (conceptual models) to the concrete (physical models with SQL data types), and the bakery/e-commerce examples make it easy to follow.
One thing I would add: tools matter here. In real work, you will probably use something like dbdiagram.io, Lucidchart, or even just pen and paper to sketch ERDs. The book focuses on the concepts, which is the right call for a beginner book. But once you understand the concepts, get comfortable with a diagramming tool.
Next up in Part 2: normalization, denormalization, and how to actually keep your database fast as it grows.
This is part 7 of 18 in my retelling of “Data Engineering for Beginners” by Chisom Nwokwu. See all posts in this series.
| < Previous: SQL Advanced Queries | Next: Normalization and Database Design > |