Data Governance Explained - Chapter 10 Retelling

Data governance sounds like something a committee of suits invented to make your life harder. But here’s the thing: without it, everything falls apart quietly.

Imagine every team stores data however they want. Marketing has one version in a spreadsheet, Finance has another in a database, Sales has a third in some cloud tool. Which one is correct? Nobody knows. That’s what Chapter 10 is about.

The Hotel Analogy

The book uses a clever analogy. Picture this: you buy an abandoned hotel. Everything inside comes with it, but you have no idea what’s valuable and what’s junk.

First, you tour the place to see what you have. That’s discovery in data governance. Then you sort things into boxes: keep, trash, donate. That’s classification. You label each box with what’s inside, where you found it, and whether it’s valuable. Those labels are metadata. You write rules about how to handle certain items (lock up anything with personal information). Those are policies. You hire staff and assign responsibilities. Those are roles. And you set up processes so everyone follows the rules.

That’s data governance in a nutshell. Discovery, classification, metadata, policies, roles, and processes. The whole framework.

The Governance Framework

A data governance framework sits on three pillars: policies, processes, and roles and responsibilities.

Policies set the rules. Processes enforce them. Roles assign who does what.

Policies: The Rules of the Game

Regulatory Compliance

The big ones you’ll hear about:

  • GDPR (Europe) - applies to any company handling the personal data of people in the EU, wherever the company itself is based. Seven principles covering lawfulness, data minimization, accuracy, and more. Serious fines for violations.
  • HIPAA (US healthcare) - protects patient records. Requires encryption and audit trails.
  • CCPA (California) - gives consumers rights to know, delete, and opt out of data sales.
  • SOC 2 - not a law, but an audit framework that cloud and service companies use to show customers their controls hold up. Covers security, availability, confidentiality, processing integrity, and privacy.
  • PCI DSS - mandatory for credit card data. Never store card numbers in plain text.
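
One small, practical piece of the card-data rule is keeping full card numbers out of logs and screens. Here is a minimal sketch that masks everything except the last four digits. The function name and the 13-digit minimum are my own assumptions, and masking only covers display: card numbers you actually store still need encryption or tokenization.

```python
def mask_card_number(card_number: str) -> str:
    """Hide all but the last four digits of a card number for display or logging."""
    # Keep digits only, so "4111 1111 1111 1111" and "4111-1111-..." behave the same.
    digits = [c for c in card_number if c.isdigit()]
    if len(digits) < 13:  # assumed lower bound for a plausible card number
        raise ValueError("not a plausible card number")
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


print(mask_card_number("4111 1111 1111 1111"))  # ************1111
```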

Data Classification

Not all data needs the same protection. You sort it into tiers: Public (press releases, share freely), Private (internal memos, keep inside), Confidential (customer data, restricted access), and Restricted (trade secrets, medical records, maximum security). Each tier gets different access controls.
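
To see how tiers turn into access controls, here is a minimal sketch. The four tier names come from the chapter; the role names and their clearance levels are hypothetical, and a real system would enforce this in the database or an identity provider rather than in a helper function.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Classification tiers from the text, ordered least to most sensitive."""
    PUBLIC = 1
    PRIVATE = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


# Hypothetical mapping: the highest tier each role is cleared to read.
ROLE_CLEARANCE = {
    "anyone": Tier.PUBLIC,
    "employee": Tier.PRIVATE,
    "support_agent": Tier.CONFIDENTIAL,
    "compliance_officer": Tier.RESTRICTED,
}


def can_read(role: str, data_tier: Tier) -> bool:
    """Allow access only when the role's clearance covers the data's tier."""
    # Unknown roles fall back to the lowest clearance.
    clearance = ROLE_CLEARANCE.get(role, Tier.PUBLIC)
    return clearance >= data_tier


print(can_read("employee", Tier.PRIVATE))       # True
print(can_read("employee", Tier.CONFIDENTIAL))  # False
```

Ordering the tiers as integers keeps the check to a single comparison: a higher clearance automatically covers every tier below it.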

Data Retention and Disposal

How long should you keep data? The answer depends on the type:

| Data Type | Typical Retention |
| --- | --- |
| Customer records | 5-7 years |
| Financial transactions | 7+ years |
| Employee records | 3-6 years after departure |
| Emails | 1-3 years |
| System logs | 30-90 days |

The author mentions that her team auto-deletes PII after 30 days. The longer you hold personal data, the bigger the risk. When data expires, you overwrite it, physically destroy the storage, or archive it with restricted access.
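
A retention policy only works if something checks it on a schedule. Below is a minimal sketch of that check. The data type keys, the exact day counts, and the record layout are assumptions loosely based on the ranges above, not values from the book; your legal requirements decide the real numbers.

```python
from datetime import date, timedelta

# Hypothetical retention windows in days. Real values depend on your
# legal and business requirements.
RETENTION_DAYS = {
    "system_log": 90,
    "email": 3 * 365,
    "financial_transaction": 7 * 365,
    "pii_staging": 30,
}


def is_expired(data_type: str, created_on: date, today: date) -> bool:
    """Return True when a record has outlived its retention window."""
    window = timedelta(days=RETENTION_DAYS[data_type])
    return today - created_on > window


def select_for_disposal(records: list[dict], today: date) -> list[dict]:
    """Pick the records a disposal job should overwrite, destroy, or archive."""
    return [r for r in records if is_expired(r["type"], r["created_on"], today)]


records = [
    {"id": 1, "type": "system_log", "created_on": date(2024, 1, 1)},
    {"id": 2, "type": "email", "created_on": date(2024, 1, 1)},
    {"id": 3, "type": "pii_staging", "created_on": date(2024, 5, 20)},
]
expired = select_for_disposal(records, today=date(2024, 6, 1))
print([r["id"] for r in expired])  # [1]
```

Passing `today` in as a parameter, instead of calling `date.today()` inside the function, makes the rule easy to test and to replay for audits.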

Processes: Making the Rules Stick

Policies are useless without processes to enforce them. The book covers four key processes.

Metadata Management

Metadata is data about data. It answers questions like: where did this dataset come from? What columns does it have? Who owns it? When was it last updated?

There are three types:

  • Technical metadata - column names, data types, table relationships. The blueprint.
  • Business metadata - what does “active_customer” actually mean? Which department owns this dataset? The dictionary.
  • Operational metadata - when was this last updated? What transformations were applied? Where did the data come from? The activity log.

You store metadata in catalogs like Apache Atlas, AWS Glue Data Catalog, or Google Data Catalog. Good metadata management means an analyst can search “customer churn rate” and find the right dataset in seconds.
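
To make the three types concrete, here is a minimal sketch of a catalog entry and a keyword search. The `CatalogEntry` fields, the sample datasets, and the `search` helper are all made up for illustration; the real catalogs named above have their own schemas and APIs.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CatalogEntry:
    name: str
    # Technical metadata: the blueprint.
    columns: dict[str, str]
    # Business metadata: the dictionary.
    description: str
    owner: str
    # Operational metadata: the activity log.
    source: str
    last_updated: datetime
    transformations: list[str] = field(default_factory=list)


def search(catalog: list[CatalogEntry], term: str) -> list[str]:
    """Return names of entries whose name or description mentions the term."""
    term = term.lower()
    return [
        e.name
        for e in catalog
        if term in e.name.lower() or term in e.description.lower()
    ]


catalog = [
    CatalogEntry(
        name="customer_churn_monthly",
        columns={"customer_id": "string", "churned": "boolean"},
        description="Customer churn rate by month; active_customer means one order in 90 days.",
        owner="Marketing Analytics",
        source="crm.orders",
        last_updated=datetime(2024, 6, 1, 2, 0),
        transformations=["deduplicate customers", "aggregate by month"],
    ),
    CatalogEntry(
        name="invoices_raw",
        columns={"invoice_id": "string", "amount": "decimal"},
        description="Unprocessed invoices exported from the billing system.",
        owner="Finance",
        source="billing.export",
        last_updated=datetime(2024, 6, 1, 1, 0),
    ),
]

print(search(catalog, "churn rate"))  # ['customer_churn_monthly']
```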

Data Lineage

Data lineage is a map of how data moves through your systems. From source, through transformations, to final destination. When a report shows wrong numbers, lineage lets you trace the problem back to where it started. Simple pipelines are easy to track. Modern data systems with dozens of transformations across multiple platforms? That takes proper tooling.
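
Under the hood, lineage is a graph you can walk. Here is a minimal sketch of tracing a bad report back to its sources. The dataset names and the `UPSTREAM` dictionary are hypothetical; lineage tools build this graph for you from pipeline code and query logs.

```python
from collections import deque

# Hypothetical lineage graph: each dataset maps to the datasets it is built from.
UPSTREAM = {
    "revenue_report": ["orders_clean", "fx_rates"],
    "orders_clean": ["orders_raw"],
    "fx_rates": [],
    "orders_raw": [],
}


def trace_upstream(dataset: str, graph: dict[str, list[str]]) -> list[str]:
    """Return every upstream dataset, nearest first (breadth-first walk)."""
    seen = {dataset}
    queue = deque([dataset])
    order = []
    while queue:
        current = queue.popleft()
        for parent in graph.get(current, []):
            if parent not in seen:  # skip datasets already reached by another path
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


print(trace_upstream("revenue_report", UPSTREAM))
# ['orders_clean', 'fx_rates', 'orders_raw']
```

When the report looks wrong, the returned list is your checklist: inspect the nearest datasets first and work back toward the raw sources.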

Incident Management

When something breaks, you need a structured response: identify the problem early, classify severity, respond by isolating systems, communicate with stakeholders, analyze root cause, and document everything in a postmortem. Best practices: regular audits, response playbooks, employee training, and simulated breach exercises.

Master Data Management

Master data is the core info shared across your whole organization: customers, products, employees. The problem? Different teams often have slightly different versions. One stores names as “First Last,” another as “Last, First.”

Master data management standardizes this: defines ownership, creates shared definitions, sets formatting rules, and manages updates and retirements over time.
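
A formatting rule like that is easy to express in code. Here is a minimal sketch that converts both name styles to "First Last". The function name is my own, and it assumes a comma only ever appears in the "Last, First" form; real names are messier, so treat this as an illustration of the idea rather than a production rule.

```python
def normalize_name(raw: str) -> str:
    """Convert 'Last, First' to 'First Last' and collapse stray whitespace."""
    if "," in raw:
        # Assumption: a comma means the value is in 'Last, First' order.
        last, first = raw.split(",", 1)
        raw = f"{first} {last}"
    return " ".join(raw.split())


print(normalize_name("Lovelace, Ada"))      # Ada Lovelace
print(normalize_name("  Ada   Lovelace "))  # Ada Lovelace
```

Once every system runs names through the same rule, records that used to look different match up, which is the first step toward a single shared version.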

Roles: Who Does What

A governance framework needs people assigned to specific responsibilities.

Data Owner - the decision-maker, usually a department head. Decides what data to collect, who can access it, and how long to keep it.

Data Steward - the day-to-day caretaker. Evaluates data quality, cleans duplicates, adds descriptions to catalogs. The book makes a good point: stewards are often “found, not made.” Someone is probably already doing this informally.

Data Custodian - the technical implementer, usually IT or engineering. Sets up access controls, manages encryption and backups. They don’t make business decisions about data. They protect it.

Chief Data Officer (CDO) - the executive sponsor. Defines data strategy and champions governance at the leadership level. Not every company has one.

In smaller companies, one person wears several hats. The titles matter less than making sure someone owns each responsibility.

Data Management vs Data Governance

The chapter ends with a distinction people often confuse.

Data management is the doing. Cleaning data, modeling it, storing it, securing it. The day-to-day work.

Data governance is the guiding. Setting the rules, defining standards, assigning accountability. It answers “how should we handle data?” rather than “let me handle this data.”

Management without governance means everyone does things their own way. Governance without management means you have a beautiful rulebook that nobody follows.

My Take

Beginners skip this chapter because it sounds boring. Don’t. I’ve worked at places where nobody thought about governance until something broke. Data was everywhere, nobody knew which version was right, and auditors showed up asking questions nobody could answer.

You don’t need to own governance. But you need to understand it. Start simple: classify your data, document your metadata, and make sure someone owns each dataset. That gets you 80% of the way there.


This is part 14 of 18 in my retelling of “Data Engineering for Beginners” by Chisom Nwokwu. See all posts in this series.
