Golang DSA Chapter 10 Part 1: Memory Management and Garbage Collection
We’ve spent nine chapters building data structures, stacking nodes, balancing trees, linking lists. But here’s a question we haven’t really asked yet: what happens to all that memory when we’re done with it?
Chapter 10 answers that. It’s all about memory management, and specifically, how garbage collection works under the hood.
What Is Memory Management?
Memory management is how a program controls and organizes the memory it uses. Every time you create a variable, a struct, or push something onto a stack, that takes up a chunk of memory. These chunks are called blocks, and they get allocated to different processes as needed.
The goal is simple: give memory to programs when they need it, take it back when they don’t. Three main techniques handle this:
- Garbage collection, freeing memory from objects nobody uses anymore
- Cache management, keeping frequently accessed data close and fast
- Space allocation, deciding where in memory things go
This post covers garbage collection. We’ll get to caching and space allocation in Part 2.
Garbage Collection: The Big Picture
Garbage collection (GC) is automatic memory cleanup. Instead of you, the programmer, manually freeing memory (like you’d do in C), the runtime handles it for you. John McCarthy originally invented GC for Lisp, and the idea has been around since the 1960s.
The basic concept: find objects in memory that will never be used again, and reclaim that space.
GC solves some real headaches:
- Dangling pointers, where you free memory but still have a pointer to it
- Double-free bugs, where you accidentally free the same memory twice
- Memory leaks, where you forget to free memory and it slowly eats everything
The tradeoff? GC takes CPU time. Apple reportedly avoided GC in iOS because it needs roughly five times the memory to match manual memory management. That’s a real cost. In high-traffic systems, you need concurrent and incremental collectors to keep things running smoothly.
A GC algorithm has to juggle a lot of factors: throughput, heap overhead, pause times, pause frequency, compaction, concurrency, scaling, and more. There’s no single perfect solution, just different tradeoffs.
Reference Counting with a Stack
Before we get into the GC algorithms themselves, the book shows how references to objects can be tracked using a stack. This is the foundation for understanding how GC knows what’s still in use.
Here’s the ReferenceCounter struct. It tracks the number of references, a pool for reuse, and a count of removed references:
type ReferenceCounter struct {
num *uint32
pool *sync.Pool
removed *uint32
}
func newReferenceCounter() *ReferenceCounter {
return &ReferenceCounter{
num: new(uint32),
pool: &sync.Pool{},
removed: new(uint32),
}
}
Then a Stack struct holds an array of these counters:
type Stack struct {
references []*ReferenceCounter
Count int
}
func (stack *Stack) New() {
stack.references = make([]*ReferenceCounter, 0)
}
func (stack *Stack) Push(reference *ReferenceCounter) {
stack.references = append(stack.references[:stack.Count], reference)
stack.Count = stack.Count + 1
}
func (stack *Stack) Pop() *ReferenceCounter {
if stack.Count == 0 {
return nil
}
var length int = len(stack.references)
var reference *ReferenceCounter = stack.references[length-1]
if length > 1 {
stack.references = stack.references[:length-1]
} else {
stack.references = stack.references[0:]
}
stack.Count = len(stack.references)
return reference
}
And to use it, you just push reference counters on and pop them off:
func main() {
var stack *Stack = &Stack{}
stack.New()
var reference1 *ReferenceCounter = newReferenceCounter()
var reference2 *ReferenceCounter = newReferenceCounter()
var reference3 *ReferenceCounter = newReferenceCounter()
var reference4 *ReferenceCounter = newReferenceCounter()
stack.Push(reference1)
stack.Push(reference2)
stack.Push(reference3)
stack.Push(reference4)
fmt.Println(stack.Pop(), stack.Pop(), stack.Pop(), stack.Pop())
}
This prints the four reference counter pointers in reverse order (LIFO, as you’d expect from a stack). The point here is showing how the runtime can keep track of what’s referencing what.
Reference Counting Algorithms
Now let’s talk about the actual GC strategies. Reference counting is the simplest family of algorithms.
Simple Reference Counting
The idea: every object has a counter. When something points to the object, the counter goes up. When a reference is removed, the counter goes down. When the counter hits zero, nobody’s using it, so the object gets freed.
Here’s how that looks in Go using atomic operations for thread safety:
type ReferenceCounter struct {
num *uint32
pool *sync.Pool
removed *uint32
}
func newReferenceCounter() ReferenceCounter {
return ReferenceCounter{
num: new(uint32),
pool: &sync.Pool{},
removed: new(uint32),
}
}
func (referenceCounter ReferenceCounter) Add() {
atomic.AddUint32(referenceCounter.num, 1)
}
func (referenceCounter ReferenceCounter) Subtract() {
if atomic.AddUint32(referenceCounter.num, ^uint32(0)) == 0 {
atomic.AddUint32(referenceCounter.removed, 1)
}
}
The Add method bumps the count up by 1. The Subtract method decrements it, and if the result hits zero, it marks the object as removed. That ^uint32(0) trick is how you subtract 1 using unsigned atomics in Go, since there’s no atomic.SubtractUint32.
Simple reference counting works, but it has problems. If you have a big graph of connected objects, removing one can trigger a cascade of removals. And it’s slow when the object graph is large.
Deferred Reference Counting
Deferred reference counting is a smarter version. Instead of updating the count every single time a local variable changes, it only tracks references between objects. References from local variables (stack variables) get ignored until they’re actually needed.
This reduces the overhead a lot. Think about how often local variables come and go inside a function, that’s a ton of counter updates you can skip. Deutsch and Bobrow came up with this approach, and it’s supported by many compilers.
One-Bit Reference Counting
This one is clever. Instead of storing a full counter, you use a single bit flag: is this object referenced by exactly one thing, or more than one? That’s it. One bit.
The flag lives inside the object pointer itself, so you don’t need to allocate any extra space. This works surprisingly well because most objects in practice only have one reference pointing to them. When you only need to track “one vs. many,” a single bit gets the job done.
Weighted Reference Counting
Weighted reference counting assigns a weight to each reference. Instead of just counting how many references exist, you track the total weight across all references to an object.
This was invented by Bevan, Watson, and Watson in 1987. Here’s the basic structure:
type ReferenceCounter struct {
num *uint32
pool *sync.Pool
removed *uint32
weight int
}
func WeightedReference() int {
var references []ReferenceCounter
references = GetReferences(root)
var reference ReferenceCounter
var sum int
for _, reference = range references {
sum = sum + reference.weight
}
return sum
}
The advantage of weights is that copying a reference becomes cheap. You can split the weight between the original and the copy without talking to the object at all. This matters in distributed systems where objects might be on different machines.
Mark-and-Sweep
Reference counting has a big weakness: circular references. If object A points to B and B points to A, both counters stay at 1 forever even if nothing else references either of them. They’ll never get collected.
Mark-and-sweep fixes this. Dijkstra proposed the idea in 1978, and it works in two phases.
Phase 1, Mark: Start from the “root” objects (globals, stack variables). Walk through every reference you can reach and mark those objects as “alive.” The book uses a color metaphor: white objects are unmarked, gray objects are in progress, black objects are fully processed.
func Mark(root *object) {
var markedAlready bool
markedAlready = IfMarked(root)
if !markedAlready {
map[root] = true
}
var references *object[]
references = GetReferences(root)
var reference *object
for _, reference = range references {
Mark(reference)
}
}
Phase 2, Sweep: Go through the entire heap. Anything that wasn’t marked during Phase 1 is garbage. Free it.
func Sweep() {
var objects *[]object
objects = GetObjects()
var object *object
for _, object = range objects {
var markedAlready bool
markedAlready = IfMarked(object)
if markedAlready {
map[object] = true
}
Release(object)
}
}
The beauty of mark-and-sweep is that it handles circular references just fine. If A and B point to each other but nothing reachable from the roots points to either of them, they stay white and get swept.
The downside is that the program has to pause while the collector runs (though modern implementations use concurrent collectors to minimize this).
Generational Collection
Generational collection builds on an observation: most objects die young. A variable you create inside a function usually goes out of scope almost immediately. Long-lived objects (like global configs) are the exception.
So instead of scanning the entire heap every time, you split objects into generations based on age. Young objects go in generation 0, and if they survive a GC cycle, they get promoted to generation 1, then 2, and so on.
You collect younger generations more frequently (because that’s where most of the garbage is) and older generations less often. This saves a lot of work.
One catch: when you collect generation 3, you also need to scan generations 0 through 2 to handle cross-generation references.
func GenerationCollect() {
var currentGeneration int
currentGeneration = 3
var objects *[]object
objects = GetObjectsFromOldGeneration(3)
var object *object
for _, object = range objects {
var markedAlready bool
markedAlready = IfMarked(object)
if markedAlready {
map[object] = true
}
}
}
Go’s own garbage collector uses a concurrent, tri-color mark-and-sweep algorithm with some generational ideas mixed in. It’s optimized for low pause times, which is why Go is popular for servers and networked applications.
Quick Summary
| Algorithm | Key Idea | Strength | Weakness |
|---|---|---|---|
| Simple reference counting | Count refs per object | Immediate cleanup | Slow for large graphs, no circular refs |
| Deferred reference counting | Skip local var updates | Less overhead | More complex implementation |
| One-bit reference counting | Single bit flag | Minimal space | Less precise |
| Weighted reference counting | Weighted refs | Good for distributed systems | More bookkeeping |
| Mark-and-sweep | Mark reachable, sweep rest | Handles circular refs | Pause times |
| Generational | Collect young objects often | Efficient for typical workloads | Cross-gen scanning |
What’s Next
That covers the garbage collection side of Chapter 10. We looked at how reference counting works (from simple to weighted), how mark-and-sweep solves the circular reference problem, and how generational collection makes the whole thing faster by focusing on short-lived objects.
In Part 2, we’ll cover the rest of the chapter: cache management and space allocation algorithms.
This post is part of a series retelling “Learn Data Structures and Algorithms with Golang” by Bhagvan Kommadi (Packt, 2019, ISBN: 978-1-78961-850-1). You can find the full series in the intro post.
Previous: Chapter 9 Part 2: Sparse Matrices and Real-World Uses