Reading path · 5 stops · ~85 min

Weak on data modelling

Targeted path for engineers whose designs are strong on compute and weak on storage. Covers how to pick the right store and partition it right.

For: Engineers whose feedback often cites "data model unclear" or "why that database?"

After this path

Pick the right store, the right partition key, and the right indexes for a given prompt — with defensible reasoning.

1
Skill
Data model design
"We'll put it in Postgres" is not a data model. The data model is entities, keys, relationships, cardinalities, and the access patterns each one has to serve — and it locks in every trade-off you will chase for the rest of the design.
Why this, here: Nail the entity model before you touch storage. Junior candidates skip this; seniors don't.
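One way to make this concrete is to write the entity, its key, and its access patterns down as data, then check which patterns the key actually serves. The sketch below is illustrative only: the `Entity` and `AccessPattern` classes, the messaging-app fields, and the "lookup fields must be a prefix of the primary key" rule are assumptions chosen for the example, not part of any particular database.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class AccessPattern:
    name: str
    lookup_by: Tuple[str, ...]  # fields the query filters on, in order


@dataclass
class Entity:
    name: str
    primary_key: Tuple[str, ...]
    access_patterns: List[AccessPattern] = field(default_factory=list)

    def unserved_patterns(self) -> List[str]:
        """Names of patterns whose lookup fields are not a prefix of the primary key."""
        return [
            p.name
            for p in self.access_patterns
            if self.primary_key[: len(p.lookup_by)] != p.lookup_by
        ]


# Hypothetical messaging-app model (all names are illustrative).
message = Entity(
    name="Message",
    primary_key=("conversation_id", "message_id"),
    access_patterns=[
        AccessPattern("recent messages in a conversation", ("conversation_id",)),
        AccessPattern("all messages sent by a user", ("sender_id",)),
    ],
)

# Patterns listed here need a secondary index, a second table, or a different key.
needs_index = message.unserved_patterns()
print(needs_index)
```

Anything that lands in `needs_index` is a trade-off you have to name out loud before you pick a store.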
2
Skill
Storage choice justification
Picking a database is a first-principles decision, not a matter of defaults. "We use Postgres" is a cultural statement; "the access pattern is point-lookup at 100k QPS with eventual consistency, so we use DynamoDB" is a design.
Why this, here: Match access pattern to storage shape — not defaults, not "we use Postgres".
3
Skill
Sharding & partitioning
The partition key is the single most consequential decision in a distributed data design. Pick it wrong and no amount of extra hardware will save you: the hot shard stays hot, the rebalance never finishes, and the team spends a quarter migrating.
Why this, here: The single most consequential decision in a distributed data design.
Checkpoint
Stop and defend: pick a partition key for a messaging app’s messages table and say what breaks if you partition by sender_id versus conversation_id. If both sound fine, the choice isn’t load-bearing yet — re-read hot partitions.
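To see why the choice is load-bearing, it helps to simulate it. The sketch below hashes a synthetic workload onto partitions under both keys. The partition count, the notification bot, and the 20-member group chat are all assumptions made up for the exercise.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed cluster size for the sketch


def partition_for(key: str) -> int:
    """Map a key to a partition with a stable hash (built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


# Synthetic workload: one (conversation_id, sender_id) pair per message.
messages = []
for i in range(1000):
    # A hypothetical notification bot posts once into every conversation...
    messages.append((f"conv-{i}", "bot"))
    # ...and one ordinary user replies.
    messages.append((f"conv-{i}", f"user-{i}"))
# One group chat with 20 distinct participants.
messages += [("conv-group", f"member-{j}") for j in range(20)]

# Write load per partition under each candidate key.
load_by_conversation = Counter(partition_for(conv) for conv, _ in messages)
load_by_sender = Counter(partition_for(sender) for _, sender in messages)

# Partitions a "load this conversation" read must touch under each scheme.
group_senders = {s for conv, s in messages if conv == "conv-group"}
fanout_by_conversation = len({partition_for("conv-group")})
fanout_by_sender = len({partition_for(s) for s in group_senders})

print("hottest partition, by conversation_id:", max(load_by_conversation.values()))
print("hottest partition, by sender_id:      ", max(load_by_sender.values()))
print("partitions read for one group chat:", fanout_by_conversation, "vs", fanout_by_sender)
```

In this toy workload, partitioning by sender_id sends every bot message to one partition and turns a single conversation read into a scatter-gather. Partitioning by conversation_id is not free either: one very large conversation would become its own hot partition.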
4
Skill
Indexing strategies
The index you picked three months ago decides your query latency today — and the one you didn't create decides which queries you can't ship. Indexing is not "add indexes until it's fast"; it's a first-principles match between query shape and index structure.
Why this, here: The indexes decide your read latency. B-tree vs LSM vs inverted is a first-principles choice.
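A quick way to check that an index matches a query shape is to ask the database for its plan before and after creating it. The sketch below uses SQLite from Python's standard library, whose indexes are B-trees. The `messages` schema, the query, and the index name `idx_conv_sent` are hypothetical, and the exact plan wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    " message_id INTEGER PRIMARY KEY,"
    " conversation_id TEXT NOT NULL,"
    " sent_at INTEGER NOT NULL,"
    " body TEXT)"
)

# Query shape: equality on conversation_id, then newest-first by sent_at.
QUERY = (
    "SELECT body FROM messages "
    "WHERE conversation_id = ? ORDER BY sent_at DESC LIMIT 50"
)


def plan(sql: str) -> str:
    """Return SQLite's query plan as one string (the last column holds the detail text)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("conv-1",)).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY)

# Composite index whose column order mirrors the query: filter column first, sort column second.
conn.execute("CREATE INDEX idx_conv_sent ON messages (conversation_id, sent_at)")
plan_after = plan(QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

Without the index the plan is a full table scan; with it, the planner searches `idx_conv_sent` directly. The same habit carries over to other engines through their own plan tooling.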
5
Skill
Consistency trade-offs
CAP is not a trivia question. It's the trade-off that every distributed system lives under, and getting it wrong is how you end up with "strong consistency" backed by a single node — or "eventual consistency" on data that absolutely cannot be eventually wrong.
Why this, here: Ties the storage choices back to the business-level consistency requirements.
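One place the trade-off becomes a number you can defend is quorum replication, which is an assumption here and not something this path prescribes. With N replicas, a write acknowledged by W of them, and a read that consults R of them, every read set includes at least one replica holding the latest acknowledged write exactly when R + W > N. The sketch below checks that by brute force; the function name is made up for the example.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """Brute-force check: does every write set of size w share a replica with every read set of size r?"""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# Three replicas, majority writes and majority reads: the sets always intersect.
majority = quorums_always_overlap(n=3, w=2, r=2)

# Three replicas, single-replica writes and reads: a read can miss the write entirely.
single = quorums_always_overlap(n=3, w=1, r=1)

print(majority, single)
```

Lowering W or R buys latency and availability and gives up that guarantee, which is why the setting should come from what the business can tolerate being stale.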
