Senior → staff calibration
The staff signal is scope: you reason about failure, capacity, and cost from the first minute. This path tightens the framings interviewers listen for.
For: Strong seniors pursuing staff-level rounds
After this path
Sound like an engineer who has shipped systems at scale, not one who has read about them.
- 1Skill
Capacity estimation (back-of-envelope)
Every downstream decision — cache size, shard count, replica count — collapses onto one question: what are the numbers? Candidates who skip this are designing vibes, not systems.
Why this, here: The staff signal is the numeric reasoning in the first 3 minutes.
- 2Skill
Failure mode analysis
Systems don't fail because you didn't think they could. They fail the way you failed to think about. Failure-mode analysis is structured paranoia — and interviewers grade on whether you can produce it on demand.
Why this, here: Pre-empt the "what fails" probe by raising it yourself.
Checkpoint
Staff framing drill: volunteer a failure mode and its mitigation before the interviewer asks. Say the sentence now, out loud, for the last system you worked on. If it sounds rehearsed, good — it should.
- 3Pattern
Multi-region active-passive / active-active
Geographic distribution for latency, DR, and compliance. Active-passive is operationally sane; active-active is a conflict-resolution project.
Why this, here: DR, latency, residency — staff rounds always touch this.
- 4Skill
consensus-leader-election
Raft and Paxos aren't trivia — they're the reason your leader-election design either works or deadlocks. Most interview failures here are: "we'll elect a leader somehow" with no quorum story and no fencing.
Why this, here: Names the real primitive instead of "we have a leader somehow".
- 5Skill
Observability & operations
You cannot operate what you cannot see; you cannot page on what you cannot measure. Candidates who design beautiful systems with no metrics, no logs, and no alerts are designing systems their on-call team will hate.
Why this, here: SLOs and error budgets are a staff-level vocabulary.
Checkpoint
State an SLO for a system you own (availability or latency), the error budget math behind it, and what you’d cut first when the budget is exhausted. Vague answers here get filtered at staff-plus.
- 6Skill
cdc-eventing
Change-data-capture is how you keep read models fresh without dual writes. The DB's log is already your most reliable event stream — use it.
Why this, here: How mature systems avoid dual writes. Outbox + CDC is table stakes.