Distributed Systems Cheat Sheet

System Characteristics

| Characteristic | Description | Challenges | Solutions |
| --- | --- | --- | --- |
| Concurrency | Multiple nodes operating simultaneously | Coordination, synchronization | Locks, semaphores, consensus protocols |
| No Global Clock | Each node has its own clock | Ordering events, causality | Logical clocks, vector clocks |
| Independent Failures | Components can fail independently | System reliability, fault tolerance | Replication, redundancy, error recovery |
| Network Partitions | Communication failures between nodes | Data consistency, availability | Consensus protocols, conflict resolution |
| Heterogeneity | Different hardware, OS, and programming languages | Interoperability, communication | Standard protocols, middleware |
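
Vector clocks, listed above as a way to order events without a global clock, can be sketched as follows. This is a minimal illustration, not a production implementation: the function names (`tick`, `merge`, `happened_before`, `concurrent`) and the dict-based clock representation are assumptions made for this example.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node id -> number of events seen from that node


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter incremented (local event or send)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(local: VectorClock, received: VectorClock, node: str) -> VectorClock:
    """On message receipt: take the element-wise maximum, then tick the receiver."""
    nodes = set(local) | set(received)
    merged = {n: max(local.get(n, 0), received.get(n, 0)) for n in nodes}
    return tick(merged, node)


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` causally precedes `b`: a <= b in every component and a < b in at least one."""
    nodes = set(a) | set(b)
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    some_smaller = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and some_smaller


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if neither clock precedes the other, i.e. each is ahead in some component."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    return a_ahead and b_ahead


a1 = tick({}, "A")        # A performs a local event: {"A": 1}
b1 = tick({}, "B")        # B performs an independent event: {"B": 1}
b2 = merge(b1, a1, "B")   # B receives A's message: {"A": 1, "B": 2}
```

Here `happened_before(a1, b2)` is true because B has seen A's event, while `a1` and `b1` are concurrent because neither node knew about the other's event.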

Consistency Models

| Model | Description | Characteristics | Trade-offs |
| --- | --- | --- | --- |
| Strong Consistency | All nodes see the same data at the same time | Linearizability, sequential consistency | Strong guarantees; higher latency and reduced availability during partitions |
| Eventual Consistency | All nodes eventually converge to the same state once updates stop | Availability over consistency | High availability; reads may return stale data |
| Causal Consistency | Causally related operations are seen in the same order by all nodes | Maintains causality, relaxes global order | Balanced: stronger than eventual, cheaper than strong |
| Weak Consistency | No guarantee about when a write becomes visible to other nodes | Best effort, fast responses | High performance; no consistency guarantees |
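
One common way replicated stores move between the strong and eventual ends of this table is quorum sizing: with N replicas, a write acknowledged by W replicas and a read that contacts R replicas are guaranteed to share at least one replica when R + W > N. The sketch below checks that rule by brute force. The names `n`, `r`, `w` and both functions are illustrative assumptions, and overlapping quorums alone do not make a system linearizable.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Arithmetic rule: read and write quorums must share a replica when r + w > n."""
    return r + w > n


def quorums_always_intersect(n: int, r: int, w: int) -> bool:
    """Brute-force check: every possible read quorum shares a replica with every write quorum."""
    replicas = range(n)
    return all(
        set(read_q) & set(write_q)
        for read_q in combinations(replicas, r)
        for write_q in combinations(replicas, w)
    )


# The rule and the brute-force check agree for every small configuration.
for n in range(1, 6):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert quorums_overlap(n, r, w) == quorums_always_intersect(n, r, w)
```

For example, N=3 with R=2 and W=2 overlaps, so a read contacts at least one replica holding the latest acknowledged write; N=3 with R=1 and W=1 does not, which favors latency and availability at the cost of possibly stale reads.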

Consensus Algorithms

| Algorithm | Approach | Strengths | Weaknesses | Use Cases |
| --- | --- | --- | --- | --- |
| Raft | Leader-based consensus | Simple, understandable, efficient | Leader can become a throughput bottleneck; brief unavailability during leader election | etcd, Consul, CockroachDB |
| Paxos | Message passing with quorums | Proven, fault-tolerant | Complex, hard to implement | Google Chubby, Spanner |
| Two-Phase Commit | Prepare and commit phases | Ensures atomicity | Blocking; coordinator is a single point of failure | Distributed databases |
| Three-Phase Commit | Non-blocking extension of 2PC | Reduces blocking issues | Complex; assumes no network partitions and bounded delays | Distributed transactions |
| Byzantine Fault Tolerance | Handles malicious nodes | Secure against arbitrary failures | High resource overhead | Blockchain, security-critical systems |
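
The prepare and commit phases of Two-Phase Commit can be sketched in a few lines. This is a simplified in-memory illustration: the `Participant` class and `two_phase_commit` function are hypothetical names, and a real implementation would also need durable logging, timeouts, and coordinator recovery, which is exactly where the blocking weakness noted above shows up.

```python
from typing import List


class Participant:
    """Hypothetical in-memory participant; a real one would persist its vote before replying."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Phase 1: vote yes (and promise to commit if asked) or vote no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Coordinator logic: commit only if every participant votes yes."""
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(votes)

    # Phase 2 (commit or abort): broadcast the single global decision.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


ok_group = [Participant("db1"), Participant("db2")]
mixed_group = [Participant("db1"), Participant("db2", can_commit=False)]
committed = two_phase_commit(ok_group)      # every vote is yes, so all commit
rolled_back = two_phase_commit(mixed_group)  # one no vote aborts everyone
```

A single "no" vote aborts the whole transaction, which is what gives the protocol its atomicity. If the coordinator crashed between the two phases, prepared participants would be stuck waiting for the decision.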

System Architectures

| Architecture | Description | Advantages | Disadvantages | Examples |
| --- | --- | --- | --- | --- |
| Client-Server | Centralized model with clients requesting services | Simple, centralized control | Single point of failure, limited scalability | Web applications, databases |
| Peer-to-Peer | Decentralized model with equal nodes | Scalable, resilient | Complex management, security risks | BitTorrent, blockchain |
| Microservices | Decomposed into small, independent services | Scalability, technology diversity | Complexity, network overhead | Netflix, Amazon, Uber |
| Master-Slave | Master coordinates work among slaves | Load distribution, fault tolerance | Master is a bottleneck | Hadoop, MapReduce |
| Event-Driven | Components communicate through events | Loose coupling, scalability | Complex debugging, event ordering | Message queues, event sourcing |

Distributed Data Management

| Concept | Description | Techniques | Trade-offs |
| --- | --- | --- | --- |
| Partitioning | Distributing data across multiple nodes | Range, hash, consistent hashing | Scalability vs complexity |
| Replication | Duplicating data across nodes | Synchronous, asynchronous, semi-synchronous | Availability vs consistency |
| Sharding | Horizontal partitioning of a database | Key-based, directory-based, consistent hashing | Scalability vs query complexity |
| Cache Coherence | Keeping cached data consistent | Write-through, write-back, invalidation | Performance vs consistency |
| Data Locality | Processing data near where it's stored | MapReduce, Hadoop, edge computing | Network efficiency vs load balancing |
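
Consistent hashing, listed above under partitioning and sharding, can be sketched as a sorted ring of hashed positions. The `HashRing` class, its `vnodes` parameter, and the use of SHA-256 are assumptions made for this example rather than any particular system's design.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(key: str) -> int:
    """Map a string to a position on the ring (SHA-256 chosen here for illustration)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100) -> None:
        self.vnodes = vnodes                    # virtual nodes per physical node
        self._ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Place several virtual nodes per physical node to smooth the key distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        """Return the owner of `key`: the first ring position at or after its hash."""
        if not self._ring:
            raise ValueError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]  # wrap around past the last position


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.get_node("user:42")  # one of the three node names
```

The property that matters is that removing a node only reassigns the keys that node owned, and adding a node only pulls keys onto the new node; all other keys keep their owner. With plain `hash(key) % n`, changing `n` would remap most keys.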

Distributed Computing Patterns

| Pattern | Description | Use Case | Benefits |
| --- | --- | --- | --- |
| MapReduce | Process large datasets in parallel | Big data processing, analytics | Scalability, fault tolerance |
| Actor Model | Concurrency through message passing | Concurrent, distributed systems | Isolation, concurrency |
| Service Discovery | Dynamic service location | Microservices, cloud systems | Dynamic scaling, resilience |
| Circuit Breaker | Prevent cascading failures | Microservices, API calls | Fault isolation, resilience |
| Load Balancing | Distribute requests across nodes | High-traffic applications | Scalability, availability |
| Leader Election | Select coordinator among nodes | Consensus, coordination | Coordination, fault tolerance |
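
The Circuit Breaker pattern above can be sketched as a small state machine with closed, open, and half-open states. The class name, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are assumptions for this example; production libraries typically add per-error policies, metrics, and thread safety.

```python
import time
from typing import Any, Callable


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self.state = "closed"

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            if self._clock() - self._opened_at >= self.reset_timeout:
                self.state = "half_open"  # allow one trial call through
            else:
                raise CircuitOpenError("circuit is open; call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, opens the circuit.
            if self.state == "half_open" or self._failures >= self.failure_threshold:
                self.state = "open"
                self._opened_at = self._clock()
            raise
        # Any success resets the failure count and closes the circuit.
        self._failures = 0
        self.state = "closed"
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a hypothetical function. While the circuit is open, calls fail fast with `CircuitOpenError` instead of piling up on a struggling dependency.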

CAP Theorem

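| Property | Description | Systems that prioritize it | Trade-offs |
| --- | --- | --- | --- |
| Consistency | Every read sees the most recent write (all nodes see the same data at the same time) | RDBMS, Spanner | May sacrifice availability |
| Availability | Every request to a non-failed node gets a response, despite failures elsewhere | Cassandra, DynamoDB | May sacrifice consistency |
| Partition Tolerance | System continues despite network failures | All distributed systems | During a partition, must choose between C and A |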