Compiler Design Cheat Sheet

Compiler Phases

PhasePurposeOutputTools/Techniques
Lexical AnalysisConvert source code to tokensToken streamRegular expressions, finite automata
Syntax AnalysisParse tokens into syntax treeParse tree/ASTContext-free grammars, parsers
Semantic AnalysisCheck semantic rulesDecorated ASTType checking, attribute grammars
Intermediate Code GenerationCreate intermediate representationIR codeThree-address code, postfix
Code OptimizationImprove code efficiencyOptimized IRData flow analysis, loop optimization
Code GenerationCreate target machine codeAssembly/machine codeInstruction selection, register allocation

Lexical Analysis

ConceptDescriptionImplementationExample
TokenBasic lexical unitKeywords, identifiers, literalsint x = 5; → int, x, =, 5, ;
PatternsRules for token recognitionRegular expressionsIdentifier: letter(letter|digit)*
Finite AutomataRecognition mechanismDFA, NFAState transition diagrams
LexemesCharacter sequences forming tokensActual input stringswhile in source code

Syntax Analysis (Parsing)

TypeMethodDirectionComplexityBest For
Top-DownRecursive descent, LL parsingRoot to leavesO(n)Programming languages
Bottom-UpShift-reduce, LR parsingLeaves to rootO(n)Most programming languages
LL(1)Left-to-right, leftmost derivationTop-downO(n)Simple grammars
LR(0)Left-to-right, rightmost derivationBottom-upO(n)Deterministic grammars
SLRSimple LRBottom-upO(n)Extended grammars
LALRLookahead LRBottom-upO(n)Most programming languages

Symbol Table Management

MethodDescriptionAdvantagesDisadvantages
Linear ListSimple linked listEasy to implementSlow search O(n)
Hash TableHash function for indexingFast search O(1)Hash collisions
Tree StructureBinary search treeSorted access, O(log n)Complex implementation
Scope HandlingMultiple tables for nested scopesProper scope managementMemory overhead

Intermediate Code Forms

TypeRepresentationAdvantagesDisadvantages
Three-Address Codeop arg1, arg2, resultSimple, general purposeMore instructions
Postfix Notationab+ (operands before operator)No parentheses neededHard to read
Abstract Syntax TreeTree representationCompact, hierarchicalMore complex
Quadruples(op, arg1, arg2, result)Easy to rearrangeMore memory
Triples(op, arg1, arg2)Less memoryDifficult to rearrange

Code Optimization

LevelTechniquePurposeExample
LocalConstant foldingReplace constant expressionsx = 3 + 5x = 8
LocalCommon subexpression eliminationAvoid recomputationa = b + c; d = b + ct = b + c; a = t; d = t
LocalDead code eliminationRemove unused codeRemove assignments to unused variables
LoopLoop invariant code motionMove code outside loopsMove calculations out of loops
LoopLoop unrollingReduce loop overheadReplicate loop body multiple times
GlobalData flow analysisOptimize across basic blocksGlobal common subexpressions

Runtime Environment

ComponentPurposeContentsManagement
Code AreaStore executable codeMachine instructionsStatic allocation
Static DataGlobal and static variablesGlobal variables, constantsStatic allocation
Stack AreaFunction calls and local variablesActivation recordsLast-in-first-out
Heap AreaDynamic memory allocationDynamically allocated dataManual/Garbage collection

Error Handling

Error TypePhaseDetectionRecovery Strategy
LexicalLexical AnalysisInvalid charactersIgnore invalid characters
SyntacticSyntax AnalysisInvalid syntaxPhrase-level, Panic mode
SemanticSemantic AnalysisType mismatchesReport and continue
LogicalRuntimeProgram logic errorsDebugging, testing