Compiler Design Cheat Sheet

Compiler Phases

Phase	Purpose	Output	Tools/Techniques
Lexical Analysis	Convert source code to tokens	Token stream	Regular expressions, finite automata
Syntax Analysis	Parse tokens into syntax tree	Parse tree/AST	Context-free grammars, parsers
Semantic Analysis	Check semantic rules	Decorated AST	Type checking, attribute grammars
Intermediate Code Generation	Create intermediate representation	IR code	Three-address code, postfix
Code Optimization	Improve code efficiency	Optimized IR	Data flow analysis, loop optimization
Code Generation	Create target machine code	Assembly/machine code	Instruction selection, register allocation

Lexical Analysis

Concept	Description	Implementation	Example
Token	Basic lexical unit	Keywords, identifiers, literals	int x = 5; → int, x, =, 5, ;
Patterns	Rules for token recognition	Regular expressions	Identifier: letter(letter\|digit)*
Finite Automata	Recognition mechanism	DFA, NFA	State transition diagrams
Lexemes	Character sequences forming tokens	Actual input strings	while in source code

Syntax Analysis (Parsing)

Type	Method	Direction	Complexity	Best For
Top-Down	Recursive descent, LL parsing	Root to leaves	O(n)	Programming languages
Bottom-Up	Shift-reduce, LR parsing	Leaves to root	O(n)	Most programming languages
LL(1)	Left-to-right, leftmost derivation	Top-down	O(n)	Simple grammars
LR(0)	Left-to-right, rightmost derivation	Bottom-up	O(n)	Deterministic grammars
SLR	Simple LR	Bottom-up	O(n)	Extended grammars
LALR	Lookahead LR	Bottom-up	O(n)	Most programming languages

Symbol Table Management

Method	Description	Advantages	Disadvantages
Linear List	Simple linked list	Easy to implement	Slow search O(n)
Hash Table	Hash function for indexing	Fast search O(1)	Hash collisions
Tree Structure	Binary search tree	Sorted access, O(log n)	Complex implementation
Scope Handling	Multiple tables for nested scopes	Proper scope management	Memory overhead

Intermediate Code Forms

Type	Representation	Advantages	Disadvantages
Three-Address Code	op arg1, arg2, result	Simple, general purpose	More instructions
Postfix Notation	ab+ (operands before operator)	No parentheses needed	Hard to read
Abstract Syntax Tree	Tree representation	Compact, hierarchical	More complex
Quadruples	(op, arg1, arg2, result)	Easy to rearrange	More memory
Triples	(op, arg1, arg2)	Less memory	Difficult to rearrange

Code Optimization

Level	Technique	Purpose	Example
Local	Constant folding	Replace constant expressions	x = 3 + 5 → x = 8
Local	Common subexpression elimination	Avoid recomputation	a = b + c; d = b + c → t = b + c; a = t; d = t
Local	Dead code elimination	Remove unused code	Remove assignments to unused variables
Loop	Loop invariant code motion	Move code outside loops	Move calculations out of loops
Loop	Loop unrolling	Reduce loop overhead	Replicate loop body multiple times
Global	Data flow analysis	Optimize across basic blocks	Global common subexpressions

Runtime Environment

Component	Purpose	Contents	Management
Code Area	Store executable code	Machine instructions	Static allocation
Static Data	Global and static variables	Global variables, constants	Static allocation
Stack Area	Function calls and local variables	Activation records	Last-in-first-out
Heap Area	Dynamic memory allocation	Dynamically allocated data	Manual/Garbage collection

Error Handling

Error Type	Phase	Detection	Recovery Strategy
Lexical	Lexical Analysis	Invalid characters	Ignore invalid characters
Syntactic	Syntax Analysis	Invalid syntax	Phrase-level, Panic mode
Semantic	Semantic Analysis	Type mismatches	Report and continue
Logical	Runtime	Program logic errors	Debugging, testing