| Phase | Purpose | Output | Tools/Techniques |
|---|---|---|---|
| Lexical Analysis | Convert source code to tokens | Token stream | Regular expressions, finite automata |
| Syntax Analysis | Parse tokens into syntax tree | Parse tree/AST | Context-free grammars, parsers |
| Semantic Analysis | Check semantic rules | Decorated AST | Type checking, attribute grammars |
| Intermediate Code Generation | Create intermediate representation | IR code | Three-address code, postfix |
| Code Optimization | Improve code efficiency | Optimized IR | Data flow analysis, loop optimization |
| Code Generation | Create target machine code | Assembly/machine code | Instruction selection, register allocation |
| Concept | Description | Implementation | Example |
|---|---|---|---|
| Token | Basic lexical unit | Keywords, identifiers, literals | int x = 5; → int, x, =, 5, ; |
| Patterns | Rules for token recognition | Regular expressions | Identifier: letter(letter|digit)* |
| Finite Automata | Recognition mechanism | DFA, NFA | State transition diagrams |
| Lexemes | Character sequences forming tokens | Actual input strings | while in source code |
| Type | Method | Direction | Complexity | Best For |
|---|---|---|---|---|
| Top-Down | Recursive descent, LL parsing | Root to leaves | O(n) | Programming languages |
| Bottom-Up | Shift-reduce, LR parsing | Leaves to root | O(n) | Most programming languages |
| LL(1) | Left-to-right, leftmost derivation | Top-down | O(n) | Simple grammars |
| LR(0) | Left-to-right, rightmost derivation | Bottom-up | O(n) | Deterministic grammars |
| SLR | Simple LR | Bottom-up | O(n) | Extended grammars |
| LALR | Lookahead LR | Bottom-up | O(n) | Most programming languages |
| Method | Description | Advantages | Disadvantages |
|---|---|---|---|
| Linear List | Simple linked list | Easy to implement | Slow search O(n) |
| Hash Table | Hash function for indexing | Fast search O(1) | Hash collisions |
| Tree Structure | Binary search tree | Sorted access, O(log n) | Complex implementation |
| Scope Handling | Multiple tables for nested scopes | Proper scope management | Memory overhead |
| Type | Representation | Advantages | Disadvantages |
|---|---|---|---|
| Three-Address Code | op arg1, arg2, result | Simple, general purpose | More instructions |
| Postfix Notation | ab+ (operands before operator) | No parentheses needed | Hard to read |
| Abstract Syntax Tree | Tree representation | Compact, hierarchical | More complex |
| Quadruples | (op, arg1, arg2, result) | Easy to rearrange | More memory |
| Triples | (op, arg1, arg2) | Less memory | Difficult to rearrange |
| Level | Technique | Purpose | Example |
|---|---|---|---|
| Local | Constant folding | Replace constant expressions | x = 3 + 5 → x = 8 |
| Local | Common subexpression elimination | Avoid recomputation | a = b + c; d = b + c → t = b + c; a = t; d = t |
| Local | Dead code elimination | Remove unused code | Remove assignments to unused variables |
| Loop | Loop invariant code motion | Move code outside loops | Move calculations out of loops |
| Loop | Loop unrolling | Reduce loop overhead | Replicate loop body multiple times |
| Global | Data flow analysis | Optimize across basic blocks | Global common subexpressions |
| Component | Purpose | Contents | Management |
|---|---|---|---|
| Code Area | Store executable code | Machine instructions | Static allocation |
| Static Data | Global and static variables | Global variables, constants | Static allocation |
| Stack Area | Function calls and local variables | Activation records | Last-in-first-out |
| Heap Area | Dynamic memory allocation | Dynamically allocated data | Manual/Garbage collection |
| Error Type | Phase | Detection | Recovery Strategy |
|---|---|---|---|
| Lexical | Lexical Analysis | Invalid characters | Ignore invalid characters |
| Syntactic | Syntax Analysis | Invalid syntax | Phrase-level, Panic mode |
| Semantic | Semantic Analysis | Type mismatches | Report and continue |
| Logical | Runtime | Program logic errors | Debugging, testing |