What:

In a traditional two-phase compiler (frontend and backend), the frontend is responsible for:

  • Checking whether the program is legal
  • Reporting errors usefully
  • Producing an intermediate representation (IR)

High Level:

Components:

Lexer:

  • Converts the source code into a stream of tokens (keywords, identifiers, literals, operators). Whitespace and comments are discarded at this stage.
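A minimal lexer sketch for a hypothetical toy language. The token kinds, the `//` comment syntax and the `Token`/`tokenize` names are assumptions for illustration, not taken from any particular compiler. It groups characters into tokens and drops whitespace and comments. It also records line numbers so later stages can report errors usefully.

```python
import re
from typing import Iterator, NamedTuple

class Token(NamedTuple):
    kind: str   # e.g. "NUMBER", "IDENT", "OP"
    text: str   # the exact characters matched
    line: int   # 1-based line number, kept for error messages

# Hypothetical token set for a toy language. Order matters: the first alternative that matches wins,
# so COMMENT must come before OP (otherwise "//" would lex as two "/" operators).
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),   # line comment, discarded
    ("NEWLINE", r"\n"),         # only used to count lines, discarded
    ("SKIP",    r"[ \t\r]+"),   # other whitespace, discarded
    ("NUMBER",  r"\d+"),
    ("STRING",  r'"[^"\n]*"'),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"[+\-*/=;()]"),
    ("ERROR",   r"."),          # any other single character is illegal
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source: str) -> Iterator[Token]:
    line = 1
    for match in MASTER_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "NEWLINE":
            line += 1
        elif kind in ("COMMENT", "SKIP"):
            continue
        elif kind == "ERROR":
            raise SyntaxError(f"line {line}: unexpected character {text!r}")
        else:
            yield Token(kind, text, line)
```

For example, `tokenize('x = 1 + 2; // sum\ny = "hi";')` yields the tokens `x = 1 + 2 ;` on line 1 and `y = "hi" ;` on line 2, with the comment and all whitespace gone.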

Parser:

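  • Constructs an Abstract Syntax Tree (AST) from the token stream and reports syntax errors, i.e. token sequences the grammar does not allow (e.g. a missing semicolon).
    • Top-Down Parser: builds the tree by working from the start symbol down to the input sentence (e.g. recursive descent and LL parsers).
    • Bottom-Up Parser: builds the tree by working from the input sentence back up to the start symbol (e.g. LR and other shift-reduce parsers).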
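A sketch of a top-down (recursive descent) parser for a hypothetical toy grammar of assignment statements. Each grammar rule becomes one method, and parsing begins at the start symbol `stmt`, which is what makes it top-down. The grammar, the AST classes and the crude regex tokenizer are assumptions made to keep the example self-contained. A bottom-up (LR) parser would usually be driven by a generated parse table instead and is not shown here.

```python
import re
from dataclasses import dataclass
from typing import List, Union

IDENT_RE = r"[A-Za-z_]\w*"

# Hypothetical AST node types, for illustration only.
@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Var, BinOp]

@dataclass
class Assign:
    target: str
    value: Expr

class Parser:
    """Recursive descent parser for:
         stmt -> IDENT '=' expr ';'
         expr -> term (('+' | '-') term)*
         term -> NUMBER | IDENT | '(' expr ')'
    """
    def __init__(self, source: str) -> None:
        # Crude tokenizer: runs of digits, identifiers, or any single non-space character.
        self.tokens: List[str] = re.findall(r"\d+|" + IDENT_RE + r"|\S", source)
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "<EOF>"

    def expect(self, text: str) -> None:
        if self.peek() != text:
            raise SyntaxError(f"expected {text!r} but found {self.peek()!r}")
        self.pos += 1

    def parse_stmt(self) -> Assign:
        name = self.peek()
        if not re.fullmatch(IDENT_RE, name):
            raise SyntaxError(f"expected an identifier but found {name!r}")
        self.pos += 1
        self.expect("=")
        value = self.parse_expr()
        self.expect(";")  # a missing semicolon is reported here
        return Assign(name, value)

    def parse_expr(self) -> Expr:
        node = self.parse_term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            node = BinOp(op, node, self.parse_term())  # loop builds a left-associative tree
        return node

    def parse_term(self) -> Expr:
        tok = self.peek()
        if re.fullmatch(r"\d+", tok):
            self.pos += 1
            return Num(int(tok))
        if re.fullmatch(IDENT_RE, tok):
            self.pos += 1
            return Var(tok)
        if tok == "(":
            self.pos += 1
            node = self.parse_expr()
            self.expect(")")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")
```

`Parser("x = 1 + (y - 2);").parse_stmt()` returns `Assign(target='x', value=BinOp(op='+', left=Num(value=1), right=BinOp(op='-', left=Var(name='y'), right=Num(value=2))))`, while `Parser("x = 1 + 2").parse_stmt()` raises `SyntaxError: expected ';' but found '<EOF>'`.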

Semantic Analyser:

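  • Checks that the AST is semantically correct, i.e. that it obeys rules the grammar alone cannot express:
    • Type Checking: are operand and variable types consistent? (e.g. adding an integer to a string)
    • Scope Validation: every variable is declared and is only used within the scope where it is visible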
    • Function Validation: Functions are called with the correct number and type of arguments
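A sketch of a semantic analyser performing the three checks listed above on a hypothetical tuple-based AST. The node kinds (`num`, `str`, `var`, `add`, `call`, `let`, `block`), the `FUNCTIONS` signature table and the rule that `+` needs both operands to have the same type are assumptions for illustration; real languages differ, and some allow implicit conversions. Scopes are modelled as a stack of symbol tables, with a new table pushed for each block.

```python
from typing import Dict, List, Tuple

# Hypothetical built-in signatures: name -> (parameter types, return type).
FUNCTIONS: Dict[str, Tuple[List[str], str]] = {
    "len": (["str"], "int"),
    "max": (["int", "int"], "int"),
}

class SemanticError(Exception):
    pass

def check(node: tuple, scopes: List[Dict[str, str]]) -> str:
    """Returns the type of `node`, raising SemanticError on the first violation.
    `scopes` is a stack of symbol tables (variable name -> type), innermost last."""
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "str":
        return "str"
    if kind == "var":                                # scope validation
        name = node[1]
        for scope in reversed(scopes):               # search the innermost scope first
            if name in scope:
                return scope[name]
        raise SemanticError(f"'{name}' is not declared in this scope")
    if kind == "add":                                # type checking
        left, right = check(node[1], scopes), check(node[2], scopes)
        if left != right:
            raise SemanticError(f"cannot add {left} and {right}")
        return left
    if kind == "call":                               # function validation
        name, args = node[1], node[2]
        if name not in FUNCTIONS:
            raise SemanticError(f"unknown function '{name}'")
        params, result = FUNCTIONS[name]
        if len(args) != len(params):
            raise SemanticError(f"'{name}' expects {len(params)} argument(s), got {len(args)}")
        for i, (arg, expected) in enumerate(zip(args, params), start=1):
            actual = check(arg, scopes)
            if actual != expected:
                raise SemanticError(f"argument {i} of '{name}' must be {expected}, got {actual}")
        return result
    if kind == "let":                                # declare a variable in the current scope
        name, value = node[1], node[2]
        scopes[-1][name] = check(value, scopes)
        return "void"
    if kind == "block":                              # a block opens a new nested scope
        scopes.append({})
        try:
            for stmt in node[1]:
                check(stmt, scopes)
        finally:
            scopes.pop()                             # its variables disappear when it ends
        return "void"
    raise ValueError(f"unknown node kind {kind!r}")
```

Called as `check(program, [{}])`, where the single empty table is the global scope. A variable declared inside an inner `block` is rejected if it is referenced after that block ends.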

IR Generator:

  • Converts the AST into an Intermediate Representation (IR), such as a graph-based IR, which is passed to the compiler backend.
  • There's a fundamental problem with the AST: it captures the program's syntactic structure but doesn't explicitly encode the order in which the code executes (control flow) or how values flow between operations (data flow).
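A sketch of one way to address this: lowering an expression AST into three-address code, a flat list of instructions executed in order. This linear IR is swapped in for simplicity and is not the graph IR mentioned above; the tuple AST format and the `IRGen` name are assumptions. The output fixes an evaluation order and gives every intermediate value a name (`t1`, `t2`, ...), which the tree left implicit.

```python
from itertools import count
from typing import List

# Hypothetical AST format (nested tuples):
#   ("num", 3)   ("var", "a")   ("bin", "+", left, right)   ("assign", "x", expr)

class IRGen:
    """Lowers an expression AST to three-address code: a flat list of
    instructions, each with at most one operator, in evaluation order."""
    def __init__(self) -> None:
        self.code: List[str] = []
        self._ids = count(1)

    def new_temp(self) -> str:
        return f"t{next(self._ids)}"

    def lower(self, node: tuple) -> str:
        """Emits instructions for `node` and returns the operand that holds its value."""
        kind = node[0]
        if kind == "num":
            return str(node[1])
        if kind == "var":
            return node[1]
        if kind == "bin":
            _, op, left, right = node
            a = self.lower(left)    # the left operand's instructions are emitted first,
            b = self.lower(right)   # then the right operand's
            t = self.new_temp()
            self.code.append(f"{t} = {a} {op} {b}")
            return t
        if kind == "assign":
            _, name, value = node
            self.code.append(f"{name} = {self.lower(value)}")
            return name
        raise ValueError(f"unknown node kind {kind!r}")
```

Lowering `x = (a + b) * (c - 1)` produces `t1 = a + b`, `t2 = c - 1`, `t3 = t1 * t2`, `x = t3`. In a graph-based IR, instructions like these are typically grouped into basic blocks connected by control-flow edges.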