qccompiler

Design and implementation of a compiler from scratch, including:

  1. Lexical analysis
  2. Syntactic analysis
  3. Abstract syntax tree (AST) construction
  4. Semantic analysis
  5. Code generation

It’s written in C and it compiles input files written in qC, a small subset of the language ANSI C (C89/C90). The generated code is in C, but very close to Assembly.

Features of qC

  • use variables and literals of types character and integer (both with signal)
  • function declarations/calls, with recursion support
  • pointers to variables and literals and to other pointers
  • unidimensional arrays for integers, characters or pointers
  • literals of type string
  • arithmetic and logic expressions (check language grammar)
  • simple relational operations
  • pointer operations
  • assign operations
  • control operations (if-else and while)
  • output operations (simplified version of printf)
  • conversion between integers and strings - operations itoa and atoi
  • comments of type /* … */

Tokens

Token Meaning
ID alphameric case sensitive sequences beginning with a letter where ‘_’ is also allowed
INTLIT sequence of digits without unnecessary left pad zeros
CHRLIT single character (except newline or single quote) or escape sequence (\n, \t, , ', " and \0) between single quotes
STRLIT sequence of characters (except newline or single quote) and/or escape sequences between double quotes
AMP &
AND &&
ASSIGN =
AST *
ATOI atoi
CHAR char
COMMA ,
DIV /
ELSE else
EQ ==
GE >=
GT >
IF if
INT int
ITOA itoa
LBRACE {
LE <=
LPAR (
LSQ [
LT <
MINUS -
MOD %
NE !=
NOT !
OR ||
PLUS +
PRINTF printf
RBRACE }
RETURN return
RPAR )
RSQ ]
SEMI ;
WHILE while
RESERVED C keywords not used in qC

Grammar (EBNF notation)

Start → (FunctionDefinition | FunctionDeclaration | Declaration) {FunctionDefinition | FunctionDeclaration | Declaration}
FunctionDefinition → TypeSpecifier FunctionDeclarator LBRACE {Declaration} {Statement} RBRACE
FunctionDeclaration → TypeSpecifier FunctionDeclarator SEMI
FunctionDeclarator → {AST} ID LPAR [ParameterList] RPAR
ParameterList → ParameterDeclaration {COMMA ParameterDeclaration}
ParameterDeclaration → TypeSpecifier {AST} ID
Declaration → TypeSpecifier Declarator {COMMA Declarator} SEMI
TypeSpecifier → CHAR | INT
Declarator → {AST} ID [LSQ INTLIT RSQ]
Statement → [Expression] SEMI
Statement → LBRACE {Statement} RBRACE
Statement → IF LPAR Expression RPAR Statement [ELSE Statement]
Statement → WHILE LPAR Expression RPAR Statement
Statement → RETURN Expression SEMI
Expression → Expression ASSIGN Expression
Expression → Expression (AND | OR) Expression
Expression → Expression (EQ | NE | LT | GT | LE | GE) Expression
Expression → Expression (PLUS | MINUS | AST | DIV | MOD) Expression
Expression → (AMP | AST | PLUS | MINUS | NOT) Expression
Expression → Expression LSQ Expression RSQ
Expression → ID LPAR [Expression {COMMA Expression}] RPAR
Expression → (PRINTF | ATOI) LPAR Expression RPAR
Expression → ITOA LPAR Expression COMMA Expression RPAR
Expression → ID | INTLIT | CHRLIT | STRLIT | LPAR Expression RPAR

Usage

$ make
$ ./qccompiler [OPTIONS] < input.qc

Options:

-t     print the abstract syntax tree and stop after syntatic analysis.
-s     print the symbol table and stop after semantic analisys.
-c     allways compile the program (unless errors occur).
-o     allways compile the program (unless errors occur) and print compiled program to file.

If both flags -s and -t are set, the proccess stops after the semantic analysis. The result of the compilation is written to the file `result.c

Dependencies

  • flex
  • yacc
  • gcc

Related