e3m3/calcc-rust

Copyright

Copyright 2024, Giordano Salvador
SPDX-License-Identifier: BSD-3-Clause

Author/Maintainer: Giordano Salvador

Description (calcc language)

Platforms: Ubuntu 22.04, Ubuntu 24.04, Fedora 40, MacOS 13, MacOS 14, Windows 2022

Learning Rust [1] by implementing the calc language using the llvm-sys [2] crate. The implementation is inspired by the C++ [3] implementation presented by Nacke and Kwan in [4] and [5].

Accepted factors in the grammar have been extended for convenience (see src/{lex,parse}.rs and tests/lit-tests/). The output of the compiler is LLVM IR, LLVM bitcode, an object file, or an executable file [6].

Language

The original license for the calc source can be found here [7].

Lexer

ident           ::= letter+ (letter | digit)*
number          ::= digit+ | (`0x` hex_digit+)
digit           ::= [0-9]
hex_digit       ::= [a-fA-F0-9]
letter          ::= letter_lower | letter_upper | `_`
letter_lower    ::= [a-z]
letter_upper    ::= [A-Z]
whitespace      ::= ` ` | `\r` | `\n` | `\t`

any             ::= _
token           ::= { tokenkind, text }
tokenkind       ::=
    | Unknown
    | Comma
    | Comment
    | Colon
    | Eoi
    | Eol
    | Ident
    | Minus
    | Number
    | ParenL
    | ParenR
    | Plus
    | Slash
    | Star
    | With
text            ::=
    | ``
    | `,`
    | `/``/` any*
    | `:`
    | ident
    | `-`
    | number
    | `(`
    | `)`
    | `+`
    | `/`
    | `*`
    | `with`
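
The lexer rules above can be illustrated with a minimal hand-written scanner. This is only a sketch, not the code in src/lex.rs: the names `TokenKind`, `Token`, and `lex` are hypothetical, comments are assumed to run to the end of the line, all whitespace (including `\n`) is skipped so `Eol` is never emitted, and `Eoi` is assumed to carry empty text.

```rust
/// Token kinds mirroring the `tokenkind` rule (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Unknown, Comma, Comment, Colon, Eoi, Eol, Ident, Minus,
    Number, ParenL, ParenR, Plus, Slash, Star, With,
}

/// A token pairs a kind with the text it was scanned from.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token {
    pub kind: TokenKind,
    pub text: String,
}

/// Scan `input` into tokens, ending with an `Eoi` token.
pub fn lex(input: &str) -> Vec<Token> {
    let chars: Vec<char> = input.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let start = i;
        let c = chars[i];
        i += 1;
        let kind = match c {
            // Assumption: whitespace only separates tokens.
            ' ' | '\r' | '\n' | '\t' => continue,
            ',' => TokenKind::Comma,
            ':' => TokenKind::Colon,
            '-' => TokenKind::Minus,
            '(' => TokenKind::ParenL,
            ')' => TokenKind::ParenR,
            '+' => TokenKind::Plus,
            '*' => TokenKind::Star,
            // `//` starts a comment; assumed to end at the newline.
            '/' if chars.get(i) == Some(&'/') => {
                while i < chars.len() && chars[i] != '\n' {
                    i += 1;
                }
                TokenKind::Comment
            }
            '/' => TokenKind::Slash,
            // number ::= `0x` hex_digit+
            '0' if chars.get(i) == Some(&'x')
                && chars.get(i + 1).map_or(false, |h| h.is_ascii_hexdigit()) =>
            {
                i += 1; // skip the 'x'
                while i < chars.len() && chars[i].is_ascii_hexdigit() {
                    i += 1;
                }
                TokenKind::Number
            }
            // number ::= digit+
            d if d.is_ascii_digit() => {
                while i < chars.len() && chars[i].is_ascii_digit() {
                    i += 1;
                }
                TokenKind::Number
            }
            // ident ::= letter+ (letter | digit)*, where `_` is a letter
            l if l.is_ascii_alphabetic() || l == '_' => {
                while i < chars.len()
                    && (chars[i].is_ascii_alphanumeric() || chars[i] == '_')
                {
                    i += 1;
                }
                TokenKind::Ident
            }
            _ => TokenKind::Unknown,
        };
        let text: String = chars[start..i].iter().collect();
        // The keyword `with` is scanned as an identifier and then reclassified.
        let kind = if kind == TokenKind::Ident && text == "with" {
            TokenKind::With
        } else {
            kind
        };
        tokens.push(Token { kind, text });
    }
    tokens.push(Token { kind: TokenKind::Eoi, text: String::new() });
    tokens
}
```

Scanning the keyword as an identifier first keeps the `ident` rule in one place; a name such as `within` stays an `Ident` because the whole text is compared.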

Grammar

calc    ::= ( With Colon Ident (Comma Ident)* Colon )? expr
expr    ::= term ( Plus | Minus ) term
factor  ::= Minus? ( Number | Ident | ParenL expr ParenR )
term    ::= factor ( Slash | Star ) factor

Notes:

  • The grammar rules above use the tokenkind as a shorthand for a token object as described by the lexer rules.

  • In the AST, a factor with a leading Minus token is represented as a subtraction expression where the left term is Number with the constant value 0.
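
The second note can be sketched as follows. The types `Expr` and `Op` and the functions `negate` and `eval` are hypothetical and chosen for illustration; the actual AST in the repository may be shaped differently. The sketch only shows that `-x` can be stored as `0 - x` without a dedicated unary node.

```rust
/// Binary operators of the calc grammar (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Add,
    Sub,
    Mul,
    Div,
}

/// A minimal expression tree with no unary node.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Expr {
    Number(i64),
    Ident(String),
    Binary { op: Op, lhs: Box<Expr>, rhs: Box<Expr> },
}

/// Represent `-operand` as the subtraction `0 - operand`.
pub fn negate(operand: Expr) -> Expr {
    Expr::Binary {
        op: Op::Sub,
        lhs: Box::new(Expr::Number(0)),
        rhs: Box::new(operand),
    }
}

/// Evaluate a tree; returns `None` on an unknown identifier,
/// arithmetic overflow, or division by zero.
pub fn eval(expr: &Expr, lookup: &dyn Fn(&str) -> Option<i64>) -> Option<i64> {
    match expr {
        Expr::Number(n) => Some(*n),
        Expr::Ident(name) => lookup(name.as_str()),
        Expr::Binary { op, lhs, rhs } => {
            let l = eval(lhs, lookup)?;
            let r = eval(rhs, lookup)?;
            match op {
                Op::Add => l.checked_add(r),
                Op::Sub => l.checked_sub(r),
                Op::Mul => l.checked_mul(r),
                Op::Div => l.checked_div(r),
            }
        }
    }
}
```

With this encoding, later passes only need to handle binary expressions, numbers, and identifiers.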

Prerequisites

  • libstdc++

  • rust-2021

  • llvm-18 and llvm-sys (or llvm version matching llvm-sys)

  • clang-18 (for executables and -C|--c-main flags)

  • python3-lit, FileCheck (for testing)

    • By default, tests/lit-tests.rs will search for the lit executable in $PYTHON_VENV_PATH/bin (if it exists) or the system's /usr/bin.
  • [docker|podman] (for testing/containerization)

    • A Fedora [8] image can be built using containers/Containerfile.fedora*.

    • An Ubuntu [9] image can be built using containers/Containerfile.ubuntu*.

    • A Windows [10] image can be built using containers/Dockerfile.windows*.
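
The lit lookup described in the prerequisites above could be sketched like this. The function names `lit_dir` and `lit_dir_from_env` are hypothetical, and "if it exists" is read here as "the variable is set and its bin directory exists"; tests/lit-tests.rs may resolve the path differently.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Pick the directory to search for `lit`: `<venv>/bin` when a virtual
/// environment path is given and that directory exists, else `/usr/bin`.
pub fn lit_dir(venv: Option<&str>) -> PathBuf {
    venv.map(|v| Path::new(v).join("bin"))
        .filter(|bin| bin.is_dir())
        .unwrap_or_else(|| PathBuf::from("/usr/bin"))
}

/// Same lookup, reading the virtual environment from `PYTHON_VENV_PATH`.
pub fn lit_dir_from_env() -> PathBuf {
    lit_dir(env::var("PYTHON_VENV_PATH").ok().as_deref())
}
```

Taking the path as a parameter keeps the lookup testable without touching the process environment.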

Setup

  • Native build and test:

    cargo build
    cargo test -- --nocapture
  • Container build and test with podman [11]:

    podman build -t calcc -f container/Containerfile .
  • Container build and test with docker [12]:

    docker build -t calcc -f container/Dockerfile .
  • Container build and test with docker-buildx [12] for Windows [10]:

    docker buildx build -t calcc -f container/Dockerfile.windows2022 --platform linux/amd64 --load .
  • If make is installed, you can build the image by running:

    make

Usage

From the help message (calcc --help):

usage: calcc [OPTIONS] <INPUT>
INPUT              '-' (i.e., Stdin) or a file path
OPTIONS:
--ast              Print the AST after parsing
-b|--bitcode       Output LLVM bitcode (post-optimization) (.bc if used with -o)
-c                 Output an object file (post-optimization) (.o if used with -o)
--drop             Drop unknown tokens instead of failing
-e|--expr[=]<E>    Process expression E instead of INPUT file
-h|--help          Print this list of command line options
--lex              Exit after running the lexer
--ir               Exit after printing IR (pre-optimization)
-S|--llvmir        Output LLVM IR (post-optimization) (.ll if used with -o)
-k|--no-main       Omit linking with main module (i.e., output kernel only)
                   When this option is selected, an executable cannot be generated
--notarget         Omit target specific configuration in LLVM IR/bitcode
-o[=]<F>           Output to file F instead of Stdout ('-' for Stdout)
                   If no known extension is used (.bc|.exe|.ll|.o) an executable is assumed
                   An executable requires llc and clang to be installed
-O<0|1|2|3>        Set the optimization level (default: O2)
--parse            Exit after running the parser
--sem              Exit after running the semantics check
-C|--c-main        Link with a C-derived main module (src/main.c.template)
                   This option is required for generating object files and executables on MacOS
                   and requires clang to be installed
-v|--verbose       Enable verbose output
--version          Display the package version and license information
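
The extension rule documented for -o (known extensions .bc, .exe, .ll, and .o; anything else is assumed to be an executable) can be sketched as below. `OutputKind` and `output_kind` are hypothetical names, and the sketch does not model the -b, -c, or -S flags or the special '-' (Stdout) value.

```rust
use std::path::Path;

/// Kinds of output the compiler can write (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputKind {
    Bitcode,
    Executable,
    LlvmIr,
    Object,
}

/// Infer the output kind from the file extension of `path`.
/// An unknown or missing extension falls back to an executable.
pub fn output_kind(path: &str) -> OutputKind {
    match Path::new(path).extension().and_then(|ext| ext.to_str()) {
        Some("bc") => OutputKind::Bitcode,
        Some("ll") => OutputKind::LlvmIr,
        Some("o") => OutputKind::Object,
        // `.exe`, any other extension, and no extension at all.
        _ => OutputKind::Executable,
    }
}
```

Under this sketch, `calc.ll` maps to LLVM IR, while `calc`, `calc.exe`, and `a.out` all map to an executable.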

References

  1. https://www.rust-lang.org/

  2. https://crates.io/crates/llvm-sys

  3. https://isocpp.org/

  4. https://www.packtpub.com/product/learn-llvm-17-second-edition/9781837631346

  5. https://github.com/PacktPublishing/Learn-LLVM-17

  6. https://llvm.org/

  7. https://github.com/PacktPublishing/Learn-LLVM-17/blob/main/LICENSE

  8. https://fedoraproject.org/

  9. https://ubuntu.com/

  10. https://www.microsoft.com/en-us/windows

  11. https://podman.io/

  12. https://www.docker.com/