dinfuehr/dora

JIT-compiler for the programming language Dora implemented in Rust

Dora

Dora works on Linux, Windows and macOS (x86_64 and aarch64).

Compilation & Testing

Install Rust nightly through rustup.rs. Use the specific nightly version listed in the rust-toolchain file. Dora simply uses cargo for building:

# build in debug and release mode
cargo build && cargo build --release

# run all tests in debug and release mode (needs Ruby)
tools/test && tools/test-release # Linux and macOS
tools/test.bat && tools/test-release.bat # Windows

Note that the test runner is implemented in Ruby and therefore a Ruby interpreter needs to be installed on your system (e.g. brew/dnf/apt install ruby).

Working on the standard library

The standard library (stdlib) is bundled into the dora binary at compile time. Changing the stdlib therefore requires recompiling Dora, even though the stdlib is written in Dora. To avoid this recompilation when working on the stdlib, pass your stdlib working directory to Dora using the --stdlib argument. With this argument, Dora loads the stdlib from the specified directory instead of the one bundled in the executable.

Issues

Collection of the latest Issues

soc


They weren't stripped off originally, but that was changed in https://github.com/dinfuehr/dora/commit/78d0f068f1410c8bd55b2085c2f637d411777ae2#diff-2e8fd6fa1728464bf16d731d758169acdbb204b74b6df173d91992d442e60524.

I chose the original approach intentionally, because I felt that stripping off zeroes made it hard to visually compare different numbers in that format.

It now turns out that keeping the zeroes is not only a convenience: stripping them off leads to bugs, and other languages have added special lints for such APIs, see https://rules.sonarsource.com/java/RSPEC-4425.

I'd propose changing back to the previous behavior.
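A minimal Rust sketch of the kind of bug meant here, assuming the format in question is a hexadecimal rendering (the issue does not name the API, and the helper names below are made up for illustration). Without padding, two different byte sequences can produce the same string:

```rust
/// Renders each byte as hex WITHOUT zero padding (leading zeroes stripped).
fn hex_stripped(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:x}", b)).collect()
}

/// Renders each byte as exactly two hex digits (leading zeroes kept).
fn hex_padded(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    let a = [0x01u8, 0x23];
    let b = [0x12u8, 0x03];
    // Stripped output collides for different inputs; padded output does not.
    println!("stripped: {} vs {}", hex_stripped(&a), hex_stripped(&b));
    println!("padded:   {} vs {}", hex_padded(&a), hex_padded(&b));
}
```

Padded output also keeps digits aligned, which is the visual-comparison argument made above.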

soc


Here is the idea; consider an enum (ADT) definition like this:

enum Foo { A(String), B }

Now, instead of A and B being definitions of variants on their own, they refer to existing types.

This means that one needs to provide an actual (class/struct/module) definition of A and B, which has multiple benefits (and no drawbacks from what I see):

  1. Enum variants have types because they have a "real" class/struct/... declaration. (This fixes a mistake that some languages like Rust or Haskell made.)
  2. Variants can be reference or value types (because they have a "real" class/struct/... declaration). So the enum discriminator can either be a number (like before) or the ref type's vtable.
  3. No "stutter", where variant names have to be invented to wrap existing types (Rust has this issue a lot).
  4. Enum values can be passed/created more easily, because there are fewer layers of wrapping.
  5. Variants can be re-used in different enums.
  6. It makes it much easier to define ad-hoc enums when needed, obviating the need for a separate union type/type alias/etc. feature in the language.
  7. Nesting enums is straight-forward.

Example for 1., 2., 3.

So while ...

enum Option[T] {
  Some(value: T),
  None
}

... would receive little benefit from being written as ...

enum Option[T] { Some[T], None }
struct Some[T](value: T)
module None

... even trivial ADTs like a JSON tree would benefit. Instead of ...

enum JsonValue {
  JsonArray(value: Array[JsonValue]),
  JsonNumber(value: Float64),
  JsonString(value: String),
  JsonBool(value: Bool),
  JsonNull,
  ...
}

... one would write (with Array, Float64 and String being existing types in the language):

enum JsonValue {
  Array[JsonValue],
  Float64,
  String,
  JsonNull,
  ...
}
module JsonNull

Another example would be ConstPoolEntry:

Example for 4.

It would also do away with having to wrap data into the enum's "variant" when passing arguments, as it's done with the "traditional" approach:

fun someValue(value: JsonValue) = ...
someValue("test") // not: someValue(JsonString("test"))

Whether it is desirable to support this for arbitrarily nested enums remains to be seen.
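For comparison, here is a sketch of how the same call-site convenience has to be approximated in a language with "traditional" enums. It uses Rust with hand-written From impls; the describe function is a made-up stand-in for someValue, and nothing here is Dora syntax:

```rust
// A "traditional" enum: every payload needs a wrapper variant.
enum JsonValue {
    JsonArray(Vec<JsonValue>),
    JsonNumber(f64),
    JsonString(String),
    JsonBool(bool),
    JsonNull,
}

// Conversions have to be written by hand, one per payload type.
impl From<&str> for JsonValue {
    fn from(s: &str) -> Self {
        JsonValue::JsonString(s.to_string())
    }
}

impl From<f64> for JsonValue {
    fn from(n: f64) -> Self {
        JsonValue::JsonNumber(n)
    }
}

impl From<bool> for JsonValue {
    fn from(b: bool) -> Self {
        JsonValue::JsonBool(b)
    }
}

// Accepting `impl Into<JsonValue>` lets callers skip the explicit wrapping.
fn describe(value: impl Into<JsonValue>) -> String {
    match value.into() {
        JsonValue::JsonArray(items) => format!("array of {}", items.len()),
        JsonValue::JsonNumber(n) => format!("number {}", n),
        JsonValue::JsonString(s) => format!("string {}", s),
        JsonValue::JsonBool(b) => format!("bool {}", b),
        JsonValue::JsonNull => "null".to_string(),
    }
}

fn main() {
    println!("{}", describe("test")); // no JsonString(...) at the call site
    println!("{}", describe(JsonValue::JsonArray(vec![JsonValue::JsonNull, true.into()])));
}
```

Under the proposal, neither the wrapper variants nor the conversion boilerplate would be needed, since String, Float64 and so on would be the variants themselves.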

Example for 5.

Consider a class like

class Name(name: String)

With this approach we can use this Name type multiple times in different enums (and elsewhere):

enum PersonIdentifier {
  Name,
  ... // other identifiers like TaxId, Description, PhoneNumber etc.
}

enum DogTag {
  Name,
  ... // other identifiers like RegId, ...
}
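As a point of reference, this is roughly what sharing one Name type across two enums looks like with "traditional" variants, sketched in Rust. The TaxId and RegId payload types are assumptions picked for illustration. Note the Name(Name) stutter that item 3 refers to:

```rust
#[derive(Debug, Clone, PartialEq)]
struct Name(String);

#[derive(Debug, PartialEq)]
enum PersonIdentifier {
    Name(Name), // variant name has to repeat the payload type
    TaxId(u64), // hypothetical payload type
}

#[derive(Debug, PartialEq)]
enum DogTag {
    Name(Name), // same type, wrapped again by a second variant
    RegId(u32), // hypothetical payload type
}

// Extracting the shared type requires unwrapping each enum separately.
fn person_name(id: &PersonIdentifier) -> Option<&Name> {
    match id {
        PersonIdentifier::Name(name) => Some(name),
        PersonIdentifier::TaxId(_) => None,
    }
}

fn dog_name(tag: &DogTag) -> Option<&Name> {
    match tag {
        DogTag::Name(name) => Some(name),
        DogTag::RegId(_) => None,
    }
}

fn main() {
    let rex = Name("Rex".to_string());
    println!("{:?}", person_name(&PersonIdentifier::Name(rex.clone())));
    println!("{:?}", dog_name(&DogTag::Name(rex)));
    println!("{:?}", person_name(&PersonIdentifier::TaxId(42)));
}
```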


So from my perspective, this approach reduces indirection at use-site and increases the utility of enums compared to more "traditional" enums, while not changing their runtime costs or representation.

soc


Idea

Replace the different syntactic forms of

  • if expressions,
  • pattern matching and pattern guards,
  • if-let constructs

with a single, unified condition expression that scales from simple one-liners to complex pattern matches without requiring a switch to different syntactic constructs/keywords/...

This is done by recognizing the commonalities of if and match and pushing differences further down into the individual parts of the expression:

  • Conditions can be split between a common discriminator and individual cases. An ellipsis (...) is used to indicate this.
  • A pattern matching operation can be signaled within the specific branch in which it is done; it does not require switching the whole syntactic form from if to match. The is keyword is used to indicate this.

I picked the if keyword over match because it is the keyword the largest number of developers are familiar with, though the exact choice of keyword is not as important as ending up with a single one instead of the current two (if and match).

Out of scope

The intention is to cut the different syntax options down to a single one that is still easily recognizable by users. Minimizing keywords (i. e. a == b ? c : d) or turning conditions into methods (like Smalltalk) is not a goal.

Examples

Implementation

  • To avoid evaluating the "condition head" (common part) multiple times, it needs to be hoisted to a let-binding rather early in the process.
  • This has the benefit though that instead of some "magic" placeholder, we can refer to a "normal" let-binding during typechecking of the "condition tail" (branch-specific part).
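
The hoisting step can be illustrated with a small Rust sketch. It is not the Dora compiler's actual lowering; the function names and the call counter are assumptions used to make the number of evaluations observable:

```rust
// A "condition head" with an observable side effect: it counts its evaluations.
fn head(calls: &mut u32) -> i32 {
    *calls += 1;
    42
}

// Naive expansion: the head is copied into every branch condition,
// so it may run once per branch that gets tested.
fn classify_duplicated(calls: &mut u32) -> &'static str {
    if head(calls) < 0 {
        "negative"
    } else if head(calls) == 0 {
        "zero"
    } else {
        "positive"
    }
}

// Hoisted expansion: the head is bound once, and every branch
// refers to the ordinary let-binding.
fn classify_hoisted(calls: &mut u32) -> &'static str {
    let h = head(calls);
    if h < 0 {
        "negative"
    } else if h == 0 {
        "zero"
    } else {
        "positive"
    }
}

fn main() {
    let mut duplicated_calls = 0;
    let mut hoisted_calls = 0;
    classify_duplicated(&mut duplicated_calls);
    classify_hoisted(&mut hoisted_calls);
    println!("duplicated: {} evaluations, hoisted: {}", duplicated_calls, hoisted_calls);
}
```

Because the hoisted form uses a plain binding, the branch-specific parts can be typechecked against it like any other local.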
soc


Brought up in https://github.com/dinfuehr/dora/pull/254#issuecomment-907566068 by @dinfuehr, my plan as promised in https://github.com/dinfuehr/dora/pull/254#issuecomment-907648809.

Plan

Stringable#toString() focuses on general-purpose output:

  • A textual representation that provides useful information about a type for general consumption.
  • It avoids exposing innards or implementation details (capacity for collections, the structure of binary trees, the occupancy ratio of hash maps etc.).
  • The output remains relatively stable across version and implementation changes.
  • It requires implementing the Stringable trait, and putting that constraint on parameters/values when calling the method.
  • It can be implemented manually by users or derived by the compiler via an annotation.
  • It has no access to runtime-internal facilities and has to follow the general rules of the language regarding visibility and accessibility.

runtime.debugString(value) provides debugging-oriented output:

  • A textual representation that provides insights into implementation details and helps investigating bugs in the type that is being printed.
  • It potentially exposes private fields, which would generally be elided from general-purpose output.
  • The output is not stable and makes no guarantees.
  • It is an @internal static method implemented by the runtime.
  • It uses runtime-internal facilities, which may ignore language rules to inspect innards of instances and retrieve information not accessible from user code, e. g. pointer addresses.
  • Supported by these facilities, it makes reasonable efforts to avoid non-termination due to circular references, disambiguation of equal but not identical references etc.
  • Usage does not require any kind of trait.
  • Output of debugString is "" except in debug builds.
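
Rust's Display/Debug split is a loose analogue of this plan and may help picture the two kinds of output. It is only an analogy: Debug is a trait, whereas the proposed runtime.debugString requires none, and the Stack type below is invented for illustration:

```rust
use std::fmt;

// Debug is derived and exposes every field, including implementation details.
#[derive(Debug)]
struct Stack {
    items: Vec<i32>,
    capacity: usize, // an internal detail that general-purpose output should hide
}

// Display is written by hand and shows only what is useful to general consumers.
impl fmt::Display for Stack {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Stack{:?}", self.items)
    }
}

fn main() {
    let stack = Stack { items: vec![1, 2], capacity: 8 };
    println!("{}", stack);   // general-purpose output
    println!("{:?}", stack); // debugging-oriented output
}
```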
soc


(This is not about supporting structs in the runtime, but purely about consolidating some of the syntax that solely exists for structs in favor of existing syntax for classes.)

The intention is to make it straight-forward to migrate between class-ness and struct-ness by making struct definitions a subset of class definitions. The restrictions are:

  • A struct cannot be @open, it is always final.
  • A struct "constructor" defines all struct fields – the body is not allowed to contain additional value or variable definitions.
  • All struct "constructor" parameters must be defined as let, not var.

Now:

Proposed:

soc


Rough draft of the approach I would suggest:

Packages

  • Packages specify the namespace of the classes and modules that reside in a given file.
  • Packages can only be declared at the top of a source file.
  • Packages do not import things implicitly.
  • The package path has to mirror the directory structure.

The idea that package and module could be unified can be investigated independently of this. @dinfuehr has mentioned some concerns in this direction, so I think it makes sense to approach this conservatively.

Imports

  • Imports bring the contents of a namespace into the scope of the file in which the import is declared.
  • Only classes and modules can be imported, no packages or functions/values/...
  • No "wildcard", "star", "glob" imports.

The import system is intentionally limited, based on the poor experiences with both glob imports and static imports.

One of the core ideas is that it should be possible to move a file manually, without needing an IDE to help rewrite the imports.

One limited extension of the import mechanism I could imagine is support for renaming imports like import lib.time.{LocalDate -> Date}. I feel there is only a limited use case for this – importing two classes/modules with the same name from different packages – and the workaround of importing one and specifying the full path to the other isn't too cumbersome.
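To make the trade-off concrete, here is the same situation sketched in Rust, which supports renaming imports via `as`. The legacy module and all field names are hypothetical; only lib.time, LocalDate and Date come from the text above:

```rust
mod lib {
    pub mod time {
        pub struct LocalDate {
            pub year: i32,
        }
    }
}

// Hypothetical second package that exports a type with the same name.
mod legacy {
    pub struct LocalDate {
        pub days_since_epoch: i64,
    }
}

// Renaming import, comparable to: import lib.time.{LocalDate -> Date}
use self::lib::time::LocalDate as Date;

fn year_of(date: &Date) -> i32 {
    date.year
}

// The workaround: refer to the other type by its full path instead.
fn days_of(date: &legacy::LocalDate) -> i64 {
    date.days_since_epoch
}

fn main() {
    println!("{}", year_of(&Date { year: 2021 }));
    println!("{}", days_of(&legacy::LocalDate { days_since_epoch: 18_000 }));
}
```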

soc


Best example is the clone method:

If a class implements it, we ideally want to enforce that each and every subclass overrides that method with its own implementation, overriding the method's return type co-variantly:

In practice, this has multiple issues:

  1. Overriding the implementation in subclasses can simply be forgotten, because the compiler does not complain about missing it.
  2. It's easy to forget to also refine the return type, even if the method is implemented in subclasses.

Using a Self type is sometimes considered to be the obvious solution to deal with problem number 2., but poses new problems:


Multiple ideas:

  • One could turn Self types' problems into a solution: The code is broken, so report it as a compiler error and demand that the user supplies a working one.
  • Have the general rule that methods returning(/using?) Self do not inherit their implementation. I'm not sure it's possible to come up with a rule that is easy to specify and easy to implement and easy to verify and easy to understand, though.
  • Have some general @noImplementationInheritance modifier that allows inheriting the method signature, but not its implementation, of arbitrary methods.
soc


This is probably a not-so-easy task given the number of possible rounding modes and the poor support for rounding on x86 (for instance GCC believes that the best implementation of a standard halfway-up round requires 5 AVX512F instructions): https://gcc.godbolt.org/z/endCrA

(Though the Nehalem version might be ok: https://gcc.godbolt.org/z/G3SFsv)

For completeness, here is a list of signatures for possible rounding methods for Double:

Working on this issue does not require implementing all of them; even a single one is fine!
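
To show how much the rounding modes differ on the same input, here is a small Rust sketch using f64 methods from the standard library. The round_half_up helper is a naive assumption-laden illustration, not a proposed implementation for Dora:

```rust
/// Naive "round half up" (ties go toward positive infinity).
/// Assumption: adding 0.5 is acceptable here; for values just below a tie
/// (such as 0.49999999999999994) the addition itself rounds, so a real
/// implementation needs more care.
fn round_half_up(x: f64) -> f64 {
    (x + 0.5).floor()
}

fn main() {
    for x in [2.5_f64, -2.5, 0.4, -0.6].iter() {
        println!(
            "{:>5}: floor={} ceil={} trunc={} round={} half_up={}",
            x,
            x.floor(),       // toward negative infinity
            x.ceil(),        // toward positive infinity
            x.trunc(),       // toward zero
            x.round(),       // ties away from zero
            round_half_up(*x)
        );
    }
}
```

The tie cases 2.5 and -2.5 are where the modes disagree most visibly.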

soc


How to initialize modules?

Semantics

General
  • Modules are holders for static functions and values (and variables), serving as a replacement for global/free-standing/static members.
  • Modules are instances, they have vtables, can be passed around as values and may extend classes and support traits.
Initialization
  • Initializing a module means initializing its members.
  • Initialized modules and their values/variables are never garbage collected.
  • Modules should be initialized on their first use, such that loading some library does not require initializing and retaining arbitrary stuff that is kept forever, if it's not even used.

Requirements

  • Modules should only be initialized on first access.
  • Modules should only ever be initialized once.
  • Module members (except variables) should be able to be treated as effectively constant from an optimization POV.

Problems

  • Guarding every use of a module with a check whether it needs to be initialized is undesirable, because that check would exist for the rest of the application’s run at every use-site (imagine the module is accessed from within a loop, as in the example below).
  • Guarding every use of a module with a check is undesirable because it acts as an optimization/inlining barrier and would require these techniques to be more sophisticated to optimize through modules.

Idea

Instead of emitting code that checks for initialization at every use-site, let the uninitialized module access the zero page, trap it, trigger the initialization and resume execution.

A page fault is massively slower than a null-check, but the difference is that the page fault happens at most once and does not add code to every use-site, while the null check would need to be done at every access.
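
For contrast, this is what the guarded approach described under "Problems" looks like when written by hand. The Rust sketch uses std::sync::OnceLock; TABLE, INIT_RUNS and the function names are made up for illustration. It shows the check-on-every-access pattern that the idea above tries to avoid, not the trap-based scheme itself:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how often the initializer actually runs.
static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

// Stand-in for a module member that is initialized on first use.
static TABLE: OnceLock<Vec<u64>> = OnceLock::new();

// Every call performs an "is it initialized yet?" check before returning.
fn table() -> &'static Vec<u64> {
    TABLE.get_or_init(|| {
        INIT_RUNS.fetch_add(1, Ordering::SeqCst);
        (0..8).map(|i| i * i).collect()
    })
}

// The access sits inside a loop, so the check is repeated on each iteration
// unless the optimizer manages to hoist it.
fn sum_in_loop(rounds: usize) -> u64 {
    let mut total = 0;
    for _ in 0..rounds {
        total += table()[3];
    }
    total
}

fn main() {
    println!("sum = {}", sum_in_loop(4));
    println!("initializer ran {} time(s)", INIT_RUNS.load(Ordering::SeqCst));
}
```

The initializer runs once, which satisfies the requirements, but the guard stays in the code path for the rest of the program's run.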


Additional thought: Pretty much all modules of the standard library should be immutable, such that instead of shipping the standard library as source code or bytecode, it should be possible to memory-map a fully initialized image of the standard library. The module design should enable this.

soc


Extracted from https://github.com/dinfuehr/dora/issues/39:

I'd like to get rid of those two control-flow keywords. Reasoning:

  • The alternative of using an additional method feels only slightly clunkier to write and is vastly easier to read and understand when coming back after a month.¹
  • In my experience the implementation complexity and the mental complexity for users has never been worth the "convenience".
  • There aren't many reasons for break and continue to exist in general – it feels like this is something that got copied from C and keeps getting copied without anyone questioning it much. I certainly think that if break and continue didn't exist today, we wouldn't invent them.

Prior art:

One language that did away with continue is Scala; and in the last 10 years I never heard a single complaint about it. (break in Scala is done as a library (that throws exceptions) – I'd rather not go this way either.)


¹ I ported Java's java.time implementation from Java to Scala and had to deal with a lot of breaks and continues, and the replacements were never too bad. Replacing breaks is almost trivial – continue is a bit more involved, but barely ever used.

The biggest trouble was always understanding what the break or continue was trying to do in the first place – a problem we won't have without break and continue in the first place.
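
A sketch of the kind of replacement meant here, written in Rust with made-up function names: break becomes an early return from a small helper method, and continue becomes an inverted condition.

```rust
// With break: a flag variable plus an early loop exit.
fn first_negative_with_break(xs: &[i32]) -> Option<i32> {
    let mut found = None;
    for &x in xs {
        if x < 0 {
            found = Some(x);
            break;
        }
    }
    found
}

// Without break: the loop lives in its own method and simply returns.
fn first_negative(xs: &[i32]) -> Option<i32> {
    for &x in xs {
        if x < 0 {
            return Some(x);
        }
    }
    None
}

// With continue: skip odd numbers.
fn sum_of_evens_with_continue(xs: &[i32]) -> i32 {
    let mut sum = 0;
    for &x in xs {
        if x % 2 != 0 {
            continue;
        }
        sum += x;
    }
    sum
}

// Without continue: invert the condition and guard the body instead.
fn sum_of_evens(xs: &[i32]) -> i32 {
    let mut sum = 0;
    for &x in xs {
        if x % 2 == 0 {
            sum += x;
        }
    }
    sum
}

fn main() {
    let xs = [3, -1, 4, -2, 6];
    println!("{:?} {:?}", first_negative_with_break(&xs), first_negative(&xs));
    println!("{} {}", sum_of_evens_with_continue(&xs), sum_of_evens(&xs));
}
```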

soc


I think it would make sense to allow classes (and structs) to directly implement traits, such that

can be written as

In addition to that, I think it would make sense that impls – whose fun-to-be-implemented exists with the exact signature in the class – do not require writing down the function if the classes' implementation matches:

playXE


This feature would make interfacing with other languages much easier. I know there is a loadFunction function for loading extern symbols, but it is not easy to use because you need to convert your objects to the long type. The easiest way to load external symbols is to use dlsym; that's what cranelift does.

This is how the syntax for external functions might look: extern fun printf(String,...) -> Int; And this is how static values could be loaded:

Also, as I understand it, Dora's calling convention for class methods is similar to the C++ calling convention, so maybe this can be used to interface with C++ classes, for example:

soc


I don't really know what your goals for this project are, but what do you believe is required to release a minor version?

I'm fine if you don't care about this at all, my personal approach is "language quality and language popularity – pick one", so I'd be perfectly fine with keeping this language at ~2 users/contributors. :-)

But if you intend to do a release in the future, would you be interested in brainstorming and identifying what we think we need to ship a 0.0.x release? Maybe also considering non-core things like a website, documentation or IDE support (LSP)?

soc


Currently we have two different kinds of functions:

fun function(...) -> Something { ... }

and

fun procedure() { ... }

I propose removing the second syntax ("procedure syntax") in favor of

fun procedure() -> Unit { ... }

for multiple reasons:

  • It makes the language more regular.
  • I think it discourages programming without side-effects if side-effecting functions get better syntax than functions without side-effects.
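
Rust has the same two spellings, which makes for a quick illustration of the regularity argument: omitting the return type is shorthand for returning the unit type. The function names below are hypothetical:

```rust
// "Procedure syntax": the return type is left out.
fn procedure_implicit(log: &mut Vec<String>) {
    log.push("implicit".to_string());
}

// Regular syntax: the unit return type is written explicitly.
fn procedure_explicit(log: &mut Vec<String>) -> () {
    log.push("explicit".to_string());
}

fn main() {
    // Both functions have the same type, so they fit into one array.
    let procedures: [fn(&mut Vec<String>); 2] = [procedure_implicit, procedure_explicit];
    let mut log = Vec::new();
    for procedure in procedures.iter() {
        procedure(&mut log);
    }
    println!("{:?}", log);
}
```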

Information - Updated Nov 26, 2021

Stars: 387
Forks: 19
Issues: 31
