yoav-lavi/melody

Melody is a language that compiles to ECMAScript regular expressions, while aiming to be more readable and maintainable.

Examples

Note: these are for the currently supported syntax and may change

Batman Theme  try in playground

16 of "na";

2 of match {
  <space>;
  "batman";
}

// 🦇🦸‍♂️

Turns into

(?:na){16}(?: batman){2}

Twitter Hashtag  try in playground

"#";
some of <word>;

// #melody

Turns into

#\w+

Introductory Courses  try in playground

some of <alphabetic>;
<space>;
"1";
2 of <digit>;

// classname 1xx

Turns into

[a-zA-Z]+ 1\d{2}

Indented Code (2 spaces)  try in playground

some of match {
  2 of <space>;
}

some of <char>;
";";

// let value = 5;

Turns into

(?: {2})+.+;

Semantic Versions  try in playground

<start>;

option of "v";

capture major {
  some of <digit>;
}

".";

capture minor {
  some of <digit>;
}

".";

capture patch {
  some of <digit>;
}

<end>;

// v1.0.0

Turns into

^v?(?<major>\d+)\.(?<minor>\d+)\.(?<patch>\d+)$
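As a quick sanity check, the compiled patterns from the examples above can be exercised directly in TypeScript. This is a minimal sketch: the sample inputs are made up for illustration (the `// ...` comments in the examples are informal, so `biology 101` stands in for `classname 1xx`), and the variable names are not part of Melody.

```typescript
// Regexes copied verbatim from the "Turns into" outputs above.
const batman = /(?:na){16}(?: batman){2}/;
const hashtag = /#\w+/;
const course = /[a-zA-Z]+ 1\d{2}/;
const indented = /(?: {2})+.+;/;
const semver = /^v?(?<major>\d+)\.(?<minor>\d+)\.(?<patch>\d+)$/;

// Sample inputs are illustrative assumptions, not taken from the Melody docs.
const batmanOk = batman.test("na".repeat(16) + " batman batman");
const hashtagOk = hashtag.test("#melody");
const courseOk = course.test("biology 101");
const indentedOk = indented.test("    let value = 5;");
const notIndented = indented.test("let value = 5;");

// Named captures are exposed on the `groups` property of the match.
const version = semver.exec("v1.0.0")?.groups;

console.log({ batmanOk, hashtagOk, courseOk, indentedOk, notIndented });
console.log(version?.major, version?.minor, version?.patch);
```

The unindented line is rejected because the pattern needs at least one run of two spaces before the rest of the statement.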

Playground

You can try Melody in your browser using the playground

Book

Read the book here

Install

Cargo

cargo install melody_cli

From Source

git clone https://github.com/yoav-lavi/melody.git
cd melody
cargo install --path crates/melody_cli

Binary

  • macOS binaries (aarch64 and x86_64) can be downloaded from the release page

Community

  • Brew (macOS and Linux)

    Installation instructions
    brew install melody
    
  • Arch Linux (maintained by @ilai-deutel)

    Installation instructions
    1. Installation with an AUR helper, for instance using paru:

      paru -Syu melody
      
    2. Install manually with makepkg:

      git clone https://aur.archlinux.org/melody.git
      cd melody
      makepkg -si
      
  • NixOS (maintained by @jyooru)

    Installation instructions
    1. Declarative installation using /etc/nixos/configuration.nix:

      { pkgs, ... }:
      {
        environment.systemPackages = with pkgs; [
          melody
        ];
      }
      
    2. Imperative installation using nix-env:

      nix-env -iA nixos.melody
      

CLI Usage

USAGE:
    melody [OPTIONS] [INPUT_FILE_PATH]

ARGS:
    <INPUT_FILE_PATH>    Read from a file
                         Use '-' and or pipe input to read from stdin

OPTIONS:
    -f, --test-file <TEST_FILE>
            Test the compiled regex against the contents of a file

        --generate-completions <COMPLETIONS>
            Outputs completions for the selected shell
            To use, write the output to the appropriate location for your shell

    -h, --help
            Print help information

    -n, --no-color
            Print output with no color

    -o, --output <OUTPUT_FILE_PATH>
            Write to a file

    -r, --repl
            Start the Melody REPL

    -t, --test <TEST>
            Test the compiled regex against a string

    -V, --version
            Print version information

Changelog

See the changelog here or on the release page

Syntax

Quantifiers

  • ... of - used to express a specific amount of a pattern. equivalent to regex {5} (assuming 5 of ...)
  • ... to ... of - used to express an amount within a range of a pattern. equivalent to regex {5,9} (assuming 5 to 9 of ...)
  • over ... of - used to express more than an amount of a pattern. equivalent to regex {6,} (assuming over 5 of ...)
  • some of - used to express 1 or more of a pattern. equivalent to regex +
  • any of - used to express 0 or more of a pattern. equivalent to regex *
  • option of - used to express 0 or 1 of a pattern. equivalent to regex ?

All quantifiers can be preceded by lazy to match as few characters as possible rather than as many as possible (greedy). Equivalent to regex +?, *?, etc.
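The regex equivalents listed above can be tried out in TypeScript. The sketch below applies each quantifier to the letter "a" and then contrasts a greedy and a lazy match. The helper `quantified` and the sample strings are assumptions made for illustration; they are not part of Melody.

```typescript
// Build a regex that must match the whole input, applying a quantifier to "a".
const quantified = (quantifier: string): RegExp => new RegExp(`^a${quantifier}$`);

const exactly5 = quantified("{5}");   // 5 of ...
const from5to9 = quantified("{5,9}"); // 5 to 9 of ...
const over5 = quantified("{6,}");     // over 5 of ...
const someOf = quantified("+");       // some of ...
const anyOf = quantified("*");        // any of ...
const optionOf = quantified("?");     // option of ...

console.log(exactly5.test("aaaaa")); // true
console.log(over5.test("aaaaa"));    // false: "over 5" needs at least 6
console.log(over5.test("aaaaaa"));   // true
console.log(anyOf.test(""));         // true: zero occurrences are allowed
console.log(someOf.test(""));        // false: at least one is required

// Greedy vs lazy on the same input.
const input = "<b>bold</b>";
const greedy = input.match(/<.+>/)?.[0]; // "<b>bold</b>"
const lazy = input.match(/<.+?>/)?.[0];  // "<b>"
console.log(greedy, lazy);
```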

Symbols

  • <char> - matches any single character. equivalent to regex .
  • <whitespace> - matches any kind of whitespace character. equivalent to regex \s or [ \t\n\v\f\r]
  • <newline> - matches a newline character. equivalent to regex \n
  • <tab> - matches a tab character. equivalent to regex \t
  • <return> - matches a carriage return character. equivalent to regex \r
  • <feed> - matches a form feed character. equivalent to regex \f
  • <null> - matches a null character. equivalent to regex \0
  • <digit> - matches any single digit. equivalent to regex \d or [0-9]
  • <vertical> - matches a vertical tab character. equivalent to regex \v
  • <word> - matches a word character (any Latin letter, any digit or an underscore). equivalent to regex \w or [a-zA-Z0-9_]
  • <alphabetic> - matches any single Latin letter. equivalent to regex [a-zA-Z]
  • <alphanumeric> - matches any single Latin letter or any single digit. equivalent to regex [a-zA-Z0-9]
  • <boundary> - matches the position between a character matched by <word> and a character not matched by <word>, without consuming any characters. equivalent to regex \b
  • <backspace> - matches a backspace control character. equivalent to regex [\b]

All symbols can be preceded by not to match any character other than the symbol
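Two of the symbols above are easy to confuse because both involve `\b` in the compiled regex. The sketch below shows the difference in TypeScript, followed by negated shorthands. That Melody emits exactly `\D` and `\W` for `not <digit>` and `not <word>` is an assumption here; the sample strings are illustrative.

```typescript
// <boundary> corresponds to \b: a zero-width position, not a character.
const wholeWord = /\bcat\b/;
console.log(wholeWord.test("a cat sat"));   // true
console.log(wholeWord.test("concatenate")); // false: "cat" sits inside a word

// <backspace> corresponds to [\b]: the backspace control character (U+0008).
const backspace = /[\b]/;
console.log(backspace.test("\b")); // true: "\b" in a string literal is U+0008
console.log(backspace.test("b"));  // false

// Negation, shown with the standard regex shorthands (assumed output).
const nonDigit = /\D/;
const nonWord = /\W/;
console.log(nonDigit.test("123"), nonDigit.test("12a"));             // false true
console.log(nonWord.test("snake_case"), nonWord.test("kebab-case")); // false true
```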

Special Symbols

  • <start> - matches the start of the string. equivalent to regex ^
  • <end> - matches the end of the string. equivalent to regex $

Unicode Categories

Note: these are not supported when testing in the CLI (-t or -f) as the regex engine used does not support unicode categories. These require using the u flag.

  • <category::letter> - any kind of letter from any language
    • <category::lowercase_letter> - a lowercase letter that has an uppercase variant
    • <category::uppercase_letter> - an uppercase letter that has a lowercase variant.
    • <category::titlecase_letter> - a letter that appears at the start of a word when only the first letter of the word is capitalized
    • <category::cased_letter> - a letter that exists in lowercase and uppercase variants
    • <category::modifier_letter> - a special character that is used like a letter
    • <category::other_letter> - a letter or ideograph that does not have lowercase and uppercase variants
  • <category::mark> - a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.)
    • <category::non_spacing_mark> - a character intended to be combined with another character without taking up extra space (e.g. accents, umlauts, etc.)
    • <category::spacing_combining_mark> - a character intended to be combined with another character that takes up extra space (vowel signs in many Eastern languages)
    • <category::enclosing_mark> - a character that encloses the character it is combined with (circle, square, keycap, etc.)
  • <category::separator> - any kind of whitespace or invisible separator
    • <category::space_separator> - a whitespace character that is invisible, but does take up space
    • <category::line_separator> - line separator character U+2028
    • <category::paragraph_separator> - paragraph separator character U+2029
  • <category::symbol> - math symbols, currency signs, dingbats, box-drawing characters, etc
    • <category::math_symbol> - any mathematical symbol
    • <category::currency_symbol> - any currency sign
    • <category::modifier_symbol> - a combining character (mark) as a full character on its own
    • <category::other_symbol> - various symbols that are not math symbols, currency signs, or combining characters
  • <category::number> - any kind of numeric character in any script
    • <category::decimal_digit_number> - a digit zero through nine in any script except ideographic scripts
    • <category::letter_number> - a number that looks like a letter, such as a Roman numeral
    • <category::other_number> - a superscript or subscript digit, or a number that is not a digit 0–9 (excluding numbers from ideographic scripts)
  • <category::punctuation> - any kind of punctuation character
    • <category::dash_punctuation> - any kind of hyphen or dash
    • <category::open_punctuation> - any kind of opening bracket
    • <category::close_punctuation> - any kind of closing bracket
    • <category::initial_punctuation> - any kind of opening quote
    • <category::final_punctuation> - any kind of closing quote
    • <category::connector_punctuation> - a punctuation character such as an underscore that connects words
    • <category::other_punctuation> - any kind of punctuation character that is not a dash, bracket, quote or connector
  • <category::other> - invisible control characters and unused code points
    • <category::control> - an ASCII or Latin-1 control character: 0x00–0x1F and 0x7F–0x9F
    • <category::format> - invisible formatting indicator
    • <category::private_use> - any code point reserved for private use
    • <category::surrogate> - one half of a surrogate pair in UTF-16 encoding
    • <category::unassigned> - any code point to which no character has been assigned

These descriptions are from regular-expressions.info
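The note about the u flag can be checked in any ECMAScript engine. The sketch below assumes that `<category::letter>` corresponds to the general category escape `\p{L}`; the exact spelling Melody emits is not shown in this document. The patterns are built with the `RegExp` constructor so the same source can be compiled with and without the flag.

```typescript
// Assumption: <category::letter> corresponds to the general category \p{L}.
const source = "^\\p{L}+$";

// With the u flag, \p{L} is a Unicode property escape.
const unicodeAware = new RegExp(source, "u");
console.log(unicodeAware.test("straße")); // true
console.log(unicodeAware.test("日本語"));  // true
console.log(unicodeAware.test("abc1"));   // false: "1" is not a letter

// Without the u flag, engines that implement the web-compatibility grammar
// read \p as a plain "p" followed by the literal text "{L}".
const legacy = new RegExp(source);
console.log(legacy.test("straße")); // false
```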

Character Ranges

  • ... to ... - used with digits or alphabetic characters to express a character range. equivalent to regex [5-9] (assuming 5 to 9) or [a-z] (assuming a to z)

Literals

  • "..." or '...' - used to mark a literal part of the match. Melody will automatically escape characters as needed. Quotes (of the same kind surrounding the literal) should be escaped

Raw

  • `...` - added directly to the output without any escaping
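To illustrate why literals are escaped while raw sequences are not, here is a minimal TypeScript sketch of escaping regex metacharacters. `escapeLiteral` is a hypothetical helper written for this example; it is not Melody's implementation and may escape a different set of characters than Melody does.

```typescript
// Hypothetical helper: backslash-escape regex metacharacters in a literal.
// This illustrates the idea only; it is not Melody's implementation.
function escapeLiteral(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const price = escapeLiteral("$1.50 (approx.)");
console.log(price); // \$1\.50 \(approx\.\)

// Escaped: every character only matches itself.
const exact = new RegExp(`^${price}$`);
console.log(exact.test("$1.50 (approx.)")); // true
console.log(exact.test("$1x50 (approx.)")); // false: "." no longer matches any character

// Unescaped (comparable to a raw sequence): "." keeps its regex meaning.
const rawDot = new RegExp("^1.50$");
console.log(rawDot.test("1x50")); // true
```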

Groups

  • capture - used to open a capture or named capture block. capture patterns are later available in the list of matches (either positional or named). equivalent to regex (...)
  • match - used to open a match block, matches the contents without capturing. equivalent to regex (?:...)
  • either - used to open an either block, matches one of the statements within the block. equivalent to regex (?:...|...)
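The difference between the three group kinds shows up in the match result. The TypeScript sketch below uses the regex equivalents listed above; the sample patterns and inputs are made up for illustration.

```typescript
// capture corresponds to (...): the group's text appears in the match result.
// match corresponds to (?:...): it groups without capturing.
const pair = "12-34".match(/(\d+)-(?:\d+)/);
console.log(pair?.[0]);    // "12-34"
console.log(pair?.[1]);    // "12"
console.log(pair?.length); // 2: the full match plus one capture

// either corresponds to (?:...|...): one of several alternatives.
const pet = /^(?:cat|dog)s$/;
console.log(pet.test("dogs"), pet.test("cats"), pet.test("birds")); // true true false
```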

Assertions

  • ahead - used to open an ahead block. equivalent to regex (?=...). use after an expression
  • behind - used to open a behind block. equivalent to regex (?<=...). use before an expression

Assertions can be preceded by not to create a negative assertion (equivalent to regex (?!...), (?<!...))
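The four assertion forms can be seen in action with their regex equivalents. This TypeScript sketch uses made-up inputs; it shows the behaviour of the compiled constructs and does not show Melody source.

```typescript
// ahead corresponds to (?=...): the digits must be followed by " USD",
// which is checked but not consumed.
const amount = "100 USD".match(/\d+(?= USD)/)?.[0];
console.log(amount); // "100"

// behind corresponds to (?<=...): the digits must be preceded by "#".
const issue = "fixes #83".match(/(?<=#)\d+/)?.[0];
console.log(issue); // "83"

// not ahead corresponds to (?!...).
const notAhead = /^foo(?!bar)/;
console.log(notAhead.test("foobaz"), notAhead.test("foobar")); // true false

// not behind corresponds to (?<!...).
const notBehind = /(?<!#)\b\d+/;
console.log(notBehind.test("83"), notBehind.test("#83")); // true false
```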

Variables

  • let .variable_name = { ... } - defines a variable from a block of statements. can later be used with .variable_name. Variables must be declared before being used. Variable invocations cannot be quantified directly; to quantify a variable invocation, wrap it in a group

    example:

    let .a_and_b = {
      "a";
      "b";
    }
    
    .a_and_b;
    "c";
    
    // abc
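The advice to wrap a variable in a group before quantifying it has a familiar analogue in plain regexes. The TypeScript sketch below is an analogy only, not a description of how Melody expands variables: a reusable fragment is spliced into a pattern as text, and a quantifier added after it applies only to the last character unless the fragment is grouped. The name `aAndB` is chosen to mirror the example above.

```typescript
// A reusable fragment, playing the role of the `.a_and_b` variable above.
const aAndB = "ab";

// Splicing the fragment in and adding "+" only repeats the final "b".
const ungrouped = new RegExp(`^${aAndB}+$`);
console.log(ungrouped.test("abab")); // false
console.log(ungrouped.test("abbb")); // true

// Wrapping it in a non-capturing group repeats the whole fragment.
const grouped = new RegExp(`^(?:${aAndB})+$`);
console.log(grouped.test("abab")); // true
```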
    
    

Extras

  • /* ... */, // ... - used to mark comments (note: // ... comments must be on a separate line)

File Extension

The Melody file extensions are .mdy and .melody

Crates

  • melody_compiler - The Melody compiler 📦 📖
  • melody_cli - A CLI wrapping the Melody compiler 📦 📖
  • melody_wasm - WASM bindings for the Melody compiler

Extensions

  • VSCode
  • IntelliJ

Packages

  • NodeJS
  • Deno

Integrations

  • Babel Plugin

Performance

Last measured on v0.13.10

Measured on an 8 core 2021 MacBook Pro 14-inch, Apple M1 Pro using criterion:

  • 8 lines:

    compiler/normal (8 lines)                        
                            time:   [3.6734 us 3.6775 us 3.6809 us]
    slope  [3.6734 us 3.6809 us] R^2            [0.9999393 0.9999460]
    mean   [3.6726 us 3.6854 us] std. dev.      [3.8234 ns 15.619 ns]
    median [3.6703 us 3.6833 us] med. abs. dev. [1.3873 ns 14.729 ns]
    
  • 1M lines:

    compiler/long input (1M lines)                        
                            time:   [344.68 ms 346.83 ms 349.29 ms]
    mean   [344.68 ms 349.29 ms] std. dev.      [1.4962 ms 4.9835 ms]
    median [344.16 ms 350.06 ms] med. abs. dev. [407.85 us 6.3428 ms]
    
  • Deeply nested:

    compiler/deeply nested  
                            time:   [3.8017 us 3.8150 us 3.8342 us]
    slope  [3.8017 us 3.8342 us] R^2            [0.9992078 0.9989523]
    mean   [3.8158 us 3.8656 us] std. dev.      [8.8095 ns 65.691 ns]
    median [3.8144 us 3.8397 us] med. abs. dev. [2.5630 ns 40.223 ns]
    

To reproduce, run cargo bench or cargo xtask benchmark

Future Feature Status

🐣 - Partially implemented

❌ - Not implemented

❔ - Unclear what the syntax will be

❓ - Unclear whether this will be implemented

| Melody | Regex | Status |
| --- | --- | --- |
| not "A"; | [^A] | 🐣 |
| variables / macros | | 🐣 |
| <...::...> | \p{...} | 🐣 |
| not <...::...> | \P{...} | 🐣 |
| file watcher | | ❌ |
| multiline groups in REPL | | ❌ |
| flags: global, multiline, ... | /.../gm... | ❔ |
| (?) | \# | ❔ |
| (?) | \k<name> | ❔ |
| (?) | \uYYYY | ❔ |
| (?) | \xYY | ❔ |
| (?) | \ddd | ❔ |
| (?) | \cY | ❔ |
| (?) | $1 | ❔ |
| (?) | $` | ❔ |
| (?) | $& | ❔ |
| (?) | x20 | ❔ |
| (?) | x{06fa} | ❔ |
| any of "a", "b", "c" * | [abc] | ❓ |
| multiple ranges * | [a-zA-Z0-9] | ❓ |
| regex optimization | | ❓ |
| standard library / patterns | | ❓ |
| reverse compiler | | ❓ |

* these are expressible in the current syntax using other methods

Issues

Collection of the latest Issues

Aloso (3 comments)

Hello Yoav,

(This is the continuation of this comment.)

I'm the maintainer of Pomsky. I wrote a page that compares Pomsky with several other tools and languages, including Melody. You can find it here.

Since I'm not all that familiar with Melody, I'd appreciate it if you could check that all the information about Melody is correct, or if any part needs clarification. I put a lot of work into accumulating this data, but I want to be sure that I don't misrepresent your project, since I'm obviously biased.

Thank you in advance! And if you have any questions about Pomsky, feel free to ask!

trezy (enhancement, 1 comment)

Copied from this Reddit conversation

I'd love to see support for interpolation of template literals in the compiled RegEx from the Babel plugin. For example, I can almost accomplish this with the Babel compiler using Melody's raw method:

It seems that this could be fixed just by wrapping the string output in backticks instead of quotes. I originally assumed this would have unintended consequences, but since $, {, and } are all special RegEx characters, they're automatically escaped in string literals. This protects us from misinterpreted literals:

I did come up with this while sleepy, so it's entirely possible that I may be missing something. 🤔

kgutwin (enhancement, 1 comment)

Imagine something like this:

If you have the ability to embed tiny unit tests in your regex declaration, then this could substantially help both to catch regressions and to document the intent behind the regex. The unit tests would be run at compile time, raising something akin to a syntax error if they fail (with clear output as to why they failed, to make it easier to fix).

gjvnq (enhancement, 4 comments)

Just some syntax ideas you may find useful.

| Melody | Regex | Status |
| --- | --- | --- |
| maybe a little of or lazily | *? | ❔ |
| rematch 𝐷 | \𝐷 | ❔ |
| rematch 𝑛𝑎𝑚𝑒 | \k<𝑛𝑎𝑚𝑒> | ❔ |
| unicode class ... | \p{...} | ❔ |
| unicode except class ... | \P{...} | ❔ |
| U+𝑋𝑋𝑋𝑋 | \u𝑋𝑋𝑋𝑋 | ❔ |
| X+𝑋𝑋 | \x𝑋𝑋 | ❔ |
| o𝐷𝐷𝐷 (e.g. o700) | \𝐷𝐷𝐷 | ❔ |
| ^𝑌 | \c𝑌 | ❔ |
| word boundary | \b | ❔ |
| word non boundary | \B | ❔ |
| rematch 1 | $1 | ❔ |
| insert { until match } | $` | ❔ |
| insert { full match } | $& | ❔ |
| Could not find this notation anywhere. Typo? | x20 | ❔ |
| Could not find this notation anywhere. Typo? | x{06fa} | ❔ |

Also, allow multiple regexps per file and let them refer to previous ones, e.g.:

Versions

Find the latest versions by id

v0.18.1 - Jun 25, 2022

Fixes

  • Fixes playground link (#83)

Dependencies

  • Updates dependencies

Refactoring

  • Clippy fixes

v0.18.0 - Apr 24, 2022

Features

Misc.

  • Update dependencies

v0.17.0 - Apr 23, 2022

Features

  • Add support for testing matches in a file in the CLI

Refactoring

  • Remove anyhow in compiler in favor of emitting specific error variants

v0.16.0 - Apr 13, 2022

Features

  • Adds support for testing matches in CLI

v0.15.0 - Apr 13, 2022

Features

  • Add shell completions for CLI
  • Add Deno support

v0.14.3 - Apr 11, 2022

Fixes

  • Fixes the REPL not working due to argument validation

v0.14.2 - Apr 11, 2022

Fixes

  • Fixes the CLI output to add a newline

v0.14.0 - Apr 11, 2022

Features

  • Support stdin in CLI
  • Emit proper exit codes on specific errors

v0.13.10 - Apr 05, 2022

Fixes

  • Fixes unnecessary grouping in quantifiers

v0.13.5 - Mar 11, 2022

Tooling

  • Strips binaries (v0.13.4)

Dependencies

  • Updates dependencies (v0.13.4)

Refactoring

  • Reports a few possible panics with a ParseError (v0.13.2)
  • Replaces lazy_static with once_cell (v0.13.3)

Performance

  • Improves literal parse performance (v0.13.2)

v0.13.1 - Mar 08, 2022

Fixes

  • Fixes an issue with single letter variable identifiers matching a following space
  • Fixes a clash between REPL commands and variables

v0.13.0 - Mar 08, 2022

Breaking

  • <alphabet> is now <alphabetic>

Features

  • Support for lazy quantifiers
  • All symbols now have negative counterparts
  • <alphanumeric> symbol added
  • Adds an experimental implementation of variables

v0.12.4 - Mar 06, 2022

Fixes

  • Fixes an issue with identifying negative char ranges

Performance

  • Performance improvements (v0.12.2)

v0.12.0 - Mar 04, 2022

Breaking

  • Produces clean output (no // and new newline after output)

Features

  • Adds favicons for documentation and playground
  • The Melody playground now supports add to homescreen
  • Adds #![forbid(unsafe_code)]

Benchmarks

  • Adds benchmarks

v0.11.0 - Mar 02, 2022

Breaking

  • ParseError now contains only one message field, may be changed in the future
  • Line comments (//) may only be used in a separate line
  • The REPL currently accepts blocks on a single line but not multiple lines
  • Semicolons are no longer optional

Features

  • Uses a Pest grammar and an AST to parse Melody
  • Adds support for nested groups
  • Adds support for negative ranges
  • Adds initial support for negative character classes
  • Adds support for <backspace>, <boundary>
  • Adds support for inline comments
  • Enforces group closing
  • Supports NO_COLOR in CLI
  • -n removes color from REPL as well

v0.10.0 - Feb 26, 2022

Breaking

  • Changes the -f, --file CLI argument to -o, --output

Features

  • Adds descriptions to CLI commands

v0.9.0 - Feb 26, 2022

Features

  • Adds ahead, not ahead, behind and not behind assertions

v0.8.0 - Feb 26, 2022

Features

  • Changes <space> to <whitespace> (thanks @amirali #34)
  • Adds <space> and <alphabet> (thanks @amirali #34)
  • Adds long versions for REPL commands
  • Adds .s, .source to print the current source in the REPL
  • Adds .c, .clear to clear REPL history
  • Adds better error reporting to the playground

Fixes

  • Fixes some undo / redo issues in the REPL

Refactoring

  • Better error handling in the CLI

v0.7.0 - Feb 24, 2022

Features

  • Adds a REPL for melody_cli
  • Adds better error messages for the playground

v0.6.0 - Feb 23, 2022

Features

  • Adds support for raw sequences (`...`)
  • Allows any word character in capture names
  • Adds auto escaping for literals
  • Adds the Melody version number to the documentation

Syntax Changes

  • Changes start, end, and char to symbols (<start>, <end>, <char>)
  • either creates a non capturing group

Refactoring

  • cargo clippy fixes in melody_wasm

Fixes

  • Uses the correct url in the documentation site config

v0.5.0 - Feb 22, 2022

Features

  • Adds any of

v0.4.2 - Feb 22, 2022

Fixes

  • Adds quantifier support for either

v0.4.1 - Feb 22, 2022

Fixes

  • Disallows nested groups

v0.4.0 - Feb 22, 2022

Features

  • Adds support for either blocks

v0.3.0 - Feb 22, 2022

Features

  • Adds support for not for <word>, <space> and <digit> #18 (thanks @Omikorin!)

v0.2.0 - Feb 22, 2022

Features

  • Adds the option keyword

v0.1.1 - Feb 19, 2022

Documentation

  • Adds documentation for the melody_cli crate

melody_cli-v0.1.1 - Feb 19, 2022

Features

  • Melody CLI first cargo release

Information - Updated Sep 18, 2022

Stars: 3.9K
Forks: 55
Issues: 8
