A Bytecode Alliance project

wasm-tools

Rust tooling for low-level manipulation of WebAssembly modules

Tools included

This project is intended to house a number of tools related to the low-level workings of WebAssembly. The top-level crate here ties everything together but isn't currently intended for general use. Instead you probably want to take a look at the sub-crates:

  • wasmparser - a library to parse WebAssembly binaries
  • wat - a library to parse the WebAssembly text format
  • wast - like wat, except provides an AST
  • wasmprinter - prints WebAssembly binaries in their string form
  • wasm-smith - a WebAssembly test case generator
  • wasm-encoder - a crate to generate a binary WebAssembly module

License

This project is licensed under the Apache 2.0 license with the LLVM exception. See LICENSE for more details.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you, as defined in the Apache-2.0 license, shall be licensed as above, without any additional terms or conditions.

Issues

Collection of the latest Issues

shidel-dev

I am looking into generating relocation sections as described here. I believe that to do this with wasm_encoder I would need a method on CodeSection and Function that returns the current offset, so that when I add an instruction that references, for example, a memory index, I can create a corresponding entry in the reloc.CODE custom section.

I propose adding a method to CodeSection and Function (and maybe to other sections as well?) that returns the current byte length.

Maybe fn byte_len(&self) -> usize?
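Below is a minimal std-only sketch of how such an accessor could be used. It assumes a hypothetical FunctionBuilder type, not wasm_encoder's actual Function. The caller records the offset of a relocatable immediate just before emitting it. The immediate is written as a 5-byte padded LEB128 so that a linker can patch it in place without changing the body length. A call to a function index serves as the example. The recorded offsets are relative to the function body. A real reloc.CODE entry is relative to the code section's contents, so the section-level byte_len proposed above would be needed as well.

```rust
/// A relocation record: the byte offset of an immediate a linker may patch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Reloc {
    pub offset: usize,
    pub index: u32,
}

/// Hypothetical function-body builder (not wasm_encoder's Function).
#[derive(Default)]
pub struct FunctionBuilder {
    bytes: Vec<u8>,
    relocs: Vec<Reloc>,
}

/// Encode `value` as unsigned LEB128 padded to exactly 5 bytes, so the
/// encoded size does not depend on the value a linker later writes.
fn encode_u32_padded(value: u32, out: &mut Vec<u8>) {
    let mut v = value;
    for i in 0..5 {
        let mut byte = (v & 0x7f) as u8;
        v >>= 7;
        if i < 4 {
            byte |= 0x80; // continuation bit on all but the last byte
        }
        out.push(byte);
    }
}

impl FunctionBuilder {
    /// Number of bytes emitted so far (the proposed byte_len()).
    pub fn byte_len(&self) -> usize {
        self.bytes.len()
    }

    /// Emit `call func_index` (opcode 0x10) and record a relocation for it.
    pub fn call_relocatable(&mut self, func_index: u32) {
        self.bytes.push(0x10);
        let offset = self.byte_len(); // the immediate starts here
        self.relocs.push(Reloc { offset, index: func_index });
        encode_u32_padded(func_index, &mut self.bytes);
    }

    /// Emit `end` (opcode 0x0b).
    pub fn end(&mut self) {
        self.bytes.push(0x0b);
    }

    pub fn bytes(&self) -> &[u8] {
        &self.bytes
    }

    pub fn relocs(&self) -> &[Reloc] {
        &self.relocs
    }
}
```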

alexcrichton

wast

Modules which look like:

get automatically expanded to:

During this expansion, though, the inserted (type ...) uses Span::from_offset(0) for much of its span information. The reason is that the TypeUse<FunctionType> here doesn't have a span within FunctionType, and in the example above there is nothing to assign the span to (ideally it would be something like the span of the func keyword). The same is true in the component model for:

getting expanded to

Here, though, we don't keep track of the span of the list keyword, and otherwise there isn't good span information to insert.

I don't really know how important this is to fix. I don't think it's actually possible to get any errors that use these spans, so it could be entirely benign to leave them all as 0. Still, I feel it would probably be best to find a good way to thread a proper span through here.

alexcrichton

wasmparser

Currently a value is conditionally defined in a module depending on whether the start function returns a value. It is a little surprising to me that unit is specifically ignored (this is less a question for this crate than for the spec), so we may want to confirm upstream that unit, and presumably only unit, is supposed to be ignored.

Additionally, the check here isn't entirely correct: if a TypeId that points to unit is used (as is the case today due to https://github.com/bytecodealliance/wasm-tools/issues/600), then a value is defined, which means that whether a value is defined depends on how the function type itself is written.

nagisa

Today we have a couple of WebAssembly specifications: the wasm-core-1 W3C Recommendation, which is a snapshot from 2019, and a living draft for 1.1, which includes multiple new proposals, including, but not limited to, multi-value, simd, saturating-float-to-int and sign-extension-ops.

Some users of wasmparser, wasm-smith, etc. (e.g. me) want to restrict themselves to the more conservative wasm-core-1 snapshot, as can be seen in PRs such as https://github.com/bytecodealliance/wasm-tools/pull/482 or my own https://github.com/bytecodealliance/wasm-tools/pull/525. In my case this is partly because accidentally letting some functionality through forces me to maintain it in perpetuity, and partly because the VM I maintain currently targets wasm-core-1.

I know that wasm-tools is largely maintained for the purposes of wasmtime, which may or may not care at all about wasm-core-1. Adopting a more principled strategy here would naturally require more effort from people adding implementations of new proposals. Still, I think it would be nice to make wasm-core-1 a first-class target for the tooling provided by this repository, and I'm willing to dedicate some of my time to setting this up in a way that keeps the overhead for future contributors as low as possible.

First, I could investigate implementing a test suite (or something similar) that ensures a validator configured for wasm-core-1 does not accept instructions or constructs that postdate wasm-core-1, and, where appropriate, that wasm-smith and other tools do not produce these constructs either.

I would also like to discuss adding APIs that focus on wasm-core-1. For example, a WasmFeatures::wasm_core_1 seems like it could be a fairly uncontroversial addition.

Would such an effort/direction/strategy make sense to you?
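Here is a sketch of what a wasm-core-1 preset could look like. It uses a hypothetical Features struct rather than wasmparser's real WasmFeatures, whose fields may differ. Only the proposals named above are modeled, and check covers just a few opcode ranges:

- the sign-extension opcodes 0xC0 to 0xC4
- the saturating conversions under the 0xFC prefix (sub-opcodes 0 to 7)
- the 0xFD SIMD prefix

Multi-value would be checked where block and function types are validated, not per opcode.

```rust
/// Hypothetical feature flags; not wasmparser's WasmFeatures.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Features {
    pub multi_value: bool,
    pub simd: bool,
    pub saturating_float_to_int: bool,
    pub sign_extension: bool,
}

impl Features {
    /// Everything that postdates the 2019 wasm-core-1 Recommendation is off.
    pub const fn wasm_core_1() -> Self {
        Features {
            multi_value: false,
            simd: false,
            saturating_float_to_int: false,
            sign_extension: false,
        }
    }

    /// Every proposal modeled in this sketch is on.
    pub const fn all() -> Self {
        Features {
            multi_value: true,
            simd: true,
            saturating_float_to_int: true,
            sign_extension: true,
        }
    }

    /// Reject an opcode (plus its prefixed sub-opcode, if any) that needs a
    /// disabled proposal. Only a few opcode ranges are covered here.
    pub fn check(&self, opcode: u8, sub_opcode: Option<u32>) -> Result<(), &'static str> {
        match (opcode, sub_opcode) {
            (0xC0..=0xC4, _) if !self.sign_extension => {
                Err("sign-extension operators are not enabled")
            }
            (0xFC, Some(0..=7)) if !self.saturating_float_to_int => {
                Err("saturating float-to-int conversions are not enabled")
            }
            (0xFD, _) if !self.simd => Err("SIMD is not enabled"),
            _ => Ok(()),
        }
    }
}
```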

Plecra

wasmparser

Because custom sections have unbounded size, determined only by the actual implementation, it would be great to have a CustomSectionStart payload that gives the caller the opportunity to skip a custom section when it is large.
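Here is a sketch of the proposed payload, using a hypothetical Payload enum (not wasmparser's). The parser decodes only the section id, the section size and, for custom sections, the name. It then reports where the contents lie, so the caller can decide whether to read them or move straight on to the returned position.

```rust
/// Hypothetical payload named after the proposal; not wasmparser's Payload.
#[derive(Debug, PartialEq, Eq)]
pub enum Payload<'a> {
    CustomSectionStart { name: &'a str, data_offset: usize, data_len: usize },
    OtherSection { id: u8, data_offset: usize, data_len: usize },
}

/// Decode an unsigned LEB128 u32 at `pos`, advancing it.
fn read_u32_leb(bytes: &[u8], pos: &mut usize) -> Result<u32, String> {
    let mut result = 0u32;
    let mut shift = 0u32;
    loop {
        let byte = *bytes.get(*pos).ok_or("unexpected end of input")?;
        *pos += 1;
        if shift == 28 && byte & 0xF0 != 0 {
            return Err("LEB128 value does not fit in u32".to_string());
        }
        result |= u32::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Ok(result);
        }
        shift += 7;
    }
}

/// Read a section's id, size and (for custom sections) name, then leave
/// `pos` at the end of the section. The contents are never touched here;
/// the caller reads data_offset..data_offset + data_len only if it wants to.
pub fn read_section_start<'a>(bytes: &'a [u8], pos: &mut usize) -> Result<Payload<'a>, String> {
    let id = *bytes.get(*pos).ok_or("missing section id")?;
    *pos += 1;
    let size = read_u32_leb(bytes, pos)? as usize;
    let end = (*pos)
        .checked_add(size)
        .filter(|&e| e <= bytes.len())
        .ok_or("section extends past end of input")?;
    if id != 0 {
        let data_offset = *pos;
        *pos = end;
        return Ok(Payload::OtherSection { id, data_offset, data_len: size });
    }
    let name_len = read_u32_leb(bytes, pos)? as usize;
    let name_end = (*pos)
        .checked_add(name_len)
        .filter(|&e| e <= end)
        .ok_or("custom section name extends past the section")?;
    let name = std::str::from_utf8(&bytes[*pos..name_end]).map_err(|e| e.to_string())?;
    *pos = end;
    Ok(Payload::CustomSectionStart { name, data_offset: name_end, data_len: end - name_end })
}
```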

fitzgen

Here is a list of what is required to migrate Bytecode Alliance projects to the new module linking spec that is based on the component model, rather than the old module linking spec that was based on core Wasm. Putting this list in the wasm-tools repo because most of the crates are here, and the work needs to start here. Let me know if I'm missing something and/or just edit this issue to add a checkbox for the missing item!

  • wasmparser (@peterhuene) PR: #484

  • wat and wast (@sunfishcode) PR: #621

  • wasm-encoder (@peterhuene) PR: #448

  • wasmprinter (@peterhuene) PR: #496

    • (depends on wasmparser)
  • wasm-smith (@fitzgen)

    • (depends on wasm-encoder)
  • wasm-mutate

    • (depends on wasm-encoder)
  • wasm-tools local tests

    • (depends on wat, wasmparser)
  • wasm-tools fuzz targets

    • (depends on wat, wasmparser, wasmprinter, and wasm-smith)
  • wasm-tools dump PR: https://github.com/bytecodealliance/wasm-tools/pull/549

    • (depends on wat, wasmparser)
  • wasm-tools objdump PR: https://github.com/bytecodealliance/wasm-tools/pull/555

    • (depends on wat, wasmparser)
  • Wasmtime

    • (depends on wat, wasm-encoder, and wasmparser)
  • wit-bindgen

    • (depends on wat, wasm-encoder, wasmparser, and Wasmtime)
  • wasm-link

    • (depends on wat, wasm-encoder, wasmparser, and Wasmtime)
  • Wizer (https://github.com/bytecodealliance/wizer/issues/48)

    • (depends on wat, wasm-encoder, wasmparser, and Wasmtime)
  • Interface Types + Canonical ABI test case generator

cc @peterhuene @alexcrichton @tschneidereit @lukewagner

fitzgen

good first issue

We should have an add, edit, reorder, and remove mutator for each Wasm section (where it makes sense). This is a tracking issue for each of these things.

In general, removing an entity should be attempted optimistically, but it can fail if the entity is referenced in other sections (e.g. if we are trying to remove a global, but a function uses it via global.get or global.set).
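Here is a sketch of optimistic removal on a hypothetical miniature module (not wasm-mutate's data structures). The mutation first scans every function body for a global.get or global.set of the target. If it finds one, it bails out without modifying anything. Otherwise it removes the global and shifts the indices of later globals down by one. A real mutator would also have to check exports, segment offsets, and other globals' initializers.

```rust
/// Hypothetical miniature instruction set, just enough for global references.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Instr {
    GlobalGet(u32),
    GlobalSet(u32),
    I32Const(i32),
    Drop,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RemoveError {
    NoSuchGlobal(u32),
    GlobalInUse { global: u32, function: usize },
}

/// Hypothetical module: global initial values and function bodies.
pub struct Module {
    pub globals: Vec<i32>,
    pub functions: Vec<Vec<Instr>>,
}

impl Module {
    /// Optimistically remove global `idx`; fail, leaving the module
    /// untouched, if any function still references it.
    pub fn try_remove_global(&mut self, idx: u32) -> Result<(), RemoveError> {
        if idx as usize >= self.globals.len() {
            return Err(RemoveError::NoSuchGlobal(idx));
        }
        for (function, body) in self.functions.iter().enumerate() {
            let uses_it = body
                .iter()
                .any(|i| matches!(i, Instr::GlobalGet(g) | Instr::GlobalSet(g) if *g == idx));
            if uses_it {
                return Err(RemoveError::GlobalInUse { global: idx, function });
            }
        }
        self.globals.remove(idx as usize);
        // Later globals shift down by one, so their references must too.
        for body in &mut self.functions {
            for instr in body.iter_mut() {
                match instr {
                    Instr::GlobalGet(g) | Instr::GlobalSet(g) if *g > idx => *g -= 1,
                    _ => {}
                }
            }
        }
        Ok(())
    }
}
```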

Custom Sections

  • Add new custom section
  • Edit existing custom section
    • add bytes
    • remove bytes
    • edit bytes
  • Remove custom section
  • Move existing custom section

Type Section

  • Add new type
  • Edit type (if it isn't used)
  • Remove type
  • Reorder types

Import Section

  • Add new import
  • Edit import name/module
  • Edit import kind (if it isn't used)
  • Remove import
  • Reorder imports

Function Section

  • Add new function
  • Add a parameter (and update the code section as necessary)
  • Remove a parameter (and update the code section as necessary)
  • Remove function
  • Reorder functions

Table Section

  • Add a new table (if there aren't any tables or reference types, which allows multiple tables, is enabled)
  • Edit an existing table
  • Remove a table
  • Reorder tables

Memory Section

  • Add a new memory (if there aren't any memories or multi-memory is enabled)
  • Edit an existing memory
  • Remove a memory
  • Reorder memories

Global Section

  • Add a new global
  • Edit an existing global
  • Remove a global
  • Reorder globals

Export Section

  • Add a new export
  • Rename an existing export
  • Edit an existing export to export a different entity
  • Remove an export

Start Section

  • Add a start section (if there isn't one, and there is a function of the right type)
  • Change which function is the start function in an existing start section (if there is another of the right type)
  • Remove the start section

Element Section

  • Add a new element segment
  • Edit an existing element segment
    • Reorder functions in the segment
    • Add a new function to the segment
    • Remove a function from the segment (needs to be careful of ref.funcs referencing this function)
  • Remove an element segment
  • Reorder element segments

Code Section

  • Remove unused locals
  • Outline an expression into a new function
  • Inline a function call
  • Swap two expressions that don't have any ordering between each other

Peephole Mutations

  • Allow writing rules with wildcard immediates (e.g. (elem.drop.*) => (container) instead of (elem.drop.0) => (container) and (elem.drop.1) => (container), etc...)
  • Replace an f64 or f32 value with NaN (when not preserving semantics)
  • Replace a value with a use of an existing local of the same type (when not preserving semantics)
  • Replace a value with a use of an existing global of the same type (when not preserving semantics)
  • Swap operators of the same arity, e.g. turn (i32.add ?x ?y) into (i32.mul ?x ?y), etc... (when not preserving semantics)
  • Completely remove expressions that don't produce any result values, e.g. stores and void calls (when not preserving semantics)
  • Insert expressions that don't produce any result in arbitrary places, e.g. stores and void calls (when not preserving semantics)
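Here is a sketch of wildcard immediates for peephole rules. It assumes a hypothetical textual rule syntax like the elem.drop.* example above; this is not wasm-mutate's actual rule language. A trailing .* matches any immediate, and a trailing number matches exactly that immediate. Anything else is treated as an operator without an immediate.

```rust
/// Hypothetical instruction with at most one immediate, e.g. `elem.drop 3`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Instr<'a> {
    pub op: &'a str,
    pub imm: Option<u32>,
}

/// What a rule's left-hand side says about the immediate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ImmPattern {
    Any,         // written as `op.*`
    Exact(u32),  // written as `op.N`
    NoImmediate, // plain `op`
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rule<'a> {
    pub op: &'a str,
    pub imm: ImmPattern,
}

impl<'a> Rule<'a> {
    /// Parse `elem.drop.*`, `elem.drop.1` or `i32.add`.
    pub fn parse(s: &'a str) -> Rule<'a> {
        match s.rsplit_once('.') {
            Some((op, "*")) => Rule { op, imm: ImmPattern::Any },
            Some((op, n)) => match n.parse::<u32>() {
                Ok(n) => Rule { op, imm: ImmPattern::Exact(n) },
                // e.g. `i32.add`: the last segment is part of the operator name
                Err(_) => Rule { op: s, imm: ImmPattern::NoImmediate },
            },
            None => Rule { op: s, imm: ImmPattern::NoImmediate },
        }
    }

    /// Does this rule's left-hand side match `instr`?
    pub fn matches(&self, instr: &Instr) -> bool {
        self.op == instr.op
            && match (self.imm, instr.imm) {
                (ImmPattern::Any, Some(_)) => true,
                (ImmPattern::Exact(n), Some(m)) => n == m,
                (ImmPattern::NoImmediate, None) => true,
                _ => false,
            }
    }
}
```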

Code Motion Mutations

  • Replace the body of an if/else/loop/block with unreachable (if not preserving semantics)
  • Replace if .. else .. end with just the consequent or just the alternative as a block (if not preserving semantics)
  • Replace loop with block and vice versa (if not preserving semantics)
  • Delete a whole if/else, block, or loop, dropping parameters and producing dummy value results as necessary (if not preserving semantics)
  • Inline a block .. end as just the .. part (will require checking that there are no br instructions targeting the block, as well as renumbering nested brs that jump to some label outside this block)
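Here is a sketch of the renumbering needed to inline a block .. end, on a hypothetical instruction tree that ignores block types and branch values. Consider a br found n labels deep inside the block being removed. Its depth determines what happens:

- A depth below n targets a label nested within the block and is left alone.
- A depth equal to n targets the block itself, so inlining is rejected.
- A depth above n targets something outside the block and is decremented by one.

```rust
/// Hypothetical instruction tree: block/loop bodies are nested vectors.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Instr {
    Nop,
    Br(u32),
    Block(Vec<Instr>),
    Loop(Vec<Instr>),
}

/// Adjust branch depths inside the body of the block being inlined.
/// `nesting` counts the labels between an instruction and that block.
fn renumber(body: &mut [Instr], nesting: u32) -> Result<(), String> {
    for instr in body.iter_mut() {
        match instr {
            // Targets a label nested inside the inlined block: unchanged.
            Instr::Br(depth) if *depth < nesting => {}
            // Targets the inlined block itself: cannot inline.
            Instr::Br(depth) if *depth == nesting => {
                return Err(format!("br {} targets the block being inlined", depth));
            }
            // Targets a label outside: there is now one fewer label in between.
            Instr::Br(depth) => *depth -= 1,
            Instr::Block(inner) | Instr::Loop(inner) => renumber(inner, nesting + 1)?,
            Instr::Nop => {}
        }
    }
    Ok(())
}

/// Replace `parent[index]`, which must be a block, with its body.
/// On error `parent` is left unchanged, because renumbering works on a copy.
pub fn inline_block(parent: &mut Vec<Instr>, index: usize) -> Result<(), String> {
    let mut body = match parent.get(index) {
        Some(Instr::Block(body)) => body.clone(),
        _ => return Err(format!("instruction {} is not a block", index)),
    };
    renumber(&mut body, 0)?;
    parent.splice(index..=index, body);
    Ok(())
}
```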

Data Section

  • Add a new data segment
  • Edit an existing data segment
    • Reorder bytes within the segment
    • Add bytes to the segment
    • Remove bytes from the segment
  • Remove a data segment
  • Reorder data segments

Data Count Section

Not really applicable. More of something we just have to keep up to date based on mutations of the data section.

evmar

wasmparser

I looked at wasmparser and naively expected the API to work with a std::io::BufReader as input, rather than requiring the full file contents up front.

Is there something fundamental to the WASM format that precludes this? Was it an intentional design decision here to not use such a thing? Glancing at binary_reader, it seems it works with a cursor-like concept anyway.
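The binary format is length-prefixed: an 8-byte header, followed by a sequence of sections, each of which is a one-byte id and a LEB128 size. So nothing in the section framing seems to require the whole file up front. The sketch below reads from any std::io::Read and buffers one section at a time. for_each_section is a hypothetical helper, not a wasmparser API.

```rust
use std::io::{self, Read};

fn invalid(msg: &'static str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Read one byte, returning Ok(None) at a clean end of input.
fn read_byte<R: Read>(r: &mut R) -> io::Result<Option<u8>> {
    let mut b = [0u8; 1];
    loop {
        match r.read(&mut b) {
            Ok(0) => return Ok(None),
            Ok(_) => return Ok(Some(b[0])),
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}

/// Decode an unsigned LEB128 u32 from the stream.
fn read_u32_leb<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut result = 0u32;
    let mut shift = 0u32;
    loop {
        let byte = read_byte(r)?.ok_or_else(|| invalid("truncated LEB128"))?;
        if shift == 28 && byte & 0xF0 != 0 {
            return Err(invalid("LEB128 value does not fit in u32"));
        }
        result |= u32::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Ok(result);
        }
        shift += 7;
    }
}

/// Hypothetical helper: call `on_section(id, contents)` for each section,
/// keeping only the current section in memory.
pub fn for_each_section<R: Read>(mut r: R, mut on_section: impl FnMut(u8, &[u8])) -> io::Result<()> {
    let mut header = [0u8; 8];
    r.read_exact(&mut header)?;
    if header != *b"\0asm\x01\0\0\0" {
        return Err(invalid("not a version 1 wasm binary"));
    }
    let mut buf = Vec::new();
    while let Some(id) = read_byte(&mut r)? {
        let size = u64::from(read_u32_leb(&mut r)?);
        buf.clear();
        // take() caps the read, so a bogus size cannot force a huge allocation.
        (&mut r).take(size).read_to_end(&mut buf)?;
        if buf.len() as u64 != size {
            return Err(invalid("section extends past end of input"));
        }
        on_section(id, &buf);
    }
    Ok(())
}
```

Because take() limits the read, a corrupt size field cannot make this allocate more than the bytes actually available.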

LHolten

wast

I'm trying to parse a WAT expression and emit the instructions into a wasm-encoder Function. This is for a compiler I'm writing for a language that supports inline WAT code.

The problem is that wast::Instruction is not compatible with wasm_encoder::Instruction. Also, the wast::binary::Encode trait is private, so I can't use it in combination with wasm_encoder::Function::raw either.

My suggestion is to either make the wast Encode trait public, or implement From<wast::Instruction> for wasm_encoder::Instruction.

fitzgen

wasm-smith

I think we could make wasm-smith generate modules that never trap with a post-processing pass, similar to what we do with ensure_termination.

We'd walk over each instruction and potentially insert some code right before it:

  • We would insert a couple of instructions to ensure that a division instruction's denominator is never zero
  • We would insert a couple of instructions to mask heap addresses to ensure they are within the memory's minimum size
  • Similar for table.get and table.set
  • Similar for trapping floating point conversion instructions
  • Every unreachable would be replaced with code to create dummy result values (i.e. zeroes) and then br out of the current control frame

We would also have to make sure that active data/elem segments were always in bounds of their memories/tables.

I think that's everything? I might be missing some trapping cases, but I think the approach would work for everything.

cc @alexcrichton
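Here is a sketch of the first bullet on a hypothetical miniature instruction set (not wasm-smith's internals). Before each i32.div_u or i32.rem_u, the pass rewrites the divisor d on top of the stack into d | (d == 0). This leaves any nonzero divisor unchanged and turns 0 into 1. It assumes a scratch i32 local reserved for the pass. Signed division would additionally need to avoid INT_MIN / -1, which also traps. The tiny evaluator exists only to check the rewrite.

```rust
/// Hypothetical miniature instruction set (not wasm-smith's internals).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Instr {
    I32Const(i32),
    LocalGet(u32),
    LocalTee(u32),
    I32Eqz,
    I32Or,
    I32DivU,
    I32RemU,
}

/// Before each i32.div_u / i32.rem_u, turn the divisor d into d | (d == 0).
/// `scratch` is assumed to be an i32 local reserved for this pass.
pub fn guard_unsigned_division(body: &[Instr], scratch: u32) -> Vec<Instr> {
    let mut out = Vec::with_capacity(body.len());
    for &instr in body {
        if matches!(instr, Instr::I32DivU | Instr::I32RemU) {
            out.extend_from_slice(&[
                Instr::LocalTee(scratch), // stack: .. x d   (scratch = d)
                Instr::LocalGet(scratch), // stack: .. x d d
                Instr::I32Eqz,            // stack: .. x d (d == 0)
                Instr::I32Or,             // stack: .. x (d | (d == 0))
            ]);
        }
        out.push(instr);
    }
    out
}

/// Tiny evaluator used only to check the rewrite; traps become errors.
pub fn eval(body: &[Instr], locals: &mut [i32]) -> Result<Vec<i32>, &'static str> {
    let mut stack: Vec<i32> = Vec::new();
    for &instr in body {
        match instr {
            Instr::I32Const(c) => stack.push(c),
            Instr::LocalGet(i) => stack.push(locals[i as usize]),
            Instr::LocalTee(i) => locals[i as usize] = *stack.last().ok_or("stack underflow")?,
            Instr::I32Eqz => {
                let v = stack.pop().ok_or("stack underflow")?;
                stack.push((v == 0) as i32);
            }
            Instr::I32Or | Instr::I32DivU | Instr::I32RemU => {
                let b = stack.pop().ok_or("stack underflow")? as u32;
                let a = stack.pop().ok_or("stack underflow")? as u32;
                let r = match instr {
                    Instr::I32Or => a | b,
                    Instr::I32DivU => a.checked_div(b).ok_or("integer divide by zero")?,
                    _ => a.checked_rem(b).ok_or("integer divide by zero")?,
                };
                stack.push(r as i32);
            }
        }
    }
    Ok(stack)
}
```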

eqrion

wast

We're now going to use this crate to parse .wast scripts and convert them to specialized .js for testing in SpiderMonkey. [1]

One issue I ran into was how to output modules into JS. Ideally we'd output the module in text form, for easy debugging. The problem is how to get from a wast::Module to a text module.

I looked into two ways of doing this:

  1. wasmprinter::print(module.encode()) - this has the disadvantage that we lose info, such as $idNames. Additionally, some spec tests are changed semantically when round-tripped like this.
  2. Use module.span and a handwritten scanner to find the end of the module's S-expression in the .wast file, then copy the text of the full span.

(2) is the approach I went with, and it works surprisingly well once the scanner was written (look for closed_module in the Phabricator diff). But I'm a little worried this is fragile, and it feels like getting the end of the span from the parser would be a better solution.

[1] https://phabricator.services.mozilla.com/D111306

alexcrichton

wasmparser

Currently the Validator stores many of its types and such in locally defined vectors, but this necessitates heap allocation and support for all possible wasm modules. Ideally we should support a form of generic validation that doesn't require any heap allocation at all, for example by imposing fixed maximums on the number of functions, types, etc.
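Here is a sketch of allocation-free validator state with compile-time maximums, using const generics. BoundedValidator and FixedVec are hypothetical, and each type is reduced to a (param count, result count) pair for brevity. Exceeding a maximum becomes an ordinary validation error instead of a heap allocation.

```rust
/// Fixed-capacity storage: no heap allocation, pushes fail past N items.
pub struct FixedVec<T: Copy + Default, const N: usize> {
    items: [T; N],
    len: usize,
}

impl<T: Copy + Default, const N: usize> FixedVec<T, N> {
    pub fn new() -> Self {
        FixedVec { items: [T::default(); N], len: 0 }
    }

    /// Returns false if the capacity N has already been reached.
    pub fn push(&mut self, item: T) -> bool {
        if self.len == N {
            return false;
        }
        self.items[self.len] = item;
        self.len += 1;
        true
    }

    pub fn get(&self, index: usize) -> Option<&T> {
        self.items[..self.len].get(index)
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum ValidationError {
    TooManyTypes,
    TooManyFuncs,
    UnknownType(u32),
}

/// Hypothetical validator state with compile-time maximums.
pub struct BoundedValidator<const MAX_TYPES: usize, const MAX_FUNCS: usize> {
    types: FixedVec<(u8, u8), MAX_TYPES>, // (param count, result count)
    funcs: FixedVec<u32, MAX_FUNCS>,      // type index of each function
}

impl<const MAX_TYPES: usize, const MAX_FUNCS: usize> BoundedValidator<MAX_TYPES, MAX_FUNCS> {
    pub fn new() -> Self {
        BoundedValidator { types: FixedVec::new(), funcs: FixedVec::new() }
    }

    pub fn add_type(&mut self, params: u8, results: u8) -> Result<(), ValidationError> {
        if self.types.push((params, results)) {
            Ok(())
        } else {
            Err(ValidationError::TooManyTypes)
        }
    }

    pub fn add_func(&mut self, type_index: u32) -> Result<(), ValidationError> {
        if self.types.get(type_index as usize).is_none() {
            return Err(ValidationError::UnknownType(type_index));
        }
        if self.funcs.push(type_index) {
            Ok(())
        } else {
            Err(ValidationError::TooManyFuncs)
        }
    }
}
```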

alexcrichton

wasmparser

There are currently two primary reasons that parts of a wasm module end up being "double parsed":

  • First, the BinaryReader has some skip_* methods which are used, typically to conform to Rust's iterator protocol. For example, when you call read on ElementSectionReader it acts as an iterator, repositioning at the next element. The processing of the current element, however, is done by the consumer, which must happen afterwards. This means that the read method must skip to the next element in preparation for the next call to read. Affected locations are:

    • ElementSectionReader::read
    • DataSectionReader::read
    • GlobalSectionReader::read
    • FunctionBody::get_operators_reader
    • FunctionLocalReader::read (and probably more in this file)
    • InstanceSectionReader::read
  • Secondly the API design of the Validator type is such that it will always parse "header" sections, and then consuming applications (like wasmtime) are likely to then re-parse content of the section again. For example Validator::import_section will parse the import section, but then wasmtime also will iterate over the import section, re-parsing everything.

In general this isn't a massive concern, because the header sections are likely all dwarfed in size by the code section, so parsing them is quite fast. Nonetheless, we should strive to parse everything in a wasm file precisely once. I think fixing this will take two features, one for each problem above:

  1. For the first issue, I think we'll want to move towards an advancing-style API rather than an iterator-based API. For example, we'd have a dedicated type for reading the element section, and you'd say "read the header" followed by "read the elements". We might be able to use Drop and clever trickery to skip over data that wasn't explicitly read, or we could simply panic if methods aren't called in the right order. The downside is that consumers are likely to get a little more complicated, but this may be fixable with clever API design. I'm not sure exactly what this would look like.

  2. For the second issue we'll want to add more APIs to the validator. For example instead of taking the import section as a whole we'd probably want to add something like "the import section is starting with this many items" which gives you a "sub-validator" which is used to validate each import after parsing. What I'm roughly imagining is that the application does all the parsing and then just after parsing feeds in everything to the validator. Another possible alternative is a "validating parser" which automatically feeds parsed values into the validator before handing them to the application. I'm not sure if this alternative is possible with "parse everything precisely once", however, since for example the element section ideally shouldn't be parsed twice, just once.
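Here is a sketch of the advancing-style API from the first point. It works on a hypothetical simplified section in which each entry is a table index followed by a vector of function indices; this is not the real element segment encoding. read_header hands out an Items reader that mutably borrows the section reader. As a result, the borrow checker enforces the header-then-items order, rather than a runtime panic. Drop then skips only the items the consumer did not read.

```rust
/// Decode an unsigned LEB128 u32 at `pos`, advancing it.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Result<u32, &'static str> {
    let mut result = 0u32;
    let mut shift = 0u32;
    loop {
        let byte = *bytes.get(*pos).ok_or("unexpected end of input")?;
        *pos += 1;
        if shift == 28 && byte & 0xF0 != 0 {
            return Err("LEB128 value does not fit in u32");
        }
        result |= u32::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Ok(result);
        }
        shift += 7;
    }
}

/// Hypothetical reader: an entry count, then entries of
/// (table index, vector of function indices).
pub struct SectionReader<'a> {
    bytes: &'a [u8],
    pos: usize,
    entries_left: u32,
}

/// Items of the current entry; borrows the SectionReader mutably, so the
/// next header cannot be read until this is dropped.
pub struct Items<'r, 'a> {
    section: &'r mut SectionReader<'a>,
    left: u32,
}

impl<'a> SectionReader<'a> {
    pub fn new(bytes: &'a [u8]) -> Result<Self, &'static str> {
        let mut pos = 0;
        let entries_left = read_u32(bytes, &mut pos)?;
        Ok(SectionReader { bytes, pos, entries_left })
    }

    fn header(&mut self) -> Result<(u32, u32), &'static str> {
        let table = read_u32(self.bytes, &mut self.pos)?;
        let count = read_u32(self.bytes, &mut self.pos)?;
        Ok((table, count))
    }

    /// Read the next entry's table index plus a reader for its items.
    pub fn read_header(&mut self) -> Option<Result<(u32, Items<'_, 'a>), &'static str>> {
        if self.entries_left == 0 {
            return None;
        }
        self.entries_left -= 1;
        match self.header() {
            Ok((table, left)) => Some(Ok((table, Items { section: self, left }))),
            Err(e) => Some(Err(e)),
        }
    }
}

impl<'r, 'a> Items<'r, 'a> {
    pub fn read(&mut self) -> Option<Result<u32, &'static str>> {
        if self.left == 0 {
            return None;
        }
        self.left -= 1;
        Some(read_u32(self.section.bytes, &mut self.section.pos))
    }
}

impl Drop for Items<'_, '_> {
    /// Skip unread items so the section is positioned at the next entry.
    /// On malformed input, jump to the end so the next read_header errors.
    fn drop(&mut self) {
        while let Some(result) = self.read() {
            if result.is_err() {
                self.section.pos = self.section.bytes.len();
                break;
            }
        }
    }
}
```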

Robbepop

wasmparser

Currently the Operator enum has a lifetime attached which makes it a bit harder to work with in some contexts.

The only reason for this is its BrTable { table: BrTable<'a> } variant, which lets users of this crate lazily (but necessarily) parse the targets of the branch table.

Proposed Design

The solution I wanted to propose here is to split the single Operator::BrTable variant into 3 variants:

  1. Operator::BrTableStart { count_targets: u32 }
  2. Operator::BrTableTarget { depth: u32 }
  3. Operator::BrTableDefault { depth: u32 }

These variants would appear in exactly this order for every br_table in the Wasm input parsed by the wasmparser crate. The count_targets: u32 field tells the user how many Operator::BrTableTarget variants to expect directly afterwards, followed by a single Operator::BrTableDefault variant.
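Here is a sketch of the proposed split using a hypothetical lifetime-free Op enum that covers only br (0x0C), br_table (0x0E) and end (0x0B). Because no variant borrows from the input, the enum can derive Copy, Eq, Ord and Hash. Each br_table is decoded exactly once, into BrTableStart, the targets, and BrTableDefault.

```rust
/// Hypothetical lifetime-free operator enum (a tiny subset, not wasmparser's).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum Op {
    Br { depth: u32 },
    BrTableStart { count_targets: u32 },
    BrTableTarget { depth: u32 },
    BrTableDefault { depth: u32 },
    End,
}

/// Decode an unsigned LEB128 u32 at `pos`, advancing it.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Result<u32, &'static str> {
    let mut result = 0u32;
    let mut shift = 0u32;
    loop {
        let byte = *bytes.get(*pos).ok_or("unexpected end of input")?;
        *pos += 1;
        if shift == 28 && byte & 0xF0 != 0 {
            return Err("LEB128 value does not fit in u32");
        }
        result |= u32::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Ok(result);
        }
        shift += 7;
    }
}

/// Decode br, br_table and end, flattening each br_table into
/// BrTableStart, BrTableTarget*, BrTableDefault.
pub fn decode(bytes: &[u8]) -> Result<Vec<Op>, &'static str> {
    let mut ops = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        let opcode = bytes[pos];
        pos += 1;
        match opcode {
            0x0B => ops.push(Op::End),
            0x0C => ops.push(Op::Br { depth: read_u32(bytes, &mut pos)? }),
            0x0E => {
                let count_targets = read_u32(bytes, &mut pos)?;
                ops.push(Op::BrTableStart { count_targets });
                for _ in 0..count_targets {
                    ops.push(Op::BrTableTarget { depth: read_u32(bytes, &mut pos)? });
                }
                ops.push(Op::BrTableDefault { depth: read_u32(bytes, &mut pos)? });
            }
            _ => return Err("opcode not supported by this sketch"),
        }
    }
    Ok(ops)
}
```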

Alternative Design

An alternative design would be to treat Operator::BrTableDefault as just another Operator::BrTableTarget, so that the count_targets: u32 field in BrTableStart counts all targets, including the default target.

What does this solve?

  1. This proposal would eliminate the lifetime in the very large Operator enum. However, it would also be a breaking change and would probably require significant changes in how users handle br_table inputs.

  2. An additional benefit is that the Operator enum could then fairly easily derive many more useful traits, such as:

    • Copy
    • PartialEq and Eq
    • PartialOrd and Ord
    • Hash
  3. This proposal also removes the current double parsing caused by the BrTable convenience type. The current implementation first parses the br_table only to validate its structure and then throws all of that information away, so the user must re-parse it through the convenience type in order to store it. With this proposal the double parsing would be gone.

Streamlined Design

In my opinion this would streamline the design of the Operator enum, since it already splits some constructs into multiple operators, such as If, Else and End, Block and End, or Loop and End. Splitting BrTable would be just one more case.

Existing Works

It seems that the widely used wasmi interpreter already uses a similar technique to handle branch tables, as can be seen here (please note that I am not familiar with its codebase): https://github.com/paritytech/wasmi/blob/master/src/isa.rs#L359

I am curious if this idea has already been discussed and what others think about it.

alexcrichton

wasm-smith

wasm-smith is proving very useful for testing whether engines properly resource-constrain wasm modules. For example, it exposed https://github.com/bytecodealliance/wasm-tools/issues/179 as a weakness in validating module-linking modules.

I think it'd be useful to add a SwarmConfig-style generator for wasm-smith where the number of items in the module is practically unlimited and may exceed implementation limits. That way we can test that neither wasmparser nor wasmtime itself ever blows up in resource usage on these modules. We would just need to thread through a boolean indicating that the module generated by wasm-smith may not be valid.

Robbepop

wasmparser

tl;dr

We might have a discoverability problem with all the Wasm proposals and extensions implemented in this crate.

Problem

When working with the very low-level and generic wasmparser crate, I often run into the same problem over and over again: the crate provides support not only for the WebAssembly MVP but also for many proposals and extensions, some finished and some still under development. As someone who does not have the entire set of proposals and extensions in my head, I sometimes find it hard to tell which enum variants, data structures, routines or even fields are actually relevant to my projects, which only support a defined subset of those proposals and extensions, or even just the MVP. As of today, using wasmparser often makes me feel that I have to know all of these proposals in advance, and how they extend the MVP, so that I can filter out what I do not need and give users useful information about why something is (currently) not supported in my project.

Symptoms

My current approach when I find Wasm definitions that are unknown to me is to search for them online (mostly without much success) and then go through the entire set of Wasm proposals and extensions, which quickly becomes frustrating when it isn't obvious where a definition might come from. It is also not very productive: you learn a lot in the process, but it doesn't help much when you want to make progress on your project.

Solution

What would help me?

This is actually pretty simple: Each and every field, parameter, data type or enum variant that is the result of some non Wasm MVP proposal or extension needs to have one line of documentation that states which proposals or extensions imply its existence. This is all information I need to be able to do some further research and be able to easily identify parts that I do or do not need to support in case I do not know about all the proposals and all of their details that might even be under active development.

Example

If every such definition had a line of documentation stating something like "This definition is part of the Wasm proposal <proposal_name>." (e.g. "This definition is part of the Wasm proposal for bulk memory operations."), a user of the library could simply look up that particular spec to find all the details, or immediately decide whether the definition is actually useful for the project at hand.

Why wasmparser ?

Now the obvious question is: why add this to wasmparser? This might be useful for literally all low-level Wasm libraries, but covering all of them would be way too much work. The wasmparser crate is the natural entry point into Wasm and is therefore well suited for an initial implementation of these docs.

I think those documentation strings are not hard to maintain since once a proposal is finished they won't be changing a lot anymore.

Implementation

I am sure we can get there incrementally. What this issue asks is that, from now on, everything we add to wasmparser comes with this one line of documentation, and that the existing parts are eventually updated to include it.

eqrion

wast

wast::Instruction [1] lists all the instructions that the crate can parse. Currently it is not sorted by name, encoding, or proposal, which makes it a bit hard to find a given instruction. I'd be happy to open a PR to sort it in some way.

@alexcrichton What do you think about sorting by proposal and then sorting by encoding within proposal?

[1] https://github.com/bytecodealliance/wasm-tools/blob/e3f6090d993069c8abb15810ae82e12534cbbcc6/crates/wast/src/ast/expr.rs#L341

vitiral

wast

I want to propose implementing the Display trait for all of the AST types to easily convert from AST to human readable wat/wast. I volunteer to implement this.

I am in the design phase of writing a language which is a strict extension of wat and wast (https://github.com/vitiral/wak-lang), so I would like to use this crate. However, one of the debugging features of my compiler will be exporting commented wat code, so that it is clear how the code was generated. For instance:

Might "compile" into

(where (;@1:0;) is a nested comment representing the line number where the source code starts)

In order to implement this, I need all wat types to be printed to human-readable wat source. Obviously I could create my own traits and implement all the types into that, but I think this would be generally useful for other users of this crate.

Thanks!
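Here is a sketch of the Display approach on a hypothetical miniature AST (not the wast crate's types). Each node prints itself as wat, and an optional source position is emitted as a block comment in the (;@line:col;) style described above.

```rust
use std::fmt;

/// Hypothetical miniature AST, not the wast crate's types.
pub enum Instr {
    LocalGet(u32),
    I32Add,
}

pub struct Func {
    pub name: String,
    pub params: usize,
    pub results: usize,
    /// Each instruction with an optional (line, column) in the source language.
    pub body: Vec<(Instr, Option<(u32, u32)>)>,
}

impl fmt::Display for Instr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Instr::LocalGet(i) => write!(f, "local.get {}", i),
            Instr::I32Add => write!(f, "i32.add"),
        }
    }
}

impl fmt::Display for Func {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "(func ${}", self.name)?;
        for _ in 0..self.params {
            write!(f, " (param i32)")?;
        }
        for _ in 0..self.results {
            write!(f, " (result i32)")?;
        }
        writeln!(f)?;
        for (instr, pos) in &self.body {
            write!(f, "  {}", instr)?;
            // Source position as a wat block comment, e.g. (;@1:0;)
            if let Some((line, col)) = pos {
                write!(f, " (;@{}:{};)", line, col)?;
            }
            writeln!(f)?;
        }
        write!(f, ")")
    }
}
```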

yurydelendik

wasmparser

There is still an intent to support a "no_std" feature for this crate. There is no need to support allocation-hungry APIs such as WasmParser/WasmDecoder, so it would be beneficial to put this legacy API behind a feature; the section readers are the more performant API and should be used for now.

TODOs:

  • Make WasmParser and friends optional via feature (and/or deprecate)
  • Clean up the section readers to not use dynamic memory (Box, Vec, HashMap)

I think that will allow us to be no_std without depending on alloc or hashbrown crates.

RReverser

wasmparser

Currently even plain data enums and structs (e.g. Type, FuncType, NameEntry) are missing Hash implementations, which makes it hard to use them as HashMap keys for caching.

Would it be possible to add a simple #[derive(Hash)] to all non-builder structs?
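Here is a sketch of the caching use case, with hypothetical stand-ins for Type and FuncType. Once they derive Hash and Eq, identical function signatures can be interned through a HashMap.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for wasmparser's Type and FuncType.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ValType {
    I32,
    I64,
    F32,
    F64,
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct FuncType {
    pub params: Vec<ValType>,
    pub results: Vec<ValType>,
}

/// Interns function types: identical signatures get the same index.
#[derive(Default)]
pub struct TypeCache {
    indices: HashMap<FuncType, u32>,
    types: Vec<FuncType>,
}

impl TypeCache {
    pub fn intern(&mut self, ty: FuncType) -> u32 {
        if let Some(&idx) = self.indices.get(&ty) {
            return idx;
        }
        let idx = self.types.len() as u32;
        self.types.push(ty.clone());
        self.indices.insert(ty, idx);
        idx
    }
}
```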

mbebenita

wasmparser

The intro README for this project doesn't say much. It would be great if we highlighted some of the performance and flexibility features of this parser, and why people should use it.

  • Highlight performance features.
  • Highlight that it's flexible and can be used in a streaming fashion, describe why this is important and why this architectural decision was made.
  • Give more context around the example, perhaps show the output of the example code.
  • Highlight that it is well tested, and that it is used as part of the Cretonne project.
  • Maybe draw a figure that describes the architecture.

stoklund

wasmparser

In the read_function_body() function, the code reading the local variable declarations looks like this:

In a 32-bit build, the addition in locals_total += count as usize can overflow, which panics only in debug builds; in release builds it silently wraps.

A fuzz tester running on a 32-bit build would probably catch that.
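Here is a sketch of an overflow-safe accumulation. total_locals is a hypothetical helper that sums the declared counts with checked_add and also enforces a cap. Its result is therefore the same on 32- and 64-bit builds and in debug and release profiles. The MAX_LOCALS value is an assumption chosen for illustration.

```rust
/// Hypothetical cap on locals per function, chosen for illustration.
const MAX_LOCALS: usize = 50_000;

/// Sum the counts from a function's local declarations. checked_add turns
/// overflow into an error on any pointer width and in any build profile,
/// instead of panicking (debug) or silently wrapping (release).
pub fn total_locals(counts: &[u32]) -> Result<usize, &'static str> {
    let mut locals_total: usize = 0;
    for &count in counts {
        locals_total = locals_total
            .checked_add(count as usize)
            .ok_or("local count overflows usize")?;
        if locals_total > MAX_LOCALS {
            return Err("too many locals");
        }
    }
    Ok(locals_total)
}
```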

Information - Updated Jun 11, 2022

Stars: 429
Forks: 84
Issues: 29
