rust-cssparser

Documentation

Rust implementation of CSS Syntax Module Level 3

Overview

Parsing CSS involves a series of steps:

  • When parsing from bytes (e.g. reading a file or fetching a URL from the network), detect the character encoding (based on a Content-Type HTTP header, an @charset rule, a BOM, etc.) and decode to Unicode text.

    rust-cssparser does not do this yet and just assumes UTF-8.

    This step is skipped when parsing from Unicode, e.g. in an HTML <style> element.

  • Tokenization, a.k.a. lexing. The input, a stream of Unicode text, is transformed into a stream of tokens. Tokenization never fails, although the output may contain error tokens.

  • This flat stream of tokens is then transformed into a tree of component values, which are either preserved tokens, or blocks/functions ({ … }, [ … ], ( … ), foo( … )) that contain more component values.

    rust-cssparser does this at the same time as tokenization: raw tokens are never materialized, you only get component values (see the sketch after this list).

  • Component values can then be parsed into generic rules or declarations. The header and body of rules as well as the value of declarations are still just lists of component values at this point. See the Token enum for the data structure.

  • The last step of a full CSS parser is parsing the remaining component values into Selectors, specific CSS properties, etc.

    By design, rust-cssparser does not do this last step which depends a lot on what you want to do: which properties you want to support, what you want to do with selectors, etc.

    It does, however, provide some helper functions to parse CSS colors and An+B (the argument to :nth-child() and related selectors).

    See Servo’s style crate for an example of a parser based on rust-cssparser.
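
The tokenization and component-value steps above can be exercised directly with the Parser API. Below is a minimal, hedged sketch (the exact set of Token variants printed depends on the cssparser version) that walks the top-level component values of a declaration value:

```rust
use cssparser::{Parser, ParserInput, Token};

fn main() {
    // A declaration value; the parser yields its component values one by one.
    let mut input = ParserInput::new("10px solid rgb(255, 0, 0)");
    let mut parser = Parser::new(&mut input);

    // `next()` skips whitespace and comments. A function token such as `rgb(`
    // opens a nested block; its contents are reached with `parse_nested_block`,
    // and are skipped automatically if that method is not called.
    while let Ok(token) = parser.next() {
        println!("{:?}", token);
    }
}
```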

Issues

Collection of the latest Issues

MatsPalmgren

There's an open cssom issue about serialization of <custom-ident>. Gecko currently serializes "\31st" as "\31 st", which violates the general principle that values should serialize to the shortest form (the space is redundant in this case). It appears to come from hex_escape, which unconditionally adds a space.

CC @emilio
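
For reference, the behavior being discussed can be reproduced through serialize_identifier, the public entry point that ends up in hex_escape; a small hedged sketch:

```rust
use cssparser::serialize_identifier;

fn main() {
    let mut out = String::new();
    // Identifiers cannot start with a digit, so the leading `1` is hex-escaped.
    serialize_identifier("1st", &mut out).unwrap();
    // Prints `\31 st`; the space after the hex escape is the redundant part
    // discussed above, since `s` is not a hex digit.
    println!("{}", out);
}
```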

makotokato

From https://bugzilla.mozilla.org/show_bug.cgi?id=1503656

Actually there is no way to get a TokenSerializationType directly from the Token type. Only the TokenSerializationTypeVariants::Nothing case can be obtained directly, via the nothing() function.

Even when parsing CSS with a hard-coded value such as 0px, we have to create a Token object like the following just to get its type.

I would be happy if cssparser had a function to get a TokenSerializationType without creating a Token object, similar to TokenSerializationType::nothing().
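
As a sketch of the workaround being described (hedged: it assumes the current Dimension field layout and a serialization_type() method on Token), obtaining the serialization type of a hard-coded 0px looks roughly like this:

```rust
use cssparser::{Token, TokenSerializationType};

// A throwaway Token is built just to ask for its serialization type.
// (Hedged sketch; field names follow the Token::Dimension variant.)
fn zero_px_serialization_type() -> TokenSerializationType {
    let token = Token::Dimension {
        has_sign: false,
        value: 0.0,
        int_value: Some(0),
        unit: "px".into(),
    };
    token.serialization_type()
}

fn main() {
    let _ = zero_px_serialization_type();
}
```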

darrell-roberts

Hi,

I was wondering why the ParseError struct does not implement the std::error::Error trait. I was hoping to wrap it in a custom error enum type and use the "?" operator when calling parse functions, but nothing I've tried so far has allowed me to do so.
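
One workaround, shown as a hedged sketch below (expect_ident_matching stands in for whatever parse call is actually being made, and MyError is a hypothetical wrapper), is to convert the error into an owned value at the call site instead of relying on a From impl for "?":

```rust
use cssparser::{Parser, ParserInput};

#[derive(Debug)]
enum MyError {
    // Store a formatted description, since the cssparser error types
    // cannot currently be wrapped via `?` / `From` directly.
    Css(String),
}

fn expect_auto(css: &str) -> Result<(), MyError> {
    let mut input = ParserInput::new(css);
    let mut parser = Parser::new(&mut input);
    parser
        .expect_ident_matching("auto")
        .map_err(|e| MyError::Css(format!("{:?}", e)))?;
    Ok(())
}

fn main() {
    assert!(expect_auto("auto").is_ok());
    assert!(expect_auto("12px").is_err());
}
```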

RazrFalcon

phf_codegen is a pretty big dependency, and the colors will not change from build to build, so maybe it's better to keep the colors map as a "prebuilt" file? Something like this.

The source of the problem:

It's around 20 dependencies just to build a phf map that never changes.
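
A hedged sketch of what a "prebuilt" replacement could look like (hypothetical names, abbreviated table): a sorted static slice checked into the repository and looked up with a binary search needs no build-time code generation at all.

```rust
// Hypothetical prebuilt table replacing the generated phf map:
// a sorted static slice, committed once, searched at runtime.
static NAMED_COLORS: &[(&str, (u8, u8, u8))] = &[
    ("black", (0, 0, 0)),
    ("blue", (0, 0, 255)),
    ("red", (255, 0, 0)),
    ("white", (255, 255, 255)),
];

fn lookup_named_color(name: &str) -> Option<(u8, u8, u8)> {
    NAMED_COLORS
        .binary_search_by_key(&name, |&(key, _)| key)
        .ok()
        .map(|index| NAMED_COLORS[index].1)
}

fn main() {
    assert_eq!(lookup_named_color("red"), Some((255, 0, 0)));
    assert_eq!(lookup_named_color("rebeccapurple"), None);
}
```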

jdm

The generated assembly for code using next_byte_unchecked, such as skip_whitespace, has a surprising number of indirections in order to actually get the byte value. This is caused by the need to load the slice and then get the byte at the desired offset from it, at https://github.com/servo/rust-cssparser/blob/682087fca5ba5f2f05a09bba72c62dac6b3d778d/src/tokenizer.rs#L372. If we store a pointer and increment it instead of the offset, more efficient code should be generated.
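
As a hedged illustration of the two shapes being compared (not the tokenizer's actual code), the difference is between indexing through a stored slice plus offset and loading through a raw pointer that is advanced directly:

```rust
// Current shape: each access reloads the slice and adds the stored offset.
struct OffsetCursor<'a> {
    input: &'a [u8],
    position: usize,
}

impl<'a> OffsetCursor<'a> {
    #[inline]
    unsafe fn next_byte_unchecked(&self) -> u8 {
        *self.input.get_unchecked(self.position)
    }
}

// Proposed shape: reading the next byte is a single load through the pointer;
// advancing is a pointer bump (the end-of-input check is tracked separately).
struct PtrCursor {
    ptr: *const u8,
    end: *const u8,
}

impl PtrCursor {
    #[inline]
    unsafe fn next_byte_unchecked(&self) -> u8 {
        *self.ptr
    }

    #[inline]
    unsafe fn advance(&mut self) {
        debug_assert!(self.ptr < self.end);
        self.ptr = self.ptr.add(1);
    }
}

fn main() {
    let data: &[u8] = b"  a";
    let offset_cursor = OffsetCursor { input: data, position: 0 };
    let mut ptr_cursor = PtrCursor {
        ptr: data.as_ptr(),
        end: unsafe { data.as_ptr().add(data.len()) },
    };
    unsafe {
        assert_eq!(
            offset_cursor.next_byte_unchecked(),
            ptr_cursor.next_byte_unchecked()
        );
        ptr_cursor.advance(); // the proposed "increment the pointer" step
    }
}
```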

SimonSapin

System colors are deprecated, but what exactly should Servo do with them?

  • Not support them at all: treat them as invalid like any other unknown keyword
  • Interpret them all like initial
  • Treat as white those that have the word "background" in their description, others as black (per Tab’s Level 4 proposal)
  • Something else?

@fantasai, do you have an opinion?

Information - Updated Jun 20, 2022

Stars: 504
Forks: 110
Issues: 18
