servo/rust-cssparser

rust-cssparser

Documentation

Rust implementation of CSS Syntax Module Level 3

Overview

Parsing CSS involves a series of steps:

  • When parsing from bytes (e.g. reading a file or fetching a URL from the network), detect the character encoding (based on a Content-Type HTTP header, an @charset rule, a BOM, etc.) and decode to Unicode text.

    rust-cssparser does not do this yet and just assumes UTF-8.

    This step is skipped when parsing from Unicode, e.g. in an HTML <style> element.

  • Tokenization, a.k.a. lexing. The input, a stream of Unicode text, is transformed into a stream of tokens. Tokenization never fails, although the output may contain error tokens.

  • This flat stream of tokens is then transformed into a tree of component values, which are either preserved tokens, or blocks/functions ({ … }, [ … ], ( … ), foo( … )) that contain more component values.

    rust-cssparser does this at the same time as tokenization: raw tokens are never materialized, you only get component values.

  • Component values can then be parsed into generic rules or declarations. The header and body of rules as well as the value of declarations are still just lists of component values at this point. See the Token enum for the data structure.

  • The last step of a full CSS parser is parsing the remaining component values into Selectors, specific CSS properties, etc.

    By design, rust-cssparser does not do this last step, which depends a lot on what you want to do: which properties you want to support, what you want to do with selectors, etc.

    It does, however, provide some helper functions to parse CSS colors and An+B (the argument to :nth-child() and related selectors).

    See Servo’s style crate for an example of a parser based on rust-cssparser.
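The tokenization and component-value steps above can be illustrated with a toy sketch. This is not rust-cssparser's API: the `ComponentValue` enum, `parse_values`, and `parse` are hypothetical names invented for this example, and the grammar is heavily simplified (no strings, comments, escapes, or dimensions). It only shows the two properties described above: tokenizing and nesting happen in one pass, and the process never fails.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Simplified component value (hypothetical, not the crate's `Token` enum):
/// either a preserved token or a block/function holding more values.
#[derive(Debug, PartialEq)]
enum ComponentValue {
    Ident(String),
    Number(String),
    Delim(char),
    /// A closing bracket with no matching opener: kept as an error value.
    UnmatchedClose(char),
    /// Opening bracket and nested contents, e.g. `{ … }`.
    Block(char, Vec<ComponentValue>),
    /// Function name and nested arguments, e.g. `foo( … )`.
    Function(String, Vec<ComponentValue>),
}

fn closing_for(open: char) -> char {
    match open {
        '{' => '}',
        '[' => ']',
        _ => ')',
    }
}

/// Consumes characters while `keep` returns true and collects them.
fn take_while(chars: &mut Peekable<Chars<'_>>, keep: fn(char) -> bool) -> String {
    let mut s = String::new();
    while let Some(&c) = chars.peek() {
        if !keep(c) {
            break;
        }
        s.push(c);
        chars.next();
    }
    s
}

/// Tokenizes and nests in a single pass. It never fails: unexpected input
/// becomes an error value, and an unclosed block ends at end of input.
fn parse_values(chars: &mut Peekable<Chars<'_>>, close: Option<char>) -> Vec<ComponentValue> {
    let mut out = Vec::new();
    while let Some(&c) = chars.peek() {
        if Some(c) == close {
            chars.next();
            break;
        }
        if c.is_whitespace() {
            chars.next();
        } else if c.is_ascii_alphabetic() {
            let name = take_while(chars, |d| d.is_ascii_alphanumeric() || d == '-');
            if chars.peek() == Some(&'(') {
                chars.next();
                out.push(ComponentValue::Function(name, parse_values(chars, Some(')'))));
            } else {
                out.push(ComponentValue::Ident(name));
            }
        } else if c.is_ascii_digit() {
            out.push(ComponentValue::Number(take_while(chars, |d| d.is_ascii_digit() || d == '.')));
        } else if matches!(c, '{' | '[' | '(') {
            chars.next();
            out.push(ComponentValue::Block(c, parse_values(chars, Some(closing_for(c)))));
        } else if matches!(c, '}' | ']' | ')') {
            chars.next();
            out.push(ComponentValue::UnmatchedClose(c));
        } else {
            chars.next();
            out.push(ComponentValue::Delim(c));
        }
    }
    out
}

fn parse(input: &str) -> Vec<ComponentValue> {
    parse_values(&mut input.chars().peekable(), None)
}
```

Calling `parse("a { b: 1 }")` yields an identifier followed by a single `Block` value whose contents are the nested values; the flat token stream is never built as a separate structure.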

Issues

Collection of the latest Issues

MatsPalmgren

There's an open cssom issue about serialization of <custom-ident>. Gecko currently serializes "\\31st" as "\\31 st", which violates the general principle that values should serialize to the shortest form (the space is redundant in this case). This appears to come from hex_escape, which unconditionally adds a space.
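A minimal sketch of the conditional-space idea, under assumptions: `hex_escape_minimal` and `serialize_ident_sketch` are hypothetical names, not the crate's functions, and the sketch handles only the leading-digit case from the example above. The space is emitted only when the next character is a hex digit or whitespace (either would otherwise be absorbed by the escape), or when the escape ends the identifier and what follows is unknown.

```rust
/// Writes `c` as a CSS hex escape. The terminating space is added only when
/// the next character could be read as part of the escape, or when there is
/// no next character (the surrounding context is unknown, so stay safe).
fn hex_escape_minimal(c: char, next: Option<char>, out: &mut String) {
    out.push('\\');
    out.push_str(&format!("{:x}", c as u32));
    let needs_space = match next {
        Some(n) => n.is_ascii_hexdigit() || n.is_ascii_whitespace(),
        None => true,
    };
    if needs_space {
        out.push(' ');
    }
}

/// Toy identifier serializer: escapes a leading ASCII digit and copies the
/// rest unchanged. Assumes the remaining characters need no escaping.
fn serialize_ident_sketch(ident: &str) -> String {
    let mut out = String::new();
    let mut chars = ident.chars();
    match chars.next() {
        Some(first) if first.is_ascii_digit() => {
            let next = chars.clone().next();
            hex_escape_minimal(first, next, &mut out);
        }
        Some(first) => out.push(first),
        None => {}
    }
    out.extend(chars);
    out
}
```

With this rule, `1st` serializes as `\31st`, while `1a` still gets the space (`\31 a`) because `a` is a hex digit.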

CC @emilio

makotokato

From https://bugzilla.mozilla.org/show_bug.cgi?id=1503656

There is currently no way to get a TokenSerializationType directly from a Token type. Only for TokenSerializationTypeVariants::Nothing is there a nothing() function to get it.

Even when parsing a hardcoded CSS value such as 0px, we have to create a Token object to get its type, like the following.

I would be happy if cssparser had a function, like TokenSerializationType::nothing(), to get a TokenSerializationType without creating an object.

darrell-roberts

Hi,

I was wondering why the ParseError struct does not implement the std::error::Error trait? I was hoping to wrap it in a custom error enum type and use the "?" operator when calling parse functions. What I've tried so far has not allowed me to do so.
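For illustration, the "?" operator only needs a `From` conversion into the function's error type; the source error does not itself have to implement std::error::Error. The sketch below uses a hypothetical stand-in, `CssParseError`, not the real cssparser type. The real ParseError is generic and borrows from the input, so an actual conversion may need to copy the details out into an owned form first; that part is an assumption left outside this sketch.

```rust
use std::fmt;

/// Hypothetical stand-in for a parser error that does NOT implement
/// `std::error::Error` (this is not `cssparser::ParseError`).
#[derive(Debug, PartialEq)]
struct CssParseError {
    line: u32,
    message: String,
}

/// Application-level error enum that wraps the parser error.
#[derive(Debug, PartialEq)]
enum AppError {
    Css(CssParseError),
    EmptyInput,
}

/// This conversion is what makes `?` work on `Result<_, CssParseError>`.
impl From<CssParseError> for AppError {
    fn from(e: CssParseError) -> Self {
        AppError::Css(e)
    }
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Css(e) => write!(f, "CSS error on line {}: {}", e.line, e.message),
            AppError::EmptyInput => write!(f, "empty input"),
        }
    }
}

/// The wrapper can implement `Error` even though the inner type does not.
impl std::error::Error for AppError {}

/// Toy parse function standing in for a call into the parser.
fn parse_width(input: &str) -> Result<u32, CssParseError> {
    input
        .trim()
        .strip_suffix("px")
        .and_then(|n| n.parse::<u32>().ok())
        .ok_or_else(|| CssParseError {
            line: 1,
            message: format!("expected a pixel length, got {:?}", input),
        })
}

fn load_width(input: &str) -> Result<u32, AppError> {
    if input.is_empty() {
        return Err(AppError::EmptyInput);
    }
    // `?` converts CssParseError into AppError through the `From` impl.
    let width = parse_width(input)?;
    Ok(width)
}
```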

RazrFalcon

phf_codegen is a pretty big dependency and colors will not change from build to build, so maybe it's better to keep the colors map as a "prebuilt" file? Something like this.

The source of the problem:

It's like 20 dependencies just to build a phf map that never changes.
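One dependency-free shape a checked-in table could take is sketched below. Note that this swaps in a different technique from the one proposed: a hand-written sorted array with binary search rather than a pre-generated phf map. `NAMED_COLORS` and `lookup_color` are hypothetical names, and the table holds only a few sample entries.

```rust
/// Sample entries only, sorted by name so that binary search is valid.
/// A real table would list every named color.
static NAMED_COLORS: &[(&str, (u8, u8, u8))] = &[
    ("black", (0, 0, 0)),
    ("blue", (0, 0, 255)),
    ("red", (255, 0, 0)),
    ("white", (255, 255, 255)),
];

/// Case-insensitive lookup in the static table; no build script and no
/// code generation are involved.
fn lookup_color(name: &str) -> Option<(u8, u8, u8)> {
    let lower = name.to_ascii_lowercase();
    NAMED_COLORS
        .binary_search_by(|&(key, _)| key.cmp(lower.as_str()))
        .ok()
        .map(|i| NAMED_COLORS[i].1)
}
```

The trade-off is that the table must be kept sorted by hand (or checked in a test), and lookup is logarithmic rather than constant time.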

jdm

The generated assembly for code using next_byte_unchecked, like skip_whitespace, has a surprising number of indirections in order to actually get the byte value. This is caused by the need to get the slice, then get the byte at the desired offset from it at https://github.com/servo/rust-cssparser/blob/682087fca5ba5f2f05a09bba72c62dac6b3d778d/src/tokenizer.rs#L372. If we store a pointer and increment it instead of the offset, more efficient code should be generated.
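The two cursor styles can be compared in a small sketch. The types `OffsetCursor` and `IterCursor` are hypothetical and do not mirror the tokenizer's actual fields. The second uses `std::slice::Iter`, which advances a pointer through the slice rather than indexing by offset, as a safe stand-in for the raw-pointer approach. Whether it actually produces better assembly is not verified here and would need measuring.

```rust
fn is_css_whitespace(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\n' | b'\r' | b'\x0C')
}

/// Offset-based cursor: every read goes through the slice plus an index.
struct OffsetCursor<'a> {
    input: &'a [u8],
    position: usize,
}

impl<'a> OffsetCursor<'a> {
    fn skip_whitespace(&mut self) {
        while self.position < self.input.len() && is_css_whitespace(self.input[self.position]) {
            self.position += 1;
        }
    }

    fn consumed(&self) -> usize {
        self.position
    }
}

/// Pointer-style cursor: the slice iterator advances through the input
/// itself, so there is no separate offset to add on each read.
struct IterCursor<'a> {
    total_len: usize,
    rest: std::slice::Iter<'a, u8>,
}

impl<'a> IterCursor<'a> {
    fn new(input: &'a [u8]) -> Self {
        IterCursor { total_len: input.len(), rest: input.iter() }
    }

    fn skip_whitespace(&mut self) {
        // Peek at the next byte without consuming it.
        while let Some(&b) = self.rest.as_slice().first() {
            if !is_css_whitespace(b) {
                break;
            }
            self.rest.next();
        }
    }

    /// The offset is recomputed only when it is needed (e.g. for error
    /// locations), not on every byte read.
    fn consumed(&self) -> usize {
        self.total_len - self.rest.len()
    }
}
```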

SimonSapin

System colors are deprecated, but what exactly should Servo do with them?

  • Not support them at all: treat them as invalid like any other unknown keyword
  • Interpret them all like initial
  • Treat as white those that have the word "background" in their description, others as black (per Tab’s Level 4 proposal)
  • Something else?

@fantasai, do you have an opinion?

Information - Updated Sep 12, 2022

Stars: 525
Forks: 112
Issues: 19

Parsing Expression Grammars in Rust

rust-peg is a simple yet flexible parser generator that makes it easy to write robust parsers

A library for parsing Backus–Naur form context-free grammars

Wikipedia page on Backus-Naur form

Rust library for parsing configuration files

The 'option' can be any string with no whitespace

Body parsing plugins for the Iron web framework

Body parsing plugins for the core bundle

rust crates for parsing stuff

Tokenizers for math expressions, splitting text, lexing lisp-like stuff, etc

Parsing, transformation and validation for feL4 manifests

The primary purpose of this library is to parse and validate fel4

Rust JSON parsing benchmarks

This project aims to provide benchmarks to show how various JSON-parsing libraries in the Rust programming language perform at various JSON-parsing tasks

Minimal Parsing Language (MPL)

This is a minimal parser combinator in the style of Top-Down Parsing Language (TDPL)

Parsing the "version core" of semver numbers and their shorthands

A crate to parse two- and three-component version numbers

Parsing + State machine

Parsing JSON with serde into input steps for a state machine written in Rust

A DSL parsing library for human readable text documents

Piston-Meta makes it easy to write parsers for human readable text documents
