Headless Chrome

A high-level API to control headless Chrome or Chromium over the DevTools Protocol. It is the Rust equivalent of Puppeteer, a Node library maintained by the Chrome DevTools team.

It is not 100% feature compatible with Puppeteer, but there's enough here to satisfy most browser testing / web crawling use cases, and there are several 'advanced' features such as:

  • network request interception
  • JavaScript coverage monitoring
  • opening incognito windows
  • taking screenshots of elements or the entire page
  • saving pages to PDF
  • 'headful' browsing
  • automatic downloading of 'known good' Chromium binaries for Linux / Mac / Windows
  • extension pre-loading

Quick Start

use headless_chrome::{Browser, protocol::page::ScreenshotFormat};

fn browse_wikipedia() -> Result<(), failure::Error> {
    let browser = Browser::default()?;

    let tab = browser.wait_for_initial_tab()?;

    // Navigate to Wikipedia
    tab.navigate_to("https://www.wikipedia.org")?;

    // Wait for network/JavaScript/DOM to make the search box available
    // and click it.
    tab.wait_for_element("input#searchInput")?.click()?;

    // Type in a query and press `Enter`
    tab.type_str("WebKit")?.press_key("Enter")?;

    // We should end up on the WebKit page once navigated
    let elem = tab.wait_for_element("#firstHeading")?;
    assert!(tab.get_url().ends_with("WebKit"));

    // Take a screenshot of the entire browser window
    let _jpeg_data = tab.capture_screenshot(
                        ScreenshotFormat::JPEG(Some(75)),
                        None,
                        true)?;

    // Take a screenshot of just the WebKit infobox
    let _png_data = tab
        .wait_for_element("#mw-content-text > div > table.infobox.vevent")?
        .capture_screenshot(ScreenshotFormat::PNG)?;

    // Run JavaScript in the page
    match elem.call_js_fn(r#"
        function getIdTwice () {
            // `this` is always the element that you called `call_js_fn` on
            const id = this.id;
            return id + id;
        }
    "#, false)? {
        // The function returns a string, so only the String variant is expected
        serde_json::value::Value::String(returned_string) => {
            assert_eq!(returned_string, "firstHeadingfirstHeading".to_string());
        }
        _ => unreachable!()
    };

    Ok(())
}

assert!(browse_wikipedia().is_ok());

For fuller examples, take a look at tests/simple.rs and examples.

Before running the examples, make sure to add the failure crate to the dependencies in your project's Cargo.toml.

What can't it do?

The Chrome DevTools Protocol is huge. Currently, Puppeteer supports way more of it than we do. Some of the missing features include:

  • Dealing with frames
  • Handling file picker / chooser interactions
  • Tapping touchscreens
  • Emulating different network conditions (DevTools can alter latency, throughput, offline status, 'connection type')
  • Viewing timing information about network requests
  • Reading the SSL certificate
  • Replaying XHRs
  • HTTP Basic Auth
  • Inspecting EventSources (aka server-sent events or SSEs)
  • WebSocket inspection

If you're interested in adding one of these features but would like some advice about how to start, please reach out by creating an issue or sending me an email.

Related crates

  • fantoccini uses WebDriver, so it works with browsers other than Chrome. It's also asynchronous and based on Tokio, unlike headless_chrome, which has a synchronous API and is just implemented using plain old threads. Fantoccini has also been around longer and is more battle-tested. It doesn't support Chrome DevTools-specific functionality like JS Coverage.

Testing

For debug output, set these environment variables before running cargo test:

RUST_BACKTRACE=1 RUST_LOG=headless_chrome=trace

Version numbers

Starting with v0.2.0, we're trying to follow SemVer strictly.

Troubleshooting

If you get errors related to timeouts, you likely need to enable sandboxing either in the kernel or as a setuid sandbox. Puppeteer's troubleshooting documentation has some information about how to do that.

By default, headless_chrome will download a compatible version of Chrome to XDG_DATA_HOME (or the equivalent on Windows/Mac). This behaviour can be turned off, so that the system version of Chrome is used instead (assuming you have Chrome installed), by disabling the default features in your Cargo.toml:

[dependencies.headless_chrome]
default-features = false

Contributing

Pull requests and issues are most welcome, even if they're just experience reports. If you find anything frustrating or confusing, let me know!

Issues

Collection of the latest Issues

Edu4rdSHL

Hello, I'm currently facing the following issue when taking screenshots:

I have been looking at what the debug ports are for and I don't need that, I just need to take the screenshot, that's it. Is there a way to disable that behavior?

wiseman

Using both the latest released version (0.9.0) and the latest from git, I often run into an issue when taking screenshots where the tab doesn't have the dimensions I've specified. Instead of 800x600, it will sometimes be 72x600 (for some reason it's always 72). I can run the same code multiple times and get output images with different dimensions each time.

The code I'm using looks like this:

The dimensions and viewport don't change, but the images output from two consecutive runs have different sizes.

This happens on my Mac running macOS 12.2.1 and a Linux machine with Ubuntu 20.04.

Weirdly, it seems to be a new bug: It didn't happen for months, and then it started happening rarely, and then it started happening a lot. Which makes me wonder if it started happening with a particular version of Chrome?

In case it helps, here's some data on how often I saw it happen on various dates, in Ubuntu 20.04. 2022-02-21 is the first day I saw it happen. The first column is the number of screenshots that were 72 pixels wide. The second column is the date. Each day I take about 400-500 screenshots.
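One way to count affected screenshots like this is to read the dimensions straight from the image header. The sketch below assumes the screenshots are saved as PNG (the report does not say which format was used); the function names `png_dimensions` and `has_expected_size` are made up for illustration and are not part of headless_chrome.

```rust
/// Reads the pixel dimensions from the header of a PNG image.
/// Returns `None` if the bytes do not start with a PNG signature
/// followed by an IHDR chunk.
fn png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    const SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    // Layout: 8 signature bytes, 4 chunk-length bytes, 4 chunk-type bytes,
    // then the big-endian width and height (4 bytes each).
    if bytes.len() < 24 || bytes[..8] != SIGNATURE || bytes[12..16] != *b"IHDR" {
        return None;
    }
    let width = u32::from_be_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    let height = u32::from_be_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
    Some((width, height))
}

/// True when the screenshot has exactly the expected (width, height).
fn has_expected_size(png: &[u8], expected: (u32, u32)) -> bool {
    png_dimensions(png) == Some(expected)
}
```

A script could call `has_expected_size(&data, (800, 600))` on each captured image and retry or log the run when it returns `false`.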

elpiel

I see that there hasn't been a new release since 2019 and that there has been quite some work done since then.

Is it possible to create a new release?

NJAldwin

enhancement

I recently wrote a quick script using rust-headless-chrome to take screenshots of a webpage every minute (it was super easy to get started -- love it!). For certain reasons, I wanted a fresh page each time, so the script effectively started a new headless browser, took a screenshot, and killed the browser every minute.

After leaving this running for a few days, my disk ran out of space. It seems that each new invocation of the headless browser uses a new profile named rust-headless-chrome-profileABCdef (where ABCdef is a random alphanumeric string). This leaves behind ~20MB per profile in my C:/Users/USER/AppData/Temp/ folder (in my case). I ended up with 90GB of these profiles, each with their own separate cache.

This is understandable behavior, but ideally, I would like a way to control the profile via one of the following:

  • control of the profile name (to force a single profile and reuse the cache)
  • ability to disable the profile cache altogether
  • ability to control the profile caching location
  • ability to identify the profile cache location (so I can remove the cache as part of the script)

I'm not seeing any of these in the current API; could we consider adding support for something similar? (If I'm missing it, please let me know -- happy to use the existing API if this is supported!)
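Until such an option exists, a script could clean up after itself. The following is a minimal sketch using only the standard library; it assumes the profiles are directories directly inside the system temp folder and that their names begin with `rust-headless-chrome-profile`, as described above. The function name `remove_stale_profiles` is hypothetical and not part of headless_chrome.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Deletes every directory directly inside `dir` whose name starts with
/// `prefix`. Returns how many directories were removed.
fn remove_stale_profiles(dir: &Path, prefix: &str) -> io::Result<usize> {
    let mut removed = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let is_profile = entry.file_name().to_string_lossy().starts_with(prefix);
        // Only touch directories; leave files and unrelated folders alone.
        if is_profile && entry.file_type()?.is_dir() {
            fs::remove_dir_all(entry.path())?;
            removed += 1;
        }
    }
    Ok(removed)
}
```

It might be called as `remove_stale_profiles(&std::env::temp_dir(), "rust-headless-chrome-profile")` after the browser has been dropped. Running it while a browser is still using its profile would delete that profile too, so the call belongs at a point where no browser instance is alive.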

FrankenApps

bug

I have a simple repro:

Cargo.toml

main.rs

when I run this using cargo run, the application hangs for about 5 minutes and then finally exits with:

Version 0.9.0 on macos. I also tried:

but it does not build due to a missing dependency.

mockfox

I enable request interception with tab.enable_request_interception(), but network events are not being intercepted. Am I missing something?

seanaye

It looks like there hasn't been a release in over 2 years, even though this repo is still being maintained. Is there an up to date version of this package on crates.io? If not, are there plans to build one?

CliffHan

enhancement

browser::transport::handle_incoming_messages() is able to detect when the connection shuts down.

However, there is currently no way for the owner of the browser to receive that event.

Maybe there could be a channel that notifies the owner of the browser when this happens.

nikitavbv

bug

Hi! Thank you for maintaining this crate!

It seems that browser process creation fails when running inside gVisor (used in environments like Google Cloud Run). Browser::new returns DebugPortInUse even though the port specified in LaunchOptions is actually free.

Steps to reproduce

Cargo.toml:

Here is src/main.rs, where we start a headless browser with a specific debugging port set (if we don't set it, it will give up after a certain number of retries, saying that no port was found):

Dockerfile:

Build it using:

Now run the container:

You may see the following output:

That is something you may expect to see - the browser was successfully started, the webpage was loaded and the screenshot was made.

Now let's try to run the same container inside gVisor. You can find this runtime in Cloud Run (inside Google Cloud), or by installing it locally and running using:

You will probably get the following output:

Cause of this issue

If you take a look at chromium binary output, you may notice the following message:

I am not completely sure what this error message means yet, but I do know that it can be ignored. If you don't interrupt the process the moment you see this message, the browser finishes starting up and the DevTools websocket becomes available at the port specified in the LaunchOptions.

But the process actually gets interrupted. The regular expression in process.rs matches the error message above.

Proposed fix

One simple way to fix this problem would be to change this regexp to a more specific one or to ignore the Could not bind NETLINK socket message. I will send a PR soon with this change. Please let me know if you have any ideas for a better fix.
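The second option (ignoring the NETLINK message) could look roughly like the sketch below. It is not the code from process.rs: the real check is a regular expression that is not reproduced here, so the plain substring test for "could not bind" is an assumption standing in for it, and the function name `indicates_port_in_use` is hypothetical.

```rust
/// Hypothetical stand-in for the port-in-use check.
/// ASSUMPTION: any stderr line mentioning a failed bind counts as a port
/// conflict, except the harmless NETLINK warning seen under gVisor.
fn indicates_port_in_use(stderr_line: &str) -> bool {
    let mentions_bind_failure = stderr_line.to_lowercase().contains("could not bind");
    let is_netlink_warning = stderr_line.contains("Could not bind NETLINK socket");
    mentions_bind_failure && !is_netlink_warning
}
```

With an exclusion like this, the launcher would keep waiting for the DevTools websocket instead of reporting DebugPortInUse when only the NETLINK warning has been printed.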

Thank you!

timbodeit

The Readme still contains "By default, headless_chrome will download a compatible version of chrome" which as of 6bba915a167b3a61d9369d517763202f69e1edb5 is no longer the case.

The Readme should be updated to reflect that "automatic downloading of 'known good' Chromium binaries for Linux / Mac / Windows" requires the fetch feature to be enabled.

paxbun

enhancement

I want to use headless_chrome using browserless. I used the browserless/chrome docker image to test my rust application locally, and it worked well. However, when I used headless_chrome with the remote browser hosted on browserless's service, Browser::connect panics with StatusCodeError(BadRequest). I found that the reason is that headless_chrome uses ClientBuilder::connect_insecure, which makes a plain TCP connection (not an SSL connection). Is there a way to make a secure connection using headless_chrome?
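To illustrate the distinction being asked for, the sketch below picks a transport from the scheme of the debugging URL: `ws://` means a plain TCP connection, `wss://` means the connection needs TLS. The `Transport` enum and `transport_for` function are hypothetical names for illustration only; headless_chrome does not expose them.

```rust
/// How the websocket connection to the browser would need to be opened.
#[derive(Debug, PartialEq)]
enum Transport {
    /// Plain TCP, which is what an insecure connect provides.
    Plain,
    /// TCP wrapped in TLS, as required by `wss://` endpoints.
    Tls,
}

/// Chooses the transport from the URL scheme, or `None` for other schemes.
fn transport_for(url: &str) -> Option<Transport> {
    if url.starts_with("wss://") {
        Some(Transport::Tls)
    } else if url.starts_with("ws://") {
        Some(Transport::Plain)
    } else {
        None
    }
}
```

A connect routine could branch on this result and use a TLS-capable client for the `Tls` case instead of always opening an insecure connection.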

naddika

Apart from the libraries required by the runtime of Rust, are there any special libraries required by DevTools protocol itself?

Or will any library that uses DevTools, Puppeteer and the like, work as is, as long as Chrome/Chromium the browser has been installed?

I can't test this on my computer because it isn't a fresh install: I've already installed a whole bunch of libraries, some of which may be related to DevTools. My question is about fresh computers. Projects that use Selenium, for instance, require a headless browser driver or something similar.

blue-cp

Hi there,

Thanks for building this.

I am new to Rust and I am trying to get a site's title and DOM source.

The equivalent in JavaScript with Puppeteer is: const html = await page.content();

Would appreciate any pointers.

Thanks,

ryanmcgrath

I noticed that tab.find_elements will return a Result, which... is kind of odd, at least coming from the perspective of the web.

document.querySelectorAll() doesn't error if something doesn't exist, iteration over it just becomes a no-op. That's the behavior I'd expect from tab.find_elements, personally - is there a reason to do it the current way that I'm missing?
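Callers who prefer the `querySelectorAll` behaviour can collapse the error case themselves. The sketch below shows the pattern with a stand-in lookup, since the exact signature and error type of `tab.find_elements` are not shown here; `find_matching` and `find_matching_or_empty` are hypothetical names.

```rust
/// Hypothetical stand-in for a lookup that returns an error when nothing
/// matches, similar in spirit to a `Result`-returning element search.
fn find_matching(items: &[&str], needle: &str) -> Result<Vec<String>, String> {
    let found: Vec<String> = items
        .iter()
        .filter(|item| item.contains(needle))
        .map(|item| item.to_string())
        .collect();
    if found.is_empty() {
        Err(format!("no element found for {}", needle))
    } else {
        Ok(found)
    }
}

/// Collapses the error case into an empty list, so iterating over the
/// result is simply a no-op when nothing matched.
fn find_matching_or_empty(items: &[&str], needle: &str) -> Vec<String> {
    find_matching(items, needle).unwrap_or_default()
}
```

The trade-off is that `unwrap_or_default()` hides every error, not just "nothing matched", so a dropped connection would also look like an empty result.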

adenine-dev

enhancement

It's all in the title: there is no way (that I can see) to select a node inside a shadow DOM. I have tried a dozen things, but it boils down to this: as far as I could figure out, there is no way to extract an element from call_js_fn, and no way to get an accurate shadow element from a node.

Versions

Find the latest versions by id

v0.8.0 - Aug 22, 2019

0.8.0 - 2019-08-22

Added

Removed

Changed

v0.7.0 - Aug 20, 2019

0.7.0 - 2019-08-20

Added

Removed

Changed

v0.6.0 - Aug 20, 2019

0.6.0 - 2019-08-20

Added

Removed

Changed

  • protocol::runtime::RemoteObject.object_type is now an enum rather than any string.

v0.4.0 - Aug 02, 2019

Added

Removed

Changed

  • Changed protocol::dom::NodeId from u16 to u32.

v0.3.0 - Jul 07, 2019

0.3.0 - 2019-07-07

Added

  • Re-export Element struct in top level module
  • Better crate-level docs, and also docs for the Element struct
  • Browser::default convenience method for quickly getting a headless browser with default options

Removed

Changed

v0.2.0 - Jul 07, 2019

0.2.0 - 2019-07-07

Note: starting with this release we're going to bump the minor version whenever anything new is added to the public API.

Added

Removed

Changed

Information - Updated May 14, 2022

Stars: 1.0K
Forks: 116
Issues: 56
