This document provides tips and tricks for reproducing and debugging flakes in Web Tests. If you are debugging a flaky Web Platform Test (WPT), you may wish to check the specific Addressing Flaky WPTs documentation.
This document assumes you are familiar with running Web Tests via `run_web_tests.py`; if you are not, then see here.
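For reference, a minimal local invocation looks something like this (a sketch; `path/to/test.html` is a placeholder for a test under `third_party/blink/web_tests/`):

```bash
# Run a single Web Test with your local build via the wrapper script.
./third_party/blink/tools/run_web_tests.py path/to/test.html
```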
Often (e.g. by Flake Portal), you will be pointed to a particular build in which your test has flaked. You will need the name of the specific build step that has flaked; usually for Web Tests this is `blink_web_tests`, but there are variations (e.g. `not_site_per_process_blink_web_tests`).
On the builder page, find the appropriate step:
While you can examine the individual shard logs to find your test output, it is easier to view the consolidated information, so scroll down to the `archive results for blink_web_tests` step and click the `web_test_results` link:
This will open a new tab with the results viewer. By default your test should be shown, but if it isn't then you can click the ‘All’ button in the ‘Query’ row, then enter the test filename in the textbox beside ‘Filters’:
There are a few ways that a Web Test can flake, and what the result means may depend on the test type:
* `FAIL` - the test failed. For reference or pixel tests, this means it did not match the reference image. For JavaScript tests, the test either failed an assertion or did not match the baseline `-expected.txt` file checked in for it. In the results viewer this may be shown more specifically as:
    * `IMAGE` (as in an image diff).
    * `TEXT` (as in a text diff).
* `TIMEOUT` - the test timed out before producing a result. This may happen if the test is slow and normally runs close to the timeout limit, but is usually caused by waiting on an event that never happens. These unfortunately do not produce any logs.
* `CRASH` - the browser crashed while executing the test. There should be logs associated with the crash available.
* `PASS` - this can happen! Web Tests can be marked as expected to fail, and if they then pass, that is an unexpected result, aka a potential flake (see the TestExpectations sketch below).

Clicking on the test row anywhere except the test name (which is a link to the test itself) will expand the entry to show information about the failure result, including actual/expected results and browser logs if they exist.
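As background for the `PASS` case above, a test is marked as expected to fail via an entry in the `TestExpectations` file; a hypothetical entry might look like this (the bug number and test path are placeholders):

```
# Mark path/to/test.html as expected to fail; a PASS then becomes an
# unexpected (potentially flaky) result.
crbug.com/123456 path/to/test.html [ Failure ]
```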
In the following example, our flaky test has a `FAIL` result, which is a flake compared to its (default) expected `PASS` result. The test results (`TEXT` - as explained above this is equivalent to `FAIL`), output, and browser log links are highlighted.
TODO: document how to get the args.gn that the bot used
TODO: document how to get the flags that the bot passed to `run_web_tests.py`
Flakes are by definition non-deterministic, so it may be necessary to run the test or set of tests repeatedly to reproduce the failure. Two flags to `run_web_tests.py` can help with this (a usage sketch follows the list):

* `--repeat-each=N` - repeats each test in the test set N times. Given a set of tests A, B, and C, `--repeat-each=3` will run AAABBBCCC.
* `--iterations=N` - repeats the entire test set N times. Given a set of tests A, B, and C, `--iterations=3` will run ABCABCABC.

TODO: document how to attach gdb
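For example, to run a single suspect test many times in one invocation (a sketch; the test path is a placeholder):

```bash
# Repeat the flaky test 100 times; combine with --iterations to also
# repeat the whole test set.
./third_party/blink/tools/run_web_tests.py --repeat-each=100 path/to/test.html
```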
When debugging flaky tests, it can be useful to add `LOG` statements to your code to quickly understand test state. In order to see these logs when using `run_web_tests.py`, pass the `--driver-logging` flag:

```bash
./third_party/blink/tools/run_web_tests.py --driver-logging path/to/test.html
```
When debugging a specific test, it can be useful to skip `run_web_tests.py` and directly run the test under `content_shell` in an interactive session. For many tests, one can just pass the test path to `content_shell`:

```bash
out/Default/content_shell third_party/blink/web_tests/path/to/test.html
```
Caveat: running tests like this is not equivalent to `run_web_tests.py`, which passes the `--run-web-tests` flag to `content_shell`. The `--run-web-tests` flag enables a lot of testing-only code in `content_shell`, but also runs in a non-interactive mode.
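For reference, a manual invocation with that flag looks roughly like this (a sketch; `run_web_tests.py` passes additional flags in practice, and the test path is a placeholder):

```bash
# Non-interactive run using the same testing-only code paths that
# run_web_tests.py enables in content_shell.
out/Default/content_shell --run-web-tests third_party/blink/web_tests/path/to/test.html
```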
Useful flags to pass to get `content_shell` closer to the `--run-web-tests` mode include:

* `--enable-blink-test-features` - enables `status=test` and `status=experimental` features from `runtime_enabled_features.json5` (see the sketch after this section).

TODO: document how to deal with tests that require a server to be running
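Putting the above together, an interactive `content_shell` session that is closer to the testing configuration might be launched like this (a sketch; the test path is a placeholder):

```bash
# Interactive session with test-only and experimental Blink features enabled.
out/Default/content_shell --enable-blink-test-features third_party/blink/web_tests/path/to/test.html
```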