AB Sniffer v6 Testing Manual
This manual covers local miner testing. Local results help improve a submission, but the official score is produced by the production scoring environment.
Prerequisites
- Docker with the Docker Compose plugin
- A normal browser for human checks
- Miner-controlled runners for all eight target frameworks
- A bot-runner service compatible with the challenge scoring API when using
/score - Test hosts or virtual machines for the operating systems being evaluated
- Network access from each runner to the challenge API
The challenge repository does not provide miner-side framework runners. Miners must create and configure their own.
1. Add And Validate The Detectors
Clone the current repository and enter it:
Place the eight framework files and headless.js in:
Keep every filename and exported window function unchanged. Validate the submission:
The validator checks the exact nine files, required exports, the 500-line limit, and the repository ESLint configuration.
2. Configure And Start The Challenge
Create the environment file:
Set a private API key and the local port in .env:
The included Compose file starts the challenge API only. To use /score, deploy your
bot-runner separately and configure at least:
ABS_CHALLENGE_BOT_RUNNER__SERVERS=[{"url":"http://bot-runner-host:8080","device_type":"linux"}]
ABS_CHALLENGE_BOT_RUNNER__API_KEY=replace_with_your_bot_runner_api_key
ABS_CHALLENGE_BOT_RUNNER__PUBLIC_BASE_URL=http://challenge-host:10001
The bot-runner URL must be reachable from the challenge container. The public base URL must be reachable from browsers launched by the bot-runner. Use addresses appropriate to your network rather than copying the example hostnames unchanged.
Start the challenge:
Confirm that it is healthy:
Protected endpoints such as /score and /results require the API key in the X-API-Key
header.
3. Choose A Reachable URL
The detection page is:
Use localhost only when the browser runner and challenge are on the same host. A runner in
another container or machine must use a reachable service name, LAN address, or DNS name.
Do not use 0.0.0.0 as a destination address.
4. Test Framework And Headless Classification
Run every target framework in headed and headless modes where supported:
seleniumbaseseleniumdriverlesspydollpatchrightzendrivernodriverbotasauruspuppeteerextra
For each test, start a clean browser session, open /_web, wait for detection to complete,
record the results, close the browser, and repeat the run.
A correct headed automation result has:
A correct headless automation result has:
headless.js detects the browser's execution mode independently from the framework files.
It must not rely only on the user-agent string.
5. Test Human Sessions
Open /_web manually in a normal headed browser without an automation framework. A correct
human result has:
Human is inferred from the absence of framework and headless detections; it is not a separate detector file. Test multiple clean sessions with normal navigation, typing, scrolling, pointer movement, and developer tools usage where relevant.
Any framework or headless detection during a production human task makes the final score zero. Human false-positive testing is therefore mandatory.
6. Run Local Scoring
After configuring a compatible bot-runner, submit the nine files to the authenticated
/score endpoint using the API documentation or your own client. Scoring schedules
automation runs and human verification in a shuffled order. When the logs request human
verification, open the displayed /_web URL before the human timeout expires.
After scoring, inspect authenticated /results output for:
- expected and submitted frameworks;
- missed detections;
- framework collisions;
- expected and reported headless state;
- headless failures; and
- missing or failed tasks.
Do not optimize only for framework recall. A broad detector may increase local recall while creating collisions or human false positives that reduce or zero the official score.
Testing Versus Production
Local testing does not reproduce production exactly.
| Testing environment | Production scoring environment |
|---|---|
| Miner-created framework runners | Private RedTeam framework runners and presets |
| Locally installed browser, driver, and framework versions | Versions installed in production runner images |
| Miner-selected OS, hardware, flags, patches, extensions, locale, timing, and network | Production infrastructure, device types, launch settings, resource limits, timing, and network |
| Run counts and configurations selected for development | Run schedule and configuration controlled by the scorer |
| Full access to local browser and service diagnostics | Only the official score and permitted result details determine the outcome |
Noticeable score differences are expected when these environments expose different automation signals. Passing local tests does not guarantee the same production result. Build robust detectors by testing multiple versions, operating systems, launch modes, and runner configurations. Avoid signals that identify only one local setup.
Test Matrix
Record repeated framework and human sessions:
| OS | Framework/session | Browser version | Mode | Expected | Actual true | Headless | Collision | Pass |
|---|---|---|---|---|---|---|---|---|
| Linux | nodriver | Chromium version | Headed | nodriver | nodriver | false | No | Yes |
| Linux | human | Chromium version | Headed | human | none | false | No | Yes |
Troubleshooting
- Page is unreachable: check Compose status, port mapping, firewall rules, and the URL used by remote runners.
- Changes do not appear: rebuild or recreate the container and use a fresh browser session without cached assets.
- No detector runs: inspect browser console output and challenge logs for JavaScript errors.
- Multiple framework results are true: remove shared or overly broad signals.
- Human is false: a framework detector or
headless.jsproduced a false positive. - Headless is unstable: repeat clean sessions across browser versions and launch modes.
Stop the environment when testing is complete: