---
title: BrowserGym Environment Server
emoji: 🌐
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---

BrowserGym Environment

BrowserGym is a unified framework for web-based agent tasks that provides access to multiple benchmarks under a single Gymnasium-compatible API. This integration brings the complete training-to-evaluation pipeline for web agents into OpenEnv.

Why BrowserGym?

BrowserGym provides a complete pipeline for developing web agents: train on simple tasks, then evaluate on realistic websites.

What are these benchmarks?

MiniWoB++ (Training): 100+ synthetic web tasks such as "click this button", "fill out this form", and "select from dropdown". Each task is a simple webpage with a clear objective, fast resets, randomized variations, and dense rewards, which makes it well suited to learning basic web navigation skills. No external setup is needed: tasks run in isolated browser sessions.

WebArena (Evaluation): 812 tasks on realistic websites (e-commerce, forums, GitLab, Wikipedia), such as "find the cheapest laptop and add to cart" or "create a merge request for bug #123". Tasks are multi-step, require reasoning, and give sparse rewards, so they test whether your agent can handle actual websites. Requires running 7 backend services (shopping site, GitLab instance, etc.).

VisualWebArena: Similar to WebArena but requires visual understanding: agents need to interpret images, identify UI elements visually, and handle multimodal content.

WorkArena: Enterprise software tasks (CRM, project management, business workflows). Tests automation on corporate-style applications.

The training → evaluation pipeline:

Key advantage

Quick Start - Training (MiniWoB)

No Setup Required! 🎉

from envs.browsergym_env import BrowserGymEnv, BrowserGymAction

# Create environment for MiniWoB training task
env = BrowserGymEnv.from_docker_image(
    "ghcr.io/openenv/browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "miniwob",
        "BROWSERGYM_TASK_NAME": "click-test",  # or "click-button", "click-dialog", etc.
    }
)

# Train your agent!
for episode in range(1000):
    result = env.reset()
    print(f"Goal: {result.observation.goal}")

    done = False
    while not done:
        # Your agent decides what to do
        action_str = agent.get_action(result.observation.text)
        action = BrowserGymAction(action_str=action_str)

        result = env.step(action)
        done = result.done

        print(f"Reward: {result.reward}")

env.close()
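The loop above relies on an `agent` object with a `get_action(text)` method, which this README does not define. A minimal placeholder could look like the sketch below. `RandomClickAgent` is a hypothetical name, and the sketch assumes that the text observation lists elements as lines like `[12] button 'Submit'` and that `click('<id>')` and `noop()` are accepted action strings; check both assumptions against your BrowserGym version.

```python
import random
import re

# Matches lines such as "[12] button 'Submit'" (assumed text-observation format).
_ELEMENT_RE = re.compile(
    r"^[ \t]*\[(\w+)\][ \t]+(button|link|checkbox|textbox)\b", re.MULTILINE
)


class RandomClickAgent:
    """Hypothetical baseline agent: clicks a random interactive element."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def get_action(self, text: str) -> str:
        # Collect the ids of elements that look interactive.
        element_ids = [m.group(1) for m in _ELEMENT_RE.finditer(text)]
        if not element_ids:
            # Nothing to click: fall back to a no-op (assumed action name).
            return "noop()"
        return f"click('{self._rng.choice(element_ids)}')"
```

A random baseline like this is mainly useful for checking that the environment loop runs end to end before plugging in a learned policy.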

Available Tasks by Benchmark

Click Tasks

click-test

click-button

click-button-sequence

click-checkboxes

click-checkboxes-soft

click-checkboxes-large

click-checkboxes-transfer

click-dialog

click-dialog-2

click-link

click-option

click-pie

click-scroll-list

click-shades

click-shape

click-tab

click-tab-2

click-widget

Text Entry Tasks

enter-text

enter-text-dynamic

enter-text-2

enter-password

enter-date

enter-time

login-user

login-user-popup

Navigation Tasks

navigate-tree

search-engine

use-autocomplete

book-flight

choose-date

choose-date-easy

choose-date-medium

choose-list

Visual/Spatial Tasks

count-sides

count-shape

find-word

focus-text

focus-text-2

grid-coordinate

guess-number

identify-shape

read-table

read-table-2

Email/Social Tasks

email-inbox

email-inbox-forward

email-inbox-nl

email-inbox-star-reply

social-media

social-media-some

Total:

Usage:

# Easy task for quick testing
env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "click-test"})

# Medium difficulty for training
env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "click-checkboxes"})

# Hard task for evaluation
env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "email-inbox"})

By Website:

By Difficulty:

Usage:

# Task 0 (usually easy)
env = BrowserGymEnv(environment={
    "BROWSERGYM_BENCHMARK": "webarena",
    "BROWSERGYM_TASK_NAME": "0",
    "SHOPPING": "http://your-server:7770",
    # ... other URLs
})

# Task 156 (GitLab merge request)
env = BrowserGymEnv(environment={
    "BROWSERGYM_BENCHMARK": "webarena",
    "BROWSERGYM_TASK_NAME": "156",
    # ... URLs
})

Note:

Image-based reasoning

Visual element identification

Multimodal interaction (text + images)

CRM operations

Project management

Business workflows

Full task lists:

[MiniWoB++ tasks](https://github.com/Farama-Foundation/miniwob-plusplus/tree/master/miniwob/environment)

[WebArena tasks](https://github.com/web-arena-x/webarena/blob/main/config_files/)

[BrowserGym documentation](https://github.com/ServiceNow/BrowserGym)

Evaluation (WebArena)

Prerequisites

Usage

from envs.browsergym_env import BrowserGymEnv, BrowserGymAction

# Create environment for WebArena evaluation
env = BrowserGymEnv.from_docker_image(
    "ghcr.io/openenv/browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": "0",  # Task ID
        # WebArena backend URLs (required)
        "SHOPPING": "http://your-server:7770",
        "SHOPPING_ADMIN": "http://your-server:7780/admin",
        "REDDIT": "http://your-server:9999",
        "GITLAB": "http://your-server:8023",
        "MAP": "http://your-server:3000",
        "WIKIPEDIA": "http://your-server:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing",
        "HOMEPAGE": "http://your-server:4399",
    }
)

# Evaluate your trained agent
result = env.reset()
while not result.done:
    action_str = agent.get_action(result.observation)
    action = BrowserGymAction(action_str=action_str)
    result = env.step(action)

print(f"Success: {result.reward}")
env.close()
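Since the seven backend URLs are repeated in every WebArena example, it may help to build the mapping in one place. The helper below is a sketch, not part of the package: `webarena_environment` is a hypothetical name, and it assumes all services run on a single host with the ports used in the example above.

```python
def webarena_environment(host: str, task_name: str = "0") -> dict:
    """Build the environment mapping for one WebArena task on a single host."""
    base = f"http://{host}"
    return {
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": task_name,
        # Ports and paths copied from the example above; adjust to your deployment.
        "SHOPPING": f"{base}:7770",
        "SHOPPING_ADMIN": f"{base}:7780/admin",
        "REDDIT": f"{base}:9999",
        "GITLAB": f"{base}:8023",
        "MAP": f"{base}:3000",
        "WIKIPEDIA": (
            f"{base}:8888/wikipedia_en_all_maxi_2022-05/A/"
            "User:The_other_Kiwix_guy/Landing"
        ),
        "HOMEPAGE": f"{base}:4399",
    }
```

The result can be passed as the `environment=` argument, for example `environment=webarena_environment("your-server", task_name="156")`.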

Building the Docker Image

Prerequisites

Base Image

# From the OpenEnv repository root
docker build -t openenv-base:latest -f src/core/containers/images/Dockerfile .

Build the BrowserGym Environment

# From the OpenEnv repository root
docker build -t browsergym-env:latest -f src/envs/browsergym_env/server/Dockerfile .

Run the Server

For MiniWoB (training):

docker run -p 8000:8000 \
  -e BROWSERGYM_BENCHMARK="miniwob" \
  -e BROWSERGYM_TASK_NAME="click-test" \
  browsergym-env:latest

For WebArena (evaluation):

docker run -p 8000:8000 \
  -e BROWSERGYM_BENCHMARK="webarena" \
  -e BROWSERGYM_TASK_NAME="0" \
  -e SHOPPING="http://your-server:7770" \
  -e SHOPPING_ADMIN="http://your-server:7780/admin" \
  -e REDDIT="http://your-server:9999" \
  -e GITLAB="http://your-server:8023" \
  -e MAP="http://your-server:3000" \
  -e WIKIPEDIA="http://your-server:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing" \
  -e HOMEPAGE="http://your-server:4399" \
  browsergym-env:latest
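When the server is launched from a script, the same `docker run` invocation can be assembled from a settings dictionary. This is a small sketch using only the Python standard library; `docker_run_command` is a hypothetical helper, and it only builds the argument list without running Docker.

```python
import shlex


def docker_run_command(image: str, environment: dict, port: int = 8000) -> list:
    """Return the argv for `docker run`, passing each setting with -e."""
    argv = ["docker", "run", "-p", f"{port}:{port}"]
    for name, value in environment.items():
        argv += ["-e", f"{name}={value}"]
    argv.append(image)
    return argv


cmd = docker_run_command(
    "browsergym-env:latest",
    {"BROWSERGYM_BENCHMARK": "miniwob", "BROWSERGYM_TASK_NAME": "click-test"},
)
# shlex.join quotes arguments where needed, so the line is safe to paste into a shell.
print(shlex.join(cmd))
```

The list form can be handed directly to `subprocess.run(cmd)`, which avoids shell quoting issues with the long WebArena URLs.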

Environment Details

Action

from envs.browsergym_env import BrowserGymAction

# Click actions
action = BrowserGymAction(action_str="click('Submit button')")
action = BrowserGymAction(action_str="click('element_id_123')")

# Type actions
action = BrowserGymAction(action_str="fill('username', 'john@example.com')")
action = BrowserGymAction(action_str="fill('password', 'secret123')")

# Navigate actions
action = BrowserGymAction(action_str="goto('https://example.com')")

# Keyboard actions
action = BrowserGymAction(action_str="press('Enter')")
action = BrowserGymAction(action_str="press('Tab')")

# Scroll actions
action = BrowserGymAction(action_str="scroll('down')")
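Writing action strings by hand gets error-prone once values contain quotes. The sketch below builds them programmatically; `format_action` is a hypothetical helper, and it assumes action strings are parsed as Python-style call expressions, as the examples above suggest.

```python
def format_action(name: str, *args) -> str:
    """Render an action call such as fill('username', 'john@example.com')."""
    if not name.isidentifier():
        raise ValueError(f"invalid action name: {name!r}")
    # repr() yields a valid Python literal and handles embedded quotes.
    rendered_args = ", ".join(repr(arg) for arg in args)
    return f"{name}({rendered_args})"


print(format_action("fill", "username", "john@example.com"))
print(format_action("press", "Enter"))
```

The result can then be wrapped as usual, for example `BrowserGymAction(action_str=format_action("click", "element_id_123"))`.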

Observation

result = env.step(action)
obs = result.observation

# Text observations
print(obs.text)          # Primary text representation (AXTree or DOM)
print(obs.axtree_txt)    # Accessibility tree
print(obs.pruned_html)   # Pruned HTML (interactive elements only)

# Page metadata
print(obs.url)           # Current URL
print(obs.goal)          # Task goal/instruction

# Visual (if enabled)
if obs.screenshot is not None:
    print(obs.screenshot.shape)  # [height, width, channels]

# Error handling
if obs.last_action_error:
    print(f"Action failed: {obs.error}")

# Episode status
print(obs.done)          # True if episode ended
print(obs.reward)        # Reward for the step

# Access full BrowserGym data (includes timestamps, etc.)
print(obs.metadata["browsergym_obs"])  # Full observation dict from BrowserGym
print(obs.metadata["browsergym_info"]) # Full info dict (timestamps, page state, etc.)
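For text-only agents, the fields above usually need to be condensed into a single prompt. The following is one possible sketch: `observation_to_prompt` is a hypothetical helper, the fallback order and the character budget are arbitrary choices, and `SimpleNamespace` stands in for a real observation object.

```python
from types import SimpleNamespace


def observation_to_prompt(obs, max_chars: int = 4000) -> str:
    """Condense an observation into a short text prompt for an agent."""
    # Prefer the primary text, then the accessibility tree, then pruned HTML.
    page = (
        getattr(obs, "text", "")
        or getattr(obs, "axtree_txt", "")
        or getattr(obs, "pruned_html", "")
    )
    if len(page) > max_chars:
        page = page[:max_chars] + "\n[truncated]"
    lines = [f"Goal: {getattr(obs, 'goal', '')}", f"URL: {getattr(obs, 'url', '')}"]
    if getattr(obs, "last_action_error", False):
        lines.append(f"Previous action failed: {getattr(obs, 'error', '')}")
    lines.append(page)
    return "\n".join(lines)


# Stand-in observation with the field names used above.
example = SimpleNamespace(
    text="", axtree_txt="[12] button 'Submit'", pruned_html="",
    goal="Click Submit", url="http://localhost/task",
    last_action_error=False, error="",
)
print(observation_to_prompt(example))
```

Truncating the page text keeps prompts bounded on large pages, at the cost of possibly hiding elements that appear late in the tree.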

Accessing metadata

result = env.step(action)

# Access timestamps (if available)
info = result.observation.metadata["browsergym_info"]
if "timestamp" in info:
    print(f"Action timestamp: {info['timestamp']}")

# Access additional observation fields
obs_dict = result.observation.metadata["browsergym_obs"]
if "dom_object" in obs_dict:
    dom = obs_dict["dom_object"]
    # Work with raw DOM object

# Access page performance data
if "performance" in info:
    print(f"Page load time: {info['performance']}")

State

state = env.state()

print(f"Benchmark: {state.benchmark}")     # 'miniwob', 'webarena', etc.
print(f"Task: {state.task_name}")          # Task name/ID
print(f"Episode: {state.episode_id}")      # Unique episode ID
print(f"Steps: {state.step_count}")        # Number of steps taken
print(f"Total Reward: {state.cum_reward}") # Cumulative reward
print(f"Goal: {state.goal}")               # Task instruction
print(f"URL: {state.current_url}")         # Current page URL

Configuration

Common Settings

BROWSERGYM_BENCHMARK: Benchmark to use (miniwob, webarena, visualwebarena, workarena)

BROWSERGYM_TASK_NAME: Specific task name (optional, will use first available if not set)

BROWSERGYM_HEADLESS: Run browser in headless mode (default: true)

BROWSERGYM_VIEWPORT_WIDTH: Browser viewport width (default: 1280)

BROWSERGYM_VIEWPORT_HEIGHT: Browser viewport height (default: 720)

BROWSERGYM_TIMEOUT: Action timeout in milliseconds (default: 10000)
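The settings above can be read into a typed object with the documented defaults. This is a sketch rather than the server's actual parsing code: `BrowserGymSettings` is a hypothetical class, the `"miniwob"` default for the benchmark is an assumption (the list above gives no default), and so is the set of strings accepted as true for `BROWSERGYM_HEADLESS`.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BrowserGymSettings:
    benchmark: str = "miniwob"       # assumed default, not documented above
    task_name: Optional[str] = None  # None means "use first available task"
    headless: bool = True
    viewport_width: int = 1280
    viewport_height: int = 720
    timeout_ms: int = 10000

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "BrowserGymSettings":
        """Read settings from a mapping such as os.environ."""
        headless_raw = env.get("BROWSERGYM_HEADLESS", "true").strip().lower()
        return cls(
            benchmark=env.get("BROWSERGYM_BENCHMARK", "miniwob"),
            task_name=env.get("BROWSERGYM_TASK_NAME") or None,
            headless=headless_raw in {"1", "true", "yes"},
            viewport_width=int(env.get("BROWSERGYM_VIEWPORT_WIDTH", "1280")),
            viewport_height=int(env.get("BROWSERGYM_VIEWPORT_HEIGHT", "720")),
            timeout_ms=int(env.get("BROWSERGYM_TIMEOUT", "10000")),
        )
```

Calling `BrowserGymSettings.from_env(os.environ)` after `import os` would pick up whatever is exported in the current shell.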

WebArena-Specific (only needed for WebArena benchmark)

SHOPPING: Shopping website URL

SHOPPING_ADMIN: Shopping admin panel URL

REDDIT: Reddit-like forum URL

GITLAB: GitLab instance URL

MAP: Map service URL

WIKIPEDIA: Wikipedia instance URL

HOMEPAGE: Homepage URL
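Because a missing backend URL tends to surface only once a task tries to open that site, it can be worth validating the mapping before starting the environment. A minimal sketch, with `missing_webarena_vars` as a hypothetical helper name:

```python
WEBARENA_URL_VARS = (
    "SHOPPING", "SHOPPING_ADMIN", "REDDIT", "GITLAB",
    "MAP", "WIKIPEDIA", "HOMEPAGE",
)


def missing_webarena_vars(environment: dict) -> list:
    """Return the WebArena URL variables that are unset or empty."""
    if environment.get("BROWSERGYM_BENCHMARK") != "webarena":
        # The URLs are only needed for the WebArena benchmark.
        return []
    return [name for name in WEBARENA_URL_VARS if not environment.get(name)]


config = {
    "BROWSERGYM_BENCHMARK": "webarena",
    "SHOPPING": "http://your-server:7770",
}
missing = missing_webarena_vars(config)
if missing:
    print("Missing WebArena settings: " + ", ".join(missing))
```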

Supported Benchmarks

1. MiniWoB++ (Training) ✅ Recommended for Training

100+ tasks ranging from simple (click buttons) to complex (form filling, navigation)

Fast: Instant resets, quick episodes

Randomized: Task variations for generalization

No setup: Works out-of-the-box

Dense rewards: Immediate feedback for learning

Use Case

2. WebArena (Evaluation) 📊 Benchmark

812 realistic tasks across 6 websites

Complex: Multi-step reasoning, real web interfaces

Requires setup: Need to run 7 backend services

Sparse rewards: Binary success/failure

Evaluation-focused: Test real-world performance

Use Case

3. VisualWebArena (Evaluation) 👁️ Visual Benchmark

910 tasks requiring visual understanding

Multimodal: Both text and visual observations

Requires setup: Similar to WebArena

Challenging: Requires visual reasoning

Use Case

4. WorkArena (Evaluation) 💼 Enterprise Benchmark

Enterprise tasks: CRM, project management, etc.

Realistic workflows: Real enterprise software

Requires setup: Enterprise software instances

Use Case

Typical Training Pipeline

from envs.browsergym_env import BrowserGymEnv, BrowserGymAction

# Stage 1: Train on MiniWoB (simple tasks, fast)
train_env = BrowserGymEnv.from_docker_image(
    "browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "miniwob",
        "BROWSERGYM_TASK_NAME": "click-button",
    }
)

# Train your agent (RL, imitation learning, etc.)
agent.train(train_env, num_episodes=10000)
train_env.close()

# Stage 2: Evaluate on WebArena (complex tasks, realistic)
eval_env = BrowserGymEnv.from_docker_image(
    "browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": "0",
        # ... WebArena URLs
    }
)

# Test performance
success_rate = agent.evaluate(eval_env, num_tasks=812)
print(f"WebArena Success Rate: {success_rate:.2%}")
eval_env.close()
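The `agent.train` and `agent.evaluate` calls above are placeholders for your own code. As a rough sketch of what the evaluation side might do, the helpers below run one episode with a step limit and compute a success rate. Both function names are hypothetical, and counting an episode as successful when its final reward is positive is an assumption based on the binary rewards described for WebArena.

```python
def run_episode(env, agent, make_action, max_steps: int = 30) -> float:
    """Run one episode and return the reward of the last step taken."""
    result = env.reset()
    reward = 0.0
    for _ in range(max_steps):
        if result.done:
            break
        action_str = agent.get_action(result.observation.text)
        # make_action wraps the string, e.g. lambda s: BrowserGymAction(action_str=s)
        result = env.step(make_action(action_str))
        reward = result.reward or 0.0
    return reward


def success_rate(rewards, threshold: float = 0.0) -> float:
    """Fraction of episodes whose final reward exceeds the threshold."""
    rewards = list(rewards)
    if not rewards:
        return 0.0
    return sum(r > threshold for r in rewards) / len(rewards)
```

The step limit guards against agents that never reach a terminal state, which is common early in training.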

Development & Testing

Running Tests

# From the OpenEnv repository root
pytest tests/envs/test_browsergym_env.py

Local Development

# Install in development mode
cd /path/to/OpenEnv
pip install -e .

# Install BrowserGym
pip install browsergym browsergym-miniwob browsergym-webarena

# Run the server locally
cd src/envs/browsergym_env/server
export MINIWOB_URL="http://localhost:8888/miniwob/"
export BROWSERGYM_BENCHMARK=miniwob
export BROWSERGYM_TASK_NAME=click-test
python app.py

Project Structure

browsergym_env/
├── __init__.py              # Module exports
├── models.py                # Action, Observation, State dataclasses
├── client.py                # HTTPEnvClient implementation
├── README.md                # This file
└── server/
    ├── __init__.py
    ├── app.py               # FastAPI application
    ├── browsergym_environment.py  # Environment implementation
    ├── Dockerfile           # Container specification
    └── requirements.txt     # Python dependencies

References

[BrowserGym GitHub](https://github.com/ServiceNow/BrowserGym)

[MiniWoB++ Paper](https://arxiv.org/abs/1802.08802)

[WebArena Paper](https://arxiv.org/abs/2307.13854)

[WebArena Website](https://webarena.dev/)

[VisualWebArena Paper](https://jykoh.com/vwa)

[OpenEnv Documentation](https://github.com/openenv/openenv)
