
How Does Selenium 4 Work to Revolutionize Web Automation?


Over 20% of software testing is automated, and Selenium 4 has fundamentally shifted how this is done with a major architectural overhaul. This transition from Selenium 3's JSON Wire Protocol to the W3C WebDriver standard is key to understanding how Selenium 4 works, offering improved stability, performance, and powerful new features.

Quick Summary

Selenium 4 revamps its architecture by adopting the W3C WebDriver protocol for direct browser interaction, replacing the older JSON Wire Protocol. It adds robust features like relative locators, native Chrome DevTools Protocol access, and a re-architected Grid for easier parallel testing.

Key Points

  • W3C Compliance: Selenium 4 operates on the W3C WebDriver protocol, enabling direct communication between client code and browser drivers for increased stability and performance.

  • Relative Locators: This new feature allows testers to find web elements based on their position relative to other known elements on a page, improving script reliability on dynamic sites.

  • Redesigned Grid 4: Selenium Grid has been completely rebuilt with a modular architecture and native Docker support, making parallel test execution significantly easier and more scalable.

  • CDP Integration: Testers can now access the Chrome DevTools Protocol (CDP) to perform advanced actions like network throttling, device emulation, and performance monitoring on Chromium-based browsers.

  • Enhanced API: Improvements to the Actions class, easier window/tab management, and more robust waiting mechanisms streamline test script development and maintenance.

  • Deprecation of Legacy Code: The JSON Wire Protocol is removed, and the DesiredCapabilities class is deprecated in favor of browser-specific Options classes.


Understanding the W3C WebDriver Protocol

At the core of how Selenium 4 works is its full compliance with the W3C (World Wide Web Consortium) WebDriver protocol. Selenium 3 relied on the JSON Wire Protocol, so commands had to be encoded and decoded on their way between the client library and the browser drivers. Selenium 4 speaks the W3C standard end to end: the client sends HTTP requests in the same protocol the browser driver implements, with no translation step in between. This standardization results in several key advantages:

  • Improved Stability: Direct communication between the client and browser driver eliminates a layer of encoding and decoding, reducing potential points of failure and making tests more stable.
  • Cross-Browser Consistency: Since all major browser vendors (Chrome, Firefox, Edge, Safari) now follow the W3C standard, test scripts behave more consistently across different browsers.
  • Enhanced Performance: The direct communication channel removes the overhead of the intermediary JSON Wire Protocol, leading to faster execution of test commands.
  • Simplified Client-Side Code: Developers no longer need to manage browser-specific communication quirks, leading to more straightforward and maintainable test code.
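To make the protocol concrete, here is a minimal sketch of the requests a W3C-compliant client sends to a browser driver. It only builds the HTTP method, path, and JSON body as strings and sends nothing over the network. The class name `W3cWireSketch`, the helper names, and the session id are illustrative assumptions, not part of Selenium. The endpoints and the `capabilities`/`alwaysMatch` payload shape follow the W3C WebDriver specification (the JSON Wire Protocol used a `desiredCapabilities` key instead).

```java
/** Illustrative sketch: the HTTP requests a W3C WebDriver client sends to a driver. */
final class W3cWireSketch {

    /** Minimal holder for one command: HTTP method, path, and JSON body. */
    static final class Command {
        final String method;
        final String path;
        final String body;

        Command(String method, String path, String body) {
            this.method = method;
            this.path = path;
            this.body = body;
        }

        @Override
        public String toString() {
            return method + " " + path + " " + body;
        }
    }

    /** New Session: W3C nests the requested capabilities under capabilities.alwaysMatch. */
    static Command newSession(String browserName) {
        // Simplification: values are not JSON-escaped, so keep them free of quotes.
        return new Command("POST", "/session",
                "{\"capabilities\":{\"alwaysMatch\":{\"browserName\":\"" + browserName + "\"}}}");
    }

    /** Navigate To: the session id returned by New Session becomes part of the path. */
    static Command navigateTo(String sessionId, String url) {
        return new Command("POST", "/session/" + sessionId + "/url",
                "{\"url\":\"" + url + "\"}");
    }

    /** Find Element: the locator strategy and its value travel in the JSON body. */
    static Command findElementByCss(String sessionId, String cssSelector) {
        return new Command("POST", "/session/" + sessionId + "/element",
                "{\"using\":\"css selector\",\"value\":\"" + cssSelector + "\"}");
    }

    public static void main(String[] args) {
        String sessionId = "hypothetical-session-id"; // a real driver returns this value
        System.out.println(newSession("chrome"));
        System.out.println(navigateTo(sessionId, "https://www.selenium.dev"));
        System.out.println(findElementByCss(sessionId, "#login"));
    }
}
```

Because every W3C-compliant driver accepts these same requests, the client library needs no per-browser translation layer.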

Harnessing Relative Locators

One of the most practical new features that define how Selenium 4 works for testers is the introduction of relative locators, also known as 'Friendly Locators'. This capability allows you to find elements based on their visual position relative to other known elements on the page, rather than relying solely on attributes like ID or XPath.

Relative locators are useful for handling dynamic websites and elements that lack unique identifiers. Selenium 4 provides five methods for this purpose:

  • above(WebElement element): Locates an element positioned above the specified reference element.
  • below(WebElement element): Finds an element positioned below the reference element.
  • toLeftOf(WebElement element): Identifies an element to the left of the reference element.
  • toRightOf(WebElement element): Finds an element to the right of the reference element.
  • near(WebElement element): Locates an element within a short distance of the reference element (50 pixels by default).
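In the Java bindings these methods are chained from `RelativeLocator.with(...)`, for example `with(By.tagName("input")).above(passwordField)`. The sketch below does not call Selenium. It is a simplified geometric model of what the five relations mean when applied to element bounding boxes. The `Box` class, the method names, and the exact comparison rules are assumptions for illustration, and Selenium's real matching logic may differ in its details.

```java
/** Simplified model of relative-locator geometry; not Selenium's implementation. */
final class RelativeLocatorSketch {

    /** Bounding box in page pixels; y grows downward, as in the browser. */
    static final class Box {
        final int x;
        final int y;
        final int width;
        final int height;

        Box(int x, int y, int width, int height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        int right() { return x + width; }
        int bottom() { return y + height; }
    }

    /** Candidate ends at or before the top edge of the reference. */
    static boolean isAbove(Box candidate, Box reference) {
        return candidate.bottom() <= reference.y;
    }

    /** Candidate starts at or after the bottom edge of the reference. */
    static boolean isBelow(Box candidate, Box reference) {
        return candidate.y >= reference.bottom();
    }

    static boolean isLeftOf(Box candidate, Box reference) {
        return candidate.right() <= reference.x;
    }

    static boolean isRightOf(Box candidate, Box reference) {
        return candidate.x >= reference.right();
    }

    /** Gap between the two boxes is at most maxPixels (Selenium's default is 50). */
    static boolean isNear(Box candidate, Box reference, int maxPixels) {
        int dx = Math.max(0, Math.max(reference.x - candidate.right(), candidate.x - reference.right()));
        int dy = Math.max(0, Math.max(reference.y - candidate.bottom(), candidate.y - reference.bottom()));
        return Math.hypot(dx, dy) <= maxPixels;
    }

    public static void main(String[] args) {
        Box email = new Box(100, 100, 200, 30);     // hypothetical form layout
        Box password = new Box(100, 150, 200, 30);
        System.out.println("email above password: " + isAbove(email, password));
        System.out.println("email near password:  " + isNear(email, password, 50));
    }
}
```

The model shows why relative locators depend on the rendered layout: if a responsive design moves the email field beside the password field, `above` stops matching even though no attribute changed.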

The Re-architected Selenium Grid 4

Selenium Grid is used for parallel testing across multiple machines, and Selenium 4 features a completely rebuilt Grid with a modern, modular architecture. Unlike the older Hub-and-Node setup, Grid 4 is easier to configure and more scalable, leveraging Docker for containerization.

The Grid 4 architecture consists of several components that communicate via an Event Bus:

  • Router: The entry point for all test requests.
  • Distributor: Assigns test sessions to the appropriate Node based on browser requirements.
  • Node: The machine that runs the browser instances and executes the tests.
  • Session Map: Tracks which Node is running which session.
  • New Session Queue: Holds incoming test requests until a suitable Node is available.

This new architecture simplifies setup significantly. For smaller grids, a single command can launch a server in standalone mode, which functions as both a Hub and Node and automatically detects available browser drivers. Larger deployments can run the components as separate processes and scale them independently. For setup details, see the official Grid documentation: https://www.selenium.dev/documentation/grid/
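The division of labor between the components can be sketched with a toy, in-memory model. This is not how Grid is implemented, and it involves no networking or Event Bus. Every class and method name here is hypothetical, and each Node is assumed to offer one browser with a single free slot.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

/** Toy model of the Grid 4 component roles; illustrative only. */
final class GridFlowSketch {
    // New Session Queue: pending requests, represented by the browser name they ask for.
    private final Queue<String> newSessionQueue = new ArrayDeque<>();
    // Nodes with a free slot: node name -> browser it offers.
    private final Map<String, String> freeNodes = new LinkedHashMap<>();
    // Session Map: session id -> node running that session.
    private final Map<String, String> sessionMap = new HashMap<>();
    private int nextSessionNumber = 1;

    void registerNode(String nodeName, String browser) {
        freeNodes.put(nodeName, browser);
    }

    /** Router role (entry point): accept a new-session request and queue it. */
    void requestSession(String browser) {
        newSessionQueue.add(browser);
    }

    /**
     * Distributor role: match the oldest queued request to a free node.
     * Returns the new session id, or null if nothing can be assigned yet.
     * Simplification: only the head of the queue is considered.
     */
    String distributeNext() {
        String wanted = newSessionQueue.peek();
        if (wanted == null) {
            return null;
        }
        String chosenNode = null;
        for (Map.Entry<String, String> node : freeNodes.entrySet()) {
            if (node.getValue().equals(wanted)) {
                chosenNode = node.getKey();
                break;
            }
        }
        if (chosenNode == null) {
            return null; // the request stays queued until a suitable node appears
        }
        newSessionQueue.remove();
        freeNodes.remove(chosenNode);
        String sessionId = "session-" + nextSessionNumber++;
        sessionMap.put(sessionId, chosenNode);
        return sessionId;
    }

    /** Router role for later commands: look up which node owns the session. */
    String routeCommand(String sessionId) {
        return sessionMap.get(sessionId);
    }

    int queuedRequests() {
        return newSessionQueue.size();
    }

    public static void main(String[] args) {
        GridFlowSketch grid = new GridFlowSketch();
        grid.registerNode("node-a", "chrome");
        grid.requestSession("chrome");
        String id = grid.distributeNext();
        System.out.println(id + " runs on " + grid.routeCommand(id));
    }
}
```

In standalone mode all of these roles live in one process, which is why a single command is enough for a small grid.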

Integrating with Chrome DevTools Protocol (CDP)

Selenium 4's integration with the Chrome DevTools Protocol (CDP) provides a new level of control over Chromium-based browsers (Chrome and Edge). The CDP gives programmatic access to the browser's internal workings, enabling advanced actions far beyond standard user simulation. Using the getDevTools() and send() methods, testers can now:

  • Intercept Network Requests: Modify, block, or mock network requests and responses for robust testing.
  • Capture Performance Metrics: Gain insight into a web application's performance, including load times and resource usage.
  • Emulate Device Conditions: Mimic specific device metrics, geolocation, and network throttling directly from a test script.
  • Access Console Logs: Capture and analyze browser console output, which is crucial for advanced debugging.
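Under the hood, each CDP command is a JSON message with an `id`, a `method`, and `params`, sent to the browser over a WebSocket. The sketch below only builds such messages as strings. It does not use Selenium's `getDevTools()` or `send()`, and the class and helper names are hypothetical. The method names `Network.emulateNetworkConditions` and `Emulation.setGeolocationOverride` and their parameters come from the CDP documentation. The numeric values are arbitrary examples.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Illustrative sketch: builds Chrome DevTools Protocol command messages. */
final class CdpMessageSketch {
    private int nextId = 1;

    /** Renders one command; supports string, number, and boolean parameters only. */
    String command(String method, Map<String, Object> params) {
        StringJoiner fields = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> entry : params.entrySet()) {
            Object value = entry.getValue();
            // Simplification: strings are quoted but not JSON-escaped.
            String rendered = (value instanceof String) ? "\"" + value + "\"" : String.valueOf(value);
            fields.add("\"" + entry.getKey() + "\":" + rendered);
        }
        return "{\"id\":" + (nextId++) + ",\"method\":\"" + method + "\",\"params\":" + fields + "}";
    }

    /** Parameters for Network.emulateNetworkConditions (throughput in bytes per second). */
    static Map<String, Object> slowNetwork() {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("offline", false);
        params.put("latency", 200);              // added round-trip latency in ms
        params.put("downloadThroughput", 51200);
        params.put("uploadThroughput", 25600);
        return params;
    }

    /** Parameters for Emulation.setGeolocationOverride. */
    static Map<String, Object> geolocation(double latitude, double longitude) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("latitude", latitude);
        params.put("longitude", longitude);
        params.put("accuracy", 1);
        return params;
    }

    public static void main(String[] args) {
        CdpMessageSketch cdp = new CdpMessageSketch();
        System.out.println(cdp.command("Network.emulateNetworkConditions", slowNetwork()));
        System.out.println(cdp.command("Emulation.setGeolocationOverride", geolocation(52.52, 13.405)));
    }
}
```

Selenium's DevTools API wraps this exchange in typed Java commands, so test code rarely needs to assemble the raw JSON itself.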

Enhanced Window and Tab Management

Modern web applications frequently open new tabs or windows. Selenium 4 offers a more robust and intuitive API for handling these scenarios. The driver.switchTo().newWindow() method takes a WindowType argument (TAB or WINDOW), opens the new tab or window, and automatically switches the driver's focus to it, so you no longer have to compare window handles to find the one that just opened. This makes managing multiple browser contexts cleaner and more readable.
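At the protocol level, opening and focusing a new tab corresponds to two W3C WebDriver commands: New Window, then Switch To Window with the handle the driver returned. The sketch below only prints those two requests. The class, the `WindowKind` enum (a local stand-in for Selenium's `WindowType`), and the session id and handle values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: the two W3C commands behind "open a new tab and focus it". */
final class NewWindowSketch {

    /** Local stand-in for the window type; the W3C spec accepts "tab" or "window". */
    enum WindowKind {
        TAB("tab"),
        WINDOW("window");

        final String wireValue;

        WindowKind(String wireValue) {
            this.wireValue = wireValue;
        }
    }

    /** New Window: asks the driver to create a tab or window and return its handle. */
    static String createRequest(String sessionId, WindowKind kind) {
        return "POST /session/" + sessionId + "/window/new {\"type\":\"" + kind.wireValue + "\"}";
    }

    /** Switch To Window: moves the session's focus to the given handle. */
    static String switchRequest(String sessionId, String handle) {
        return "POST /session/" + sessionId + "/window {\"handle\":\"" + handle + "\"}";
    }

    /** The combined sequence, given the handle the driver returned for the new window. */
    static List<String> openAndFocus(String sessionId, WindowKind kind, String returnedHandle) {
        List<String> requests = new ArrayList<>();
        requests.add(createRequest(sessionId, kind));
        requests.add(switchRequest(sessionId, returnedHandle));
        return requests;
    }

    public static void main(String[] args) {
        // "abc" and "handle-2" are placeholders for values a real driver would supply.
        for (String request : openAndFocus("abc", WindowKind.TAB, "handle-2")) {
            System.out.println(request);
        }
    }
}
```

In test code this whole sequence collapses into a single call such as `driver.switchTo().newWindow(WindowType.TAB)`.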

Comparison: Selenium 3 vs. Selenium 4

| Feature | Selenium 3 | Selenium 4 |
| --- | --- | --- |
| WebDriver Protocol | JSON Wire Protocol (now deprecated). | Fully W3C WebDriver compliant. |
| Communication | Commands are encoded and decoded through the JSON Wire Protocol. | Client and browser driver speak the same W3C protocol, with no translation step. |
| Stability | Occasional instability due to protocol translation. | More stable and reliable cross-browser execution. |
| Relative Locators | Not available. | New feature for finding elements based on position. |
| Selenium Grid | Legacy Hub-and-Node architecture, complex setup. | Rebuilt architecture with Docker support, easier setup. |
| DevTools Access | No native access, requires third-party libraries. | Native access via the CDP for Chromium browsers. |
| Capabilities | DesiredCapabilities class (now deprecated). | Browser-specific Options classes. |
| Window Management | Required manual handling of window handles. | Simplified API with newWindow() for new tabs/windows. |

Conclusion

In essence, Selenium 4 re-engineers the automation framework around the W3C WebDriver standard, which improves stability and cross-browser consistency. Relative locators make element identification more resilient, and the rebuilt Selenium Grid provides a scalable solution for parallel testing. Native support for the Chrome DevTools Protocol and a modernized API for window handling help testers build more robust and comprehensive test suites. For teams still on Selenium 3, upgrading is the practical way to use current browser capabilities and these testing efficiencies.

Frequently Asked Questions

What is the most significant change in Selenium 4?

The most significant change is the shift from the JSON Wire Protocol used in Selenium 3 to full adherence to the W3C WebDriver protocol in Selenium 4, which standardizes communication with browser drivers and improves stability.

What are relative locators?

Relative locators allow you to find elements based on their visual position relative to other elements, using methods like above(), below(), and toLeftOf(). This is especially useful for elements that have dynamic attributes or lack unique identifiers.

Does Selenium 4 still support the JSON Wire Protocol?

No, Selenium 4 has completely removed the JSON Wire Protocol in favor of the W3C WebDriver standard, streamlining communication between the client and browser drivers.

How is Selenium Grid 4 different from the old Grid?

Selenium Grid 4 has a new modular architecture and native Docker support that makes setting up and scaling a grid much easier. The new standalone mode combines the hub and node into a single process, simplifying local setup.

What can Selenium 4 do with the Chrome DevTools Protocol?

Using the Chrome DevTools Protocol (CDP), Selenium 4 can perform advanced actions like intercepting network requests, simulating network conditions, emulating mobile devices, and capturing performance metrics.

How does Selenium 4 handle windows and tabs?

Selenium 4 introduces a new, simplified API for handling windows and tabs. The driver.switchTo().newWindow() method opens a new tab or window and switches the driver's focus to it, making the process cleaner and more intuitive.

Is migrating from Selenium 3 to Selenium 4 difficult?

For most basic scripts, the migration should not require major code changes, especially if the code was already W3C-compliant. Code that relies on deprecated APIs, such as DesiredCapabilities, will need to be updated to the new Options classes.

