The Gymnasium library is an open-source library for developing and comparing reinforcement learning algorithms. It provides a collection of environments for various tasks, including classic control problems, Atari games, and robotics simulations. Gymnasium is a fork of the widely-used OpenAI Gym library, designed to be a community-driven project with a focus on long-term sustainability and extensibility.
In this topic, we will cover the basics of Gymnasium.
The setup
Installing Gymnasium is straightforward using pip:
```
pip install gymnasium
```

Once installed, you can import the necessary modules and start working with Gymnasium environments. Here's a basic example:
```python
import gymnasium as gym

env = gym.make('CartPole-v1')
observation, info = env.reset()

for _ in range(1000):
    action = env.action_space.sample()  # take a random action
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()

env.close()
```

In this example, we import the gymnasium module and create an instance of the CartPole-v1 environment using gym.make(). The env.reset() method resets the environment to its initial state and returns the initial observation together with an info dictionary.
The core components of Gymnasium are the Environment, Agent, and Observation/Action spaces. The Environment represents the problem to be solved, providing a standardized interface for interacting with it. The Agent is the decision-making entity that takes actions in the environment based on observations.
The Observation space defines the possible values that the agent can observe from the environment, while the Action space specifies the set of valid actions that the agent can take. These spaces can be discrete (a finite set of values) or continuous (a range of values).
In the example above, env.action_space.sample() selects a random action from the action space of the CartPole-v1 environment. The env.step(action) method advances the environment by one step, taking the specified action and returning the new observation, reward, termination status, truncation status, and additional information.
The environment
The Environment class is a core component of the Gymnasium library. It provides a standardized interface for interacting with different environments, allowing for consistent code structure and easy switching between environments.
The Environment class has several essential methods and attributes that define its behavior and interactions with the agent. The main methods include:
- reset(): This method resets the environment to its initial state and returns the initial observation together with an info dictionary.
- step(action): This method advances the environment by one step, given an action taken by the agent. It returns a tuple containing the new observation, the reward for the action, a boolean indicating whether the episode has terminated, a boolean indicating whether the episode has been truncated (for example, due to a time limit), and a dictionary with additional information.
- render(): This method renders the current state of the environment, typically for visualization or debugging purposes. The rendering mode (e.g., a human-readable window or an RGB array) can vary depending on the environment.
Additionally, the Environment class has several important attributes:
- action_space: This attribute specifies the set of valid actions that the agent can take in the environment. It can be a discrete space (a finite set of actions) or a continuous space (a range of values).
- observation_space: This attribute defines the possible values that the agent can observe from the environment. Like the action space, it can be discrete or continuous.
- metadata: This attribute provides additional metadata about the environment, such as the supported rendering modes, task-specific information, or citations.
Gymnasium environments are designed to be iteratively solved by agents through trial-and-error interactions. The agent takes actions based on the current observation, and the environment responds with a new observation, a reward signal, and termination/truncation status. This feedback loop continues until the episode ends, either by reaching a terminal state or by hitting a time limit (truncation).
Some environments may have additional methods or attributes specific to their domain or task. For example, robotics environments may have methods for setting joint positions or retrieving sensor data, while Atari game environments may have methods for capturing frame buffers or game scores.
The Gymnasium API
The Gymnasium API provides several components and utilities beyond just the core Environment class. These additional components facilitate working with environments, modifying their behavior, and collecting data for analysis and training.
One key component of the API is the Space class and its subclasses, which represent the observation and action spaces of an environment. The Discrete and Box spaces are commonly used to represent discrete and continuous spaces, respectively. These spaces define the valid ranges or sets of values for observations and actions, enabling consistency and compatibility across different environments.
Another important aspect of the Gymnasium API is the concept of wrappers. Wrappers are higher-order functions that modify the behavior of an existing environment without changing its core implementation. They allow developers to add additional functionality, preprocess observations or actions, or change the reward structure of an environment. Some common wrappers include TimeLimit (for setting a maximum episode length), GrayScaleObservation (for converting RGB observations to grayscale), and ClipAction (for clipping continuous actions to a valid range).
Monitoring utilities are another useful part of the Gymnasium API. They are used to collect data and statistics during agent-environment interactions, such as episode rewards, lengths, and other relevant information. This data can be used for analysis, debugging, or even for training agents using techniques like imitation learning. The RecordEpisodeStatistics wrapper provides a simple way to record episode data, while the RecordVideo wrapper offers more advanced monitoring such as video recording.
The Gymnasium API also includes various utility functions and classes for handling common tasks. For example, the make function provides a convenient way to create environment instances from their registered identifiers, and the gymnasium.vector module lets you run multiple copies of an environment in parallel. Additionally, random number generators can be seeded through reset(seed=...) for reproducible experiments.
Conclusion
You are now familiar with the basic setup of the Gymnasium library, the main methods and attributes of the Environment class, and the capabilities of the Gymnasium API.