Comprehensive RL Algorithm Tools for Every Need

Get access to RL algorithm solutions that address multiple requirements. One-stop resources for streamlined workflows.

RL Algorithms

  • An RL framework offering PPO and DQN training and evaluation tools for developing competitive Pommerman game agents.
    What is PommerLearn?
    PommerLearn enables researchers and developers to train multi-agent RL bots in the Pommerman game environment. It includes ready-to-use implementations of popular algorithms (PPO, DQN), flexible configuration files for hyperparameters, automatic logging and visualization of training metrics, model checkpointing, and evaluation scripts. Its modular architecture makes it easy to extend with new algorithms, customize environments, and integrate with standard ML libraries such as PyTorch.
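    PommerLearn's own API is not reproduced here; as a hedged illustration of the kind of DQN-style update such a framework wraps, the sketch below uses plain PyTorch on synthetic transitions. The network name, the 4-plane 11x11 board encoding, and the hyperparameter values are assumptions for illustration, not PommerLearn's actual interface.
    ```python
    # Hedged sketch: a minimal one-step DQN temporal-difference update of the
    # kind a Pommerman training framework wraps. Shapes and names (QNet,
    # 4x11x11 planes, 6 actions) are illustrative assumptions, not the
    # PommerLearn API.
    import torch
    import torch.nn as nn

    N_ACTIONS = 6            # Pommerman: stop, up, down, left, right, place bomb
    OBS_SHAPE = (4, 11, 11)  # assumed plane encoding of the 11x11 board

    class QNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(OBS_SHAPE[0], 32, 3, padding=1), nn.ReLU(),
                nn.Flatten(),
                nn.Linear(32 * 11 * 11, N_ACTIONS),
            )

        def forward(self, x):
            return self.net(x)

    q_net, target_net = QNet(), QNet()
    target_net.load_state_dict(q_net.state_dict())
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)

    # Synthetic batch standing in for replay-buffer samples.
    batch = 32
    obs = torch.randn(batch, *OBS_SHAPE)
    actions = torch.randint(0, N_ACTIONS, (batch,))
    rewards = torch.randn(batch)
    next_obs = torch.randn(batch, *OBS_SHAPE)
    done = torch.randint(0, 2, (batch,)).float()

    # Standard one-step TD target computed with a frozen target network.
    with torch.no_grad():
        target_q = rewards + 0.99 * (1 - done) * target_net(next_obs).max(dim=1).values
    pred_q = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = nn.functional.smooth_l1_loss(pred_q, target_q)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"TD loss: {loss.item():.4f}")
    ```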
  • Open-source PyTorch library providing modular implementations of reinforcement learning agents such as DQN, PPO, and SAC.
    What is RL-Agents?
    RL-Agents is a research-grade reinforcement learning framework built on PyTorch that bundles popular RL algorithms across value-based, policy-based, and actor-critic methods. The library features a modular agent API, GPU acceleration, seamless integration with OpenAI Gym, and built-in logging and visualization tools. Users can configure hyperparameters, customize training loops, and benchmark performance with a few lines of code, making RL-Agents ideal for academic research, prototyping, and industrial experimentation.
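    As a hedged sketch of the benchmarking workflow described above, the snippet below evaluates an agent on a Gym-style environment. The Agent.act() interface is an assumed stand-in for a typical modular agent API, not RL-Agents' documented one; it assumes gymnasium is installed.
    ```python
    # Hedged sketch: benchmark an agent's mean episode return on a Gym-style
    # environment. RandomAgent is a placeholder; a DQN/PPO/SAC agent would
    # expose the same assumed act() method.
    import gymnasium as gym
    import numpy as np

    class RandomAgent:
        def __init__(self, action_space):
            self.action_space = action_space

        def act(self, observation):
            # A trained agent would map the observation to an action here.
            return self.action_space.sample()

    def evaluate(agent, env_id="CartPole-v1", episodes=10):
        """Run full episodes and report the mean undiscounted return."""
        env = gym.make(env_id)
        returns = []
        for _ in range(episodes):
            obs, _ = env.reset()
            done, total = False, 0.0
            while not done:
                obs, reward, terminated, truncated, _ = env.step(agent.act(obs))
                total += reward
                done = terminated or truncated
            returns.append(total)
        env.close()
        return float(np.mean(returns))

    if __name__ == "__main__":
        env = gym.make("CartPole-v1")
        agent = RandomAgent(env.action_space)
        env.close()
        print("mean return:", evaluate(agent))
    ```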
  • Text-to-Reward learns general reward models from natural language instructions to effectively guide RL agents.
    What is Text-to-Reward?
    Text-to-Reward provides a pipeline to train reward models that map text-based task descriptions or feedback into scalar reward values for RL agents. Leveraging transformer-based architectures and fine-tuning on collected human preference data, the framework automatically learns to interpret natural language instructions as reward signals. Users can define arbitrary tasks via text prompts, train the model, and then incorporate the learned reward function into any RL algorithm. This approach eliminates manual reward shaping, boosts sample efficiency, and enables agents to follow complex multi-step instructions in simulated or real-world environments.
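    The idea of dropping a learned, text-conditioned reward into any RL algorithm can be sketched as an environment wrapper that overrides the native reward. The RewardModel below is a dummy placeholder, not Text-to-Reward's actual transformer architecture or API, and the fixed task embedding is an assumption made to keep the example self-contained.
    ```python
    # Hedged sketch: substitute a learned reward model's scalar output for the
    # environment reward via a Gym wrapper, so any downstream RL algorithm
    # optimizes the learned reward. Requires gymnasium and PyTorch.
    import gymnasium as gym
    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        """Stand-in for a reward model r(instruction, observation)."""
        def __init__(self, obs_dim, text_dim=16):
            super().__init__()
            self.obs_head = nn.Linear(obs_dim, 32)
            # Assumed fixed embedding of the task's text instruction.
            self.text_embedding = nn.Parameter(torch.zeros(text_dim))
            self.text_head = nn.Linear(text_dim, 32)
            self.out = nn.Linear(32, 1)

        def forward(self, obs):
            h = torch.tanh(self.obs_head(obs) + self.text_head(self.text_embedding))
            return self.out(h).squeeze(-1)

    class TextRewardWrapper(gym.Wrapper):
        """Replace the environment reward with the learned scalar reward."""
        def __init__(self, env, reward_model):
            super().__init__(env)
            self.reward_model = reward_model

        def step(self, action):
            obs, _, terminated, truncated, info = self.env.step(action)
            with torch.no_grad():
                obs_t = torch.as_tensor(obs, dtype=torch.float32)
                learned_r = self.reward_model(obs_t).item()
            return obs, learned_r, terminated, truncated, info

    env = gym.make("CartPole-v1")
    model = RewardModel(obs_dim=env.observation_space.shape[0])
    env = TextRewardWrapper(env, model)
    obs, _ = env.reset()
    obs, reward, *_ = env.step(env.action_space.sample())
    print("learned reward:", reward)
    ```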
  • CybMASDE provides a customizable Python framework for simulating cooperative multi-agent scenarios and training deep reinforcement learning agents within them.
    What is CybMASDE?
    CybMASDE enables researchers and developers to build, configure, and execute multi-agent simulations with deep reinforcement learning. Users can author custom scenarios, define agent roles and reward functions, and plug in standard or custom RL algorithms. The framework includes environment servers, networked agent interfaces, data collectors, and rendering utilities. It supports parallel training, real-time monitoring, and model checkpointing. CybMASDE’s modular architecture allows seamless integration of new agents, observation spaces, and training strategies, accelerating experimentation in cooperative control, swarm behavior, resource allocation, and other multi-agent use cases.
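    To make the scenario-authoring idea concrete, here is a hedged sketch of the general shape of a cooperative multi-agent scenario: per-agent observations, a joint action dictionary, and a shared reward. The class, method names, and the toy coverage task are illustrative assumptions, not CybMASDE's actual interface.
    ```python
    # Hedged sketch: a toy cooperative scenario with dict-keyed observations,
    # actions, and rewards, the general pattern a multi-agent DRL framework
    # orchestrates. Not the CybMASDE API.
    import random

    class CooperativeCoverageScenario:
        """Agents on a 1-D line share a reward for spreading out."""

        def __init__(self, agent_ids=("agent_0", "agent_1"), size=10):
            self.agent_ids = list(agent_ids)
            self.size = size
            self.positions = {}

        def reset(self):
            self.positions = {a: random.randrange(self.size) for a in self.agent_ids}
            return self._observations()

        def step(self, actions):
            # actions: dict mapping agent id -> -1 (left), 0 (stay), or +1 (right).
            for agent, move in actions.items():
                new_pos = self.positions[agent] + move
                self.positions[agent] = max(0, min(self.size - 1, new_pos))
            # Shared cooperative reward: a larger spread between agents is better.
            spread = max(self.positions.values()) - min(self.positions.values())
            reward = {a: float(spread) for a in self.agent_ids}
            return self._observations(), reward

        def _observations(self):
            # Each agent observes all positions (fully observable toy case).
            return {a: dict(self.positions) for a in self.agent_ids}

    env = CooperativeCoverageScenario()
    obs = env.reset()
    obs, reward = env.step({a: random.choice([-1, 0, 1]) for a in env.agent_ids})
    print(reward)
    ```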
  • MAPF_G2RL is a Python framework for training deep reinforcement learning agents to perform efficient multi-agent path finding on graphs.
    What is MAPF_G2RL?
    MAPF_G2RL is an open-source research framework that bridges graph theory and deep reinforcement learning to tackle the multi-agent path finding (MAPF) problem. It encodes nodes and edges into vector representations, defines spatial and collision-aware reward functions, and supports various RL algorithms such as DQN, PPO, and A2C. The framework automates scenario creation by generating random graphs or importing real-world maps, and orchestrates training loops that optimize policies for multiple agents simultaneously. After learning, agents are evaluated in simulated environments to measure path optimality, makespan, and success rates. Its modular design allows researchers to extend core components, integrate new MARL techniques, and benchmark against classical solvers.
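    A collision-aware reward of the kind mentioned above can be sketched as a function over a joint move on a graph. The toy graph, the penalty and bonus values, and the joint-move convention below are illustrative assumptions, not MAPF_G2RL's exact reward definition.
    ```python
    # Hedged sketch: a per-agent, collision-aware step reward for multi-agent
    # path finding on a graph, penalizing vertex collisions, swap collisions,
    # and off-graph moves while rewarding goal arrival.
    GRAPH = {  # small undirected graph as adjacency lists
        "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"],
    }

    def mapf_step_reward(current, proposed, goals):
        """current/proposed map agent -> node before/after the move; goals map agent -> target."""
        rewards = {}
        for agent, node in proposed.items():
            r = -0.1  # small step cost encourages short paths
            # Invalid move: proposed node is neither the current node nor a neighbour.
            if node != current[agent] and node not in GRAPH[current[agent]]:
                r -= 1.0
            # Vertex collision: two agents propose the same node.
            if sum(1 for other in proposed.values() if other == node) > 1:
                r -= 1.0
            # Edge (swap) collision: two agents exchange nodes in one step.
            for other, other_node in proposed.items():
                if other != agent and other_node == current[agent] and node == current[other]:
                    r -= 1.0
            # Goal bonus when an agent reaches its target node.
            if node == goals[agent]:
                r += 1.0
            rewards[agent] = r
        return rewards

    current = {"a1": "A", "a2": "D"}
    proposed = {"a1": "B", "a2": "B"}  # both agents try to enter node B
    print(mapf_step_reward(current, proposed, goals={"a1": "D", "a2": "A"}))
    ```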