Checkers DQN & PPO

Jan 2025 — May 2025

Apex Inc.

Overview

In this project, I developed AI agents that learn to play Checkers using two distinct reinforcement learning algorithms: Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO). The goal was to compare value-based and policy-based approaches in a moderately complex, strategic environment.

Environment Setup

The game logic and board environment were implemented in custom Python code and wrapped in a Gym-compatible interface for training. The state representation captured piece positions, king status, and turn information, encoded as structured input for the agents.

DQN Agent

The DQN agent used a convolutional or MLP-based Q-network to estimate action values from board states. Experience replay and a target network were used to stabilize learning.

PPO Agent

The PPO agent followed an actor-critic framework, learning a stochastic policy and a value function. It was optimized with a clipped surrogate objective to keep policy updates stable.

Training & Evaluation

Agents were trained in self-play, improving iteratively by facing progressively stronger versions of themselves. Performance was evaluated by win rate, average reward, and the strategic depth of gameplay over time.
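The write-up does not show the exact state encoding, but a minimal sketch of the piece-plane representation described above (piece positions, king status, and turn) might look like the following; `encode_state` and the integer piece codes are illustrative, not the project's actual interface:

```python
import numpy as np

# Hypothetical piece codes for an 8x8 checkers board:
# 0 empty, 1 own man, 2 own king, -1 opponent man, -2 opponent king.
def encode_state(board: np.ndarray, turn: int) -> np.ndarray:
    """Encode a board into a (5, 8, 8) float tensor: one binary plane
    per piece type, plus a constant plane for the side to move."""
    assert board.shape == (8, 8)
    planes = np.zeros((5, 8, 8), dtype=np.float32)
    planes[0] = (board == 1)   # own men
    planes[1] = (board == 2)   # own kings
    planes[2] = (board == -1)  # opponent men
    planes[3] = (board == -2)  # opponent kings
    planes[4] = float(turn)    # 1.0 if it is our move, else 0.0
    return planes
```

A plane-per-piece-type layout like this feeds naturally into either a convolutional or a flattened MLP Q-network.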
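For the DQN side, the two stabilization tricks mentioned above (experience replay and a target network) can be sketched as follows. This is a framework-agnostic illustration with NumPy, not the project's training code; `q_targets` computes the standard bootstrapped target r + γ · max over a' of Q_target(s', a'):

```python
import random
from collections import deque
import numpy as np

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done)
    transitions; sampling uniformly breaks temporal correlation."""
    def __init__(self, capacity: int):
        self.buf = deque(maxlen=capacity)

    def push(self, transition):
        self.buf.append(transition)

    def sample(self, batch_size: int):
        return random.sample(self.buf, batch_size)

    def __len__(self):
        return len(self.buf)

def q_targets(batch, q_target_net, gamma=0.99):
    """DQN regression targets: r + gamma * max_a' Q_target(s', a'),
    with the bootstrap term dropped on terminal transitions. The
    frozen target network keeps these targets from chasing the
    online network's own updates."""
    ys = []
    for (s, a, r, s2, done) in batch:
        y = r if done else r + gamma * np.max(q_target_net(s2))
        ys.append(y)
    return np.array(ys, dtype=np.float32)
```

In practice the target network is a periodically synced copy of the online Q-network; here `q_target_net` is just any callable mapping a state to a vector of action values.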
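The clipped surrogate objective used by the PPO agent can be written out directly. This is the standard PPO-clip loss rather than the project's specific implementation; it maximizes E[min(r·A, clip(r, 1−ε, 1+ε)·A)] where r is the new-to-old policy probability ratio and A the advantage estimate:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate objective, returned negated so it can be
    minimized. Clipping the probability ratio to [1-eps, 1+eps] caps
    how far a single update can move the policy, which is what keeps
    PPO's policy updates stable."""
    ratio = np.exp(logp_new - logp_old)          # r = pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))
```

When the new policy equals the old one the ratio is 1 and the loss reduces to minus the mean advantage; large ratio moves are cut off by the clip.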

DQN showed faster convergence in early training but struggled with exploration and long-term planning. PPO demonstrated more stable learning and better generalization, especially in complex board states requiring deeper foresight. Self-play was key to avoiding overfitting and discovering robust strategies.
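One common way to run the self-play scheme described above is to pit the training agent against a frozen snapshot of itself and only promote a new snapshot once its win rate clears a threshold, so opponents get steadily stronger. A minimal sketch of that promotion gate, with a hypothetical caller-supplied `play_game` returning +1/0/−1 for win/draw/loss:

```python
def selfplay_gate(play_game, n_games=100, threshold=0.55):
    """Self-play promotion gate: play `n_games` evaluation games
    against the frozen snapshot and report whether the current agent's
    win rate is high enough to replace it. `play_game` is an assumed
    interface returning +1 (win), 0 (draw), or -1 (loss)."""
    wins = sum(1 for _ in range(n_games) if play_game() == 1)
    win_rate = wins / n_games
    return win_rate, win_rate >= threshold
```

Requiring a margin above 50% before promotion avoids churning snapshots on noise, which is one simple guard against the overfitting the self-play setup is meant to prevent.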

" Life's most persistent and urgent question is, 'What are you doing for others?' "

— Martin Luther King Jr.
