The goal of this project is to train an autonomous wheel loader to perform stable, repeatable
scooping motions under granular contact. The emphasis is on bucket fill consistency,
actuator smoothness, and robustness to pile variation.
Primary objective: Consistent scooping behavior
Task metric: Bucket fill proxy
Stability metric: Torque / oscillation penalties
Generalization metric: Success across pile variations
Introduction
Wheel loaders are central to construction and mining operations, executing repetitive
scooping and loading cycles in highly unstructured environments. Automating these tasks
is challenging due to articulated dynamics, underactuated vehicle motion, and
contact-rich interaction with granular material.
This project explores whether reinforcement learning can produce robust scooping policies
in a simulation-first setting using Isaac Sim / IsaacLab, providing a testbed
for future sim-to-real transfer.
System Pipeline
High-fidelity articulated loader simulation with granular rock piles.
Scripted/Bézier motion baselines for debugging and comparison.
PPO-style RL with reward shaping for fill, smoothness, and safety.
Evaluation across randomized pile configurations.
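The scripted Bézier baselines above can be sketched as a single cubic Bézier segment through bucket-tip waypoints; the waypoint coordinates here are hypothetical placeholders, not the project's actual values.

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    t = np.asarray(t)[..., None]  # broadcast t against 2D control points
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

# Hypothetical waypoints for a scoop in the loader's sagittal (x, z) plane:
# approach low, penetrate the pile, curl upward, and retract lifted.
p0 = np.array([0.0, 0.3])   # bucket tip at pile base
p1 = np.array([0.6, 0.1])   # drive into the pile
p2 = np.array([1.0, 0.5])   # begin curling upward
p3 = np.array([1.2, 1.0])   # lifted, filled bucket

ts = np.linspace(0.0, 1.0, 50)
trajectory = cubic_bezier(p0, p1, p2, p3, ts)  # (50, 2) array of tip targets
```

A baseline like this gives a deterministic, tunable motion to compare the learned policy against and to debug the environment before training.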
Methods
Simulation Environment
Custom IsaacLab environment with articulated arm and bucket joints.
Granular piles modeled using rigid bodies / particle-like approximations.
Joint limits, actuator constraints, and PD control integrated.
Actions: continuous joint commands for arm and bucket actuation.
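The action-to-torque path described above (continuous joint commands tracked by PD control under actuator limits) can be sketched as follows; the gains and torque limits are illustrative assumptions, not the project's tuned values.

```python
import numpy as np

def pd_torque(q, qd, q_target, kp, kd, tau_limit):
    """PD law mapping target joint positions to clamped joint torques."""
    tau = kp * (q_target - q) - kd * qd
    return np.clip(tau, -tau_limit, tau_limit)  # respect actuator constraints

# Hypothetical gains and limits for two joints (arm, bucket).
kp = np.array([400.0, 200.0])
kd = np.array([40.0, 20.0])
tau_limit = np.array([250.0, 120.0])

q = np.array([0.1, -0.2])      # current joint positions (rad)
qd = np.array([0.0, 0.5])      # current joint velocities (rad/s)
action = np.array([0.3, 0.1])  # policy output: target joint positions

tau = pd_torque(q, qd, action, kp, kd, tau_limit)
```

Clamping at the actuator limit keeps the learned policy from commanding torques the physical machine could not deliver, which matters for eventual sim-to-real transfer.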
Reward Design
Positive reward for payload retained in the bucket.
Penalties for excessive torque, oscillations, and unsafe contacts.
Terminal penalties for failed scooping attempts.
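The shaping terms above combine into a scalar per-step reward roughly like the sketch below; the weight values are placeholders for illustration, not the coefficients used in training.

```python
import numpy as np

def scoop_reward(fill_fraction, tau, tau_prev, unsafe_contact,
                 w_fill=1.0, w_torque=1e-4, w_smooth=1e-3, w_unsafe=0.5):
    """Shaped reward: payload fill minus torque, oscillation, and safety
    penalties. All weights are illustrative, not tuned project values."""
    r_fill = w_fill * fill_fraction                      # payload retained in bucket
    p_torque = w_torque * np.sum(tau ** 2)               # discourage large torques
    p_smooth = w_smooth * np.sum((tau - tau_prev) ** 2)  # discourage oscillation
    p_unsafe = w_unsafe * float(unsafe_contact)          # penalize unsafe contacts
    return r_fill - p_torque - p_smooth - p_unsafe
```

Penalizing the torque difference between consecutive steps, rather than torque alone, is one common way to target oscillation directly.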
Results
In simulation, the learned policy achieved smoother and more repeatable scooping motions
than scripted baselines. The policy reduced abrupt reversals during pile penetration and
exhibited improved stability across moderate pile variations.
Improved consistency in bucket fill across evaluation episodes.
Reduced torque spikes compared to baseline motions.
Better tolerance to pile shape variation.
Discussion
This project highlights the importance of reward design and stable contact modeling
in learning contact-rich manipulation behaviors. While the policy performs well within
the training regime, generalization to wider pile distributions and real hardware
remains an open challenge.
Future work includes stronger domain randomization, perception-conditioned policies,
and hierarchical task decomposition for full loading cycles.
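The domain randomization mentioned above amounts to resampling pile parameters per episode; a minimal sketch, with entirely hypothetical parameter names and ranges, might look like this.

```python
import numpy as np

def sample_pile_config(rng):
    """Sample one randomized pile configuration (illustrative ranges only)."""
    return {
        "num_rocks": int(rng.integers(200, 600)),        # rock count in the pile
        "rock_radius_m": float(rng.uniform(0.03, 0.10)), # per-rock size
        "pile_height_m": float(rng.uniform(0.5, 1.2)),   # overall pile height
        "friction": float(rng.uniform(0.4, 1.0)),        # contact friction
        "repose_angle_deg": float(rng.uniform(25.0, 40.0)),  # pile slope
    }

rng = np.random.default_rng(0)
configs = [sample_pile_config(rng) for _ in range(8)]  # e.g. one per eval episode
```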
My Contribution
Designed and implemented the Isaac Sim / IsaacLab scooping environment.
Built scripted motion baselines and reward shaping.
Trained and evaluated RL policies under granular contact.
Analyzed failure modes and prepared portfolio visualizations.