The goal of this project is to train an autonomous wheel loader to perform stable, repeatable
scooping motions under granular contact. The emphasis is on bucket fill consistency,
actuator smoothness, and robustness to pile variation.
Primary objective: Consistent scooping behavior
Task metric: Bucket fill proxy
Stability metric: Torque / oscillation penalties
Generalization metric: Success across pile variations
Introduction
Wheel loaders are central to construction and mining operations, executing repetitive
scooping and loading cycles in highly unstructured environments. Automating these tasks
is challenging due to articulated dynamics, underactuated vehicle motion, and
contact-rich interaction with granular material.
This project explores whether reinforcement learning can produce robust scooping policies
in a simulation-first setting using Isaac Sim / IsaacLab, providing a testbed
for future sim-to-real transfer.
System Pipeline
High-fidelity articulated loader simulation with granular rock piles.
Scripted/Bézier motion baselines for debugging and comparison.
PPO-style RL with reward shaping for fill, smoothness, and safety.
Evaluation across randomized pile configurations.
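The scripted Bézier baselines above can be sketched as a single cubic Bézier segment through bucket-tip waypoints; the waypoint coordinates here are hypothetical placeholders, not the project's actual values.

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    t = np.asarray(t)[..., None]  # broadcast t against 2D control points
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

# Hypothetical waypoints for a scoop in the loader's sagittal (x, z) plane:
# approach low, penetrate the pile, curl upward, and retract lifted.
p0 = np.array([0.0, 0.3])   # bucket tip at pile base
p1 = np.array([0.6, 0.1])   # drive into the pile
p2 = np.array([1.0, 0.5])   # begin curling upward
p3 = np.array([1.2, 1.0])   # lifted, filled bucket

ts = np.linspace(0.0, 1.0, 50)
trajectory = cubic_bezier(p0, p1, p2, p3, ts)  # (50, 2) array of tip targets
```

A baseline like this gives a deterministic, tunable motion to compare the learned policy against and to debug the environment before training.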
Methods
Simulation Environment
Custom IsaacLab environment with articulated arm and bucket joints.
Granular piles modeled using rigid bodies / particle-like approximations.
Joint limits, actuator constraints, and PD control integrated.
Actions: continuous joint commands for arm and bucket actuation.
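The action-to-torque path described above (continuous joint commands tracked by PD control under actuator limits) can be sketched as follows; the gains and torque limits are illustrative assumptions, not the project's tuned values.

```python
import numpy as np

def pd_torque(q, qd, q_target, kp, kd, tau_limit):
    """PD law mapping target joint positions to clamped joint torques."""
    tau = kp * (q_target - q) - kd * qd
    return np.clip(tau, -tau_limit, tau_limit)  # respect actuator constraints

# Hypothetical gains and limits for two joints (arm, bucket).
kp = np.array([400.0, 200.0])
kd = np.array([40.0, 20.0])
tau_limit = np.array([250.0, 120.0])

q = np.array([0.1, -0.2])      # current joint positions (rad)
qd = np.array([0.0, 0.5])      # current joint velocities (rad/s)
action = np.array([0.3, 0.1])  # policy output: target joint positions

tau = pd_torque(q, qd, action, kp, kd, tau_limit)
```

Clamping at the actuator limit keeps the learned policy from commanding torques the physical machine could not deliver, which matters for eventual sim-to-real transfer.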
Reward Design
Positive reward for payload retained in the bucket.
Penalties for excessive torque, oscillations, and unsafe contacts.
Terminal penalties for failed scooping attempts.
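The shaping terms above combine into a scalar per-step reward roughly like the sketch below; the weight values are placeholders for illustration, not the coefficients used in training.

```python
import numpy as np

def scoop_reward(fill_fraction, tau, tau_prev, unsafe_contact,
                 w_fill=1.0, w_torque=1e-4, w_smooth=1e-3, w_unsafe=0.5):
    """Shaped reward: payload fill minus torque, oscillation, and safety
    penalties. All weights are illustrative, not tuned project values."""
    r_fill = w_fill * fill_fraction                      # payload retained in bucket
    p_torque = w_torque * np.sum(tau ** 2)               # discourage large torques
    p_smooth = w_smooth * np.sum((tau - tau_prev) ** 2)  # discourage oscillation
    p_unsafe = w_unsafe * float(unsafe_contact)          # penalize unsafe contacts
    return r_fill - p_torque - p_smooth - p_unsafe
```

Penalizing the torque difference between consecutive steps, rather than torque alone, is one common way to target oscillation directly.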
Results
In simulation, the learned policy achieved smoother and more repeatable scooping motions
than scripted baselines. The policy reduced abrupt reversals during pile penetration and
exhibited improved stability across moderate pile variations.
Improved consistency in bucket fill across evaluation episodes.
Reduced torque spikes compared to baseline motions.
Better tolerance to pile shape variation.
Discussion
This project highlights the importance of reward design and stable contact modeling
in learning contact-rich manipulation behaviors. While the policy performs well within
the training regime, generalization to wider pile distributions and real hardware
remains an open challenge.
Future work includes stronger domain randomization, perception-conditioned policies,
and hierarchical task decomposition for full loading cycles.
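The domain randomization mentioned above amounts to resampling pile parameters per episode; a minimal sketch, with entirely hypothetical parameter names and ranges, might look like this.

```python
import numpy as np

def sample_pile_config(rng):
    """Sample one randomized pile configuration (illustrative ranges only)."""
    return {
        "num_rocks": int(rng.integers(200, 600)),        # rock count in the pile
        "rock_radius_m": float(rng.uniform(0.03, 0.10)), # per-rock size
        "pile_height_m": float(rng.uniform(0.5, 1.2)),   # overall pile height
        "friction": float(rng.uniform(0.4, 1.0)),        # contact friction
        "repose_angle_deg": float(rng.uniform(25.0, 40.0)),  # pile slope
    }

rng = np.random.default_rng(0)
configs = [sample_pile_config(rng) for _ in range(8)]  # e.g. one per eval episode
```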
My Contribution
Designed and implemented the Isaac Sim / IsaacLab scooping environment.
Built scripted motion baselines and reward shaping.
Trained and evaluated RL policies under granular contact.
Analyzed failure modes and prepared portfolio visualizations.