Fixing Async `compute_old_log_prob` Error In PyTorch < 2.8
This article examines an issue in the Fully Async training framework on PyTorch versions older than 2.8.0: a shape mismatch error in `compute_old_log_prob`. We cover the root cause, the conditions under which it arises, and the fix we applied. It is particularly relevant for researchers, developers, and practitioners working on asynchronous reinforcement learning and distributed training, especially on hardware accelerators such as NPUs with PyTorch builds predating 2.8.0.
Understanding the compute_old_log_prob Error
In Fully Async training, computing `old_log_prob` is crucial for stability and convergence. The idea is to compare the log probabilities of actions under the current policy with those under an older policy, often called the rollout policy. This comparison underpins techniques like Proximal Policy Optimization (PPO), where the ratio between the two probabilities constrains policy updates, preventing drastic changes that could destabilize training. The snippet below shows the relevant logic in the Fully Async framework. Let's break down the relevant sections:
```python
if async_training and async_training.use_rollout_log_probs:
    # If local_trigger_step == 1, load the training engine's parameters to the CPU
    # and save a copy for subsequent MIS use.
    # If local_trigger_step == 2, 3, ..., restore the parameters of version 1 to
    # calculate the old_log_prob, then restore the parameters of the current version.
    if local_trigger_step == 1:
        self.actor_rollout_wg.save_model_to_cpu(1)
        batch = compute_old_log_prob(batch)
    elif local_trigger_step is not None:
        self.actor_rollout_wg.save_model_to_cpu(local_trigger_step)
        self.actor_rollout_wg.restore_model_from_cpu(1)
        batch = compute_old_log_prob(batch)
        self.actor_rollout_wg.restore_model_from_cpu(local_trigger_step)
        self.actor_rollout_wg.clear_cpu_model(local_trigger_step)
    else:
        batch.batch["old_log_probs"] = batch.batch["rollout_log_probs"]
        batch.meta_info["temperature"] = self.config.actor_rollout_ref.rollout.temperature
```
The behavior hinges on `local_trigger_step`, which dictates when and how the old log probabilities are computed. When `local_trigger_step` is 1, the current model parameters are saved to the CPU and `compute_old_log_prob` is called directly. On subsequent steps (`local_trigger_step` of 2, 3, ...), the current parameters are first saved, the version-1 parameters are restored to compute `old_log_prob`, and the current parameters are then restored and the temporary CPU copy cleared. When there is no trigger step, the rollout log probabilities are reused directly as `old_log_probs`. This mechanism lets the current policy's actions be compared against a consistent older policy. In PyTorch versions below 2.8.0, however, this process hit a shape mismatch error because of how parameter sharding was handled.
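To make the role of `old_log_probs` concrete, here is a minimal sketch of the PPO clipped objective (illustrative only, not the framework's actual loss code). It shows why every entry of `old_log_probs` must come from one consistent older policy: the importance ratio below compares the current policy against exactly that policy.

```python
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Importance ratio pi_theta(a|s) / pi_old(a|s), computed in log space
    # for numerical stability.
    ratio = torch.exp(log_probs - old_log_probs)
    # Clipping the ratio bounds how far a single update can move the policy.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the minimum of the two terms; the mean is negated so the
    # result can be minimized with a standard optimizer.
    return -torch.min(unclipped, clipped).mean()
```

If `old_log_probs` were computed under drifting parameter versions, this ratio would mix policies and the clipping range would no longer bound the update in a meaningful way.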
The Root Cause: Parameter Sharding in Older PyTorch Versions
The shape mismatch stems from a parameter sharding behavior in PyTorch versions prior to 2.8.0. Parameter sharding is a distributed-training technique that splits model parameters across multiple devices (e.g., GPUs or NPUs), making it possible to train models that would not fit on a single device. In the affected versions, certain parameters, notably those in normalization layers such as `model.norm.weight`, could be left unsharded after a forward pass: although the parameter started out sharded, the forward computation left it in its full, unsharded shape instead of resharding it. When the code later calls `self.actor_rollout_wg.restore_model_from_cpu(local_trigger_step)`, the copy saved on the CPU still has the original sharded shape, while the live parameter has the full shape, and the restore fails with a shape mismatch.
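The failure mode can be reproduced in miniature without any distributed setup; the shapes below are made up purely for illustration.

```python
import torch

# Suppose model.norm.weight is a 4096-dim vector sharded 4 ways, so this
# rank saved a 1024-element slice to the CPU while the parameter was sharded.
saved_shard = torch.zeros(1024)

# In PyTorch < 2.8.0, the live parameter could be left in its full,
# unsharded shape after a forward pass.
live_param = torch.zeros(4096)

# Restoring the saved slice into the full-size parameter then fails:
live_param.copy_(saved_shard)  # RuntimeError: size mismatch (4096 vs. 1024)
```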
This issue was first observed on NPUs with torch-npu==2.7.1, the specific environment where the error surfaced. It manifests during parameter restoration: the model saved on the CPU retains the original sharded shapes, while the model on the NPU holds unsharded parameters after the forward pass, so the `compute_old_log_prob` path fails as soon as the incompatible tensors are copied.
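When upgrading is not immediately possible, a diagnostic like the following can at least surface the problem with a readable report before the restore is attempted. This is a hypothetical helper, not part of the framework; it assumes the CPU copy is available as an ordinary name-to-tensor state dict.

```python
def find_shape_mismatches(model, saved_state_dict):
    """Return (name, saved_shape, live_shape) for every parameter whose
    live shape no longer matches the shape that was saved to the CPU."""
    mismatches = []
    for name, param in model.named_parameters():
        saved = saved_state_dict.get(name)
        if saved is not None and tuple(saved.shape) != tuple(param.shape):
            mismatches.append((name, tuple(saved.shape), tuple(param.shape)))
    return mismatches

# Example output on an affected setup (illustrative):
# [("model.norm.weight", (1024,), (4096,))]
```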
The Solution: Upgrading to PyTorch 2.8.0 or Later
The most direct and effective fix is to upgrade to PyTorch 2.8.0 or later, where this sharding behavior was corrected so that parameters keep their sharded shape through the forward pass and subsequent operations. Upgrading removes the root cause, so no shape mismatch can occur during model restoration. In our case, moving from torch-npu==2.7.1 to a build based on PyTorch 2.8.0 or higher resolved the issue. This highlights the value of staying current with PyTorch releases, which often carry bug fixes and performance improvements that directly affect training stability and efficiency.
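If a codebase must run on both old and new PyTorch builds, a simple guard can route around the fragile path on affected versions, falling back to the rollout log probabilities just as the `else` branch in the snippet above does. This is only a sketch; the flag name is ours.

```python
import torch
from packaging import version

# Strip any local build suffix such as "+cu121" before comparing.
_TORCH_VERSION = version.parse(torch.__version__.split("+")[0])

# Only take the save/restore path on versions with the sharding fix;
# otherwise fall back to reusing the rollout log probs directly.
use_param_restore_path = _TORCH_VERSION >= version.parse("2.8.0")
```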
While upgrading PyTorch is the recommended solution, it's worth noting that alternative workarounds might exist depending on the specific setup and constraints. For instance, one could potentially modify the model architecture to avoid using normalization layers that exhibit this unsharding behavior. However, this approach might not be feasible or desirable in all cases, as it could impact model performance or require significant code changes. Therefore, upgrading PyTorch remains the most reliable and straightforward solution.
Implications for Asynchronous Training and Distributed Reinforcement Learning
This issue and its resolution have important implications for asynchronous training and distributed reinforcement learning. Asynchronous training is a paradigm where multiple actors or agents interact with the environment concurrently, collecting data and updating the policy in parallel. This can significantly speed up training compared to synchronous approaches. However, asynchronous methods introduce complexities related to maintaining consistency and stability, especially when dealing with distributed training across multiple devices. The compute_old_log_prob error highlights one such complexity, demonstrating how subtle differences in framework behavior can lead to significant issues in distributed asynchronous training setups.
The fact that this error surfaced specifically in the context of Fully Async training underscores the challenges associated with this advanced training paradigm. Fully Async aims to maximize training efficiency by decoupling the actor and learner processes, allowing them to operate independently and asynchronously. While this approach can lead to significant speedups, it also requires careful handling of synchronization and data consistency. The shape mismatch error we encountered serves as a reminder of the importance of thoroughly testing and validating asynchronous training implementations, particularly when using older versions of deep learning frameworks.
Conclusion
In conclusion, the compute_old_log_prob shape mismatch error encountered in Fully Async training with PyTorch versions below 2.8.0 highlights the importance of understanding the nuances of deep learning frameworks and their behavior in distributed settings. The root cause lies in the parameter sharding behavior of older PyTorch versions, where parameters could become unsharded after a forward pass, leading to inconsistencies during model restoration. The recommended solution is to upgrade to PyTorch 2.8.0 or later, which addresses this issue. This experience underscores the need for careful consideration of framework versions and their potential impact on training stability and efficiency, especially in advanced training paradigms like asynchronous reinforcement learning.
For more information on PyTorch and its capabilities, please visit the official PyTorch website (pytorch.org).