Reward Phase Hybrid PPO · Visual Diagnostics
Figure 01
Interpretation
These figures summarize training behavior under stress: return trends, entropy regulation, and stability indicators. The goal is to detect collapse modes early and verify phase-robust adaptation.
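As a rough illustration of what early collapse detection on an entropy curve could look like, the sketch below smooths a logged policy-entropy series with a trailing mean and reports the first step at which the smoothed value drops below a floor. The function names, the window size, and the floor value are all hypothetical choices made for this example; they are not taken from the project's diagnostics.

```python
from statistics import mean


def rolling_mean(values, window):
    """Trailing mean over at most the last `window` items at each step."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        out.append(mean(values[start:i + 1]))
    return out


def first_collapse_step(entropies, window=3, floor=0.1):
    """Return the first step whose smoothed entropy is below `floor`.

    `window` and `floor` are illustrative defaults (assumptions), not
    tuned values. Steps with fewer than `window` samples are skipped so
    a single early dip does not trigger the check. Returns None if the
    smoothed entropy never falls below the floor.
    """
    smoothed = rolling_mean(entropies, window)
    for step, value in enumerate(smoothed):
        if step + 1 >= window and value < floor:
            return step
    return None


if __name__ == "__main__":
    # Synthetic entropy log, invented for this example only.
    log = [1.0, 0.9, 0.8, 0.05, 0.04, 0.03, 0.02]
    print(first_collapse_step(log))
```

Smoothing before thresholding trades a few steps of detection delay for fewer false alarms on noisy entropy logs; a shorter window reacts faster but is easier to trip.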