Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning
Abstract
Option-aware Temporally Abstracted (OTA) value learning improves offline goal-conditioned reinforcement learning performance by refining the high-level policy through better advantage estimates in long-horizon settings.
Offline goal-conditioned reinforcement learning (GCRL) offers a practical learning paradigm where goal-reaching policies are trained from abundant unlabeled (reward-free) datasets without additional environment interaction. However, offline GCRL still struggles with long-horizon tasks, even with recent advances that employ hierarchical policy structures, such as HIQL. Investigating the root cause of this challenge, we make two observations: First, the performance bottleneck mainly stems from the high-level policy's inability to generate appropriate subgoals. Second, when learning the high-level policy in the long-horizon regime, the sign of the advantage signal frequently becomes incorrect. Thus, we argue that improving the value function to produce a clear advantage signal for learning the high-level policy is essential. In this paper, we propose a simple yet effective solution: Option-aware Temporally Abstracted value learning, dubbed OTA, which incorporates temporal abstraction into the temporal-difference learning process. By making the value update option-aware, the proposed learning scheme contracts the effective horizon length, enabling better advantage estimates even in long-horizon regimes. We experimentally show that the high-level policy extracted using the OTA value function achieves strong performance on complex tasks from OGBench, a recently proposed offline GCRL benchmark, including maze navigation and visual robotic manipulation environments.
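The key change described in the abstract is to the value-learning target: instead of bootstrapping from the next state, the temporal-difference update bootstraps from the state reached after a multi-step option, which contracts the effective horizon the value function must cover. The snippet below is a minimal, illustrative sketch of such an option-aware (k-step) temporally abstracted value update in a goal-conditioned, IQL/HIQL-style expectile-regression setup. All names (`GoalValueNet`, `option_aware_value_loss`, `option_len`) and the sparse reward convention (-1 per step until the goal) are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GoalValueNet(nn.Module):
    """V(s, g): a small MLP over the concatenated state and goal (illustrative)."""
    def __init__(self, obs_dim: int, goal_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, goal], dim=-1))

def expectile_loss(diff: torch.Tensor, expectile: float = 0.7) -> torch.Tensor:
    """Asymmetric L2 loss for expectile regression, as used in IQL-style value learning."""
    weight = torch.abs(expectile - (diff < 0).float())
    return (weight * diff.pow(2)).mean()

def option_aware_value_loss(
    value_net: nn.Module,
    target_net: nn.Module,
    obs: torch.Tensor,        # s_t
    obs_k: torch.Tensor,      # s_{t+k}, the state reached after one option of length k
    goals: torch.Tensor,      # g
    reached: torch.Tensor,    # 1.0 if g is reached within the option, else 0.0
    option_len: int = 8,      # k: temporal abstraction length; contracts the effective horizon
    gamma: float = 0.99,
    expectile: float = 0.7,
) -> torch.Tensor:
    """k-step TD target: bootstrap from V(s_{t+k}, g) instead of V(s_{t+1}, g)."""
    k = option_len
    with torch.no_grad():
        v_next = target_net(obs_k, goals).squeeze(-1)
        # Discounted sum of the sparse -1-per-step reward over the k option steps
        # (simplification: assumes the goal is not hit before step k).
        reward_sum = -(1.0 - gamma ** k) / (1.0 - gamma)
        target = reward_sum + (gamma ** k) * (1.0 - reached) * v_next
    v = value_net(obs, goals).squeeze(-1)
    return expectile_loss(target - v, expectile)

# Tiny usage example with random tensors standing in for an offline batch.
if __name__ == "__main__":
    obs_dim, goal_dim, batch = 4, 4, 32
    value_net = GoalValueNet(obs_dim, goal_dim)
    target_net = GoalValueNet(obs_dim, goal_dim)
    target_net.load_state_dict(value_net.state_dict())
    loss = option_aware_value_loss(
        value_net, target_net,
        torch.randn(batch, obs_dim), torch.randn(batch, obs_dim),
        torch.randn(batch, goal_dim), torch.zeros(batch),
    )
    loss.backward()
    print(float(loss))
```

Bootstrapping from `obs_k` with discount `gamma ** k` is what shrinks the effective horizon; setting `option_len = 1` recovers the ordinary one-step update.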
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Flattening Hierarchies with Policy Bootstrapping (2025)
- Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning (2025)
- Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach (2025)
- TW-CRL: Time-Weighted Contrastive Reward Learning for Efficient Inverse Reinforcement Learning (2025)
- Policy-labeled Preference Learning: Is Preference Enough for RLHF? (2025)
- VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning (2025)
- Next-Future: Sample-Efficient Policy Learning for Robotic-Arm Tasks (2025)