TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design
- Author(s)
- Cho, Geonwoo; Im, Jaegyun; Lee, Jihwan; Yi, Hojun; Kim, Sejin; Kim, Sundong
- Type
- Conference Paper
- Citation
- ICLR 2026 (The Fourteenth International Conference on Learning Representations), pp. 1-36
- Issued Date
- 2026-04-25
- Abstract
- Generalizing deep reinforcement learning agents to unseen environments remains a significant challenge. One promising solution is Unsupervised Environment Design (UED), a co-evolutionary framework in which a teacher adaptively generates tasks with high learning potential, while a student learns a robust policy from this evolving curriculum. Existing UED methods typically measure learning potential via regret, the gap between optimal and current performance, approximated solely by value-function loss. Building on these approaches, we introduce the transition-prediction error as an additional term in our regret approximation. To capture how training on one task affects performance on others, we further propose a lightweight metric called Co-Learnability. By combining these two measures, we present Transition-aware Regret Approximation with Co-learnability for Environment Design (TRACED). Empirical evaluations show that TRACED produces curricula that improve zero-shot generalization over strong baselines across multiple benchmarks. Ablation studies confirm that the transition-prediction error drives rapid complexity ramp-up and that Co-Learnability delivers additional gains when paired with the transition-prediction error. These results demonstrate how refined regret approximation and explicit modeling of task relationships can be leveraged for sample-efficient curriculum design in UED.
- Publisher
- ICLR
- Conference Place
- BL
Brazil
- URI
- https://scholar.gist.ac.kr/handle/local/33592
- Access and License
-
- File List
-
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.