OAK

TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design

Metadata Downloads
Author(s)
Cho, GeonwooIm, JaegyunLee, JihwanYi, HojunKim, SejinKim, Sundong
Type
Conference Paper
Citation
ICLR 2026 (The Fourteenth International Conference on Learning Representations), pp.1 - 36
Issued Date
2026-04-25
Abstract
Generalizing deep reinforcement learning agents to unseen environments remains a significant challenge. One promising solution is Unsupervised Environment Design (UED), a co-evolutionary framework in which a teacher adaptively generates tasks with high learning potential, while a student learns a robust policy from this evolving curriculum. Existing UED methods typically measure learning potential via regret, the gap between optimal and current performance, approximated solely by value-function loss. Building on these approaches, we introduce the transition-prediction error as an additional term in our regret approximation. To capture how training on one task affects performance on others, we further propose a lightweight metric called Co-Learnability. By combining these two measures, we present Transition-aware Regret Approximation with Co-learnability for Environment Design (TRACED). Empirical evaluations show that TRACED produces curricula that improve zero-shot generalization over strong baselines across multiple benchmarks. Ablation studies confirm that the transition-prediction error drives rapid complexity ramp-up and that Co-Learnability delivers additional gains when paired with the transition-prediction error. These results demonstrate how refined regret approximation and explicit modeling of task relationships can be leveraged for sample-efficient curriculum design in UED.
Publisher
ICLR
Conference Place
BL
브라질
URI
https://scholar.gist.ac.kr/handle/local/33592
공개 및 라이선스
  • 공개 구분공개
파일 목록
  • 관련 파일이 존재하지 않습니다.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.