OAK


Abstract: Reinforcement learning agents currently tend to overfit to their training sets and generalize poorly under small changes in their environment. To address this issue, recent studies explore regret-based curriculum learning approaches that enhance the robustness of agents. These methods aim to accelerate learning by gradually presenting agents with more challenging environments, without requiring prior domain knowledge. However, applying regret-based curriculum learning in a cooperative multi-agent setting is difficult. Unlike previous curriculum learning setups, which considered single-agent or competitive settings, in the cooperative setting each agent shares the same group reward and must account for the sub-optimal policy of the other agent. This makes it difficult to accurately estimate an agent's regret, which approximates the learning potential of an environment. In this paper, we present a sampling method suited to cooperative environments, obtained by adding an environment-diversity metric based on Hamming distance to previous sampling techniques. In experiments on the Overcooked environment, the sampling method based on minimizing the agents' return achieves better zero-shot performance than random sampling. Furthermore, the proposed metric for measuring the dissimilarity between environments effectively resolves the overfitting caused by repeatedly replaying a specific map.
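
As a rough illustration of the Hamming-distance idea mentioned in the abstract, the sketch below counts the grid cells that differ between two layouts and reports how far a candidate layout is from the closest layout already in a replay buffer. The abstract does not specify the map encoding or the exact form of the metric, so the string-grid representation and the names `hamming_distance` and `min_distance_to_buffer` are assumptions made only for this example.

```python
from typing import Iterable, Sequence


def hamming_distance(map_a: Sequence[str], map_b: Sequence[str]) -> int:
    """Count the grid cells whose tile symbol differs between two maps.

    Assumption: each map is a list of equal-length strings, one character
    per tile. This encoding is hypothetical, not taken from the paper.
    """
    same_shape = len(map_a) == len(map_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(map_a, map_b)
    )
    if not same_shape:
        raise ValueError("maps must have identical dimensions")
    return sum(
        tile_a != tile_b
        for row_a, row_b in zip(map_a, map_b)
        for tile_a, tile_b in zip(row_a, row_b)
    )


def min_distance_to_buffer(
    candidate: Sequence[str], buffer: Iterable[Sequence[str]]
) -> int:
    """Distance from a candidate map to its nearest neighbour in the buffer.

    Returns 0 for an empty buffer (an arbitrary choice for this sketch).
    """
    return min((hamming_distance(candidate, m) for m in buffer), default=0)


# Two small hypothetical layouts that differ in exactly two tiles.
layout_a = ["XXPXX", "O   O", "XDXSX"]
layout_b = ["XXPXX", "O   S", "XDXOX"]
distance = hamming_distance(layout_a, layout_b)
```

A sampler could use a score like this to favor candidate maps that are far from those already replayed, which is one plausible way to discourage repeated replay of near-identical maps.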


OAK GIST Scholar was built as part of the National Library of Korea's OAK Repository distribution project.