Towards Interpretability of GPT-Style Models in Step-by-Step Games Through First-Order Logic
- Author(s)
- Klea Lena Kovacec
- Type
- Thesis
- Degree
- Master
- Department
- Department of AI Convergence, College of Information and Computing
- Advisor
- Kim, Sundong
- Abstract
- Understanding how transformer-based large language models make decisions remains an ongoing challenge in artificial intelligence. While these models achieve impressive performance, their internal workings and reasoning processes remain opaque black boxes, especially at the attention level. In this thesis, I explore mechanistic interpretability methods and attempt to combine their findings with First-Order Logic, proposing a framework that could systematically characterize strategic reasoning in game-playing transformer models and express it formally. I focus on models trained to play games such as Othello and chess, which provide a controlled domain where the rules and optimal strategies are fully known, making them ideal for interpreting the transformer's reasoning at the attention level. I synthesize insights from circuit-level interpretability, probing methodologies, neuro-symbolic systems with First-Order Logic, and the emergence of world models to identify computational pathways, detect encoded strategy heuristics, and translate attention patterns into explicit First-Order Logic formulas.
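The translation step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual method; the threshold, the `Attends` predicate, the square names, and the toy attention matrix below are all hypothetical choices made for illustration, assuming only that salient attention weights between board positions are read off as ground FOL facts.

```python
# Hypothetical sketch: turning one attention head's pattern into
# First-Order Logic-style facts by thresholding its weights.
# The predicate name, threshold, squares, and matrix are made up.

THRESHOLD = 0.5  # assumed cutoff for a "salient" attention edge

def attention_to_fol(attn, squares, head="h0", threshold=THRESHOLD):
    """Emit Attends(head, from_sq, to_sq) facts for weights above threshold."""
    facts = []
    for i, row in enumerate(attn):
        for j, weight in enumerate(row):
            if weight >= threshold:
                facts.append(f"Attends({head}, {squares[i]}, {squares[j]})")
    return facts

# Toy 3x3 attention pattern over three Othello squares.
squares = ["d3", "c4", "e5"]
attn = [
    [0.1, 0.7, 0.2],
    [0.6, 0.1, 0.3],
    [0.2, 0.2, 0.6],
]
print(attention_to_fol(attn, squares))
```

Under these assumptions, the toy matrix yields facts such as `Attends(h0, d3, c4)`; a downstream symbolic layer could then quantify over such facts to express candidate strategy rules.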
- URI
- https://scholar.gist.ac.kr/handle/local/33853
- Fulltext
- http://gist.dcollection.net/common/orgView/200000953752
- Access & License
-
- File List
-
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.