Automated Sound-Visualized Caption System to Enhance Accessibility for Deaf and Hard-of-Hearing Users: Focus on Speech and Non-Speech Sound Nuances
- Author(s)
- 김주영
- Type
- Thesis
- Degree
- Doctor
- Department
- Graduate School, Interdisciplinary Program of Integrated Technology (Culture Technology Program)
- Advisor
- Hong, Jin-Hyuk
- Abstract
- The sounds around us create rich information, such as intent, emotion, and atmosphere, through both linguistic and paralinguistic cues. For individuals who are deaf or hard-of-hearing, these layers of auditory information are often inaccessible, highlighting the need to translate sound into visual formats. Captions have been a powerful tool in this endeavor, providing a visual means to interpret sounds and enabling users to interact with sound visually. However, conventional caption systems mainly focus on spoken words or basic sound categories, limiting their ability to convey sound nuances. This limitation restricts the depth of experience for users who rely on captions to fully engage with audio-rich content. Recent advances in artificial intelligence (AI)-based sound recognition technologies have improved caption quality by accurately identifying speech and sound classes. Still, there has been limited research on recognizing and visualizing paralinguistic expressions. Due to the scarcity of datasets and studies related to sound nuances, research on sound-visualized captions remains challenging. To address this gap, this thesis proposes sound-visualized caption systems that automatically visualize nuanced information into captions. These systems include both speech and non-speech visualizations, using typographic design and punctuation to express speech nuances, and onomatopoeic descriptions to represent non-speech sound nuances. Specifically, this work explores methodologies for mapping caption design elements to effectively convey sound nuances, techniques for recognizing these nuances, and a system architecture that generates sound-visualized captions from auditory input. This thesis not only presents a novel framework for implementing sound-visualized captions but also offers an empirical understanding of how these captions can enhance sound accessibility and viewing experiences for deaf and hard-of-hearing users. In addition, it discusses the potential benefits and challenges of sound-visualized captions, design implications for improving the proposed systems, and future research opportunities to connect with other studies on sound accessibility and bidirectional sound-text conversion.
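- The abstract describes a pipeline that recognizes speech and non-speech sound nuances and maps them to caption design elements (typography and punctuation for speech, onomatopoeia for non-speech sounds). The minimal Python sketch below illustrates what such a mapping stage could look like; the class names, thresholds, and onomatopoeia table are illustrative assumptions only and do not reproduce the thesis's actual recognition models or design rules.

```python
# Illustrative sketch only: names, thresholds, and mappings are assumptions,
# not the thesis's actual system.
from dataclasses import dataclass


@dataclass
class SpeechSegment:
    text: str
    loudness_db: float      # relative loudness of the utterance
    pitch_variation: float  # 0..1, rough proxy for emotional emphasis


@dataclass
class NonSpeechEvent:
    label: str              # e.g. "dog_bark", "door_slam"


# Hypothetical mapping from non-speech sound classes to onomatopoeic descriptions.
ONOMATOPOEIA = {
    "dog_bark": "Woof woof!",
    "door_slam": "BANG!",
    "rain": "pitter-patter...",
}


def style_speech(seg: SpeechSegment) -> str:
    """Map speech nuances to typographic cues (capitalization, punctuation)."""
    text = seg.text
    if seg.loudness_db > 6.0:          # assumed threshold for "loud" speech
        text = text.upper()            # loudness -> capital letters
    if seg.pitch_variation > 0.7:      # assumed threshold for strong emphasis
        text = text.rstrip(".") + "!"  # emphasis -> exclamation mark
    return text


def caption_non_speech(event: NonSpeechEvent) -> str:
    """Represent a non-speech sound with an onomatopoeic description."""
    word = ONOMATOPOEIA.get(event.label, event.label.replace("_", " "))
    return f"[{word}]"


if __name__ == "__main__":
    print(style_speech(SpeechSegment("watch out", loudness_db=9.0, pitch_variation=0.9)))
    print(caption_non_speech(NonSpeechEvent("door_slam")))
```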
- URI
- https://scholar.gist.ac.kr/handle/local/18955
- Fulltext
- http://gist.dcollection.net/common/orgView/200000826500