Identification of cell-type-specific, transcriptionally active transposable elements using long-read RNA-sequencing data-based comprehensive annotation
- Author(s)
- Lim, Chaemin; An, Hyunsu; Park, Jihwan
- Type
- Article
- Citation
- Genomics and Informatics, v.23, no.1
- Issued Date
- 2025-12
- Abstract
- Background: The biological functions of transposable element (TE)-derived transcripts during physiological development, disease development, and progression have been previously reported. However, research on locus-specific TE-derived transcript expression in various human cell types remains limited. Methods: We processed 2596 publicly available human long-read RNA-sequencing (LR RNA-seq) datasets covering 21 organs and 71 cell lines in both healthy individuals and diseased patients with various conditions to compile this TE-derived transcript annotation. We established a pipeline for assembling transcripts containing TE sequences to measure transcriptionally active TE-derived transcripts in diverse tissues and cell types. Next, we applied our TE annotation to the Genotype-Tissue Expression (GTEx) single-cell RNA-sequencing (scRNA-seq) data from eight tissues. Results: We constructed the first transcriptome-based TE annotation using massive amounts of human LR RNA-seq data for use as a comprehensive reference to detect locus-specific TE-derived transcripts. Our annotation showed better detection accuracy for TE-derived transcripts than the RepeatMasker and GENCODE nonTE gene annotations. This annotation enabled the identification of novel TE-derived transcripts and their isoforms. We also identified alternative transcription end sites for long noncoding genes and confirmed previously annotated TE-nonTE gene fusion transcripts. Next, we applied our TE-derived transcript annotation to public scRNA-seq data from various human tissues and identified several cell-type-specific TE-derived transcripts in a locus-specific manner. Conclusions: We generated a comprehensive, TE-derived transcript annotation using large-scale, LR RNA-seq data. Researchers can use our TE reference annotation to analyze active TE transcripts and their splicing isoforms in specific transcriptome datasets and to detect de novo TE transcripts. The discovery of cell-type-specific TE-derived transcripts may help explain mechanisms underlying the maintenance of cellular identity and provide new insights into the pathological mechanisms of various diseases. © 2025 Elsevier B.V., All rights reserved.
- Publisher
- BioMed Central Ltd
- ISSN
- 1598-866X
- DOI
- 10.1186/s44342-025-00048-1
- URI
- https://scholar.gist.ac.kr/handle/local/32007
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.