Comprehensive mapping of activated transposable elements (TEs) using public long-read RNA sequencing data
- Abstract
- Transposable elements (TEs) are highly repetitive DNA sequences that comprise more than half of the human genome. Biological functions of TE-derived transcripts in development, disease, and evolution have been reported. However, a transcriptome-guided annotation of TE-derived transcripts in the human genome is still lacking because of several technical limitations of short-read RNA sequencing (SR RNA-seq). To address this problem, I compiled comprehensive annotations of TE-derived transcripts from 2,605 publicly available long-read RNA-sequencing (LR RNA-seq) datasets. The annotations were constructed with a pipeline that includes an in-house TE filtering tool. These novel TE annotations were examined and validated for diverse transcript characteristics. Canonical splice acceptor and donor sites were validated through consensus sequence motif analysis. Spliced isoforms of full-length TE transcripts were identified. In addition, precise alternative polyadenylation sites of known TE-derived transcripts, including long non-coding RNAs (lncRNAs), were characterized. I also validated previously reported fusion transcripts between TE and non-TE genes, including known chimeric transcripts from onco-exaptation events such as LOR1a-IRF5 and L1PA2-XCL1. Next, I analyzed single-cell RNA-sequencing (scRNA-seq) data of 174,419 cells from various human tissues using the novel TE annotations, which revealed dozens of cell type-specific TE-derived genes and isoforms in a locus-specific manner. Moreover, TE-based clustering and loop analysis of the scRNA-seq data suggested potential regulatory functions of TE-derived transcripts in maintaining cellular identity. I expect that my long-read-based TE annotations will be useful for accurate quantification and characterization of onco-exaptation events in both long- and short-read RNA-seq experiments.
Furthermore, my novel TE-derived transcript annotations can reveal previously unknown cell type-specific TE-derived RNA species transcribed from intergenic regions, which can provide valuable information about the cellular heterogeneity of complex human tissues. In conclusion, these annotations will provide new biological insight into the previously unknown functions of TE-derived transcripts.
- Author(s)
- Chaemin Lim
- Issued Date
- 2022
- Type
- Thesis
- URI
- https://scholar.gist.ac.kr/handle/local/19021
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.