Band Splitting-Based Online Music Source Separation with Low Latency
- Abstract
- Music source separation has advanced rapidly with the development of deep learning and neural network architectures. Band-split RNN (BSRNN), one of the recent models, achieves superior performance by employing band splitting and dual-path recurrent layers. For online separation, however, there is a trade-off between separation performance and latency. Moreover, model capacity and complexity are crucial factors for real-time applications. In this paper, we extend BSRNN to an online separation model. We adopt a structure that combines single-input-multi-output (SIMO) for separation with single-input-single-output (SISO) for enhancement. Additionally, we propose a finetuning strategy that utilizes the training songs themselves, which are expected to retain characteristics similar to those of the evaluation set. Experimental results show that our model outperforms a recent real-time model that employs additional data, and that the finetuning stage improves SDR by 0.2 dB.
- Author(s)
- Joohye Son
- Issued Date
- 2024
- Type
- Thesis
- URI
- https://scholar.gist.ac.kr/handle/local/18963
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.