A Lightweight Two‑Branch Architecture for Multi‑Instrument Transcription via Note‑Level Contrastive Clustering

By: Ruigang Li and Yongxu Zhu
Open Access
|Apr 2026

Abstract

Existing multi‑timbre transcription models generalize poorly beyond the instruments they were trained on, impose rigid constraints on the number of sources, and carry computational costs that hinder deployment on low‑resource devices. We address these limitations with a lightweight model that extends a timbre‑agnostic transcription backbone with a dedicated timbre encoder and performs deep clustering at the note level, enabling joint transcription and dynamic separation of arbitrary instruments given a specified number of instrument classes. Practical optimizations, including spectral normalization, dilated convolutions, and contrastive clustering, further improve efficiency and robustness. Despite its small size and fast inference, the model is competitive with heavier baselines in transcription accuracy and separation quality and shows promising generalization, making it well suited to real‑world deployment in resource‑constrained settings.
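
To make the note‑level clustering idea concrete, the following is a minimal sketch of how per‑note timbre embeddings might be grouped into a specified number of instrument classes. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, so spherical k‑means is swapped in as a stand‑in, and the names `cluster_notes`, `note_embeddings`, and `num_instruments` are hypothetical. The embeddings are assumed to come from a timbre encoder, one vector per transcribed note.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def cluster_notes(note_embeddings, num_instruments, iters=20):
    """Assign each note embedding to one of `num_instruments` clusters.

    Assumption: spherical k-means stands in for the paper's clustering step.
    Initialization is deterministic (farthest-point) so results are repeatable.
    """
    points = [l2_normalize(v) for v in note_embeddings]

    # Farthest-point init: start from the first note, then repeatedly add
    # the note that is least similar to every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < num_instruments:
        farthest = min(points, key=lambda p: max(cosine(p, c) for c in centroids))
        centroids.append(farthest)

    labels = None
    for _step in range(iters):
        # Assignment step: each note goes to its most similar centroid.
        new_labels = [
            max(range(num_instruments), key=lambda k: cosine(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels

        # Update step: each centroid becomes the normalized mean of its notes.
        for k in range(num_instruments):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[k] = l2_normalize(mean)
    return labels


# Toy 2-D "embeddings" for five notes (made-up values for illustration only).
notes = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8], [1.0, 0.0]]
print(cluster_notes(notes, num_instruments=2))  # → [0, 0, 1, 1, 0]
```

Because the number of clusters is an argument rather than a fixed property of the network, the same embeddings can be regrouped for a different instrument count without retraining, which is the kind of flexibility the abstract attributes to specifying the number of instrument classes at inference time.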

DOI: https://doi.org/10.5334/tismir.300 | Journal eISSN: 2514-3298
Language: English
Submitted on: Jun 28, 2025
Accepted on: Mar 25, 2026
Published on: Apr 15, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Ruigang Li, Yongxu Zhu, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.