
Figure 1
Overview of the RWC Music Database. Illustration includes a comic‑style portrait of Masataka Goto, generated using OpenAI’s DALL·E model via ChatGPT.

Figure 2
Citation counts from 2003 to 2024 for Goto et al. (2002), Goto et al. (2003), and Goto (2006).
Table 1
Key developments in the evolution of the RWC Music Database.
| Year | Comment | Sub‑Coll. |
|---|---|---|
| 2000/01 | Dataset design and recordings | |
| 2002 | RWC presentation (Goto et al., 2002) | {P,R,C,J} |
| 2003 | RWC extension (Goto et al., 2003) | {G,I} |
| 2006 | AIST Annotations (Goto, 2006) | |
| 2009 | Music alignment (Ewert et al., 2009) | C |
| 2011 | Chord annotations (Cho and Bello, 2011) | P |
[i] Abbreviations: P = Pop, R = Royalty‑Free, C = Classical, J = Jazz, G = Genre, I = Instruments.
Table 2
Selection of key works building upon the RWC Music Database.
| Task @ Venue | Sub‑Collections | Citation |
|---|---|---|
| Chorus‑Section Detection @ ICASSP | P | (Goto, 2003) |
| Drum Sound Detection @ ISMIR | {P,I} | (Yoshii et al., 2004) |
| Musical Interfaces @ ISMIR | {P,R,C,J,G} | (Goto and Goto, 2005) |
| Singer Identification @ ISMIR | P | (Fujihara et al., 2005) |
| Instrument Identification @ ISMIR | {C,I} | (Kitahara et al., 2005) |
| Drum Loop Retrieval @ CBMI | P | (Gillet and Richard, 2005) |
| Audio Retrieval @ ISMIR | {P,R,C,J} | (Bertin and de Cheveigné, 2005) |
| Music Source Separation @ GRETSI | C | (Vincent and Gribonval, 2005) |
| Audio Melody Extraction @ WASPAA | G | (Ryynänen and Klapuri, 2005) |
| Automatic Mixing @ AXMEDIS | P | (Katayose et al., 2005) |
| Drum Sound Detection @ ICASSP | P | (Yoshii et al., 2006) |
| Lyrics Alignment @ ISM | P | (Fujihara et al., 2006) |
| Pitch Estimation @ ICASSP | P | (Fujihara et al., 2006) |
| Music Structure Analysis @ ISMIR | {P,J} | (Bruderer et al., 2006) |
| Musical Interfaces @ ISMIR | J | (Hamanaka, 2006) |
| Audio Coding @ TASLP | {C,J} | (Derrien et al., 2006) |
| Automatic Music Transcription @ ISMIR | P | (Ryynänen and Klapuri, 2006) |
| Genre Classification @ ISMIR | RWC | (Reed and Lee, 2006) |
| Music Structure Analysis @ AMCMM | P | (Paulus and Klapuri, 2006) |
| Pitch Estimation @ TASLP | {C,J} | (Kameoka et al., 2007) |
| Singing Voice Retrieval @ ISMIR | P | (Fujihara and Goto, 2007) |
| Thumbnail Image Generation @ ISMIR | G | (Yoshii and Goto, 2008) |
| Singing Synthesis @ SMC | P | (Nakano and Goto, 2009) |
| Beat Tracking @ TASLP | {P,C,J} | (Grosche and Müller, 2011) |
| Singer Identification @ ISMIR | P | (Lagrange et al., 2012) |
| Music Source Separation @ ICML | P | (Yoshii et al., 2013) |
| Singing Voice Detection @ ISMIR | P | (Lehner et al., 2013) |
| Pitch Estimation (pYIN) @ ICASSP | P (synth.) | (Mauch and Dixon, 2014) |
| Music Mashup @ TASLP | P | (Davies et al., 2014) |
| Chord Estimation @ ISMIR | P | (Zhou and Lerch, 2015) |
| Beat Tracking @ ISMIR | P | (Böck et al., 2016) |
| Melody Harmonization @ ISMIR | P | (Tsushima et al., 2017) |
| Lyrics Transcription @ ICASSP | P | (Nishikimi et al., 2019) |
| Singing Voice Separation @ WASPAA | P | (Nakano et al., 2019) |
| Audio Declipping @ TASLP | {P,J,C} | (Gaultier et al., 2021) |
| Singing Voice Extraction @ Electronics | P | (Gao et al., 2021) |
| Lyrics Generation @ ISMIR | P | (Watanabe and Goto, 2023) |
| Music Generation @ IJCAI | P | (Lin et al., 2024) |
Table 3
Overview of the five sub‑collections of the RWC Music Database.
| ID | Sub‑Collection | #Pieces | #CDs | #Tracks | Dur. |
|---|---|---|---|---|---|
| RWC‑P | Popular Music | 100 | 7 | 100 | 6:43:36 |
| RWC‑R | Royalty‑Free Music | 15 | 1 | 15 | 0:32:23 |
| RWC‑C | Classical Music | 50 | 6 | 61 | 5:27:08 |
| RWC‑J | Jazz Music | 50 | 4 | 50 | 3:42:20 |
| RWC‑G | Genre Mix | 100 | 9 | 102 | 6:58:26 |
| | Total | 315 | 27 | 328 | 23:23:55 |
[i] Abbreviations: P = Pop, R = Royalty‑Free, C = Classical, J = Jazz, G = Genre.

Figure 3
Track duration distributions for the five RWC sub‑collections. Tracks containing singing voice are highlighted with black overlays.

Figure 4
Tempo distribution of the RWC Popular Music sub‑collection, showing the tempo range in beats per minute (BPM) across all tracks.

Figure 5
Number of note events per collection and instrument group (top 10).

Figure 6
Two audio‑MIDI alignment examples illustrating linear time scaling (LTS) and non‑linear time warping (NTW) cases. The background shows the chroma‑based cost matrix; the red line indicates the alignment path computed with the SyncToolbox (Müller et al., 2021), while the dotted cyan line shows the corresponding linear fit. (a) LTS case: Musette in D major by Bach, performed on harpsichord (RWC_C024C). (b) NTW case: Prelude and Liebestod from Wagner’s Tristan und Isolde, performed by a symphony orchestra (RWC_C009).
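The alignment path shown in Figure 6 is computed with the SyncToolbox (Müller et al., 2021). As a rough illustration of the underlying principle only (plain NumPy, not the toolbox's actual, far more efficient implementation), the following sketch recovers an optimal warping path from a precomputed cost matrix via dynamic time warping:

```python
import numpy as np

def dtw_path(cost):
    """Minimal dynamic time warping on a precomputed cost matrix.

    cost: (M, N) matrix, e.g. pairwise distances between the chroma
    frames of an audio recording and those of a synthesized MIDI file.
    Returns the optimal alignment path as a list of (i, j) index pairs.
    """
    M, N = cost.shape
    # Accumulated cost matrix with a padded border of infinities.
    D = np.full((M + 1, N + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            D[i, j] = cost[i - 1, j - 1] + min(
                D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]
            )
    # Backtrack from the end to recover the optimal path.
    path, i, j = [(M - 1, N - 1)], M, N
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda s: D[s])
        path.append((i - 1, j - 1))
    return path[::-1]
```

For a cost matrix whose diagonal is cheapest, the recovered path is the main diagonal, corresponding to the LTS case with scaling factor 1.0; a path that bends away from its linear fit corresponds to the NTW case.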
Table 4
Classification of MIDI files into LTS (Linear Time Scaling) and NTW (Non‑linear Time Warping) cases based on heuristic criteria, summarizing the number of files per category. The two LTS columns indicate whether applying a time‑scaling factor of 1.0 results in an accumulated alignment error of at most 50 ms (indicating tempo agreement between audio and MIDI) or above 50 ms (indicating a tempo mismatch requiring time scaling to adapt the MIDI). The 50 ms threshold reflects a typical tolerance used in time‑critical tasks such as onset detection.
| ID | #Tracks | LTS (≤ 50 ms) | LTS (> 50 ms) | NTW |
|---|---|---|---|---|
| RWC‑P | 100 | 65 | 29 | 6 |
| RWC‑R | 15 | 13 | 2 | – |
| RWC‑C | 61 | – | – | 61 |
| RWC‑J | 50 | 6 | 5 | 39 |
| RWC‑G | 102 | 27 | 16 | 59 |
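The three-way classification summarized in Table 4 can be sketched as follows. This is a simplified illustration rather than the authors' exact procedure: it uses the maximum deviation of the alignment path from a linear time model as a stand-in for the accumulated alignment error, and the function name and returned labels are hypothetical.

```python
import numpy as np

def classify_alignment(path, threshold_s=0.05):
    """Classify an audio-MIDI alignment path (sketch of the Table 4 heuristic).

    path: (N, 2) array of corresponding (audio_s, midi_s) time stamps.
    threshold_s: tolerance in seconds (50 ms by default).
    """
    path = np.asarray(path, dtype=float)
    audio_t, midi_t = path[:, 0], path[:, 1]

    # Case 1: a time-scaling factor of 1.0 already suffices (tempo
    # agreement); a constant offset between the two axes is allowed.
    offset = np.mean(midi_t - audio_t)
    if np.max(np.abs(midi_t - (audio_t + offset))) <= threshold_s:
        return "LTS (<= 50 ms)"

    # Case 2: a single global time-scaling factor fixes the tempo
    # mismatch (least-squares linear fit midi_t ~ a * audio_t + b).
    a, b = np.polyfit(audio_t, midi_t, deg=1)
    if np.max(np.abs(midi_t - (a * audio_t + b))) <= threshold_s:
        return "LTS (> 50 ms)"

    # Case 3: the path deviates non-linearly from any straight line,
    # so non-linear time warping is required.
    return "NTW"
```

For example, a path with a constant 10% tempo offset falls into the second LTS column, while a path that oscillates around its linear fit by more than 50 ms is classified as NTW.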

Figure 7
Illustration of the workflow for incorporating community‑provided corrections into the centralized annotation repository.
