Table 1
Overview of song material and demographics in studies A and B.
| | genres | songs | participants | female | male | average age | age range |
|---|---|---|---|---|---|---|---|
| Study A | 5 | 90 | 6 | 3 | 3 | 28.2 | 26 to 34 |
| Study B | 1 | 90 | 28 | 20 | 8 | 25.6 | 21 to 35 |
Table 2
Overview of results at time points t1 and t2, and between t1 and t2 (t1 → t2), for studies A and B. Shown are average correlations ρ and upper bounds B80, each ± standard deviation; the last two rows give the corresponding values for the MIREX AMS task.
| | study A – five genres | | | study B – one genre | | |
|---|---|---|---|---|---|---|
| | t1 | t2 | t1 → t2 | t1 | t2 | t1 → t2 |
| ρ | 0.73 ± 0.065 | 0.75 ± 0.065 | 0.80 ± 0.103 | 0.26 ± 0.146 | 0.22 ± 0.153 | 0.38 ± 0.175 |
| B80 | 67.7 ± 19.5 | 57.5 ± 25.6 | 82.1 ± 14.6 | 49.0 ± 22.8 | 45.0 ± 21.4 | 58.3 ± 21.6 |
| ρAMS | 0.40 ± 0.027 | | | | | |
| B80AMS | 61.65 ± 27.0 | | | | | |
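
As an illustration of how an entry of the form "average correlation ± standard deviation" might be obtained, the following minimal sketch computes the mean and standard deviation of correlations over all pairs of raters. It assumes Pearson correlation and a spread taken across rater pairs; neither is stated in the caption. The function name `pairwise_agreement` and the ratings are hypothetical and are not data from study A or B.

```python
from itertools import combinations

import numpy as np


def pairwise_agreement(scores):
    """Mean and standard deviation of Pearson correlations over all rater pairs.

    scores: array-like of shape (n_raters, n_items), one row of ratings per rater.
    """
    scores = np.asarray(scores, dtype=float)
    # One correlation coefficient per unordered pair of raters.
    corrs = [
        np.corrcoef(scores[i], scores[j])[0, 1]
        for i, j in combinations(range(len(scores)), 2)
    ]
    # np.std uses the population formula (ddof=0) by default.
    return float(np.mean(corrs)), float(np.std(corrs))


# Made-up ratings for three raters and five song pairs (illustration only).
ratings = [
    [10, 40, 55, 80, 95],
    [20, 35, 60, 70, 90],
    [5, 50, 45, 85, 100],
]
mean_rho, std_rho = pairwise_agreement(ratings)
print(f"rho = {mean_rho:.2f} ± {std_rho:.3f}")
```

For the t1 → t2 columns, the same idea would presumably be applied within each rater, correlating that rater's scores at t1 with the scores at t2 and then averaging over raters.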

Figure 1
Average score inter-rater agreement for different intervals of scores (solid line) ± one standard deviation (dash-dot lines), at time point t1 (top) and t2 (bottom) for study A. The dashed line indicates theoretical perfect agreement.

Figure 2
Histogram plots of all scores (left), scores within genres (middle) and scores between genres (right) for both time points t1 and t2 of study A. Average scores are plotted as vertical dashed lines and also given in the respective titles.

Table 3
Genre score matrix at time t1 (top) and t2 (bottom) for study A, showing average scores per genre combination. Left-most column: genre of the query song; top row: genre of the candidate song.

Figure 3
Scores at t1 (x-axis) vs. t2 (y-axis) for the two raters who achieved maximal (top) and minimal (bottom) intra-rater correlation in study B.

Figure 4
Intra-rater agreement correlations (x-axis) versus BMIS agreement correlations (y-axis). The horizontal dashed line illustrates the proposed exemption of participants with BMIS correlation < 0.4.
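
The threshold rule drawn as the dashed line can be sketched as a simple filter. This is only an illustration under assumptions: the function name `split_by_bmis` and the correlation values are hypothetical, and the treatment of a value exactly equal to 0.4 (kept here) is a choice made for the sketch, since the caption only specifies "< 0.4".

```python
import numpy as np

BMIS_THRESHOLD = 0.4  # cut-off named in the caption


def split_by_bmis(bmis_corr, threshold=BMIS_THRESHOLD):
    """Return (kept, exempted) participant indices for a BMIS correlation cut-off.

    Participants with a BMIS correlation strictly below the threshold are exempted.
    """
    bmis_corr = np.asarray(bmis_corr, dtype=float)
    keep = bmis_corr >= threshold
    return np.flatnonzero(keep), np.flatnonzero(~keep)


# Made-up BMIS agreement correlations for five participants (illustration only).
bmis = [0.82, 0.35, 0.40, 0.10, 0.67]
kept, exempted = split_by_bmis(bmis)
print("kept:", kept.tolist(), "exempted:", exempted.tolist())
```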
