Attend to Chords: Improving Harmonic Analysis of Symbolic Music Using Transformer-Based Models

Tsung-Ping Chen; Li Su

doi:10.5334/tismir.65


Transactions of the International Society for Music Information Retrieval

Volume 4 (2021): Issue 1


Open Access

Feb 2021

Figures & Tables

Basic building blocks of the Transformer, the bi-directional Transformer for chord recognition (BTC), and the Harmony Transformer (HT) models, all based on multi-head attention (MHA) and feed-forward networks (FFN). Note that both the encoder and the decoder consist of repeated layers, which are not shown in the figure.

Table 1

Number of parameters in the MHA and the FFN blocks; h, d, and n stand for the number of heads, the feature size of the partitioned keys (K_i), and the kernel size of the convolution, respectively. We set h = 4, d = 32, and n = 3 for the experiments.

Computational Block	Parameter	Size	Total (per block)
Multi-head Attention	W^C	hd × hd	4h²d²
Multi-head Attention	W_i^Q, W_i^K, W_i^V	hd × d	4h²d²
Fully-connected FFN	W₁	hd × 4hd	8h²d²
Fully-connected FFN	W₂	4hd × hd	8h²d²
Convolutional FFN	W₁, W₂	n × hd × hd	2nh²d²
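
The totals in Table 1 can be checked by multiplying out the listed weight shapes. The following is a minimal sketch, not the authors' code. It assumes that biases are not counted, that the h heads each have their own query, key, and value projections of size hd × d, and that the convolutional FFN has two kernels of size n × hd × hd (as the listed total 2nh²d² implies). The function name `count_block_params` is hypothetical.

```python
from math import prod


def count_block_params(h: int, d: int, n: int) -> dict:
    """Count weights per block from the shapes listed in Table 1 (biases ignored)."""
    model_dim = h * d  # hd, the full feature size
    shapes = {
        # One hd x hd output projection, plus Q, K, V projections (hd x d) for each head.
        "multi_head_attention": [(model_dim, model_dim)] + [(model_dim, d)] * (3 * h),
        # Two dense layers: hd -> 4hd -> hd.
        "fully_connected_ffn": [(model_dim, 4 * model_dim), (4 * model_dim, model_dim)],
        # Assumption: two convolution kernels of shape n x hd x hd.
        "convolutional_ffn": [(n, model_dim, model_dim)] * 2,
    }
    return {block: sum(prod(s) for s in block_shapes)
            for block, block_shapes in shapes.items()}


# Settings reported for the experiments: h = 4, d = 32, n = 3.
counts = count_block_params(h=4, d=32, n=3)
for block, total in counts.items():
    print(f"{block}: {total}")
```

With h = 4, d = 32, and n = 3, h²d² = 16384, so the three blocks come to 65536, 131072, and 98304 weights respectively. Under these assumptions the convolutional FFN is smaller than the fully-connected FFN whenever n < 4.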

Table 2

Annotated chord qualities and the mapping to the major-minor vocabulary.

Quality	Major-Minor Mapping
Major (M)	M
Minor (m)	m
Augmented (a)	others
Diminished (d)	others
Major Seventh (M7)	M
Minor Seventh (m7)	m
Dominant Seventh (D7)	M
Diminished Seventh (d7)	others
Half-diminished Seventh (h7)	others
Augmented Sixth (a6)	M
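
The mapping in Table 2 amounts to a simple lookup. The sketch below encodes it as a Python dictionary keyed by the quality abbreviations from the table; the names `MAJOR_MINOR_MAP` and `to_major_minor` are hypothetical and chosen only for illustration.

```python
# Quality abbreviation -> major-minor class, copied row by row from Table 2.
MAJOR_MINOR_MAP = {
    "M": "M",        # Major
    "m": "m",        # Minor
    "a": "others",   # Augmented
    "d": "others",   # Diminished
    "M7": "M",       # Major Seventh
    "m7": "m",       # Minor Seventh
    "D7": "M",       # Dominant Seventh
    "d7": "others",  # Diminished Seventh
    "h7": "others",  # Half-diminished Seventh
    "a6": "M",       # Augmented Sixth
}


def to_major_minor(quality: str) -> str:
    """Reduce an annotated chord quality to the major-minor vocabulary."""
    return MAJOR_MINOR_MAP[quality]


print(to_major_minor("D7"))
print(to_major_minor("h7"))
```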

Statistics of the chord quality and degree annotations (some minor cases are omitted).

Table 3

Vocabularies of the functional harmony recognition task. The 21 tonics include {C,D,E,F,G,A,B} by {♮,♯,♭}; the 2 modes are {major, minor}; the 9 primary degrees are {1,2,3,4,5,6,7,b2,b7}; the 14 secondary degrees are {1,2,3,4,5,6,7,#1,#3,#4,b1,b3,b6,b7}; the 4 inversions are {root position,1st,2nd,3rd}.

Output	Components	Vocabulary Size
Key	21 tonics × 2 modes	42
Roman Numeral	9 primary degrees × 14 secondary degrees × 10 qualities × 4 inversions	5040
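
The vocabulary sizes in Table 3 are the products of their component counts: 21 × 2 = 42 keys and 9 × 14 × 10 × 4 = 5040 Roman numeral labels. As an illustrative sketch, the label sets can be enumerated as Cartesian products. The component lists come from the Table 3 caption and the ten qualities from Table 2; the string spellings (for example `"#"` and `"b"` for accidentals) and the variable names are assumptions made here.

```python
from itertools import product

# 7 letter names x {natural, sharp, flat} = 21 tonics.
TONICS = [letter + accidental for letter in "CDEFGAB" for accidental in ("", "#", "b")]
MODES = ["major", "minor"]
PRIMARY_DEGREES = ["1", "2", "3", "4", "5", "6", "7", "b2", "b7"]
SECONDARY_DEGREES = ["1", "2", "3", "4", "5", "6", "7",
                     "#1", "#3", "#4", "b1", "b3", "b6", "b7"]
QUALITIES = ["M", "m", "a", "d", "M7", "m7", "D7", "d7", "h7", "a6"]  # from Table 2
INVERSIONS = ["root position", "1st", "2nd", "3rd"]

# Every label is one combination of its components.
key_vocab = list(product(TONICS, MODES))
roman_numeral_vocab = list(product(PRIMARY_DEGREES, SECONDARY_DEGREES,
                                   QUALITIES, INVERSIONS))

print(len(key_vocab))
print(len(roman_numeral_vocab))
```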

Table 4

Evaluations with the BPS-FH dataset and the Bach Preludes. All the scores (in percentage) are averaged over 4 validation sets; the standard deviations of the scores are also provided.

BPS-FH
Model	Chord Symbol: Accuracy	Chord Symbol: Segmentation	Functional Harmony: Key	Functional Harmony: Roman numeral	Functional Harmony: Segmentation
BTC	82.46 ± 1.55	81.30 ± 1.08	77.65 ± 1.83	37.98 ± 1.34	66.73 ± 4.05
BTC-singleBi	82.16 ± 1.66	80.78 ± 1.39	75.96 ± 0.79	35.77 ± 1.85	68.83 ± 1.69
BTC-FC	82.06 ± 1.83	81.24 ± 1.26	78.40 ± 2.10	37.60 ± 1.76	65.56 ± 3.86
HT	83.19 ± 1.65	83.47 ± 1.22	77.94 ± 2.24	37.00 ± 2.88	71.93 ± 2.72
HT-noW	83.06 ± 1.58	83.26 ± 0.71	77.13 ± 1.78	36.84 ± 2.39	73.53 ± 1.26
HT-noReg	83.19 ± 1.31	83.33 ± 1.26	76.70 ± 1.26	35.33 ± 1.79	70.51 ± 1.16
CRNN	79.79 ± 0.84	81.49 ± 1.91	75.56 ± 2.84	34.83 ± 1.38	67.75 ± 3.59
HT*	83.98 ± 1.08	85.09 ± 0.96	79.07 ± 2.70	41.74 ± 2.63	75.50 ± 1.72

Bach Preludes
Model	Chord Symbol: Accuracy	Chord Symbol: Segmentation	Functional Harmony: Key	Functional Harmony: Roman numeral	Functional Harmony: Segmentation
BTC	74.12 ± 0.12	77.20 ± 3.64	48.63 ± 4.48	25.25 ± 1.76	64.19 ± 2.08
BTC-singleBi	75.67 ± 1.42	78.85 ± 4.81	46.24 ± 5.90	23.35 ± 1.99	60.40 ± 6.25
BTC-FC	75.53 ± 1.22	77.81 ± 4.51	46.05 ± 1.84	22.97 ± 2.19	57.24 ± 4.17
HT	77.18 ± 1.24	80.46 ± 3.36	51.15 ± 2.47	23.75 ± 2.20	66.82 ± 4.52
HT-noW	76.51 ± 1.45	81.14 ± 3.31	48.95 ± 2.88	24.99 ± 1.32	67.61 ± 4.75
HT-noReg	76.33 ± 1.23	80.76 ± 4.40	50.62 ± 3.93	23.82 ± 2.43	65.23 ± 4.80
CRNN	69.79 ± 1.15	79.47 ± 2.03	47.03 ± 6.59	18.53 ± 2.23	61.79 ± 1.83
HT*	78.54 ± 2.06	83.86 ± 2.24	56.28 ± 2.53	25.95 ± 1.67	73.60 ± 1.80