
Attend to Chords: Improving Harmonic Analysis of Symbolic Music Using Transformer-Based Models

By: Tsung-Ping Chen and Li Su
Open Access | Feb 2021

Figures & Tables

Figure 1

Basic building blocks of the Transformer, the bi-directional Transformer for chord recognition (BTC), and the Harmony Transformer (HT), all based on multi-head attention (MHA) and feed-forward networks (FFN). Note that both the encoder and the decoder consist of repeated layers, which are not shown in the figure.

Table 1

Number of parameters in the MHA and the FFN blocks; h, d, and n stand for the number of heads, the feature size of the partitioned keys (Ki), and the kernel size of the convolution, respectively. We set h = 4, d = 32, and n = 3 for the experiments.

| Computational Block | Parameter | Size | Total |
|---|---|---|---|
| Multi-head Attention | W^C | hd × hd | 4h²d² |
| | W_i^Q, W_i^K, W_i^V | hd × d | |
| Fully-connected FFN | W1 | hd × 4hd | 8h²d² |
| | W2 | 4hd × hd | |
| Convolutional FFN | W1, W2 | n × hd × hd | 2nh²d² |
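The totals in Table 1 follow directly from the listed weight shapes. A quick arithmetic sanity check with the experimental settings (h = 4, d = 32, n = 3); this is an illustrative sketch, not code from the paper:

```python
# Sanity-check of the parameter counts in Table 1.
# h = number of heads, d = per-head key size, n = convolution kernel size.
h, d, n = 4, 32, 3

# Multi-head attention: h heads, each with W_iQ, W_iK, W_iV of shape (hd, d),
# plus an output projection W_C of shape (hd, hd).
mha = 3 * h * (h * d) * d + (h * d) ** 2
assert mha == 4 * h**2 * d**2  # 65,536 parameters

# Fully-connected FFN: W1 of shape (hd, 4hd) and W2 of shape (4hd, hd).
fc_ffn = (h * d) * (4 * h * d) + (4 * h * d) * (h * d)
assert fc_ffn == 8 * h**2 * d**2  # 131,072 parameters

# Convolutional FFN: two 1-D convolution kernels of shape (n, hd, hd).
conv_ffn = 2 * n * (h * d) * (h * d)
assert conv_ffn == 2 * n * h**2 * d**2  # 98,304 parameters

print(mha, fc_ffn, conv_ffn)  # prints 65536 131072 98304
```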
Figure 2

Improved Harmony Transformer (HT*).

Table 2

Annotated chord qualities and the mapping to the major-minor vocabulary.

| Quality | Major-Minor Mapping |
|---|---|
| Major (M) | M |
| Minor (m) | m |
| Augmented (a) | others |
| Diminished (d) | others |
| Major Seventh (M7) | M |
| Minor Seventh (m7) | m |
| Dominant Seventh (D7) | M |
| Diminished Seventh (d7) | others |
| Half-diminished Seventh (h7) | others |
| Augmented Sixth (a6) | M |
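The reduction in Table 2 is a simple lookup from the ten annotated qualities to the three-way major-minor vocabulary. A minimal sketch of that mapping (the dict and function names are illustrative, not the authors' code):

```python
# Table 2 as a lookup table: annotated chord quality -> major-minor vocabulary.
QUALITY_TO_MM = {
    "M": "M",        # Major
    "m": "m",        # Minor
    "a": "others",   # Augmented
    "d": "others",   # Diminished
    "M7": "M",       # Major Seventh
    "m7": "m",       # Minor Seventh
    "D7": "M",       # Dominant Seventh
    "d7": "others",  # Diminished Seventh
    "h7": "others",  # Half-diminished Seventh
    "a6": "M",       # Augmented Sixth
}

def reduce_quality(quality: str) -> str:
    """Map an annotated chord quality to the major-minor vocabulary."""
    return QUALITY_TO_MM[quality]

print(reduce_quality("D7"))  # prints M (a dominant seventh reduces to major)
```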
Figure 3

Statistics of the chord quality and degree annotations (some minor cases are omitted).

Table 3

Vocabularies of the functional harmony recognition task. The 21 tonics comprise {C, D, E, F, G, A, B}, each with {♮, ♯, ♭}; the 2 modes are {major, minor}; the 9 primary degrees are {1, 2, 3, 4, 5, 6, 7, ♭2, ♭7}; the 14 secondary degrees are {1, 2, 3, 4, 5, 6, 7, ♯1, ♯3, ♯4, ♭1, ♭3, ♭6, ♭7}; the 4 inversions are {root position, 1st, 2nd, 3rd}.

| Output | Component | Vocabulary Size |
|---|---|---|
| Key | 21 tonics | 42 |
| | 2 modes | |
| Roman Numeral | 9 primary degrees | 5040 |
| | 14 secondary degrees | |
| | 10 qualities | |
| | 4 inversions | |
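The vocabulary sizes in Table 3 are products of their component counts. A quick cross-check of the arithmetic (an illustrative sketch, not the paper's code):

```python
# Cross-check of the vocabulary sizes in Table 3.
tonics, modes = 21, 2
primary, secondary, qualities, inversions = 9, 14, 10, 4

key_vocab = tonics * modes                                  # 21 x 2
roman_vocab = primary * secondary * qualities * inversions  # 9 x 14 x 10 x 4

print(key_vocab, roman_vocab)  # prints 42 5040
```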
Table 4

Evaluations on the BPS-FH dataset and the Bach Preludes. All scores are percentages averaged over 4 validation sets, reported as mean ± standard deviation. CSR = Chord Symbol Recognition; FHR = Functional Harmony Recognition.

BPS-FH

| Model | Accuracy (CSR) | Segmentation (CSR) | Key (FHR) | Roman numeral (FHR) | Segmentation (FHR) |
|---|---|---|---|---|---|
| BTC | 82.46 ± 1.55 | 81.30 ± 1.08 | 77.65 ± 1.83 | 37.98 ± 1.34 | 66.73 ± 4.05 |
| BTC-singleBi | 82.16 ± 1.66 | 80.78 ± 1.39 | 75.96 ± 0.79 | 35.77 ± 1.85 | 68.83 ± 1.69 |
| BTC-FC | 82.06 ± 1.83 | 81.24 ± 1.26 | 78.40 ± 2.10 | 37.60 ± 1.76 | 65.56 ± 3.86 |
| HT | 83.19 ± 1.65 | 83.47 ± 1.22 | 77.94 ± 2.24 | 37.00 ± 2.88 | 71.93 ± 2.72 |
| HT-noW | 83.06 ± 1.58 | 83.26 ± 0.71 | 77.13 ± 1.78 | 36.84 ± 2.39 | 73.53 ± 1.26 |
| HT-noReg | 83.19 ± 1.31 | 83.33 ± 1.26 | 76.70 ± 1.26 | 35.33 ± 1.79 | 70.51 ± 1.16 |
| CRNN | 79.79 ± 0.84 | 81.49 ± 1.91 | 75.56 ± 2.84 | 34.83 ± 1.38 | 67.75 ± 3.59 |
| HT* | 83.98 ± 1.08 | 85.09 ± 0.96 | 79.07 ± 2.70 | 41.74 ± 2.63 | 75.50 ± 1.72 |

Bach Preludes

| Model | Accuracy (CSR) | Segmentation (CSR) | Key (FHR) | Roman numeral (FHR) | Segmentation (FHR) |
|---|---|---|---|---|---|
| BTC | 74.12 ± 0.12 | 77.20 ± 3.64 | 48.63 ± 4.48 | 25.25 ± 1.76 | 64.19 ± 2.08 |
| BTC-singleBi | 75.67 ± 1.42 | 78.85 ± 4.81 | 46.24 ± 5.90 | 23.35 ± 1.99 | 60.40 ± 6.25 |
| BTC-FC | 75.53 ± 1.22 | 77.81 ± 4.51 | 46.05 ± 1.84 | 22.97 ± 2.19 | 57.24 ± 4.17 |
| HT | 77.18 ± 1.24 | 80.46 ± 3.36 | 51.15 ± 2.47 | 23.75 ± 2.20 | 66.82 ± 4.52 |
| HT-noW | 76.51 ± 1.45 | 81.14 ± 3.31 | 48.95 ± 2.88 | 24.99 ± 1.32 | 67.61 ± 4.75 |
| HT-noReg | 76.33 ± 1.23 | 80.76 ± 4.40 | 50.62 ± 3.93 | 23.82 ± 2.43 | 65.23 ± 4.80 |
| CRNN | 69.79 ± 1.15 | 79.47 ± 2.03 | 47.03 ± 6.59 | 18.53 ± 2.23 | 61.79 ± 1.83 |
| HT* | 78.54 ± 2.06 | 83.86 ± 2.24 | 56.28 ± 2.53 | 25.95 ± 1.67 | 73.60 ± 1.80 |
Figure 4

Examples of the attention maps in the MHA units; the color bars show the relative intensity of attention. The input segment is Beethoven's Piano Sonata No. 1, mm. 1–8. (a) Two attention heads in the intra-MHA of the decoder. The vertical and horizontal axes respectively represent the queries and the keys, both of which index the same sequence to be recognized. (b) Two attention heads in the inter-MHA of the decoder. The vertical axis is the decoder sequence to be recognized (queries), and the horizontal axis is the encoder sequence (keys) for the chord change estimation (the positions where the chords change are indicated by vertical dashed lines).

DOI: https://doi.org/10.5334/tismir.65 | Journal eISSN: 2514-3298
Language: English
Submitted on: May 10, 2020
Accepted on: Jan 7, 2021
Published on: Feb 24, 2021
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2021 Tsung-Ping Chen, Li Su, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.