
Measure by Measure: Measure-Based Automatic Music Composition with Modern Staff Notation

By: Yujia Yan and Zhiyao Duan
Open Access
|Nov 2024

Figures & Tables

Figure 1

Overview of the proposed approach. The measure encoder encodes the entire piece into a grid, where each cell contains a vector summarizing the content of a measure within a part (left) and a matrix capturing the detailed structure of the measure (right). The example score is from J.S. Bach’s Art of Fugue.

Figure 2

The common object hierarchy of a single part‑wise measure. This hierarchy specifies the compositional structure of a part‑wise measure, which includes four levels from high to low: measure, voice, chord, and note.
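The four-level hierarchy in this caption can be pictured as nested containers. The following is only an illustrative sketch, not the authors' data model: the class and field names, the use of MIDI pitch numbers, and the convention that an empty chord stands for a rest are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import List


@dataclass
class Note:
    pitch: int  # assumed representation: MIDI pitch number


@dataclass
class Chord:
    duration: Fraction  # in whole-note units; assumed shared by all notes of the chord
    notes: List[Note] = field(default_factory=list)  # assumed: empty list means a rest


@dataclass
class Voice:
    chords: List[Chord] = field(default_factory=list)

    def total_duration(self) -> Fraction:
        # Sum of the chord durations in this voice.
        return sum((c.duration for c in self.chords), Fraction(0))


@dataclass
class Measure:
    voices: List[Voice] = field(default_factory=list)


# One part-wise measure with a single voice: a quarter-note dyad, then an eighth note.
measure = Measure(
    voices=[
        Voice(
            chords=[
                Chord(Fraction(1, 4), [Note(60), Note(64)]),
                Chord(Fraction(1, 8), [Note(67)]),
            ]
        )
    ]
)
```

Exact fractions are used for durations so that sums such as a quarter plus an eighth stay free of rounding error.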

Figure 3

A high‑level illustration of the proposed measure model. The measure/voice encoder and decoder have two outputs/inputs that are vector‑valued and matrix‑valued, respectively.

Figure 4

Example of cross‑beat note splitting, from the left two measures to the right two measures: the note C crossing the (3 + 3)/8 division is split into an eighth note and a quarter note; similarly, the dotted half note is split into two dotted quarter notes.
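The splitting described in this caption can be sketched as cutting a note at every beat-group boundary that falls strictly inside it. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `split_at_boundaries`, the (onset, duration) tuple representation, and the whole-note time unit are hypothetical choices for the example.

```python
from fractions import Fraction
from typing import List, Tuple

EIGHTH = Fraction(1, 8)


def split_at_boundaries(
    onset: Fraction, duration: Fraction, boundaries: List[Fraction]
) -> List[Tuple[Fraction, Fraction]]:
    """Split a note into (onset, duration) segments at each interior boundary."""
    end = onset + duration
    # Only boundaries strictly inside the note cause a split.
    cuts = [b for b in sorted(boundaries) if onset < b < end]
    points = [onset, *cuts, end]
    return [(start, stop - start) for start, stop in zip(points, points[1:])]


# A 6/8 measure grouped as (3 + 3)/8 has one interior boundary after three eighths.
boundaries = [3 * EIGHTH]

# A dotted quarter starting on the third eighth crosses the boundary.
crossing = split_at_boundaries(2 * EIGHTH, 3 * EIGHTH, boundaries)

# A dotted half note filling the measure also crosses it.
full_bar = split_at_boundaries(Fraction(0), 6 * EIGHTH, boundaries)
```

Here `crossing` holds an eighth-note segment followed by a quarter-note segment, and `full_bar` holds two dotted-quarter segments, which matches the two cases named in the caption.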

Figure 5

Linearization of the measure grid for autoregressive modeling using the measure model.

Figure 6

Masked conditional specification.

Algorithm 1:

Sampling/Inference procedure

Figure 7

Some statistics of music pieces in our dataset. Each dot represents a piece. (a) Length versus number of parts (staffs). (b) Length versus number of note events.

Table 1

Effects of different encoding key functions and memory reader mechanisms on negative log likelihood (nll) and length generation accuracy (len acc) of measure generation.

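| Reader | key() | nll | len acc | nll (aug.) | len acc (aug.) |
|---|---|---|---|---|---|
| position‑dependent attention | identity | 13.49 | 96.99% | 55.94 | 37.25% |
| position‑dependent attention | softmax | 12.13 | 97.38% | 45.53 | 37.24% |
| position‑dependent attention | sigmoid | 14.35 | 95.85% | 77.02 | 35.33% |
| position‑dependent attention | skewgauss | 12.18 | 96.90% | 73.20 | 37.25% |
| position‑dependent only | identity | 13.70 | 97.97% | 69.35 | 38.29% |
| position‑dependent only | softmax | 13.60 | 96.65% | 89.22 | 37.37% |
| position‑dependent only | sigmoid | 12.62 | 97.55% | 56.88 | 36.93% |
| position‑dependent only | skewgauss | 12.33 | 95.48% | 37.82 | 39.91% |
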
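| Configuration | nll | len acc | nll (aug.) | len acc (aug.) |
|---|---|---|---|---|
| key(): softmax | 12.13 | 97.38% | 45.53 | 37.24% |
| key(): softmax, −pos | 12.17 | 93.72% | 57.99 | 29.90% |
| key(): skewgauss | 12.33 | 95.48% | 37.82 | 39.91% |
| key(): skewgauss, −pos | 13.14 | 92.68% | 54.10 | 30.60% |
| No measure memory matrices | 13.81 | 95.70% | 92.23 | 36.32% |
| No measure memory matrices, −pos | 14.13 | 91.49% | 103.79 | 33.21% |
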
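| Configuration | nll | len acc |
|---|---|---|
| key(): softmax | 12.74 | 98.09% |
| key(): softmax, −pos | 12.74 | 92.97% |
| key(): skewgauss | 12.51 | 97.77% |
| key(): skewgauss, −pos | 12.34 | 96.14% |
| No measure memory matrices | 13.44 | 91.67% |
| No measure memory matrices, −pos | 14.30 | 78.67% |
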
Figure 8

Subjective listening tests among different self‑reported expertise groups. (a) Overall ratings. (b) Overall ratings among those with the expertise “none/beginner”. (c) Overall ratings among those with the expertise “intermediate/experienced/expert”.

Figure 9

Subjective listening ratings along four aspects across all expertise groups. (a) Fluency. (b) Expressivity. (c) Novelty. (d) Organization.

Figure 10

One‑sided Mann–Whitney rank test for the pairwise comparison of ratings: p‑value indicates the probability for the hypothesis that the median of population A (row) is larger than the median of population B (column). (a) Overall. (b) Fluency. (c) Expressivity. (d) Novelty. (e) Organization.
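For readers unfamiliar with the test named in this caption, the sketch below computes the Mann–Whitney U statistic and a one-sided p-value for the alternative that ratings in group A tend to be larger than those in group B. It is an illustration only: the ratings are made up, the function names are hypothetical, and the p-value uses a plain normal approximation with no tie or continuity correction, which may differ from the procedure the authors used.

```python
import math
from typing import Sequence


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """Count pairs (x, y) with x from a, y from b, x > y; ties count one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def one_sided_p_value(a: Sequence[float], b: Sequence[float]) -> float:
    """Approximate p-value for the alternative 'a tends to be larger than b'.

    Normal approximation without tie or continuity correction (an assumption
    made to keep the sketch short).
    """
    n1, n2 = len(a), len(b)
    u = mann_whitney_u(a, b)
    mean = n1 * n2 / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / std
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


# Made-up ratings on a 1-5 scale for two hypothetical systems.
ratings_a = [4, 5, 3, 5]
ratings_b = [2, 3, 4, 1]

u_ab = mann_whitney_u(ratings_a, ratings_b)
p_ab = one_sided_p_value(ratings_a, ratings_b)
```

A small `p_ab` supports the claim that A is rated higher than B. With samples this small an exact test would normally be preferred over the normal approximation.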

Table 3

Ranking scores (and ranks) by the Bradley–Terry model on the Mann–Whitney U‑statistics.

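| Aspect | AR | AR Measure | CSD Measure | Human |
|---|---|---|---|---|
| Overall | 0.210 (4) | 0.218 (3) | 0.254 (2) | 0.318 (1) |
| Fluency | 0.209 (4) | 0.214 (3) | 0.275 (2) | 0.302 (1) |
| Expressivity | 0.234 (3) | 0.206 (4) | 0.235 (2) | 0.325 (1) |
| Novelty | 0.246 (3) | 0.238 (4) | 0.267 (1) | 0.249 (2) |
| Organization | 0.203 (4) | 0.231 (3) | 0.251 (2) | 0.316 (1) |
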
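One way to obtain ranking scores of this kind is to treat each pairwise U statistic as a count of "wins" and fit a Bradley–Terry model to the resulting matrix. The sketch below uses the standard minorization-maximization update and normalizes the scores to sum to one. It is an assumption about how such scores could be computed, not a description of the authors' exact procedure, and the win matrix is made up.

```python
from typing import List


def bradley_terry(wins: List[List[float]], iters: int = 200) -> List[float]:
    """Fit Bradley-Terry strengths with the MM update; scores sum to 1.

    wins[i][j] is how often item i beat item j (here: a pairwise U statistic).
    Every item is assumed to have been compared with at least one other item.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iters):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        norm = sum(updated)
        p = [x / norm for x in updated]
    return p


# Made-up win counts for three hypothetical systems.
example_wins = [
    [0, 6, 3],
    [4, 0, 2],
    [7, 8, 0],
]
scores = bradley_terry(example_wins)

# Ranks: 1 for the highest score.
order = sorted(range(len(scores)), key=lambda i: -scores[i])
ranks = [order.index(i) + 1 for i in range(len(scores))]
```

For two items the fit reduces to the observed win fraction: `bradley_terry([[0, 3], [1, 0]])` gives scores of 0.75 and 0.25.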
Figure 11

With the measure model, a beat sync error (left, m.18, second violin having an extra beat) can be recovered from starting at the next measure; without the measure model, a similar error (right, m.41, all parts having an extra 16th note) shifts all subsequent note onset positions. (a) AR, with Measure Model. (b) AR, without Measure Model, output MIDI file imported by Sibelius.

DOI: https://doi.org/10.5334/tismir.163 | Journal eISSN: 2514-3298
Language: English
Submitted on: Mar 4, 2023
Accepted on: Aug 12, 2024
Published on: Nov 1, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Yujia Yan, Zhiyao Duan, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.