Draw and Listen! A Sketch-Based System for Music Inpainting
Open Access
Nov 2022

Figures & Tables

Figure 1

Top: A triplet of music measures. The first and third are the context measures, and the middle one is to be “inpainted”. Middle: The user input area consisting of a canvas for drawing the curves and the sliders to control the pitch offset and note density for the whole measure. Bottom: The generated result.

Figure 2

Overall structure of the proposed VAE-based model. Multiple encoders are used to encode the user’s input and the context measures, and multiple decoders are used to achieve the desired disentanglement of latent variables and to generate the missing measure.

Figure 3

An example of 4 different input curves using the same context measures. The rhythm slider was set to 2 (‘high’) for the first, and 1 (‘medium’) for the last three. Note how the pitch contour and rhythm of the generated melodies follow the hand-drawn pitch curves (green) and note density curves (blue).

Figure 4

An example of one hand-drawn pitch curve combined with three different context measures to generate different but context-matching melodies. The rhythm slider was set to 0 (‘low’) for the first, and 2 (‘high’) for the last two.

Table 1

Objective evaluation of measures generated by the three comparison methods (proposed, rule-based, and GA) taking a simulated user input that is derived from a random measure. As controls, evaluations of the original “missing” measure Mcur and of the measure from which the user input is derived, Mcur’, are also provided. Please refer to the main text for explanations of the metrics.

| Metric | Proposed | Rule-based | GA | Mcur (“missing” measure) | Mcur’ (from which user input is derived) |
|---|---|---|---|---|---|
| Pitch curve DTW cost | 6.1 ± 5.2 | 3.6 ± 2.4 | 13.3 ± 10.6 | 54.6 ± 38.3 | |
| Pitch slider match rate | 98% | 100% | 93% | | |
| Rhythm curve DTW cost | 8.4 ± 13.1 | 23.1 ± 41.1 | 41.0 ± 52.5 | 53.8 ± 61.3 | |
| Rhythm slider match rate | 97% | 92% | 90% | | |
| Context match | 86% | 100% | 80% | 87% | 45% |

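For readers unfamiliar with the DTW cost reported in Table 1, the following is a minimal sketch of textbook dynamic time warping between two one-dimensional curves. It is an illustration under stated assumptions, not the authors' implementation: the function name `dtw_cost`, the absolute-difference local cost, the absence of any normalization, and the example pitch values are all hypothetical. The exact metric definition is given in the main text.

```python
def dtw_cost(a, b):
    """Textbook dynamic time warping cost between two 1-D sequences.

    Assumption: local cost is the absolute difference, with no length
    normalization; the paper's metric may differ.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


# Hypothetical example: a drawn pitch curve and a generated melody contour,
# both expressed as MIDI pitch values sampled along the measure.
drawn_curve = [60, 62, 64, 65]
generated = [60, 62, 62, 64, 65]
cost = dtw_cost(drawn_curve, generated)
```

Because DTW allows one curve to be stretched in time against the other, the repeated pitch in `generated` adds no cost here. A lower value indicates that the generated contour follows the input curve more closely.
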
Figure 5

The web application developed for user studies. Detailed instructions are shown on the bottom left of the page throughout the duration of the experiment. On the top left is the canvas where users draw the curves. On the right side, three identical rows of context measures, generated results and control units are displayed corresponding to the three comparison methods (with a random order for each trial).

Figure 6

Boxplots of the answers (ratings on a 1–5 scale, higher is better) to the six subjective evaluation questions from all 23 participants. Each box for Q1–Q3 contains 112 points, while each box for Q4–Q6 contains 23 points. The notch in each plot represents the 95% confidence interval around the median. Outliers are shown as circles.

Figure 7

Two examples obtained during the user studies. The first line is the output of the proposed VAE model and the next two contain the baselines’ outputs.

DOI: https://doi.org/10.5334/tismir.128 | Journal eISSN: 2514-3298
Language: English
Submitted on: Dec 22, 2021
Accepted on: Aug 4, 2022
Published on: Nov 2, 2022
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2022 Christodoulos Benetatos, Zhiyao Duan, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.