Draw and Listen! A Sketch-Based System for Music Inpainting

Christodoulos Benetatos; Zhiyao Duan

doi:10.5334/tismir.128

Transactions of the International Society for Music Information Retrieval

Volume 5 (2022): Issue 1

Open Access

Nov 2022

Figures & Tables

Top: A triplet of music measures. The first and third are the context measures, and the middle one is to be “inpainted”. Middle: The user input area consisting of a canvas for drawing the curves and the sliders to control the pitch offset and note density for the whole measure. Bottom: The generated result.

Overall structure of the proposed VAE-based model. Multiple encoders are used to encode the user’s input and the context measures, and multiple decoders are used to achieve the desired disentanglement of latent variables and to generate the missing measure.

An example of 4 different input curves using the same context measures. The rhythm slider was set to 2 (‘high’) for the first, and 1 (‘medium’) for the last three. Note how the pitch contour and rhythm of the generated melodies follow the hand-drawn pitch curves (green) and note density curves (blue).

An example of one hand-drawn pitch curve working with three different sets of context measures to generate different but context-matching melodies. The rhythm slider was set to 0 (‘low’) for the first, and 2 (‘high’) for the last two.

Table 1

Objective evaluation of measures generated by the three comparison methods (proposed, rule-based, and GA) taking a simulated user input that is derived from a random measure. As controls, evaluation of the original “missing” measure $M_{cur}$ and of the measure from which the user input is derived, $M_{cur}'$, is also provided. Please refer to the main text for explanations of the metrics.

Metric	Proposed	Rule-based	GA	M_cur (“missing” measure)	M_cur' (from which user input is derived)
Pitch curve DTW cost	6.1 ± 5.2	3.6 ± 2.4	13.3 ± 10.6	54.6 ± 38.3	–
Pitch slider match rate	98%	100%	93%	–	–
Rhythm curve DTW cost	8.4 ± 13.1	23.1 ± 41.1	41.0 ± 52.5	53.8 ± 61.3	–
Rhythm slider match rate	97%	92%	90%	–	–
Context match	86%	100%	80%	87%	45%
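
The two "DTW cost" rows in Table 1 compare a generated measure against the input curve using dynamic time warping. As a rough illustration only, the sketch below computes a textbook DTW cost between two short pitch sequences. The local distance (absolute difference), the lack of normalization, the use of MIDI pitch numbers, and the function name `dtw_cost` are all assumptions made for this example; the configuration actually used for the table is described in the main text and may differ.

```python
from math import inf


def dtw_cost(a, b):
    """Textbook dynamic time warping cost between two 1-D sequences.

    Assumption: the local distance is the absolute difference and the
    total cost is not normalized by path length.
    """
    n, m = len(a), len(b)
    # acc[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed steps: advance in a, advance in b, or advance in both.
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


# Hypothetical pitch curves sampled as MIDI note numbers.
drawn = [60, 62, 64, 65, 67]
generated = [60, 60, 62, 64, 65, 67]   # same contour, first note held longer
shifted = [62, 64, 66, 67, 69]         # same shape, transposed up

print(dtw_cost(drawn, generated))
print(dtw_cost(drawn, shifted))
```

Because DTW allows one sequence to be stretched in time against the other, a generated melody that follows the drawn contour with slightly different timing still scores a low cost, whereas a melody at a different pitch level or with a different shape scores higher.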