
Figure 1
Top: A triplet of music measures. The first and third are the context measures, and the middle one is to be “inpainted”. Middle: The user input area consisting of a canvas for drawing the curves and the sliders to control the pitch offset and note density for the whole measure. Bottom: The generated result.

Figure 2
Overall structure of the proposed VAE-based model. Multiple encoders are used to encode the user’s input and the context measures, and multiple decoders are used to achieve the desired disentanglement of latent variables and to generate the missing measure.

Figure 3
An example of 4 different input curves using the same context measures. The rhythm slider was set to 2 (‘high’) for the first, and 1 (‘medium’) for the last three. Note how the pitch contour and rhythm of the generated melodies follow the hand-drawn pitch curves (green) and note density curves (blue).

Figure 4
An example of one hand-drawn pitch curve working with three different sets of context measures to generate different but context-matching melodies. The rhythm slider was set to 0 (‘low’) for the first, and 2 (‘high’) for the last two.

Table 1
Objective evaluation of measures generated by the three comparison methods (proposed, rule-based, and GA) taking a simulated user input that is derived from a random measure. As controls, evaluation of the original “missing” measure Mcur and the measure from which the user input is derived is also provided. Please refer to the main text for explanations of the metrics.
| Metric | Proposed | Rule-based | GA | Mcur (“missing” measure) | Mcur’ (from which user input is derived) |
|---|---|---|---|---|---|
| Pitch curve DTW cost | 6.1 ± 5.2 | 3.6 ± 2.4 | 13.3 ± 10.6 | 54.6 ± 38.3 | – |
| Pitch slider match rate | 98% | 100% | 93% | – | – |
| Rhythm curve DTW cost | 8.4 ± 13.1 | 23.1 ± 41.1 | 41.0 ± 52.5 | 53.8 ± 61.3 | – |
| Rhythm slider match rate | 97% | 92% | 90% | – | – |
| Context match | 86% | 100% | 80% | 87% | 45% |
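
The two DTW rows in Table 1 report a dynamic time warping cost between the user's curve and the corresponding curve of the generated measure (lower is better). As a rough illustration only, the following is a minimal sketch of a textbook DTW cost, assuming an absolute-difference local cost and no normalization. The function name `dtw_cost` is hypothetical, and the exact variant, curve sampling, and scaling used for the table are described in the main text and may differ.

```python
from typing import Sequence


def dtw_cost(a: Sequence[float], b: Sequence[float]) -> float:
    """Textbook DTW cost between two 1-D curves.

    Assumption: local cost is |a_i - b_j| and the result is the
    unnormalized accumulated cost of the best monotonic alignment.
    """
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both sequences must be non-empty")

    inf = float("inf")
    n, m = len(a), len(b)

    # acc[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed steps: advance in a only, in b only, or in both.
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])

    return acc[n][m]
```

Under these assumptions, two curves that differ only by a repeated sample align at zero cost, for example `dtw_cost([1, 2, 3], [1, 2, 2, 3])`, whereas a constant offset accumulates along the whole alignment path.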

Figure 5
The web application developed for user studies. Detailed instructions are shown on the bottom left of the page throughout the duration of the experiment. On the top left is the canvas where users draw the curves. On the right side, three identical rows of context measures, generated results and control units are displayed corresponding to the three comparison methods (with a random order for each trial).

Figure 6
Boxplots of the answers (ratings on a 1–5 scale; higher is better) to the six subjective evaluation questions from all 23 participants. Each box for Q1–Q3 contains 112 points, while each box for Q4–Q6 contains 23 points. The notch in each plot represents the 95% confidence interval around the median. Outliers are shown as circles.

Figure 7
Two examples obtained during the user studies. The first line is the output of the proposed VAE model and the next two contain the baselines’ outputs.
