
Figures & Tables

Table 1

Overview of libraries for audio analysis and MIR on web clients compared to Essentia.js, including libraries written purely in JS or cross-compiled for Wasm, in terms of their target applications and the number of algorithms suitable for MIR out of the box. *Csound and Faust are very extensive programming languages for audio DSP which require cross-compilation.

| Name | Implementation | MIR algorithms | Applications | Last updated |
|---|---|---|---|---|
| CsoundEmscripten (Lazzarini et al., 2014) | asm.js | * | processing, synthesis | 2021 |
| Meyda (Fiala et al., 2015) | plain JS | ∼20 | analysis | 2021 |
| JS-xtract (Jillings et al., 2016) | plain JS | ∼70 | analysis | 2021 |
| Piper (Thompson et al., 2017) | Wasm | ∼20 | analysis, processing | 2018 |
| Faust (Letz et al., 2017) | Wasm | * | processing, synthesis | 2021 |
| lfo (Matuszewski and Schnell, 2017) | plain JS | ∼15 | analysis, processing | 2017 |
| MMLL (Collins and Knotts, 2019) | plain JS | ∼15 | analysis | 2020 |
| Essentia.js | Wasm | ∼200 | analysis, processing, synthesis | 2021 |

Figure 1

Overview of the Essentia.js library in terms of its abstraction levels.

Listing 1

A simple example of offline audio feature extraction using Essentia.js via ES6 style imports.

Table 2

Transfer learning classifiers.

| Task group | Task | Classes |
|---|---|---|
| genre | dortmund | alternative, blues, electronic, folkcountry, funksoulrnb, jazz, pop, raphiphop, rock |
| genre | gtzan | blues, classic, country, disco, hip hop, jazz, metal, pop, reggae, rock |
| genre | rosamerica | classic, dance, hip hop, jazz, pop, rhythm and blues, rock, speech |
| mood | acoustic | acoustic, non acoustic |
| mood | aggressive | aggressive, non aggressive |
| mood | electronic | electronic, non electronic |
| mood | happy | happy, non happy |
| mood | party | party, non party |
| mood | relaxed | relaxed, non relaxed |
| mood | sad | sad, non sad |
| misc. | danceability | danceable, non danceable |
| misc. | voice/instrum. | voice, instrumental |
| misc. | gender | male, female |
| misc. | tonal/atonal | atonal, tonal |
| misc. | urbansound8k | air conditioner, car horn, children playing, dog bark, drilling, engine idling, gun shot, jackhammer, siren, street music |
| misc. | fs-loop-ds | bass, chords, fx, melody, percussion |

Table 3

The Essentia models. RF: Receptive field, AT: Auto-tagging, TL: Transfer learning.

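| Model | RF (s) | Params. | Size (MB) | Purpose |
|---|---|---|---|---|
| MusiCNN | 3 | 787K | 3.1 | AT/TL |
| VGG | 3 | 605K | 2.4 | AT/TL |
| VGGish | 1 | 62M | 276 | TL |
| TempoCNN | 12 | [27K–1.2M] | [0.1–4.7] | Tempo |
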
Figure 2

Activations for the MSD and MTT auto-tagging taxonomies and all the transfer learning classifiers for Bohemian Rhapsody by Queen.

Listing 2

Example of offline audio feature extraction for the MusiCNN-based models using essentia.js-model via ES6 style imports.

Listing 3

Example of inference of MusiCNN-based models from the feature input computed in Listing 2 using essentia.js-model via ES6 style imports.

Figure 3

Essentia.js demo applications: (a) real-time mel-spectrogram (top-left), pitch estimation (top-right), HPCP (bottom-left) and music auto-tagging (bottom-right), (b) five Essentia.js transfer learning models for mood classification, (c) industrial application by SonoSuite for audio problem detection.

Table 4

Platform versions for each device used in the JavaScript benchmarks.

| Device | Chrome | Firefox | Node.js |
|---|---|---|---|
| Linux | 89.0.4389.114 (64-bit) | 87.0 (64-bit) | 14.15.1 |
| macOS | 89.0.4389.114 (64-bit) | 87.0 (64-bit) | 14.13.0 |
| Android | 92.04484.6 | Nightly 210421 | |
| iOS | 87.0.4280.77 | 33.0 | |

Figure 4

Mean execution time (in seconds) for common audio features on a 30-second music track. If the standard deviation is smaller than 0.005 it is not printed. The algorithms marked with * in (a) are only available in Essentia.js.
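
The reporting rule in this caption (show the mean, and show the standard deviation only when it reaches 0.005) can be sketched as follows. This is a minimal illustration and not the benchmark suite's own code: the function names `summarizeTimings` and `formatTiming` are hypothetical, and the use of the population standard deviation and of two printed decimals are assumptions. With two decimals, any deviation below 0.005 would round to 0.00, which is one plausible reason for the threshold.

```javascript
// Hypothetical helper: mean and standard deviation of repeated
// execution-time measurements, in seconds.
// Assumption: population standard deviation (divide by N).
function summarizeTimings(samples) {
  if (samples.length === 0) {
    throw new Error("need at least one timing sample");
  }
  const mean = samples.reduce((sum, t) => sum + t, 0) / samples.length;
  const variance =
    samples.reduce((sum, t) => sum + (t - mean) ** 2, 0) / samples.length;
  return { mean, std: Math.sqrt(variance) };
}

// Hypothetical formatter: omit the standard deviation when it is
// smaller than the threshold (0.005 in the caption).
// Assumption: values are printed with two decimals.
function formatTiming({ mean, std }, threshold = 0.005) {
  const meanText = mean.toFixed(2);
  return std < threshold ? meanText : `${meanText} ± ${std.toFixed(2)}`;
}
```

For example, `formatTiming(summarizeTimings(runs))` would label one bar of such a chart, where `runs` is an array of measured durations for a single algorithm on a single platform.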

Figure 5

Mean execution time (in seconds) for Essentia.js model algorithms on a 30-second music track, comparing two TensorFlow.js back ends: (a) CPU acceleration (Wasm on browsers) and (b) GPU acceleration with WebGL. If the standard deviation is smaller than 0.005, it is not printed. N.C. marks values that were not computed because the benchmark suite was unable to complete execution. Since not all Android and iOS devices support WebGL or have powerful enough GPUs, only browser benchmarks on Linux and macOS are shown for (b).

DOI: https://doi.org/10.5334/tismir.111 | Journal eISSN: 2514-3298
Language: English
Submitted on: Apr 24, 2021
Accepted on: Sep 2, 2021
Published on: Nov 22, 2021
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2021 Albin Correya, Jorge Marcos-Fernández, Luis Joglar-Ongay, Pablo Alonso-Jiménez, Xavier Serra, Dmitry Bogdanov, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.