
Figure 1
Frequency of BioContainers executables by count as of 04/2023. For example, a count of “1” with a frequency above 10,000 indicates that more than 10,000 unique commands each appear in only one container. Manual inspection shows that shared executables begin to appear at a count of approximately 1000, which therefore serves as a good threshold for identifying unique or “special” container commands. The plot was generated with a Jupyter notebook [22].
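
The counting behind this figure can be illustrated with a minimal sketch. It is not the notebook from [22]: the function names, the input layout (a mapping from container name to its list of executables), and the choice to treat counts strictly below the threshold as “special” are all assumptions made for illustration.

```python
from collections import Counter


def executable_counts(containers):
    """Count, for each executable name, how many containers provide it.

    `containers` is a hypothetical mapping of container name -> list of
    executable names (an assumed layout, not the real cache format).
    """
    counts = Counter()
    for executables in containers.values():
        # Use a set so an executable is counted once per container.
        counts.update(set(executables))
    return counts


def count_frequencies(counts):
    """Map each count value to the number of executables with that count.

    This is the quantity plotted in a count-versus-frequency histogram.
    """
    return Counter(counts.values())


def special_executables(counts, threshold=1000):
    """Return executables seen in fewer than `threshold` containers.

    Assumption: "special" means strictly below the threshold.
    """
    return sorted(name for name, n in counts.items() if n < threshold)


if __name__ == "__main__":
    # Toy data; real input would cover thousands of containers.
    toy = {
        "a:1": ["samtools", "ls", "bash"],
        "b:1": ["bwa", "ls", "bash"],
        "c:1": ["ls"],
    }
    counts = executable_counts(toy)
    print(count_frequencies(counts))
    print(special_executables(counts, threshold=2))
```

With the toy data, `ls` and `bash` appear in several containers and are filtered out, while `samtools` and `bwa` each appear once and are kept.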

Figure 2
Movement of a new BioContainers entry from its original repository to availability as a module via the shpc software. The BioContainers repository (A) provides an updated listing of containers from a web-accessible address. Three times a week, the container-executable-discovery action [19] (B) is run alongside the shpc-registry-cache [15] repository to discover new executables, derive their counts, and populate the cache (C). This step uses pipelib [12] to parse and sort container tags to identify newer ones, and guts [13, 14] to extract the executables on a container path. The shpc-registry [10], the remote registry with container YAML files, can then run an action provided directly by shpc that uses the cache to generate new container recipes to install (D). Existing recipes in the remote registry are updated in increments each day of the month to discover new tags (E), using an action that assigns entries to days of the month [25] and the shpc software “update” command [7]. On the command line, a user who has installed shpc can then request that a module be installed from the registry. This installation pulls a container from a container registry (G) and installs it to the system module software (H), where it can be loaded by a user, exposing the executables discovered in (B) for easy interaction (H).
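
The incremental update in step (E) depends on spreading registry entries across the days of a month. The sketch below shows one way this could be done. It is an assumption for illustration and not the method used by the action in [25]: the function names, the hash-based assignment, and the 28-day cycle are all hypothetical.

```python
import hashlib


def assigned_day(entry, days=28):
    """Return a stable day of the month (1..days) for a registry entry.

    Assumption: a SHA-256 hash of the entry name decides the day, so the
    assignment is the same on every run. A 28-day cycle is assumed so
    that every assigned day exists in every month.
    """
    digest = hashlib.sha256(entry.encode("utf-8")).hexdigest()
    return int(digest, 16) % days + 1


def entries_for_day(entries, day, days=28):
    """Return the sorted entries that would be updated on the given day."""
    return sorted(e for e in entries if assigned_day(e, days) == day)


if __name__ == "__main__":
    # Hypothetical entry names in the style of container identifiers.
    entries = [
        "quay.io/biocontainers/samtools",
        "quay.io/biocontainers/bwa",
        "quay.io/biocontainers/salmon",
    ]
    for entry in entries:
        print(entry, "-> day", assigned_day(entry))
```

A deterministic assignment of this kind means each entry is checked for new tags once per cycle, and the daily workload stays roughly even without storing any schedule.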
