(1) Overview
Introduction
Good research data management should primarily allow the researcher to navigate data easily. It is also an essential prerequisite for archiving and for the later potential reuse of the data. Large-scale experiments such as those at CERN [1] or KATRIN [4] have significant capabilities for storing and processing data in a streamlined way. For small-scale experiments, however, these capabilities are limited. Many scientists still rely on handwritten notes, and for analyses, notes may need to be manually linked to the data (Figure 1).

Figure 1
When performing multiple experiments on multiple samples, much data is produced. The data have to be structured, analyzed, and potentially linked with notes or other data. In our approach, we aim to streamline this process.
While several Electronic Laboratory Notebook (ELN) solutions exist—often called “lab books” or “lab information systems”—they tend to be highly specialized and difficult to customize [2, 3, 12]. Many are built on proprietary or non-standard platforms, which make user-driven modification difficult. By directly linking lab books to experimental metadata, we may be able to eliminate most manual documentation steps—especially for setups lacking dedicated data-management tools. This approach is particularly valuable for experimental setups built in-house to meet specific scientific needs, as these often lack the integrated data management software found in some commercial instruments. Our open-source framework can serve as a lightweight ELN, potentially improving the user experience where commercial software is impractical and avoiding any licensing fees. A data crawler can be triggered on demand to scan the file system, extract metadata, and populate the ELN—keeping documentation up-to-date with minimal manual input.
The data crawler automatically parses each path segment and filename to extract core metadata such as experiment type, sample identifier, and timestamp—without requiring additional configuration. This process relies on a standardized folder hierarchy that the user must adopt to encode this information directly into the file paths, as detailed in the File Structure and Data Ingestion section. For experiment-specific metadata—such as stage positions, acquisition parameters, or environmental conditions—the software supports extensibility through user-defined Python functions tailored to each experiment type. Users can implement custom parsing logic in these functions, for example, using regular expressions or file header readers, to extract and populate additional metadata fields. This modular design allows support for new instruments or file formats by modifying only a small, isolated section of code (further details in Custom Metadata Extraction).
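A minimal sketch of this path-based extraction is shown below. The function name, regular expression, and returned field names are illustrative, not the actual crawler code; the example path follows the folder hierarchy described in the File Structure and Data Ingestion section.

```python
import re
from datetime import datetime

# Illustrative sketch of the crawler's path parsing; names are hypothetical.
PATH_PATTERN = re.compile(
    r"(?P<experiment>[^/]+)/(?P<date>\d{8})/(?P<sample>[^/]+)/"
    r"(?P<time>\d{6})_(?P<name>.+)$"
)

def parse_path(path: str) -> dict:
    """Extract experiment type, sample, and timestamp from a data-file path."""
    match = PATH_PATTERN.search(path)
    if match is None:
        raise ValueError(f"Path does not follow the expected hierarchy: {path}")
    fields = match.groupdict()
    fields["timestamp"] = datetime.strptime(
        fields.pop("date") + fields.pop("time"), "%Y%m%d%H%M%S"
    )
    return fields
```

For example, `parse_path("01_OCA_35_XL/20210201/Probe_BA_01/171700_osz_wasser_laengest.png")` yields the experiment identifier, the sample folder name, and a timestamp of 17:17:00 on 2021-02-01.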
Typically, one file type contains the primary experimental results (e.g., microscope images, ellipsometry measurements, or video recordings), while other files contain complementary information such as pressure measurements, temperature logs, or timing signals. All files must be organized by the user in a consistent folder hierarchy with standardized naming conventions, as specified in the File Structure and Data Ingestion section. Within this structure, we distinguish between primary experiments, referred to as main experiments, and complementary experiments, referred to as sub experiments. Each main and sub experiment is recorded as a database entry and assigned a globally unique, persistent identifier (ID), which enables linking between related entities across experiments as well as to later analyses or derived results.
During crawling, this configuration is applied: the crawler reads each file’s metadata, uses the configured mappings (e.g., the timestamp) to detect related files by their IDs, and then automatically generates the corresponding database entries with the correct relationships in place. Once processed, data can be explored through a web interface, which supports filtering, sorting, and interactive visualization using Plotly [9]. To enable plotting, users can implement a data-specific Python function for each experiment type that loads the raw data into a pandas DataFrame [14, 8]. Out of the box, the ELN supports basic plotting through a generic interface for assigning DataFrame columns to axes, while more advanced visualizations can be implemented by the user (see Interactive Data Exploration and Visualization). For advanced scripting, the ELN exposes the same objects inside an integrated Jupyter notebook environment [15] (see Jupyter Integration). This allows users to run complex analysis scripts on specific subsets of the data they have defined, with the results optionally fed back into the ELN.
Currently, our research produces several thousand primary data entries per year. Manually cataloguing all these entries in a database would be prohibitively time-consuming. In this work, we investigate polymer brushes with varying chain lengths and other compositional parameters. Each experimental treatment—such as solvent exchange, temperature cycling, or mechanical loading—can alter key properties, including advancing and receding contact angles [11]. Interpreting these changes, therefore, requires a complete history of every sample: which procedures were applied, in what order, and when. With the ELN software, users can filter data by sample, experiment type, date, or custom observations, providing a searchable record of each sample’s provenance and supporting systematic analyses.
Based on these practical needs, we developed ELN (see Figure 2). For this, we derived the following functional requirements:
Scalable Ingestion: Hundreds or thousands of files must be importable automatically, provided they follow the prescribed folder hierarchy.
Automatic Metadata: Metadata should be automatically extracted from the folder structure, filename pattern, or the file contents, without requiring manual input.
Configurable Relationships: Files that belong together—designated in the Django model as “main” or “sub” for a given experiment type—must be linked automatically in the database via their ID.
Transparent Auditing: Every crawl should yield a human-readable report summarizing which files were added, which were linked, and any errors encountered.
Targeted Analysis: Users can script analyses in Jupyter (see Jupyter Integration) on selected data, with results fed back into the ELN.
Interactive Visualization: If an experiment’s loader returns data in a tabular (pandas-compatible) format, a Plotly widget (see Figure 5) should enable users to interactively explore both the main data and sub data together.

Figure 2
A major part of the lab book database is generated from experiment files. Users can also link notes and analysis results directly to each experiment, enabling a complete documentation and interpretation pipeline.
The ELN software has already supported two peer-reviewed studies [10, 13], demonstrating its suitability for high-throughput surface-science workflows.
Implementation and Architecture
Our ELN system is implemented as a Django [5] web application consisting of five apps: Exp_Main, Exp_Sub, Analysis, Lab_Dash, and Lab_Misc. These components handle experimental data ingestion, metadata modeling, analysis output, the dashboard UI, and general-purpose metadata, respectively.
We use Django’s Object-Relational Mapping (ORM) and a SQLite [6] backend to store all experiment entries and metadata. A base model defines shared fields, such as timestamp, sample identifier, and file path, while each specific experiment type extends this base model with additional fields (e.g., wavelength, exposure time). To adapt the ELN for new experiments, users must define the corresponding data models in the models.py file. This task requires basic Python programming skills; however, because all database interactions go through the ORM, no knowledge of SQL is necessary. Primary experiments can be linked to sub-experiments via relational fields in the schema, and each record is tied to a sample entity. For project-level organization, we also support hierarchical experiment grouping.
File Structure and Data Ingestion
A standardized folder hierarchy encodes experiment metadata and enables automatic ingestion. Each data file must be located according to the following pattern:
<experiment>/
    <date>/
        <sample>/
            <timestamp>_<datafile>
A concrete example from an OCA experiment is shown below:
01_Data/
    01_Main_Exp/
        01_OCA_35_XL/                  <- experiment identifier
            20210201/                  <- date (YYYYMMDD)
                Probe_BA_01/           <- sample name
                    171700_osz_wasser_laengest.png
From this path, the crawler extracts the experiment type (OCA), date (2021-02-01), sample (BA_01), and timestamp (17:17:00). If files lack a timestamp in their names, users can apply the give_file_times module in our Exp_Main/Generate.py script, which assigns a timestamp based on the file’s last modification time; this value is provided by the operating system and behaves consistently across Windows, macOS, and Linux. We deliberately do not extract these timestamps automatically in every case, on the assumption that users will want to record them manually depending on the specific needs of each project. Files must fall within a user-configurable age range and match allowed file extensions; the allowed list is configurable in the web application. Entries are created or updated by navigating to Generate → Main → Generate entries in the interface.
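The modification-time fallback can be sketched as follows. This is a standalone illustration of the idea behind give_file_times, not the module itself; the renaming helper is a hypothetical addition showing how such a timestamp could be encoded into the filename the crawler expects.

```python
import os
from datetime import datetime

def file_timestamp(path: str) -> datetime:
    """Timestamp from the file's last modification time, mirroring the idea
    behind give_file_times (standalone sketch, not the actual module)."""
    return datetime.fromtimestamp(os.path.getmtime(path))

def prefix_with_timestamp(path: str) -> str:
    """Hypothetical helper: rename a file so its name carries the HHMMSS_
    prefix used in the standardized naming convention."""
    stamp = file_timestamp(path).strftime("%H%M%S")
    directory, name = os.path.split(path)
    new_path = os.path.join(directory, f"{stamp}_{name}")
    os.rename(path, new_path)
    return new_path
```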
For certain experiments that we have conducted with recorded videos, we chose a different approach, as the ELN software does not control the names used for files. Instead, we created the experimental entry in the database at the moment the video started and entered all relevant details directly into this entry. When we clicked Generate, a sort_video routine moved the footage into the standard date/time folder structure based on the recording timestamp. Our CreateAndUpdate hook then detected the newly sorted files and linked them to the experiment entry we created earlier, by comparing the timestamps.
Custom Metadata Extraction
After linking, each uploaded file is processed by the CreateAndUpdate hook to extract metadata unique to the specific experiment type. Device-specific add_files methods extract information directly from filenames (e.g., _X12.3_Y45.6 → XPos_mm, YPos_mm; _3.5muL → Drop_Volume_muL; _1800UPM → Rotations_per_min) or from file headers (frame rate, temperature), compute derived quantities on the fly (for instance, linear velocity from RPM and radius), and write all values into dedicated Django fields (FloatField, IntegerField, CharField, etc.). Because the hook runs inside Django, it can leverage any Python library or query any model without modifying core code; adding support for new metadata requires only subclassing CreateAndUpdate and implementing a small parsing function. All extracted or derived values—stage coordinates, drop volume, rotation speed, etc.—are written back to the model and instantly become searchable via the UI’s filter bar.
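A minimal sketch of such filename parsing is shown below. The regular expressions and helper functions are illustrative stand-ins for the device-specific add_files methods; only the encoded patterns (_X…, _Y…, _…muL, _…UPM) are taken from the text.

```python
import re
from math import pi

# Illustrative parsing rules in the spirit of the device-specific add_files
# methods; the dictionary keys mirror the Django field names mentioned above.
FILENAME_RULES = {
    "XPos_mm": re.compile(r"_X(-?\d+(?:\.\d+)?)"),
    "YPos_mm": re.compile(r"_Y(-?\d+(?:\.\d+)?)"),
    "Drop_Volume_muL": re.compile(r"_(\d+(?:\.\d+)?)muL"),
    "Rotations_per_min": re.compile(r"_(\d+)UPM"),
}

def extract_filename_metadata(filename: str) -> dict:
    """Collect every metadata value encoded in a filename."""
    meta = {}
    for field, pattern in FILENAME_RULES.items():
        match = pattern.search(filename)
        if match:
            meta[field] = float(match.group(1))
    return meta

def linear_velocity_mm_s(rpm: float, radius_mm: float) -> float:
    """Example of a derived quantity: linear velocity at a given radius."""
    return 2 * pi * radius_mm * rpm / 60.0
```

For a filename such as `171700_scan_X12.3_Y45.6_3.5muL_1800UPM.png`, the helper returns the stage coordinates, drop volume, and rotation speed as floats ready to be written into the corresponding model fields.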
Main and Sub Experiments
Related experiment files are classified as “main” or “sub” experiments via Django model relationships and folder configurations. Files are automatically linked based on timestamp and path structure. The linkage occurs via unique persistent identifiers generated for each entry. These identifiers also allow association with analysis outputs, notes, or other experiments.
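The timestamp-based linkage can be sketched as a nearest-match search within a tolerance window. This is a simplification under stated assumptions: the function name and the five-minute tolerance are illustrative, and the actual crawler additionally consults the folder configuration and the persistent IDs described above.

```python
from datetime import datetime, timedelta

def link_sub_to_main(main_entries, sub_entries, tolerance=timedelta(minutes=5)):
    """Pair each sub-experiment with the temporally closest main experiment.

    Entries are (id, timestamp) tuples; returns {sub_id: main_id}.
    Simplified sketch -- the real linkage also uses the path structure.
    """
    links = {}
    for sub_id, sub_ts in sub_entries:
        best = min(main_entries, key=lambda entry: abs(entry[1] - sub_ts), default=None)
        if best is not None and abs(best[1] - sub_ts) <= tolerance:
            links[sub_id] = best[0]
    return links
```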
Database Schema
All experiment records are stored in a normalized relational database to ensure consistency and reduce redundancy. The base model, ExpBase, defines the core fields shared across all experiment entries. Each specific experiment type extends this base model by adding its own custom fields relevant to its data and structure. Samples are represented by their own models, which are referenced via foreign keys. The schema also supports grouping experiments into hierarchies using tree-based models.
The web interface features a django-filter search bar that lists every metadata field—sample, date, experiment type, user-defined observations, and more—as selectable filters. When users select one or more criteria, the system dynamically constructs the corresponding ORM query and returns the matching records in a sortable, paginated table view (Figure 3). This design lets researchers narrow datasets in seconds without writing a single line of SQL.

Figure 3
Overview of all experiments stored in the lab book. Each row corresponds to one experiment record.
User Interface and Permissions
Each experiment is listed in a tabular overview that includes metadata from the base model. A detailed view shows linked sub-experiments, analysis results, and visualization tools (Figure 4). Entries can be edited or deleted manually through the interface. Changes can also be made through Django’s administrative interface, where the corresponding metadata updates are immediately visible.

Figure 4
Detail view of a selected experiment entry, including links to sub-experiments, analysis output, and visualizations.
Additionally, users who store their data in private GitHub repositories can protect their data using a Decentralized Trusted Timestamping (DTT) mechanism. This process is implemented through the OriginStamp API, which allows users to prove the existence of data at a specific point in time. A cryptographic one-way hash of the data is generated locally and then sent to the OriginStamp API. The API aggregates the hash with others and anchors them into the Bitcoin blockchain, ensuring immutable and verifiable timestamping. Because only the hash is published—rather than the full dataset—confidentiality is preserved. This approach allows researchers to establish priority or authorship while preventing premature access to the actual data before it is finalized, peer-reviewed, or legally cleared for publication. This method is particularly well-suited for users who are unable or unwilling to publish their datasets immediately due to legal, ethical, or competitive concerns. GitHub also supports signed commits, which verify authorship using public-private key cryptography. Additionally, OriginStamp can be integrated into GitHub workflows via Webhooks with minimal setup effort [7, 16].
For users who cannot use GitHub’s webhook functionality (or the equivalent in GitLab), we recommend combining user authentication with the DTT mechanism and integrating the hash generation, signing, and backup functionality directly into the lab book.
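The local hashing step of this timestamping workflow can be sketched with the standard library. The submission to the OriginStamp API is omitted here, since its endpoint details are service-specific; only the digest computed below would ever leave the machine.

```python
import hashlib

def file_hash(path: str, algorithm: str = "sha256") -> str:
    """Compute a cryptographic one-way hash of a file for trusted timestamping.

    Only this digest is published; the data itself stays private, which
    preserves confidentiality while still proving existence at a point in time.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so arbitrarily large data files can be hashed.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```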
Visualization and Scripting
Interactive Data Exploration and Visualization
For each experiment type, a corresponding “dash” model can be defined. This model is instantiated and linked via a hook after parsing, and it stores visualization parameters. To render these data, every experiment class provides a data-loader function returning a pandas DataFrame. The web UI wraps this in a generic Plotly component: users pick which DataFrame column maps to the x- or y-axis, overlay sub data, and interactively zoom.
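The loader contract can be sketched as follows. The function names, the tab-separated file layout, and the column-selection helper are assumptions for illustration; the Plotly wiring itself is omitted.

```python
import pandas as pd

def load_experiment_data(path) -> pd.DataFrame:
    """Illustrative data-loader contract: read a raw measurement file into a
    pandas DataFrame (tab-separated layout is an assumption)."""
    return pd.read_csv(path, sep="\t")

def select_axes(df: pd.DataFrame, x: str, y: str) -> pd.DataFrame:
    """Reduce a loaded DataFrame to the two columns the user mapped to the
    x- and y-axes in the generic Plotly component, dropping missing values."""
    return df[[x, y]].dropna()
```

Because every loader returns a plain DataFrame, the generic plotting component needs no knowledge of the underlying instrument or file format.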
Figure 5 shows an example of this generic plot, displaying data from a spectroscopic ellipsometry experiment. The interface allows the user to dynamically map any column from the main experiment’s DataFrame to the plot axes. As shown in the figure’s dropdown menus, the user has selected the Thickness_Brush column for the primary Y-axis and the time_loc column for the X-axis to visualize how the sample’s thickness changes over time. This functionality provides instant insight into experimental runs without requiring any custom plotting code. The main experiment (ellipsometry measurement) has several sub-experiments connected to it, among them the mass flow controllers controlled over RS232 (MFR). In Figure 5, we display the data of the 209th and the 212th MFR experiments on the secondary Y-axis. When the data of such an experiment are loaded into the database using the generate function, the file header is read to determine which liquid the nitrogen gas is enriched with. This value is saved in the dedicated gas field of the MFR model. Upon loading the data, we compose a data name from the name of the sub-experiment, MFR_209, and the entry in its gas field (e.g., Ethanol). The sccm and time_loc entries are again column names in the table.

Figure 5
Example of a generic interactive plot displaying both the main experiment (SEL) data and the data from selected sub-experiments (MFR) within the same graph. Sub-experiments can be individually selected, and their data are temporally aligned based on their respective timestamps, enabling direct visual correlation.
While the generic plot is well suited to exploring data, selecting the columns to display each time is not always convenient. We therefore added an option to create dedicated plots for specific experiments, so the data can be displayed without repeatedly selecting columns. Building on this foundation, users can further enhance the interface with custom features beyond the built-in options such as legend interaction and zooming. For example, we developed a specialized feature to improve the interpretation of multidimensional data: multiple plots can be displayed side by side, and manually highlighting data in one plot simultaneously highlights the same data points in all other plots, even if different quantities are being plotted.
Jupyter Integration
A fully featured Jupyter server is embedded in the same Python process as the Django web application, so every notebook inherits the complete project context—ORM models, data-loading helpers, and plotting utilities—without any additional configuration. This tight coupling lets researchers prototype scripts, visualize intermediate results, and edit metadata directly in the browser, all while avoiding raw SQL or manual file reloads. Crucially, notebooks execute against the live transactional database: any derived quantities they create can be written back as first-class objects and are immediately available to the entire system.
This feedback loop enables automated, script-driven enrichment of primary data. Such derived tables can then be queried across experiments to build higher-level meta-analyses, giving users an instant, holistic view of their research portfolio. In short, the integrated Jupyter environment transforms the lab book from a passive repository into an active, continuously improving knowledge base.
Quality Control
For quality control, we use the Django testing framework, running several unit tests along with an integration test. A GitHub Actions workflow ensures that the code runs within Docker.
(2) Availability
Operating system
Windows 10/11
The software also runs with Docker and therefore on Linux-based systems and other operating systems that support Docker. A button in the web interface for launching Jupyter notebooks works on a native Windows installation but not within the isolated Docker environment; a native Windows installation therefore offers more functionality than a Docker installation.
Programming Language
Python 3.8.2
Dependencies
The software is built on Python 3.8.2 and the Django framework. All required Python packages are listed in the requirements.txt file in the root of the code repository, ensuring a reproducible installation. Key dependencies include Django, Plotly, pandas, and Jupyter.
Software Location
Archive
Name: Electronic-Laboratory-Notebook
Persistent identifier: https://archive.softwareheritage.org/swh:1:snp:f85cbf7c21ca96b43db60c160fa017a8daf3ff87
Licence: Apache License 2.0
Publisher: Software Heritage
Date published: 15/03/2025
Code Repository
Name: GitHub
Persistent identifier: https://github.com/gipplab/Electronic-Laboratory-Notebook
Licence: Apache License 2.0
Date published: 15/03/2025
Language
English
(3) Reuse Potential
The ELN is particularly well-suited for research environments that generate numerous data files from various instruments, a common scenario in experimental sciences like physics, chemistry, and materials science. Its core strength lies in managing data where primary results (e.g., images, videos, spectra) must be linked with time-correlated auxiliary data (e.g., temperature logs, pressure readings, control signals). The primary requirement for automated ingestion is that the user organizes their data into the consistent, hierarchical folder structure described in this paper. Because the software is open-source, its extensibility is a key feature. Users with basic Python skills can adapt the data parsers to support new file formats or extract custom metadata from their specific instruments.
Support
Users seeking assistance are encouraged to report bugs, request features, or ask questions by opening an issue in the software’s GitHub repository. The “Issues” tab on the project’s repository page is the primary channel for support.
Competing Interests
The authors have no competing interests to declare.
