(1) Overview
Introduction
The performance of water electrolysis cells is a major challenge, with a cell efficiency of about 70% [1]. The theoretical limit of water splitting is further lowered by various factors [2], including the adhesion of hydrogen and oxygen bubbles to the electrodes. The gas bubbles obstruct the catalytically active surface area [3]. This problem can be mitigated by ensuring that gas bubbles adhere to the electrode only briefly and then detach quickly [4], for example by changing process conditions or applying surface structuring. Consequently, there are many efforts to analyze gas bubble behavior. This information is also important for simulations and has implications for industrial device design [5]. In most cases, reported experiments are analyzed by hand on a reduced amount of data, e.g. with freely available software packages such as ImageJ [5], which only target a few parameters of bubble development. However, no suitable software is currently available to analyze such data, let alone entire videos.
To assess the effectiveness of electrode optimization through structuring, images of the gas bubbles are taken during electrolysis using a high-speed camera in a transparent test cell. Manual annotation of a single image by a human researcher takes approximately 30 minutes. However, when recording a water electrolysis process with a high-speed camera, several thousand images can be captured in just a few seconds. At 900 frames per second, for example, five seconds of electrolysis results in more than 4,000 images. Manual evaluation is not possible.
Consequently, while automated detection is necessary to handle such large volumes of data efficiently, it must also overcome specific challenges inherent in gas bubble detection: (i) Gas bubbles are transparent, making their appearance highly variable depending on the background. (ii) The bubbles are often very small, which makes them difficult to detect accurately against a cluttered background. (iii) Fluctuations in lighting conditions can affect the visibility of the bubbles, leading to inconsistent detection. (iv) The high density of bubbles often results in significant overlap, making it challenging to separate and accurately count each bubble. Addressing these challenges requires a robust detection system that can handle variable appearances, small object sizes, lighting changes, and overlapping objects effectively.
B.E.D. has been developed to overcome these challenges, quantifying the number and size of bubbles in each frame and assigning each bubble an ID to track its detachment time, movement, growth, and merging over the course of the video. The software is based on a convolutional neural network (CNN) using the Darknet backbone and the YOLOv4 algorithm, trained on hand-annotated images of gas bubbles in front of structured surfaces.
An additional large set of synthetic training data was generated from these images to further increase the training set size.
After detection in the individual images, an algorithm tracks the movement of each bubble via its ID. This allows the software to detect the detachment of bubbles from the surface, their upward movement, and even the merging of two bubbles, enabling a deeper analysis of the impact of surface treatments on bubble-induced losses. While other software packages such as BubbleNet (hanfengzhai.net/BubbleNet) are capable of predicting bubble movement, they are designed primarily for simulation and were not built with demanding backgrounds such as surface structures in mind.
Implementation and architecture
B.E.D. uses a YOLOv4/Darknet-based CNN trained on hand-annotated frames of high-speed videos of gas bubble emergence. These videos were recorded with structured samples, leading to complex backgrounds.
Given the specific requirements of our project, a CNN-based detector is particularly well-suited because of its ability to process large volumes of data efficiently and to generalize across varying visual conditions. Traditional image processing techniques often fall short when dealing with such high-dimensional data due to their limited capacity to capture complex patterns and their reliance on manually engineered features [6].
We selected YOLOv4 (You Only Look Once, version 4) for our task due to its state-of-the-art performance in real-time object detection, particularly in scenarios where both speed and accuracy are critical. YOLOv4 is an advanced deep learning model that builds upon the strengths of its predecessors, YOLOv1, YOLOv2, and YOLOv3 [7, 8, 9], while introducing several enhancements that significantly improve its detection capabilities [10].
YOLOv4 employs the CSPDarknet53 as its backbone architecture, which is responsible for extracting high-level features from the input images. The backbone architecture is crucial because it forms the foundation upon which the detection heads operate. The CSPDarknet53 is an evolution of the Darknet53 architecture used in YOLOv3, with the addition of Cross-Stage Partial connections (CSP). These connections are designed to enhance the learning capability of the network while reducing computational overhead, making the model more efficient without compromising accuracy [10].
In terms of speed, YOLOv4 can process up to 65 frames per second on a Tesla V100 GPU. This speed is achieved through a combination of optimizations, including the use of the CSPDarknet53 backbone, Mish activation functions, and a Squeeze-and-Excitation (SE) module that recalibrates feature maps channel-wise. These enhancements allow YOLOv4 to maintain high processing speeds even when working with the large, high-resolution video frames typical of high-speed imaging.
Additionally, YOLOv4 incorporates a range of techniques collectively known as “bag of freebies” and “bag of specials.” These techniques include data augmentation methods, such as Mosaic and Self-Adversarial Training, and architectural improvements like PANet for feature fusion and a new form of spatial attention [10].
These contribute to YOLOv4’s superior generalization capability, making it more robust when applied to varied and unseen datasets, a critical advantage when dealing with the diverse visual patterns presented by gas bubbles in highly dynamic environments.
At the time of developing our software, YOLOv4 offered several advantages over other contemporary object detection frameworks such as Faster R-CNN [11], SSD [12], and RetinaNet [13]. While Faster R-CNN is known for its high accuracy, it typically lags in speed, processing only a few frames per second, which makes it unsuitable for real-time applications like ours [11]. SSD and RetinaNet, although faster than Faster R-CNN, still do not match the speed-accuracy trade-off provided by YOLOv4. Moreover, YOLOv4’s design optimizes the balance between model complexity and detection performance. For instance, it requires fewer computational resources compared to its competitors, allowing it to be deployed on a wider range of hardware, including less powerful GPUs. This flexibility is crucial for scaling our detection system across different deployment environments.
Another significant advantage of YOLOv4 is its superior performance in detecting smaller objects, which is directly relevant to our need to detect small gas bubbles. The integration of the PANet and SPP (Spatial Pyramid Pooling) module in YOLOv4 helps in capturing finer details across multiple scales, making it more adept at recognizing small objects compared to other detectors available at that time.
Runtime consists of the following steps:
Splitting of the video file into frames (ffmpeg)
Creation of a list of image files
Detection of bubbles using the YOLOv4/Darknet CNN; bounding-box coordinates are saved to a result file
Assignment of a unique ID to each bubble; the initial size is determined by the size of the bounding box and a pre-defined scaling factor
Tracking of bubbles across multiple frames via their IDs; growth rate, detachment size and time, velocity and acceleration, as well as merging events are analysed
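The tracking step can be sketched as a nearest-neighbour assignment of bubbles between consecutive frames. This is an illustrative simplification: the function name, the dictionary format, and the purely distance-based matching criterion are assumptions for the sketch, not B.E.D.'s exact implementation (the actual tracker also maintains predicted positions, xe and ye in Table 1).

```python
import math

def match_predecessors(prev_bubbles, curr_bubbles, max_dist=50.0):
    """Assign each bubble in frame t its nearest bubble from frame t-1.

    prev_bubbles / curr_bubbles: lists of dicts with 'x' and 'y' centre
    coordinates (e.g. in µm). Returns one predecessor index per current
    bubble, or -1 if no previous bubble is close enough (a new bubble).
    """
    assignments = []
    used = set()
    for cb in curr_bubbles:
        best, best_d = -1, max_dist
        for i, pb in enumerate(prev_bubbles):
            if i in used:
                continue  # each predecessor may be assigned only once
            d = math.hypot(cb["x"] - pb["x"], cb["y"] - pb["y"])
            if d < best_d:
                best, best_d = i, d
        if best >= 0:
            used.add(best)
        assignments.append(best)
    return assignments
```

Bubbles that receive -1 here correspond to the pre = –1 entries in the output table; assigned bubbles inherit the track ID of their predecessor.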
Quality control
The software outputs images including bounding boxes, as depicted in Figure 1. Each bounding box is assigned a probability that it contains a bubble. Only bounding boxes with a probability above a certain threshold are processed. This threshold should be adjusted to achieve optimal results in each scenario: too high a threshold leads to small or unclear bubbles not being detected, while below a certain threshold the detection of artefacts or light reflections as bubbles becomes likely. With these images, a quick scan of the detection quality is possible for each individual run. As the amount of data in a single video is too large for hand annotation and the result depends on the chosen threshold value, a qualitative comparison can be made on a short test dataset against hand-annotated values (a video of ten frames is provided as an example in GitLab).
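The threshold filtering described above can be expressed as a one-line helper. The tuple format and the default value of 0.25 (a common YOLO starting point) are illustrative assumptions, not B.E.D.'s internal representation:

```python
def filter_detections(detections, threshold=0.25):
    """Keep only detections whose confidence meets the threshold.

    detections: list of (confidence, bounding_box) tuples, where
    bounding_box is e.g. (x, y, w, h). The threshold should be tuned
    per scenario, as discussed above.
    """
    return [d for d in detections if d[0] >= threshold]
```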

Figure 1
Sample image of gas bubbles on rough surface after detection.
We evaluated the performance of our detector for identifying gas bubbles in images. To assess the model’s performance, we computed various metrics, including Accuracy, Precision, Recall, F1 Score, and the mean Intersection over Union (IoU). These metrics provide a comprehensive understanding of the model’s ability to correctly detect and localize gas bubbles.
The evaluation was performed using a set of annotated test images that included manually annotated ground truth labels for gas bubble locations. For each test image, the detector generated predictions in the form of bounding boxes and associated confidence scores. The following steps were taken to compute the metrics:
True Positives (TP) were counted when a predicted bounding box sufficiently overlapped with a ground truth bounding box, determined by an IoU threshold of 0.5.
False Positives (FP) were instances where a predicted bounding box did not correspond to any ground truth object or was incorrectly labeled as a gas bubble when no bubble was present.
False Negatives (FN) were ground truth bubbles that were not detected by the model.
The mean IoU was calculated as the average IoU across all correctly detected bubbles, providing a measure of localization accuracy.
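The IoU used for this matching can be computed for axis-aligned boxes as follows. This is a standard sketch, assuming boxes are given as (x, y, w, h) with (x, y) the top-left corner:

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # overlap extents along each axis (zero if the boxes are disjoint)
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

A prediction counts as a true positive when `iou(pred, ground_truth) >= 0.5`.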
From these counts, we computed the following metrics:
Accuracy: TP/(TP + FP + FN)
Precision: TP/(TP + FP)
Recall: TP/(TP + FN)
F1 Score: 2 × (Precision × Recall)/(Precision + Recall)
The accuracy metric provides an overall measure of the model’s performance by considering true positives, false positives, and false negatives. Precision measures the proportion of positive identifications that were actually correct, focusing on the model’s ability to avoid false positives. This indicates how reliable the positive detections are. Recall measures the proportion of actual positives that were correctly identified, emphasizing the model’s ability to detect all true instances of gas bubbles and minimizing false negatives. The F1 Score, which is the harmonic mean of precision and recall, offers a single metric that balances both precision and recall. Finally, the mean Intersection over Union (mean IoU) measures the average overlap between predicted and ground truth bounding boxes, which is crucial for accurate localization.
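The four count-based metrics above can be computed from the TP, FP, and FN counts with a short helper. The counts used in the usage note are placeholders for illustration, not the values from our evaluation:

```python
def detection_metrics(tp, fp, fn):
    """Compute Accuracy, Precision, Recall and F1 from detection counts,
    following the definitions given above."""
    accuracy = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, `detection_metrics(3, 1, 1)` yields a precision and recall of 0.75 each and an accuracy of 0.6.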
The evaluation results for the gas bubble detection using our detector are as follows:
Accuracy: 0.76
Precision: 0.81
Recall: 0.93
F1 Score: 0.86
Mean IoU: 0.67
These results suggest that the model performs well overall in detecting gas bubbles. The high recall of 0.93 indicates that the model successfully detects the majority of gas bubbles present in the images, showing robustness in identifying these challenging objects. The precision of 0.81, although slightly lower, is still satisfactory, demonstrating that most of the detected bubbles are indeed gas bubbles and not false positives. The F1 Score of 0.86 reflects a good balance between precision and recall.
The mean IoU of 0.67 shows that the model’s predicted bounding boxes have a reasonable overlap with the ground truth annotations, which is important for precise localization. Given the challenging nature of detecting transparent and blurry gas bubbles, a mean IoU of 0.67 indicates that the model is effective at capturing the spatial extent of the gas bubbles, even though perfect localization remains difficult.
Figure 2 illustrates a qualitative result. The red circles represent manually annotated gas bubbles, while the yellow circles indicate the detected ones. It is evident that many of the actual gas bubbles were detected, and the detector accurately estimated their sizes. However, it can also be observed that as the gas bubbles become blurrier, there are more discrepancies in size estimation between predictions and annotations, as it is challenging for both the annotator and the detector to determine the exact edge of the gas bubble.

Figure 2
Left: Qualitative result. The red circles show manually annotated gas bubbles; the yellow circles are the detections. Right: false-positive gas bubble detections, highlighted with green bounding boxes.
There are, however, some false positives in Figure 2, i.e. detections that are not actually gas bubbles (depicted separately in Figure 2, right). While some of these are indeed not gas bubbles, others are questionable and may have been missed by the annotator.
The false negatives (undetected bubbles) are primarily caused by very small bubbles, which are often difficult to distinguish from image distortion. It should be noted that bubbles of this size are not tracked over time. Temporal tracking is only initiated once they exceed a certain size and are reliably detected by the detector. In contrast, during backtracking, even very small bubbles are included in the parameter estimation because minimum size and detection thresholds are disregarded. When performance evaluation considers only those bubbles that can initiate a track, the recall value increases to 0.97, meaning that false negatives are reduced.
In summary, the detector demonstrates strong performance in identifying gas bubbles with high recall and reasonable precision. The F1 Score indicates a balanced performance, while the mean IoU reflects good localization accuracy despite the challenges posed by transparent and blurry bubbles.
For future work, an improvement to handle small gas bubbles more effectively would be to address the internal image scaling process. Currently, image data is significantly downscaled for processing by the detector. To better detect small gas bubbles, a promising extension would be to implement a sliding window approach or a similar method to evaluate image sections in greater detail. This could enhance the model’s ability to identify smaller bubbles and improve overall detection accuracy. In addition, the performance always depends on the physical setup of the gas bubble generation, such as the optics, the ambient medium of the gas bubbles, the lighting conditions, etc.
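A sliding-window evaluation as suggested above could tile each frame into overlapping crops that are passed to the detector at higher effective resolution, with the resulting boxes shifted back into full-frame coordinates. A minimal tiling sketch, where the window size of 416 px (a typical YOLO input size) and the 25% overlap are assumptions:

```python
def sliding_windows(img_w, img_h, win=416, overlap=0.25):
    """Return (x, y, w, h) crops covering the image with the given overlap.

    Each crop would be evaluated by the detector separately; detected
    boxes are then offset by (x, y) to map them back to the full frame.
    """
    step = max(1, int(win * (1 - overlap)))
    xs = list(range(0, max(img_w - win, 0) + 1, step))
    ys = list(range(0, max(img_h - win, 0) + 1, step))
    # ensure the right and bottom image borders are always covered
    if xs[-1] != max(img_w - win, 0):
        xs.append(max(img_w - win, 0))
    if ys[-1] != max(img_h - win, 0):
        ys.append(max(img_h - win, 0))
    return [(x, y, win, win) for y in ys for x in xs]
```

Overlapping crops mean a bubble near a tile border is still seen whole in at least one crop; duplicate detections across tiles would need to be merged, e.g. by non-maximum suppression.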
Output data
The tracking of bubbles yields result files with many different bubble metrics, depicted in Table 1.
Table 1
Output of B.E.D. in tracking_results.fthr.
| NR. | PARAMETER | UNIT | DESCRIPTION |
|---|---|---|---|
| 1 | – | | Consecutive number |
| 2 | frame_id | | Number of the processed image. The same number appears several times in the table, as several bubbles appear within each image. |
| 3 | conf | | Confidence score: how confident the detector was in recognizing the bubble. Value between 0 (uncertain) and 1 (very certain). |
| 4 | x | µm | X Position of the bubble center point |
| 5 | y | µm | Y Position of the bubble center point |
| 6 | w | µm | Width of Bubble |
| 7 | h | µm | Height of Bubble |
| 8 | maxWH | µm | Size of the bubble (maximum of width and height) |
| 9 | id | | Number of the bubble. Attention: this is not the tracking ID, but a consecutive number over all bubbles recognized in the video sequence, i.e. the same bubble has different ids in different images. |
| 10 | ts | | Tracking state. Internal parameter used for tracking. |
| 11 | pre | | Predecessor. Tracking attempts to assign the bubbles of an image t to the bubbles of the previous image t–1. For each bubble, pre specifies the number (id) of the assigned bubble from the previous image; –1 means that the bubble has no predecessor. |
| 12 | tid | | Track ID. The unique identification number assigned to each bubble by the tracking. The same bubble therefore has the same tid in different images (this does not apply to the value of id). |
| 13 | tl | | Track length. Number of images in which the bubble could be tracked up to this point. |
| 14 | xe | µm | Estimated position of the bubble in the next image (x) |
| 15 | ye | µm | Estimated position of the bubble in the next image (y) |
| 16 | growa | | Abrupt growth due to merges (merging of bubbles). Dimensionless: growa is the ratio of bubble size (after the merge) to bubble size (before the merge). |
| 17 | merge_id | | If two bubbles merge, merge_id specifies the number (id) of the bubble with which this bubble is merged. |
| 18 | grows | µm/ms | Continuous growth rate. Defined as change in size per unit of time: ds/dt. Changes in bubble size due to merges are compensated for, i.e. factored out. Note: If the bubble has only just been detected, dt (time difference since initial detection) is very small. Fluctuations in the determination of the bubble size can therefore lead to outliers for the value of grows. The longer the bubble can be observed, the more precise the value for grows becomes (see also parameter grows2). |
| 19 | vx | µm/ms | Velocity of the bubble in x direction. Defined as dx/dt. Attention: If the bubble has just been detected, dt (time difference since the first detection) is very small. Fluctuations in the determination of the bubble position can therefore lead to outliers for the value of vx. The longer the bubble can be observed, the more precise the value for vx becomes. |
| 20 | vy | µm/ms | Velocity of the bubble in y direction. Defined as dy/dt. See notes for vx. |
| 21 | time | ms | Current time. Calculated as frame number (frame_id)/fps *1000 |
| 22 | v | µm/ms | Instantaneous velocity of the bubble. Calculated as sqrt(vx² + vy²) |
| 23 | vs | µm/ms | Smoothed instantaneous velocity. Smoothed using a moving average (window width N = 20). Smoothing robustly removes the outliers at the beginning of the bubble detection. |
| 24 | grows2 | µm/ms | Continuous growth rate (alternative calculation using regression). In order to compensate for the outliers in grows, a second calculation method for continuous growth was implemented: the bubble size is approximated linearly (regression line), and grows2 is the slope of this line. Changes in size due to merges are compensated for in advance. For longer observation periods, the value of grows approaches that of grows2. It is recommended to use grows2 instead of grows as the value for the continuous growth rate. |
| 25 | size_corrected | µm | Size of the bubble, corrected for changes that occur due to bubble mergers. Basis for the calculation of grows2. |
| 26 | size_approx | µm | Linear approximation of the bubble size: size_approx = m * t + n. The parameters of the linear equation are determined using least squares: m, n = argmin (size_approx – size_corrected)². The slope of the line is the continuous growth rate, grows2 = m. |
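The grows2 calculation described in Table 1 amounts to a least-squares line fit of the merge-corrected bubble size over time. A minimal, dependency-free sketch (function and argument names are ours, not B.E.D.'s):

```python
def grows2(time_ms, size_corrected_um):
    """Continuous growth rate in µm/ms via a least-squares line fit.

    Fits size_approx = m * t + n to the merge-corrected sizes
    (size_corrected in Table 1) and returns the slope m, i.e. the
    grows2 value from Table 1.
    """
    n = len(time_ms)
    mean_t = sum(time_ms) / n
    mean_s = sum(size_corrected_um) / n
    # closed-form least-squares slope: covariance / variance
    cov = sum((t - mean_t) * (s - mean_s)
              for t, s in zip(time_ms, size_corrected_um))
    var = sum((t - mean_t) ** 2 for t in time_ms)
    return cov / var
```

Unlike the frame-to-frame grows value, this regression slope is robust against single-frame fluctuations in the measured bubble size, which is why Table 1 recommends grows2.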
(2) Availability
Operating system
Linux (Ubuntu)
Programming language
Python 3
Additional system requirements
Solid State Disk recommended, recent Office-PC (i5/Ryzen 5 or higher), optionally CUDA or Vulkan-compatible GPU.
Dependencies
- Ubuntu
- Python 3
- Necessary Python libraries are shown in Table 2.
- Git-repositories:
List of contributors
Lukas Lentz (Project lead, Testing, Coordination, Trainset generation), Dorian Hüne (Additional Programming, Revision), Sebastian Handrich, TVG – Technische Visualistik GmbH (Main Programming, revision), Christoph Niems (GUI, Porting), Thomas Gimpel (Head of group, funding acquisition, revision, coordination).
Software location
Archive
GitLab
Name
B.E.D – Bubble Emergence Detector
Persistent identifier
https://gitlab.tu-clausthal.de/bed-bubble-emergence-detector/bed
Licence
Apache 2.0
Publisher
TU Clausthal
Version published
1.0
Date published
05.01.2024
Language
English
(3) Reuse potential
Every scientist or engineer visually inspecting processes in which the emergence of gas bubbles is of importance can reuse the software. Applications include electrochemical approaches [5, 14] as well as bubble analysis in the food and beverage industry [15, 16] and elsewhere.
The program is specialized for magnified high-speed camera footage with complex backgrounds. The underlying bubble detection model (the classifier) can be retrained on specialized training sets to adapt the software to specific use cases, without affecting the tracking and analysis capabilities. This might be helpful for different backgrounds or for out-of-round bubbles in challenging setups. To reduce the work needed for training set generation, all tools for the fabrication of synthetic training data are supplied.
Acknowledgements
We thank the BMWI/BMWK for funding the LOReley project. Thanks to the DFG for its support. We also thank Madita Lederle-Flamm and Somayeh Mansouri for their help in annotating the training set data.
Competing Interests
The authors have no competing interests to declare.
