BACKGROUND
Abdominal auscultation, i.e. listening to Bowel Sounds (BS), can be used to analyse digestion. Automated retrieval of BS would help to assess gastro-intestinal disorders non-invasively.
OBJECTIVE
To develop a multi-scale spotting model to detect BS in continuous audio data from a wearable monitoring system.
METHODS
We designed a spotting model based on the Efficient-U-Net (EffUNet) architecture that analyses 10-second audio segments and spots BS with a temporal resolution of 25 ms. Evaluation data were collected across different digestive phases from 18 healthy participants and 9 patients with Inflammatory Bowel Disease (IBD). Audio data were recorded in a daytime setting with a T-shirt with embedded digital microphones. The dataset was annotated by independent raters with substantial agreement (Cohen’s κ between 0.70 and 0.75), resulting in 136 h of labelled data. In total, 11482 BS were analysed, with BS duration ranging between 18 ms and 6.3 s. The share of BS in the dataset (BS ratio) was 0.89%. We analysed performance as a function of noise level, BS duration, and BS event rate, and report spotting timing errors.
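To make the segment arithmetic concrete: a 10-second segment at 25 ms resolution corresponds to 400 output frames, and the BS ratio is the fraction of frames labelled as BS. The sketch below is illustrative only and is not the authors' pipeline; the function names, the millisecond event format, and the rule that any frame touched by an event counts as BS are assumptions.

```python
SEGMENT_MS = 10_000  # segment length stated in the text (10 s)
FRAME_MS = 25        # temporal resolution stated in the text (25 ms)


def frames_per_segment(segment_ms=SEGMENT_MS, frame_ms=FRAME_MS):
    """Number of output frames per segment."""
    return segment_ms // frame_ms


def events_to_mask(events_ms, segment_ms=SEGMENT_MS, frame_ms=FRAME_MS):
    """Convert (onset_ms, offset_ms) events into a binary frame mask.

    Assumption: a frame is marked as BS if an event touches it at all.
    """
    n = frames_per_segment(segment_ms, frame_ms)
    mask = [0] * n
    for onset, offset in events_ms:
        first = max(0, onset // frame_ms)        # floor of onset frame
        last = min(n, -(-offset // frame_ms))    # ceil of offset frame
        for i in range(first, last):
            mask[i] = 1
    return mask


def bs_ratio(mask):
    """Fraction of frames labelled as BS."""
    return sum(mask) / len(mask)
```

Under these assumptions, an 18 ms event (the shortest duration reported) that falls within one frame marks a single frame, i.e. 1/400 of a segment.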
RESULTS
Leave-One-Participant-Out cross-validation of BS event spotting yielded a median F1 score of 0.73 for both healthy volunteers and patients. EffUNet detected BS in different noise conditions with 0.73 recall and 0.72 precision. In particular, for SNR > 4 dB, more than 83% of BS were recognised, with precision ≥ 0.77. EffUNet recall dropped below 0.60 for BS duration ≥ 1.5 s. At BS ratio > 5%, our model precision was > 0.83. For both healthy participants and patients, insertions and deletions were the largest timing errors, with a total of 15.54 min of insertion errors and 13.08 min of deletion errors over the total audio dataset. On our dataset, EffUNet outperformed existing BS spotting models that provide similar temporal resolution.
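The abstract does not define how insertion and deletion timing errors were computed. As a hedged illustration, one plausible reading is that an insertion is a predicted BS event with no overlapping annotated event, and a deletion is an annotated event with no overlapping prediction; the error time is then the summed duration of such events. The sketch below implements that assumed definition on binary frame masks. All names are hypothetical.

```python
FRAME_S = 0.025  # temporal resolution stated in the text (25 ms)


def runs(mask):
    """Return (start, end) index pairs of consecutive runs of 1s, end exclusive."""
    out, start = [], None
    for i, v in enumerate(mask):
        if v and start is None:
            start = i
        elif not v and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(mask)))
    return out


def unmatched_frames(mask_a, mask_b):
    """Count frames in events of mask_a that do not overlap any 1 in mask_b."""
    return sum(e - s for s, e in runs(mask_a) if not any(mask_b[s:e]))


def timing_errors(truth, pred, frame_s=FRAME_S):
    """Return (insertion_s, deletion_s) under the assumed definitions above."""
    insertion_s = unmatched_frames(pred, truth) * frame_s
    deletion_s = unmatched_frames(truth, pred) * frame_s
    return insertion_s, deletion_s
```

For example, with `truth = [0, 1, 1, 0, 0, 0, 1, 0]` and `pred = [0, 0, 1, 0, 1, 1, 0, 0]`, the predicted event at frames 4 to 5 has no annotated counterpart (two inserted frames), and the annotated event at frame 6 is missed entirely (one deleted frame).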
CONCLUSIONS
The EffUNet spotter is robust against background noise and can retrieve BS of varying duration. EffUNet outperforms previous BS detection approaches on unmodified audio data containing highly sparse BS events.