Pulse Shape Analysis – INFN (VIP Project)
The work carried out within the VIP project focused on developing and evaluating automated analysis techniques for waveforms acquired with Broad Energy Germanium (BEGe) detectors. Discriminating valid (“good”) pulses from degraded or noisy (“bad”) events is a crucial step in improving the quality of the resulting energy spectra and the overall signal-to-noise ratio.
The dataset analyzed in this study comprised measurements collected between 2021 and 2023. Two complementary approaches were explored. The first was a feature-based method: characteristic parameters (rise time, decay time, and other statistical descriptors) were extracted from each waveform and passed to supervised machine learning algorithms, including Random Forest, Gradient Boosting, and K-Nearest Neighbors. These classical models provided interpretable and accurate baselines, demonstrating that waveform-shape parameters alone can achieve excellent classification performance.
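As an illustration of the feature-based step, the following is a minimal sketch of how shape parameters such as rise and decay times could be computed from a single waveform. It is not the project's actual code: the 10%-90% rise-time definition, the 1/e decay-time definition, the 20-sample baseline window, and the names `extract_features`, `synthetic_pulse`, and `N_BASELINE` are all assumptions made for this example.

```python
import numpy as np

N_BASELINE = 20  # assumed number of pre-trigger samples used to estimate the baseline


def extract_features(waveform, dt=1.0, lo=0.1, hi=0.9):
    """Return [rise_time, decay_time, amplitude, baseline_rms] for one pulse.

    Hypothetical definitions: rise time is the lo-to-hi fraction crossing
    interval, decay time is the time from the peak to amplitude / e.
    """
    waveform = np.asarray(waveform, dtype=float)
    baseline = waveform[:N_BASELINE].mean()
    baseline_rms = waveform[:N_BASELINE].std()
    w = waveform - baseline
    peak = int(np.argmax(w))
    amplitude = w[peak]

    # Rise time: first crossing of lo*amplitude to first crossing of hi*amplitude
    leading = w[: peak + 1]
    t_lo = int(np.argmax(leading >= lo * amplitude))
    t_hi = int(np.argmax(leading >= hi * amplitude))
    rise_time = (t_hi - t_lo) * dt

    # Decay time: samples after the peak until the signal falls to amplitude / e
    below = w[peak:] <= amplitude / np.e
    decay_time = int(np.argmax(below)) * dt if below.any() else np.nan

    return np.array([rise_time, decay_time, amplitude, baseline_rms])


def synthetic_pulse(n=400, start=50, rise=10, tau=100.0):
    """Toy pulse (not detector data): flat baseline, linear rise, exponential decay."""
    t = np.arange(n)
    w = np.zeros(n)
    ramp = (t >= start) & (t < start + rise)
    w[ramp] = (t[ramp] - start) / rise
    tail = t >= start + rise
    w[tail] = np.exp(-(t[tail] - start - rise) / tau)
    return w


features = extract_features(synthetic_pulse())
print(dict(zip(["rise_time", "decay_time", "amplitude", "baseline_rms"], features)))
```

Stacking such vectors for all events gives a feature matrix that could then be passed to a standard classifier, for example scikit-learn's `RandomForestClassifier`.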
The second approach applied machine learning directly to the raw waveforms, without any manual feature extraction. This strategy uses the detector signals as acquired, eliminating manual preprocessing and saving considerable time, an important benefit for large datasets where feature engineering is labor-intensive.
Building on this framework, a deep neural pipeline was designed and implemented in TensorFlow/Keras, comprising three main components:
- Convolutional Denoising Autoencoder (CDAE) for signal denoising
- Feature Autoencoder (FAE) for latent feature extraction
- Gaussian Mixture Variational Autoencoder (GMVAE) for semi-supervised classification
The GMVAE model was trained using a composite loss function that combined reconstruction error, Kullback–Leibler divergence, a supervised cross-entropy term, and a triplet loss component to enhance cluster separation in the latent space. Training was monitored via TensorBoard, tracking loss, accuracy, and AUC throughout the process.
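The way the four terms combine can be sketched as follows. This is a simplified NumPy illustration, not the TensorFlow/Keras implementation used in the project: the KL term is written against a single standard-normal prior rather than a Gaussian mixture prior, the reconstruction error is taken as mean squared error, and the function names, margin, and unit weights are assumptions chosen for the example.

```python
import numpy as np


def reconstruction_loss(x, x_hat):
    """Mean squared error between input waveforms and their reconstructions."""
    return float(np.mean((x - x_hat) ** 2))


def kl_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dims, averaged over the batch.

    Simplification: a GMVAE uses a mixture prior; a single Gaussian is assumed here.
    """
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return float(np.mean(kl))


def cross_entropy(probs, labels, eps=1e-12):
    """Supervised term: negative log-probability of the true class (integer labels)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + eps)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Pull same-class latent vectors together, push different-class ones apart."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


def composite_loss(x, x_hat, mu, log_var, probs, labels,
                   anchor, positive, negative,
                   weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four terms; the weights are hypothetical hyperparameters."""
    w_rec, w_kl, w_ce, w_tri = weights
    return (w_rec * reconstruction_loss(x, x_hat)
            + w_kl * kl_standard_normal(mu, log_var)
            + w_ce * cross_entropy(probs, labels)
            + w_tri * triplet_loss(anchor, positive, negative))


# Usage with random stand-in data (batch of 8 waveforms, 4 latent dims, 2 classes)
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 256))
x_hat = x + 0.1 * rng.normal(size=x.shape)
mu, log_var = rng.normal(size=(8, 4)), np.zeros((8, 4))
probs = np.full((8, 2), 0.5)
labels = rng.integers(0, 2, size=8)
z_a, z_p, z_n = (rng.normal(size=(8, 4)) for _ in range(3))
total = composite_loss(x, x_hat, mu, log_var, probs, labels, z_a, z_p, z_n)
```

In a semi-supervised setting, the cross-entropy and triplet terms would be evaluated only on the labeled subset of each batch, while the reconstruction and KL terms apply to all events.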
This architecture achieved up to 98% classification accuracy on both labeled and unlabeled events, confirming its ability to learn meaningful waveform representations even when only limited labeled data were available. The semi-supervised learning capability of the model significantly reduces the need for manual labeling, which is typically one of the most time-consuming tasks in detector data analysis.
Overall, the results demonstrated that both approaches effectively discriminate valid from degraded pulses, each offering distinct advantages. The feature-based models provide fast, interpretable, and reliable baselines that are useful for comparison with standard experimental procedures. Conversely, the deep learning pipeline achieves comparable or superior accuracy while drastically reducing dependence on labeled data, paving the way for scalable and automated waveform analysis in future detector systems.
Download the slides with the results obtained.