Interpretable signal analysis with knockoffs enhances classification of bacterial Raman spectra

Published in pre-print, 2020

Recommended citation: Chia, Sesia, Ho, Jeffrey, Dionne, Candès, Howe (2020). "Interpretable signal analysis with knockoffs enhances classification of bacterial Raman spectra." pre-print at arXiv:2006.04937. https://arxiv.org/abs/2006.04937

Abstract

Sophisticated machine learning models are widely applied to signal data because they can detect complex patterns and leverage them effectively to make predictions. However, such models tend to be difficult to interpret, which is particularly concerning for critical biomedical applications, such as the identification of bacterial infections from spectroscopic data. Feature extraction and selection can identify structures in the data that are both informative and non-redundant, leading to simpler and more easily understandable models, without necessarily sacrificing predictive accuracy. In this paper, we present a signal classification method that combines wavelet-based feature extraction with a knockoff filter to control the false discovery rate. We apply the method to Raman spectroscopy data in order to classify bacterial samples. We show that the features thus obtained allow an intuitive logistic regression model to achieve predictive accuracy comparable to that of less understandable alternative approaches.
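To illustrate the selection step the abstract mentions, the following is a minimal sketch of the standard knockoff+ threshold for false discovery rate control. It is not the paper's code. It assumes the knockoff statistics W have already been computed, one per wavelet feature, with large positive values indicating that the real feature looks more important than its knockoff copy. The function names, the example values, and the target level q are hypothetical.

```python
import numpy as np


def knockoff_plus_threshold(W, q=0.1):
    """Return the knockoff+ threshold for statistics W at target FDR level q.

    The threshold is the smallest t among the nonzero |W_j| such that
    (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q.
    Returns np.inf if no such t exists (nothing is selected).
    """
    W = np.asarray(W, dtype=float)
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        # Estimated false discovery proportion at threshold t
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf


def select_features(W, q=0.1):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    tau = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= tau)


# Hypothetical statistics for 12 features (illustrative values only)
W_example = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, -0.2, 4.5, 3.5, 6]
selected = select_features(W_example, q=0.25)
```

The selected indices could then be passed to a simple classifier such as logistic regression, which is the kind of interpretable model the abstract describes.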

Download paper here