Gene hunting with knockoffs for hidden Markov models
Published in Biometrika, 2019
Abstract
Modern scientific studies often require the identification of a subset of relevant explanatory variables, in an attempt to understand an interesting phenomenon. Several statistical methods have been developed to automate this task, but only recently has the framework of model-free knockoffs proposed a general solution that can perform variable selection under rigorous type-I error control, without relying on strong modeling assumptions. In this paper, we extend the methodology of model-free knockoffs to a rich family of problems where the distribution of the covariates can be described by a hidden Markov model (HMM). We develop an exact and efficient algorithm to sample knockoff copies of an HMM. We then argue that, combined with the knockoffs selective framework, these copies provide a natural and powerful tool for performing principled inference in genome-wide association studies with guaranteed false discovery rate (FDR) control. Finally, we apply our methodology to several datasets aimed at studying Crohn’s disease and several continuous phenotypes, e.g., cholesterol levels.