Gene hunting with knockoffs for hidden Markov models

Published in Biometrika, 2019

Abstract

Modern scientific studies often require the identification of a subset of relevant explanatory variables, in the attempt to understand an interesting phenomenon. Several statistical methods have been developed to automate this task, but only recently has the framework of model-free knockoffs proposed a general solution that can perform variable selection under rigorous type-I error control, without relying on strong modeling assumptions. In this paper, we extend the methodology of model-free knockoffs to a rich family of problems where the distribution of the covariates can be described by a hidden Markov model (HMM). We develop an exact and efficient algorithm to sample knockoff copies of an HMM. We then argue that combined with the knockoffs selective framework, they provide a natural and powerful tool for performing principled inference in genome-wide association studies with guaranteed FDR control. Finally, we apply our methodology to several datasets aimed at studying the Crohn’s disease and several continuous phenotypes, e.g. levels of cholesterol.

Download paper here