SNPknock package reference

Module contents

class SNPknock.knockoffDMC

Bases: object

Class for knockoffs of a discrete Markov chain.

Parameters:
  • pInit – a numpy array of length K, containing the marginal distribution of the states for the first variable.
  • Q – a numpy array of size (p-1,K,K), containing a list of p-1 transition matrices between the K states of the Markov chain.
  • groups – a numpy array of length p, describing the group membership of each variable (default: [1,2,…,p]).
  • seed – an integer random seed (default: 123).
sample(self, X)

Sample a knockoff copy of each row of X.

Parameters: X – a numpy array of size (n,p), where n is the number of individuals and p is the number of variables, containing the original Markov chain variables. The entries of X must be integers ranging from 0 to K-1, where K is the number of possible states of the Markov chain.
Returns: a numpy array of size (n,p), containing a knockoff copy of X.
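As a minimal sketch of the inputs described above, the following builds pInit, Q and X for a toy chain and wraps the knockoff call in a helper. The toy sizes and the helper name make_dmc_knockoffs are illustrative assumptions, as is the convention that each row of a transition matrix sums to 1. The constructor call follows the parameter list above and has not been checked against a specific SNPknock release.

```python
import numpy as np

K, p, n = 3, 5, 4  # toy sizes: states, variables, individuals (illustrative)
rng = np.random.default_rng(0)

# Marginal distribution of the first variable: nonnegative, sums to 1.
pInit = np.full(K, 1.0 / K)

# p-1 transition matrices. Assumption: row k of Q[j] is the distribution of
# variable j+1 given that variable j is in state k, so each row sums to 1.
Q = rng.random((p - 1, K, K))
Q /= Q.sum(axis=2, keepdims=True)

# Toy data matrix with integer entries in {0, ..., K-1}.
X = rng.integers(0, K, size=(n, p))


def make_dmc_knockoffs(X, pInit, Q, seed=123):
    """Hypothetical wrapper; requires SNPknock to be installed."""
    import SNPknock

    return SNPknock.knockoffDMC(pInit, Q, seed=seed).sample(X)
```

With SNPknock installed, make_dmc_knockoffs(X, pInit, Q) would be expected to return an array with the same shape as X.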
class SNPknock.knockoffHMM

Bases: object

Class for knockoffs of a hidden Markov model.

Parameters:
  • pInit – a numpy array of length K, containing the marginal distribution of the hidden states for the first variable.
  • Q – a numpy array of size (p-1,K,K), containing a list of p-1 transition matrices between the K latent states of the HMM.
  • pEmit – a numpy array of size (p,M,K), containing the emission probabilities for each of the M possible emission states, from each of the K hidden states and the p variables.
  • groups – a numpy array of length p, describing the group membership of each variable (default: [1,2,…,p]).
  • seed – an integer random seed (default: 123).
sample(self, X)

Samples a knockoff copy of each row of X.

Parameters: X – a numpy array of size (n,p), where n is the number of individuals and p is the number of variables, containing the original HMM variables. The entries of X must be integers ranging from 0 to M-1, where M is the number of possible emission states of the HMM.
Returns: a numpy array of size (n,p), containing a knockoff copy of X.
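The three HMM arrays have to agree on K, M and p. The sketch below builds a toy pEmit with the documented (p,M,K) layout and checks it against pInit and Q. It reads pEmit[j, m, k] as the probability of emitting m at variable j from hidden state k, which is an assumption based on the parameter description; check_hmm_inputs is a hypothetical helper, not part of SNPknock.

```python
import numpy as np


def check_hmm_inputs(pInit, Q, pEmit):
    """Hypothetical helper: return (p, M, K) or raise ValueError."""
    p, M, K = pEmit.shape
    if pInit.shape != (K,):
        raise ValueError("pInit must have length K")
    if Q.shape != (p - 1, K, K):
        raise ValueError("Q must have shape (p-1, K, K)")
    if not np.isclose(pInit.sum(), 1.0):
        raise ValueError("pInit must sum to 1")
    # Assumption: rows of each transition matrix sum to 1.
    if not np.allclose(Q.sum(axis=2), 1.0):
        raise ValueError("rows of each Q[j] must sum to 1")
    # Assumption: for fixed variable and hidden state, emissions sum to 1.
    if not np.allclose(pEmit.sum(axis=1), 1.0):
        raise ValueError("pEmit must sum to 1 over the M emission states")
    return p, M, K


K, M, p = 4, 3, 6  # toy sizes (illustrative)
rng = np.random.default_rng(1)

pInit = np.full(K, 1.0 / K)
Q = rng.random((p - 1, K, K))
Q /= Q.sum(axis=2, keepdims=True)
pEmit = rng.random((p, M, K))
pEmit /= pEmit.sum(axis=1, keepdims=True)  # normalize over emission states

dims = check_hmm_inputs(pInit, Q, pEmit)
```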
class SNPknock.knockoffGenotypes

Bases: object

Class for knockoffs of genotype data modeled by a hidden Markov model.

Parameters:
  • pInit – a numpy array of length K, containing the marginal distribution of the hidden states for the first variable.
  • Q – a numpy array of size (p-1,K,K), containing a list of p-1 transition matrices between the K latent states of the genotype HMM.
  • pEmit – a numpy array of size (p,M,K), containing the emission probabilities for each of the M possible emission states, from each of the K hidden states and the p variables.
  • groups – a numpy array of length p, describing the group membership of each variable (default: [1,2,…,p]).
  • seed – an integer random seed (default: 123).
sample(self, X)

Samples a knockoff copy of each row of X.

Parameters: X – a numpy array of size (n,p), where n is the number of individuals and p is the number of variables, containing the original genotype variables. The entries of X must be integers ranging from 0 to M-1, where M is the number of possible emission states of the genotype HMM.
Returns: a numpy array of size (n,p), containing a knockoff copy of X.
class SNPknock.knockoffHaplotypes

Bases: object

Class for knockoffs of haplotype data modeled by a hidden Markov model.

Parameters:
  • pInit – a numpy array of length K, containing the marginal distribution of the hidden states for the first variable.
  • Q – a numpy array of size (p-1,K,K), containing a list of p-1 transition matrices between the K latent states of the haplotype HMM.
  • pEmit – a numpy array of size (p,M,K), containing the emission probabilities for each of the M possible emission states, from each of the K hidden states and the p variables.
  • groups – a numpy array of length p, describing the group membership of each variable (default: [1,2,…,p]).
  • seed – an integer random seed (default: 123).
sample(self, X)

Samples a knockoff copy of each row of X.

Parameters: X – a numpy array of size (n,p), where n is the number of individuals and p is the number of variables, containing the original haplotype variables. The entries of X must be integers ranging from 0 to M-1, where M is the number of possible emission states of the haplotype HMM.
Returns: a numpy array of size (n,p), containing a knockoff copy of X.

SNPknock.fastphase module

SNPknock.fastphase.check_writable(file_path)
SNPknock.fastphase.loadHMM(r_file, alpha_file, theta_file, char_file, compact=True, phased=False)

Loads the parameter estimates obtained by fastPhase and assembles the HMM for the genotype data. For more information about the fastPhase format, see: http://scheet.org/software.html

Parameters:
  • r_file – a string with the path of the “_rhat.txt” file produced by fastPhase.
  • alpha_file – a string with the path of the “_alphahat.txt” file produced by fastPhase.
  • theta_file – a string with the path of the “_thetahat.txt” file produced by fastPhase.
  • char_file – a string with the path of the “_origchars” file produced by fastPhase.
  • compact – whether to return a compact representation of the HMM (r, alpha, theta) (default: True).
  • phased – whether the non-compact representation of the HMM should describe phased haplotypes (default: False).
Returns:

a dictionary {'Q', 'pInit', 'pEmit'} where:

  • Q is a numpy array of size (p-1,K,K), containing a list of p-1 transition matrices between the K latent states of the HMM.
  • pInit is a numpy array of length K, containing the marginal distribution of the hidden states for the first SNP.
  • pEmit is a numpy array of size (p,K,3), containing the emission probabilities of the hidden states for each of the p SNPs.
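A small helper can keep the four file arguments consistent. The sketch below derives them from a common prefix using the suffixes listed above. Both function names and the example prefix are hypothetical, and it is an assumption that the "_origchars" file shares the prefix of the other three. compact=False is passed on the further assumption that the dictionary described above is the non-compact form.

```python
def fastphase_output_paths(prefix):
    """Hypothetical helper: map loadHMM argument names to file paths."""
    return {
        "r_file": prefix + "_rhat.txt",
        "alpha_file": prefix + "_alphahat.txt",
        "theta_file": prefix + "_thetahat.txt",
        # Assumption: this file uses the same prefix as the other three.
        "char_file": prefix + "_origchars",
    }


def load_hmm_from_prefix(prefix, phased=False):
    """Hypothetical wrapper; requires SNPknock and existing fastPhase output."""
    import SNPknock.fastphase as fp

    return fp.loadHMM(compact=False, phased=phased,
                      **fastphase_output_paths(prefix))


paths = fastphase_output_paths("results/chr1")  # example prefix, not a real file
```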
SNPknock.fastphase.runFastPhase(X_file, out_path, fastphase='fastphase', phased=False, K=12, numit=25, seed=1)

This function calls fastPhase to fit an HMM to the genotype data and write the corresponding parameter estimates to three separate files, named:

  • out_path + “_rhat.txt”
  • out_path + “_alphahat.txt”
  • out_path + “_thetahat.txt”

The HMM for the genotype data can then be loaded from these files using SNPknock.fastphase.loadHMM().

Parameters:
  • X_file – a string with the path of the genotype input file containing X in fastPhase format (as created by SNPknock.fastphase.writeXtoInp()).
  • out_path – a string with the path of the directory in which the parameter estimates will be saved.
  • fastphase – a string with the path to the directory with the fastPhase executable (default: “fastphase”).
  • phased – whether the input describes phased haplotypes (default: False).
  • K – the number of hidden states for each haplotype sequence (default: 12).
  • numit – the number of EM iterations (default: 25).
  • seed – the random seed for the EM algorithm (default: 1).
SNPknock.fastphase.writeXtoInp(X, out_file, phased=False)

Converts the genotype data matrix X (with entries 0, 1 and 2) into the fastPhase input format and saves it to a text file. This function assumes that there are no missing values in X. For more information about the fastPhase format, see: http://scheet.org/software.html

Parameters:
  • X – a numpy array of size (n,p), where n is the number of individuals and p is the number of SNPs. If X is phased, the array is assumed to have size (2n,p).
  • out_file – a string containing the path of the output file onto which X will be written.
  • phased – whether the input describes phased haplotypes (default: False).
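Because writeXtoInp assumes a complete matrix of 0, 1 and 2 entries, it can help to check X before writing it. The following is a minimal sketch of such a check for unphased genotypes, followed by a wrapper that writes the input file and runs the fit. Both function names are hypothetical, and the wrapper needs SNPknock and a fastPhase executable to run.

```python
import numpy as np


def validate_genotypes(X):
    """Hypothetical pre-check: 2-D, no missing values, entries in {0, 1, 2}."""
    X = np.asarray(X)
    if X.ndim != 2:
        raise ValueError("X must be a 2-D array of size (n, p)")
    if np.issubdtype(X.dtype, np.floating) and np.isnan(X).any():
        raise ValueError("X contains missing values")
    if not np.isin(X, (0, 1, 2)).all():
        raise ValueError("entries of X must be 0, 1 or 2")
    return X.astype(int)


def fit_fastphase(X, inp_file, out_path, K=12, numit=25, seed=1):
    """Hypothetical wrapper around writeXtoInp and runFastPhase."""
    import SNPknock.fastphase as fp

    fp.writeXtoInp(validate_genotypes(X), inp_file)
    fp.runFastPhase(inp_file, out_path, K=K, numit=numit, seed=seed)


X = np.array([[0, 1, 2], [2, 0, 1]])
X_checked = validate_genotypes(X)
```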

SNPknock.models module

class SNPknock.models.DMC(pInit, Q)

Discrete Markov chain model.

Parameters:
  • pInit – a numpy array of length K, containing the marginal distribution of the first variable in the chain.
  • Q – a numpy array of size (p-1,K,K), containing a list of p-1 transition matrices between the K states of the Markov chain.
sample(n=1)

Sample the observations from their marginal distribution.

Parameters: n – the number of observations to be sampled (default: 1).
Returns: a numpy matrix of size (n,p).
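To illustrate what sampling from such a chain involves, here is a plain numpy forward sampler. It is a sketch, not SNPknock's implementation. It assumes that Q holds p-1 matrices and that Q[j][k] is the distribution of variable j+1 given that variable j is in state k. The name sample_dmc and its seed argument are hypothetical.

```python
import numpy as np


def sample_dmc(pInit, Q, n=1, seed=0):
    """Illustrative forward sampler for a discrete Markov chain."""
    rng = np.random.default_rng(seed)
    K = len(pInit)
    p = Q.shape[0] + 1  # assumption: Q holds p-1 transition matrices
    X = np.empty((n, p), dtype=int)
    for i in range(n):
        # Draw the first variable from its marginal distribution.
        X[i, 0] = rng.choice(K, p=pInit)
        # Draw each later variable given the previous one.
        for j in range(1, p):
            X[i, j] = rng.choice(K, p=Q[j - 1, X[i, j - 1]])
    return X


# Deterministic toy chain: always start in state 1 and never leave it.
pInit = np.array([0.0, 1.0, 0.0])
Q = np.tile(np.eye(3), (4, 1, 1))  # four identity transition matrices
X = sample_dmc(pInit, Q, n=2)
```

With identity transition matrices the chain never changes state, so every entry of X equals the initial state 1 in this toy case.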
class SNPknock.models.HMM(pInit, Q, pEmit)

Hidden Markov model with a discrete emission distribution.

Parameters:
  • pInit – a numpy array of length K, containing the marginal distribution of the hidden states for the first variable.
  • Q – a numpy array of size (p-1,K,K), containing a list of p-1 transition matrices between the K latent states of the hidden Markov model.
  • pEmit – a numpy array of size (p,M,K), containing the emission probabilities for each of the M possible emission states, from each of the K hidden states and the p variables.
sample(n=1)

Sample n observations from the hidden Markov model.

Parameters: n – the number of observations to be sampled (default: 1).
Returns: a numpy matrix of size (n,p).
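Sampling from the HMM can be pictured as drawing a hidden chain and then emitting one observation per variable. The sketch below does this in plain numpy and is not SNPknock's implementation. It assumes that Q holds p-1 matrices with rows summing to 1 and that pEmit[j, :, k] is the emission distribution of variable j in hidden state k. The name sample_hmm and its seed argument are hypothetical.

```python
import numpy as np


def sample_hmm(pInit, Q, pEmit, n=1, seed=0):
    """Illustrative sampler: hidden Markov chain plus discrete emissions."""
    rng = np.random.default_rng(seed)
    p, M, K = pEmit.shape
    X = np.empty((n, p), dtype=int)
    for i in range(n):
        # Hidden state of the first variable, then its emission.
        h = rng.choice(K, p=pInit)
        X[i, 0] = rng.choice(M, p=pEmit[0, :, h])
        for j in range(1, p):
            # Move the hidden chain one step, then emit from the new state.
            h = rng.choice(K, p=Q[j - 1, h])
            X[i, j] = rng.choice(M, p=pEmit[j, :, h])
    return X


# Deterministic toy model: hidden state 0 always emits 2, state 1 emits 0.
p, M, K = 4, 3, 2
pInit = np.array([1.0, 0.0])
Q = np.tile(np.eye(K), (p - 1, 1, 1))
pEmit = np.zeros((p, M, K))
pEmit[:, 2, 0] = 1.0
pEmit[:, 0, 1] = 1.0
X = sample_hmm(pInit, Q, pEmit, n=3)
```

In this toy model the hidden chain stays in state 0, so every entry of X is 2.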