reGenotyper: Detecting mislabeled samples, recovering the optimal...

Description

Main function to detect mislabeled samples using a perturbation strategy.

Usage

reGenotyper(phenotype, genotype, fileName = "test", thres = 0.9, optGT = TRUE,
            optGTplot = FALSE, optGT.thres = 0, permu = FALSE, n.permu = 10,
            wls.score.permu = NULL, process = TRUE, t.thres = 1.5, GT.ref = NULL)

Arguments

phenotype

phenotype data: an nTrait-by-nSample matrix

genotype

genotype data: an nMarker-by-nSample matrix with two alleles coded as 0 and 1 (or A and B), or three alleles coded as 0, 0.5 and 1 (or A, H and B), where 0.5 (or H) represents the heterozygous allele.

fileName

output file name prefix (Default = "test"). If NULL, it produces files starting with "test".

thres

probability threshold to decide if a sample is mislabeled based on the permutation result (Default = 0.9).

optGT

If TRUE, the optimal genotype is recovered from the given phenotype (Default = TRUE).

optGTplot

If TRUE, it produces a plot of the genotype in two colors: green and red indicate that the original genotype of a sample (column) at a given marker (row) is correct or incorrect, respectively.

optGT.thres

threshold to decide if the original genotype is correct (Default = 0).

permu

If TRUE, permutation is performed to estimate the likelihood of each sample being mislabeled (Default = FALSE).

n.permu

The number of permutations to be performed (Default = 10). n.permu=1000 is usually recommended for a reliable estimate, but it can take a long time.

wls.score.permu

A vector whose elements are WLS scores from permutation (Default = NULL). It can be obtained with the function permutation, e.g. wls.score.permu <- permutation(phenotype, genotype, n.permu=1000, process=TRUE, fileName="test", t.thres=3)

process

If TRUE, it prints which step has been finished. Default = TRUE.

t.thres

threshold for deciding which QTLs are significant (t-test); these QTLs are then used to detect mislabeled samples (Default = 1.5).

GT.ref

reference genotype data from a large collection of strains. This is used to search for the best matched genotype for identified mislabeled samples. Default = NULL. If GT.ref is NULL, the original input genotype data will be used to search for the best matched genotype for identified mislabeled samples.
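
As an illustration of the input shapes described above, the following base R sketch builds a toy genotype matrix (markers in rows, samples in columns) and a matching phenotype matrix (traits in rows, samples in columns). The data, the object names (gt_chr, gt_num, ph) and the explicit recoding of A/H/B to 0/0.5/1 are assumptions made for this example; they are not part of the package.

```r
# Hypothetical toy data, for illustration only.
set.seed(1)
samples <- paste0("S", 1:6)

# Genotype coded as characters: 4 markers (rows) x 6 samples (columns).
gt_chr <- matrix(sample(c("A", "H", "B"), 24, replace = TRUE),
                 nrow = 4,
                 dimnames = list(paste0("M", 1:4), samples))

# Recode A/H/B to the numeric coding 0/0.5/1 (H = heterozygous).
allele_code <- c(A = 0, H = 0.5, B = 1)
gt_num <- matrix(allele_code[gt_chr],
                 nrow = nrow(gt_chr),
                 dimnames = dimnames(gt_chr))

# Phenotype: 3 traits (rows) x 6 samples (columns).
ph <- matrix(rnorm(18),
             nrow = 3,
             dimnames = list(paste0("T", 1:3), samples))

# Both matrices should describe the same samples in the same column order.
stopifnot(identical(colnames(gt_num), colnames(ph)))
```

Checking that the column names agree before calling reGenotyper is a cheap way to make sure that any mislabeling reported later does not simply come from misaligned input matrices.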

Value

An object of class wls. A list with elements:

wls.score

a vector whose length is the number of samples; each element gives the score for the sample being mislabeled

wls.names

the names of the samples detected as mislabeled using the Z score method

gt.opt

the recovered optimal genotype based on the given phenotype data

wls.pValue

p value for each sample obtained from permutation; only available when permu=TRUE

wls.score.permu

a vector of length n.permu. Each element is the score of a randomly selected sample with permuted genotype; only available when permu=TRUE.

thres

the probability threshold used to decide if a sample is mislabeled based on the permutation result
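
To make the roles of wls.score, wls.score.permu and thres more concrete, here is a minimal base R sketch of one way a permutation-based probability threshold can be applied. This is an assumption for illustration, not necessarily the computation reGenotyper performs: for each sample it takes the fraction of permutation scores that fall below the sample's score and flags the sample when that fraction reaches thres. All values and the names prob_above and flagged are hypothetical.

```r
# Hypothetical scores, for illustration only.
wls_score <- c(S1 = 0.2, S2 = 3.6, S3 = -0.4, S4 = 2.7)
wls_score_permu <- c(-1.2, -0.5, 0.1, 0.4, 0.9, 1.3, 1.8, 2.2, 2.9, 3.5)
thres <- 0.9

# Fraction of permutation scores lying below each observed score.
prob_above <- sapply(wls_score, function(s) mean(wls_score_permu < s))

# Samples whose score exceeds at least `thres` of the permutation scores.
flagged <- names(prob_above)[prob_above >= thres]

print(prob_above)
print(flagged)
```

With only ten permutation scores the estimated fraction can only change in steps of 0.1, which is one reason a much larger n.permu is recommended for real data.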

Author(s)

Yang Li <yang.li@rug.nl>

References

Li Y. et al, reGenotyper: detecting mislabeled samples in genetic data (submitted)

See Also

optimalGT, permutation, tMatFunction, genotype, phenotype

Examples

  library(reGenotyper)
  #load example genotype and phenotype data
  data(genotype)
  data(phenotype)
  ### For this test dataset 5 permutations are enough. In real cases at least a few
  ### hundred permutations are needed.
  wlsObject <- reGenotyper(phenotype, genotype, fileName = "test", thres = 0.9,
                           optGT = TRUE, optGTplot = FALSE, optGT.thres = 0,
                           permu = TRUE, n.permu = 5, wls.score.permu = NULL,
                           process = TRUE, t.thres = 1.5, GT.ref = NULL)
  ### Inspecting the output
  wlsObject
  plot(wlsObject)
  ### previous line takes around 30s to execute, you can also load the result:
  data(wlsObject)

reGenotyper documentation built on May 1, 2019, 11:08 p.m.