MVP.FarmCPU: Perform GWAS using FarmCPU method

MVP.FarmCPU    R Documentation

Perform GWAS using FarmCPU method

Description

Date built: February 24, 2013
Last update: May 25, 2017
Requirement: Y, GD, and CV should have the same taxa order. GD and GM should have the same SNP order. (Y, GD, and GM correspond to the phe, geno, and map arguments.)
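The ordering requirement can be checked before calling the function. The following is a minimal sketch on toy data, assuming the phenotype table carries a taxa column and the genotype matrix carries sample and marker names; all object and column names (`geno_ids`, `Taxa`, `SNP`) are illustrative and not part of rMVP.

```r
# Toy data; every name below is hypothetical and used only for illustration.
taxa <- c("ind1", "ind2", "ind3")
phe <- data.frame(Taxa = taxa, Trait = c(1.2, 0.7, 1.9),
                  stringsAsFactors = FALSE)

# Sample order of the genotype matrix (m markers by n individuals here).
geno_ids <- c("ind3", "ind1", "ind2")
geno <- matrix(c(0, 2, 1,
                 1, 0, 2),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("snp1", "snp2"), geno_ids))
map <- data.frame(SNP = c("snp1", "snp2"), Chr = c(1, 1), Pos = c(100, 200),
                  stringsAsFactors = FALSE)

# Reorder phenotype rows so they follow the genotype sample order.
phe <- phe[match(geno_ids, phe$Taxa), ]

# Fail early if samples or markers are misaligned.
stopifnot(identical(phe$Taxa, colnames(geno)))
stopifnot(identical(map$SNP, rownames(geno)))
```

If covariates are supplied through CV, the same reordering would apply to their rows.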

Usage

MVP.FarmCPU(
  phe,
  geno,
  map,
  CV = NULL,
  ind_idx = NULL,
  mrk_idx = NULL,
  P = NULL,
  method.sub = "reward",
  method.sub.final = "reward",
  method.bin = c("EMMA", "static", "FaST-LMM"),
  bin.size = c(5e+05, 5e+06, 5e+07),
  bin.selection = seq(10, 100, 10),
  memo = "MVP.FarmCPU",
  Prior = NULL,
  ncpus = 2,
  maxLoop = 10,
  maxLine = 5000,
  threshold.output = 0.01,
  converge = 1,
  iteration.output = FALSE,
  p.threshold = NA,
  QTN.threshold = 0.01,
  bound = NULL,
  verbose = TRUE
)

Arguments

phe

phenotype, n by t matrix, n is sample size, t is number of phenotypes

geno

genotype, either m by n or n by m is supported, m is marker size, n is population size. This is the pure genotype data matrix (GD). THERE IS NO COLUMN FOR TAXA.

map

SNP map information, m by 3 matrix, m is marker size, the three columns are SNP_ID, Chr, and Pos

CV

covariates, n by c matrix, n is sample size, c is number of covariates

ind_idx

the index of effective genotyped individuals

mrk_idx

the index of effective markers used in analysis

P

start p values for all SNPs

method.sub

method used in substitution process, five options: 'penalty', 'reward', 'mean', 'median', or 'onsite'

method.sub.final

method used in substitution process, five options: 'penalty', 'reward', 'mean', 'median', or 'onsite'

method.bin

method for selecting the most appropriate bins, three options: 'static', 'EMMA' or 'FaST-LMM'

bin.size

bin sizes for all iterations, a vector, the bin size is always from large to small

bin.selection

number of selected bins in each iteration, a vector

memo

a label added to output file names

Prior

prior information, four columns, which are SNP_ID, Chr, Pos, P-value

ncpus

number of threads used for parallel computation

maxLoop

maximum number of iterations

maxLine

the number of markers handled at a time; a smaller value reduces memory cost

threshold.output

only GWAS results with p-values lower than threshold.output are written to the output

converge

a number from 0 to 1; if the pseudo QTNs selected in the last and second-to-last iterations overlap with at least a certain probability (the probability is converge), the loop stops

iteration.output

whether to output results of all iterations

p.threshold

if all p-values generated in the first iteration are larger than p.threshold, FarmCPU stops

QTN.threshold

in second and later iterations, only SNPs with lower p-values than QTN.threshold have chances to be selected as pseudo QTNs

bound

maximum number of SNPs selected as pseudo QTNs in each iteration

verbose

whether to print detailed progress information
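The stopping rule described for converge can be illustrated with a small sketch. It assumes the overlap is measured as the number of shared pseudo QTNs divided by the number of distinct pseudo QTNs across the two iterations; that measure and the helper `qtn_overlap` are assumptions made for illustration, and the rule used inside the package may differ.

```r
# Hypothetical helper: overlap between two sets of pseudo-QTN indices,
# computed as |intersection| / |union| (an assumption, not the package's API).
qtn_overlap <- function(current, previous) {
  all_qtn <- union(current, previous)
  if (length(all_qtn) == 0) return(0)
  length(intersect(current, previous)) / length(all_qtn)
}

converge <- 1                   # default value from the usage section
prev_qtn <- c(12, 87, 301)      # pseudo QTNs from the second-to-last iteration
curr_qtn <- c(12, 87, 455)      # pseudo QTNs from the last iteration

overlap <- qtn_overlap(curr_qtn, prev_qtn)  # 2 shared of 4 distinct = 0.5
stop_loop <- overlap >= converge            # FALSE: the sets are not identical
```

Under this reading, the default converge = 1 stops the loop only when both iterations select exactly the same pseudo QTNs, or when maxLoop is reached.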

Value

an m by 4 results matrix, m is marker size, the four columns are SNP_ID, Chr, Pos, and p-value
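A common next step is to keep only markers that pass a multiple-testing threshold. The sketch below uses a synthetic table with the four columns described above; the column names and the Bonferroni cutoff are assumptions chosen for illustration, not output produced by the package.

```r
# Synthetic stand-in for a results table; column names are assumptions.
res <- data.frame(
  SNP_ID = c("snp1", "snp2", "snp3", "snp4"),
  Chr    = c(1, 1, 2, 2),
  Pos    = c(100, 200, 150, 900),
  P      = c(0.20, 1e-4, 0.03, 0.012),
  stringsAsFactors = FALSE
)

alpha <- 0.05
bonf  <- alpha / nrow(res)      # Bonferroni cutoff: 0.05 / 4 = 0.0125

# Keep markers below the cutoff, ignoring missing p-values, most significant first.
hits <- res[!is.na(res$P) & res$P < bonf, ]
hits <- hits[order(hits$P), ]
```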

Author(s)

Xiaolei Liu and Zhiwu Zhang

Examples


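library(rMVP)
library(bigmemory)  # provides attach.big.matrix() and deepcopy()

# Load the phenotype and keep individuals with a non-missing value in column 2
phePath <- system.file("extdata", "07_other", "mvp.phe", package = "rMVP")
phenotype <- read.table(phePath, header = TRUE)
idx <- !is.na(phenotype[, 2])
phenotype <- phenotype[idx, ]
print(dim(phenotype))

# Attach the imputed genotype and keep the same individuals
genoPath <- system.file("extdata", "06_mvp-impute", "mvp.imp.geno.desc", package = "rMVP")
genotype <- attach.big.matrix(genoPath)
genotype <- deepcopy(genotype, rows = idx)
print(dim(genotype))

# Load the SNP map
mapPath <- system.file("extdata", "06_mvp-impute", "mvp.imp.geno.map", package = "rMVP")
map <- read.table(mapPath, header = TRUE)

# Run FarmCPU with two iterations and static binning to keep the example fast
farmcpu <- MVP.FarmCPU(phe = phenotype, geno = genotype, map = map,
                       maxLoop = 2, method.bin = "static")
str(farmcpu)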



XiaoleiLiuBio/MVP documentation built on Jan. 3, 2025, 5:59 a.m.