lmm: Testing for associations between occurrence of bacterial...

View source: R/func__lmm.R

lmm R Documentation

Testing for associations between occurrence of bacterial genes/alleles with linear mixed model

Description

This is one of the two main functions of this package (the other main function is findPhysLink). It estimates parameters of univariate linear mixed models (LMMs) to test for associations between the occurrence of bacterial genes or alleles while controlling for bacterial population structure. This function does not use allelic physical distances at all.

Dependency: packages data.table and parallel

Usage

lmm(
  snps = NULL,
  snps.delim = ",",
  pos.col = "Pos",
  ref.col = "Ref",
  min.mac = 1,
  genetic.pam = NULL,
  genetic.pam.delim = "\t",
  genes.excl = NULL,
  allelic.pam = NULL,
  allelic.pam.delim = "\t",
  mapping = NULL,
  min.count = 2,
  min.co = 2,
  ingroup = NULL,
  outliers = NULL,
  ref = NULL,
  tree = NULL,
  sample.dists = NULL,
  output.dir = "output",
  prefix = NULL,
  gemma.path = "gemma",
  n.cores = -1,
  save.stages = TRUE,
  skip = TRUE
)
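
The call below is a minimal sketch of how the arguments in the usage above might be combined. The file paths and the prefix "demo" are hypothetical placeholders, and the sketch assumes that snps, genetic.pam and allelic.pam accept paths to delimited text files. The call is only attempted when the GeneMates package and the input files are available; a working GEMMA installation is also needed for it to succeed.

```r
# Hypothetical input paths; replace them with your own files.
lmm_args <- list(
  snps = "data/core_snps.csv",         # assumed: comma-delimited SNP table
  snps.delim = ",",
  genetic.pam = "data/gene_pam.tsv",   # assumed: tab-delimited gene PAM
  allelic.pam = "data/allele_pam.tsv", # assumed: tab-delimited allelic PAM
  min.mac = 1,
  min.count = 2,
  min.co = 2,
  output.dir = "output",
  prefix = "demo",                     # hypothetical prefix for output files
  gemma.path = "gemma",
  n.cores = -1
)

# Only attempt the call when the package and the inputs are present.
inputs <- unlist(lmm_args[c("snps", "genetic.pam", "allelic.pam")])
can_run <- requireNamespace("GeneMates", quietly = TRUE) &&
  all(file.exists(inputs))

if (can_run) {
  assoc <- do.call(GeneMates::lmm, lmm_args)
} else {
  message("GeneMates or the example input files are not available; call skipped.")
}
```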

Arguments

snps

Core-genome SNPs used for estimating the relatedness matrix.

snps.delim

(optional) Delimiter of fields in the SNP table (genetic.pam.delim and allelic.pam.delim are defined similarly). Default: comma.

pos.col

(optional) An integer (column index) or a string (column name) specifying which column contains SNP positions

ref.col

(optional) A string specifying the column for SNPs of the reference genome.

min.mac

(optional) An integer specifying the minimum number of times that the minor allele of every biallelic SNP must occur across all isolates. SNPs failing this criterion are removed from the analysis.
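
The effect of min.mac can be illustrated with a small sketch. It is not the package's own code: the toy matrix, the helper minor_allele_count and the variable names are hypothetical, and the sketch assumes a matrix with isolates in rows and SNP sites in columns.

```r
# Toy SNP matrix: rows are isolates, columns are SNP sites (filled by column).
snp_mat <- matrix(
  c("A", "A", "A", "G",   # snp1: minor allele G occurs once
    "C", "C", "T", "T",   # snp2: both alleles occur twice
    "G", "G", "G", "G"),  # snp3: monomorphic, no minor allele
  nrow = 4,
  dimnames = list(paste0("isolate", 1:4), c("snp1", "snp2", "snp3"))
)

# Hypothetical helper: count of the less frequent allele at one site.
minor_allele_count <- function(alleles) {
  counts <- table(alleles)
  if (length(counts) < 2) {
    return(0L)  # a monomorphic site has no minor allele
  }
  as.integer(min(counts))
}

min.mac <- 2
mac <- apply(snp_mat, 2, minor_allele_count)

# Keep only the sites whose minor allele count reaches the threshold.
snp_kept <- snp_mat[, mac >= min.mac, drop = FALSE]
```

With min.mac = 2, only snp2 is retained in this toy example; with the default of 1, snp1 would be retained as well.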

genetic.pam

A presence-absence matrix of genes. It may be a compiled table from SRST2.

genetic.pam.delim

(optional) A delimiter character in the genetic PAM. Default: tab.

genes.excl

(optional) Genes to be excluded from PAMs. For example, genes.excl = c("AmpH_Bla", "OqxBgb_Flq", "OqxA_Flq", "SHV.OKP.LEN_Bla").

allelic.pam

A presence-absence matrix of alleles. The matrix may be a compiled table of SRST2 results.

allelic.pam.delim

(optional) A delimiter character in the allelic PAM. Default: tab.

mapping

(optional) A data frame mapping alleles to genes and patterns, etc, which equals the "mapping" element within findPhysLink's output list. This argument is only used when a user reruns a previous analysis.

min.count

(optional) The minimum count of an allele or gene in the current data set for it to be included in the analysis.

min.co

(optional) The minimum number of allelic co-occurrence events. Set it to zero to run the tests even for alleles that never co-occur. However, this will greatly increase the number of tests.
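
A rough sketch of how min.count and min.co could act on a presence-absence matrix is given below. It is an illustration under stated assumptions rather than the package's implementation: the toy matrix, the object names and the use of crossprod to count co-occurrence events are all choices made for this example.

```r
# Toy allelic PAM: rows are isolates, columns are alleles (filled by column).
pam <- matrix(
  c(1, 1, 1, 0, 0,   # a1: present in 3 isolates
    1, 1, 0, 0, 0,   # a2: present in 2 isolates
    0, 0, 0, 0, 1,   # a3: present in 1 isolate
    0, 1, 1, 1, 0),  # a4: present in 3 isolates
  nrow = 5,
  dimnames = list(paste0("isolate", 1:5), c("a1", "a2", "a3", "a4"))
)

min.count <- 2
min.co <- 2

# Step 1: drop alleles that occur fewer than min.count times.
pam_kept <- pam[, colSums(pam) >= min.count, drop = FALSE]

# Step 2: count the isolates in which each pair of alleles co-occurs.
co <- crossprod(pam_kept)

# Step 3: list each unordered pair once if it co-occurs at least min.co times.
idx <- which(upper.tri(co) & co >= min.co, arr.ind = TRUE)
pairs <- data.frame(
  x = rownames(co)[idx[, "row"]],
  y = colnames(co)[idx[, "col"]],
  n.co = co[idx],
  stringsAsFactors = FALSE
)
```

Here a3 is dropped by min.count, and the pair a2/a4 (one shared isolate) is dropped by min.co, which leaves the pairs a1/a2 and a1/a4.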

ingroup

(optional) A vector of characters for names of isolates to be analysed. Isolates may be sorted, such as according to the phylogeny. The function includes all isolates by default.

outliers

(optional) A vector of characters for names of isolates/strains to be excluded from the SNP table and the PAMs.

ref

(optional) A new name for the reference genome. The column name specified by ref.col in the SNP matrix will be replaced with this argument.

tree

(optional) A path to a tree file or a phylo object for a tree of all samples. The format of the tree file must be compatible with the read.tree function in the ape package.

sample.dists

(optional) A numeric matrix of distances between samples. The distances can be Euclidean distances between projections, phylogenetic tip distances or SNP distances (the number of SNPs between any two samples). A matrix of Euclidean distances will be computed for projections of samples if this option is left NULL.
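
As an illustration of two of the distance types named above, the sketch below builds a SNP distance matrix and a Euclidean distance matrix from toy data. The objects snp_tab and proj and the helper snp_distance are hypothetical, and the projection coordinates are made-up numbers rather than output of this package.

```r
# Toy SNP table: rows are samples, columns are SNP positions (filled by column).
snp_tab <- matrix(
  c("A", "A", "G",
    "C", "T", "T",
    "G", "G", "G"),
  nrow = 3,
  dimnames = list(c("s1", "s2", "s3"), c("p1", "p2", "p3"))
)

# Hypothetical helper: number of SNP positions at which two samples differ.
snp_distance <- function(tab) {
  n <- nrow(tab)
  d <- matrix(0, n, n, dimnames = list(rownames(tab), rownames(tab)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(tab[i, ] != tab[j, ])
    }
  }
  d
}
snp_d <- snp_distance(snp_tab)

# Euclidean distances between made-up sample projections (one row per sample).
proj <- matrix(
  c(0, 3, 0,
    0, 4, 0),
  nrow = 3,
  dimnames = list(c("s1", "s2", "s3"), c("axis1", "axis2"))
)
euc_d <- as.matrix(dist(proj))
```

Either matrix is numeric, square and symmetric with sample names on both dimensions, which is the shape this sketch assumes for sample.dists.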

output.dir

(optional) Path of the output directory.

prefix

(optional) A prefix for the names of all output files.

gemma.path

Path to GEMMA. No forward slash should be attached at the end of the path.
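
One way to make sure the path carries no trailing forward slash before it is passed to lmm is sketched below. The helper normalise_gemma_path and the example path are hypothetical; Sys.which is base R and is used here only to check whether the executable can be located.

```r
# Hypothetical helper: strip any trailing forward slashes from a path.
normalise_gemma_path <- function(path) {
  sub("/+$", "", path)
}

# Hypothetical install location with an unwanted trailing slash.
gemma.path <- normalise_gemma_path("/usr/local/bin/gemma/")

# TRUE only if an executable can be found at or under that name.
gemma_found <- nzchar(Sys.which(gemma.path))
```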

n.cores

Number of cores used to run GEMMA in parallel where possible.

-1: automatically detect the number of available cores N, but use N - 1 cores (recommended).

0: automatically detect the number of available cores and use all of them. Be careful when the current R session is not running through SLURM.

>= 1: use the number of cores as specified. n.cores is reset to the maximal number of available cores N when n.cores > N.
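
The rules for n.cores can be summarised in a short sketch. The helper resolve_n_cores is hypothetical and is not part of the package. Two details are assumptions made here because the text does not specify them: at least one core is kept when only one is available, and values below -1 are rejected.

```r
# Hypothetical helper that mirrors the documented n.cores rules.
# n.available defaults to the number of cores detected on this machine.
resolve_n_cores <- function(n.cores, n.available = parallel::detectCores()) {
  if (is.na(n.available) || n.available < 1) {
    n.available <- 1  # assumption: fall back to one core if detection fails
  }
  if (n.cores == -1) {
    max(1, n.available - 1)    # use N - 1 cores, but never fewer than one
  } else if (n.cores == 0) {
    n.available                # use all N cores
  } else if (n.cores >= 1) {
    min(n.cores, n.available)  # cap the request at N
  } else {
    stop("n.cores must be -1, 0 or a positive integer")  # assumption
  }
}

resolve_n_cores(-1, n.available = 8)  # 7
resolve_n_cores(16, n.available = 8)  # 8
```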

save.stages

(optional) Whether to turn on stage control or not. Turning it on is recommended when you are not sure whether the pipeline will finish smoothly.

skip

(optional) Whether to avoid overwriting existing output files.

Author(s)

Yu Wan, wanyuac@126.com


wanyuac/GeneMates documentation built on Aug. 12, 2022, 7:37 a.m.