View source: R/func__projectSamples.R
projectSamples | R Documentation |
This function performs a singular-value decomposition of the core-genome SNP (cgSNP) matrix S through an eigen-decomposition of the relatedness matrix K = S * t(S) / L, where L is the number of cgSNPs. It uses GEMMA to carry out the eigen-decomposition. It returns projections of bacterial isolates on the eigenvectors of V = t(S) * S (also known as the right-singular vectors of S). It may be run before the function testProjections of this package.
It is mathematically equivalent to perform a singular-value decomposition of S in R, through svd(S) or svd(scale(G, center = TRUE, scale = FALSE)), and to project samples on the singular vectors that have positive singular values (equivalently, positive eigenvalues). However, Phylix uses GEMMA for this step because it is much faster than R, and because the input matrix K is exactly the relatedness matrix that GEMMA uses to fit LMMs.
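The equivalence described above can be checked on a toy matrix in plain R. This is a minimal sketch, not the package's implementation: the matrix G is simulated, the names C_eigen and C_svd are hypothetical, and the tolerance used to drop zero eigenvalues is an arbitrary choice. It relies on the identity that an eigenvalue d of K corresponds to a singular value sqrt(L * d) of S.

```r
set.seed(1)
n <- 6   # number of isolates (toy value)
L <- 20  # number of cgSNPs (toy value)

# Hypothetical uncentred genotype matrix: isolates in rows, cgSNPs in columns
G <- matrix(rbinom(n * L, size = 1, prob = 0.4), nrow = n,
            dimnames = list(paste0("isolate", seq_len(n)), NULL))
S <- scale(G, center = TRUE, scale = FALSE)  # centre columns without rescaling
K <- tcrossprod(S) / L                       # K = S * t(S) / L

# Route 1: eigen-decomposition of K (the step that GEMMA performs)
eig <- eigen(K, symmetric = TRUE)
keep <- eig$values > 1e-8                    # drop numerically zero eigenvalues
d <- sqrt(L * eig$values[keep])              # singular values of S
C_eigen <- sweep(eig$vectors[, keep, drop = FALSE], 2, d, "*")

# Route 2: singular-value decomposition of S
sv <- svd(S)
C_svd <- S %*% sv$v[, seq_along(d), drop = FALSE]

# Columns may differ in sign between the two routes, so compare quantities
# that do not depend on the sign: pairwise distances between isolates.
stopifnot(isTRUE(all.equal(as.vector(dist(C_eigen)), as.vector(dist(C_svd)))))
```

Because eigenvectors are only defined up to sign, comparing the projection matrices element by element could fail even when both routes agree, which is why the check uses distances.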
The output directory of GEMMA cannot be specified yet. GEMMA stores every output file under a directory named "output", which appears under the current working directory of R. In addition, it is recommended to use a prefix for the outputs of this calculation that differs from the one used for calculating the relatedness matrix. Otherwise, GEMMA will overwrite the previous log file.
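Since the "output" directory follows the working directory of R, one possible workaround is to change the working directory temporarily around the call. The helper run_in_dir below is hypothetical and not part of this package, and the function passed to it is only a stand-in for a real call to projectSamples.

```r
# Hypothetical helper: evaluate f() with the working directory temporarily
# set to `dir`, then restore the previous working directory.
run_in_dir <- function(dir, f) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  old <- setwd(dir)
  on.exit(setwd(old), add = TRUE)
  f()
}

wd_before <- getwd()

# Stand-in for projectSamples(...): it creates a GEMMA-style "output"
# directory and reports where that directory ended up.
where <- run_in_dir(file.path(tempdir(), "gemma_run"), function() {
  dir.create("output", showWarnings = FALSE)
  normalizePath("output")
})

print(where)
```

With this approach, relative paths given for K, G and Y would be resolved against the temporary working directory, so absolute paths are the safer choice.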
Output files: output/[prefix].eigenU.txt, output/[prefix].eigenD.txt and output/[prefix].log.txt, relative to the current working directory of R.
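The two eigen files can be read back into R with base functions. The sketch below writes mock files to a temporary directory so that it runs without GEMMA. The assumed layout (a tab-delimited matrix of eigenvectors without header, and one eigenvalue per line) as well as the prefix and sample names are assumptions for illustration, so check them against real GEMMA output before relying on this.

```r
# Mock GEMMA-style outputs in a temporary "output" directory (assumed layout)
out_dir <- file.path(tempdir(), "output")
dir.create(out_dir, showWarnings = FALSE)
prefix <- "demo_SVD"                      # hypothetical prefix
u_file <- file.path(out_dir, paste0(prefix, ".eigenU.txt"))
d_file <- file.path(out_dir, paste0(prefix, ".eigenD.txt"))

U_mock <- diag(3)                         # mock eigenvectors, one per column
d_mock <- c(2.5, 1, 0)                    # mock eigenvalues
write.table(U_mock, u_file, sep = "\t", row.names = FALSE, col.names = FALSE)
writeLines(as.character(d_mock), d_file)

# Read the files back and attach sample names to the rows of U
samples <- c("isolate1", "isolate2", "isolate3")  # hypothetical sample names
U <- as.matrix(read.table(u_file, header = FALSE))
dimnames(U) <- list(samples, NULL)
d <- scan(d_file, quiet = TRUE)
```

Neither file carries sample names in this layout, which is one reason the samples argument matters: the rows of U can only be labelled if the sample order is known.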
projectSamples( K, G, Y, L, samples, prefix = "S", get.dists = TRUE, dist.method = "euclidean", get.tree = TRUE, gemma.path )
K |
a string specifying the path to a file of the relatedness matrix produced using GEMMA |
G |
path to the uncentred SNP genotype file G, such as [prefix]__G.txt, which has been used by GEMMA to calculate the K matrix. Here, GEMMA uses it again to match samples between K, G and Y (see below), although mathematically the eigen-decomposition does not need G and Y when samples are already matched, which is the case when using Phylix. |
Y |
path to the "phenotype" file, such as [prefix]__Y.txt under output/gene in Phylix's output. GEMMA requires this file to extract the same samples from the K matrix, although samples in K and Y must already match in a correct input data set for Phylix. Phylix keeps samples and their order the same as the row names of the SNP matrix S (centred) or G (uncentred). |
L |
number of core-genome SNPs, which equals the number of columns of SNP matrices G and S. |
samples |
row names of cgSNP matrices S or G. |
prefix |
prefix for GEMMA's output files, which are named [prefix].eigenD.txt and [prefix].eigenU.txt according to GEMMA's manual. |
get.dists |
a logical option specifying whether to calculate a distance matrix between projections of samples |
dist.method |
a character option specifying the method for calculating distances between samples. Valid values are the same as those of the method argument of base::dist. |
get.tree |
a logical option specifying whether to return a midpoint-rooted neighbour-joining tree with the distance matrix. The distance matrix is calculated when this option is set, regardless of the get.dists option. |
gemma.path |
a path to an executable GEMMA. |
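The effect of get.dists and dist.method can be pictured with base::dist applied to a matrix of projections. This is only an illustration with made-up numbers: the matrix C and its column names are hypothetical, and whether projectSamples calls dist in exactly this way is an assumption.

```r
# Hypothetical projections of three isolates on two eigenvectors
C <- matrix(c(0, 0,
              3, 4,
              6, 8),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("PC1", "PC2")))

# Pairwise distances between isolates under two of the methods of base::dist
D_euc <- as.matrix(dist(C, method = "euclidean"))
D_man <- as.matrix(dist(C, method = "manhattan"))

print(D_euc)
print(D_man)
```

A distance matrix of this kind is the usual input for neighbour-joining, which is presumably why the tree option forces the distances to be calculated.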
Yu Wan (wanyuac@126.com)
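# rownames(G) and ncol(G) below refer to the uncentred SNP matrix G loaded in R,
# whereas the argument G takes the path to the genotype file.
C <- projectSamples(K = "output/Kp_K.cXX.txt", G = "output/snp/Kp__G.txt",
                    Y = "output/gene/Kp__Y.txt", samples = rownames(G),
                    L = ncol(G), prefix = "Kp_SVD", gemma.path = "~/apps/gemma")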