philr: Data transformation and driver of PhILR.

View source: R/philr.R

philr    R Documentation

Description

This is the main function for building the phylogenetic ILR basis, calculating the weightings (of the parts and the ILR coordinates) and then transforming the data.

Usage

philr(
  x,
  tree = NULL,
  sbp = NULL,
  part.weights = "uniform",
  ilr.weights = "uniform",
  return.all = FALSE,
  pseudocount = 0,
  abund_values = "counts",
  ...
)

Arguments

x

matrix of data to be transformed (samples are rows, compositional parts are columns). Zeros must be dealt with beforehand, either with a pseudocount, multiplicative replacement, or another method.

tree

a phylo class tree object that is binary (see multi2di)

sbp

(Optional) a precomputed sbp matrix (see phylo2sbp). Supplying it is useful if you are going to build multiple ILR bases (e.g., with different weightings).

part.weights

weightings for parts; either a named vector with names corresponding to colnames(x), or a string. String options include:

'uniform'

(default) uses the uniform reference measure

'gm.counts'

geometric mean of parts of x

'anorm'

Aitchison norm of parts of x (after closure)

'anorm.x.gm.counts'

'anorm' times 'gm.counts'

'enorm'

Euclidean norm of parts of x (after closure)

'enorm.x.gm.counts'

'enorm' times 'gm.counts', often gives good results

ilr.weights

weightings for the ILR coordinates; either a named vector with names corresponding to the names of the internal nodes of tree, or a string. String options include:

'uniform'

(default) no weighting of the ILR basis

'blw'

sum of children's branch lengths

'blw.sqrt'

square root of 'blw' option

'mean.descendants'

sum of children's branch lengths PLUS the sum of each child's mean distance to its descendant tips

return.all

return all computed parts (e.g., the computed sign matrix (sbp), part weightings (p), ILR weightings (ilr.weights), and contrast matrix (V)) as a list, in addition to the transformed data (x.ilrp). If return.all=FALSE (the default), only the transformed data are returned (not in list format).

pseudocount

optional pseudocount added to the observation matrix ('x') to avoid numerical issues from zero values. The default value is 0, which has no effect (allowing the user to handle zeros in their own preferred way before calling philr). Values < 0 give an error.

abund_values

A single character value for selecting the assay to be used. Only used when x is a TreeSummarizedExperiment object. Default: "counts".

...

other parameters passed to philr.data.frame or philr.TreeSummarizedExperiment
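As a rough illustration of the 'blw' and 'blw.sqrt' options for ilr.weights, the sketch below sums the lengths of the branches leaving each internal node. It assumes an ape-style edge matrix (parent node in column 1, child node in column 2) written by hand for a hypothetical three-tip tree; the variable names are made up for this example. It follows the option descriptions above and is not the package's implementation (see calculate.blw for that).

```r
# Hypothetical tree ((t1,t2),t3) in ape-style numbering:
# tips are 1..3, the root is node 4, the other internal node is 5.
edge <- matrix(c(4, 5,
                 5, 1,
                 5, 2,
                 4, 3),
               ncol = 2, byrow = TRUE)
edge_length <- c(0.5, 0.1, 0.3, 0.8)

# 'blw' as described: for each internal (parent) node, the sum of the
# lengths of the branches leading to its children.
blw <- tapply(edge_length, edge[, 1], sum)

# 'blw.sqrt' as described: the square root of 'blw'.
blw_sqrt <- sqrt(blw)

print(blw)       # named by parent node number
print(blw_sqrt)
```

Each ILR coordinate (balance) corresponds to one internal node, so a weighting of this kind yields one value per balance.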

Details

This is a utility function that pulls together a number of other functions in philr. The steps that are executed are as follows:

  1. Create sbp (sign matrix) if not given

  2. Create parts weightings if not given

  3. Shift the dataset with respect to the new reference measure (e.g., part weightings)

  4. Create the basis contrast matrix from the sign matrix and the reference measure

  5. Transform the data based on the contrast matrix and the reference measure

  6. Calculate the specified ILR weightings and multiply each balance by the corresponding weighting
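The steps above can be sketched in base R for the unweighted case. This is a minimal illustration, assuming uniform part weights and uniform ILR weights, so that steps 2, 3, and 6 have no effect. The sign matrix is written by hand for a hypothetical three-tip tree rather than built with phylo2sbp, and the helpers closure and contrast_from_sbp are hypothetical names, not philr functions.

```r
# Step 1 (by hand): sign matrix for the hypothetical tree ((t1,t2),t3).
# Rows are parts (tips), columns are balances (internal nodes);
# +1 / -1 mark the two sides of a split, 0 means "not involved".
sbp <- matrix(c( 1,  1,
                 1, -1,
                -1,  0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("t1", "t2", "t3"), c("n1", "n2")))

# Closure: rescale each sample (row) to sum to 1.
closure <- function(x) x / rowSums(x)

# Step 4: contrast matrix for the uniform reference measure.
# For a balance with r parts on the + side and k parts on the - side,
# the standard unweighted ILR coefficients are used.
contrast_from_sbp <- function(sbp) {
  V <- apply(sbp, 2, function(s) {
    r <- sum(s > 0)
    k <- sum(s < 0)
    v <- numeric(length(s))
    v[s > 0] <-  sqrt(k / (r * (r + k)))
    v[s < 0] <- -sqrt(r / (k * (r + k)))
    v
  })
  rownames(V) <- rownames(sbp)
  V
}

# Toy data: samples are rows, parts are columns, no zeros.
x <- matrix(c(10, 20, 70,
               5,  5, 90),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("s1", "s2"), rownames(sbp)))

# Step 5: transform the data with the contrast matrix.
V <- contrast_from_sbp(sbp)
x_ilr <- log(closure(x)) %*% V
print(x_ilr)
```

In this sketch the columns of V are orthonormal and each sums to zero, which is why rescaling x by a constant leaves x_ilr unchanged. With non-uniform part.weights or ilr.weights, philr additionally shifts the data and rescales the balances, which this example does not attempt.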

Note that for both the reference measure (part weightings) and the ILR weightings, specifying 'uniform' gives the same results as not weighting at all.

Note that some of the prespecified part.weights assume x is given as counts and not as relative abundances. Apart from these cases, either counts or relative abundances can be given.
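To see why the scale of x matters for some part weightings, here is a rough base-R sketch of two of the named options as they are described under part.weights: 'gm.counts' as the per-part geometric mean, and 'enorm' as the per-part Euclidean norm after closure. The helper names are hypothetical, and the code follows the option descriptions rather than philr's internal code.

```r
# Closure: rescale each sample (row) to sum to 1.
closure <- function(x) x / rowSums(x)

# 'gm.counts' as described: geometric mean of each part (column) of x.
gm_counts <- function(x) exp(colMeans(log(x)))

# 'enorm' as described: Euclidean norm of each part after closure.
enorm_parts <- function(x) sqrt(colSums(closure(x)^2))

# Toy count data (no zeros) and the same data as relative abundances.
counts <- matrix(c(10, 20, 70,
                   30, 30, 140),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("s1", "s2"), c("t1", "t2", "t3")))
rel <- closure(counts)

w_counts <- gm_counts(counts)   # depends on the count scale
w_rel    <- gm_counts(rel)      # different values for the same samples
e_counts <- enorm_parts(counts) # closure removes the scale ...
e_rel    <- enorm_parts(rel)    # ... so these two agree

print(rbind(w_counts, w_rel))
print(rbind(e_counts, e_rel))
```

In this sketch the geometric-mean weights change when counts are replaced by relative abundances, while the closure-based norm does not.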

The tree argument is ignored if the x argument is a phyloseq or TreeSummarizedExperiment object. These objects can include a phylogenetic tree. If the phylogenetic tree is missing from these objects, it should be integrated directly in these data objects before running philr. Alternatively, you can always provide the abundance matrix and tree separately in their standard formats.

If you have a SummarizedExperiment object, it can be converted into a TreeSummarizedExperiment object to incorporate tree information.

Value

A matrix if return.all=FALSE; a list if return.all=TRUE (see above).

Author(s)

Justin Silverman; S3 methods by Leo Lahti

See Also

phylo2sbp, calculate.blw

Examples

# Prepare example data
tr <- named_rtree(5)
x <- t(rmultinom(10,100,c(.1,.6,.2,.3,.2))) + 0.65 # add a small pseudocount
colnames(x) <- tr$tip.label
philr(x, tr, part.weights='enorm.x.gm.counts',
               ilr.weights='blw.sqrt', return.all=FALSE)

# Running philr on a TreeSummarizedExperiment object

## Prepare example data
library(mia)
library(tidyr)
data(GlobalPatterns, package="mia")

## Select prevalent taxa 
tse <-  GlobalPatterns %>% subsetByPrevalentTaxa(
                               detection = 3,
                               prevalence = 20/100,
                               as_relative = FALSE)

## Pick taxa that have notable abundance variation across samples
variable.taxa <- apply(assay(tse, "counts"), 1, function(x) sd(x)/mean(x) > 3.0)
tse <- tse[variable.taxa,]

# Collapse the tree
tree <- ape::keep.tip(phy = rowTree(tse), tip = rowLinks(tse)$nodeNum)
rowTree(tse) <- tree

## Add a new assay with a pseudocount 
assays(tse)$counts.shifted <- assay(tse, "counts") + 1

## Run philr for TreeSummarizedExperiment object
## using the pseudocount data
res.tse <- philr(tse, part.weights='enorm.x.gm.counts',
               ilr.weights='blw.sqrt', return.all=FALSE,
               abund_values="counts.shifted")

# Running philr on a phyloseq object
## Not run: 
  pseq <- convertToPhyloseq(tse)
  res.pseq <- philr(pseq, part.weights='enorm.x.gm.counts',
               ilr.weights='blw.sqrt', return.all=FALSE,
               pseudocount=0.5)

## End(Not run)


jsilve24/philr documentation built on Oct. 17, 2024, 4:59 p.m.