long_distance_df: Create a long data frame of among-sample distances
In JCSzamosi/aftersl1p: Generate Summary Graphics and Basic Analysis of 16s Data

long_distance_df

R Documentation

Create a long data frame of among-sample distances

Description

long_distance_df creates a long data frame of all the pairwise distances from a sample distance matrix (e.g. the output of phyloseq::distance()), with the metadata of both samples listed for each pair. This makes it easy to draw within- and among-group boxplots, or to make whatever other comparisons are of interest.
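
To make the "long" shape concrete, here is a minimal base-R sketch of the idea. It is not the package's implementation: the toy data, the `Group` column, and the `Comparison` column are made up for illustration, and the choice to keep each unordered pair once (upper triangle, no diagonal) is an assumption of this sketch.

```r
# Toy data: four samples with one metadata column (all values are made up).
metadat <- data.frame(
  X.SampleID = c("S1", "S2", "S3", "S4"),
  Group = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)
abund <- matrix(c(1, 2, 4, 7), ncol = 1,
                dimnames = list(metadat$X.SampleID, "taxon"))
dmat <- as.matrix(dist(abund))  # square matrix with sample names on both axes

# Keep each unordered pair once by taking the upper triangle (no diagonal).
idx <- which(upper.tri(dmat), arr.ind = TRUE)
long <- data.frame(
  X.SampleID1 = rownames(dmat)[idx[, "row"]],
  X.SampleID2 = colnames(dmat)[idx[, "col"]],
  Distance = dmat[idx],
  stringsAsFactors = FALSE
)

# Attach the metadata of each member of the pair, suffixing the column names.
long$Group1 <- metadat$Group[match(long$X.SampleID1, metadat$X.SampleID)]
long$Group2 <- metadat$Group[match(long$X.SampleID2, metadat$X.SampleID)]

# A derived column like this is what makes within- vs among-group plots easy.
long$Comparison <- ifelse(long$Group1 == long$Group2, "within", "among")
```

With the data in this shape, something like `boxplot(Distance ~ Comparison, data = long)` compares within-group and among-group distances directly.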

Usage

long_distance_df(
  dmat,
  metadat,
  idcol = "X.SampleID",
  diag = FALSE,
  suff = c("1", "2"),
  distcol = "Distance",
  baseline = NULL
)

Arguments

`dmat`	A distance matrix or other symmetric matrix object with sample names as row and column names.
`metadat`	A data frame or data frame-like object with the data set's metadata.
`idcol`	(`'X.SampleID'`.) A string. The column in `metadat` that holds the sample names. Sample names should match the row/column names of the distance matrix. If there are samples in the metadata data frame that are missing from the distance matrix, they will be excluded with a warning. If there are samples in the distance matrix that are missing from the metadata, you will get an error.
`diag`	(`FALSE`.) Logical. Whether the diagonal elements (zeros in a distance matrix) should be included in the long data frame. Defaults to `FALSE` because we almost never want them.
`suff`	(`c('1','2')`.) A character vector of length 2. The suffixes to be appended to the metadata column names in the output. The two elements must not be identical.
`distcol`	(`'Distance'`.) A string. The desired column name for the distance column in your long data frame. Only here to avoid clashes with existing metadata column names.
`baseline`	(`NULL`.) A data frame whose column names must also be column names in the `metadat` data frame, and whose rows contain a subset of the possible values or combinations of values. If this parameter is used, all the samples whose metadata match a row in this data frame will end up in Sample1 and the rest will end up in Sample2. This means you will not get all the pairs, because the samples in Sample1 will not be compared to each other, and neither will the samples in Sample2. If this parameter is not used, the upper triangle of the distance matrix is used, without regard for metadata values.
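
The pairing that `baseline` describes can be sketched in base R as follows. This is only an illustration of the rule stated above, not the package's code; the `Day` column, the toy distances, and the use of `merge()` to find matching samples are assumptions of the sketch.

```r
# Toy metadata: two day-0 samples and two later samples (values are made up).
metadat <- data.frame(
  X.SampleID = c("S1", "S2", "S3", "S4"),
  Day = c(0, 0, 7, 14),
  stringsAsFactors = FALSE
)
vals <- matrix(metadat$Day, ncol = 1,
               dimnames = list(metadat$X.SampleID, "x"))
dmat <- as.matrix(dist(vals))

# One row per metadata combination that should count as "baseline".
baseline <- data.frame(Day = 0)

# merge() joins on the shared column names, so it keeps only the samples
# whose metadata match some row of `baseline`.
matched <- merge(metadat, baseline)$X.SampleID
side1 <- metadat$X.SampleID[metadat$X.SampleID %in% matched]
side2 <- setdiff(metadat$X.SampleID, side1)

# Only baseline-vs-other pairs are formed; there are no pairs within a side.
pairs <- expand.grid(Sample1 = side1, Sample2 = side2,
                     stringsAsFactors = FALSE)
pairs$Distance <- dmat[cbind(pairs$Sample1, pairs$Sample2)]
```

Here the two day-0 samples are each paired with the two later samples, giving four rows, while the S1/S2 and S3/S4 pairs are deliberately absent.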

Value

A data frame with N(N-1) rows (or N^2 if `diag = TRUE` is set), where N is the number of samples, with sample IDs, metadata, and pairwise distances listed for each pair of samples. Sample ID and metadata column names have the suffixes given in `suff` ('1' and '2' by default) appended to them so the user can tell which column belongs to which sample.
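
A call might look like the sketch below, which uses only the arguments shown under Usage. The toy data and the `Group` column are made up, the call assumes the function is exported from the aftersl1p package, and it is guarded so that the snippet still runs when the package is not installed.

```r
# Toy inputs; every value here is made up for illustration.
metadat <- data.frame(
  X.SampleID = c("S1", "S2", "S3", "S4"),
  Group = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)
abund <- matrix(c(1, 2, 4, 7), ncol = 1,
                dimnames = list(metadat$X.SampleID, "taxon"))
dmat <- dist(abund)  # a "dist" object labelled with the sample names

# Only call the package if it is installed.
if (requireNamespace("aftersl1p", quietly = TRUE)) {
  long <- aftersl1p::long_distance_df(dmat, metadat, idcol = "X.SampleID")
  str(long)  # inspect the suffixed ID/metadata columns and the distance column
}
```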

JCSzamosi/aftersl1p documentation built on July 3, 2025, 8:37 p.m.