midivObject: Creating a midiv-object
In larssnip/midiv: MiDiv-lab bioinformatics

Description

Creates a list containing data from microbial community profiling at MiDiv.

Usage

midivObject(
  metadata.tbl,
  readcount.mat,
  sequence.tbl,
  taxonomy.tbl = NULL,
  sample_id_column = "SampleID",
  filter_samples = "metadata",
  filter_OTUs = "sequence"
)

Arguments

`metadata.tbl`	A data.frame with the sample metadata, or the name of a file.
`readcount.mat`	A matrix with readcounts data, or the name of a file.
`sequence.tbl`	A data.frame with the OTU sequences, or the name of a file.
`taxonomy.tbl`	A data.frame with the OTU taxonomy results, or the name of a file (optional).
`sample_id_column`	Text with the name of the metadata.tbl column that identifies samples.
`filter_samples`	Text indicating if the metadata or the readcount table should decide the samples to keep.
`filter_OTUs`	Text indicating if the readcount or sequence table should decide the OTUs to keep.

Details

This function stores the data structure from the processing of microbial community sequencing data in a list that we refer to as a *midiv-object*.

The first four arguments are either data structures already read into R, or the names of files to read.

The metadata.tbl is a data.frame with one row for each sample, containing metadata for each sample in the columns. It must have one column that uniquely identifies each sample. This is specified in sample_id_column and is by default "SampleID". If a file name is specified, it is assumed to be a tab-delimited text file.
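As a minimal sketch of the requirement above, the following builds a small metadata table in base R and checks that the identifying column exists and is unique. The sample names and the `Treatment` column are hypothetical, and the check is only an illustration, not the validation the function itself performs.

```r
# Hypothetical sample metadata; only the SampleID column name comes from the text above.
metadata.tbl <- data.frame(
  SampleID = c("S1", "S2", "S3"),
  Treatment = c("control", "control", "treated"),
  stringsAsFactors = FALSE
)

sample_id_column <- "SampleID"

# The identifying column must be present and must not contain duplicates.
ids <- metadata.tbl[[sample_id_column]]
stopifnot(!is.null(ids), anyDuplicated(ids) == 0)
```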

The readcount.mat is a *matrix* with readcounts, one row for each OTU and one column for each sample. If a file name is specified, it is assumed to be a tab-delimited text file, with OTUs in the rows and the samples in the columns, but where the first column contains the OTU identifying texts (typically OTU1, OTU2,...).
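The file layout described above can be illustrated with base R. This sketch writes a small tab-delimited file to a temporary location and reads it back into a matrix; the header text `OTU` for the first column and the counts are made up for illustration, and this is not necessarily how the function reads the file internally.

```r
# Write a small tab-delimited readcount file: the first column holds the
# OTU identifiers, the remaining columns hold one sample each.
readcount_file <- tempfile(fileext = ".txt")
writeLines(c(
  "OTU\tS1\tS2\tS3",
  "OTU1\t10\t0\t5",
  "OTU2\t3\t7\t0"
), readcount_file)

# Read the file with the first column as row names, then convert to a matrix.
readcount.df <- read.delim(readcount_file, row.names = 1, check.names = FALSE)
readcount.mat <- as.matrix(readcount.df)

dim(readcount.mat)       # OTUs in the rows, samples in the columns
rownames(readcount.mat)
colnames(readcount.mat)
```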

The sequence.tbl is a Fasta-table with centroid sequences, see readFasta, or the name of a FASTA-file. These sequences are the centroid sequences for each OTU, and the texts in the Header column must match the texts identifying the OTUs in the readcount.mat above (typically OTU1, OTU2,...).

The taxonomy.tbl may be supplied, and must then be a table where the first column is named OTU and contains the texts identifying the OTUs (typically OTU1, OTU2,...). The remaining columns should list the taxonomy at various ranks, and nothing more. Columns of scores etc. must be removed from this table beforehand, see sintaxFilter. If taxonomy.tbl is a file name, the file must be a tab-delimited text file with the columns as described above. NB! In the created midiv-object this table is merged with the sequence.tbl.
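To make the expected shape concrete, here is a sketch in base R that drops a score column from a hypothetical classifier result and then joins the taxonomy onto a sequence table by OTU identifier. The rank names, the `Genus_score` column, the sequences, and the `Sequence` column name are assumptions for illustration; the base R `merge()` call only pictures what a merged table could look like and is not taken from the package source.

```r
# Hypothetical classifier output: rank columns plus a score column.
taxonomy.raw <- data.frame(
  OTU = c("OTU1", "OTU2"),
  Domain = c("Bacteria", "Bacteria"),
  Genus = c("Bacteroides", "Lactobacillus"),
  Genus_score = c(0.98, 0.71),
  stringsAsFactors = FALSE
)

# Keep only the OTU column and the rank columns.
taxonomy.tbl <- taxonomy.raw[, c("OTU", "Domain", "Genus")]

# Hypothetical Fasta-table; Header is named in the text above, Sequence is assumed.
sequence.tbl <- data.frame(
  Header = c("OTU1", "OTU2"),
  Sequence = c("ACGTACGT", "TTGACCGA"),
  stringsAsFactors = FALSE
)

# Join taxonomy onto the sequences using the OTU identifier as the key.
merged.tbl <- merge(sequence.tbl, taxonomy.tbl, by.x = "Header", by.y = "OTU")
colnames(merged.tbl)
```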

The argument filter_samples is only used if the samples in the metadata.tbl and readcount.mat are not the same. If filter_samples = "metadata" the samples in the metadata.tbl are kept, and the readcount.mat is trimmed accordingly.

The argument filter_OTUs is only used if the OTUs in the readcount.mat and sequence.tbl are not the same. If filter_OTUs = "sequence" the OTUs in the sequence.tbl are kept, and the readcount.mat is trimmed/extended accordingly.
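The effect of the two default filters can be pictured with a small base R sketch. It assumes that trimming means dropping columns for samples without metadata, and that extending means adding zero-filled rows for OTUs that have a sequence but no readcounts. The zero fill is an assumption, the data are made up, and the code is not the package's implementation.

```r
# Hypothetical readcounts: 2 OTUs by 4 samples.
readcount.mat <- matrix(
  c(10, 3, 0, 7, 5, 0, 2, 2),
  nrow = 2,
  dimnames = list(c("OTU1", "OTU2"), c("S1", "S2", "S3", "S4"))
)
metadata_ids <- c("S1", "S2", "S3")        # S4 has no metadata
sequence_ids <- c("OTU1", "OTU2", "OTU3")  # OTU3 has no readcounts

# Samples: keep only the columns named in the metadata.
trimmed <- readcount.mat[, metadata_ids, drop = FALSE]

# OTUs: one row per sequence, zero-filled (assumption), then copy over
# the rows that are present in the readcount matrix.
result <- matrix(
  0,
  nrow = length(sequence_ids), ncol = ncol(trimmed),
  dimnames = list(sequence_ids, colnames(trimmed))
)
shared <- intersect(sequence_ids, rownames(trimmed))
result[shared, ] <- trimmed[shared, ]

dim(result)
```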

Value

A list with the elements:

metadata.tbl a data.frame with one row for each sample.
readcount.mat a matrix with the readcounts, the samples are in the columns, the OTUs in the rows.
sequence.tbl a data.frame with the OTU sequences (see readFasta) and taxonomy, if supplied.