readCountData: Read Count Data


Description

Generates a count matrix from a list of files containing count data.

Usage

readCountData(targets, path, id_column, data_column, skip = 0,
  sep = "\t")

Arguments

targets

A data.frame containing the target information mapping samples to their count data files and experimental conditions. See importTargets for more information.

path

A character string giving the directory containing the count data files. This may be omitted if the files are in the current working directory.

id_column

A character string giving the name of the column that contains the gene identifiers. This name must be identical in all of the count data files.

data_column

A numeric giving the index of the column that contains the count data in each file. If not specified, the final column is used (see Details).

skip

A numeric giving the number of lines to skip at the start of each file, before the column header line.

sep

A character string giving the field separator; defaults to a tab ("\t").

Details

readCountData is the main data import function for deago. Before importing count data, target information should be imported using importTargets, which maps the count data files to the experimental condition applied to each sample. The targets dataframe should contain the following columns:

filename

name of the file containing the count data for the sample

condition

experimental treatment or condition that was applied

replicate

replicate identifier, which can be numeric or character

label

unique sample identifier composed of the condition and replicate

For more information on the targets file and dataframe see importTargets.
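As an illustration of the expected shape, the sketch below builds a targets data.frame by hand with the four columns listed above. In practice this dataframe would come from importTargets; the file names and the "condition_replicate" label format used here are illustrative assumptions, not necessarily what importTargets produces.

```r
# Hypothetical targets data.frame with the four columns described above.
# In practice it would be created by importTargets(); the file names and
# the "condition_replicate" label format are illustrative assumptions.
targets <- data.frame(
  filename  = c("ctrl_1.counts", "ctrl_2.counts", "trt_1.counts", "trt_2.counts"),
  condition = c("control", "control", "treated", "treated"),
  replicate = c(1, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Build a unique label for each sample from its condition and replicate
targets$label <- paste(targets$condition, targets$replicate, sep = "_")

# Every sample must have its own count file and its own label
stopifnot(!anyDuplicated(targets$filename), !anyDuplicated(targets$label))
print(targets)
```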

When importing from individual files, count data are read using the filename and label columns of the targets dataframe. Each row in the targets file represents one sample, and each sample has a file name and a unique label. The column containing the gene identifiers must have the same name in every count data file; specify this name using id_column. The gene identifiers should also appear in the same order in all of the count data files being imported.

By default, deago assumes that the count data are in the final column of each file. If this is not the case, specify the column number using data_column. deago uses a column number rather than a column name because many count data programs use the file name as the header of the count column, so that header differs between files. Because a file may also have been renamed between its creation and the analysis, it is simpler to ensure that the counts are in the same position in every file.
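The following base R sketch illustrates why selecting the count column by index is more robust than selecting it by name. It writes two hypothetical count files whose count-column headers are their own BAM file names, then reads the count column with a hypothetical helper, read_counts_column, that falls back to the final column when no index is given. This is an illustration of the idea under those assumptions, not deago's own code; the skip and sep arguments mirror the ones documented above.

```r
# Two hypothetical count files; each count-column header is a file name,
# so the headers differ between files. The first line is a comment that
# must be skipped (skip = 1).
f1 <- tempfile(fileext = ".tsv")
f2 <- tempfile(fileext = ".tsv")
writeLines(c("# generated by a counting tool",
             "genes\tlength\tsample_A.bam",
             "geneA\t1500\t10",
             "geneB\t900\t0"), f1)
writeLines(c("# generated by a counting tool",
             "genes\tlength\tsample_B.bam",
             "geneA\t1500\t7",
             "geneB\t900\t3"), f2)

# Hypothetical helper (not part of deago): read one file and return the
# count column by index, using the final column when data_column is NULL.
read_counts_column <- function(file, data_column = NULL, skip = 0, sep = "\t") {
  df <- read.table(file, header = TRUE, sep = sep, skip = skip,
                   check.names = FALSE, stringsAsFactors = FALSE,
                   comment.char = "")
  if (is.null(data_column)) data_column <- ncol(df)
  df[[data_column]]
}

# The headers "sample_A.bam" and "sample_B.bam" differ, but the final
# column locates the counts in both files.
counts_A <- read_counts_column(f1, skip = 1)
counts_B <- read_counts_column(f2, skip = 1)
print(counts_A)
print(counts_B)
```

Setting comment.char = "" makes read.table rely only on skip to drop the leading comment line, which matches the meaning of the skip argument described above.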

Once imported, the first column of the deago count matrix returned by readCountData will contain the gene identifiers. The remaining columns will contain the count data for each sample with the column name being the unique label associated with that sample in the targets dataframe.
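To make the layout of the returned count matrix concrete, here is a minimal base R sketch of how such a matrix could be assembled from a targets dataframe. The function build_count_matrix is hypothetical and written only for illustration; it is not deago's implementation. It assumes tab-separated files with a header line, checks that the gene identifiers are in the same order in every file, and names each count column with the sample's label.

```r
# Hypothetical re-implementation for illustration only; not deago's code.
build_count_matrix <- function(targets, path = ".", id_column,
                               data_column = NULL, skip = 0, sep = "\t") {
  ids <- NULL
  counts <- list()
  for (i in seq_len(nrow(targets))) {
    df <- read.table(file.path(path, targets$filename[i]), header = TRUE,
                     sep = sep, skip = skip, check.names = FALSE,
                     stringsAsFactors = FALSE, comment.char = "")
    # Default to the final column when no count column index is given
    col <- if (is.null(data_column)) ncol(df) else data_column
    # Gene identifiers must appear in the same order in every file
    if (is.null(ids)) {
      ids <- df[[id_column]]
    } else if (!identical(ids, df[[id_column]])) {
      stop("gene identifiers differ in ", targets$filename[i])
    }
    # Name each count column with the sample's unique label
    counts[[targets$label[i]]] <- df[[col]]
  }
  out <- data.frame(ids, counts, check.names = FALSE, stringsAsFactors = FALSE)
  names(out)[1] <- id_column
  out
}

# Usage with two small hypothetical count files
dir <- tempdir()
writeLines(c("genes\tsample_A.bam", "geneA\t10", "geneB\t0"),
           file.path(dir, "ctrl_1.counts"))
writeLines(c("genes\tsample_B.bam", "geneA\t7", "geneB\t3"),
           file.path(dir, "trt_1.counts"))
targets <- data.frame(filename = c("ctrl_1.counts", "trt_1.counts"),
                      label = c("control_1", "treated_1"),
                      stringsAsFactors = FALSE)
counts <- build_count_matrix(targets, path = dir, id_column = "genes")
print(counts)
```

The first column holds the gene identifiers and each remaining column is named by a sample label, matching the structure described above.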

Value

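A data.frame containing the count data: the gene identifiers in the first column, followed by one column of counts per sample, named with that sample's label.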

See Also

Other import functions: annotateDataset, importAnnotation, importConfig, importTargets, validateConfig, validateTargets

Examples

## Not run: 
readCountData(targets, id_column="genes", data_column=2)

## End(Not run)

sanger-pathogens/deago documentation built on May 28, 2019, 8:42 a.m.