load_serp: Import ribosome profiling data


View source: R/import.R

Description

Import read count tables. Currently, CSV and HDF5 files are accepted.

Usage

load_serp(
  ...,
  ref,
  normalize = FALSE,
  bin = c("bynuc", "byaa"),
  exclude = list(),
  defaults = list()
)

Arguments

...

Name-value pairs, one per experiment. The name of each argument becomes the name of an experiment, and each value is a named list. The names of that list are the sample types (e.g. TT for total translatome). Each element must be a character vector of file paths, where each file is the read count table of one replicate. Replicate order must match between sample types.
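Because replicates are matched by position, every sample type within an experiment needs the same number of files. The helper below is a hypothetical sketch (check_replicates is not part of RiboSeqTools) that checks this before calling load_serp; it can only compare counts, not whether the order is actually correct.

```r
# Hypothetical helper (not part of RiboSeqTools): checks that each experiment
# has the same number of replicate files for every sample type.
check_replicates <- function(...) {
  experiments <- list(...)
  if (is.null(names(experiments)) || !all(nzchar(names(experiments)))) {
    stop("every experiment must be named")
  }
  for (exp_name in names(experiments)) {
    # number of replicate files per sample type
    n_reps <- lengths(experiments[[exp_name]])
    if (length(unique(n_reps)) != 1L) {
      stop(sprintf("experiment '%s': sample types have differing replicate counts",
                   exp_name))
    }
  }
  invisible(TRUE)
}

# Same structure as in the Examples section: experiment -> sample type -> files.
check_replicates(
  DnaK = list(ip = c("data/dnak1_ip.csv", "data/dnak2_ip.csv"),
              tt = c("data/dnak1_tt.csv", "data/dnak2_tt.csv"))
)
```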

ref

Reference data frame containing at least the following columns:

gene

Gene/ORF name. Must match the names given in the read count tables.

length

ORF length in nucleotides.

If all input files are HDF5 files, this argument can be missing, in which case a reference is created from the union of all genes in the input files. For this to work, each HDF5 data set must have an attribute cds_length. Other HDF5 attributes are included as additional columns. If HDF5 data sets have an attribute gene, the corresponding column is named gene_alt to avoid conflicts with the gene column created from the data set names.
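A minimal reference data frame can be built directly in R. The gene names and lengths below are placeholders; in practice gene must match the ORF names in the count tables and length is the ORF length in nucleotides. Additional columns should be harmless given that the text only requires "at least" these two.

```r
# Minimal reference: one row per ORF (placeholder names and lengths).
reference_df <- data.frame(
  gene   = c("geneA", "geneB", "geneC"),  # must match names in the count tables
  length = c(300L, 1206L, 465L),          # ORF length in nucleotides
  stringsAsFactors = FALSE
)

# Sanity checks mirroring the documented requirements.
stopifnot(all(c("gene", "length") %in% colnames(reference_df)),
          !anyDuplicated(reference_df$gene))
```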

normalize

If TRUE, normalize the read counts to library size. The output is then in reads per million (RPM).
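RPM scales each count by one million divided by the total number of reads. The sketch below assumes the library size is the total count over all genes not listed in exclude (as the exclude argument describes); the exact computation inside load_serp may differ, and to_rpm is a hypothetical helper.

```r
# Hypothetical sketch of RPM normalization (not the package's own code).
# counts: named list of per-position integer count vectors, one per gene.
# Excluded genes are left out of the library size and of the result.
to_rpm <- function(counts, excluded = character()) {
  keep <- setdiff(names(counts), excluded)
  library_size <- sum(vapply(counts[keep], sum, numeric(1)))
  lapply(counts[keep], function(x) x / library_size * 1e6)
}

counts <- list(geneA = c(0L, 3L, 5L), geneB = c(2L, 0L, 0L),
               rRNA1 = c(90L, 0L, 0L))
rpm <- to_rpm(counts, excluded = "rRNA1")
# Library size is 10 reads, so geneA becomes c(0, 3e5, 5e5).
```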

bin

How to bin the data. bynuc: no binning (i.e. counts per nucleotide). byaa: bin by amino acid residue.
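Binning by residue amounts to summing the per-nucleotide counts of each codon. This sketch assumes the counts vector starts at nucleotide 1 of the ORF, so nucleotides 1 to 3 form residue 1, 4 to 6 residue 2, and so on; the package may apply a different convention (for example an offset), and bin_by_residue is a hypothetical helper.

```r
# Hypothetical sketch of "byaa" binning: sum per-nucleotide counts per codon.
# Assumes counts[i] is the read count at nucleotide i (1-based) of the ORF.
bin_by_residue <- function(counts) {
  residue <- (seq_along(counts) - 1L) %/% 3L + 1L  # residue index of each nt
  as.vector(tapply(counts, residue, sum))
}

bin_by_residue(c(1L, 0L, 2L, 4L, 4L, 0L, 7L))
# residue 1: 1 + 0 + 2; residue 2: 4 + 4 + 0; residue 3 (incomplete codon): 7
```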

exclude

Genes to exclude from all future analyses. These genes are also excluded from the total read count calculation. Note that the raw count tables are not modified. Either a named list whose names correspond to experiments, or a character vector of gene names, in which case these genes are excluded from all experiments.
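The two accepted forms of exclude are shown below with placeholder gene names. The expand_exclude helper is hypothetical and only illustrates that a character vector behaves like a named list repeating the same genes for every experiment; it is not the package's own code.

```r
# Form 1: named list, exclusions per experiment (names = experiment names).
exclude_per_experiment <- list(DnaK = c("geneA"), TF = c("geneB", "geneC"))
# Form 2: character vector, the same genes are excluded from every experiment.
exclude_everywhere <- c("geneA", "geneB")

# Hypothetical helper: rewrite form 2 as the equivalent form 1.
expand_exclude <- function(exclude, experiments) {
  if (is.character(exclude)) {
    setNames(rep(list(exclude), length(experiments)), experiments)
  } else {
    exclude
  }
}

expand_exclude(exclude_everywhere, c("DnaK", "TF"))
```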

defaults

Default parameters of the data set (see defaults).

Details

CSV files are expected to have one row per ORF, with the first column containing the ORF names. The other columns represent positions from the 5' end of the ORF in nucleotides and must contain integer-valued read counts. A header row must be present.
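A toy table in this layout can be written and read back with base R. The column names "1", "2", "3" for the positions are an assumption made for illustration; the text only requires that a header be present.

```r
# Toy count table: ORF names in the first column, one column per position.
tab <- data.frame(orf = c("geneA", "geneB"),
                  `1` = c(0L, 2L), `2` = c(3L, 0L), `3` = c(5L, 1L),
                  check.names = FALSE)
path <- tempfile(fileext = ".csv")
write.csv(tab, path, row.names = FALSE)  # writes the required header row

# Read back with ORF names as row names and positions as columns.
csv_counts <- read.csv(path, row.names = 1, check.names = FALSE)
csv_counts["geneA", ]  # per-position read counts for geneA
```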

HDF5 files are assumed to contain one data set per gene at the top level. Each data set must be a two-column matrix, the first column containing the position from the 5' end of the ORF in nucleotides and the second column containing the associated integer-valued read counts. Data set names are assumed to be gene names.
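The in-memory sketch below mirrors one such data set as an R matrix, without performing any HDF5 I/O. The gene name and values are placeholders, and expanding to a dense vector assumes 1-based positions and that positions missing from the matrix have zero reads; neither is stated above.

```r
# One data set as described above: column 1 = position, column 2 = read count.
geneA <- cbind(position = c(1L, 2L, 3L, 7L),  # nt offset from the ORF 5' end
               count    = c(4L, 1L, 2L, 9L))  # integer read counts
attr(geneA, "cds_length") <- 300L  # attribute needed when 'ref' is omitted

# Assumption: 1-based positions, unlisted positions have zero reads.
dense <- integer(attr(geneA, "cds_length"))
dense[geneA[, "position"]] <- geneA[, "count"]
```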

Value

An object of class serp_data.

See Also

serp_data_accessors, defaults

Examples

## Not run: 
     data <- load_serp(DnaK=list(ip=c('data/dnak1_ip.csv', 'data/dnak2_ip.csv'),
                                 tt=c('data/dnak1_tt.csv', 'data/dnak2_tt.csv')),
                       TF=list(ip=c('data/tf1_ip.csv', 'data/tf2_ip.csv'),
                               tt=c('data/tf1_tt.csv', 'data/tf2_tt.csv')),
                       ref=reference_df,
                       bin='byaa')
     
## End(Not run)

ilia-kats/RiboSeqTools documentation built on Oct. 5, 2020, 7:41 p.m.