Read one or more columns into XStringSet (e.g., DNAStringSet) objects

Description

This function reads components of short read data, such as DNA sequences, quality scores, and read names, into XStringSet (e.g., DNAStringSet, BStringSet) objects. One or several files of identical layout can be specified.

Usage

readXStringColumns(dirPath, pattern=character(0),
                   colClasses=list(NULL),
                   nrows=-1L, skip=0L,
                   sep = "\t", header = FALSE, comment.char="#")

Arguments

dirPath

A character vector giving the directory path (relative or absolute) of files to be read.

pattern

The (grep-style) pattern describing file names to be read. The default (character(0)) reads all files in dirPath. All files are expected to have identical numbers of columns.

colClasses

A list of length equal to the number of columns in a file. Columns whose corresponding colClasses entry is NULL are ignored. Other entries in colClasses are expected to be character strings describing the base class for the XStringSet. For instance, a column of DNA sequences would be specified as "DNAString" and parsed into a DNAStringSet object.
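As a minimal sketch of how such a list can be built, assume a hypothetical five-column file holding a read name in column 1, a DNA sequence in column 4, and a quality string in column 5 (this layout is an illustration, not a format defined by the package). Only base R is used:

```r
## Hypothetical layout: 5 columns, of which 1, 4 and 5 are wanted
n_cols <- 5L

## Start with every column ignored (NULL) ...
colClasses <- rep(list(NULL), n_cols)

## ... then name the base class for each column to keep
colClasses[c(1, 4, 5)] <- c("BString", "DNAString", "BString")

## Indices of the columns that will be parsed
kept <- which(!vapply(colClasses, is.null, logical(1)))
```

The list keeps its full length, so the position of each non-NULL entry identifies the file column it applies to.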

nrows

A length 1 integer vector describing the maximum number of XString objects to read into the set. Reads may come from more than one file when dirPath and pattern match several files and nrows is greater than the number of reads in the first file.

skip

A length 1 integer vector describing how many lines to skip at the start of each file.

sep

A length 1 character vector describing the column separator.

header

A length 1 logical vector indicating whether files include a header line identifying columns. If present, the header of the first file is used to name the returned values.

comment.char

A length 1 character vector containing a single character that, when appearing at the start of a line, indicates that the entire line should be ignored. Currently there is no way to use comment characters anywhere other than the first position of a line.

Value

A list, with each element containing an XStringSet object of the type corresponding to the non-NULL elements of colClasses.

Author(s)

Martin Morgan <mtmorgan@fhcrc.org>

Examples

## valid character strings for colClasses
names(slot(getClass("XString"), "subclasses"))

dirPath <- system.file('extdata', 'maq', package='ShortRead')

colClasses <- rep(list(NULL), 16)
colClasses[c(1, 15, 16)] <- c("BString", "DNAString", "BString")

## read one file
readXStringColumns(dirPath, "out.aln.1.txt", colClasses=colClasses)

## read all files into a single object for each column
res <- readXStringColumns(dirPath, colClasses=colClasses)
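
A further sketch, assuming the ShortRead package and its bundled 'maq' example files are installed, shows how nrows might be used to cap the number of records and how the returned list can be inspected by position. The variable name first10 is illustrative only, and the code is skipped when the package is unavailable:

```r
first10 <- NULL
if (requireNamespace("ShortRead", quietly = TRUE)) {
    library(ShortRead)

    dirPath <- system.file('extdata', 'maq', package='ShortRead')
    colClasses <- rep(list(NULL), 16)
    colClasses[c(1, 15, 16)] <- c("BString", "DNAString", "BString")

    ## read at most 10 records from the matching file
    first10 <- readXStringColumns(dirPath, "out.aln.1.txt",
                                  colClasses=colClasses, nrows=10L)

    ## one list element per non-NULL entry of colClasses, in column order
    length(first10)
    class(first10[[2]])
}
```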