knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "README-" )
Allows loading and working with *.sam
files, i.e. the format used by SamTools. Tracks header information, reads, and the source of the file. Loads all reads and optionally the extra read tags into S3 objects that are also data frames. This allows easy manipulation and requires much less overhead than using Bioconductor, but is significantly slower and uses more memory.
Includes functions to translate the Sam flags and to filter reads.
A Sam
object is created/constructed by the Sam()
function, usually by passing it the name of a *.sam
file as a parameter. From this object, the sam header data can be accessed with SamHeader()
, the sam read data with SamRead
and the source information about the sam file with SamSource()
. The following code uses a tiny sam file distributed with this package.
library(SamSeq) # Get the name of a *. sam file to use in this example. If SamSeq is not # installed in the normal place, specify the R library directory with # the system.file parameter lib.loc= sample.sam <- system.file( "extData/pe.sam", package = "SamSeq" ) samObj <- Sam( sample.sam ) class(samObj) samHeaderObj <- SamHeader( samObj ) class(samHeaderObj) samReadsObj <- SamReads( samObj ) class(samReadsObj) samSourceObj <- SamSource( samObj ) class(samSourceObj)
The SamHeader
is not fully parsed at this point, it will probably have a different structure in future. For now it is provided as an object that can be cast to a data frame with two columns: the initial tag as the column tag
, and the rest of the line as the column record
.
samHeaderObj <- SamHeader( samObj ) headerDF <- as.data.frame(samHeaderObj) names(headerDF) headerDF[1, ] headerDF$tag headerDF[headerDF$tag == "SQ", "record" ] areHeaderTagsParsed(samHeaderObj) areHeaderTagsParsed(samHeaderObj) == areHeaderTagsParsed(samObj)
The problem yet to be determined is how to split the contents when some header lines use tabs for delimiters and others can contain embedded unquoted tabs. Probably some form of parameter object that will allow specifying, on a per tag basis, what to do.
Vectors of raw header lines can also be parsed directly into a SamHeader
object. Source information can be provided, by default it will all be NA
.
headerLines <- c( "@HD\tVN:1.4", "@SQ\tSN:chr7\tLN:159138663" ) headersObj <- SamHeader( headerLines ) headerDF <- as.data.frame(samHeaderObj) headerDF[1:2, ]
The reads in a *.sam
file are represented as a SamReads
object, which is a data frame containing the fixed 11 columns of read data with the same name as the sam fields, plus an additional column or columns containing the optional read tags. By default the optional read tags are parsed out into columns with the tag as the column name and the value converted to integer or double if appropriate, otherwise left as character.
samReadsObj <- SamReads( samObj ) readsDF <- as.data.frame(samReadsObj) names(readsDF) readsDF[1:2, ] readsDF$flag areReadTagsParsed(samReadsObj) areReadTagsParsed(samReadsObj) == areReadTagsParsed(samObj)
If splitTags= TRU
E is set when reading in the file with Sam()
, tags will be left as a single unparsed tags
column with values the unparsed tags delimited by embedded tabs. You might want to leave the optional tags unparsed if you don't need them as parsing them makes reading the file 4 times slower.
rawTagsSamObj <- Sam( sample.sam, splitTags = FALSE ) rawTagsReadObj <- SamReads( rawTagsSamObj ) rawReadsDf <- as.data.frame(rawTagsReadObj) names(rawReadsDf) rawReadsDf[1:2, ] rawReadsDf$tags areReadTagsParsed(rawTagsReadObj) areReadTagsParsed(rawTagsReadObj) == areReadTagsParsed(rawTagsSamObj)
The source of the sam data is automatically bound to Sam
object created when the file is read in, and propagated to the SamReads
and SamHeader
objects. It can be obtained by the SamSource()
accessor.
samFileSourceObj <- SamSource( samObj ) samHeaderSourceObj <- SamSource( SamHeader( samObj )) samReadsSourceObj <- SamSource( SamReads( samObj )) identical( samFileSourceObj, samHeaderSourceObj) identical( samFileSourceObj, samReadsSourceObj)
A SamSource
object has accessors samSourceName()
, samSourceHost()
, and samSourceType()
which return the name of the file read, the name of machine with the file, and the type of access used to get the data, respectively. Currently the only type of access is file
. A possible future type might be url
.
The accessors can be applied directly to the Sam
, SamHeader
, and SamReads
objects, or indeed any object for which SamSource(x)
is defined.
samSource <- SamSource( samObj ) samSourceName( samSource ) samSourceHost( samSource ) samSourceType( samSource )
The accessors are S3 functions, and can be applied directly to the Sam
, SamHeader
, and SamReads
objects without first extracting the SamSource
object.
samSourceName( samObj ) samSourceHost( samHeaderObj ) samSourceType( samReadsObj )
To be detailed - see ?samFlags
and ?testSamFlags
To be detailed - see ?pairedReads
, ?samReadFilter
, ?pairedEndReadTests
, and ?readTests
.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.