mergeCountFiles: Merge multiple expression count data files for RNAseq

Description Usage Arguments Details Value Author(s) Examples

Description

Reads in the count data files from each sample of an RNAseq experiment and then combines the files into a single data.frame, useful for several downstream applications.

Usage

1
2
3
4
5
6
7
  mergeCountFiles(fileDir,
                 fileID = "*.genes.results$",
                 fileSep = "\t",
                 seqIDcol = 1,
                 colsToKeep = c("expected_count","FPKM"),
                 idCols = NULL,
                 minMatchToMerge = 0.5)

Arguments

fileDir

The path to the directory containing all of the count data files.

fileID

Character to use to limit which files are imported; regular expressions allowed. This fileID is also removed from the file names when naming the output columns.

fileSep

The column delimiter used in the file (e.g. "," or "\t")

seqIDcol

Which column has the gene name or other identifier? Can be numeric or character.

colsToKeep

Which columns of info should be kept for the output? Can be a vector of either numeric or character.

idCols

Which columns of general site information should be kept? This limits these to one single column, rather than one for each sample. Note, that row.names of the output will already match seqIDcol. This can be a vector of either numeric or character, but must match the format of seqIDcol if the rows of your input data are not all in the same order.

minMatchToMerge

If your data are not in the same order, what portion of gene names must be in common to proceed with merge? This is a protection step to avoid accidentally merging datasets from different references.

Details

Reads in the count data files from fileDir and merges by gene. It checks to see if the information is all in the same order, and issues a warning if not because it may suggest data are from different reference files.

Value

Returns a data.frame with a row for each gene, and columns (named from the file names), for each sample for each data type kept

Author(s)

Mark Peterson

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
  
## Find where the data is stored (or use your own)
pathToData <- try(system.file("",package="rnaseqWrapper",mustWork=TRUE))

if(class(pathToData) != "try-error"){
## Make sure the data were found before proceeding

## Read in the data
## Note, the files here are compressed,
##  but yours do not need to be
testCountData <- mergeCountFiles(paste(pathToData,"/data/",sep=""),".genes.results.txt.gz")

## Display the contents
head(testCountData)

}  
  
## Not run: 
    ## On your data, it will look more like:
    mergedCountData <- mergeCountFiles("/path/to/countData")
    head(mergedCountData)    
  
## End(Not run) 

rnaseqWrapper documentation built on May 2, 2019, 5:58 a.m.