readVariantFiles: Read in variant files for RNAseq

Description

Reads in the variant files from each sample of an RNAseq experiment and then combines the files into a single data.frame, useful for several downstream applications.

Usage

readVariantFiles(fileDir,
                 sepSymbol = "_",
                 fileID = "*_variants.txt",
                 firstColName = "SEQ_ID",
                 fileSep = "\t",
                 idCols = 5,
                 refPosCol = "Reference.Position",
                 colToSort = "Coverage",
                 removeDups = TRUE,
                 returnMerged = TRUE,
                 returnSing = FALSE,
                 limitGenes = NULL,
                 omitRefMatches = TRUE,
                 refAlleleCol = "Reference$",
                 varAlleleCol = "Allele")

Arguments

fileDir

The path to the directory containing all of the variant files.

sepSymbol

The symbol that separates the sample names from other info in the file name. Used to pull names for columns in the combined file. Set to "" if the full file name should be used.

fileID

A character string used to limit which files are imported. Regular expressions are allowed.

firstColName

The name to give the first column. Set to NULL or "" to leave the column as is. Intended to standardize the name so that it matches the column names used in other parts of the analysis pipeline.

fileSep

The column delimiter used in the files (e.g. "," or "\t").

idCols

How many columns of position information are there? Avoids including duplicated information in the combined output.

refPosCol

Which column has the reference position? Can be numeric or character.

colToSort

Which column should be used to keep one line per position, if removeDups == TRUE? Can be numeric or character.

removeDups

Logical, should duplicates at a position be removed? This is necessary to avoid massive over-merging.

returnMerged

Logical, should the merged variants be returned?

returnSing

Logical, should each of the separate variant files be returned?

limitGenes

A character vector listing the genes to include. This can be useful if your variant files include genes that you are not interested in analyzing (e.g. things without a blast hit).

omitRefMatches

Logical, should 'variants' which match the reference be excluded? This is useful if your variant file includes rows for reads aligning to the reference allele, which may be accidentally set as the main 'variant' in this function. Defaults to TRUE.

refAlleleCol

Which column has the reference allele? Can be numeric or character.

varAlleleCol

Which column has the variable alleles? Can be numeric or character.
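
The interaction between omitRefMatches, removeDups and colToSort is easier to see on a toy table. The following is only a sketch in base R, not the package's actual code: the data are invented, and it assumes that the row kept at each position is the one with the highest value in the colToSort column.

```r
# Invented variant table; column names mirror the defaults shown in Usage.
variants <- data.frame(
  SEQ_ID             = c("gene1", "gene1", "gene1", "gene1", "gene2"),
  Reference.Position = c(10, 10, 10, 25, 7),
  Reference          = c("A", "A", "A", "C", "G"),
  Allele             = c("A", "T", "G", "G", "C"),
  Coverage           = c(40, 12, 5, 30, 18),
  stringsAsFactors   = FALSE
)

# omitRefMatches = TRUE: drop rows whose allele equals the reference allele.
nonRef <- variants[variants$Allele != variants$Reference, ]

# removeDups = TRUE with colToSort = "Coverage" (assumed: highest value wins).
# Sort by decreasing coverage, then keep the first row per gene and position.
sorted <- nonRef[order(nonRef$Coverage, decreasing = TRUE), ]
oneRow <- sorted[!duplicated(sorted[, c("SEQ_ID", "Reference.Position")]), ]

print(oneRow)
```

Without the omitRefMatches step, the reference-matching row at gene1 position 10 (coverage 40) would win the sort and be kept as the 'variant' for that position, which is the situation the omitRefMatches description warns about.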

Details

Reads in the variant files from fileDir and merges by gene and position.
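
A merge by gene and position can be illustrated with base R's merge(). This is a minimal sketch under stated assumptions, not the function's implementation: the two per-sample tables are invented, the sample-name suffixes are hypothetical, and a full outer join (all = TRUE) is assumed so that positions seen in only one sample are retained.

```r
# Two invented per-sample tables, already reduced to one row per gene/position.
sampleA <- data.frame(SEQ_ID = c("gene1", "gene2"),
                      Reference.Position = c(10, 7),
                      Allele = c("T", "C"),
                      Coverage = c(12, 18),
                      stringsAsFactors = FALSE)
sampleB <- data.frame(SEQ_ID = c("gene1", "gene3"),
                      Reference.Position = c(10, 44),
                      Allele = c("T", "A"),
                      Coverage = c(9, 21),
                      stringsAsFactors = FALSE)

# Gene and position act as the merge keys.
keys <- c("SEQ_ID", "Reference.Position")

# Non-key columns shared by both tables receive the sample suffixes.
merged <- merge(sampleA, sampleB, by = keys, all = TRUE,
                suffixes = c("_sampleA", "_sampleB"))

print(merged)
```

Because merge() pairs every matching row from one table with every matching row from the other, two rows at the same gene and position in each sample would produce four merged rows. That multiplication is the over-merging that removeDups is meant to prevent.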

Value

The output depends on returnMerged and returnSing:

If returnMerged is TRUE: a data.frame with the merged variants

If returnSing is TRUE: a list of the singVariants, i.e. the separate variant files (cleaned if removeDups = TRUE)

If both are TRUE: a list with both of the above

Author(s)

Mark Peterson

Examples

## Not run: 

mergedVariants <- readVariantFiles(
      fileDir = "path/to/variant/directory",
      fileID = "*_variants.txt",
      firstColName = "SEQ_ID",
      idCols = 4,
      refPosCol = "Region"
      )


## End(Not run) 
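
Before running the call above, it can help to preview which files a pattern such as fileID would pick up and what sample names sepSymbol would yield. The sketch below uses only base R and does not call the package. The file names are invented, the default fileID is treated as a glob and converted with glob2rx(), and the sample name is assumed to be the text before the first sepSymbol; the function itself may handle either step differently.

```r
# Throwaway directory with invented file names.
fileDir <- tempfile("variants_")
dir.create(fileDir)
invisible(file.create(file.path(fileDir, c("sampleA_variants.txt",
                                           "sampleB_variants.txt",
                                           "notes.txt"))))

# Convert the glob-style pattern to a regular expression for list.files().
pattern <- glob2rx("*_variants.txt")
variantFiles <- list.files(fileDir, pattern = pattern)

# Assumed rule: the sample name is everything before the first sepSymbol.
sepSymbol <- "_"
sampleNames <- vapply(strsplit(variantFiles, sepSymbol, fixed = TRUE),
                      function(parts) parts[1], character(1))

print(variantFiles)
print(sampleNames)

# Remove the temporary directory again.
unlink(fileDir, recursive = TRUE)
```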

rnaseqWrapper documentation built on May 2, 2019, 5:58 a.m.