Description Usage Arguments Details Value Author(s) Examples
Reads in the variant files from each sample of an RNAseq experiment and then combines the files into a single data.frame, useful for several downstream applications.
readVariantFiles(fileDir,
                 sepSymbol = "_",
                 fileID = "*_variants.txt",
                 firstColName = "SEQ_ID",
                 fileSep = "\t",
                 idCols = 5,
                 refPosCol = "Reference.Position",
                 colToSort = "Coverage",
                 removeDups = TRUE,
                 returnMerged = TRUE,
                 returnSing = FALSE,
                 limitGenes = NULL,
                 omitRefMatches = TRUE,
                 refAlleleCol = "Reference$",
                 varAlleleCol = "Allele")
fileDir
The path to the directory containing all of the variant files.
sepSymbol
The symbol that separates the sample names from other information in the file name. Used to pull names for columns in the combined file. Set to "" if the full file name should be used.
fileID
Character string used to limit which files are imported; regular expressions are allowed.
firstColName
The name to give the first column. Set to NULL or "" to leave the column as is. Intended to standardize the name and to match the column names in other parts of the analysis pipeline.
fileSep
The column delimiter used in the file (e.g. "," or "\t").
idCols
How many columns of position information are there? Avoids including duplicated information in the combined output.
refPosCol
Which column has the reference position? Can be numeric or character.
colToSort
Which column should be used to keep one line per position, if removeDups is TRUE?
removeDups
Logical, should duplicates at a position be removed? This is necessary to avoid massive over-merging.
returnMerged
Logical, should the merged variants be returned?
returnSing
Logical, should each of the separate variant files be returned?
limitGenes
A character vector listing the genes to include. This can be useful if your variant files include genes that you are not interested in analyzing (e.g. things without a blast hit).
omitRefMatches
Logical, should 'variants' which match the reference be excluded? This is useful if your variant file includes rows for reads aligning to the reference allele, which may be accidentally set as the main 'variant' in this function. Defaults to TRUE.
refAlleleCol
Which column has the reference allele? Can be numeric or character.
varAlleleCol
Which column has the variable alleles? Can be numeric or character.
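To make the roles of fileID and sepSymbol more concrete, here is a minimal sketch in base R of how a set of file names might be filtered by a pattern and then reduced to sample names. It is an illustration under stated assumptions, not the function's actual implementation: the file names are made up, the pattern is a stricter regular expression than the default shown above, and taking the text before the first separator is an assumed reading of "pull names for columns".

```r
# Hypothetical file names, standing in for the contents of fileDir
fileNames <- c("sampleA_variants.txt", "sampleB_variants.txt", "notes.md")

# Keep only the files that match a pattern (the role played by fileID);
# this regex is an assumption, chosen to be explicit about the ".txt" ending
variantFiles <- fileNames[grepl("_variants\\.txt$", fileNames)]

# Derive a sample name from each file name (the role played by sepSymbol);
# an empty separator keeps the full file name, as the argument text describes
sepSymbol <- "_"
sampleNames <- if (nzchar(sepSymbol)) {
  vapply(strsplit(variantFiles, sepSymbol, fixed = TRUE),
         function(parts) parts[1], character(1))
} else {
  variantFiles
}
```

With these made-up inputs, notes.md is skipped and the two remaining files would supply the names used to label sample-specific columns.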
Reads in the variant files from fileDir and merges them by gene and position.
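The following base R sketch illustrates why removing duplicates before merging matters, using two toy per-sample tables. It is not the package's code: the data are invented, keeping the row with the highest Coverage is an assumed interpretation of colToSort, and the outer join (all = TRUE) and column suffixes are choices made only for this example.

```r
# Toy per-sample tables; column names follow the documented defaults
sampleA <- data.frame(
  SEQ_ID = c("gene1", "gene1", "gene2"),
  Reference.Position = c(10, 10, 25),
  Allele = c("A", "G", "T"),
  Coverage = c(40, 12, 30),
  stringsAsFactors = FALSE
)
sampleB <- data.frame(
  SEQ_ID = c("gene1", "gene3"),
  Reference.Position = c(10, 7),
  Allele = c("A", "C"),
  Coverage = c(22, 18),
  stringsAsFactors = FALSE
)

# Keep one row per gene/position: here, the row with the highest Coverage
# (an assumption about how colToSort is applied)
keepTopCoverage <- function(df) {
  df <- df[order(df$Coverage, decreasing = TRUE), ]
  key <- paste(df$SEQ_ID, df$Reference.Position)
  df[!duplicated(key), ]
}

# Merge the de-duplicated tables by gene and position
keyCols <- c("SEQ_ID", "Reference.Position")
merged <- merge(
  keepTopCoverage(sampleA), keepTopCoverage(sampleB),
  by = keyCols, all = TRUE, suffixes = c("_sampleA", "_sampleB")
)
```

Without the de-duplication step, every duplicate row at a position in one sample would be paired with every matching row in the other, so the merged table could grow multiplicatively with each sample added.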
The output depends on returnMerged and returnSing:
If returnMerged is TRUE: a data.frame with the merged variants
If returnSing is TRUE: a list of the singVariants (cleaned if removeDups = TRUE)
If both are TRUE: a list with both of the above
Mark Peterson
## Not run: 
mergedVariants <- readVariantFiles(
  fileDir = "path/to/variant/directory",
  fileID = "*_variants.txt",
  firstColName = "SEQ_ID",
  idCols = 4,
  refPosCol = "Region"
)
## End(Not run)