Helper function to generate a data frame that can be used as input for the function analyzeSNPhood

Description

collectFiles creates a data frame that can be used as input for the function analyzeSNPhood. The resulting data frame contains information about files that will be processed (column signal) and, optionally, corresponding input files for normalization (column input) and labels to combine datasets to meta-datasets (column individual).

Usage

collectFiles(patternFiles, recursive = FALSE, ignoreCase = TRUE,
  inputFiles = NA, individualID = NA, genotypeMapping = NA,
  verbose = TRUE)

Arguments

patternFiles

Character. If a vector of length 1, the absolute path to one or multiple BAM files that should be processed. Wildcards ("*") are allowed (for example, *CTCF* or *.bam). If a vector of length > 1, each element must specify the absolute path to a single BAM file, and wildcards are not allowed. For the integration of BamFile or BamFileList objects, see the Details section. For more details, see the examples and the vignette.

recursive

Logical(1). Default FALSE. Should the search for BAM files within the directory be performed recursively? If set to TRUE, all files matching the pattern within the specified directory and all of its subdirectories will be added. If set to FALSE, only files within the specified directory but not any subdirectories will be used.

ignoreCase

Logical(1). Default TRUE. Should case be ignored when matching the specified pattern? If set to TRUE, the pattern is matched case-insensitively.

inputFiles

Character. Default NA. Input files that should be used as a control for normalization. Supported values are NA (no input normalization), a single character string specifying one or multiple input files (comma-separated, see examples) that should be used for all processed files, or a character vector of the same length as the number of files that will be processed. Set to NULL if you want to add the files manually to the data frame later (see vignette).

individualID

Character. Default NA. IDs of the individuals, used as labels in the column individual. Only relevant if datasets should be pooled.

genotypeMapping

Character. Default NA. Path to the corresponding genotype file in VCF format, followed by a colon and the name of the column in the VCF file. Genotypes can also be integrated later using the function associateGenotypes.

verbose

Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled?
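
As a minimal sketch of the string formats described above for inputFiles and genotypeMapping, the following base R snippet builds both values. All file paths and the column name NA12878 are hypothetical placeholders, not files shipped with any package.

```r
## Hypothetical paths, for illustration only
inputBAMs = c("/path/to/input1.bam", "/path/to/input2.bam")

## inputFiles: a single character string with comma-separated input files
## that would be used for all processed files
inputFiles.single = paste(inputBAMs, collapse = ",")

## genotypeMapping: path to the VCF file, a colon, then the column name in the VCF file
## (the VCF path and the column name "NA12878" are assumptions)
genotypeMapping = paste0("/path/to/genotypes.vcf", ":", "NA12878")

print(inputFiles.single)
print(genotypeMapping)
```

The resulting strings would then be passed as the inputFiles and genotypeMapping arguments of collectFiles.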

Details

Note that if you already have an object of class BamFile or BamFileList, this can easily be integrated into the SNPhood framework by using the path function to specify the value of the parameter patternFiles, see the examples below.

Value

A data frame with the three columns signal, input and individual that can be used as input for the function analyzeSNPhood.
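
To illustrate the shape of such a data frame, the sketch below builds one by hand in base R. Only the three column names come from the description above. The file paths, the individual labels, and the use of character columns are assumptions, and the data frame returned by collectFiles may differ in its details.

```r
## Hand-built illustration of the three-column layout; all values are hypothetical
files.df = data.frame(
  signal     = c("/path/to/sample1.bam", "/path/to/sample2.bam"),  # files to process
  input      = c("/path/to/input.bam", "/path/to/input.bam"),      # control files for normalization
  individual = c("ind1", "ind2"),                                  # labels for pooling datasets
  stringsAsFactors = FALSE
)

str(files.df)
```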

See Also

analyzeSNPhood

Examples

## For brevity, only exemplary filenames are given in the following. 
## Note that in reality, absolute paths should be provided.
## The first example collects all BAM files that match a wildcard pattern
## in the example data directory of the SNPhoodData package

## Load SNPhoodData library
library(SNPhoodData)
files.df = collectFiles(patternFiles = paste0(system.file("extdata", package = "SNPhoodData"),"/*.bam"))

## If you already have BAM files in objects of class BamFile or BamFileList
## (both provided by the Rsamtools package), you may use the following code snippet:
library(Rsamtools)
files = list.files(system.file("extdata", package = "SNPhoodData"), pattern = "\\.bam$", full.names = TRUE)
BamFile.o = BamFile(files[1])
BamFiles.o = BamFileList(files)
files.df = collectFiles(patternFiles = path(BamFile.o))
files.df = collectFiles(patternFiles = path(BamFiles.o))
