Find_Samples | R Documentation |
Often, files such as raw sequencing FASTQ files, alignment BAM files,
or IRFinder output files are stored in a single folder under some directory
structure.
They can be grouped by sharing a common directory or common names.
Often, their sample names can be gleaned from these common names or from the
names of the folders in which they are contained.
This function recursively finds all matching files and
extracts sample names, assuming either that the files are named by sample name
(level = 0), or that their names can be derived from the
parent folder (level = 1). Higher levels also work: for example, level = 2
means that the parent folder of the parent folder of the file is named by
sample name. See details section below.
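The effect of level can be illustrated with a minimal base R sketch. The helper sample_name_from_path() and the example paths are hypothetical and are not part of the package; the sketch only mirrors the rule described above and makes no claim about how Find_Samples() is implemented internally.

```r
# Hypothetical helper (not part of the package): derive a sample name from a
# file path, given the file suffix and the directory level holding the name.
sample_name_from_path <- function(path, suffix, level = 0) {
    stopifnot(level >= 0, level <= 3)
    if (level == 0) {
        # level 0: the file name itself, with the suffix stripped
        fname <- basename(path)
        return(substr(fname, 1, nchar(fname) - nchar(suffix)))
    }
    # level >= 1: walk up `level` directories and take that folder's name
    dir <- path
    for (i in seq_len(level)) dir <- dirname(dir)
    basename(dir)
}

# Assumed example layout; no files are read from disk
paths <- c(
    "project/batch1/sampleA/aligned.bam",
    "project/batch1/sampleB/aligned.bam"
)

sample_name_from_path(paths, ".bam", level = 0) # "aligned" "aligned"
sample_name_from_path(paths, ".bam", level = 1) # "sampleA" "sampleB"
sample_name_from_path(paths, ".bam", level = 2) # "batch1"  "batch1"
```

In this assumed layout only level = 1 yields distinct sample names, which is why the choice of level should follow how the files are actually organised.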
Find_Samples(sample_path, suffix = ".txt.gz", level = 0)

Find_FASTQ(
    sample_path,
    paired = TRUE,
    fastq_suffix = c(".fastq", ".fq", ".fastq.gz", ".fq.gz"),
    level = 0
)

Find_Bams(sample_path, level = 0)

Find_IRFinder_Output(sample_path, level = 0)
sample_path | The path in which to recursively search for files that match the given suffix |
suffix | A vector of one or more strings that specifies the file suffix (e.g. ".bam" denotes BAM files, whereas ".txt.gz" denotes gzipped txt files). |
level | Whether sample names can be found in the file names themselves (level = 0), in their parent directory (level = 1), or in the parent of the parent directory (level = 2). Levels up to 3 are supported. |
paired | Whether to expect single FASTQ files (of the format "sample.fastq"), or paired files (of the format "sample_1.fastq", "sample_2.fastq") |
fastq_suffix | The name of the FASTQ suffix. Options are: ".fastq", ".fastq.gz", ".fq", or ".fq.gz" |
Paired FASTQ files are assumed to be named using the suffixes _1 and _2
after their common names; e.g. sample_1.fastq, sample_2.fastq. Alternate
FASTQ suffixes for Find_FASTQ() include ".fq", ".fastq.gz", and ".fq.gz".
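The _1 / _2 naming convention can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's own code: the file names are made up, and the column names forward and reverse are hypothetical, since the documentation does not state what Find_FASTQ() calls its columns.

```r
# Hypothetical file names; no files are read from disk
fastq_files <- c("sample1_1.fastq", "sample1_2.fastq",
                 "sample2_1.fastq", "sample2_2.fastq")
suffix <- ".fastq"

# Strip the suffix, then separate the trailing _1 / _2 mate tag from the name
stem <- substr(fastq_files, 1, nchar(fastq_files) - nchar(suffix))
mate <- sub("^.*_([12])$", "\\1", stem)
sample <- sub("_[12]$", "", stem)

# One row per sample, one column per mate (column names are assumptions)
samples <- unique(sample)
is_fwd <- mate == "1"
is_rev <- mate == "2"
paired <- data.frame(
    sample = samples,
    forward = fastq_files[is_fwd][match(samples, sample[is_fwd])],
    reverse = fastq_files[is_rev][match(samples, sample[is_rev])],
    stringsAsFactors = FALSE
)
paired
```

A sample with a missing mate would show NA in the corresponding column, which makes incomplete pairs easy to spot in this sketch.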
For BAM files, the parent directory often denotes the sample name. In this
case, use level = 1 with Find_Bams() to automatically annotate the sample
names.
IRFinder outputs two files per BAM processed. These are named by the given
sample names. The text output is named "sample1.txt.gz", and the COV file
is named "sample1.cov", where sample1 is the name of the sample. These
files can be organised / tabulated using the function Find_IRFinder_Output.
The generic function Find_Samples will organise the IRFinder text output
files but exclude the COV files. Use the output of Find_Samples as the
Experiment in CollateData if one decides to collate an experiment without
linked COV files, for portability reasons.
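The difference between a table of text outputs only and a table with linked COV files can be sketched in base R. The helper list_by_suffix(), the folder name, and the column names irf_file and cov_file are all hypothetical; the actual column names returned by Find_Samples() and Find_IRFinder_Output() are not given here, so treat this as a rough picture rather than the package's behaviour.

```r
# Create empty placeholder files in a temporary folder (hypothetical names)
out_dir <- file.path(tempdir(), "irf_sketch")
dir.create(out_dir, showWarnings = FALSE)
invisible(file.create(file.path(
    out_dir,
    c("sample1.txt.gz", "sample1.cov", "sample2.txt.gz", "sample2.cov")
)))

# Hypothetical helper: recursively list files ending in `suffix`,
# naming each sample by its file name with the suffix stripped (level = 0)
list_by_suffix <- function(path, suffix, column) {
    files <- list.files(path, recursive = TRUE, full.names = TRUE)
    files <- files[endsWith(files, suffix)]
    fname <- basename(files)
    df <- data.frame(
        sample = substr(fname, 1, nchar(fname) - nchar(suffix)),
        stringsAsFactors = FALSE
    )
    df[[column]] <- files
    df
}

# Text output only: two columns, no COV files
txt <- list_by_suffix(out_dir, ".txt.gz", "irf_file")

# Text output linked to COV files: join the two tables by sample name
cov <- list_by_suffix(out_dir, ".cov", "cov_file")
linked <- merge(txt, cov, by = "sample")
```

Here txt plays the role of the portable, COV-free table, while linked corresponds to a table that also records where each COV file lives.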
A multi-column data frame with the first column containing
the sample name, and subsequent columns being the file paths with suffix
as determined by suffix.
Find_Samples: Finds all files with the given suffix pattern.
Annotates sample names based on file or parent folder names.
Find_FASTQ: Use Find_Samples() to return all FASTQ files
in a given folder
Find_Bams: Use Find_Samples() to return all BAM files in a
given folder
Find_IRFinder_Output: Use Find_Samples() to return all IRFinder output
files in a given folder, including COV files
# Retrieve all BAM files in a given folder, named by sample names
bam_path <- tempdir()
example_bams(path = bam_path)
df.bams <- Find_Samples(sample_path = bam_path, suffix = ".bam", level = 0)

# equivalent to:
df.bams <- Find_Bams(bam_path, level = 0)

# Retrieve all IRFinder output files in a given folder,
# named by sample names
expr <- Find_IRFinder_Output(file.path(tempdir(), "IRFinder_output"))

## Not run:
# Find FASTQ files in a directory, named by sample names
# where files are in the form:
# - "./sample_folder/sample1.fastq"
# - "./sample_folder/sample2.fastq"
Find_FASTQ("./sample_folder", paired = FALSE, fastq_suffix = ".fastq")

# Find paired gzipped FASTQ files in a directory, named by parent directory
# where files are in the form:
# - "./sample_folder/sample1/raw_1.fq.gz"
# - "./sample_folder/sample1/raw_2.fq.gz"
# - "./sample_folder/sample2/raw_1.fq.gz"
# - "./sample_folder/sample2/raw_2.fq.gz"
# level = 1 takes the sample name from the parent directory
Find_FASTQ("./sample_folder", paired = TRUE, fastq_suffix = ".fq.gz",
    level = 1)
## End(Not run)