sync.metadata.sequenceFiles: Check if all samples in a dataframe have sequence data

View source: R/Format_sequenceData_ENA.R

Description

Checks whether all records in the metadata match file names in a given directory.

Usage

sync.metadata.sequenceFiles(Names, file.dir=NULL, 
  paired=TRUE, seq.file.extension=".fastq.gz", 
  pairedEnd.extension=c("_1", "_2"))

Arguments

Names

a character vector or a MIxS.metadata class object. Either a character vector with the sample names (without a file extension), or a MIxS.metadata object whose sample names (original_name) should be compared to the sequence file names

file.dir

a character string. The path to the directory where the sequence files are stored

paired

boolean. Whether the sequence files are paired-end (forward _1, reverse _2) or single-end

seq.file.extension

a character string. The file-extension of the sequence files

pairedEnd.extension

a character vector of length 2. If the data are paired-end, specify the forward (first element of the vector) and reverse (second element) extension tags here. Default is c("_1", "_2")
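
To make the interplay of paired, seq.file.extension and pairedEnd.extension concrete, here is a minimal sketch of how expected file names could be derived from sample names. The helper expected_seq_files() is hypothetical and not part of OmicsMetaData, and the order "sample name, tag, file extension" is an assumption based on the defaults above.

```r
# Hypothetical helper for illustration only (not part of the package).
# Assumes file names are built as <sample name><tag><file extension>.
expected_seq_files <- function(sample_names, paired = TRUE,
                               seq.file.extension = ".fastq.gz",
                               pairedEnd.extension = c("_1", "_2")) {
  if (paired) {
    # one forward and one reverse file per sample
    paste0(rep(sample_names, each = length(pairedEnd.extension)),
           pairedEnd.extension, seq.file.extension)
  } else {
    # single-end: one file per sample, no tag
    paste0(sample_names, seq.file.extension)
  }
}

expected_seq_files("seq_sample1")
# "seq_sample1_1.fastq.gz" "seq_sample1_2.fastq.gz"
```

With paired=FALSE the same call would yield only "seq_sample1.fastq.gz".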

Details

Nucleotide sequence datasets typically contain a large number of samples, too many to check manually. This function checks whether all the names that are expected among the sequence data files are actually found. Reasons for mismatches may include typos, accidentally deleted files or records, and errors in naming or copying files.

Value

a list of two elements holding the non-matching names. redundant_metadata contains the names found in the metadata but not among the sequence files; redundant_seqFiles contains the sequence file names that were not listed in the metadata.
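
The check and the shape of its result can be approximated in base R as follows. This is a rough sketch under stated assumptions, not the package's actual implementation: sketch_sync_check() and the demo file names are hypothetical, and it is assumed here that redundant_metadata reports sample names while redundant_seqFiles reports full file names.

```r
# Hypothetical re-implementation for illustration only.
sketch_sync_check <- function(sample_names, file.dir, paired = TRUE,
                              seq.file.extension = ".fastq.gz",
                              pairedEnd.extension = c("_1", "_2")) {
  tags <- if (paired) pairedEnd.extension else ""
  # sample that each expected file belongs to
  owner <- rep(sample_names, each = length(tags))
  expected <- paste0(owner, tags, seq.file.extension)

  # files in the directory that carry the requested extension
  found <- list.files(file.dir)
  found <- found[endsWith(found, seq.file.extension)]

  list(
    # samples with at least one expected file missing
    redundant_metadata = unique(owner[!(expected %in% found)]),
    # files that no sample accounts for
    redundant_seqFiles = setdiff(found, expected)
  )
}

# Demo on empty placeholder files in a temporary directory
demo_dir <- file.path(tempdir(), "sync_sketch_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir,
  c("s1_1.fastq.gz", "s1_2.fastq.gz", "s2_1.fastq.gz", "extra_1.fastq.gz"))))

res <- sketch_sync_check(c("s1", "s2", "s3"), demo_dir)
res$redundant_metadata  # "s2" "s3"
res$redundant_seqFiles  # "extra_1.fastq.gz"
```

Both elements being empty would indicate that metadata and sequence files are in sync.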

Author(s)

Maxime Sweetlove

See Also

Other data archiving functions: FileNames.to.Table(), get.ENAName(), prep.metadata.ENA(), renameSequenceFiles()

Examples

sync.metadata.sequenceFiles(Names="seq_sample1", file.dir="user/path/to/sequenceFilesFolder",
                            paired=TRUE, seq.file.extension=".fastq.gz",
                            pairedEnd.extension=c("_1", "_2"))

biodiversity-aq/OmicsMetaData documentation built on Dec. 19, 2021, 9:44 a.m.