prep.metadata.ENA: Formatting metadata and environmental data to upload to...

View source: R/Format_sequenceData_ENA.R

Description

Converts metadata into the format required for submitting data to ENA (a tab-separated text file; ENA version of January 2020).

Usage

prep.metadata.ENA(metadata, dest.dir, file.name,
  sample_unique_name_prefix=NA, checklist_accession=NA, 
  tax_name=NA, ask.input=TRUE, insert.size=NA, library.layout=NA, 
  library.strategy=NA, library.selection=NA, 
  seq.file.extension=".fastq.gz", 
  pairedEnd.extension=c("_1", "_2"))

Arguments

metadata

a MIxS.metadata class object. The object to be written as a text file suitable for submission to ENA.

dest.dir

a character string. The file path to the directory where the output files must be written. If left blank, files are written to the working directory.

file.name

a character string. A name (without a file type extension) to use for the output files.

sample_unique_name_prefix

a character string. The unique name prefix to prepend to the sample names (to tie them all together). Required for ENA submissions.

checklist_accession

a character string. The name of a MIxS environmental package or its ENA checklist accession number.

tax_name

a character string. The scientific name of a taxon targeted in the data (applicable to all the samples). Same as "subspecf_gen_lin" or "scientific_name" in the MIxS.metadata input.

ask.input

boolean. Whether or not to ask for user input to make decisions or solve problems that arise during the reformatting (e.g. missing data).

insert.size

a character string. The size of the reads (in number of basepairs), if applicable to all samples.

library.layout

a character string. The layout of the library, either PAIRED or SINGLE, if applicable to all samples.

library.strategy

a character string. The library strategy (e.g. AMPLICON, WGS), if applicable to all samples.

library.selection

a character string. The method used to select for, enrich or screen the material being sequenced (e.g. PCR), if applicable to all samples.

seq.file.extension

a character string. The extension of the sequence files. Default is ".fastq.gz".

pairedEnd.extension

a character vector of length 2. If the data is paired-end, specify the forward (first element of the vector) and reverse (second element) extension tags here. Default is c("_1", "_2").
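
To make the roles of seq.file.extension and pairedEnd.extension more concrete, the sketch below shows one way the two values could combine with a sample name into paired-end file names. This is only an assumption for illustration: the variables sample_names, forward_files and reverse_files are hypothetical, and the naming logic actually used inside prep.metadata.ENA() is not documented here and may differ.

```r
# Hypothetical illustration: the real naming logic in prep.metadata.ENA()
# may differ from this simple concatenation.
sample_names <- c("sample_1", "sample_2")
seq.file.extension <- ".fastq.gz"
pairedEnd.extension <- c("_1", "_2")

# Assumed pattern: <sample name><paired-end tag><file extension>
forward_files <- paste0(sample_names, pairedEnd.extension[1], seq.file.extension)
reverse_files <- paste0(sample_names, pairedEnd.extension[2], seq.file.extension)

print(forward_files)
print(reverse_files)
```

Under this assumed pattern, "sample_1" would map to "sample_1_1.fastq.gz" (forward) and "sample_1_2.fastq.gz" (reverse).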

Details

This function reformats metadata for submission to ENA. It is designed for ecological and environmental studies (e.g. amplicon sequencing, shotgun metagenomics), with additional QC checks built in for Antarctic and Southern Ocean data. The dataset is assumed to have already undergone quality control.

Value

a *.tsv file with the sample metadata and a *_runInfo.tsv file with the technical data, both written to the destination directory.
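
After a run, the two output files can be inspected with base R. The sketch below assumes that the files are named file.name followed by ".tsv" and "_runInfo.tsv", as the patterns above suggest, and that they parse as plain tab-separated tables with a header row. The helper read_if_present() and the values of dest.dir and file.name are hypothetical.

```r
# Assumed inputs: the same dest.dir and file.name that were passed to
# prep.metadata.ENA(). The values here are placeholders.
dest.dir <- tempdir()
file.name <- "testthat"

# Assumed output names, following the *.tsv and *_runInfo.tsv patterns
sample_file <- file.path(dest.dir, paste0(file.name, ".tsv"))
run_file <- file.path(dest.dir, paste0(file.name, "_runInfo.tsv"))

# Hypothetical helper: read a tab-separated file, or return NULL if absent
read_if_present <- function(path) {
  if (!file.exists(path)) {
    return(NULL)
  }
  read.delim(path, stringsAsFactors = FALSE, check.names = FALSE)
}

sample_metadata <- read_if_present(sample_file)
run_info <- read_if_present(run_file)
```

Both objects are NULL when the files have not been written yet, so the snippet is safe to run before calling the function.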

Author(s)

Maxime Sweetlove

See Also

Other data archiving functions: FileNames.to.Table(), get.ENAName(), renameSequenceFiles(), sync.metadata.sequenceFiles()

Examples

## Not run: 
sampleNames <- c("sample_1", "sample_2")
test_MIxS <- new("MIxS.metadata",
                  data = data.frame(var1=c(1,2), var2=c(3,4), 
                                    eventID=sampleNames, 
                                    target_gene=c("16S", "18S"), 
                                    subspecf_gen_lin=c("Bacteria", "Eukaryota"),
                                    seq_meth=c("Illumina MiSeq", "Illumina MiSeq"),
                                    row.names=sampleNames),
                  section = c(var1="section1", var2="section1", eventID="miscellaneous",
                              target_gene="miscellaneous", subspecf_gen_lin="miscellaneous",
                              seq_meth="miscellaneous"),
                  units = c(var1="unit1", var2="unit2", eventID="alphanumeric",
                            target_gene="alphanumeric", subspecf_gen_lin="alphanumeric",
                            seq_meth="alphanumeric"),
                  env_package = "water",
                  type = "versatile",
                  QC = TRUE)
prep.metadata.ENA(metadata=test_MIxS, dest.dir=getwd(), file.name="testthat",
                  sample_unique_name_prefix=NA, checklist_accession=NA,
                  tax_name=NA, ask.input=FALSE,
                  insert.size=NA, library.layout=NA,
                  library.strategy=NA, library.selection=NA,
                  seq.file.extension=".fastq.gz", pairedEnd.extension=c("_1", "_2"))

## End(Not run)

biodiversity-aq/OmicsMetaData documentation built on Dec. 19, 2021, 9:44 a.m.