View source: R/Format_sequenceData_ENA.R
Converts metadata into the right format (a tab-separated text file) for submitting data to ENA (version January 2020).
metadata |
a MIxS.metadata class object. The object to be written as a text file suitable for submission to ENA. |
dest.dir |
a character string. The file path to the directory where the output files must be written. If left blank, files are written to the working directory. |
file.name |
a character string. A name (without a file type extension) to use for the output files. |
sample_unique_name_prefix |
a character string. The unique name prefix to add to the sample names (to tie them all together). Required for ENA submissions. |
checklist_accession |
a character string. The name of a MIxS environmental package or its ENA checklist accession number. |
tax_name |
a character string. The scientific name of a taxon targeted in the data (applicable to all the samples). Same as "subspecf_gen_lin" or "scientific_name" in the MIxS.metadata input. |
ask.input |
boolean. Whether or not to ask for user input to make decisions or solve problems that arise during the reformatting (e.g. missing data,...) |
insert.size |
a character string. The size of the reads (in number of basepairs), if applicable to all samples. |
library.layout |
a character string. The layout of the library, either PAIRED or SINGLE, if applicable to all samples. |
library.strategy |
a character string. The library strategy (e.g. AMPLICON, WGS,...), if applicable to all samples. |
library.selection |
a character string. The method used to select for, enrich, or screen the material being sequenced (e.g. PCR), if applicable to all samples. |
seq.file.extension |
a character string. The extension for the sequence files. Default is .fastq.gz |
pairedEnd.extension |
a character vector of length 2. If the data is paired-end data, specify the forward (first element of the vector) and reverse (second) extension tags here. Default is c("_1", "_2") |
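To illustrate how seq.file.extension and pairedEnd.extension relate, the sketch below builds the sequence file names one might expect for paired-end samples. The helper expected_seq_files() is hypothetical and not part of the package, and the naming pattern (sample name, then tag, then extension) is an assumption rather than documented behaviour.

```r
# Hypothetical helper (not part of the package): lists the file names implied
# by the sample names, the paired-end tags and the sequence file extension.
# Assumption: names follow <sample><tag><extension>.
expected_seq_files <- function(sample_names,
                               seq.file.extension = ".fastq.gz",
                               pairedEnd.extension = c("_1", "_2")) {
  stopifnot(is.character(sample_names),
            length(pairedEnd.extension) == 2)
  data.frame(
    sample  = sample_names,
    forward = paste0(sample_names, pairedEnd.extension[1], seq.file.extension),
    reverse = paste0(sample_names, pairedEnd.extension[2], seq.file.extension),
    stringsAsFactors = FALSE
  )
}

files <- expected_seq_files(c("sample_1", "sample_2"))
print(files)
```

Comparing such a table against the files actually present on disk can help catch a mismatch in tags or extension before the metadata are reformatted.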
This function reformats metadata for submission to ENA. It is specifically made for ecological and environmental studies (e.g. amplicon sequencing, shotgun metagenomics,...), with additional QCs built in for Antarctic and Southern Ocean data. It is assumed that the dataset has already been subjected to quality control.
A *.tsv file with the sample metadata and a *_runInfo.tsv file with the technical data, both written to the destination directory.
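A quick way to inspect the two output files after a run could look like the sketch below. The helper peek_ena_outputs() is hypothetical, and the file names are an assumption derived from the description above (file.name plus ".tsv" and "_runInfo.tsv"). It reads only the first few raw lines, so it makes no assumption about header or comment rows.

```r
# Hypothetical helper (not part of the package): returns the first n lines of
# each output file, split on tabs.
# Assumption: outputs are named <file.name>.tsv and <file.name>_runInfo.tsv.
peek_ena_outputs <- function(dest.dir, file.name, n = 5) {
  paths <- file.path(dest.dir, paste0(file.name, c(".tsv", "_runInfo.tsv")))
  names(paths) <- c("samples", "runInfo")
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Missing output file(s): ", paste(absent, collapse = ", "))
  }
  # lapply keeps the names, so the result has $samples and $runInfo
  lapply(paths, function(p) strsplit(readLines(p, n = n), "\t", fixed = TRUE))
}
```

For example, peek_ena_outputs(getwd(), "testthat") would show the first lines of both files, provided they were written to the working directory under that name.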
Maxime Sweetlove
Other data archiving functions: FileNames.to.Table(), get.ENAName(), renameSequenceFiles(), sync.metadata.sequenceFiles()
## Not run: 
sampleNames <- c("sample_1", "sample_2")
test_MIxS <- new("MIxS.metadata",
                 data = data.frame(var1=c(1,2), var2=c(3,4),
                                   eventID=sampleNames,
                                   target_gene=c("16S", "18S"),
                                   subspecf_gen_lin=c("Bacteria", "Eukaryota"),
                                   seq_meth=c("Illumina MiSeq", "Illumina MiSeq"),
                                   row.names=sampleNames),
                 section = c(var1="section1", var2="section1", eventID="miscellaneous",
                             target_gene="miscellaneous", subspecf_gen_lin="miscellaneous",
                             seq_meth="miscellaneous"),
                 units = c(var1="unit1", var2="unit2", eventID="alphanumeric",
                           target_gene="alphanumeric", subspecf_gen_lin="alphanumeric",
                           seq_meth="alphanumeric"),
                 env_package = "water",
                 type = "versatile",
                 QC = TRUE)
prep.metadata.ENA(metadata=test_MIxS, dest.dir=getwd(), file.name="testthat",
                  sample_unique_name_prefix=NA, checklist_accession=NA,
                  tax_name=NA, ask.input=FALSE,
                  insert.size=NA, library.layout=NA,
                  library.strategy=NA, library.selection=NA,
                  seq.file.extension=".fastq.gz", pairedEnd.extension=c("_1", "_2"))
## End(Not run)