dataQC.MIxS: format dataframes into a MIxS object

View source: R/DataQC_Main_MIxS.R

Description

Takes a data frame with contextual data and metadata from a sequencing dataset and performs basic quality control (see Details).

Usage

dataQC.MIxS(dataset = NA, ask.input=TRUE, add_to = NA, sample.names = NA)

Arguments

dataset

data.frame. The raw dataset downloaded from INSDC to be cleaned up. Rows are samples, columns are variables. Units can be listed in the first or second row, and are detected automatically if the row name of that row includes the word "units". Different units per sample are not allowed.

ask.input

logical. If TRUE, console input is requested from the user when a problem occurs (the process runs user-supervised). Default TRUE.

add_to

a MIxS.metadata object. An existing dataset with quality-controlled metadata. It must be formatted as MIxS.metadata to ensure the correct input format of the data.

sample.names

character. The column with sample names to use. Use row.names for row names. If NA, the function will try to find the sample names itself. Default NA.
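
To illustrate the units layout described for the dataset argument, the sketch below builds a small input table whose first row holds the units and whose row name contains the word "units". The column names and values are invented for illustration, and the grepl() check only mimics the detection rule stated above; it is not the package's internal code, and the exact matching the package performs (for example, case sensitivity) is not documented here.

```r
# Hypothetical input: the first row holds units, the other rows are samples.
# All columns are character because units and values share a column.
raw <- data.frame(
  depth = c("m", "5", "10"),
  temp  = c("degrees Celsius", "4.2", "3.9"),
  row.names = c("units", "sample1", "sample2"),
  stringsAsFactors = FALSE
)

# Mimic the stated rule: look for "units" in the names of the first two rows.
units_row <- which(grepl("units", rownames(raw)[1:2], fixed = TRUE))

units   <- unlist(raw[units_row, ])  # named character vector of units
samples <- raw[-units_row, ]         # sample rows only
```

A table shaped like raw is the kind of input the dataset argument describes; because a single units row applies to every sample, per-sample units cannot be expressed in this layout.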

Details

Any sequencing project typically has important additional data associated with it. These range from laboratory protocols and sequencing platform settings to environmental measurements. This function was developed to sort through these metadata (provided in a data frame) and perform basic quality control, correcting the most common mistakes, such as incorrectly formatted geographic coordinates, inconsistently formatted dates, and typos or variants of variable names. To do this, the function makes use of a built-in dictionary of (MIxS) terms and their synonyms (that is: spelling errors, writing differences, true synonyms, ...). Note that some terms may not be recognized. In that case, contact the author of the package so the dictionary can be updated in the upcoming version.
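
The dictionary-based approach described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the synonyms list and the helper functions to_mixs_term() and split_lat_lon() are hypothetical names, and the real built-in dictionary is larger and may be organized differently.

```r
# Hypothetical synonym dictionary: names are MIxS terms, values are variants.
synonyms <- list(
  collection_date = c("collection_date", "collectiondate", "date", "sampling_date"),
  lat_lon         = c("lat_lon", "latlon", "lat_long", "coordinates")
)

# Map one column name to its MIxS term, or NA if it is not recognized.
to_mixs_term <- function(name, dict = synonyms) {
  # Normalize: trim, replace runs of non-alphanumerics by "_", lower-case.
  key <- tolower(gsub("[^A-Za-z0-9]+", "_", trimws(name)))
  for (term in names(dict)) {
    if (key %in% dict[[term]]) return(term)
  }
  NA_character_
}

# Split a "lat lon" string in decimal degrees into two numeric values.
split_lat_lon <- function(x) {
  parts <- strsplit(trimws(x), "[ ,;]+")[[1]]
  c(lat = as.numeric(parts[1]), lon = as.numeric(parts[2]))
}

# Recognized variants map to a MIxS term; unknown names come back as NA.
vapply(c("Sampling Date", "LatLon", "mystery"), to_mixs_term, character(1))
split_lat_lon("33 -48.4")
```

Names that come back as NA correspond to the unrecognized terms mentioned above, which is where user input (ask.input = TRUE) or a dictionary update would come in.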

Value

a MIxS.metadata object that is compatible with the MIxS standard

Author(s)

Maxime Sweetlove, CC BY 4.0, 2019

See Also

get.BioProject.metadata.INSDC, get.sample.attributes.INSDC

Other standardization functions: coordinate.to.decimal(), dataQC.DwC_general(), dataQC.DwC()

Examples

## Not run: 
test_metadata <- data.frame(sample_name=c("sample1", "sample2"),
                            collection_date=c("2021-09-27", "2021-09-28"),
                            lat_lon=c("54.7 88.9", "33 -48.4"),
                            BioProject=c("PRJXXXX", "PRJXXXX"),
                            investigation_type=c("mimarks-survey", "mimarks-survey"),
                            row.names=c("sample1", "sample2"))
dataQC.MIxS(test_metadata)

## End(Not run)

biodiversity-aq/OmicsMetaData documentation built on Dec. 19, 2021, 9:44 a.m.