removeFalsePositives: removeFalsePositives is a function to exclude from a data...
In Metab: Metab: An R Package for a High-Throughput Analysis of Metabolomics Data Generated by GC-MS.


removeFalsePositives is used to exclude compounds considered false positives. A compound is considered a false positive when it is detected in only a few samples from a specific experimental condition.

removeFalsePositives(
	inputData, 
	truePercentage = 50, 
	Name_medium_condition = "none", 
	truePercentageMedium = 50, 
	save = TRUE, 
	folder, 
	output = "NoFalse"
)

`inputData`	When inputData is missing, a dialog box will pop up allowing the user to point and click to the .csv file from which the data is to be read. It may also receive a character string pointing to a .csv file containing a data frame such as data(exampleMetReport), generated by `MetReport`. Alternatively, inputData takes an R data frame containing the desired data.
`truePercentage`	A number between 0 and 100 indicating the percentage of samples, per experimental condition, in which each metabolite must be present to be considered a true compound (see Details).
`Name_medium_condition`	A character string indicating the name of the experimental condition containing samples from the uncultured medium (see Details).
`truePercentageMedium`	truePercentageMedium works in the same way as truePercentage. However, it refers specifically to samples belonging to the experimental condition defined in Name_medium_condition (See details).
`save`	A logical value (TRUE or FALSE) indicating whether the resulting data frame should be saved in a .csv file. If save = TRUE, the .csv file will be saved in the path defined in the argument folder.
`folder`	A character string pointing to the folder where the results will be saved.
`output`	A character string indicating the name of the .csv file to be generated.

The inputData argument takes the path to the input file or an R data frame containing the input data. See data(exampleMetReport) for what the input file should look like. The first row of the input data defines the experimental condition associated with each sample: it contains the word Replicates in the first column and the names of the experimental conditions in the following columns, one per sample.

The argument truePercentage takes a number between 0 and 100. It works as a cut-off: the minimum percentage of samples from an experimental condition in which a compound must be present in order to be considered a true compound. For example, in an experiment performed with 6 replicates and truePercentage = 50, compounds detected in fewer than 3 replicates will have their intensities replaced by NA.

Samples from the uncultured medium may have a different number of replicates, generally fewer. In this case, the user may want a different cut-off applied to samples from the uncultured medium. The argument Name_medium_condition is used to identify those samples in the input data. It takes the same character string used in the input data, in the row Replicates, to define the experimental condition associated with the uncultured medium. truePercentageMedium works in the same way as truePercentage; however, it applies only to samples whose condition name matches Name_medium_condition. This feature is used when analyzing extracellular metabolites (footprinting).

As a result, removeFalsePositives produces a data frame in which a metabolite's intensities are kept only for the experimental conditions where it is present in at least the proportion of replicates defined by the user. When save = TRUE, this data frame is saved in the folder defined in the argument folder. The CSV file generated is named according to the character string defined in the argument output (e.g. NoFalse). The extension .csv will be added automatically. There is no limit on the number of experimental conditions under analysis.
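The cut-off rule described above can be illustrated with a minimal base R sketch. This is not the Metab implementation: the helper filter_by_presence, its argument names, and the toy data are hypothetical, and the sketch assumes that missing intensities are stored as NA and that a compound is kept when it reaches the cut-off exactly.

```r
# Toy input in the layout described above: the first row holds the word
# "Replicates" followed by the condition of each sample (hypothetical data).
toy <- data.frame(
  Name = c("Replicates", "CompoundA", "CompoundB"),
  S1 = c("Cond1", "10", NA),
  S2 = c("Cond1", "12", "5"),
  S3 = c("Cond1", NA, NA),
  S4 = c("Cond1", "11", NA),
  S5 = c("Medium", "7", "3"),
  S6 = c("Medium", "8", "4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper illustrating the rule; not part of Metab.
filter_by_presence <- function(df, cutoff = 50,
                               medium = "none", medium_cutoff = 50) {
  conditions <- as.character(unlist(df[1, -1]))  # condition of each sample
  values <- as.matrix(df[-1, -1])                # one row per compound
  for (cond in unique(conditions)) {
    cols <- which(conditions == cond)
    # Use the medium-specific cut-off for the uncultured medium only.
    threshold <- if (cond == medium) medium_cutoff else cutoff
    present <- rowSums(!is.na(values[, cols, drop = FALSE]))
    # Below the cut-off when present / n * 100 < threshold.
    too_few <- present * 100 < threshold * length(cols)
    values[too_few, cols] <- NA
  }
  out <- df
  for (j in seq_len(ncol(values))) {
    out[[j + 1]][-1] <- values[, j]  # keep the Replicates row untouched
  }
  out
}

filtered <- filter_by_presence(toy, cutoff = 50,
                               medium = "Medium", medium_cutoff = 100)
print(filtered)
```

In this toy data CompoundB is detected in 1 of 4 Cond1 samples, which is below the 50 per cent cut-off, so its Cond1 intensities become NA, while its Medium intensities are kept because it is present in every Medium sample.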

removeFalsePositives processes the input data and produces a data frame containing only compounds present in a defined proportion of samples from each experimental condition.

Note that the first line of the resulting data.frame is used to represent sample meta-data (for example replicates).
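Because the first row carries the condition labels, downstream code usually needs to separate it from the intensities. The following is a sketch under assumptions: the toy data frame stands in for the returned object, and its columns are assumed to be stored as character strings; the variable names are hypothetical.

```r
# Hypothetical stand-in for a data frame returned by removeFalsePositives.
result <- data.frame(
  Name = c("Replicates", "CompoundA", "CompoundB"),
  S1 = c("Cond1", "10", NA),
  S2 = c("Cond1", "12", "5"),
  S3 = c("Cond2", "7", "3"),
  stringsAsFactors = FALSE
)

# Sample meta-data: the condition associated with each sample column.
conditions <- unlist(result[1, -1])

# Intensities as a numeric matrix, one row per compound.
intensities <- sapply(result[-1, -1], as.numeric)
rownames(intensities) <- result$Name[-1]

print(conditions)
print(intensities)
```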

Raphael Aggio <ragg005@aucklanduni.ac.nz>

Aggio, R., Villas-Boas, S. G., & Ruggiero, K. (2011). Metab: an R package for high-throughput analysis of metabolomics data generated by GC-MS. Bioinformatics, 27(16), 2316-2318. doi: 10.1093/bioinformatics/btr379

htest, MetReport, MetReportNames, normalizeByBiomass, normalizeByInternalStandard, buildLib

### Load the package and the example input data ###
library(Metab)
data(exampleMetReport)
### Remove false positives ###
filteredData <- removeFalsePositives(exampleMetReport, truePercentage = 40, save = FALSE)
# The abundances of compound Zylene3 will be replaced by NA in samples from
# experimental condition 50ul, as it is present in fewer than 40 per cent of
# the samples from this experimental condition.
### Show results ###
print(filteredData)