fichiers: Create and read a file of p-values for all pairwise tests of...


Description

These functions perform hypothesis tests on all possible pairwise ratios or differences of a set of variables in a given data frame, and store the results in, or read them from, a file.

Usage

creer.Fp( d, nom.fichier,
          noms, f.p = student.fpc,
          log = FALSE, en.log = !log,
          nom.var = 'R',
          noms.colonnes = c( "Cmp.1", "Cmp.2", "p" ),
          add.col = "delta",
          sep = ";", dec = ".", row.names = FALSE, col.names = TRUE,
          ... )

grf.Fp( nom.fichier, col.noms = c( 1, 2 ), p = 0.05, col.p = 'p',
        reference = NULL, groupes = NULL,
        sep = ";", dec = ".", header = TRUE,
        ... )

Arguments

d

The data frame that contains the compositional variables. Other objects will be coerced to a data frame using as.data.frame.

nom.fichier

A length-one character vector giving the name of the file.

noms

A character vector containing the column names of the compositional variables to be used for ratio computations. Names absent from the data frame will be ignored with a warning.

Optionally, an integer vector containing the column numbers can be given instead. They will be converted to column names before further processing.

f.p

An R function that will perform the hypothesis test on a single ratio (or log ratio, depending on log and en.log values).

This function should return a numeric vector whose first element is typically the p-value from the test; see creer.Mp for details.

Such functions are provided for several common situations, see links at the end of this manual page.
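
As an illustration, a user-written f.p could look like the sketch below. It assumes that f.p is called with a data frame d and the name of the (log-)ratio column in variable, and that v.X reaches it through ...; this calling convention and the name wilcoxon.fp are assumptions, to be checked against creer.Mp before use. The first returned value is the p-value; the second would be stored in the column named by add.col.

```r
# Hypothetical f.p-style function (not part of SARP.compo):
# Wilcoxon rank-sum test of the (log-)ratio between two groups.
wilcoxon.fp <- function( d, variable, v.X, ... ) {
    # Split the ratio column according to the grouping column
    groups <- split( d[[ variable ]], d[[ v.X ]], drop = TRUE )
    stopifnot( length( groups ) == 2 )

    test <- wilcox.test( groups[[ 1 ]], groups[[ 2 ]], ... )

    # First element: p-value; second element: difference of medians
    c( p = test$p.value,
       delta = median( groups[[ 2 ]] ) - median( groups[[ 1 ]] ) )
}

# Toy data: one log-ratio column 'R' and two sites
set.seed( 1 )
toy <- data.frame( R = c( rnorm( 10, 0 ), rnorm( 10, 3 ) ),
                   Site = rep( c( "A", "B" ), each = 10 ) )
res <- wilcoxon.fp( toy, variable = "R", v.X = "Site" )
res
```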

log

If TRUE, values in the columns are assumed to be log-transformed, and consequently ratios are computed as differences of the columns. The result is in the log scale.

If FALSE, values are assumed to be raw data and ratios are computed directly.

en.log

If TRUE, the ratio will be log-transformed before applying the hypothesis test computed by f.p. Don't change the default unless you really know what you are doing.
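
The relation between log and en.log can be checked with base R alone. The sketch below uses made-up values (the data frame raw is hypothetical) and shows that a log-transformed ratio of raw data equals a difference of log-transformed columns, which is why en.log defaults to !log.

```r
# Hypothetical raw measurements of two components
raw <- data.frame( Al = c( 12.4, 15.1, 13.8 ),
                   Fe = c(  6.2,  5.9,  7.4 ) )

# Situation log = FALSE, en.log = TRUE:
# ratio of raw values, then log-transformed before testing
from.raw <- log( raw$Al / raw$Fe )

# Situation log = TRUE, en.log = FALSE:
# columns already in log scale, the ratio becomes a difference
logged <- log( raw )
from.log <- logged$Al - logged$Fe

# Both routes give the same log-ratio (up to rounding error)
all.equal( from.raw, from.log )
```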

nom.var

A length-one character vector giving the name of the variable containing a single ratio (or log-ratio). No sanity check is performed on it: if you experience strange behaviour, check you gave a valid column name, for instance using make.names.
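
For instance, make.names shows what a syntactically valid column name looks like for a few candidate values of nom.var (the candidates are arbitrary examples):

```r
candidates <- c( "R", "log ratio", "2nd.ratio" )

# Spaces become dots and a leading digit gets an "X" prefix;
# names that are already valid are returned unchanged
valid <- make.names( candidates )
data.frame( candidates, valid )
```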

noms.colonnes

A length-three character vector giving the names of, respectively, the two columns of the data frame that will contain the component identifiers and the column that will contain the p-value from the test (the first value returned by f.p).

add.col

A character vector giving the names of additional columns of the data.frame, used for storing additional return values of f.p (all but the first one).

sep, dec, row.names, col.names, header

Options for controlling the file format, used by write.table and read.table.
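
The same sep and dec must be used when writing and when reading. The following base-R sketch, on a made-up results table that only mimics the default column names, writes and reads back a file with the default values of these options:

```r
# Hypothetical results table using the default column names
res <- data.frame( Cmp.1 = c( "Al", "Al", "Na" ),
                   Cmp.2 = c( "Na", "Fe", "Fe" ),
                   p     = c( 0.012, 0.47, 0.003 ) )

tmp <- tempfile( fileext = ".csv" )
write.table( res, tmp, sep = ";", dec = ".",
             row.names = FALSE, col.names = TRUE )

# Same separator and decimal mark: p is read back as numeric
back <- read.table( tmp, sep = ";", dec = ".", header = TRUE )
str( back )

# A mismatched decimal mark raises no error,
# but the p column is then not read as numbers
wrong <- read.table( tmp, sep = ";", dec = ",", header = TRUE )
is.numeric( wrong$p )
```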

col.noms

A length-two vector giving the two columns that contain the two components of the ratio. Can be given either as column number or column name.

col.p

A length-one vector giving the column that contains the p-value of the ratio. Can be given either as column number or column name.

p

The p-value cut-off to be used when creating the graph, see grf.Mp for details.

reference

A character vector giving the names of nodes that should be displayed with a different color in the created graph. These names should match component names present in the file. A typical use is for reference genes in qRT-PCR experiments. By default, all nodes are displayed in palegreen; reference nodes, if any, are displayed in orange.

groupes
...

Additional arguments, passed unchanged to f.p.

Details

These functions are essentially the same as the functions that create data frames (creer.DFp) and use data frames to create a graph (grf.DFp), except that they work on text files. This makes it possible to handle compositional data with thousands of components, such as RNA-Seq or microarray data.

Seeing the results as a matrix, computations are done in rows and the file is updated after each row. Only the upper-triangular part, without the diagonal, is stored in the file.
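
The storage scheme can be pictured with the following sketch. It is not the code of creer.Fp: the helper write.pairs.by.row and the placeholder test function are hypothetical, and only illustrate how the strict upper triangle can be appended to a file one row at a time, so that the full matrix never has to be held in memory.

```r
# Hypothetical helper: f( a, b ) must return one p-value for the pair a, b
write.pairs.by.row <- function( noms, f, fichier, sep = ";" ) {
    # Header line, written once
    writeLines( paste( c( "Cmp.1", "Cmp.2", "p" ), collapse = sep ), fichier )

    n <- length( noms )
    if ( n < 2 ) return( invisible( NULL ) )

    for ( i in seq_len( n - 1 ) ) {
        j <- ( i + 1 ):n    # columns strictly above the diagonal
        ligne <- data.frame( Cmp.1 = noms[ i ], Cmp.2 = noms[ j ],
                             p = vapply( j, function( k ) f( noms[ i ], noms[ k ] ),
                                         numeric( 1 ) ) )
        # The file is updated as soon as row i is complete
        write.table( ligne, fichier, append = TRUE, sep = sep, dec = ".",
                     row.names = FALSE, col.names = FALSE, quote = FALSE )
    }
    invisible( NULL )
}

tmp <- tempfile( fileext = ".csv" )
write.pairs.by.row( c( "Al", "Na", "Fe", "Ca" ),
                    f = function( a, b ) 0.5,   # placeholder instead of a real test
                    fichier = tmp )
res <- read.table( tmp, sep = ";", header = TRUE )
nrow( res )    # one line per unordered pair of components
```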

The function that creates the graph from a file is not very efficient and can take a long time for huge matrices. A better alternative is to first filter the file with shell tools such as gawk or perl, or with dedicated C software, then load the resulting file as a data.frame and convert it into a graph; note that this may lose some isolated nodes.
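
When no external tool is at hand, a similar pre-filter can be sketched in R by reading the file in chunks. The helper filter.file below is hypothetical and assumes a file written with dec = "."; which side of the cut-off should be kept is also an assumption here and should be checked against grf.Mp. Components whose lines are all discarded no longer appear in the result, which is how isolated nodes get lost.

```r
# Hypothetical helper: keep only the lines whose p-value satisfies keep()
filter.file <- function( fichier, keep, col.p = "p", sep = ";", chunk = 10000 ) {
    con <- file( fichier, open = "r" )
    on.exit( close( con ) )

    # Column names from the header line (quotes, if any, are removed by scan)
    header <- scan( text = readLines( con, n = 1 ), what = character(),
                    sep = sep, quiet = TRUE )

    kept <- list()
    repeat {
        lignes <- readLines( con, n = chunk )
        if ( length( lignes ) == 0 ) break
        # Read everything as text so that component names are never converted
        bloc <- read.table( text = lignes, sep = sep, header = FALSE,
                            col.names = header, colClasses = "character" )
        bloc[[ col.p ]] <- as.numeric( bloc[[ col.p ]] )
        kept[[ length( kept ) + 1 ]] <- bloc[ keep( bloc[[ col.p ]] ), , drop = FALSE ]
    }
    do.call( rbind, kept )   # columns other than col.p stay as character
}

# Toy file with three pairs, filtered two lines at a time
toy <- data.frame( Cmp.1 = c( "Al", "Al", "Na" ),
                   Cmp.2 = c( "Na", "Fe", "Fe" ),
                   p     = c( 0.012, 0.47, 0.003 ) )
tmp <- tempfile( fileext = ".csv" )
write.table( toy, tmp, sep = ";", dec = ".", row.names = FALSE )

kept <- filter.file( tmp, keep = function( p ) p > 0.05, chunk = 2 )
kept
```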

Value

creer.Fp does not return anything. grf.Fp returns the resulting graph.

Note

Creating a file and working from it is quite slow, so for compositional data with only a few components, consider using creer.DFp, which creates the data.frame directly in memory, and grf.DFp, which creates the graph from a data.frame.

Author(s)

Emmanuel Curis (emmanuel.curis@parisdescartes.fr)

See Also

Predefined f.p functions: anva1.fpc for one-way analysis of variance; kw.fpc for the non-parametric equivalent (Kruskal-Wallis test).

For directly creating and manipulating matrices, creer.Mp and grf.Mp.

Examples

   # Load the pottery data set
   data( poteries )

   # Create the file name in R temporary directory
   nom.fichier <- paste0( tempdir(), "/fichier_test.csv" )
   nom.fichier

   # Compute one-way ANOVA p-values for all ratios in this data set
   #  and store them in a text file
   creer.Fp( poteries, nom.fichier,
             c( 'Al', 'Na', 'Fe', 'Ca', 'Mg' ),
             f.p = anva1.fpc, v.X = 'Site',
             add.col = c( 'mu0', 'd.C', 'd.CoA', 'd.IT', 'd.L' ) )

   # Make a graph from it and plot it
   plot( grf.Fp( nom.fichier ) )

   # The file is a simple text-file that can be read as a data.frame
   DFp <- read.table( nom.fichier, header = TRUE, sep = ";", dec = "." )
   DFp  

SARP.compo documentation built on May 16, 2021, 1:06 a.m.