creer_df: Create p-values data-frame from pairwise tests of all...

Description Usage Arguments Details Value Note Author(s) See Also Examples

Description

This function performs hypothesis testing on all possible pairwise ratios or differences of a set of variables in a given data frame, and store their results in a data.frame

Usage

1
2
3
4
5
6
creer.DFp( d, noms, f.p = student.fpc,
           log = FALSE, en.log = !log,
           nom.var = 'R',
           noms.colonnes = c( "Cmp.1", "Cmp.2", "p" ),
           add.col = "delta", n.coeurs = 1,
           ... )

Arguments

d

The data frame that contains the compositional variables. Other objects will be coerced as data frames using as.data.frame

noms

A character vector containing the column names of the compositional variables to be used for ratio computations. Names absent from the data frame will be ignored with a warning.

Optionnally, an integer vector containing the column numbers can be given instead. They will be converted to column names before further processing.

f.p

An R function that will perform the hypothesis test on a single ratio (or log ratio, depending on log and en.log values).

This function should return a numeric vector, of which the first one will typically be the p-value from the test — see creer.Mp for details.

Such functions are provided for several common situations, see references at the end of this manual page.

log

If TRUE, values in the columns are assumed to be log-transformed, and consequently ratios are computed as differences of the columns. The result is in the log scale.

If FALSE, values are assumed to be raw data and ratios are computed directly.

en.log

If TRUE, the ratio will be log-transformed before applying the hypothesis test computed by f.p. Don't change the default unless you really know what you are doing.

nom.var

A length-one character vector giving the name of the variable containing a single ratio (or log-ratio). No sanity check is performed on it: if you experience strange behaviour, check you gave a valid column name, for instance using make.names.

noms.colonnes

A length-three character vector giving the names of, respectively, the two columns of the data frame that will contain the components identifiers and of the column that will contain the p-value from the test (the first value returned by f.p).

add.col

A character vector giving the names of additional columns of the data.frame, used for storing additional return values of f.p (all but the first one).

n.coeurs

The number of CPU cores to use in computation, with parallelization using forks (does not work on Windows) with the help of the parallel package.

...

additional arguments to f.p, passed unchanged to it.

Details

This function constructs a data.frame with n\times (n-1)/2 rows, where n = length( noms ) (after eventually removing names in noms that do not correspond to numeric variables). Each line of the data.frame is the result of the f.p function when applied on the ratio of variables whose names are given in the first two columns (or on its log, if either (log == TRUE) && (en.log == FALSE) or (log == FALSE) && (en.log == TRUE)).

Value

These function returns the data.frame obtained as described above.

Note

Creating a data.frame seems slightly less efficient (in terms of speed) than creating a dense matrix, so for compositionnal data with only a few components and simple stastitical analysis were only a single p-value is needed, consider using creer.Mp instead.

Author(s)

Emmanuel Curis (emmanuel.curis@parisdescartes.fr)

See Also

Predefined f.p functions: anva1.fpc for one-way analysis of variance; kw.fpc for the non-parametric equivalent (Kruskal-Wallis test).

grf.DFp to create a graphe from the obtained matrix.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
   # load the Circadian Genes Expression dataset, at day 4
   data( "BpLi_J4" )
   ng <- names( BpLi_J4 )[ -c( 1:3 ) ] # Name of the genes
   
   # analysis function (complex design)
   #   1. the formula to be used
   frm <- R ~  (1 | Patient) + Phenotype + Li + Phenotype:Li

   #   2. the function itself
   #    needs the lme4 package
   if ( TRUE == require( "lme4" ) ) {
     f.p <- function( d, variable, ... ) {
          # Fit the model
          md <- lmer( frm, data = d )

          # Get coefficients and standard errors
          cf <- fixef( md )
          se <- sqrt( diag( vcov( md ) ) )

          # Wald tests on these coefficients
          p <- 2 * pnorm( -abs( cf ) / se )

          # Sending back the 4 p-values
          p
     }
 
     # CRAN does not like 'long' computations
     # => analyse only the first 6 genes
     #  (remove for real exemple!)
     ng <- ng[ 1:6 ]

     # Create the data.frame with all results
     DF.p <- creer.DFp( d = BpLi_J4, noms = ng,
                        f.p = f.p, add.col = c( 'p.NR', 'p.Li', 'p.I' ) )

     # Make a graphe from it and plot it
     #  for the interaction term, at the p = 0.2 threshold
     plot( grf.DFp( DF.p, p = 0.20, col.p = 'p.I' ) )
  }

SARP.compo documentation built on May 16, 2021, 1:06 a.m.