geva.merge.input: GEVA Input Processing and Merge

View source: R/input.R

Description

Functions to read, load, and concatenate the experimental comparisons from the data input. This is the initial step to proceed with any GEVA analysis.

Usage

geva.merge.input(
  ...,
  col.values = "logFC",
  col.pvals = "adj.P.Val",
  col.other = NULL
)

geva.read.tables(
  filenames = NULL,
  dirname = ".",
  col.values = "logFC",
  col.pvals = "adj.P.Val",
  col.other = NULL,
  ...,
  files.pattern = "\\.txt$",
  p.value.cutoff = 0.05,
  read.args = list()
)

Arguments

...

multiple matrix or data.frame objects. At least two are required for geva.merge.input, whereas they are optional for geva.read.tables. Any additional arguments given to geva.read.tables are passed on to its internal calls to geva.merge.input and geva.input.filter.
In addition, the following optional arguments are accepted:

  • na.val : (numeric) value between 0 and 1 used as replacement when a p-value column is not present (default is NA)

  • use.regex : (logical) whether to match the column names using regular expressions (default is FALSE)

  • verbose : (logical) whether to print the current loading and merge progress (default is TRUE)

col.values

character vector, possible name(s) to match the logFC column(s) from each table

col.pvals

character vector, possible name(s) to match the p-value column(s) from each table

col.other

character vector, name(s) to match additional columns (e.g., gene symbols). Ignored if NULL

filenames

character vector with two or more file paths

dirname

single character, base directory containing the input files. Ignored if filenames is specified

files.pattern

single character, pattern used to filter the files inside dirname. Ignored if filenames is specified

p.value.cutoff

numeric (0 to 1), initial p-value threshold. Rows in which every p-value is above this cutoff (i.e., no significant logFC in any comparison) are removed after the final merge. Ignored if NA or NULL

read.args

list of additional arguments passed to utils::read.table
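The row filter described for p.value.cutoff can be pictured with the minimal base R sketch below. It assumes the merged p-values are held in a numeric matrix with one column per comparison, and it treats missing p-values as non-significant, which is an assumption rather than documented GEVA behavior. The helper filter_by_pvalue is hypothetical and is not part of the GEVA package.

```r
# Hypothetical helper (not part of GEVA): keeps the rows that have at least
# one p-value at or below the cutoff
filter_by_pvalue <- function(pvals, cutoff = 0.05) {
  # Mirrors "Ignored if NA or NULL": no filtering in that case
  if (is.null(cutoff) || is.na(cutoff)) return(pvals)
  # NA p-values are counted as non-significant here (an assumption)
  keep <- rowSums(pvals <= cutoff, na.rm = TRUE) > 0
  pvals[keep, , drop = FALSE]
}

# Toy p-values: PROBE_2 has no value at or below 0.05 in any comparison
pvals <- matrix(c(0.01, 0.20, 0.30,
                  0.40, 0.60, 0.07,
                  0.50, 0.04, 0.90),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("PROBE_1", "PROBE_2", "PROBE_3"),
                                c("exp1", "exp2", "exp3")))

filtered <- filter_by_pvalue(pvals, cutoff = 0.05)
print(rownames(filtered))  # [1] "PROBE_1" "PROBE_3"
```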

Details

The geva.merge.input function takes multiple tables as arguments (e.g., matrix or data.frame objects), extracts the logFC columns from each table and merges them into a single GEVAInput dataset.

The columns to extract are identified by the col.values and col.pvals arguments (character vectors), which must match the names of the logFC and p-value columns, respectively, in each input table. If these column names differ among the tables, several alternative names can be given, and each table is matched against any of them.
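The matching and merging steps described above can be sketched in base R as follows. The sketch assumes that each input has row names identifying the probes, picks the first column whose name matches any of the candidate names (exactly, or through grep when use_regex is TRUE, in the spirit of the use.regex option), and aligns the tables on the row names they all share. Keeping only the shared rows is an assumption; GEVA may handle rows missing from some tables differently. The helpers find_column and merge_values are hypothetical and are not GEVA's implementation.

```r
# Hypothetical helper (not part of GEVA): returns the first column name of
# 'tbl' that matches any of the candidate names
find_column <- function(tbl, candidates, use_regex = FALSE) {
  nms <- colnames(tbl)
  for (cand in candidates) {
    hit <- if (use_regex) grep(cand, nms, value = TRUE) else nms[nms == cand]
    if (length(hit) > 0) return(hit[1])
  }
  stop("none of the candidate column names was found: ",
       paste(candidates, collapse = ", "))
}

# Hypothetical helper (not part of GEVA): extracts the matched value column
# from each table and binds the columns by the row names shared by all tables
merge_values <- function(tables, col.values, use_regex = FALSE) {
  shared <- Reduce(intersect, lapply(tables, rownames))
  vals <- do.call(cbind, lapply(tables, function(tbl) {
    tbl[shared, find_column(tbl, col.values, use_regex)]
  }))
  rownames(vals) <- shared
  vals
}

# Two tables whose logFC columns have different names and row orders
t1 <- data.frame(logFC = c(1.2, -0.5, 0.3), row.names = c("P1", "P2", "P3"))
t2 <- data.frame(log2FoldChange = c(0.8, -1.1), row.names = c("P3", "P1"))

m <- merge_values(list(exp1 = t1, exp2 = t2),
                  col.values = c("logFC", "log2FoldChange"))
print(m)  # rows P1 and P3; columns exp1 and exp2
```

Passing both "logFC" and "log2FoldChange" is what lets the two differently named columns be found, which is the role the documentation gives to multiple values in col.values.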

The function geva.read.tables reads multiple tab-delimited text files (e.g., tables exported from limma's topTable), extracts the logFC columns from each table, and merges them into a single GEVAInput dataset.
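How the input files could be located and parsed is sketched below in base R. When no file names are given, list.files selects the files in dirname that match files.pattern; each file is then read with utils::read.table as a tab-delimited table with a header, and a list of extra options overrides those defaults in the same way read.args is described. The helper read_input_tables and its defaults (header, separator, naming of the tables after the files) are assumptions for illustration; geva.read.tables may behave differently.

```r
# Hypothetical helper (not part of GEVA): lists the files in 'dirname' that
# match 'files.pattern' (unless explicit filenames are given) and reads each
read_input_tables <- function(filenames = NULL, dirname = ".",
                              files.pattern = "\\.txt$", read.args = list()) {
  if (is.null(filenames)) {
    filenames <- list.files(dirname, pattern = files.pattern, full.names = TRUE)
  }
  # Assumed defaults: tab-delimited with a header row; read.args entries override them
  base.args <- list(header = TRUE, sep = "\t", stringsAsFactors = FALSE)
  args <- utils::modifyList(base.args, read.args)
  tables <- lapply(filenames, function(f) {
    do.call(utils::read.table, c(list(file = f), args))
  })
  # Names each table after its file, without the extension
  names(tables) <- tools::file_path_sans_ext(basename(filenames))
  tables
}

# Writes two small tab-delimited tables and one unrelated file to a temporary directory
tmp <- tempfile("gevaexample")
dir.create(tmp)
for (i in 1:2) {
  df <- data.frame(logfc = rnorm(3), pvalues = runif(3),
                   row.names = paste0("PROBE_", 1:3))
  write.table(df, file.path(tmp, sprintf("dt%d.txt", i)), sep = "\t", quote = FALSE)
}
writeLines("not a table", file.path(tmp, "notes.md"))

# Only the .txt files match the default pattern
tbls <- read_input_tables(dirname = tmp)
print(names(tbls))  # [1] "dt1" "dt2"
```

Because write.table writes one fewer header field than data fields when row names are included, read.table uses the first column as row names, so the probe identifiers survive the round trip.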

Value

A GEVAInput object

Note

The inclusion of p-value columns is not technically required, but it is strongly recommended, as they improve the statistical accuracy of the summarization steps. If p-value (or adjusted p-value) columns are present, their values are converted to weights by computing 1 - pvalue for each element; otherwise, the optional na.val argument can be specified as a replacement for the absent values (default is NA). The weights are used to shift the central logFC values towards the most significant observations and to penalize potential statistical inaccuracies.
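The conversion from p-values to weights stated in this note is a one-liner in base R. The sketch below also assumes that, when no p-values are available, na.val takes the place of the missing p-values before the 1 - pvalue step; the documentation does not say exactly where the substitution happens. The helper pvalues_to_weights is hypothetical, and the weighted mean is a generic illustration of how weights pull a central value towards significant observations, not necessarily the summarization GEVA applies.

```r
# Hypothetical helper (not part of GEVA): converts p-values to weights (1 - p).
# When no p-values are given, 'na.val' stands in for them (an assumption)
pvalues_to_weights <- function(pvals, n, na.val = NA) {
  if (is.null(pvals)) pvals <- rep(na.val, n)
  1 - pvals
}

logfc <- c(2.0, 1.0, -0.5)
pvals <- c(0.01, 0.04, 0.90)
w <- pvalues_to_weights(pvals)  # 0.99 0.96 0.10

# Generic weighted mean: the non-significant -0.5 contributes little
weighted.mean(logfc, w)  # about 1.41, versus mean(logfc) of about 0.83

# No p-value column: every element receives the weight 1 - na.val
pvalues_to_weights(NULL, n = 3, na.val = 0.5)  # 0.5 0.5 0.5
```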

Examples

### EXAMPLE 1
## geva.merge.input example with three randomly generated tables
## (For demonstration purposes only)

# Number of rows
n <- 10000

# Random row (probe) names
probnms <- sprintf("PROBE_%s", 1:n)

# Random gene names (optional)
# (cycles through the letters A to Z so that every probe receives a suffix)
genenms <- paste0(sprintf("GENE_%s", 1:n), LETTERS[(1:n - 1) %% length(LETTERS) + 1])

# Random table 1
dt1 <- data.frame(row.names=probnms,
                  logfc=(rnorm(n, 0, sd=2) * rnorm(n, 0, sd=0.5)),
                  pvalues = runif(n, max=0.08),
                  genesymbol = genenms)
# Random table 2
dt2 <- data.frame(row.names=probnms,
                  logfc=(rnorm(n, 0, sd=2) * rnorm(n, 0, sd=0.5)),
                  pvalues = runif(n, max=0.08),
                  genesymbol = genenms)
# Random table 3
dt3 <- data.frame(row.names=probnms,
                  logfc=(rnorm(n, 0, sd=2) * rnorm(n, 0, sd=0.5)),
                  pvalues = runif(n, max=0.08),
                  genesymbol = genenms)

# Merges the three tables
ginput <- geva.merge.input(exp1=dt1, exp2=dt2, exp3=dt3,
                           col.values="logfc",
                           col.pvals="pvalues",
                           col.other="genesymbol")

# Prints the first rows from the merged table
print(head(ginput))               # values
print(head(inputweights(ginput))) # weights

# ---
## Not run: 

### EXAMPLE 2
## geva.read.tables example with three tab-delimited files

# Table file examples. Each one has 3 columns: "logfc", "pvalues", and "genesymbol"
# Replace them with your own tab-delimited files (e.g., exported from limma's topTable)
fnames <- c("dt1.txt", "dt2.txt", "dt3.txt")

ginput <- geva.read.tables(fnames,
                           col.values="logfc",
                           col.pvals="pvalues",
                           col.other="genesymbol")

# Prints the first rows from the merged table
print(head(ginput))               # values
print(head(inputweights(ginput))) # weights


# ---

### EXAMPLE 3
## geva.read.tables example with tab-delimited files in a directory

# Directory name (replace it with a directory containing the table files)
dirnm <- "C:/User/robertplant123/Documents/R/gevaexamples"

# In this example, table files contain 3 columns: "logfc", "pvalues", and "genesymbol"
# Reads all txt files in the directory
ginput <- geva.read.tables(dirname=dirnm,
                           col.values="logfc",
                           col.pvals="pvalues",
                           col.other="genesymbol")

# (Optional step)
# Let's assume that all table file names start with "dt" and end with the ".txt" extension,
# such as dt1.txt, dt2.txt and so on...
fname_pattern <- "^dt.+?\\.txt$"  # Defines a RegEx pattern to find the files
# Loads only files that match the file name pattern
ginput <- geva.read.tables(dirname=dirnm,
                           files.pattern=fname_pattern,
                           col.values="logfc",
                           col.pvals="pvalues",
                           col.other="genesymbol")

# Prints the first rows from the merged table
print(head(ginput))               # values
print(head(inputweights(ginput))) # weights

## End(Not run)

sbcblab/geva documentation built on March 15, 2021, 10:08 p.m.