A wrapper for DEMI analysis

Description

The function demi is a wrapper for the whole DEMI analysis. It first creates a DEMIExperiment object, then uses it to create a DEMIClust object that contains the list of clustered probes, and then performs differential expression analysis by running the function DEMIDiff, which creates a DEMIDiff object containing the results of the differential expression analysis. If the parameter pathway is set to TRUE, it also performs gene ontology analysis on the results in the DEMIDiff object to determine statistically significant gene ontology categories; these are written to a file whose name contains the string 'pathway'. It returns a list containing the DEMIExperiment object, to which the results have been attached, and a data.frame that contains the functional annotation analysis results. NB! The results are written to the working directory.

Usage

demi(analysis = "transcript", celpath = character(),
  experiment = character(), organism = character(), maxtargets = 0,
  maxprobes = character(), pmsize = 25, sectionsize = character(),
  group = character(), norm.method = norm.rrank, filetag = character(),
  cluster = list(), clust.method = function() { }, cutoff.pvalue = 0.05,
  pathway = logical())

Arguments

analysis

A character. Defines the analysis type. It can be either 'transcript', 'gene', 'exon' or 'genome'. The default value is 'transcript'. For 'genome' analysis sectionsize parameter needs to be defined as well.

celpath

A character. Either the path to the directory containing the CEL files or a vector of paths pointing directly to the CEL files.

experiment

A character. A custom name of the experiment defined by the user (e.g. 'myexperiment').

organism

A character. The name of the species the microarrays are measuring (e.g. 'homo_sapiens' or 'mus_musculus'), given in lowercase with words separated by underscores.

maxtargets

A numeric. The maximum number of targets (e.g. genes or transcripts) one probe is allowed to match. Setting it to 1 means that a probe can match only one gene. If the analysis is set to 'transcript', the program still counts matches at the gene level. Hence a probe matching two transcripts on the same gene would be included, but a probe matching two transcripts on different genes would not. The value needs to be a positive integer or 0. By default maxtargets is set to 0.

maxprobes

A character. Sets the number of unique probes a target is allowed to match. All targets that align to more distinct probes than set by maxprobes will be scaled down to the number defined by the maxprobes parameter. It can be either a positive integer or set as 'median' or 'max' - 'median' meaning the median number of probes matching to all targets and 'max' meaning the maximum number of probes matching to a target. By default maxprobes is not set, which is the same as setting maxprobes to 'max'.

pmsize

A numeric. The minimum number of consecutive nucleotides that need to match perfectly against the target sequence. It can be either 23, 24 or 25. This means that alignments with smaller perfect match size will not be included in the experiment set up. The default value is 25.

sectionsize

A numeric. This is only used if the analysis parameter is set to 'genome'. It defines the length of the genomic target region used in the 'genome' analysis.

group

A character. Defines the groups that are used for clustering (e.g. 'group = c("test", "control")'). The grep function is used to locate the group names in the CEL file names and to build index vectors determining which files belong to which groups.
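
The matching step can be pictured with a minimal base R sketch. It assumes, as described for this parameter, that each group name is used as a grep pattern against the CEL file names. The file names and the helper group_index are hypothetical and are not part of the demi package.

```r
# Hypothetical CEL file names; in a real run these come from celpath.
celfiles <- c("UHR01_GSM247694.CEL", "UHR02_GSM247695.CEL",
              "BRAIN01_GSM247696.CEL", "BRAIN02_GSM247697.CEL")

# Illustrative helper (not a demi function): for each group name,
# return the indices of the file names that match it.
group_index <- function(group, files) {
  idx <- lapply(group, function(g) grep(g, files))
  names(idx) <- group
  idx
}

idx <- group_index(c("BRAIN", "UHR"), celfiles)
idx$BRAIN  # 3 4
idx$UHR    # 1 2

# Because group names act as patterns, "|" can list files explicitly.
group_index("UHR01|BRAIN02", celfiles)[[1]]  # 1 4
```

A file name that matches none of the patterns would end up in no group, which is why every CEL file needs to contain at least one of the group names.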

norm.method

A function. Defines a function used to normalize the raw expression values. The default normalization function is norm.rrank.

filetag

A character. A custom string that can be used to identify the experiment. It is incorporated into the names of the output files.

cluster

A list. Holds the probes of different clusters in a list.

clust.method

A function. Defines the function used for clustering. The user can build a custom clustering function. The input of the custom function needs to be a DEMIClust object and the output a list of probes, where each element of the list corresponds to a specific cluster. The default function is demi.wilcox.test that implements the wilcox.test function. However, we recommend using the function demi.wilcox.test.fast, which uses a custom wilcox.test and runs a lot faster.

cutoff.pvalue

A numeric. Sets the cut-off p-value used for determining statistical significance of the probes when clustering the probes into clusters.

pathway

A logical. If set to TRUE the functional annotation analysis is done on top of differential expression analysis.

Details

Instead of automatically clustered probes, the DEMIClust object can use user-defined lists of probes for the later calculation of differential expression. This is done by setting the cluster parameter. It overrides the default behaviour, and no actual clustering occurs. Instead, the lists of probes defined in the cluster parameter are treated as already clustered probes. The list needs to contain proper names for the probe vectors so that they are recognizable later. Alternatively, instead of using the default clustering method, users can write their own function for clustering probes based on the expression values.
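
A user-defined cluster list could be prepared and checked before it is passed on, along the following lines. This is a sketch only: the probe IDs and the vector available_probes are made up, and check_clusters is a hypothetical helper rather than part of demi.

```r
# Hypothetical probe IDs that have an alignment in the analysis.
available_probes <- 1:25

# Named list of probe ID vectors; the names identify the clusters later.
clusters <- list(cluster1 = 1:10, cluster2 = 11:20, cluster3 = 21:30)

# Illustrative helper (not a demi function): report, per cluster,
# the probe IDs that are missing from the analysis.
check_clusters <- function(clusters, available) {
  stopifnot(is.list(clusters), !is.null(names(clusters)),
            all(nzchar(names(clusters))))
  lapply(clusters, setdiff, y = available)
}

unavailable <- check_clusters(clusters, available_probes)
unavailable$cluster3  # 26 27 28 29 30

# Drop the unavailable probes so every cluster only holds usable IDs.
clusters <- lapply(clusters, intersect, y = available_probes)
```

Checking the list up front in this way avoids the error that is raised when a cluster contains probes without an alignment in the analysis.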

Further specification of the parameters:

  • maxtargets When analysis is set to 'gene', all probes that match more genes than allowed by the maxtargets parameter are excluded from the analysis. For 'transcript' and 'exon' analysis the number is also calculated at the gene level. For example, if maxtargets is set to 1 and a probe matches two transcripts on the same gene, the probe is still used in the analysis. However, if the probe matches two transcripts on different genes, it is excluded. For 'genome' analysis a probe in most cases matches two genomic sections because adjacent sections overlap by 50%; such a probe will still be used in the analysis.

  • norm.method Users can apply their own normalization method by writing a custom normalization function. The function should take the raw expression matrix as input and return the normalized expression matrix, with probe IDs kept as row names and CEL file names as column names. The normalized expression matrix is then stored as part of the DEMIExperiment object.

  • sectionsize The sectionsize parameter defines the length of the genomic target region. Currently sectionsize can be set to 100000, 500000 or 1000000. All adjacent sections, except the ones at chromosome ends, overlap with the next adjacent section by 50% of the genomic section. This parameter is required when analysis is set to 'genome'.

  • group All the CEL files used in the analysis need to contain at least one of the names specified in the group parameter, because these names determine which groups are compared against each other. It is also good practice to name the CEL files so that they include their common features. However, if the group/feature name occurs in all file names, the user can define groups by specific file names, separating the names within one group with the "|" symbol. For example group = c( "FILENAME1|FILENAME2|FILENAME3", "FILENAME4|FILENAME5|FILENAME6" ). These two groups are then used for clustering the probe expression values.

  • clust.method Users can write their own function for clustering probes according to their expression values. The custom function should take a DEMIClust object as its only parameter and return a list. The returned list should contain the names of the clusters and the corresponding probe IDs. For example return( list( cluster1 = c(1:10), cluster2 = c(11:20), cluster3 = c(21:30) ) ).

  • cluster This parameter allows differential expression to be calculated on user-defined clusters of probe IDs. It needs to be a list of probe ID vectors where the list names correspond to the cluster names. For example list( cluster1 = c(1:10), cluster2 = c(11:20) ). When using this approach, make sure that all the probe IDs given in the clusters are available in the analysis. Otherwise an error message is produced and the probes that have no alignment in the analysis need to be removed. Setting this parameter overrides the default behaviour, and no default clustering is applied.
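
To make the norm.method contract described above concrete, here is a minimal sketch of a custom normalization function. It is not the package's own normalization; it simply ranks the probes within each CEL file and scales the ranks to the interval (0, 1]. The function name norm.simple.rank and the toy matrix are assumptions made for illustration.

```r
# Illustrative custom normalization (not the demi default): replace each
# raw value by its rank within the column, divided by the number of probes.
norm.simple.rank <- function(raw) {
  normed <- apply(raw, 2, function(col) {
    rank(col, ties.method = "average") / length(col)
  })
  # Keep probe IDs as row names and CEL file names as column names.
  dimnames(normed) <- dimnames(raw)
  normed
}

# Toy raw expression matrix: 4 probes (rows) in 2 CEL files (columns).
raw <- matrix(c(120, 35, 980, 410,
                 90, 60, 700, 500),
              nrow = 4,
              dimnames = list(c("1001", "1002", "1003", "1004"),
                              c("UHR01.CEL", "BRAIN01.CEL")))

normed <- norm.simple.rank(raw)
normed["1003", "UHR01.CEL"]  # 1, the highest value in that file
```

A function of this shape could then be supplied through the norm.method parameter; whether a plain rank transform is appropriate for a given data set is a separate question.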

Value

A list containing the DEMIExperiment object, to which the differential expression results have been added, and a data.frame consisting of the functional annotation analysis results.

Author(s)

Sten Ilmjarv

See Also

DEMIExperiment, DEMIClust, DEMIPathway, DEMIDiff, demi.wilcox.test.fast, wilcox.test

Examples

## Not run: 

# To use the example we need to download a subset of CEL files from
# http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9819 published
# by Pradervand et al. 2008.

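# Set the destination folder where the downloaded files will be located.
# It can be any folder of your choosing.
destfolder <- "demitest/testdata/"
# Create the folder if it does not exist yet, otherwise the downloads fail.
dir.create( destfolder, recursive = TRUE, showWarnings = FALSE )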

# Download packed CEL files and change the names according to the feature
# they represent (for example to include UHR or BRAIN in them to denote the
# features).
# It is good practice to name the files according to their features which
# allows easier identification of the files later.

ftpaddress <- "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM247nnn"
download.file( paste( ftpaddress, "GSM247694/suppl/GSM247694.CEL.gz", sep = "/" ),
		destfile = paste( destfolder, "UHR01_GSM247694.CEL.gz", sep = "" ) )
download.file( paste( ftpaddress, "GSM247695/suppl/GSM247695.CEL.gz", sep = "/" ),
		destfile = paste( destfolder, "UHR02_GSM247695.CEL.gz", sep = "" ) )
download.file( paste( ftpaddress, "GSM247698/suppl/GSM247698.CEL.gz", sep = "/" ),
		destfile = paste( destfolder, "UHR03_GSM247698.CEL.gz", sep = "" ) )
download.file( paste( ftpaddress, "GSM247699/suppl/GSM247699.CEL.gz", sep = "/" ),
		destfile = paste( destfolder, "UHR04_GSM247699.CEL.gz", sep = "" ) )
download.file( paste( ftpaddress, "GSM247696/suppl/GSM247696.CEL.gz", sep = "/" ),
		destfile = paste( destfolder, "BRAIN01_GSM247696.CEL.gz", sep = "" ) )
download.file( paste( ftpaddress, "GSM247697/suppl/GSM247697.CEL.gz", sep = "/" ),
		destfile = paste( destfolder, "BRAIN02_GSM247697.CEL.gz", sep = "" ) )
download.file( paste( ftpaddress, "GSM247700/suppl/GSM247700.CEL.gz", sep = "/" ),
		destfile = paste( destfolder, "BRAIN03_GSM247700.CEL.gz", sep = "" ) )
download.file( paste( ftpaddress, "GSM247701/suppl/GSM247701.CEL.gz", sep = "/" ),
		destfile = paste( destfolder, "BRAIN04_GSM247701.CEL.gz", sep = "" ) )

# We need the gunzip function (located in the R.utils package) to unpack the gz files.
# We also remove the original packed files because we won't need them.
library( R.utils )
for( i in list.files( destfolder ) ) {
	gunzip( paste( destfolder, i, sep = "" ), remove = TRUE )
}

# Now we can continue the example of the function demi

# Do DEMI analysis with functional annotation analysis
demires <- demi(analysis = 'gene', celpath = destfolder, group = c( "BRAIN", "UHR" ),
		experiment = 'myexperiment', organism = 'homo_sapiens',
		clust.method = demi.wilcox.test.fast, pathway = TRUE)

# Do DEMI analysis without functional annotation analysis
demires <- demi(analysis = 'gene', celpath = destfolder, group = c( "BRAIN", "UHR" ),
		experiment = 'myexperiment', organism = 'homo_sapiens',
		clust.method = demi.wilcox.test.fast, pathway = FALSE)

# Retrieve results from the created object
head( getResultTable( demires$experiment ) )


## End(Not run)
