Creates a DEMIExperiment object


Description

This function creates a DEMIExperiment object. It loads and stores the experiment metadata, such as the annotation and alignment information, and reads the raw expression matrix from the CEL files. It then normalizes the raw expression matrix and stores both the raw and the normalized expression matrices in a DEMICel object held within the created DEMIExperiment object.

Usage

DEMIExperiment(analysis = "transcript", celpath = character(),
  experiment = character(), organism = character(), maxtargets = 0,
  maxprobes = character(), pmsize = 25, sectionsize = character(),
  norm.method = norm.rrank, filetag = character())

Arguments

analysis

A character. Defines the analysis type. It can be one of 'transcript', 'gene', 'exon' or 'genome'. The default value is 'transcript'. For 'genome' analysis the sectionsize parameter must be defined as well.

celpath

A character. Either the path to the directory containing the CEL files or a vector of paths pointing directly to the CEL files.
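The two accepted forms of celpath can be illustrated with a small base R sketch. The helper resolve_celpath below is hypothetical and not part of demi; it only shows how a single directory path expands into the CEL files it contains, while a vector of file paths is used as given. The same list.files call can be used to build a celpath vector yourself.

```r
# Hypothetical helper (not part of demi): expand a celpath value into CEL file paths.
resolve_celpath <- function( celpath ) {
  if ( length( celpath ) == 1 && dir.exists( celpath ) ) {
    # A single directory: collect every *.CEL file inside it (case-insensitive match)
    list.files( celpath, pattern = "\\.cel$", ignore.case = TRUE, full.names = TRUE )
  } else {
    # Otherwise celpath is taken to be a vector of paths to individual CEL files
    celpath
  }
}

# Usage: a temporary folder holding two CEL files and one unrelated file
tmp <- file.path( tempdir(), "celdemo" )
dir.create( tmp, showWarnings = FALSE )
invisible( file.create( file.path( tmp, c( "UHR01.CEL", "BRAIN01.CEL", "notes.txt" ) ) ) )
basename( resolve_celpath( tmp ) )   # "BRAIN01.CEL" "UHR01.CEL"
```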

experiment

A character. A custom name of the experiment defined by the user (e.g. 'myexperiment').

organism

A character. The name of the species the microarrays measure (e.g. 'homo_sapiens' or 'mus_musculus'), given in lowercase letters with words separated by an underscore.

maxtargets

A numeric. The maximum number of targets (e.g. genes or transcripts) a single probe is allowed to match. Setting it to 1 means that a probe may match only one gene. Even when analysis is set to 'transcript', the number of matches is counted on genes, not transcripts. Hence a probe matching two transcripts of the same gene would be included, but a probe matching transcripts of two different genes would not. The value needs to be a positive integer or 0. By default maxtargets is set to 0.
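The gene-level counting rule can be sketched in base R. The alignment table layout (columns probe, transcript and gene) and the helper filter_maxtargets are hypothetical and do not reflect demi's internal data structures. The sketch also assumes that maxtargets = 0 switches the filter off, which this page does not state explicitly.

```r
# Hypothetical alignment table: one row per probe-transcript match
alignment <- data.frame(
  probe      = c( "p1", "p1", "p2", "p2", "p3" ),
  transcript = c( "t1", "t2", "t3", "t4", "t5" ),
  gene       = c( "g1", "g1", "g2", "g3", "g4" ),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of demi): keep probes whose matches span
# at most `maxtargets` distinct genes, whatever the number of transcripts.
# Assumption: maxtargets = 0 disables the filter (not stated in the documentation).
filter_maxtargets <- function( alignment, maxtargets ) {
  if ( maxtargets == 0 ) return( alignment )
  genes_per_probe <- tapply( alignment$gene, alignment$probe,
                             function( g ) length( unique( g ) ) )
  keep <- names( genes_per_probe )[ genes_per_probe <= maxtargets ]
  alignment[ alignment$probe %in% keep, ]
}

filtered <- filter_maxtargets( alignment, maxtargets = 1 )
unique( filtered$probe )   # "p1" "p3": p1 hits two transcripts of one gene, p2 hits two genes
```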

maxprobes

A character. Sets the number of unique probes a target is allowed to match. Every target that aligns to more distinct probes than allowed by maxprobes is scaled down to the number defined by the maxprobes parameter. It can be a positive integer, 'median' or 'max': 'median' means the median number of probes matching a target across all targets, and 'max' means the maximum number of probes matching any single target. By default maxprobes is not set, which is the same as setting it to 'max'.
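How 'median' and 'max' translate into a numeric cap can be sketched as follows. The helper resolve_maxprobes and the probe/target table are hypothetical; the sketch only counts the distinct probes per target and flags the targets exceeding the cap, because this page does not describe which probes are dropped when a target is scaled down.

```r
# Hypothetical alignment table: one row per probe-target match
alignment <- data.frame(
  probe  = c( "p1", "p2", "p3", "p4", "p5", "p6", "p1" ),
  target = c( "A",  "A",  "A",  "A",  "B",  "B",  "C" ),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of demi): turn a maxprobes setting into a numeric cap
resolve_maxprobes <- function( alignment, maxprobes = character() ) {
  # An unset maxprobes behaves like 'max'
  if ( length( maxprobes ) == 0 ) maxprobes <- "max"
  # Number of distinct probes matching each target
  probes_per_target <- tapply( alignment$probe, alignment$target,
                               function( p ) length( unique( p ) ) )
  cap <- switch( as.character( maxprobes ),
                 median = median( probes_per_target ),
                 max    = max( probes_per_target ),
                 as.numeric( maxprobes ) )   # a positive integer was given
  # Targets that would have to be scaled down to the cap
  list( cap = cap, over = names( probes_per_target )[ probes_per_target > cap ] )
}

resolve_maxprobes( alignment, "median" )   # cap 2; target "A" (4 probes) exceeds it
```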

pmsize

A numeric. The minimum number of consecutive nucleotides that need to match the target sequence perfectly. It can be 23, 24 or 25; alignments with a shorter perfect match are not included in the experiment setup. The default value is 25.
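The pmsize cut-off amounts to a simple filter on the alignments. The column pmlength (the perfect-match length of each alignment) and the helper filter_pmsize are hypothetical and only illustrate the rule stated above.

```r
# Hypothetical alignment table with the perfect-match length of each alignment
alignment <- data.frame(
  probe    = c( "p1", "p2", "p3" ),
  target   = c( "g1", "g1", "g2" ),
  pmlength = c( 25, 24, 23 ),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of demi): drop alignments shorter than pmsize
filter_pmsize <- function( alignment, pmsize = 25 ) {
  stopifnot( pmsize %in% c( 23, 24, 25 ) )   # the only values the documentation allows
  alignment[ alignment$pmlength >= pmsize, ]
}

filter_pmsize( alignment, pmsize = 24 )$probe   # "p1" "p2"
```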

sectionsize

A numeric. This is only used if the analysis parameter is set to 'genome'. It defines the length of the genomic target region used in the 'genome' analysis. Currently the only available section sizes are 100000, 500000 and 1000000.
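To see why a genomic probe usually falls into two sections, the tiling can be sketched in base R. This assumes, as the Details section describes, that each section overlaps the next by half its length; the helpers make_sections and sections_for_position are hypothetical, and the exact section boundaries used by demi may differ.

```r
# Hypothetical helper (not part of demi): tile a chromosome into sections
# that each overlap the next section by half of sectionsize.
make_sections <- function( chrom_length, sectionsize ) {
  step   <- sectionsize / 2
  starts <- seq( 1, max( 1, chrom_length - step ), by = step )
  ends   <- pmin( starts + sectionsize - 1, chrom_length )
  data.frame( start = starts, end = ends )
}

# Hypothetical helper: indices of all sections containing a genomic position
sections_for_position <- function( sections, pos ) {
  which( sections$start <= pos & sections$end >= pos )
}

s <- make_sections( 2e6, 1e6 )      # 1-1000000, 500001-1500000, 1000001-2000000
sections_for_position( s, 750000 )  # 1 2: inside the overlap, so two sections
sections_for_position( s, 100 )     # 1: near the chromosome end, only one section
```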

norm.method

A function. The function used to normalize the raw expression values. The default normalization function is norm.rrank.

filetag

A character. A custom string that can be used to identify the experiment. At the current development stage this parameter is used only by the function demi, whose output files will contain the specified filetag.

Details

After the analysis has been completed, the user can add its results to the original DEMIExperiment object with the function attachResult. The function getResultTable can then be used to retrieve the results from the DEMIExperiment object. Other useful functions are getNormMatrix, which retrieves the normalized expression matrix, and getCelMatrix, which retrieves the raw expression matrix. In both cases the probe IDs are present as row names.

Further specification of the parameters:

  • maxtargets When analysis is set to 'gene', all probes that match more genes than allowed by the maxtargets parameter are excluded from the analysis. For 'transcript' and 'exon' analysis the number is also calculated on the gene level. For example, if maxtargets is set to 1 and a probe matches two transcripts of the same gene, the probe is still used in the analysis; if the probe matches two transcripts on different genes, it is excluded. For 'genome' analysis a probe in most cases matches two genomic sections, because adjacent sections overlap by 50%; such a probe is still used in the analysis.

  • norm.method Every user can apply their own normalization method by writing a custom normalization function. The function should take the raw expression matrix as input and return the normalized expression matrix, keeping the probe IDs as row names and the CEL file names as column names. The normalized expression matrix is then stored as part of the DEMIExperiment object.

  • sectionsize The sectionsize parameter defines the length of the genomic target region. Currently sectionsize can be set to 100000, 500000 or 1000000. All adjacent sections, except those at chromosome ends, overlap with the next adjacent section by 50% of the genomic section. This parameter is required when analysis is set to 'genome'.

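A custom norm.method only has to honour the contract described above: take the raw expression matrix and return a normalized matrix with the same probe ID row names and CEL file column names. The function my.norm.quantile below is a hypothetical example that applies a log2 transform followed by quantile normalization in base R; it is not the package's default norm.rrank.

```r
# Hypothetical custom normalization (not part of demi):
# log2 transform followed by quantile normalization.
my.norm.quantile <- function( rawmatrix ) {
  logm <- log2( rawmatrix )
  # Rank of each value within its own array (column)
  ranks <- apply( logm, 2, rank, ties.method = "first" )
  # Mean of the k-th smallest value across all arrays, for every k
  reference <- rowMeans( apply( logm, 2, sort ) )
  # Replace each value by the reference value of the same rank
  normalized <- apply( ranks, 2, function( r ) reference[ r ] )
  # Keep probe IDs as row names and CEL file names as column names
  dimnames( normalized ) <- dimnames( rawmatrix )
  normalized
}

# Usage with a small illustrative raw matrix (4 probes x 3 CEL files)
raw <- matrix( c( 100, 200, 300, 400,
                   50, 400, 150, 250,
                   80, 120, 640, 320 ), nrow = 4,
               dimnames = list( paste0( "probe", 1:4 ),
                                c( "UHR01.CEL", "BRAIN01.CEL", "UHR02.CEL" ) ) )
normmatrix <- my.norm.quantile( raw )
```

Such a function could then be passed as norm.method = my.norm.quantile when calling DEMIExperiment.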

Value

A DEMIExperiment object.

Author(s)

Sten Ilmjarv

See Also

DEMIClust, DEMIResult, getResultTable, getResult, attachResult

Examples

## Not run: 

# To use the example we need to download a subset of CEL files from
# http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9819 published
# by Pradervand et al. 2008.

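# Set the destination folder where the downloaded files will be located.
# It can be any folder of your choosing; create it if it does not exist yet.
destfolder <- "demitest/testdata/"
dir.create( destfolder, recursive = TRUE, showWarnings = FALSE )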

# Download the gzipped CEL files and rename them according to the feature
# they represent (for example, include UHR or BRAIN in the names to denote
# the features). Naming the files after their features is good practice,
# as it makes the files easier to identify later.

ftpaddress <- "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM247nnn"
download.file( paste( ftpaddress, "GSM247694/suppl/GSM247694.CEL.gz", sep = "/" ),
		destfile = paste( destfolder, "UHR01_GSM247694.CEL.gz", sep = "" ) )
download.file( paste( ftpaddress, "GSM247695/suppl/GSM247695.CEL.gz", sep = "/" ),
		destfile = paste( destfolder, "UHR02_GSM247695.CEL.gz", sep = "" ) )
download.file( paste( ftpaddress, "GSM247698/suppl/GSM247698.CEL.gz", sep = "/" ),
		destfile = paste( destfolder, "UHR03_GSM247698.CEL.gz", sep = "" ) )
download.file( paste( ftpaddress, "GSM247699/suppl/GSM247699.CEL.gz", sep = "/" ),
		destfile = paste( destfolder, "UHR04_GSM247699.CEL.gz", sep = "" ) )
download.file( paste( ftpaddress, "GSM247696/suppl/GSM247696.CEL.gz", sep = "/" ),
		destfile = paste( destfolder, "BRAIN01_GSM247696.CEL.gz", sep = "" ) )
download.file( paste( ftpaddress, "GSM247697/suppl/GSM247697.CEL.gz", sep = "/" ),
		destfile = paste( destfolder, "BRAIN02_GSM247697.CEL.gz", sep = "" ) )
download.file( paste( ftpaddress, "GSM247700/suppl/GSM247700.CEL.gz", sep = "/" ),
		destfile = paste( destfolder, "BRAIN03_GSM247700.CEL.gz", sep = "" ) )
download.file( paste( ftpaddress, "GSM247701/suppl/GSM247701.CEL.gz", sep = "/" ),
		destfile = paste( destfolder, "BRAIN04_GSM247701.CEL.gz", sep = "" ) )

# We need the gunzip function (from the R.utils package) to unpack the gz files.
# The original packed files are removed, as we won't need them anymore.
library( R.utils )
for( i in list.files( destfolder, pattern = "\\.gz$" ) ) {
	gunzip( paste( destfolder, i, sep = "" ), remove = TRUE )
}

# Now we can continue the example of the function DEMIExperiment

# Basic experiment set up.
demiexp <- DEMIExperiment(analysis = 'gene', celpath = destfolder,
		experiment = 'myexperiment', organism = 'homo_sapiens')

# Run the basic experiment setup, but this time do 'transcript' analysis.
demiexp <- DEMIExperiment(analysis = 'transcript', celpath = destfolder,
		experiment = 'myexperiment', organism = 'homo_sapiens')

# Run the basic experiment setup, but this time do 'exon' analysis.
demiexp <- DEMIExperiment(analysis = 'exon', celpath = destfolder,
		experiment = 'myexperiment', organism = 'homo_sapiens' )

# For genome analysis do not forget to specify the sectionsize parameter.
demiexp <- DEMIExperiment(analysis = 'genome', celpath = destfolder,
		experiment = 'myexperiment', organism = 'homo_sapiens', sectionsize = 500000)

# Specify experiment with specific pmsize; the standard length for Affymetrix microarray
# probes is 25 nucleotides.
demiexp <- DEMIExperiment(analysis = 'gene', celpath = destfolder,
		experiment = 'myexperiment', organism = 'homo_sapiens', pmsize = 23)

# Specify experiment by setting maxtargets to 1.
demiexp <- DEMIExperiment(analysis = 'gene', celpath = destfolder,
		experiment = 'myexperiment', organism = 'homo_sapiens', maxtargets = 1)

# Specify experiment by setting maxprobes to 'median'.
demiexp <- DEMIExperiment(analysis = 'gene', celpath = destfolder,
		experiment = 'myexperiment', organism = 'homo_sapiens', maxprobes = 'median')

# Retrieve the alignment information from the DEMIExperiment object.
head( getAlignment( demiexp ) )

# Retrieve the annotation information from the DEMIExperiment object.
head( getAnnotation( demiexp ) )

# Retrieve the raw expression matrix from the DEMIExperiment object.
head( getCelMatrix( demiexp ) )

# Retrieve the normalized expression matrix from the DEMIExperiment object.
head( getNormMatrix( demiexp ) )

#####################
# If the user has done the analysis and wishes to add the results to the original
# DEMIExperiment object.
#####################

# Create clusters with the optimized Wilcoxon rank-sum test incorporated within demi,
# which precalculates the probabilities.
demiclust <- DEMIClust( demiexp, group = c( "BRAIN", "UHR" ), clust.method = demi.wilcox.test.fast )
# Calculate differential expression
demidiff <- DEMIDiff( demiclust )

# Attach the results to the original DEMIExperiment object
demiexp <- attachResult( demiexp, demidiff )

# Retrieve the results from the DEMIExperiment object
head( getResultTable( demiexp ) )


## End(Not run)
