assemble.data: Assemble the data to run the integrated analysis

Description

Assembles the dependent and independent data, together with the annotation of both data sets.

Usage

assemble.data(dep.data,
              indep.data,
              dep.id = "ID",
              dep.chr = "CHROMOSOME",
              dep.pos = "STARTPOS",
              dep.ann = NULL,
              dep.symb,
              indep.id = "ID",
              indep.chr = "CHROMOSOME",
              indep.pos = "STARTPOS",
              indep.ann = NULL,
              indep.symb,
              overwrite = FALSE,
              run.name = "analysis_results")

Arguments

dep.data

The dependent data (data.frame), along with annotations. Each row should correspond to one feature. The following columns are expected to exist, and their column names should be passed to the function through the arguments of the same name. dep.id: a unique identifier. dep.chr: the chromosome (1, 2, ..., 22, X, Y). dep.pos: the base pair position, relative to the chromosome. dep.symb: gene symbol (optional). dep.ann: annotation, which can span multiple columns.

indep.data

The independent data (data.frame), along with annotations. Each row should correspond to one feature. The following columns are expected to exist, and their column names should be passed to the function through the arguments of the same name. indep.id: a unique identifier. indep.chr: the chromosome (1, 2, ..., 22, X, Y). indep.pos: the base pair position, relative to the chromosome. indep.symb: gene symbol (optional). indep.ann: annotation, which can span multiple columns.

dep.ann

vector with either the names of the columns or the column numbers in the dependent data that contain the annotation.

indep.ann

vector with either the names of the columns or the column numbers in the independent data that contain the annotation.

dep.id

vector with the column name in the dependent data that contains the ID. Will be used in the sim.plot.zscore.heatmap function. Empty IDs will be substituted by NA.

dep.chr

vector with the column name in the dependent data that contains the chromosome numbers.

dep.pos

vector with the column name in the dependent data that contains the position on the chromosome in bases.

dep.symb

Optional, either missing or a vector with the column name in the dependent data that contains the symbols. Will be used in sim.plot.zscore.heatmap as label.

indep.id

vector with the column name in the independent data that contains the ID. Will be used in the sim.plot.zscore.heatmap function. Empty IDs will be substituted by NA.

indep.chr

vector with the column name in the independent data that contains the chromosome numbers.

indep.pos

vector with the column name in the independent data that contains the position on the chromosome in bases.

indep.symb

Optional, either missing or a vector with the column name in the independent data that contains the symbols. Will be used in sim.plot.zscore.heatmap as label.

overwrite

logical, indicating whether the results may be overwritten when a folder named run.name is already present.

run.name

Name of the analysis. The results will be stored in a folder with this name in the current working directory (use getwd() to print the current working directory). If missing, the default folder "analysis_results" will be generated.
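
The column requirements listed above can be checked before calling assemble.data. The following is a minimal sketch in base R; check_input is a hypothetical helper and not part of SIM, and the toy data frame is invented purely for illustration.

```r
# Hypothetical helper, not part of SIM: checks that a data.frame has the
# columns assemble.data() expects and that chromosomes use allowed labels.
check_input <- function(data, id = "ID", chr = "CHROMOSOME", pos = "STARTPOS") {
  # The ID, chromosome and position columns must all be present
  required <- c(id, chr, pos)
  missing.cols <- setdiff(required, colnames(data))
  if (length(missing.cols) > 0) {
    stop("Missing column(s): ", paste(missing.cols, collapse = ", "))
  }
  # Chromosome labels are limited to 1..22, X and Y
  allowed <- c(1:22, "X", "Y")
  bad.chr <- setdiff(unique(as.character(data[[chr]])), allowed)
  if (length(bad.chr) > 0) {
    stop("Unsupported chromosome label(s): ", paste(bad.chr, collapse = ", "))
  }
  # Identifiers are expected to be unique; warn rather than stop
  if (anyDuplicated(data[[id]]) > 0) {
    warning("Identifiers in column '", id, "' are not unique")
  }
  invisible(TRUE)
}

# Toy data for illustration only
toy <- data.frame(ID = c("G1", "G2", "G3"),
                  CHROMOSOME = c(1, "X", 22),
                  STARTPOS = c(1200000, 500, 3400000))
check_input(toy)
```

Running the same check on both dep.data and indep.data (with the respective column names) catches a missing or misnamed column before any files are written.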

Details

Based on the chromosome and probe position, an absolute position is calculated as chromosome number * 1e9 + probe position. The chromosome column is converted to a factor and releveled according to the levels of the chrom.table, so the only levels allowed are c(1:22, "X", "Y"). Currently only the human genome is supported, without mitochondrial DNA.
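
The absolute-position rule can be illustrated with a short base R sketch. abs_position is a hypothetical name, not a SIM function, and numbering X and Y as 23 and 24 through the factor level index is an assumption; the package may handle these chromosomes differently internally.

```r
# Sketch of the absolute-position rule described above;
# abs_position() is a hypothetical name, not a SIM function.
abs_position <- function(chr, pos) {
  # Convert to a factor whose levels are ordered 1..22, X, Y
  chr <- factor(as.character(chr), levels = c(1:22, "X", "Y"))
  if (anyNA(chr)) stop("Only chromosomes 1-22, X and Y are supported")
  # Assumption: X and Y are numbered 23 and 24 via the factor level index
  as.integer(chr) * 1e9 + pos
}

abs_position(chr = c(1, 2, "X"), pos = c(1200000, 500, 42))
```

For example, a probe on chromosome 1 at base 1,200,000 maps to 1 * 1e9 + 1,200,000 = 1,001,200,000. Because no human chromosome is longer than 1e9 bases, positions on different chromosomes cannot overlap under this scheme.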

Value

No value is returned. Instead, the datasets and annotation columns are stored in separate files in the data folder in the directory specified in run.name. If assemble.data has run successfully, the integrated.analysis function can be run.
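
Since nothing is returned, one way to confirm that a run produced output is to look at the data folder directly. The sketch below uses only base R and assumes the run.name "chr1p" from the example; the names of the individual files are not stated here, so the code only lists whatever is present.

```r
# Assumption: run.name matches the value passed to assemble.data()
run.name <- "chr1p"
data.dir <- file.path(getwd(), run.name, "data")

if (dir.exists(data.dir)) {
  # Names of the files written by the run
  stored <- list.files(data.dir)
} else {
  stored <- character(0)
  message("No data folder found at ", data.dir)
}
stored
```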

Author(s)

Marten Boetzer, Melle Sieswerda, Renee X. de Menezes R.X.Menezes@lumc.nl

See Also

SIM, integrated.analysis

Examples

# Generate datasets and the samples to run the integrated analysis
set.seed(53245)
ngenes <- 100
nsamples <- 100
# generate copy number measurements
x <- matrix(rnorm(n = ngenes*nsamples), nrow = ngenes, ncol = nsamples)
# add mean shift effect (copy gain) for the first half of the genes in the first half of the samples
x[ seq_len(ngenes/2), seq_len(nsamples/2)] <- x[ seq_len(ngenes/2), seq_len(nsamples/2)] + 2
# generate gene expression with normal distribution and mean equal to gene copy number
y <- rnorm(n = ngenes*nsamples, mean = matrix(x, nrow = ngenes*nsamples, ncol = 1), sd = 0.8)
y <-  matrix(y, nrow = ngenes, ncol = nsamples)
samples <- paste0("S", seq_len(nsamples))
colnames(x) <- colnames(y) <- samples
# Making data objects 
acgh.data <- data.frame(ID = paste0("G", seq_len(ngenes)),
                     CHROMOSOME = rep(1, ngenes),
                     STARTPOS = seq_len(ngenes)*12*10^5,
                     Symbol = paste0("Gene", seq_len(ngenes)),
                     x)
expr.data <- data.frame(ID = paste0("G", seq_len(ngenes)),
                        CHROMOSOME = rep(1, ngenes),
                        STARTPOS = seq_len(ngenes)*12*10^5,
                        Symbol = paste0("Gene", seq_len(ngenes)),
                        y)

#assemble the data
assemble.data(dep.data = acgh.data, 
              indep.data = expr.data,
              dep.ann = colnames(acgh.data)[1:4], 
              indep.ann = colnames(expr.data)[1:4], 
              dep.id="ID", 
              dep.chr = "CHROMOSOME",
              dep.pos = "STARTPOS",
              dep.symb="Symbol",  
              indep.id="ID",
              indep.chr = "CHROMOSOME", 
              indep.pos = "STARTPOS", 
              indep.symb="Symbol", 
              overwrite = TRUE,
              run.name = "chr1p")

SIM documentation built on Nov. 8, 2020, 4:58 p.m.