GOZDataSet: Create an object for outstanding genomic zone analysis

Description

The function prepares an object for outstanding genomic zone analysis. It integrates the data, annotation, and analysis parameters into the object and performs additional checks on data integrity.

Usage

  GOZDataSet(data, colData, design,
             clustering.method = "1C",
             rowData.GRanges = NULL,
             ks = NULL,
             genome = NULL,
             ensembl.mirror = "www",
             gene.ID.type = NULL,
             ncores = 1)

Arguments

data

a numerical matrix of gene activity data. Rows represent genes. Columns represent samples. The activity data can be gene expression, methylation beta value, copy number variation segment mean, or other gene-based omic data. Row and column names of the matrix must be specified.

colData

a data frame of sample information. The first column must contain sample names that correspond to the columns of the data matrix. Each additional column must contain at least two experimental conditions, as required for differential zone analysis.
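The correspondence between colData and the data matrix can be checked before the object is constructed. The following is a minimal sketch in base R, not the package's own validation code; the object names mirror the arguments, and the values are made up for illustration.

```r
# Toy inputs (illustrative values only): 4 genes by 2 samples.
data <- matrix(c(1, 5, 2, 6, 5, 1, 6, 2), ncol = 2, byrow = TRUE,
               dimnames = list(paste0("Gene", 1:4), paste0("Sample", 1:2)))

colData <- data.frame(Sample_name = paste0("Sample", 1:2),
                      Condition = c("Cancer", "Normal"),
                      stringsAsFactors = FALSE)

# The first column of colData should name every column of the data matrix.
samples.match <- setequal(colData[[1]], colnames(data))

# Each additional column should hold at least two distinct conditions.
n.conditions <- vapply(colData[-1], function(x) length(unique(x)), integer(1))
conditions.ok <- all(n.conditions >= 2)
```

If either flag is FALSE, the inputs likely need to be corrected before calling GOZDataSet.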

design

a one-sided formula with variables on the right-hand side only. Only one variable is supported in this version. The formula names the variable in colData on which differential zone analysis is performed.
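As an illustration of what a one-sided, single-variable formula looks like in R, the sketch below builds one and inspects it with base R only. The checks shown are assumptions about what a caller might verify, not the package's internal logic.

```r
# A one-sided formula naming a single colData variable.
design <- ~ Condition

# A one-sided formula has length 2 (the `~` operator and the right-hand side);
# a two-sided formula such as `y ~ Condition` has length 3.
is.one.sided <- inherits(design, "formula") && length(design) == 2

# Names of the variables used in the formula.
design.vars <- all.vars(design)
has.one.variable <- length(design.vars) == 1
```

Here design.vars holds the column name that would be looked up in colData.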

clustering.method

a character string. Use "1C" to accumulate all channels of weight into one channel, or "MC" to allow multiple channels of weight in the clustering. Default is "1C".

rowData.GRanges

an optional genome annotation of GRanges class. Rows of rowData.GRanges correspond to rows of the data matrix. Row names of rowData.GRanges must be consistent with row names of data, although they need not be in the same order. Only annotated genes in data will be used in genomic zone analysis. One of rowData.GRanges and genome must be specified.
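To see which genes would remain when only annotated genes are kept, the names can be compared directly. This is a rough base R sketch under stated assumptions: annotated.genes is a hypothetical stand-in for names(rowData.GRanges), and the subsetting shown is an illustration rather than the package's own procedure.

```r
# Toy data matrix with four genes (illustrative values only).
data <- matrix(c(1, 5, 2, 6, 5, 1, 6, 2), ncol = 2, byrow = TRUE,
               dimnames = list(paste0("Gene", 1:4), paste0("Sample", 1:2)))

# Hypothetical stand-in for names(rowData.GRanges); note the different order,
# one gene missing from the annotation (Gene4), and one extra gene (Gene5).
annotated.genes <- c("Gene3", "Gene1", "Gene2", "Gene5")

# Genes present in both the data and the annotation, in data row order.
shared <- intersect(rownames(data), annotated.genes)

# Genes in the data that lack annotation.
unannotated <- setdiff(rownames(data), annotated.genes)

# Subset of the data restricted to annotated genes.
data.used <- data[shared, , drop = FALSE]
```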

ks

an optional numerical vector specifying the number of zones into which each chromosome is divided. The names of the ks vector must be chromosome names. It is used only with a user-specified rowData.GRanges. Each seqlevel of rowData.GRanges must have a corresponding name in ks. If not specified, an optimal k value between 1 and 400 will be determined for each chromosome. 400 is equivalent to clustering the longest human chromosome into zones averaging wider than 1 million base pairs. Default is NULL.
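A named ks vector covering several chromosomes might be prepared as follows. This is a sketch in base R; chromosomes is a hypothetical stand-in for the seqlevels of rowData.GRanges, and the zone counts are arbitrary.

```r
# Hypothetical chromosome names, standing in for seqlevels(rowData.GRanges).
chromosomes <- c("chr1", "chr2", "chrX")

# Number of zones per chromosome, named by chromosome (arbitrary values).
ks <- c(chr1 = 2, chr2 = 3, chrX = 2)

# Every chromosome in the annotation should have an entry in ks.
missing.ks <- setdiff(chromosomes, names(ks))
ks.ok <- length(missing.ks) == 0 && all(ks >= 1)
```

A non-empty missing.ks would indicate chromosomes for which no zone count was supplied.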

genome

an optional character string selecting a genome from biomaRt. Available genomes are listed in the "version" column of the Ensembl datasets returned by listDatasets(useMart("ensembl")). One of rowData.GRanges and genome must be specified.

ensembl.mirror

an optional Ensembl mirror server to connect to. It is used only when genome is not NULL. The options are "www", "uswest", "useast" and "asia". Default is "www".

gene.ID.type

an optional character string specifying a gene ID type. Options are "hgnc_symbol", "mgi_symbol", "ensembl_gene_id" and "ensembl_transcript_id". Only these four types are allowed in this version. This parameter is used only with a user-specified genome. If unspecified, all four types are evaluated and the best one is chosen.

ncores

an optional integer specifying the number of cores to use in parallel in outstanding genomic zone analysis. Default is 1.
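One way to pick a value for ncores is to query the machine with the parallel package that ships with R. The sketch below leaves one core free, which is a common convention and an assumption here, not a recommendation from the package.

```r
# Number of cores reported by the system; may be NA on some platforms.
n.detected <- parallel::detectCores()

# Fall back to a single core when detection fails; otherwise leave one free.
ncores <- if (is.na(n.detected)) 1L else max(1L, n.detected - 1L)
```

The resulting ncores could then be passed as the ncores argument.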

Details

The function collects all the input information, checks that the requirements are complete, and integrates the inputs into a list, in preparation for the function GenomicOZone to perform outstanding zone analysis.

A genome annotation parameter of GRanges class \insertCite{lawrence2013granges}{GenomicOZone} or a genome version must be assigned by the user. The annotation is used to sort genes by their genomic coordinates. The genome parameter is for function GenomicOZone to obtain genome annotation from the R package biomaRt \insertCite{smedley2015biomart}{GenomicOZone} to access Ensembl annotation databases \insertCite{zerbino2017ensembl}{GenomicOZone}. Using rowData.GRanges is recommended over using genome.

Value

A list object with all relevant information for outstanding genomic zone analysis. It will be expanded by further analysis.

References

\insertAllCited{}

Examples

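  library(GenomicOZone)
  library(GenomicRanges)  # attaches GRanges(), IRanges(), and Rle()

  # Activity matrix: 4 genes (rows) by 2 samples (columns)
  data <- matrix(c(1,5,2,6,5,1,6,2), ncol = 2, byrow = TRUE)
  rownames(data) <- paste0("Gene", 1:4)
  colnames(data) <- paste0("Sample", 1:2)

  # Sample information: first column holds sample names, second the condition
  colData <- data.frame(Sample_name = paste0("Sample", 1:2),
                        Condition = c("Cancer", "Normal"))

  # One-sided formula naming the colData variable to analyze
  design <- ~ Condition

  # Genome annotation: all four genes on chr1, named to match rownames(data)
  rowData.GRanges <- GRanges(seqnames = Rle(rep("chr1", 4)),
                             ranges = IRanges(start = c(1,2,3,4), end = c(5,6,7,8)))
  names(rowData.GRanges) <- paste0("Gene", 1:4)

  # Divide chr1 into 2 zones; names of ks must be chromosome names
  ks <- c(chr1 = 2)

  GOZ.ds <- GOZDataSet(data, colData, design,
                       rowData.GRanges = rowData.GRanges,
                       ks = ks)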

GenomicOZone documentation built on Nov. 8, 2020, 6:01 p.m.