GOZDataSet: Create an object for outstanding genomic zone analysis
In GenomicOZone: Delineate outstanding genomic zones of differential gene activity


View source: R/MD_main.R

The function prepares an object for outstanding genomic zone analysis. It integrates data, annotation, and analysis parameters into the object and performs additional checks on data integrity.

  GOZDataSet(data, colData, design,
             clustering.method = "1C",
             rowData.GRanges = NULL,
             ks = NULL,
             genome = NULL,
             ensembl.mirror = "www",
             gene.ID.type = NULL,
             ncores = 1)

`data`	a numerical `matrix` of gene activity data. Rows represent genes. Columns represent samples. The activity data can be gene expression, methylation beta value, copy number variation segment mean, or other gene-based omic data. Row and column names of the matrix must be specified.
`colData`	a `data.frame` of sample information. The first column must contain sample names that correspond to the columns of the `data` matrix. Each additional column must contain at least two experimental conditions, as required for differential zone analysis.
`design`	a one-sided `formula` with only right-hand side variables. Only one variable is supported in this version. The formula describes on which variable in `colData` to apply differential zone analysis.
`clustering.method`	a character string that selects how weight channels are handled in the clustering. Use `"1C"` to accumulate all channels of weight into one channel, or `"MC"` to allow multiple channels of weight. Default is `"1C"`.
`rowData.GRanges`	an optional genome annotation of `GRanges` class. Rows of `rowData.GRanges` correspond to rows of the `data` matrix. Row names of `rowData.GRanges` must be consistent with row names of `data`. Their orders are not necessarily the same. Only annotated genes in `data` will be used in genomic zone analysis. One of `rowData.GRanges` and `genome` must be specified.
`ks`	an optional numerical vector that specifies the number of zones into which each chromosome is divided. The names of the `ks` vector must be chromosome names. It is used only with a user-specified `rowData.GRanges`. Every seqlevel of `rowData.GRanges` must have a corresponding name in `ks`. If `ks` is not specified, an optimal k value between 1 and 400 is determined for each chromosome. A value of 400 is equivalent to clustering the longest human chromosome into zones that average more than 1 million base pairs in width. Default is `NULL`.
`genome`	an optional value of `character` type to select a genome from biomaRt. Available genomes can be found in the "version" column of the available ensembl datasets from biomaRt database by calling `listDatasets(useMart("ensembl"))`. One of `rowData.GRanges` and `genome` must be specified.
`ensembl.mirror`	an optional Ensembl mirror server to connect to. It is used only when `genome` is not NULL. The options are `"www"`, `"uswest"`, `"useast"` and `"asia"`. Default is `"www"`.
`gene.ID.type`	an optional value of `character` type to specify a gene ID type. Options are `"hgnc_symbol"`, `"mgi_symbol"`, `"ensembl_gene_id"` and `"ensembl_transcript_id"`. Only these four types are allowed in this version. This parameter is used only with a user-specified `genome`. If unspecified, all four types are evaluated and the best one is chosen.
`ncores`	an optional integer to specify the number of cores to use in parallel in outstanding genomic zone analysis. Default is 1.
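
Before calling `GOZDataSet`, it can help to confirm that `data`, `colData` and `design` meet the requirements listed above. The following is a minimal sketch in base R, not the package's own validation code. `check_goz_inputs` is a hypothetical helper name introduced here for illustration, and it assumes that sample names only need to match as a set.

```r
# Hypothetical helper (not part of GenomicOZone): checks the documented
# requirements on data, colData and design before calling GOZDataSet().
check_goz_inputs <- function(data, colData, design) {
  # data must be a numeric matrix with row and column names
  stopifnot(is.matrix(data), is.numeric(data),
            !is.null(rownames(data)), !is.null(colnames(data)))

  # First column of colData holds the sample names.
  # Assumption: the same set of names as colnames(data); order is not checked.
  stopifnot(is.data.frame(colData),
            setequal(as.character(colData[[1]]), colnames(data)))

  # design must be a one-sided formula (length 2) with exactly one variable
  stopifnot(inherits(design, "formula"), length(design) == 2)
  vars <- all.vars(design)
  stopifnot(length(vars) == 1, vars %in% colnames(colData)[-1])

  # The design variable needs at least two experimental conditions
  stopifnot(length(unique(colData[[vars]])) >= 2)

  invisible(TRUE)
}

data <- matrix(c(1, 5, 2, 6, 5, 1, 6, 2), ncol = 2, byrow = TRUE,
               dimnames = list(paste0("Gene", 1:4), paste0("Sample", 1:2)))
colData <- data.frame(Sample_name = paste0("Sample", 1:2),
                      Condition = c("Cancer", "Normal"))
ok <- check_goz_inputs(data, colData, ~ Condition)
```

A two-sided formula such as `y ~ Condition` fails the `length(design) == 2` check, which reflects the requirement that `design` has only right-hand side variables.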

The function collects all the input information, checks requirement completeness and integrates the inputs into a list, in preparation for function GenomicOZone to perform outstanding zone analysis.

A genome annotation of `GRanges` class (Lawrence et al., 2013) or a genome version must be supplied by the user. The annotation is used to sort genes by their genomic coordinates. The `genome` parameter lets function GenomicOZone obtain the genome annotation through the R package biomaRt (Smedley et al., 2015), which accesses the Ensembl annotation databases (Zerbino et al., 2017). Using `rowData.GRanges` is recommended over using `genome`.
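
To illustrate how an annotation can be matched to the data and used to order genes, the sketch below uses a plain data frame as a stand-in for a `GRanges` object so that it runs in base R. The stand-in and the variable names (`annot`, `shared`, `sorted.data`) are assumptions made for illustration. This is not the package's internal code.

```r
# Gene activity matrix; its row order differs from the annotation below
data <- matrix(c(1, 5, 2, 6, 5, 1, 6, 2), ncol = 2, byrow = TRUE,
               dimnames = list(c("Gene3", "Gene1", "Gene5", "Gene2"),
                               c("Sample1", "Sample2")))

# Stand-in annotation (a real call would pass a GRanges object instead)
annot <- data.frame(chr = c("chr1", "chr1", "chr1", "chr2"),
                    start = c(100, 300, 200, 50),
                    row.names = c("Gene1", "Gene2", "Gene3", "Gene4"))

# Only genes that are both annotated and present in data are kept:
# Gene5 has no annotation, Gene4 has no data
shared <- intersect(rownames(annot), rownames(data))
annot <- annot[shared, ]

# Sort genes by chromosome, then by start coordinate.
# Note: order() on character chromosome names is lexicographic.
annot <- annot[order(annot$chr, annot$start), ]
sorted.data <- data[rownames(annot), , drop = FALSE]

# Every remaining chromosome needs a matching name in ks
ks <- c(chr1 = 2)
stopifnot(all(unique(annot$chr) %in% names(ks)))
```

Here `sorted.data` holds the rows for Gene1, Gene3 and Gene2, in that order, which is their order along chr1.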

A list object with all relevant information for outstanding genomic zone analysis. It will be expanded by further analysis.


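  # GRanges(), Rle() and IRanges() come from GenomicRanges and its dependencies
  library(GenomicOZone)
  library(GenomicRanges)

  # Gene activity matrix: 4 genes (rows) by 2 samples (columns)
  data <- matrix(c(1,5,2,6,5,1,6,2), ncol = 2, byrow = TRUE)
  rownames(data) <- paste("Gene", 1:4, sep='')
  colnames(data) <- paste("Sample", c(1:2), sep='')

  # Sample information: first column is sample names, second is the condition
  colData <- data.frame(Sample_name = paste("Sample", c(1:2), sep=''),
                        Condition = c("Cancer", "Normal"))

  # One-sided formula naming the colData variable to compare
  design <- ~ Condition

  # Genome annotation: all four genes on chr1, named to match rows of data
  rowData.GRanges <- GRanges(seqnames = Rle(rep("chr1", 4)),
                             ranges = IRanges(start = c(1,2,3,4), end = c(5,6,7,8)))
  names(rowData.GRanges) <- paste("Gene", 1:4, sep='')

  # Number of zones per chromosome, named by chromosome
  ks <- c(2)
  names(ks) <- "chr1"

  # Build the object for outstanding genomic zone analysis
  GOZ.ds <- GOZDataSet(data, colData, design,
                       rowData.GRanges = rowData.GRanges,
                       ks = ks)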