DaMiRseq: Data Mining for RNA-seq data: normalization, feature selection and classification

Documented in DaMiR.goldenDice DaMiR.makeSE DaMiR.transpose

#' @title Import RNA-Seq count data and variables
#'
#' @description This is a helper function that allows the user to
#' simultaneously
#'  import counts, class (mandatory) and
#'  variables (optional) data, and creates a \code{SummarizedExperiment}
#'  object.
#'
#' @param x  A tab-delimited file which contains
#'  RNA-Seq count data. Each row is a feature (i.e. gene, transcript,
#'  exon etc.)
#'  and each column is a sample
#' @param y A tab-delimited file which contains experiment information.
#' Each row is a sample and each column is a variable. This file must
#' contain at least one column which represents 'class' information for
#' data adjustment and classification; the class column must be labeled
#' as 'class'
#'
#' @details Before creating a \code{SummarizedExperiment} object, the
#' function checks the input data to ensure that only a matrix of raw
#' counts (numeric, non-negative integer values without NA) is loaded.
#' It also checks that the variables data contain a 'class' column and
#' that their row names match the column names of the counts. Missing
#' data (NA) in the data frame of the variables of interest raise a
#' warning.
#'
#' @return A \code{SummarizedExperiment} object containing raw counts,
#' class and (optionally) variables of interest.
#'
#'
#' @references Morgan M, Obenchain V, Hester J and Pag\`es H (2016).
#' SummarizedExperiment: SummarizedExperiment container.
#'  R package version 1.4.0.
#'
#' @author Mattia Chiesa, Luca Piacentini
#'
#' @examples
#' rawdata.path <- system.file(package = "DaMiRseq","extdata")
#' # import tab-delimited files:
#' # sample data are a small subset of Genotype-Tissue Expression (GTEx)
#' # RNA-Seq database (dbGap Study Accession: phs000424.v6.p1):
#' count_data <- read.delim(file.path(rawdata.path, "counts_import.txt"))
#' variables_data <- read.delim(file.path(rawdata.path, "annotation_import.txt"))
#' # create a SummarizedExperiment object:
#' SE <- DaMiR.makeSE(count_data, variables_data)
#' print(SE)
#'
#' @seealso
#' \code{\link{SummarizedExperiment}}
#'
#' @export
#'
#'

DaMiR.makeSE<-function(x, y) {
  if (!is.numeric(as.matrix(x)))
    stop("the count data is not numeric")
  # check for NAs first: the comparisons below evaluate to NA on missing
  # values and would otherwise fail with a misleading message
  if (any(is.na(x)))
    stop("'NA' values are not allowed in the count matrix")
  if (!isTRUE(all(x == floor(x))))
    stop("Counts must be integer values!")
  if (!isTRUE(all(x >= 0)))
    stop("Counts must be non-negative integers!")
  if (!isTRUE("class" %in% colnames(y)))
    stop("'class' info is lacking!
         Include the variable 'class'
         in the 'y' data.frame and label it 'class'!")
  if (any(is.na(y)))
    warning("There are some missing data, i.e. 'NA'.
            Variables with 'NA' will not be used to draw diagnostic plots.
            Consider imputing NAs for the variables of interest.")
  if (!(identical(colnames(x), rownames(y))))
    stop("colnames of raw counts table must equal
         rownames of variables data frame")

  data<-SummarizedExperiment(assays=as.matrix(x), colData=as.data.frame(y))
  cat("Your dataset has:","\n")
  cat(dim(data)[1],"Features;","\n")
  # count the samples in each class level and build the summary string
  class_levels <- levels(data@colData$class)
  num_samp <- vapply(class_levels,
                     function(lev) sum(data@colData$class == lev, na.rm = TRUE),
                     integer(1))
  str_classes <- paste(num_samp, class_levels, collapse = " ")
  cat(dim(data)[2],"Samples, divided in:", str_classes, "\n")
  cat(dim(data@colData)[2],"variables:", colnames(data@colData),";",
      "\n","'class' included.","\n")
  return(data)
}
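
The input checks performed by DaMiR.makeSE can be tried on toy data before calling the function. The sketch below uses base R only; the object names (counts, annot, checks) and all values are hypothetical and not taken from the package data. It re-evaluates the same conditions the function tests and does not build a SummarizedExperiment object.

```r
# Toy inputs (hypothetical values, not from the package's example files)
counts <- data.frame(
  S1 = c(10, 0, 3),
  S2 = c(5, 2, 8),
  row.names = c("geneA", "geneB", "geneC")
)
annot <- data.frame(
  class = factor(c("ctrl", "case")),
  age   = c(34, NA),
  row.names = c("S1", "S2")
)

# The same conditions DaMiR.makeSE tests before building the object
checks <- c(
  numeric      = is.numeric(as.matrix(counts)),
  no_na        = !any(is.na(counts)),
  whole        = isTRUE(all(counts == floor(counts))),
  non_negative = isTRUE(all(counts >= 0)),
  has_class    = "class" %in% colnames(annot),
  names_match  = identical(colnames(counts), rownames(annot))
)
print(checks)

# Missing values in the annotation only trigger a warning, not an error
annot_has_na <- any(is.na(annot))
```

If any entry of checks is FALSE, the corresponding stop() in DaMiR.makeSE would fire; a TRUE annot_has_na corresponds to the warning about missing data.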

#' @title Generate a Number to Set Seed
#'
#' @description This function implements a formula based on current
#' date and time.
#'
#' @return An integer number.
#'
#' @details The number is generated by combining current seconds (S),
#' minutes (Mi), hours (H), days (D), months (Mo), years (Y) and golden
#' ratio (\eqn{\phi}), in the form:
#'
#' \deqn{ Num = (S * Mi + H * D * Mo / Y) ^ \phi}
#'
#' @author Mattia Chiesa, Luca Piacentini
#'
#' @examples
#' gen_numb <- DaMiR.goldenDice()
#' set.seed(gen_numb)
#'
#' @export
#'
#'
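DaMiR.goldenDice <- function(){
  # second(), minute(), hour(), day(), month() and year() are date-time
  # accessors (as provided by the lubridate package)
  golden_ratio <- 1.6180339887
  now <- Sys.time()  # read the clock once so all components agree
  golden_dice <- round((
    second(now)*minute(now) +
      hour(now)*day(now)*month(now)/year(now)
    )^golden_ratio)

  return(golden_dice)
}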
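
To see how the date and time components combine, the formula can be evaluated for a fixed timestamp. This is a minimal sketch in base R, not the package's implementation: the helper golden_dice_at() and the timestamp are hypothetical, and the components are read from a POSIXlt object instead of the date-time accessors used above.

```r
# Hypothetical helper: same formula as DaMiR.goldenDice, for a given time
golden_dice_at <- function(time) {
  lt <- as.POSIXlt(time, tz = "UTC")
  golden_ratio <- 1.6180339887
  s  <- lt$sec
  mi <- lt$min
  h  <- lt$hour
  d  <- lt$mday
  mo <- lt$mon + 1      # POSIXlt months are 0-based
  y  <- lt$year + 1900  # POSIXlt years count from 1900
  round((s * mi + h * d * mo / y)^golden_ratio)
}

# Arbitrary fixed timestamp, chosen only for illustration
fixed_time <- as.POSIXct("2021-08-22 15:11:30", tz = "UTC")
seed <- golden_dice_at(fixed_time)
set.seed(seed)
```

With a fixed timestamp the result is reproducible, whereas DaMiR.goldenDice itself depends on the moment it is called.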

#' @title Matrix transposition and replacement of '.' and '-' special
#'  characters
#'
#' @description This function transposes a matrix and, in the resulting
#'  column (feature) names, replaces '.' with '__' and '-' with '_'.
#'
#' @param data Matrix of normalized expression data, i.e. transformed
#' counts by vst or rlog.
#' A log2 transformed expression matrix is also accepted
#'
#' @return A data frame of normalized data in which each row is a sample
#' and each column is a feature
#'
#' @author Mattia Chiesa, Luca Piacentini
#'
#' @examples
#' data(data_norm)
#' data.transposed <- DaMiR.transpose(assay(data_norm))
#'
#' @export
#'
#'
DaMiR.transpose <- function(data){
  # check arguments
  if (missing(data)) stop("'data' argument must be provided")
  if(!(is.matrix(data))) stop("'data' must be a matrix")

  # check the presence of NA or Inf
  if (any(is.na(data)))
    stop("NA values are not allowed in the 'data' matrix")
  if (any(is.infinite(data)))
    stop("Inf values are not allowed in the 'data' matrix")
  # specific checks
  if (all((data %%1) == 0))
    warning("It seems that you are using raw counts!
            This function works with normalized data")


  data <- as.data.frame(t(data))
  colnames(data) <- gsub(".","__",colnames(data), fixed = TRUE)
  colnames(data) <- gsub("-","_",colnames(data), fixed = TRUE)
  return(data)
}
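
The renaming step can be illustrated on a small matrix. The sketch below repeats the transposition and the two gsub() calls from DaMiR.transpose on toy data; the feature names and values are hypothetical.

```r
# Toy normalized matrix: features in rows, samples in columns
expr <- matrix(
  c(5.1, 6.3, 7.2, 8.4),
  nrow = 2,
  dimnames = list(c("ENSG0001.5", "HLA-A"), c("S1", "S2"))
)

# Transpose so that samples become rows and features become columns
transposed <- as.data.frame(t(expr))

# fixed = TRUE treats "." as a literal dot, not as a regex wildcard
colnames(transposed) <- gsub(".", "__", colnames(transposed), fixed = TRUE)
colnames(transposed) <- gsub("-", "_", colnames(transposed), fixed = TRUE)

print(colnames(transposed))
# "ENSG0001__5" "HLA_A"
```

Without fixed = TRUE the first gsub() would replace every character of each name, since "." matches any character in a regular expression.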

BioinfoMonzino/DaMiRseq documentation built on Aug. 22, 2021, 3:11 p.m.


BioinfoMonzino/DaMiRseq
Data Mining for RNA-seq data: normalization, feature selection and classification

R/helper.R
In BioinfoMonzino/DaMiRseq: Data Mining for RNA-seq data: normalization, feature selection and classification

Defines functions DaMiR.transpose DaMiR.goldenDice DaMiR.makeSE
