R/PrioSubtypeDrug.R
In SubtypeDrug: Prioritization of Candidate Cancer Subtype Specific Drugs

Documented in PrioSubtypeDrug

##' PrioSubtypeDrug
##'
##' @title Prioritization of candidate cancer subtype-specific drugs (PrioSubtypeDrug)
##' @description Integrating drug, gene, and subpathway data to identify drugs specific to cancer subtypes.
##' @param expr Matrix of gene expression values (rows are genes, columns are samples).
##' @param input.cls Input sample subtype class vector file in CLS format.
##' @param control.label In the CLS file of `input.cls`, the label of the control sample.
##' @param subpathway.list A list.  The subpathway list data is mined from KEGG data is stored in the package `SubtypeDrugData` and can be downloaded through the connection \url{https://github.com/hanjunwei-lab/SubtypeDrugData}.
##'  The gene tags included in the subpathway list data should be consistent with those in the gene expression profile. The package `SubtypeDrugData` provides two choices that include the Entrezid and Symbol tags of the gene.
##'  Users can also enter their own pathway or gene set list data.
##' @param spw.min.sz Removes subpathways that contain fewer genes than `spw.min.sz` (default: 10).
##' @param spw.max.sz Removes subpathways that contain more genes than `spw.max.sz` (default: Inf).
##' @param spw.score.method Method to employ in the estimation of subpathway
##' enrichment scores per sample. By default this is set to `gsva` (Hänzelmann
##' et al, 2013) and other options are `ssgsea` (Barbie et al, 2009).
##' @param kcdf Character string denoting the kernel to use during the
##' non-parametric estimation of the cumulative distribution function of
##' expression levels across samples when `spw.score.method="gsva"`. By default,
##' `kcdf="Gaussian"` which is suitable when input expression values are
##' continuous, such as microarray fluorescent units in logarithmic scale,
##' RNA-seq log-CPMs, log-RPKMs or log-TPMs. When input expression values are
##' integer counts, such as those derived from RNA-seq experiments, then this
##' argument should be set to `kcdf="Poisson"`.
##' @param drug.spw.data A list data of drug regulation. The drug subpathway association data we constructed is stored in package `SubtypeDrugData` and can be downloaded via connection \url{https://github.com/hanjunwei-lab/SubtypeDrugData}.
##' If the input is user-defined drug regulation data, the data should be a list data with each drug as its element. Each drug also contains `Target_upregulation` and `Target_downregulation` subpathway or gene set.
##' Subpathway or gene set contained in drug regulation data should exist in input data of parameter `subpathway.list`.
##' @param drug.spw.p.val.th Parameter used only when `drug.spw.data="DrugSpwData"`. According to the threshold of the significant P value
##' set by parameter `drug.spw.p.val.th` (default: 0.05), the drug up-regulation and down-regulatory subpathways were screened.
##' @param drug.spw.min.sz A numeric. The drug regulated subpathways intersects with the subpathways in the
##' subpathway activity profile. Then drugs with less than `drug.spw.min.sz` (default: 10) up- or down-regulated subpathways are removed.
##' @param drug.spw.max.sz A numeric. Similar to parameter `drug.spw.min.sz`, drugs with more
##' than `drug.spw.max.sz` (default: Inf) up- or down-regulated subpathways are removed.
##' @param weighted.drug.score A boolean values determines the method for
##' calculating the normalized drug-disease reverse association score of the drug for each sample.
##' `weighted.drug.score=TRUE` (default): KS random walk statistic with individualized subpathway activity aberrance score as weight was used to
##' calculate the normalized drug-disease reverse association score.
##' `weighted.drug.score=FALSE`: Similar to `CMap` (Lamb et al., 2006), no weight is needed, and the normalized drug-disease
##' reverse association score is calculated by the rank of the individualized subpathway activity aberrance score.
##' @param nperm Number of random permutations (default: 1000).
##' @param parallel.sz Number of processors to use when doing the calculations in
##' parallel (default value: 1). If parallel.sz=0,
##' then it will use all available core processors unless we set this argument
##' with a smaller number.
##' @param E_FDR Significance threshold for E_FDR for drugs (default: 0.05)
##' @param S_FDR Significance threshold for S_FDR for drugs (default: 0.001)
##' @details
##' First, the function PrioSubtypeDrug uses the `GSVA` or `ssgsea` method to
##' convert the disease gene expression profile into subpathway activity profile. Parameters `subpathway.list`, `spw.min.sz` and `spw.max.sz` are used
##' to process the subpathway list data. `spw.score.method` and `kcdf` are used to control the method of constructing the subpathway activity score
##' profile.
##'   Individualized subpathway activity aberrance score was estimated using the mean and standard deviation of the Control samples.
##' Subpathways of each cancer sample are ordered in a ranked list according to individualized subpathway activity aberrance score.
##' Next, we calculate the normalized drug-disease reverse association score by enriching drug regulated subpathway tags to the subpathway ranked list.
##' Finlly, all drug-regulated subpathways are enriched into each cancer sample to obtain a normalized drug-disease reverse association score matrix.
##' The `drug.spw.p.val.th`, `drug.spw.min.sz` and `drug.spw.max.sz` is used to screen the drug regulated subpathway set.
##'   If user-defined drug targeting data is used, drug regulated `Target_upregulation` and `Target_downregulation` should already be defined in the data.
##' The `weighted.drug.score` to control the method of calculating the normalized drug-disease reverse association score.
##'   Finally, empirical sample-based permutation test procedure to obtain significative cancer subtype specific drugs.
##' For samples containing only cancer and Control, the subpathways are ranked according to the difference in activity between cancer and Control samples.
##' Subsequently, the subpathway set of drug up- and down-regulated is enriched to the ranking list of subpathway to evaluate the normalized drug-disease reverse association score and
##' subpathway-based permutation test procedure to calculate significance.
##' The subpathway list data and drug subpathway associated data set is stored in package `SubtypeDrugData` and
##' can be obtained on \url{https://github.com/hanjunwei-lab/SubtypeDrugData}.
##' @return A list contains the result table of drug scoring and significance, a subpathway activity score matrix, a normalized drug-disease reverse
##' association score matrix, sample information, and user set parameter information.
##' @author Xudong Han,
##' Junwei Han,
##' Chonghui Liu
##' @examples
##' require(GSVA)
##' require(parallel)
##' ## Get simulated breast cancer gene expression profile data.
##' Geneexp<-get("Geneexp")
##' ## Obtain sample subtype data and calculate breast cancer subtype-specific drugs.
##' Subtype<-system.file("extdata", "Subtype_labels.cls", package = "SubtypeDrug")
##'
##' ## Subpathway list data and drug subpathway association data
##' ## were stored in packet `SubtypeDrugData`.
##' ## `SubtypeDrugData` has been uploaded to the github repository.
##' ## If subpathway list data and drug subpathway association data are needed,
##' ## users can download and install through `install_github` function and
##' ## set parameter url=""hanjunwei-lab/SubtypeDrugData".
##' ## After installing and loading package `SubtypeDrugData`,
##' ## users can use the following command to get the data.
##' ## Get subpathway list data.
##' ## If the gene expression profile contains gene Symbol.
##' ## data(SpwSymbolList)
##' ## If the gene expression profile contains gene Entrezid.
##' ## data(SpwEntrezidList)
##' ## Get drug subpathway association data.
##' ## data(DrugSpwData)
##'
##' ## Identify breast subtype-specific drugs.
##' ## Subtype_drugs<-PrioSubtypeDrug(Geneexp,Subtype,"Control",SpwSymbolList,drug.spw.data=DrugSpwData,
##' ##                                       E_FDR=1,S_FDR=1)
##'
##' ## Identify breast cancer-related drugs in only two types of samples: breast cancer and control.
##' Cancer<-system.file("extdata", "Cancer_normal_labels.cls", package = "SubtypeDrug")
##' ## Disease_drugs<-PrioSubtypeDrug(Geneexp,Cancer,"Control",SpwSymbolList,drug.spw.data=DrugSpwData,
##' ##                                       E_FDR=1,S_FDR=1)
##'
##' ## The function PrioSubtypeDrug() can also support user-defined data.
##' Geneexp<-get("GeneexpT")
##' ## User-defined drug regulation data should resemble the structure below
##' UserDS<-get("UserDST")
##' str(UserDS)
##' ## Need to load gene set data consistent with drug regulation data.
##' UserGS<-get("UserGST")
##' str(UserGS)
##' Drugs<-PrioSubtypeDrug(Geneexp,Cancer,"Control",UserGS,spw.min.sz=1,
##'                        drug.spw.data=UserDS,drug.spw.min.sz=1,
##'                        nperm=10,E_FDR=1,S_FDR=1)
##' @importFrom parallel parLapply
##' @importFrom parallel detectCores
##' @importFrom parallel makeCluster
##' @importFrom parallel clusterExport
##' @importFrom parallel stopCluster
##' @importFrom GSVA gsva
##' @importFrom stats p.adjust
##' @importFrom stats pnorm
##' @export

PrioSubtypeDrug<-function(expr,input.cls="",control.label="",subpathway.list,
                spw.min.sz=10,spw.max.sz=Inf,spw.score.method="gsva",kcdf="Gaussian",
                drug.spw.data,drug.spw.p.val.th=0.05,
                drug.spw.min.sz=10,drug.spw.max.sz=Inf,
                weighted.drug.score=TRUE,nperm=1000,parallel.sz=1,E_FDR=0.05,S_FDR=0.001){

  haveGSVA <- isPackageLoaded("GSVA")
  if(haveGSVA==FALSE){
    stop("The 'GSVA' library, should be loaded first")
  }
  haveParallel <- isPackageLoaded("parallel")
  if(haveParallel==FALSE){
    stop("The 'parallel' library, should be loaded first")
  }
  if(spw.score.method=="ssgsea"){
    gsvaPar <- GSVA::ssgseaParam(expr, subpathway.list)
  }else{
    gsvaPar <- GSVA::gsvaParam(expr, subpathway.list,kcdf=kcdf,minSize=spw.min.sz,maxSize=spw.max.sz)
  }

  spw_matrix_y<-gsva(gsvaPar)
  spw_matrix_rnames<-row.names(spw_matrix_y)
  sampleid<-colnames(spw_matrix_y)

  if(is.list(input.cls)) {
    CLS <- input.cls
  } else {
    CLS <- ReadClsFile(file=input.cls)
  }
  phen <- CLS$phen
  samples.v1 <-CLS$class.labes
  SmaplePhenotype<-data.frame(sampleId=sampleid,sampleSubtype=samples.v1,stringsAsFactors =FALSE)
  control_index<-which(samples.v1==control.label)
  phen<-phen[phen!=control.label]
  samples.v<-samples.v1[-control_index]

  drug_target_data<-getDrugSpw(drug.spw.data,spw_matrix_rnames,drug.spw.p.val.th,drug.spw.min.sz,drug.spw.max.sz)

  up_signature1<-sapply(drug_target_data$Target_upregulation, function(x){
        pj<-paste(x,collapse =" ")
      })
  down_signature1<-sapply(drug_target_data$Target_downregulation, function(x){
        pj<-paste(x,collapse =" ")
      })
  Parameters<-list(control.label=control.label,spw.min.sz=spw.min.sz,spw.max.sz=spw.max.sz,spw.score.method=spw.score.method,kcdf=kcdf,
                    drug.spw.p.val.th=drug.spw.p.val.th,drug.spw.min.sz=drug.spw.min.sz,drug.spw.max.sz=drug.spw.max.sz,weighted.drug.score=weighted.drug.score,nperm=nperm,parallel.sz=parallel.sz)

   if(length(phen)==1){
      fc<-apply(spw_matrix_y, 1, function(x){
        foldc<-mean(x[which(samples.v1!=control.label)])-mean(x[control_index])
        return(foldc)
      })
      drugs<-getDrugMatrix(fc,drug_target_data,weighted.drug.score)
      drugs1<-drugs
      zdz<-max(drugs)
      zxz<-min(drugs)
      for(i in 1:length(drugs)){
       if(drugs[i]>0){
         drugs[i]<-drugs[i]/abs(zdz)
       }else{
          drugs[i]<-drugs[i]/abs(zxz)
        }
     }

      if(parallel.sz==1){
        fc1<-fc
        rdmean_matrix<-NULL
        for(n in 1:nperm){
          rdsamples<-sample(c(1:length(fc1)),size = length(fc1),replace = FALSE)
          names(fc1)<-names(fc1)[rdsamples]
          rdmatrix<-getDrugMatrix(fc1,drug_target_data,weighted.drug.score)
          rdmean_matrix<-cbind(rdmean_matrix,rdmatrix)
        }
      }else{
        if(parallel.sz==0){
         nCores <- parallel::detectCores()
        }else{
         nCores <- parallel.sz
      }
      cl <- makeCluster(nCores)
      clusterExport(cl,c("CalculateSES","getDrugMatrix"))
      rdmean_matrix<-parLapply(cl,1:nperm,function(n,fc1,weighted.drug.score,drug_target_data){
        rdsamples<-sample(c(1:length(fc1)),size = length(fc1),replace = FALSE)
        names(fc1)<-names(fc1)[rdsamples]
        rdmatrix<-getDrugMatrix(fc1,drug_target_data,weighted.drug.score)
        return(rdmatrix)
      },fc,weighted.drug.score,drug_target_data)
      stopCluster(cl)
      rdmean_matrix<-do.call("cbind",rdmean_matrix)
      }

      p_values<-sapply(c(1:length(drugs1)), function(i){
        s_t<-abs(drugs1[i])
        p.val<-pnorm(-s_t,mean = mean(rdmean_matrix[i,]), sd = sd(rdmean_matrix[i,]),lower.tail = TRUE)+pnorm(s_t,mean = mean(rdmean_matrix[i,]), sd = sd(rdmean_matrix[i,]),lower.tail = F)
        return(p.val)
      })

      fdr<-p.adjust(p_values,"BH",length(p_values))

      result<-data.frame(Drug=names(drugs),
                          Target_upregulation=up_signature1,
                          Target_downregulation=down_signature1,
                          SDS=signif(drugs,digits=3),E_Pvalue=signif(p_values,digits=3),E_FDR=signif(fdr,digits=3),stringsAsFactors=FALSE)
      result<-result[which(result[,6]<=E_FDR),]
      result<-list(result,spw_matrix_y,SmaplePhenotype,Parameters)
      names(result)<-c(phen,"SubpathwayMatrix","SampleInformation","Parameter")
      return(result)
   }else{

     sy_spw_matrix<-AccumulateNormal(spw_matrix_y,control_index)

     drug_sample_matrix<-getDrugMatrix(sy_spw_matrix,drug_target_data,weighted.drug.score)
     drug_sample_matrix<-t(apply(drug_sample_matrix,1,function(x){
      zdz<-max(x)
      zxz<-min(x)
      for(i in 1:length(x)){
       if(x[i]>0){
         x[i]<-x[i]/abs(zdz)
       }else{
          x[i]<-x[i]/abs(zxz)
        }
      }
       return(x)
     }))

     pn<-length(phen)
     drug_true_s<-sapply(1:pn, function(p){
       phenindex<-which(samples.v==phen[p])
       DCS<-apply(drug_sample_matrix[,phenindex], 1,mean)
       return(DCS)
     })


     if(parallel.sz==0){
       nCores <- parallel::detectCores()
     }else{
       nCores <- parallel.sz
     }
     cl <- makeCluster(nCores)
     rdmean_matrix<-parLapply(cl,1:nperm,function(n,drug_matrix,samples.v,phen){
        rdsamples<-sample(samples.v,size = length(samples.v),replace = FALSE)
        rdmeans<-NULL
        for(i in 1:length(phen)){
          ppindex<-which(rdsamples==phen[i])
          rdmeans<-cbind(rdmeans,apply(drug_matrix[,ppindex], 1, mean))
        }
        return(rdmeans)
      },drug_sample_matrix,samples.v,phen)

      rdmean_matrix<-do.call("cbind",rdmean_matrix)

      sub_fc<-NULL
      sub_score<-NULL
      for(p in 1:length(phen)){
        fc<-apply(spw_matrix_y, 1, function(x){
          foldc<-mean(x[which(samples.v1==phen[p])])-mean(x[which(samples.v1==control.label)])
          return(foldc)
       })
        sub_score<-cbind(sub_score,getDrugMatrix(fc,drug_target_data,weighted.drug.score))
        sub_fc<-cbind(sub_fc,fc)
      }

      clusterExport(cl,c("CalculateSES","getDrugMatrix"))
      subrd_matrix<-parLapply(cl,1:nperm,function(n,fc1,weighted.drug.score,drug_target_data){
        row.names(fc1)<-sample(row.names(fc1),size = nrow(fc1),replace = FALSE)
        rdmatrix<-apply(fc1,2,function(x){
          return(getDrugMatrix(x,drug_target_data,weighted.drug.score))
        })
        return(rdmatrix)
      },sub_fc,weighted.drug.score,drug_target_data)
      subrd_matrix<-do.call("cbind",subrd_matrix)
      stopCluster(cl)

      sub_p<-NULL
      sub_fdr<-NULL
      for(p in 1:length(phen)){
        if(p==length(phen)){
          sx<-which(c(1:ncol(subrd_matrix))%%length(phen)==0)
        }else{
          sx<-which(c(1:ncol(subrd_matrix))%%length(phen)==p)
        }
        subrd_matrix1<-subrd_matrix[,sx]
        sub_p1<-sapply(c(1:nrow(sub_score)), function(i){
          s_t<-abs(sub_score[i,p])
          p.val<-pnorm(-s_t,mean = mean(subrd_matrix1[i,]), sd = sd(subrd_matrix1[i,]),lower.tail = TRUE)+pnorm(s_t,mean = mean(subrd_matrix1[i,]), sd = sd(subrd_matrix1[i,]),lower.tail = F)
          return(p.val)
        })
         sub_fdr1<-p.adjust(sub_p1,"BH",length(sub_p1))
         sub_p<-cbind(sub_p,sub_p1)
         sub_fdr<-cbind(sub_fdr,sub_fdr1)
      }

      colnames(drug_true_s)<-phen
      p_values<-sapply(c(1:nrow(drug_true_s)),function(d){
        p_val_v<-NULL
        for(i in 1:pn){
          s_t<-abs(drug_true_s[d,i])
          p.val<-pnorm(-s_t,mean = mean(rdmean_matrix[d,]), sd = sd(rdmean_matrix[d,]),lower.tail = TRUE)+pnorm(s_t,mean = mean(rdmean_matrix[d,]), sd = sd(rdmean_matrix[d,]),lower.tail = F)
          p_val_v<-c(p_val_v,p.val)
        }
        return(p_val_v)
     })

      result<-list()
      for(i in 1:pn){
        drug_true_s1<-signif(drug_true_s[,i],digits=3)
        fdr<-p.adjust(p_values[i,],"BH",ncol(p_values))
        result1<-data.frame(Drug=row.names(drug_sample_matrix),
                         Target_upregulation=up_signature1,
                         Target_downregulation=down_signature1,
                         SDS=drug_true_s1,E_Pvalue=sub_p[,i],E_FDR=sub_fdr[,i],S_Pvalue=signif(p_values[i,],digits=3),S_FDR=signif(fdr,digits=3),stringsAsFactors=FALSE)
        result[[i]]<-result1[which(result1[,6]<=E_FDR&result1[,8]<=S_FDR),]
      }

    result<-c(result,list(drug_sample_matrix,spw_matrix_y,SmaplePhenotype,Parameters))
    names(result)<-c(phen,"DrugMatrix","SubpathwayMatrix","SampleInformation","Parameter")
    return(result)
   }
}