# R/sum_str.R
# Defines functions: sum_str, sum_str_helper

#' Summarize the code structure
#'
#' Create a summary of one or multiple code files based on the section
#' separators and their titles.
#' @param path_in Either a path to a directory or to a single file. If it is
#'   a directory path, all files in that directory will be summarised. If it
#'   is a single file path, only the respective file will be summarised.
#'   The default value uses the RStudio API to produce a summary of content from
#'   the source editor. This requires that the file is saved before
#'   \code{sum_str} is called.
#' @param file_in_extension If \code{path_in} is a directory, only files with
#'   the extension \code{file_in_extension} are considered. Defaults to ".R".
#' @param dir_out The directory to print the output to. "" implies the console.
#' @param file_out A connection or character string naming the file to print to.
#'   If set to \code{NULL}, the name will be
#'   \code{paste0("code_summary-", file_in)}, where \code{file_in} is the name
#'   of the summarised file. If \code{dir_out} is set to "", \code{file_out}
#'   can be set to "object" and the output of the function will be returned as
#'   an object instead of being printed to the console. This is useful if the
#'   output should be assigned to an object. If not set to "object",
#'   \code{cat} will be used.
#' @param file_out_extension A file extension for the file to be created.
#' @param rm_rh_hashes Boolean value indicating whether or not to remove
#'   right-hand hashes in section titles for the summary
#'   (see section Removing spaces and hashes).
#' @param rm_rh_spaces Boolean value indicating whether or not to remove
#'   right-hand spaces in section titles for the summary
#'   (see section Removing spaces and hashes).
#' @param rm_break_anchors Boolean value indicating whether or not the anchors
#'   inserted in code separators should be removed for the summary.
#' @param width The character width of the output. If NULL, it is set to the
#'   length of the longest separator title.
#' @param line_nr A boolean value that indicates whether the line numbers should
#'   be printed along with the structure summary.
#' @param granularity Indicates the lowest level of granularity that should be
#'   included in the summary.
#' @param last_sep A boolean value indicating whether or not the separating
#'   lines of the highest granularity should be printed.
#' @param title A boolean value indicating whether the reported summary should
#'   contain a title or not.
#' @param header A boolean value indicating whether a column header should
#'   indicate the name of the columns (line, level, section).
#' @param rdf Either \code{FALSE} or \code{"ttl"}, indicating whether an RDF
#'   file (in Turtle format) should be generated.
#' @param graph A boolean value indicating whether an RDF graph should be generated.
#' @param domain A boolean value indicating whether to use the user's working
#'   domain as a prefix header in the generated RDF file.
#' @param baseURI A character string naming the URI of the user's working domain.
#' @param UserID A character string naming the user name in the working domain.
#' @param prefix A character string naming the abbreviation for the user's domain.
#' @param UserAL A boolean value indicating whether to use the default
#'   association library.
#' @param fillAssociation A boolean value indicating whether to use a pair of
#'   default associations to name associations that are not listed in the
#'   user's default association library.
#' @param UserANM A boolean value indicating whether to use the default
#'   association list. Only associations in this list can serve as a
#'   relationship in the output RDF graph.
#' @param HideAssociation A boolean value indicating whether associations
#'   should be hidden in the output.
#' @param ... further arguments to be passed from and to other methods, in
#'   particular \code{\link{list.files}} for reading in multiple files.
#'
#' @details To create the summary, \code{sum_str} uses regular expressions.
#'   Hence it is crucial that the code separators and the separator titles
#'   match the regular expression pattern. We recommend inserting
#'   separators and their titles using the RStudio Add-in that is contained
#'   in this package. The definition is rather intuitive as can be seen in the
#'   example section below. However, we shall provide a formal definition here
#'   as well.
#'   \itemize{
#'     \item A code separator is defined as a line that starts with n hashes,
#'     followed by 8-n spaces where 0 < n < 8. This sequence is followed by one
#'     or more either \code{.} or \code{_}.
#'     \item A title associated with a code separator is defined as a line
#'     that starts with n hashes, followed by 8-n spaces where 0 < n < 8. This
#'     sequence is \emph{not} followed by \code{.} or \code{_}.
#'   }
#'   Lines that do not satisfy these requirements (e.g. they do not start with
#'   hashes, do not contain the right number of spaces after the hashes, or are
#'   indented) are not considered by \code{sum_str}.
#' @section Removing spaces and hashes:
#'   The add-in contained in this package inserts section titles in a way that
#'     they are recognised by RStudio as sections (for details, see
#'     \href{https://support.rstudio.com/hc/en-us/articles/200484568-Code-Folding-and-Sections}{RStudio's official website}). One structure that is
#'     recognised by RStudio as a section is a line starting with a hash and
#'     ending with four hashes. This structure is implemented in \code{strcode}.
#'     Hence when creating the summary, it might be desired to remove the right
#'     hand hashes and spaces, which can be specified with the respective options
#'     \code{rm_rh_hashes} and \code{rm_rh_spaces}.
#' @seealso insert_l_break
#' @examples
#' # the following separator is an example of a valid
#' # separator and associated title
#'
#' #       __________________________________________________
#' #       this is a level 1 title                       ####
#' ##  . . . . . . . . . . . . . . . . . . . . . . . . .
#' ##  note that the title or the separator character (_, .)
#' ##  always starts at indention 8.
#' # to create separators that are valid for generating an rdf file,
#' # users should fill out at least the title and class, and put the entities
#' # on the correct levels so that the function can create associations
#' # automatically.
#'
#' # the following separator is an example of a valid
#' # separator and associated title for generating rdf file
#'
#' #       ________________________________________________________________________
#' #       YourWorkflow {YourID provone:Workflow}                              ####
#'
#' \dontrun{
#' # Open a new .R file in RStudio, insert some code breaks
#' # using the Add-in of this package, save the file and run:
#' sum_str() # get a summary of the source editor.
#' }
#' @importFrom rstudioapi getSourceEditorContext
#' @export
#'
#   ____________________________________________________________________________
#   user function                                                           ####
sum_str <- function(path_in = getSourceEditorContext()$path,
                    file_in_extension = ".R",
                    dir_out = "",
                    file_out = NULL,
                    file_out_extension = "",
                    width = NULL,
                    rm_rh_hashes = TRUE,
                    rm_rh_spaces = TRUE,
                    rm_break_anchors = TRUE,
                    line_nr = TRUE,
                    granularity = 3,
                    last_sep = FALSE,
                    title = TRUE,
                    header = TRUE,
                    rdf = FALSE,
                    graph = FALSE,
                    domain = FALSE,
                    baseURI = "http://example.org/base/",
                    UserID = "UserID",
                    prefix = "user",
                    UserAL = FALSE,
                    fillAssociation = FALSE,
                    UserANM = FALSE,
                    HideAssociation = FALSE,
                    ...) {

##  ............................................................................
##  assertive tests                                                         ####
  assert_number(granularity, lower = 1, upper = 3)

### . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
### prepare input to call helper repeated times                             ####
  # check whether path_in is a directory or a file
  if (is.na(file.info(path_in)$isdir)) {
    stop("Indicated path (", path_in, ") is neither a directory nor a valid file name")
  } else if (file.info(path_in)$isdir) {
    # path_in is a directory: collect all matching files
    all_files <- list.files(path = path_in,
                            pattern = paste0(file_in_extension, "$"),
                            full.names = FALSE)
    # prepend the directory so the file paths are complete
    all_files <- paste(path_in, all_files, sep = "/")
    if (length(all_files) == 0) {
      warning("there are no files in the directory")
    }
  } else {
    # path_in is already a file
    all_files <- path_in
  }

### . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
### call helper                                                             ####
  # if the output is not printed to the console, print a short summary
  if (dir_out != "") {
    cat("The following files were summarized \n")
  }
  output <- lapply(all_files, function(g) {

    # pass all arguments as is except the path_in
    if (dir_out != "") {
      cat(g, sep = " \n")
    }
    sum_str_helper(path_in = g,
                   dir_out = dir_out,
                   file_out = file_out,
                   file_out_extension = file_out_extension,
                   width = width,
                   rm_rh_hashes = rm_rh_hashes,
                   rm_rh_spaces = rm_rh_spaces,
                   rm_break_anchors = rm_break_anchors,
                   line_nr = line_nr,
                   granularity = granularity,
                   last_sep = last_sep,
                   title = title,
                   header = header,
                   rdf = rdf,
                   graph = graph,
                   domain = domain,
                   baseURI = baseURI,
                   UserID = UserID,
                   prefix = prefix,
                   UserAL = UserAL,
                   fillAssociation = fillAssociation,
                   UserANM = UserANM,
                   HideAssociation = HideAssociation)
  })

  if (dir_out == "" && !is.null(file_out) && file_out == "object") {
    output
  } else {
    invisible() # avoid unnecessary NULL return
  }
}
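The separator and title definitions from the roxygen details above can be checked with two regular expressions. The snippet below is a standalone sketch, not part of the package; the two patterns are assumptions derived from the documented "n hashes followed by 8 - n spaces" rule, shown here for level n = 1.

```r
# Standalone sketch of the documented separator/title definitions (level n = 1):
# a separator starts with n hashes and 8 - n spaces followed by "." or "_",
# a title starts with n hashes and 8 - n spaces followed by anything else.
lines <- c(
  "#       ______________________________",  # level 1 separator
  "#       this is a level 1 title   ####",  # level 1 title
  "x <- 1"                                   # ordinary code, ignored
)
n <- 1
sep_pattern   <- paste0("^#{", n, "}\\s{", 8 - n, "}[._]+")
title_pattern <- paste0("^#{", n, "}\\s{", 8 - n, "}[^\\s._]")
grepl(sep_pattern, lines, perl = TRUE)    # TRUE FALSE FALSE
grepl(title_pattern, lines, perl = TRUE)  # FALSE TRUE FALSE
```

Varying `n` from 1 to 7 gives the patterns for the other levels.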
#   ____________________________________________________________________________
#   helper function: sum_str_helper                                         ####
#' Return the code summary for one file
#'
#' Helper function for code summarisation. It is called by \code{sum_str()}
#' and returns the summary of one code file.
#' @inheritParams sum_str
#' @details The core of the function is best described as follows: after a file
#' is read in and stored in a vector *lines*, where each element holds one
#' line of code, the candidate lines (in the sense that they might be contained
#' in the summary) are identified and their indices stored in *cand*. Next,
#' various regex patterns are matched against *lines*. Then, after all tests
#' have been executed, the variable *remove* contains all indices that should
#' be dropped from the candidates. Hence, \code{setdiff(cand, remove)} yields
#' the indices of the subset of *lines* that is finally returned as the summary
#' of the code file.
#' @keywords internal
#' @import checkmate
sum_str_helper <- function(path_in,
                           dir_out,
                           file_out,
                           file_out_extension,
                           rm_rh_hashes,
                           rm_rh_spaces,
                           rm_break_anchors,
                           width,
                           line_nr,
                           granularity,
                           last_sep,
                           title,
                           header,
                           rdf,
                           graph,
                           domain,
                           baseURI,
                           UserID,
                           prefix,
                           UserAL,
                           fillAssociation,
                           UserANM,
                           HideAssociation) {

##  ............................................................................
##  argument interaction                                                    ####
### .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
### get the file_out together
  if (is.null(file_out)) {
    # file_out must be a file name: grab the part after the last slash
    file_name <- gsub("^.*/(.*)", "\\1", path_in, perl = TRUE)
    file_out <- paste0("code_summary-",
                       gsub("^(.*)\\..*$", "\\1", file_name, perl = TRUE),
                       file_out_extension)
  }

### .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
### paths

  # path_out
  ## path_out is "" if dir_out is ""
  if (dir_out == "") {
    path_out <- ""
  # otherwise it is composed of dir_out and file_out
  } else {
    path_out <- paste(dir_out, file_out, sep = "/")
  }



##  ............................................................................
##  get pattern

  lines <- readLines(con = path_in)
  sub_pattern <- "^\\s*#\\s*@\\w" # matches lines like "# @keyword"
  cand <- grep(sub_pattern, lines, perl = FALSE) # extract candidate indices
  if (length(cand) == 0) {
    return(warning("No line matching the required pattern",
                   call. = FALSE, immediate. = TRUE))
  }

##  .................. #< 3b5746a13447c5269736b631d6a9370d ># ..................
##  replace hashed seps                                                     ####
#if (rm_break_anchors) {
  # extract candidates for replacement
  #hash_candid <- intersect(grep("(\\s#<\\s[0-9a-z]{1,33}\\s>#\\s)", lines, perl = TRUE),
  #                         cand)
  # get their level
#  lvl <- nchar(gsub("^(#+)\\s.*$", "\\1", lines[hash_candid], perl = TRUE))
#  replacement <- vapply(lvl, function(x) help_create_break(start = paste0(rep("#", x), collapse = ""),
#                                           break_char = give_breakchar(x),
#                                           sep = paste(rep(" ", 8 - x), collapse = ""), anchor_in_sep = FALSE),
#                  FUN.VALUE = character(1))
#  lines[hash_candid] <- replacement

#}
##  ............................................................................
##  modify pattern according to arguments
### .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
### getting the granularity right
  remove <- c()
  #if (granularity < 3) { # if there are any lines to remove
  #  hashes <- (granularity + 1):3
  #  spaces <-  8- hashes

    # this variable stores the indices of all lines that should be dropped.
    #for (i in 1:length(hashes)) {
    #  sub_pattern <- paste0("^#{", hashes[i], "}\\s{", spaces[i], "}.*$")
    #  remove <- append(remove, grep(sub_pattern, lines, perl = TRUE))
    #}
  #}

### .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
### remove last separator
  #if (last_sep == FALSE) {
  #  hashes <- min(find_gran("down", lines = lines), granularity)
  #  spaces <- 8 - hashes
  #  sub_pattern <- paste0("^#{", hashes, "}\\s{", spaces, "}[\\._].*$")
  #  remove <- append(remove, grep(sub_pattern, lines, perl = TRUE))
  #}

##  ............................................................................
##  select elements that "survived all tests"
  #tokeep <- setdiff(cand, remove)
  #lines <- lines[tokeep]
  lines <- lines[cand]
### .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
### width adjust line_nr, title, output path, header

  # only continue if there is a valid pattern
  #if (length(lines) == 0) {
  #  return(warning("No line matching the required pattern",
  #                 call. = FALSE, immediate. = TRUE))
  #}

  # issue warning if there are no titles
  pattern_titles <- grep("^(#   |##  |### )[^\\._)].*$", lines, value = TRUE)
  #if (length(pattern_titles) == 0) { # if there were no titles
  #  warning("There are no segment titles.",
  #          call. = FALSE, immediate. = TRUE)
  #}

  # remove right hand hashes if desired
  if (rm_rh_hashes) {
    lines <- gsub("####$", "    ", lines, perl = TRUE)
  }

  # remove right hand spaces if desired
  if (rm_rh_spaces) {
    lines <- gsub("\\s*$", "", lines, perl = TRUE)
  }

  # adjust length of pattern.
  if (is.null(width)) {
  # first calculate width. It is the length of the maximal title string
  ## get the title strings
    if (length(pattern_titles) == 0) { # if there were no titles
      width <- options()$strcode$char_length
    #  warning("width set to options()$strcode.char.length",
    #          call. = FALSE, immediate. = TRUE)
    } else { # if there were titles
      width <- max(nchar(pattern_titles))
    }
  }

  lines <- substring(lines, 1, width)

  #if (line_nr == TRUE) {
    #lines <- paste(tokeep, lines, sep = "\t")
  #}

  #if (header == TRUE) {
  #  lines <- append(c("line  level section"), lines)
  #}
  #if (title == TRUE) {
  #  lines <- append(paste0("Summarized structure of ", path_in, "\n"), lines)
  #}

##  ............................................................................
##  output the pattern

  if (rdf == "ttl" || graph) { # if the user wants an RDF file or an RDF graph
    # use the system time to build a unique output file name
    datetime <- format(Sys.time(), "%Y_%m_%d_%H_%M_%S")
    fileformat <- ".ttl"
    outputfile2 <- paste0("RDF_output_file_", datetime, fileformat)
    write(lines, file = outputfile2)

    templines <- readLines(outputfile2)
    lines_content <- templines[4:length(templines)]
    lines_split <- strsplit(lines_content, " ")

    FullURI <- paste0(baseURI, UserID, "/")

    schemalist=list()
    # a schemas library to create headers by using keywords in content
    schemas=c(xsd="@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .",
          owl="@prefix owl:     <http://www.w3.org/2002/07/owl#> .",
          dcterms="@prefix dcterms: <http://purl.org/dc/terms/> .",
          prov="@prefix prov:    <http://www.w3.org/ns/prov#> .",
          wfms="@prefix wfms:    <http://www.wfms.org/registry.xsd> .",
          rdf="@prefix rdf:       <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
          provone="@prefix provone: <http://dataone.org/ns/provone#> .",
          skos="@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .",
          yw="@prefix yw:      <http://yesworkflow.org/ns/yesworkflow#> .",
          cwfo="@prefix cwfo:    <http://cwf.tw.rpi.edu/vocab#> .",
          cwf="@prefix cwf:     <http://cwf.tw.rpi.edu/data#> .")
    # Using ":" to find class
    for (i in 1:length(lines_split)){
     schemalist[[i]]=grep(":",lines_split[[i]])
    }
    tempcount0=0
    schemalist1=list()
    for (i in 1:length(schemalist)){
      if (length(schemalist[[i]])>0){
        tempcount0=tempcount0+1
        schemalist1[[tempcount0]]=schemalist[[i]]
      }
    }
    # delete the separator line and keep only the useful information
    tempcount0=0
    lines_split1=list()
    for (i in 1:length(schemalist)){
      if (length(schemalist[[i]])>0){
        tempcount0=tempcount0+1
        lines_split1[[tempcount0]]=lines_split[[i]]
      }
    }

    lines_split=lines_split1
    schemalist=schemalist1
    lines_split1=schemalist
    for (i in 1:length(lines_split)){
      tempcount0=0
      for (j in 1:length(lines_split[[i]])){
        if (nchar(lines_split[[i]][j])>0){
          tempcount0=tempcount0+1
          lines_split1[[i]][tempcount0]=lines_split[[i]][j]
        }
      }
    }
    # delete the symbols that were used in the separator line
    for (i in 1:length(lines_split1)){
      schemalist[[i]]=grep(":",lines_split1[[i]])
    }
    lines_split=lines_split1

    infolist <- lines_split
    for (i in 1:length(infolist)){
      templevel=strsplit(infolist[[i]][1],"\\t")
      infolist[[i]][1]=nchar(templevel[[1]][2])
    }
    for (i in 1:length(infolist)){
      infolist[[i]][3]=gsub("\\{","",infolist[[i]][3])
      infolist[[i]][3]=gsub("\\#","",infolist[[i]][3])
    }
    for (i in 1:length(infolist)){
      infolist[[i]][length(infolist[[i]])]=gsub("\\}","",infolist[[i]][length(infolist[[i]])])
    }
    infolist1=infolist

   # drop the JSON-LD style structure and keep only the useful information
    for (i in grep("\":",infolist)){
      infolist[[i]][2]=gsub(",","",infolist1[[i]][4])
      infolist[[i]][3]=gsub(",","",infolist1[[i]][2])
      infolist[[i]][4]=gsub(",","",infolist1[[i]][3])
    }

    for (j in 1:length(infolist)){
      for (i in 1:length(strsplit(infolist[[j]], "\":"))){
        if (length(strsplit(infolist[[j]], "\":")[[i]])>1){
          infolist1[[j]][i]=strsplit(infolist[[j]], "\":")[[i]][2]
      }
    }
    }
    infolist <- infolist1

    schemahad=0
    lines_rdf=" @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .\n"
    count0=1

    schemalist1=list()
    for (i in 1:length(infolist)){
      schemalist1[[i]]=grep(":",infolist[[i]])
    }
    schemalist=schemalist1

    # collect each prefix that is actually used in the content
    for (i in 1:length(schemalist)){
      for (j in 1:length(schemalist[[i]])){
        tempstr <- infolist[[i]][(schemalist[[i]])[j]]
        tempschemastr <- gsub("\\.", "", strsplit(tempstr, "\\:")[[1]][1])
        if (!(tempschemastr %in% schemahad)) {
          schemahad[count0] <- tempschemastr
          count0 <- count0 + 1
        }
      }
    }
    # adding headers:
    lines_rdf=paste0(lines_rdf," @prefix ",prefix,":    ","<",FullURI,"> .\n")
    for (i in 1:length(schemahad)){
      lines_rdf=paste(lines_rdf,schemas[schemahad[i]],"\n")
    }
    # creating a default association list
    DefaultAssociationlist=paste0("AssociationName\n","provone:hasSubProcess\n","provone:sourcePToCL\n","provone:CLtoDestP\n",
                  "provone:hasInPort\n","provone:hasOutPort\n","provone:hasDefaultParam\n",
                  "provone:DLToInPort\n","provone:outPortToDL\n","provone:inPortToDL\n",
                  "provone:DLToOutPort\n","provone:wasAttributedTo\n","provone:wasDerivedFrom\n",
                  "provone:dataOnLink\n","provone:used\n","provone:wasGeneratedBy\n",
                  "provone:wasAssociatedWith\n","provone:wasInformedBy\n","provone:isPartOf\n",
                  "provone:hadMember\n","cwfo:hasOutData\n","cwfo:hasInData\n")
    write(DefaultAssociationlist,file="DefaultAssociationNames.txt")

    # choose between the default and the user-provided association list
    if (UserANM == FALSE) {
      Associationlist.df <- read.table("DefaultAssociationNames.txt", sep = ",", header = TRUE)
    } else {
      Associationlist.df <- read.table("AssociationNames.txt", sep = ",", header = TRUE)
    }
    Associationlist <- Associationlist.df$AssociationName

    # creating a default association library:
    DefaultAL=paste0("ParentClass,","ChildClass,","Ways,","Property,","ReverseProperty\n",
          "\"provone:Process\",","\"provone:Process\",","2,","\"provone:hasSubProcess\",","\"provone:wasDerivedFrom\"\n",
          "\"provone:Process\",","\"provone:Data\",","2,","\"provone:wasDerivedFrom\",","\"provone:hasMember\"\n",
          "\"provone:Process\",","\"provone:Visualization\",","2,","\"provone:wasDerivedFrom\",","\"provone:hasMember\"\n")
    write(DefaultAL,file="DefaultAssociationLibrary.txt")

    nodesnames=nodesclasses=nodesfrom=nodesto=nodesproperty=parentclass=property=line_rdf_vector=title1=esci=""
    templevel=parentlevel=parentindex=tempwordlist=0
    levelvector=rep(0,7)

    # get property of association by using parent entity class and child entity class automatically
    for (j in 1:length(infolist)){
      AssociationNUM=firstmeet=trigger1=0
      line_rdf=classeswords=""
      title0=infolist[[j]][2]
      title1[j]=title0
     ID=infolist[[j]][3]
      parentlevel=templevel
      templevel=infolist[[j]][1]
      tempclass=infolist[[j]][4]
      # levelvector saves the index of the first entity seen on each level
      lvl <- as.numeric(infolist[[j]][1])
      if (lvl >= 1 && lvl <= 7 && levelvector[lvl] == 0) {
        levelvector[lvl] <- j
      }
      # replace existing levels when a new one is found
      if (as.numeric(parentlevel) != 0){
        if (as.numeric(templevel) > as.numeric(parentlevel)){
          parentindex <- j - 1
          parentclass <- infolist[[j - 1]][4]
        }
        else if (templevel == parentlevel){
          parentindex <- levelvector[as.numeric(templevel) - 1]
          parentclass <- infolist[[as.numeric(parentindex)]][4]
        }
        else {
          levelvector[as.numeric(templevel)] <- j
          parentindex <- levelvector[as.numeric(templevel) - 1]
          parentclass <- infolist[[parentindex]][4]
        }
      }
      # choose between the default and the user-provided association library
      if (UserAL == FALSE){
        AssociationsLib <- read.table("DefaultAssociationLibrary.txt", sep = ",", header = TRUE)
      }
      else {
        AssociationsLib <- read.table("AssociationLibrary.txt", sep = ",", header = TRUE)
      }
      tempPwordlist=which(AssociationsLib$ParentClass==parentclass)
      tempwordlist=which(AssociationsLib$ChildClass[tempPwordlist]==tempclass)
      # whether the relationship between the parent and child classes is in the association library
      if (length(tempwordlist)>1){
        AssociationNUM=tempPwordlist[min(tempwordlist)]
      }
      else if (length(tempwordlist)==1){
        AssociationNUM=tempPwordlist[tempwordlist]
      }
      # get association property and other information if it is in association library
      if (AssociationNUM>0){
        property=as.character(AssociationsLib$Property[AssociationNUM])
        # two ways association
       if (AssociationsLib$Ways[AssociationNUM]==2){
          nodesfrom=paste0(nodesfrom,infolist[[as.numeric(parentindex)]][2]," ")
          nodesto=paste0(nodesto,infolist[[j]][2]," ")
          nodesproperty=paste0(nodesproperty,property," ")
          nodesfrom=paste0(nodesfrom,infolist[[j]][2]," ")
          nodesto=paste0(nodesto,infolist[[as.numeric(parentindex)]][2]," ")
          nodesproperty=paste0(nodesproperty,AssociationsLib$ReverseProperty[AssociationNUM]," ")
       }
        # one way association
        else if (AssociationsLib$Ways[AssociationNUM]==1){
          nodesfrom=paste0(nodesfrom,infolist[[as.numeric(parentindex)]][2]," ")
          nodesto=paste0(nodesto,infolist[[j]][2]," ")
          nodesproperty=paste0(nodesproperty,property," ")
        }

      }
      # if using default association to fill the nonexistent association in the library
      else if ((fillAssociation==TRUE)&(as.numeric(parentlevel)!=0)){
        property="str:has"
        nodesfrom=paste0(nodesfrom,infolist[[as.numeric(parentindex)]][2]," ")
        nodesto=paste0(nodesto,infolist[[j]][2]," ")
        nodesproperty=paste0(nodesproperty,property," ")
       nodesfrom=paste0(nodesfrom,infolist[[j]][2]," ")
        nodesto=paste0(nodesto,infolist[[as.numeric(parentindex)]][2]," ")
        nodesproperty=paste0(nodesproperty,"str:belongTo"," ")
     }
      #print ("This is levelvector")
      #print (levelvector)
      #print ("This is from")
      #print (nodesfrom)
      #print ("This is to")
      #print (nodesto)
      #print ("This is property")
      #print (nodesproperty)
  # deal with other information besides title, id and class
  for (i in 4:length(infolist[[j]])){
    tempword=""
    tempentity=""
    temp_line=""
    if (i==4){
      tempword=infolist[[j]][4]
      nodesnames=paste0(nodesnames,title0," ")
      classeswords=paste0(tempword)

      entityname=paste0(prefix,":",ID)
      title=paste0(entityname)
      line_rdf=paste0("\n ",title," a ",tempword)

      # print content depends on whether it is the last one or not
      if (i==length(infolist[[j]])){
        nodesclasses=paste0(nodesclasses,classeswords," ")
        line_rdf=paste(line_rdf,";","\n")
        title0=paste0("\"",title0,"\"")
        # add title as rdfs:label in the output file
        #line_rdf=paste(line_rdf,"\t","rdfs:label",title0,".","\n")#,".","\n")
      }
      else{
        line_rdf=paste(line_rdf)
      }

    }# out of if i==4
    else { # i > 4
      tempword=infolist[[j]][i]
      # find manually typed input values
      if (grepl("=",tempword)){
        # the word contains an equals sign: the user typed in a value manually
        firstmeet=firstmeet+1
        if (firstmeet==1){
          line_rdf=paste0(line_rdf,";","\n")
        }

        tempwordlist=strsplit(tempword,"=")
        # if the manually typed input is an association, add this relation to
        # the nodes data frame
        if (tempwordlist[[1]][1] %in% Associationlist)
        {
          trigger1=1
          nodesfrom=paste0(nodesfrom,title0," ")
          nodesto=paste0(nodesto,tempwordlist[[1]][2]," ")
          nodesproperty=paste0(nodesproperty,tempwordlist[[1]][1]," ")
        }
        else{
          temp_line=paste(tempwordlist[[1]][1],tempwordlist[[1]][2])
        }
        if (i==length(infolist[[j]])){
          nodesclasses=paste0(nodesclasses,classeswords," ")
          if (nchar(temp_line)>0){
            temp_line=paste("\t",temp_line,";","\n")
          }
          if (trigger1==0){
            esci=paste(esci,j)
            #title0=paste0("\"",title0,"\"")
            #temp_line=paste(temp_line,"\t","rdfs:label!!!",title0,".","\n")#,".","\n")
          }

        }
        else {
          if (nchar(temp_line)>0){
            temp_line=paste("\t",temp_line,";","\n")
          }
        }
      }
      # for multiple classes, paste each one after the first class, separated by commas
      else {
        classeswords=paste0(classeswords,",",tempword)
        line_rdf=paste0(line_rdf,", ",tempword)
        if (i==length(infolist[[j]])){
          nodesclasses=paste0(nodesclasses,classeswords," ")
          if (nchar(temp_line)>0){
            temp_line=paste("\t",temp_line,";","\n")
          }
          #title0=paste0("\"",title0,"\"")
          temp_line=paste(temp_line,";\n")#,"\t","rdfs:label",title0,";","\n")
        }
        else {
        }
      }
    }
    line_rdf=paste(line_rdf,temp_line)
    # save entities in a character string variable
    line_rdf_vector[j]=line_rdf
  }
}
# get titles and IDs
titles=IDs=0
for (i in 1:length(infolist)){
  titles[i]=infolist[[i]][2]
  IDs[i]=infolist[[i]][3]
}

# igraph is needed to build the graph object; fail early if it is missing
if (!requireNamespace("igraph", quietly = TRUE)) {
  stop("Package 'igraph' is required to generate the RDF graph.")
}
library(igraph)

nodesfrom2=strsplit(nodesfrom," ")
nodesto2=strsplit(nodesto," ")
nodesproperty2=strsplit(nodesproperty," ")
exceptnum=except=nodesfrom3=nodesto3=nodesproperty3=nodesnm=0
exceptwords=c("str:has","str:belongTo")

# compare every pair of edges to find duplicates that arise when users manually
# type in associations, and replace the default associations "str:has" and
# "str:belongTo" by the user-defined associations
if ((length(nodesfrom2[[1]])-1)>=1){
  for (i in 1:(length(nodesfrom2[[1]])-1)){
    for (j in ((i+1):length(nodesfrom2[[1]]))){
      if ((i!=j)&(length(nodesfrom2[[1]])>=j)&(length(nodesto2[[1]])>=j)){
        if((nodesfrom2[[1]][i]==nodesfrom2[[1]][j])&(nodesto2[[1]][i]==nodesto2[[1]][j])){
          exceptnum=exceptnum+1
          if (nodesproperty2[[1]][i] %in% exceptwords){
            except[exceptnum]=i
          }
          else { except[exceptnum]=j}
        }
      }
    }
  }
}

nodesnames2=strsplit(nodesnames," ")

# map IDs back to titles, then add nodes that appear in an edge but were never
# inserted as an entity
for (i in 1:length(nodesfrom2[[1]])){
  if (i %in% except){}
  else {
    nodesnm=nodesnm+1
    for (j in 1:length(IDs)){
      if (nodesfrom2[[1]][i]==IDs[j]){
        nodesfrom2[[1]][i]=titles[j]
        }
      else if (nodesto2[[1]][i]==IDs[j]){
        nodesto2[[1]][i]=titles[j]
        }
    }
    # add nonexisting nodes to graph, set class as AutoAdded

    if (nodesfrom2[[1]][i] %in% nodesnames2[[1]]){}
    else {
      nodesnames=paste0(nodesnames,nodesfrom2[[1]][i]," ")
      nodesclasses=paste0(nodesclasses,"AutoAdded"," ")
    }
    if (nodesto2[[1]][i] %in% nodesnames2[[1]]){}
    else {
      nodesnames=paste0(nodesnames,nodesto2[[1]][i]," ")
      nodesclasses=paste0(nodesclasses,"AutoAdded"," ")
    }
    nodesfrom3[nodesnm]=nodesfrom2[[1]][i]
    nodesto3[nodesnm]=nodesto2[[1]][i]
    nodesproperty3[nodesnm]=nodesproperty2[[1]][i]
    }
}


nodesnames2=strsplit(nodesnames," ")
nodesclasses2=strsplit(nodesclasses," ")

# add nodes and nesting information into a data frame
nodes <- data.frame(name = return_space(nodesnames2[[1]]),
                    class = nodesclasses2[[1]])

nesting <- data.frame(from = return_space(nodesfrom3),
                      to = return_space(nodesto3),
                      property = nodesproperty3)

g3 <- graph_from_data_frame(nesting, directed=TRUE, vertices=nodes)
E(g3)$label <- E(g3)$property

#print (nodes)
#print (nesting)
#print (esci)
# append manually typed-in information to each entity
if (length(esci)>0){
escj=strsplit(esci," ")
#print (escj)
for (i in 1:length(line_rdf_vector)){
  if (i %in% escj[[1]]){
    #print (i)
    #line_rdf_vector[i]=paste(line_rdf_vector[i],"\t","rdfs:label","\"",title1[i],"\"",".","\n")
    tempnumber=which(nodesfrom3==titles[i])
  if (length(tempnumber)>0){
  for (j in 1:length(tempnumber)){
    #print ("j loop")
    #print (j)
    if (length(IDs[which(titles==nodesto3[tempnumber[j]])])>0){
      entityname2=paste0(prefix,":",IDs[which(titles==nodesto3[tempnumber[j]])])
    }
    else {
      entityname2=paste0(prefix,":",nodesto3[tempnumber[j]])
    }
    if (j==length(tempnumber)){
      #print ("j=l")
      #print (j)
      #print (tempnumber)
      #print (i)
      #print (line_rdf_vector[i])
      line_rdf_vector[i]=paste(line_rdf_vector[i],"\t",nodesproperty3[tempnumber[j]],entityname2,";","\n")
      line_rdf_vector[i]=paste(line_rdf_vector[i],"\t","rdfs:label","\"",title1[i],"\"",".","\n")
    }
    else{
      #print ("j!=l")
      #print (j)
      #print (tempnumber)
      #print (i)
      #print (line_rdf_vector[i])
      line_rdf_vector[i]=paste(line_rdf_vector[i],"\t",nodesproperty3[tempnumber[j]],entityname2,";","\n")
    }
    }
  }
  else # !if (length(tempnumber)>0)
  {line_rdf_vector[i]=paste(line_rdf_vector[i],"\t","rdfs:label","\"",title1[i],"\"",".","\n")}



  }
  else{
  tempnumber=which(nodesfrom3==titles[i])
  if (length(tempnumber)>0){
  for (j in 1:length(tempnumber)){
    if (length(IDs[which(titles==nodesto3[tempnumber[j]])])>0){
      entityname2=paste0(prefix,":",IDs[which(titles==nodesto3[tempnumber[j]])])
    }
    else {
      entityname2=paste0(prefix,":",nodesto3[tempnumber[j]])
    }
    if (j==length(tempnumber)){
      #print ("!")
      #print (j)
      #print (tempnumber)
      #print (i)
      #print (line_rdf_vector[i])
      line_rdf_vector[i]=paste(line_rdf_vector[i],"\t",nodesproperty3[tempnumber[j]],entityname2,";","\n")
      line_rdf_vector[i]=paste(line_rdf_vector[i],"\t","rdfs:label","\"",title1[i],"\"",".","\n")
    }
    else{
      #print ("!")
      #print (j)
      #print (tempnumber)
      #print (i)
      #print (line_rdf_vector[i])
      line_rdf_vector[i]=paste(line_rdf_vector[i],"\t",nodesproperty3[tempnumber[j]],entityname2,";","\n")
    }
    }
  }
    else # !if (length(tempnumber)>0)
  {line_rdf_vector[i]=paste(line_rdf_vector[i],"\t","rdfs:label","\"",title1[i],"\"",".","\n")}
  }#else

}
  }


# convert %20 to space
for (i in 1:length(line_rdf_vector)){
  line_rdf_vector[i]=return_space(line_rdf_vector[i])
  lines_rdf=paste(lines_rdf,line_rdf_vector[i])
}
# generate RDF file
if (rdf=="ttl"){
  write(lines_rdf,file=outputfile2)
print("Created an RDF file successfully. Please find the output file in:")
print(getwd())
print(paste("Your file name is:",outputfile2))
}
if (graph){
  if (HideAssociation==TRUE){
    nestinghide <- data.frame(from = nesting$from,
                      to = nesting$to,
                      property = rep("",nrow(nesting)))

    g3 <- graph_from_data_frame(nestinghide , directed=TRUE, vertices=nodes)
  }


E(g3)$label <- E(g3)$property
print(g3, e=TRUE, v=TRUE)
plot(g3, edge.arrow.size=.2, edge.curved=.4)
  }
}
  # original below (delete else):
  else if (dir_out == "" && file_out == "object") {
    lines
  }
  else if(rdf=="YW"){
    #YWoutline=readLines("YWoutline.txt")
    datetime <- format(Sys.time(), "%Y_%m_%d_%H_%M_%S")
    fileformat=".ttl" #".txt"
    outputfile2 <- paste("RDF_output_file_",datetime,fileformat,sep="")
    write(lines,file=outputfile2)


    YWoutline1=readLines(outputfile2)
    #print(YWoutline1)
    YWoutline=YWoutline1
    #print(YWoutline)
    #print(YWoutline)
    beginindex=grep("@begin",YWoutline)
    endindex=grep("@end",YWoutline)

    #beginindex=c(1,7,14,19,21)
    #endindex=c(12,17,25,26,27)
    content=as.character()
    addheader=as.character()
    if (length(beginindex)>1){
      i=1
      j=1
      l=length(endindex)
      leveldf=data.frame(start=rep(0,length(beginindex)),end=rep(0,length(endindex)))
      for (k in 1:(length(beginindex)+length(endindex))){
        if (i>length(beginindex)&j<=length(endindex)){
          for (m in nrow(leveldf):1){
            if (leveldf[m,2]==0){
              leveldf[m,2]=endindex[j]
              j=j+1
            }
          }
        }
        else{
          if (!is.na(beginindex[i])){

            if(beginindex[i]<endindex[j]){
              leveldf[i,1]=beginindex[i]
              i=i+1
            }
            else if(beginindex[i]>endindex[j]){
              leveldf[i-1,2]=endindex[j]
              j=j+1
            }
          }
        }
      }
    }
    leveldf

    list1=list(0)

    for (i in 1:nrow(leveldf)){
      jlines=as.numeric()
      templines=""
      ilines=c(leveldf$start[i]:leveldf$end[i])
      templines=YWoutline[ilines]
      for (j in 1:nrow(leveldf)){
        if (i!=j&(leveldf$start[j]>leveldf$start[i]&leveldf$end[j]<leveldf$end[i])){
          jlines=c(jlines,c(leveldf$start[j]:leveldf$end[j]))
        }
      }
      # jlines holds absolute line numbers into YWoutline; shift them to
      # indices relative to templines before removing, and remove only once
      # after all nested blocks have been collected
      if (length(jlines)>0){
        templines=templines[-(jlines-leveldf$start[i]+1)]
      }

      delines=as.numeric()
      j=0

      # seq_along avoids iterating over 1:0 when templines is empty
      for (j in seq_along(templines)){
        if (templines[j]==""){delines=c(delines,j)}
      }

      if(length(delines)>0){
        templines=templines[-delines]

      }


      if (i==1){
        #print (1)
        header=as.character()
        header=paste0(" @prefix myentity:    <http://www.myentity.example.com/> . \n",
                      " @prefix provone: <http://dataone.org/ns/provone#>  . \n")
        j=0
        for (j in 1:length(templines)){
          if(length(strsplit(templines[j],"@begin ")[[1]])>1){
            if(length(strsplit(templines[j],"@as ")[[1]])>1){
              entityname=strsplit(strsplit(templines[j],"@as ")[[1]][2]," ")[[1]][1]
            }
            else {
              entityname=strsplit(strsplit(templines[j],"@begin ")[[1]][2]," ")[[1]][1]
            }
            header=paste0(header, "\n","myentity",":",entityname," a ","provone:Workflow ")
          }
          else if (length(strsplit(templines[j],"@in ")[[1]])>1){
            if(length(strsplit(templines[j],"@as ")[[1]])>1){
              inputdata=strsplit(strsplit(templines[j],"@as ")[[1]][2]," ")[[1]][1]
            }
            else {
              inputdata=strsplit(strsplit(templines[j],"@in ")[[1]][2]," ")[[1]][1]
            }
            header=paste0(header," ;\n\t","provone:used ","\"",inputdata,"\" ")
          }
          else if (length(strsplit(templines[j],"@param ")[[1]])>1){
            if(length(strsplit(templines[j],"@as ")[[1]])>1){
              inputparam=strsplit(strsplit(templines[j],"@as ")[[1]][2]," ")[[1]][1]
            }
            else {
              inputparam=strsplit(strsplit(templines[j],"@param ")[[1]][2]," ")[[1]][1]
            }
            header=paste0(header,";\n\t","provone:used ","\"",inputparam,"\" ")
          }
          else if (length(strsplit(templines[j],"@out ")[[1]])>1){
            if(length(strsplit(templines[j],"@as ")[[1]])>1){
              output=strsplit(strsplit(templines[j],"@as ")[[1]][2]," ")[[1]][1]
            }
            else {
              output=strsplit(strsplit(templines[j],"@out ")[[1]][2]," ")[[1]][1]
            }
            header=paste0(header,";\n\t","provone:generate ","\"",output,"\" ")
          }
          else if (length(strsplit(templines[j],"@end ")[[1]])>1){

            #if(length(strsplit(templines[j],"@as ")[[1]])>1){
            #  endentity=strsplit(strsplit(templines[j],"@as ")[[1]][2]," ")[[1]][1]
            #}
            #else {
            #  endentity=strsplit(strsplit(templines[j],"@end ")[[1]][2]," ")[[1]][1]
            #}
            #header=paste0(header,".\n")
          }
        }
      }
      else{

        for (j in 1:length(templines)){
          if(length(strsplit(templines[j],"@begin ")[[1]])>1){
            if(length(strsplit(templines[j],"@as ")[[1]])>1){
              entityname=strsplit(strsplit(templines[j],"@as ")[[1]][2]," ")[[1]][1]
            }
            else {
              entityname=strsplit(strsplit(templines[j],"@begin ")[[1]][2]," ")[[1]][1]
            }
            content=paste0(content, "\n","myentity",":",entityname," a ","provone:ProcessExec ")
            addheader=paste0(addheader," ;\n\t","provone:wasInformedBy"," myentity",":",entityname)
          }
          else if (length(strsplit(templines[j],"@in ")[[1]])>1){
            if(length(strsplit(templines[j],"@as ")[[1]])>1){
              inputdata=strsplit(strsplit(templines[j],"@as ")[[1]][2]," ")[[1]][1]
            }
            else {
              inputdata=strsplit(strsplit(templines[j],"@in ")[[1]][2]," ")[[1]][1]
            }
            content=paste0(content," ;\n\t","provone:used ","\"",inputdata,"\" ")
          }
          else if (length(strsplit(templines[j],"@param ")[[1]])>1){
            if(length(strsplit(templines[j],"@as ")[[1]])>1){
              inputparam=strsplit(strsplit(templines[j],"@as ")[[1]][2]," ")[[1]][1]
            }
            else {
              inputparam=strsplit(strsplit(templines[j],"@param ")[[1]][2]," ")[[1]][1]
            }
            content=paste0(content,";\n\t","provone:used ","\"",inputparam,"\" ")
          }
          else if (length(strsplit(templines[j],"@out ")[[1]])>1){
            if(length(strsplit(templines[j],"@as ")[[1]])>1){
              output=strsplit(strsplit(templines[j],"@as ")[[1]][2]," ")[[1]][1]
            }
            else {
              output=strsplit(strsplit(templines[j],"@out ")[[1]][2]," ")[[1]][1]
            }
            content=paste0(content,";\n\t","provone:generate ","\"",output,"\" ")
          }
          else if (length(strsplit(templines[j],"@end ")[[1]])>1){
            if(length(strsplit(templines[j],"@as ")[[1]])>1){
              endentity=strsplit(strsplit(templines[j],"@as ")[[1]][2]," ")[[1]][1]
            }
            else {
              endentity=strsplit(strsplit(templines[j],"@end ")[[1]][2]," ")[[1]][1]
            }
            content=paste0(content,".\n")
          }
        }
      }

    }

    YWfile=paste0(header,addheader," .\n",content)
    #write(file,file="outputfile.ttl")

    write(YWfile,file=outputfile2)
    print("Created an RDF file successfully. Please find the output file in:")
    print(getwd())
    print(paste("Your file name is:",outputfile2))






  }
  else {
    cat(lines, file = path_out, sep = "\n")
  }
}

#   ____________________________________________________________________________
#   helper function: find_gran
# find maximal or minimal level of granularity used.
#' Find out granularity of a string vector
#'
#' This helper function takes a string (vector) as an input and searches for the
#' highest or lowest granularity. Granularity is defined in terms of a hash/
#' space sequence (see \code{\link{sum_str}}). The search is implemented using
#' a recursive approach.
#' @param direction either "up" or "down". Down means the algorithm starts with
#' the pattern containing a certain number of hashes (specified in
#' \code{highest}) and searches for a regex match. If it can't find one, it
#' reduces the number of hashes by one and searches again, until it finds a
#' pattern.
#' @param highest The highest level of granularity to search for.
#' @param lowest The lowest level of granularity to search for.
#' @param lines a character vector containing the lines to match against.
#' @examples
#'   strcode:::find_gran("down", highest = 3, lowest = 1, lines = c("##  .", "#   _"))
#' @keywords internal
#find_gran <- function(direction = "down", highest = 3, lowest = 1, lines) {
  # direction
#  if (direction == "up") {
#    current <- lowest
#    m <- 1
#  } else if (direction == "down") {
#    current <- highest
#    m <- -1
#  }

#    helper_find_gran <- function(direction) {
#    assert_number(current, lower = lowest, upper = highest)
#    pattern <- paste0("^", paste0(rep("#", current), collapse = ""),
#                      "\\s{", highest + 1 - current, "}.*$")
#    if (any(grepl(pattern, lines, perl = TRUE))) {
#      current
#    } else {
#      current <<- current + m * 1
#      helper_find_gran(direction = direction)
#    }
#  }
#  helper_find_gran(direction)
#}
# XiaoliangJiang/IndependentStudy2017YW documentation built on May 29, 2019, 10:54 a.m.
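The `@begin`/`@end` pairing above (the `leveldf` construction) interleaves two index counters and is hard to follow. As a standalone sketch of the same idea, a stack recovers the start/end line pairs directly: each `@begin` opens a block and each `@end` closes the most recently opened one. `pair_yw_blocks` is a hypothetical helper for illustration, not part of this package:

```r
# Hypothetical sketch: pair @begin/@end markers into (start, end) line ranges
# using a stack, so nested blocks are matched innermost-first.
pair_yw_blocks <- function(lines) {
  begin_idx <- grep("@begin", lines)
  end_idx   <- grep("@end", lines)
  starts <- integer(0); ends <- integer(0); stack <- integer(0)
  for (ln in sort(c(begin_idx, end_idx))) {
    if (ln %in% begin_idx) {
      stack <- c(stack, ln)             # open a new block
    } else {
      open  <- stack[length(stack)]     # close the innermost open block
      stack <- stack[-length(stack)]
      starts <- c(starts, open); ends <- c(ends, ln)
    }
  }
  data.frame(start = starts, end = ends)
}

lines <- c("# @begin main", "x <- 1", "# @begin step1",
           "y <- 2", "# @end step1", "# @end main")
pair_yw_blocks(lines)   # step1 (3-5) closes before main (1-6)
```

Blocks are emitted in closing order, so inner blocks appear before the blocks that contain them; this is the same nesting information `leveldf` encodes, without the dual-counter bookkeeping.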