runAnalysis: Perform scicloud analysis


View source: R/2_runAnalysis.R

Description

The second function to call when performing an analysis with scicloud, after createScicloudList. It returns a list of four components (IndVal, metaMatrix, RepresentativePapers and wordList) for further use with inspectScicloud.

The function performs the analysis selected by the method argument. By default, method is 'hclust', which identifies clusters with hierarchical clustering (hclust). The clusters are publication communities based on the words used in the papers. To identify the words relevant to each community, the function then runs an indicator species analysis: indval assigns each word an indicator value for each cluster, showing how representative the word is of that cluster. The top representative words will then be visualized with the following plots:

The 'network' method, on the other hand, also clusters the data, but does so with a network analysis. It returns a list of global and local network measures and also generates a clustered matrix, which can be processed further in network programs such as Gephi.
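To make the idea of exporting a network for Gephi more concrete, the sketch below builds a word co-occurrence network (two words are linked by the number of papers in which both appear) from a small, made-up paper-by-word count matrix and writes it as an edge list CSV with Source, Target and Weight columns, which Gephi's spreadsheet import recognizes. This is an illustration under those assumptions, not scicloud's implementation: how scicloud builds and formats its clustered matrix is not described on this page, and all object names are hypothetical.

```r
# Hypothetical toy data: papers x words frequency matrix
counts <- matrix(
  c(3, 2, 0, 0,
    4, 1, 0, 1,
    0, 0, 5, 2,
    0, 1, 3, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("paper", 1:4), c("soil", "water", "policy", "health"))
)

# Presence/absence, then word x word counts of shared papers
present <- (counts > 0) * 1
cooc <- crossprod(present)
diag(cooc) <- 0  # ignore self-loops

# Undirected edge list from the upper triangle, dropping unconnected pairs
idx <- which(upper.tri(cooc) & cooc > 0, arr.ind = TRUE)
edges <- data.frame(
  Source = rownames(cooc)[idx[, "row"]],
  Target = colnames(cooc)[idx[, "col"]],
  Weight = cooc[idx],
  stringsAsFactors = FALSE
)

# Write a CSV that Gephi's spreadsheet import can read as an edge table
outFile <- tempfile(fileext = ".csv")
write.csv(edges, outFile, row.names = FALSE)
```

In this toy network "soil" and "water" share two papers, so their edge has weight 2; pairs that never co-occur get no edge.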

Usage

runAnalysis(
  scicloudList,
  numberOfClusters = NA,
  dendrogram = TRUE,
  dendroLabels = c("truncated", "break"),
  minWordsPerCluster = 5,
  maxWordsPerCluster = 10,
  p = 0.05,
  exactPosition = FALSE,
  sortby = c("Eigenvector", "Degree", "Closeness", "Betweenness"),
  keep = 0.33,
  saveToWd = FALSE,
  method = c("hclust", "network", "both")
)

Arguments

scicloudList

output of createScicloudList

numberOfClusters

integer or NA. An integer sets the number of clusters manually; it must not exceed 14, as more than 14 clusters are not recommended. With NA, the function automatically determines the optimum number of clusters within a range of 1 to 12.
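As an illustration of what a fixed numberOfClusters means for hierarchical clustering, the base R sketch below clusters six made-up papers by their word frequencies and cuts the tree into k groups with cutree(). The Euclidean distance and Ward linkage are assumptions chosen for the example; the distance, linkage and criterion scicloud uses to pick the optimum when numberOfClusters = NA are not documented on this page.

```r
# Hypothetical toy data: 6 papers x 4 words frequency matrix
counts <- matrix(
  c(5, 4, 0, 0,
    6, 3, 1, 0,
    4, 5, 0, 1,
    0, 1, 6, 5,
    1, 0, 5, 6,
    0, 0, 4, 7),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("paper", 1:6), c("soil", "water", "policy", "health"))
)

# Distances between the papers' word profiles (Euclidean: an assumption)
d <- dist(counts, method = "euclidean")

# Agglomerative clustering (Ward linkage: an assumption)
tree <- hclust(d, method = "ward.D2")

# A manual numberOfClusters: cut the tree into exactly k groups
numberOfClusters <- 2
stopifnot(numberOfClusters <= 14)  # the documented upper limit
membership <- cutree(tree, k = numberOfClusters)
```

Papers 1 to 3 and papers 4 to 6 end up in separate clusters here; plot(tree) would draw the corresponding dendrogram.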

dendrogram

logical, whether or not to show a dendrogram of the calculated clusters.

dendroLabels

either "truncated" or "break": truncates the labels of the dendrogram leaves or inserts line breaks into them. Line breaks are not recommended for a large number of PDFs.
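The difference between the two label options can be sketched with base R string functions. The 20-character width and the example titles are assumptions for illustration; scicloud's own label width and handling may differ.

```r
# Hypothetical leaf labels, e.g. paper titles
titles <- c("Soil water dynamics in temperate agricultural landscapes",
            "Health policy")

# "truncated": cut long labels to a fixed width (20 is an assumption)
truncated <- strtrim(titles, 20)

# "break": wrap long labels onto several lines at word boundaries
broken <- vapply(titles,
                 function(x) paste(strwrap(x, width = 20), collapse = "\n"),
                 character(1), USE.NAMES = FALSE)
```

Truncation keeps every label on one line but loses the end of long titles; line breaks keep the full text but make each leaf taller, which is why they become crowded with many PDFs.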

minWordsPerCluster

minimum number of words per cluster to be plotted in the wordcloud.

maxWordsPerCluster

maximum number of words per cluster to be plotted in the wordcloud.

p

the significance level (p-value cutoff) applied to individual words in the indicator species analysis. Only significant words are plotted.
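Assuming indval refers to labdsv::indval, the indicator values follow Dufrêne and Legendre: for word i and cluster j, the specificity A_ij is the word's mean frequency in cluster j divided by the sum of its mean frequencies over all clusters, the fidelity B_ij is the share of papers in cluster j that contain the word, and IndVal_ij = A_ij × B_ij. Significance is estimated by randomly permuting the cluster assignment, and p then acts as the cutoff. The base R sketch below reimplements this idea on made-up data; the helper maxIndVal, the 999 permutations and the toy matrix are assumptions, not scicloud's code.

```r
set.seed(42)  # reproducible permutations

# Hypothetical toy data: 10 papers x 4 words, papers 1-5 form cluster 1
counts <- matrix(
  c(5, 2, 0, 1,
    4, 3, 0, 0,
    6, 2, 0, 0,
    5, 1, 0, 1,
    3, 2, 0, 0,
    0, 2, 4, 0,
    0, 1, 6, 1,
    0, 3, 5, 1,
    0, 2, 3, 0,
    0, 2, 5, 1),
  nrow = 10, byrow = TRUE,
  dimnames = list(paste0("paper", 1:10), c("soil", "water", "policy", "health"))
)
clusters <- rep(1:2, each = 5)

# Hypothetical helper: highest IndVal of each word over all clusters
maxIndVal <- function(counts, clusters) {
  clusters <- factor(clusters)
  # Mean frequency of each word per cluster (clusters x words)
  meanAbund <- apply(counts, 2, function(w) tapply(w, clusters, mean))
  # Specificity A: share of the word's summed cluster means in each cluster
  specificity <- sweep(meanAbund, 2, colSums(meanAbund), "/")
  # Fidelity B: share of papers in each cluster that contain the word
  fidelity <- apply(counts > 0, 2, function(w) tapply(w, clusters, mean))
  apply(specificity * fidelity, 2, max)
}

observed <- maxIndVal(counts, clusters)

# Null distribution: recompute after randomly shuffling the cluster labels
numPermutations <- 999
permuted <- replicate(numPermutations, maxIndVal(counts, sample(clusters)))

# Permutation p-value; the small tolerance guards against rounding noise
pValues <- (rowSums(permuted >= observed - 1e-12) + 1) / (numPermutations + 1)

p <- 0.05
significantWords <- names(pValues)[pValues <= p]
```

"soil" and "policy" occur only in one cluster and in every paper of it, so their IndVal is 1 and they pass the cutoff; "water" is spread evenly over both clusters and is not significant.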

exactPosition

logical. By default, the wordcloud avoids overlapping labels, favoring visual clarity over exact placement. When set to TRUE, each word's position is marked by a dot and its label is connected to the dot by a line.

sortby

for the network method: the centrality measure by which the words are sorted; the default is "Eigenvector". Possible values are "Eigenvector", "Degree", "Closeness" and "Betweenness".

keep

for the network method: numeric, the proportion of words to keep after sorting them by the measure given in sortby. The default of 0.33 keeps roughly a third of all words. Keeping fewer words makes later computations less demanding.
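The effect of sortby and keep can be sketched in base R for two of the four measures. The sketch assumes a small, made-up undirected word network given as an adjacency matrix: degree is the number of neighbours, eigenvector centrality is taken from the leading eigenvector of the adjacency matrix, and the number of words kept is rounded up with ceiling(), which is an assumption about how scicloud rounds. Closeness and betweenness need shortest paths and are more conveniently computed with a package such as igraph (closeness(), betweenness()). The helper names are hypothetical.

```r
# Hypothetical undirected word network as an adjacency matrix
words <- c("water", "soil", "policy", "health", "climate", "farm")
adj <- matrix(0, length(words), length(words), dimnames = list(words, words))
edgeList <- rbind(c("water", "soil"), c("water", "policy"),
                  c("water", "health"), c("water", "climate"),
                  c("water", "farm"), c("soil", "farm"),
                  c("soil", "climate"), c("policy", "health"))
adj[edgeList] <- 1
adj[edgeList[, 2:1]] <- 1  # mirror each edge to keep the matrix symmetric

# Hypothetical helper: one centrality score per word
centrality <- function(adj, sortby = c("Eigenvector", "Degree")) {
  sortby <- match.arg(sortby)
  if (sortby == "Degree") {
    score <- rowSums(adj > 0)  # number of neighbours
  } else {
    # Leading eigenvector of the symmetric adjacency matrix, scaled to max 1
    v <- abs(eigen(adj, symmetric = TRUE)$vectors[, 1])
    score <- setNames(v / max(v), rownames(adj))
  }
  score
}

# Hypothetical helper: keep the top share of words by centrality
keepTopWords <- function(adj, sortby = "Eigenvector", keep = 0.33) {
  score <- centrality(adj, sortby)
  nKeep <- ceiling(keep * length(score))  # rounding rule is an assumption
  names(sort(score, decreasing = TRUE))[seq_len(nKeep)]
}

keptEigen <- keepTopWords(adj, "Eigenvector", keep = 0.33)
keptDegree <- keepTopWords(adj, "Degree", keep = 0.33)
```

With six words and keep = 0.33, two words remain; in this toy network both measures rank "water" and "soil" highest.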

saveToWd

logical, whether or not to save the return value of the function to the working directory. This is especially useful for later analysis steps. The saved file can be read back in with readRDS.
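A minimal sketch of the save-and-reload round trip with base R's saveRDS() and readRDS(). The list is a hypothetical stand-in for the return value of runAnalysis, and the file name and tempdir() location are assumptions: scicloud saves to the working directory under a name not given on this page.

```r
# Hypothetical stand-in for the list returned by runAnalysis()
result <- list(
  IndVal = data.frame(word = c("soil", "policy"), cluster = c(1, 2)),
  wordList = c("soil", "water", "policy")
)

# File name and location are assumptions (scicloud uses the working directory)
rdsFile <- file.path(tempdir(), "scicloudAnalysis.rds")
saveRDS(result, rdsFile)

# In a later session, the object can be restored without rerunning the analysis
restored <- readRDS(rdsFile)
```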

method

the analysis method: "hclust" (the default), "network" or "both".

Value

'hclust' returns a list with the following components:


'network' returns a list with the following components:

Author(s)

Creator of the scicloud workflow: Henrik von Wehrden, henrik.von_wehrden@leuphana.de

Code by: Matthias Nachtmann, matthias.nachtmann@stud.leuphana.de, Lisa Gotzian, lisa.gotzian@stud.leuphana.de, Jia Yan Ng, Jia.Y.Ng@stud.leuphana.de, Johann Julius Beeck, johann.j.beeck@stud.leuphana.de

First version of scicloud: Matthias Nachtmann, matthias.nachtmann@stud.leuphana.de

See Also

Other scicloud functions: createScicloudList(), deleteRDS(), inspectScicloud(), searchScopus()

Examples

## Not run: 

### Workflow of performing analysis using scicloud
myAPIKey <- "YOUR_API_KEY"
# Retrieve data from the PDFs and from the Scopus website via its API
scicloudList <- createScicloudList(myAPIKey = myAPIKey)

# Run the analysis with a specified number of clusters
scicloudAnalysis <- runAnalysis(scicloudList = scicloudList, numberOfClusters = 4)

# Generate a summary of the analysis
scicloudSpecs <- inspectScicloud(scicloudAnalysis)

## End(Not run)

LisaGotzian/scicloud documentation built on March 29, 2021, 5:52 a.m.