View source: R/2_runAnalysis.R
The second function to be called to perform the analysis with scicloud, after createScicloudList. It outputs a list of 4 components: IndVal, metaMatrix, RepresentativePapers and wordList for further use with inspectScicloud.
The function performs the analysis depending on the method argument. By default, the method is set to 'hclust', which identifies clusters using hclust. The clusters are publication communities based on the words used in the papers. To then identify the words relevant to the communities, it runs an indicator species analysis. Each word receives an indicator species value from indval for each cluster, showing how representative each word is within a cluster. The top representative words are then visualized with the following plots:
a dendrogram of the clusters
a wordcloud of the publication communities
four visualizations of the communities by year and number of citations (which have been fetched from the Scopus API)
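The two steps of the 'hclust' method can be illustrated with a minimal base R sketch. It is not scicloud's internal code: the toy matrix, the Euclidean distance, the Ward linkage and the helper indicator_value() are all assumptions made for illustration. The helper computes the indicator value as it is commonly defined (the share of a word's mean abundance that falls in a cluster, multiplied by the share of that cluster's papers containing the word). The permutation test behind the p argument is not shown.

```r
# Toy paper-by-word count matrix (rows = papers, columns = words); values are made up.
m <- matrix(
  c(5, 4, 0, 0,
    6, 3, 0, 1,
    0, 0, 7, 5,
    0, 1, 6, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("paper", 1:4), c("forest", "soil", "policy", "law"))
)

# Step 1: hierarchical clustering of the papers
# (distance measure and linkage are assumptions, not taken from scicloud).
hc <- hclust(dist(m), method = "ward.D2")
clusters <- cutree(hc, k = 2)

# Step 2: indicator value of every word for every cluster.
# Hypothetical helper: specificity (relative mean abundance) times
# fidelity (relative frequency of occurrence within the cluster).
indicator_value <- function(m, clusters) {
  ids <- sort(unique(clusters))
  mean_abund <- sapply(ids, function(k) colMeans(m[clusters == k, , drop = FALSE]))
  freq <- sapply(ids, function(k) colMeans(m[clusters == k, , drop = FALSE] > 0))
  specificity <- mean_abund / rowSums(mean_abund)
  iv <- specificity * freq
  colnames(iv) <- paste0("cluster", ids)
  iv
}

iv <- indicator_value(m, clusters)
print(round(iv, 2))
```

Words with a value close to 1 occur almost exclusively, and in almost every paper, within one cluster; these are the words a wordcloud of the community would emphasize.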
The 'network' method also uses a clustering approach, but bases it on a network analysis. When done, it returns a list of global and local measures and also generates a clustered matrix, which can then be processed further in network programs such as Gephi.
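The idea behind the 'network' method can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the toy incidence matrix, the word co-occurrence projection, the two centrality measures shown and the rounding with ceiling() are all chosen here for the example.

```r
# Toy binary incidence matrix (rows = papers, columns = words); values are made up.
incidence <- matrix(
  c(1, 1, 0, 0, 1, 0,
    1, 1, 1, 0, 0, 0,
    0, 1, 1, 1, 0, 0,
    1, 0, 1, 0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("paper", 1:4), paste0("word", 1:6))
)

# Project onto words: two words are linked by the number of papers they share.
cooc <- crossprod(incidence)
diag(cooc) <- 0

# Two local centrality measures per word.
degree <- rowSums(cooc > 0)
leading <- abs(eigen(cooc, symmetric = TRUE)$vectors[, 1])
eigenvector <- setNames(leading / max(leading), colnames(cooc))

# Keep a share of the words, sorted by eigenvector centrality
# (rounding up is an assumption made for this sketch).
keep <- 0.33
n_keep <- ceiling(keep * ncol(incidence))
top_words <- names(sort(eigenvector, decreasing = TRUE))[seq_len(n_keep)]

# Reduced incidence matrix that could be exported to a network program.
reduced <- incidence[, top_words, drop = FALSE]
print(reduced)
```

Sorting by a different measure, for example degree, only changes the vector passed to sort(); the reduction step stays the same.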
runAnalysis(
scicloudList,
numberOfClusters = NA,
dendrogram = TRUE,
dendroLabels = c("truncated", "break"),
minWordsPerCluster = 5,
maxWordsPerCluster = 10,
p = 0.05,
exactPosition = FALSE,
sortby = c("Eigenvector", "Degree", "Closeness", "Betweenness"),
keep = 0.33,
saveToWd = FALSE,
method = c("hclust", "network", "both")
)
scicloudList |
output of |
numberOfClusters |
integer or NA; an integer sets the number of clusters manually and must not exceed 14, as more than 14 clusters are not recommended. For NA, the function automatically calculates the optimum number of clusters within a range of 1 to 12 possible clusters. |
dendrogram |
logical, whether or not to show a dendrogram of the calculated clusters. |
dendroLabels |
allows "truncated" or "break". This either truncates the labels of the dendrogram leaves or puts a line break. Line breaks are not recommended for a large number of PDFs. |
minWordsPerCluster |
minimum number of words per cluster to be plotted in the wordcloud. |
maxWordsPerCluster |
maximum number of words per cluster to be plotted in the wordcloud. |
p |
the p-value that sets the significance level of individual words for the indicator species analysis. Only significant words will be plotted. |
exactPosition |
logical, the wordcloud tries to avoid overlapping
labels for the sake of visual simplicity over perfect precision.
When set to |
sortby |
for the network method: the centrality measure to sort the words by; the default is "Eigenvector". Possible inputs are "Eigenvector", "Degree", "Closeness" and "Betweenness". |
keep |
for the network method: numeric, keeps by default 0.33 of all the
words, sorted by the argument given by |
saveToWd |
a logical parameter whether or not to save the return of the
function to the working directory. This is especially useful for later
analysis steps. The file can be read in by using |
method |
the method to use: "hclust" (the default), "network" or "both". |
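Several arguments in the usage above (dendroLabels, sortby, method) list all permitted values as a character vector. In R this usually means that the first element acts as the default, which matches the documented defaults 'hclust' and Eigenvector. The sketch below shows the common base R idiom for this, match.arg(); whether scicloud resolves its arguments exactly this way is an assumption, and chooseMethod() is a hypothetical function that is not part of the package.

```r
# Hypothetical function, not part of scicloud: mimics a vector-valued default.
chooseMethod <- function(method = c("hclust", "network", "both")) {
  # match.arg() returns the first choice when the argument is missing
  # and raises an error for values outside the listed choices.
  method <- match.arg(method)
  method
}

chooseMethod()            # falls back to the first choice
chooseMethod("network")   # an explicit, valid choice

# An invalid value is rejected with an error.
result <- try(chooseMethod("kmeans"), silent = TRUE)
inherits(result, "try-error")
```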
'hclust' returns a list with the following components:
IndVal: the results of the indicator species analysis.
metaMatrix: the metaMatrix that has been pre-processed.
RepresentativePapers: a dataframe of the most representative papers of each publication community. Papers are representative if they contain the highest number of significant words.
wordList: a list of all words that have been used in the analysis.
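The criterion for RepresentativePapers (the papers containing the highest number of significant words) can be illustrated with a few lines of base R. The presence matrix and the set of significant words below are made up, and the selection logic is a sketch of the stated criterion rather than the package's own code.

```r
# Toy presence/absence matrix (rows = papers, columns = words); values are made up.
presence <- matrix(
  c(1, 1, 1, 0,
    1, 0, 1, 0,
    0, 1, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("paperA", "paperB", "paperC"), c("w1", "w2", "w3", "w4"))
)

# Hypothetical set of words found significant for one publication community.
significant_words <- c("w1", "w2", "w3")

# Count how many significant words each paper contains.
hits <- rowSums(presence[, significant_words, drop = FALSE])

# The paper with the most significant words is the most representative one.
representative <- names(hits)[which.max(hits)]
print(hits)
print(representative)
```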
'network' returns a list with the following components:
LocalMeasures: local measures for both papers and words.
ReducedLocalMeasures: 1/3 of the words (!) with their centrality measures and clustering according to three different clustering methods, arranged by default by eigenvector centrality using sortby.
ReducedIncidenceMatrix: 1/3 of the words arranged by eigenvector centrality, to be further processed e.g. in Gephi or with other clustering functions.
GlobalMeasures: global measures of the network.
Creator of the scicloud workflow: Henrik von Wehrden,
henrik.von_wehrden@leuphana.de
Code by: Matthias Nachtmann,
matthias.nachtmann@stud.leuphana.de,
Lisa Gotzian, lisa.gotzian@stud.leuphana.de,
Jia Yan Ng, Jia.Y.Ng@stud.leuphana.de,
Johann Julius Beeck, johann.j.beeck@stud.leuphana.de
First version of scicloud: Matthias Nachtmann, matthias.nachtmann@stud.leuphana.de
Other scicloud functions: createScicloudList(), deleteRDS(), inspectScicloud(), searchScopus()
## Not run:
### Workflow of performing analysis using scicloud
myAPIKey <- "YOUR_API_KEY"
# Retrieve data from the PDFs and from the Scopus website using the API
scicloudList <- createScicloudList(myAPIKey = myAPIKey)
# Run the analysis with a specified number of clusters
scicloudAnalysis <- runAnalysis(scicloudList = scicloudList, numberOfClusters = 4)
# Generate a summary of the analysis
scicloudSpecs <- inspectScicloud(scicloudAnalysis)
## End(Not run)