createScicloudList: Create a scicloud list


View source: R/1_createScicloudList.R

Description

createScicloudList is the first function to call when performing an analysis with scicloud. It returns a list of three components, metaMatrix, Tf_Idf and wordList, for further use with runAnalysis.
The function reads all scientific papers stored as PDF files in the "PDFs" folder of your working directory, or in any other directory you specify, and creates a metaMatrix from them. It then pre-processes the text (e.g. by stemming words with stemWords) and computes a tf-idf matrix. As a last step, it fetches the papers' metadata from Scopus, for which you need an Elsevier API key (https://dev.elsevier.com/index.jsp).
You can limit the words used in the analysis with the argument 'keepWordsFile'.
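
To illustrate what a tf-idf matrix contains, the following base R sketch computes one for three toy documents. It is not scicloud's internal code: the documents, the whitespace tokenisation and the weighting (raw term frequency times log(N / document frequency)) are assumptions made for this example only.

```r
# Toy corpus (hypothetical stand-in for text extracted from PDFs)
docs <- c(
  paper1 = "climate change impacts forest",
  paper2 = "forest management policy",
  paper3 = "climate policy climate adaptation"
)

# Split each document into lower-case words
tokens <- lapply(strsplit(docs, "\\s+"), tolower)
vocab <- sort(unique(unlist(tokens)))

# Term-frequency matrix: one row per document, one column per word
tf <- t(vapply(
  tokens,
  function(tk) as.numeric(table(factor(tk, levels = vocab))),
  numeric(length(vocab))
))
colnames(tf) <- vocab

# Inverse document frequency: words found in fewer documents weigh more
docFreq <- colSums(tf > 0)
idf <- log(nrow(tf) / docFreq)

# Multiply every column of tf by the idf of its word
tfIdf <- sweep(tf, 2, idf, `*`)
round(tfIdf, 2)
```

Words that occur in only one paper (such as "adaptation") receive the highest weight per occurrence, while a word occurring in every paper would receive a weight of zero.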

Usage

createScicloudList(
  directory = file.path(".", "PDFs"),
  scopusList = NA,
  myAPIKey = NA,
  language = "SMART",
  stemWords = TRUE,
  saveToWd = FALSE,
  ignoreWords = c(),
  keepWordsFile = NA,
  generateWordlist = FALSE
)

Arguments

directory

path to the folder containing the PDFs. By default, the PDFs are expected to be in a folder named "PDFs" in the working directory; any other path can be supplied.

scopusList

a finished metaMatrix from searchScopus

myAPIKey

your private API key for communicating with the Scopus API. You can request one at https://dev.elsevier.com/.

language

this defines the language of the stopwords to be filtered. The default is "SMART". Look at stopwords for more information.

stemWords

logical variable which is passed to processMetaDataMatrix.

saveToWd

logical, whether to save the output of the function to the working directory. This is especially useful for later analysis steps. The saved file can be read back in with readRDS.

ignoreWords

a vector of words to be ignored which is passed to processMetaDataMatrix.

keepWordsFile

path to a .csv file that specifies which words to keep for the analysis. The file may either mark each word with 0/1 or simply list the words to keep; all other words are disregarded in the analysis. If no word list is provided, all words are used.
You can generate a list of all words used in the current analysis by setting generateWordlist to TRUE. If you intend to use this option, delete all words you don't need and re-run the function, passing the updated word list as keepWordsFile.

generateWordlist

logical. If set to TRUE, a word list is written to your working directory. You can then add a 0/1 behind each word or delete the rows you don't consider important to the analysis.
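
The 0/1 word list described for keepWordsFile and generateWordlist can be pictured with the following base R sketch. The column names (word, keep), the file contents and the toy matrix are hypothetical; the exact layout scicloud expects is not specified here, so treat this only as an illustration of the filtering idea.

```r
# Write a hypothetical word list: one word per row plus a 0/1 flag
wordListFile <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    word = c("climate", "forest", "policy", "the"),
    keep = c(1, 1, 0, 0)
  ),
  wordListFile,
  row.names = FALSE
)

# Read the list back and collect the words flagged with 1
keepTable <- read.csv(wordListFile, stringsAsFactors = FALSE)
keepWords <- keepTable$word[keepTable$keep == 1]

# Toy document-term matrix standing in for the real tf-idf matrix
toyMatrix <- matrix(
  1:8,
  nrow = 2,
  dimnames = list(
    c("paper1", "paper2"),
    c("climate", "forest", "policy", "the")
  )
)

# Keep only the columns whose word is in the list
filtered <- toyMatrix[, colnames(toyMatrix) %in% keepWords, drop = FALSE]
colnames(filtered)
```

Deleting the rows for "policy" and "the" from the file instead of flagging them with 0 would express the same selection.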

Value

Returns a list with the following components: metaMatrix, Tf_Idf and wordList.

Author(s)

Creator of the scicloud workflow: Henrik von Wehrden, henrik.von_wehrden@leuphana.de

Code by: Jia Yan Ng, Jia.Y.Ng@stud.leuphana.de, Johann Julius Beeck, johann.j.beeck@stud.leuphana.de, Lisa Gotzian, lisa.gotzian@stud.leuphana.de, Prabesh Dhakal, prabesh.dhakal@stud.leuphana.de

First version of scicloud: Matthias Nachtmann, matthias.nachtmann@stud.leuphana.de

See Also

Other scicloud functions: deleteRDS(), inspectScicloud(), runAnalysis(), searchScopus()

Examples

## Not run: 

### Workflow of performing analysis using scicloud
myAPIKey <- "YOUR_API_KEY"
# Retrieve data from the PDFs and from the Scopus website using the API
scicloudList <- createScicloudList(myAPIKey = myAPIKey)

# Run the analysis with a specified number of clusters
scicloudAnalysis <- runAnalysis(scicloudList = scicloudList, numberOfClusters = 4)

# Generate a summary of the analysis
scicloudSpecs <- inspectScicloud(scicloudAnalysis)

## End(Not run)
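
When saveToWd = TRUE is used, the saved output can be reloaded in a later session with readRDS. The sketch below shows the save and reload round trip on a small stand-in list. The file name and the list contents are hypothetical (the name scicloud gives the saved file is not stated here), and a temporary file is used instead of the working directory.

```r
# Stand-in for a scicloud list (hypothetical contents)
toyList <- list(
  metaMatrix = data.frame(FileName = c("a.pdf", "b.pdf")),
  Tf_Idf = matrix(c(0.1, 0, 0, 0.4), nrow = 2),
  wordList = c("climate", "forest")
)

# Save to a temporary .RDS file and read it back in
rdsFile <- tempfile(fileext = ".RDS")
saveRDS(toyList, rdsFile)
restored <- readRDS(rdsFile)

# The restored object has the same three components
names(restored)
```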

LisaGotzian/scicloud documentation built on March 29, 2021, 5:52 a.m.