rebuild_data_source_cocit: Rebuild CoCit from papern


View source: R/rebuild_data_source_cocit.R

Description

This function builds a co-citation table with DOI information from a paperTable. The papers are normally exported from the Web of Knowledge in CSV format. Make sure to add a column with paper numbers (Papern.No), as these are required to run the function.

Usage

rebuild_data_source_cocit(paperTable, ignoreCRs)

Arguments

paperTable

A CSV table imported from the Web of Knowledge, with a Papern.No value for each row. The file can be read like this: read.csv("file", sep = ";", header = TRUE, skip = 1). Make sure to add a column named Papern.No. This column is used to assign the citations to a paper when the table is generated.
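As a minimal sketch of the import step described above, the following reads a small semicolon-separated export from an in-memory string and adds the Papern.No column. The sample records and the AU and TI columns are made up for illustration; a real export has more columns and is read from a file.

```r
# Hypothetical two-record export. The first line is skipped via skip = 1,
# as in the read.csv() call shown above.
raw <- paste(
  "export header line to skip",
  "AU;TI;CR",
  "Smith, J;First paper;Doe J, 2001, J TEST, V1, P10, DOI 10.1000/abc",
  "Lee, K;Second paper;Roe A, 1999, J OTHER, V5, P77",
  sep = "\n"
)

paperTable <- read.csv(text = raw, sep = ";", header = TRUE, skip = 1,
                       stringsAsFactors = FALSE)

# Add the running paper number that the function reads from Papern.No
paperTable$Papern.No <- seq_len(nrow(paperTable))
```

Numbering the rows with seq_len() is only one option; any identifier that is unique per paper should serve the same purpose.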

ignoreCRs

Old (legacy) argument. It defaults to FALSE and is not used in the function body shown under Examples.

Details

Attention: this function can run for more than an hour, depending on the number of papers given. Expect around 1 hour for around 9000 individual papers.
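Growing a data frame with rbind() once per row copies the whole table each time, which is a common reason for long run times in R loops. The sketch below shows a general alternative, not the package's implementation: collect one small data frame per citation in a pre-allocated list and bind them once at the end. The helper collect_rows and its two-column layout are hypothetical.

```r
# Hypothetical helper, not part of the package: build all rows first,
# then combine them with a single rbind() call.
collect_rows <- function(entries) {
  rows <- vector("list", length(entries))
  for (k in seq_along(entries)) {
    rows[[k]] <- data.frame(PNo = entries[[k]][1], autor = entries[[k]][2],
                            stringsAsFactors = FALSE)
  }
  do.call(rbind, rows)
}

demo <- collect_rows(list(c("1", "Doe J"), c("1", "Roe A"), c("2", "Poe E")))
```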

Value

A data frame with the following columns: PNo, autor, jahr, journal, version, seite, CR, DOI

PNo: paper number
autor: author name
jahr: year of publication
journal: journal name
version: version name of the journal (kept only if the field starts with "V")
seite: page on which the citation was published (kept only if the field starts with "P")
CR: citation number, initialized with 0
DOI: Digital Object Identifier

If a value is not found, it is replaced with an empty string, except for the year, which is replaced with the string "None".
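The fallback rules above can be illustrated on a single cited-reference string. This is a simplified sketch that mirrors the logic of the function body shown under Examples; the helper parse_citation and the sample strings are hypothetical, and unlike the original it trims whitespace before extracting the DOI.

```r
# Hypothetical helper: split one citation into the documented fields.
parse_citation <- function(citation) {
  parts <- trimws(strsplit(citation, ",")[[1]])

  # DOI: first field containing "DOI ", otherwise an empty string
  doi <- grep("DOI ", parts, value = TRUE, fixed = TRUE)
  doi <- if (length(doi) == 0) "" else sub("DOI ", "", doi[1], fixed = TRUE)

  # Year: "None" when the second field is missing or not numeric
  year <- parts[2]
  if (is.na(suppressWarnings(as.numeric(year)))) year <- "None"

  # Optional fields fall back to "" when missing or not in the expected form
  field <- function(i, prefix = "") {
    if (length(parts) >= i && startsWith(parts[i], prefix)) parts[i] else ""
  }

  list(autor = parts[1], jahr = year, journal = field(3),
       version = field(4, "V"), seite = field(5, "P"), CR = "0", DOI = doi)
}

full  <- parse_citation("Doe J, 2001, J TEST, V1, P10, DOI 10.1000/abc")
short <- parse_citation("Anonymous, in press")
```

Here full carries all fields, while short has jahr set to "None" and empty strings for journal, version, seite, and DOI.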

Author(s)

MFinst

Examples


## The function is currently defined as
function (paperTable, ignoreCRs = FALSE)
{
    start_time = Sys.time()
    PNos = paperTable$Papern.No
    CRs = paperTable$CR
    papersLength = length(PNos)
    CRPlaceholder = "0"
    allCRs = as.data.frame(x = c(), autor = c(), jahr = c(),
        journal = c(), version = c(), seite = c(), CR = c(),
        DOI = c(), stringsAsFactors = FALSE)
    allCRs = rbind(allCRs, c("x", "autor", "jahr", "journal",
        "version", "seite", "CR", "DOI"))
    allCRs = type.convert(x = allCRs, as.is = TRUE)
    allIgnoredCits = vector(mode = "character", length = 0)
    names(allCRs)[1] = "x"
    names(allCRs)[2] = "autor"
    names(allCRs)[3] = "jahr"
    names(allCRs)[4] = "journal"
    names(allCRs)[5] = "version"
    names(allCRs)[6] = "seite"
    names(allCRs)[7] = "CR"
    names(allCRs)[8] = "DOI"
    for (i in seq_len(papersLength)) {
        print(paste(i, "of", papersLength))
        # Split the cited-references field of paper i into single citations
        cocits = strsplit(x = as.character(CRs[i]), split = ";")
        rows = vector(mode = "character", length = 0)
        if (length(cocits[[1]]) > 0) {
            for (y in seq_along(cocits[[1]])) {
                citationInfoList = strsplit(cocits[[1]][y], ",")
                DOI = grep(x = citationInfoList[[1]], pattern = "DOI ",
                  value = TRUE)
                if (length(DOI) == 0) {
                  DOI = ""
                }
                else {
                  if (length(DOI) > 1) {
                    DOI = DOI[1]
                    DOI = gsub("\\[", "", x = DOI)
                  }
                  DOI = gsub("DOI ", "", x = DOI)
                }
                author = trimws(citationInfoList[[1]][1])
                year = trimws(citationInfoList[[1]][2])
                # Non-numeric years are stored as the string "None"
                if (is.na(suppressWarnings(as.numeric(year)))) {
                  year = "None"
                }
                if (length(citationInfoList[[1]]) >= 3) {
                  journal = trimws(citationInfoList[[1]][3])
                }
                else {
                  journal = ""
                }
                if (length(citationInfoList[[1]]) >= 4) {
                  version = trimws(citationInfoList[[1]][4])
                  if (!startsWith(version, "V")) {
                    version = ""
                  }
                }
                else {
                  version = ""
                }
                if (length(citationInfoList[[1]]) >= 5) {
                  seite = trimws(citationInfoList[[1]][5])
                  if (!startsWith(seite, "P")) {
                    seite = ""
                  }
                }
                else {
                  seite = ""
                }
                allCRs = rbind(allCRs, c(as.character(PNos[i]),
                  as.character(author), as.character(year), as.character(journal),
                  as.character(version), as.character(seite),
                  as.character(CRPlaceholder), as.character(DOI)))
            }
        }
    }
    end_time = Sys.time()
    final_time = end_time - start_time
    print(final_time)
    return(allCRs)
  }

mfinst/TM-CoCit-Support-FM documentation built on March 4, 2020, 8:38 p.m.