meta: Metadata Management

Description

Accessing and modifying metadata of text documents and corpora.

Usage

## S3 method for class 'PCorpus'
meta(x, tag = NULL, type = c("indexed", "corpus", "local"), ...)
## S3 replacement method for class 'PCorpus'
meta(x, tag, type = c("indexed", "corpus", "local"), ...) <- value
## S3 method for class 'SimpleCorpus'
meta(x, tag = NULL, type = c("indexed", "corpus"), ...)
## S3 replacement method for class 'SimpleCorpus'
meta(x, tag, type = c("indexed", "corpus"), ...) <- value
## S3 method for class 'VCorpus'
meta(x, tag = NULL, type = c("indexed", "corpus", "local"), ...)
## S3 replacement method for class 'VCorpus'
meta(x, tag, type = c("indexed", "corpus", "local"), ...) <- value
## S3 method for class 'PlainTextDocument'
meta(x, tag = NULL, ...)
## S3 replacement method for class 'PlainTextDocument'
meta(x, tag = NULL, ...) <- value
## S3 method for class 'XMLTextDocument'
meta(x, tag = NULL, ...)
## S3 replacement method for class 'XMLTextDocument'
meta(x, tag = NULL, ...) <- value
DublinCore(x, tag = NULL)
DublinCore(x, tag) <- value

Arguments

x

For DublinCore a TextDocument, and for meta a TextDocument or a Corpus.

tag

a character string giving the name of a metadatum. If no tag is given (tag = NULL), all available metadata are returned.

type

a character specifying the kind of corpus metadata (see Details).

...

Not used.

value

replacement value.

Details

A corpus has two types of metadata. Corpus metadata ("corpus") contains corpus-specific metadata in the form of tag-value pairs. Document-level metadata ("indexed") contains document-specific metadata but is stored in the corpus as a data frame. Document-level metadata is typically used for semantic reasons (e.g., classifications of documents form an entity of their own due to some high-level information such as the range of possible values) or for performance reasons (a single access instead of extracting the metadata of each document). The latter can be seen as a form of indexing, hence the name "indexed". In addition, document metadata ("local") are tag-value pairs stored directly at the individual documents.
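The three kinds of metadata described above can be told apart with a small sketch. The corpus contents and the tag names collector, label, and comment are made up for illustration; only meta, VCorpus, and VectorSource come from tm.

```r
library(tm)

# A small in-memory corpus; the texts are invented for this example.
docs <- c("Oil prices rose.", "Crude output fell.", "Markets were calm.")
corp <- VCorpus(VectorSource(docs))

# Corpus metadata ("corpus"): tag-value pairs describing the whole corpus.
meta(corp, tag = "collector", type = "corpus") <- "example user"

# Document-level metadata ("indexed"): one data frame row per document,
# stored in the corpus rather than in the documents.
meta(corp, tag = "label", type = "indexed") <- c("up", "down", "flat")

# Document metadata ("local"): a tag-value pair stored at a single document.
meta(corp[[1]], tag = "comment") <- "first document"

meta(corp, type = "corpus")                  # corpus-level tag-value pairs
meta(corp)                                   # data frame; type defaults to "indexed"
meta(corp, tag = "comment", type = "local")  # one list entry per document
```

Because the "indexed" data frame lives in the corpus, reading the label column takes a single access, whereas the "local" query has to visit every document.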

DublinCore is a convenience wrapper to access and modify the metadata of a text document using the Simple Dublin Core schema (supporting the 15 metadata elements from the Dublin Core Metadata Element Set https://dublincore.org/documents/dces/).
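As a minimal sketch of how the wrapper relates to meta: judging from the example output on this page, the Dublin Core elements creator and title appear to correspond to the tm tags author and heading. The document below and its values are invented for illustration.

```r
library(tm)

# A stand-alone document; heading and id are ordinary tm metadata tags.
doc <- PlainTextDocument("Some text.", heading = "A title", id = "doc-1")

# Set a value through the Dublin Core view ...
DublinCore(doc, tag = "creator") <- "Ano Nymous"

# ... and read it back through the regular tm tag it appears to map to.
meta(doc, tag = "author")

# Conversely, a tm tag can be read under its Dublin Core name.
DublinCore(doc, tag = "title")
```

Both views address the same underlying metadata, so a change made through one is visible through the other.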

References

Dublin Core Metadata Initiative. https://dublincore.org/

See Also

meta for metadata in package NLP.

Examples

data("crude")
meta(crude[[1]])
DublinCore(crude[[1]])
meta(crude[[1]], tag = "topics")
meta(crude[[1]], tag = "comment") <- "A short comment."
meta(crude[[1]], tag = "topics") <- NULL
DublinCore(crude[[1]], tag = "creator") <- "Ano Nymous"
DublinCore(crude[[1]], tag = "format") <- "XML"
DublinCore(crude[[1]])
meta(crude[[1]])
meta(crude)
meta(crude, type = "corpus")
meta(crude, "labels") <- 21:40
meta(crude)

Example output

Loading required package: NLP
  author       : character(0)
  datetimestamp: 1987-02-26 17:00:56
  description  : 
  heading      : DIAMOND SHAMROCK (DIA) CUTS CRUDE PRICES
  id           : 127
  language     : en
  origin       : Reuters-21578 XML
  topics       : YES
  lewissplit   : TRAIN
  cgisplit     : TRAINING-SET
  oldid        : 5670
  places       : usa
  people       : character(0)
  orgs         : character(0)
  exchanges    : character(0)
  contributor: character(0)
  coverage   : character(0)
  creator    : character(0)
  date       : 1987-02-26 17:00:56
  description: 
  format     : character(0)
  identifier : 127
  language   : en
  publisher  : character(0)
  relation   : character(0)
  rights     : character(0)
  source     : character(0)
  subject    : character(0)
  title      : DIAMOND SHAMROCK (DIA) CUTS CRUDE PRICES
  type       : character(0)
[1] "YES"
  contributor: character(0)
  coverage   : character(0)
  creator    : Ano Nymous
  date       : 1987-02-26 17:00:56
  description: 
  format     : XML
  identifier : 127
  language   : en
  publisher  : character(0)
  relation   : character(0)
  rights     : character(0)
  source     : character(0)
  subject    : character(0)
  title      : DIAMOND SHAMROCK (DIA) CUTS CRUDE PRICES
  type       : character(0)
  author       : Ano Nymous
  datetimestamp: 1987-02-26 17:00:56
  description  : 
  heading      : DIAMOND SHAMROCK (DIA) CUTS CRUDE PRICES
  id           : 127
  language     : en
  origin       : Reuters-21578 XML
  lewissplit   : TRAIN
  cgisplit     : TRAINING-SET
  oldid        : 5670
  places       : usa
  people       : character(0)
  orgs         : character(0)
  exchanges    : character(0)
  comment      : A short comment.
  format       : XML
data frame with 0 columns and 20 rows
list()
attr(,"class")
[1] "CorpusMeta"
   labels
1      21
2      22
3      23
4      24
5      25
6      26
7      27
8      28
9      29
10     30
11     31
12     32
13     33
14     34
15     35
16     36
17     37
18     38
19     39
20     40

tm documentation built on April 7, 2021, 3:01 a.m.