MBoxSource (R Documentation)
Create a mailbox source.
MBoxSource(mbox, format = "mbox", delim = NULL)
mbox: a character string giving the path or URL to a mailbox stored in “mbox” format.
format: a character string giving the mbox format to use (default "mbox"), with possible values …
delim: a character string giving a regexp to use for finding the ‘From ’ lines delimiting the messages, or NULL (the default) to use the default delimiter for the chosen format, as described below.
A mailbox source interprets each e-mail stored in the mailbox as a document.
‘Mbox’ is a generic term for a family of related file formats used for holding collections of email messages. The messages are stored in a single mailbox text file separated by lines starting with the four characters ‘From’ followed by a space (the so-called ‘From ’ lines) and the sender's email address.
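As a rough illustration of the ‘From ’ line convention, the following base R sketch splits a character vector of mailbox lines into messages. The helper split_mbox() and the sample addresses are hypothetical, and the sketch assumes that every line beginning with "From " starts a new message. That only holds if such lines in message bodies have been escaped, which is the problem discussed next.

```r
# Hypothetical helper (not part of any package): split mailbox lines into
# messages, assuming every line starting with "From " opens a new message.
split_mbox <- function(lines) {
  is_from <- startsWith(lines, "From ")
  # Running count of 'From ' lines gives each line its message index.
  msg_id <- cumsum(is_from)
  # Lines before the first 'From ' line (index 0) are discarded.
  keep <- msg_id > 0
  msgs <- split(lines[keep], msg_id[keep])
  # Drop the 'From ' separator line itself from each message.
  unname(lapply(msgs, function(m) m[-1]))
}

mbox_lines <- c(
  "From alice@example.org Thu Nov  7 10:15:00 2024",
  "Subject: hello",
  "",
  "Hi Bob.",
  "From bob@example.org Thu Nov  7 11:02:33 2024",
  "Subject: re: hello",
  "",
  "Hi Alice."
)
msgs <- split_mbox(mbox_lines)
length(msgs)  # 2
msgs[[1]]     # "Subject: hello" "" "Hi Bob."
```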
Clearly, there will be a problem if the message bodies contain lines which also start with ‘From’ followed by a space. There are four common variants of the mbox format to deal with this problem: in mboxo and mboxrd such lines get a greater-than sign prepended, whereas in mboxcl and mboxcl2 a ‘Content-Length:’ header field is used to record the message lengths. For more information, see https://en.wikipedia.org/wiki/Mbox and https://www.loc.gov/preservation/digital/formats/fdd/fdd000383.shtml which in turn points to https://www.loc.gov/preservation/digital/formats/fdd/fdd000384.shtml and https://www.loc.gov/preservation/digital/formats/fdd/fdd000385.shtml for the mboxo and mboxrd extensions.
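The two quoting schemes can be sketched as follows. escape_mboxo() and escape_mboxrd() are hypothetical helpers written for illustration, not part of any package. In mboxo only lines starting with ‘From ’ are quoted; in mboxrd every line matching ^>*From (zero or more ‘>’ followed by ‘From ’) gets one more ‘>’.

```r
# Hypothetical helpers: quote body lines before writing them to a mailbox.
escape_mboxo <- function(body) {
  # mboxo: only lines beginning exactly with "From " are quoted.
  hit <- startsWith(body, "From ")
  body[hit] <- paste0(">", body[hit])
  body
}

escape_mboxrd <- function(body) {
  # mboxrd: lines beginning with zero or more '>' and then "From "
  # are quoted with one additional '>'.
  hit <- grepl("^>*From ", body)
  body[hit] <- paste0(">", body[hit])
  body
}

body <- c("From here on, it is easy.", ">From a quoted reply", "Plain line")
escape_mboxo(body)   # ">From here on, it is easy." ">From a quoted reply" "Plain line"
escape_mboxrd(body)  # ">From here on, it is easy." ">>From a quoted reply" "Plain line"
```

With mboxo, the first two lines both end up starting with ">From ", so a reader can no longer tell which one originally carried the ‘>’. With mboxrd they stay distinct, and the quoting can be reversed exactly.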
The above LoC web page suggests that the ‘From ’ lines are always of the form ‘From sender date moreinfo’, where sender is one word without spaces or tabs and date (the delivery date of the message) always contains exactly 24 characters in Standard C asctime format. Thus, for the mbox format, the default delimiter regexp for ‘From ’ lines actually matches this form (with some timezone variants). For the mboxo and mboxrd variants, the default delimiter regexp is "^From ".
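A sketch of a regexp for the ‘From sender date moreinfo’ form, assuming an asctime date such as "Thu Nov  7 10:15:00 2024" (24 characters, with the day of month padded by a space). The pattern from_line_re is hypothetical: it is not the package's actual default delimiter, and it does not handle the timezone variants mentioned above.

```r
# Hypothetical delimiter pattern for 'From sender date moreinfo' lines;
# not the package's default, and timezone variants are not handled.
from_line_re <- paste0(
  "^From [^ \t]+ ",                                      # 'From ' and a one-word sender
  "(Mon|Tue|Wed|Thu|Fri|Sat|Sun) ",                      # weekday
  "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ",  # month
  "[ 0-9][0-9] ",                                        # day of month, space-padded
  "[0-9]{2}:[0-9]{2}:[0-9]{2} ",                         # hh:mm:ss
  "[0-9]{4}"                                             # year; 'moreinfo' may follow
)

candidates <- c(
  "From alice@example.org Thu Nov  7 10:15:00 2024",
  "From here on, it is easy.",
  "From bob@example.org Mon Dec 23 08:00:01 2024 remote from host"
)
grepl(from_line_re, candidates)  # TRUE FALSE TRUE
grepl("^From ", candidates)      # TRUE TRUE TRUE
nchar("Thu Nov  7 10:15:00 2024")  # 24
```

The looser "^From " pattern also matches the second, body-like line. That is harmless for mboxo and mboxrd, where such body lines are quoted, but a plain mbox file does not guarantee this, so a stricter pattern is safer there.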
The getElem() method for class MBoxSource strips the prepended greater-than signs for the mboxo and mboxrd formats.
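The stripping of prepended greater-than signs can be sketched as follows. unescape_body() is a hypothetical helper written for illustration, not the code behind getElem(). It assumes that mboxo unquoting removes the ‘>’ only from lines starting with ">From ", while mboxrd unquoting removes one ‘>’ from every line matching ^>+From .

```r
# Hypothetical helper: undo the quoting of body lines read from a mailbox.
unescape_body <- function(body, format = c("mboxo", "mboxrd")) {
  format <- match.arg(format)
  pattern <- switch(format,
    mboxo  = "^>From ",  # mboxo: only a single '>' before "From " is removed
    mboxrd = "^>+From "  # mboxrd: one '>' is removed from any '>...>From ' line
  )
  hit <- grepl(pattern, body)
  # Drop the first character (the leading '>') of each matching line.
  body[hit] <- substring(body[hit], 2L)
  body
}

quoted <- c(">From here on, it is easy.", ">>From a quoted reply", "Plain line")
unescape_body(quoted, "mboxrd")  # "From here on, it is easy." ">From a quoted reply" "Plain line"
unescape_body(quoted, "mboxo")   # "From here on, it is easy." ">>From a quoted reply" "Plain line"
```

For mboxrd the result is exactly the body before quoting. For mboxo the doubly quoted line is left alone, and any original ">From " line would have been indistinguishable from a quoted "From " line anyway.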
An object inheriting from MBoxSource, SimpleSource, and Source.
Ingo Feinerer and Kurt Hornik