MBoxSource: Mailbox Source

MBoxSource R Documentation

Mailbox Source

Description

Create a mailbox source.

Usage

MBoxSource(mbox, format = "mbox", delim = NULL)

Arguments

mbox

a character string giving the path or URL to a mailbox stored in “mbox” format.

format

a character string giving the mbox format to use, with possible values "mbox" (default), "mboxo", and "mboxrd".

delim

a character string giving a regexp to use for finding the ‘From ’ lines delimiting the messages, or NULL (default), which provides suitable regexps according to the mbox format.

Details

A mailbox source interprets each email stored in the mailbox as a document.

‘Mbox’ is a generic term for a family of related file formats used for holding collections of email messages. All messages are stored in a single mailbox text file, and each message begins with a line that starts with the four characters ‘From’ followed by a space (the so-called ‘From ’ line) and then the sender's email address.
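As a rough illustration of how ‘From ’ lines delimit messages, the following base R sketch splits a small in-memory mailbox at each line starting with ‘From ’. It is not the tm.plugin.mail implementation; the sample addresses and the names lines, is_from, msg_id and messages are made up for this example.

```r
# Illustrative only: split mailbox text into messages at "From " lines.
# Uses base R; this is NOT how MBoxSource is implemented internally.
lines <- c(
  "From alice@example.org Thu Jan  1 00:00:00 1970",
  "Subject: first",
  "",
  "Hello.",
  "",
  "From bob@example.org Fri Jan  2 12:30:00 1970",
  "Subject: second",
  "",
  "Hi there."
)

is_from  <- grepl("^From ", lines)   # TRUE on each delimiter line
msg_id   <- cumsum(is_from)          # running message index for every line
messages <- split(lines, msg_id)     # one character vector per message

length(messages)
vapply(messages, `[`, character(1), 1L)  # the 'From ' line of each message
```

Each element of messages starts with its own ‘From ’ line, which is why a body line beginning with ‘From ’ would be mistaken for the start of a new message.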

A problem arises when message bodies contain lines that also start with ‘From’ followed by a space. Four common variants of the mbox format deal with this problem: in mboxo and mboxrd, such lines get a greater-than sign prepended, whereas in mboxcl and mboxcl2 a ‘Content-Length:’ header field records the message length. For more information, see https://en.wikipedia.org/wiki/Mbox and https://www.loc.gov/preservation/digital/formats/fdd/fdd000383.shtml, which in turn points to https://www.loc.gov/preservation/digital/formats/fdd/fdd000384.shtml and https://www.loc.gov/preservation/digital/formats/fdd/fdd000385.shtml for the mboxo and mboxrd extensions.
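The two quoting variants can be sketched in base R as follows. This assumes the commonly described rules: mboxo quotes only body lines starting with ‘From ’, while mboxrd also quotes lines that are already quoted. The helper names quote_mboxo and quote_mboxrd are hypothetical and not part of tm.plugin.mail.

```r
# Hypothetical helpers, for illustration only (not exported by any package).
# mboxo: prepend ">" to body lines that start with "From ".
quote_mboxo <- function(x) sub("^From ", ">From ", x)
# mboxrd: prepend ">" to lines starting with "From " after zero or more ">".
quote_mboxrd <- function(x) sub("^(>*From )", ">\\1", x)

body <- c(
  "From here on it gets tricky.",
  ">From an earlier reply",
  "Plain line"
)

quote_mboxo(body)   # second line is left as it is
quote_mboxrd(body)  # second line gains another ">"
```

Because mboxrd adds one level of quoting to every candidate line, the quoting can be undone exactly, whereas under mboxo a body line that originally read ‘>From ’ cannot be told apart from a quoted ‘From ’ line.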

The above LoC web page suggests that the ‘From ’ lines are always of the form ‘From sender date moreinfo’ where sender is one word without spaces or tabs and date (the delivery date of the message) always contains exactly 24 characters in Standard C asctime format. Thus, for the mbox format, the default delimiter regexp for ‘From ’ lines actually matches this form (with some timezone variants). For the mboxo and mboxrd variants, the default delimiter regexp is "^From ".
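To make the ‘From sender date moreinfo’ form concrete, here is a simplified regexp in base R that matches a sender word followed by a 24-character asctime date. It is an assumption-laden sketch, not the default regexp used by MBoxSource(): it ignores the timezone variants mentioned above, and the names days, months and from_re are invented for this example.

```r
# Simplified, illustrative pattern; NOT the package's default delimiter regexp.
days   <- "(Mon|Tue|Wed|Thu|Fri|Sat|Sun)"
months <- "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
from_re <- paste0(
  "^From [^ \t]+ +",                       # "From ", then a sender without blanks
  days, " ", months, " [ 0-9][0-9] ",      # weekday, month, space-padded day
  "[0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{4}"    # hh:mm:ss and four-digit year
)

x <- c(
  "From alice@example.org Thu Jan  1 00:00:00 1970",
  "From the start, this was a body line.",
  "From bob@example.org Fri Jan 02 12:30:00 1970 extra"
)

grepl(from_re, x)                  # only the first and third line match
nchar("Thu Jan  1 00:00:00 1970")  # an asctime date is 24 characters long
```

A stricter pattern of this kind lets an unquoted body line such as the second one pass through without being treated as a delimiter, which the plain "^From " pattern cannot do. A custom pattern along these lines could presumably be passed via the delim argument.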

The getElem() method for class MBoxSource strips the prepended greater-than signs for the mboxo and mboxrd formats.
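What stripping the prepended greater-than signs might look like can be sketched in base R. The functions unquote_mboxo and unquote_mboxrd below are hypothetical stand-ins written for this example; the actual code inside getElem() may differ.

```r
# Hypothetical helpers, for illustration only; getElem() may work differently.
# mboxo: remove the single ">" in front of "From ".
unquote_mboxo <- function(x) sub("^>From ", "From ", x)
# mboxrd: remove one ">" from lines of the form ">...>From ".
unquote_mboxrd <- function(x) sub("^>(>*From )", "\\1", x)

stored <- c(
  ">From here on it gets tricky.",
  ">>From an earlier reply",
  "> quoted text"
)

unquote_mboxo(stored)   # only the first line changes
unquote_mboxrd(stored)  # the first two lines each lose one ">"
```

Ordinary quoted text such as the third line is untouched in both cases, since only lines whose quoting is followed by ‘From ’ are affected.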

Value

An object inheriting from MBoxSource, SimpleSource, and Source.

Author(s)

Ingo Feinerer and Kurt Hornik


tm.plugin.mail documentation built on Sept. 12, 2024, 5:07 p.m.