View source: R/TextReuseTextDocument.R
This is the constructor function for TextReuseTextDocument objects. This class is used for comparing documents.
TextReuseTextDocument(
  text,
  file = NULL,
  meta = list(),
  tokenizer = tokenize_ngrams,
  ...,
  hash_func = hash_string,
  minhash_func = NULL,
  keep_tokens = FALSE,
  keep_text = TRUE,
  skip_short = TRUE
)

is.TextReuseTextDocument(x)

has_content(x)

has_tokens(x)

has_hashes(x)

has_minhashes(x)
text | A character vector containing the text of the document. This argument can be skipped if supplying file.
file | The path to a text file, if text is not provided.
meta | A list with named elements for the metadata associated with this document. If a document is created using the ...
tokenizer | A function to split the text into tokens. See tokenizers.
... | Arguments passed on to the tokenizer.
hash_func | A function to hash the tokens. See hash_string.
minhash_func | A function to create minhash signatures of the document. See minhash_generator.
keep_tokens | Should the tokens be saved in the document that is returned or discarded?
keep_text | Should the text be saved in the document that is returned or discarded?
skip_short | Should short documents be skipped? (See details.)
x | An R object to check.
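As a rough illustration of how these arguments fit together, the sketch below builds a document from text held in memory. The sample sentence and the id "example" are made up for illustration, and the call assumes that an id must be supplied in meta when the document comes from text rather than from a file.

```r
library(textreuse)

# Hypothetical sample text, long enough to yield several five-grams.
text <- "The quick brown fox jumps over the lazy dog near the river bank"

doc <- TextReuseTextDocument(
  text = text,
  meta = list(id = "example"),  # assumed: an id is needed when using `text`
  tokenizer = tokenize_ngrams,
  n = 5,                        # forwarded to the tokenizer through `...`
  keep_tokens = TRUE            # keep the tokens so they can be inspected
)

is.TextReuseTextDocument(doc)   # check the class of the result
has_tokens(doc)                 # tokens were kept
tokens(doc)                     # the five-grams produced by the tokenizer
```

Setting keep_tokens = TRUE is only for inspection here; the default discards the tokens after hashing.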
This constructor function follows a three-step process. It reads in the text, either from a file or from memory. It then tokenizes that text. Then it hashes the tokens. Most of the comparison functions in this package rely only on the hashes to make the comparison. By passing FALSE to keep_tokens and keep_text, you can avoid saving those objects, which can result in significant memory savings for large corpora.
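The tokenize-then-hash steps can be sketched by calling the default tokenizer and hash function directly, then comparing with what the constructor retains when both keep_tokens and keep_text are FALSE. The sample text and id are hypothetical, and this is an illustration of the described process rather than the package's internal code.

```r
library(textreuse)

# Hypothetical text already in memory (step 1).
text <- "The quick brown fox jumps over the lazy dog near the river bank"

# Steps 2 and 3 by hand: tokenize into trigrams, then hash each token.
tokens <- tokenize_ngrams(text, n = 3)
hashes <- hash_string(tokens)
length(tokens) == length(hashes)  # one hash per token

# The constructor runs the same steps; here only hashes and metadata are kept.
doc <- TextReuseTextDocument(
  text = text,
  meta = list(id = "example"),
  keep_tokens = FALSE,
  keep_text = FALSE
)

has_content(doc)  # text was discarded
has_tokens(doc)   # tokens were discarded
has_hashes(doc)   # hashes remain available for comparisons
```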
If skip_short = TRUE, this function will return NULL for very short or empty documents. A very short document is one where there are too few words to create at least two n-grams. For example, if five-grams are desired, then a document must be at least six words long. If no value of n is provided, then the function assumes a value of n = 3. A warning will be printed with the document ID of a skipped document.
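The length rule described above comes down to simple arithmetic: a document of w words yields w - n + 1 n-grams, so at least n + 1 words are needed for two n-grams. The helpers below are hypothetical and written in base R only to illustrate that rule; they are not part of textreuse, and the package's internal check may differ in detail.

```r
# Number of n-grams that a document of `n_words` words can yield.
count_ngrams <- function(n_words, n = 3) {
  max(0, n_words - n + 1)
}

# A document counts as "very short" if it cannot produce two n-grams.
is_too_short <- function(n_words, n = 3) {
  count_ngrams(n_words, n) < 2
}

is_too_short(5, n = 5)  # five words give a single five-gram
is_too_short(6, n = 5)  # six words give two five-grams
is_too_short(3)         # default n = 3: three words give one trigram
```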
An object of class TextReuseTextDocument. This object inherits from the virtual S3 class TextDocument in the NLP package. It contains the following elements:
  The text of the document.
  The tokens created from the text.
  Hashes created from the tokens.
  The minhash signature of the document.
  The document metadata, including the filename (if any) in file.
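A short sketch of reading these elements back through the package's accessor functions follows. The text, the id, and the minhash settings (50 hashes, seed 1) are illustrative assumptions, not recommended values.

```r
library(textreuse)

# Assumed settings: a small minhash signature with a fixed seed.
minhash <- minhash_generator(n = 50, seed = 1)

doc <- TextReuseTextDocument(
  text = "The quick brown fox jumps over the lazy dog near the river bank",
  meta = list(id = "example"),
  minhash_func = minhash,
  keep_tokens = TRUE
)

content(doc)            # the text of the document
head(tokens(doc))       # the first few tokens
head(hashes(doc))       # the first few token hashes
length(minhashes(doc))  # size of the minhash signature
meta(doc)$id            # one field of the metadata list
```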
Accessors for TextReuse objects.
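The output that follows is consistent with calls along these lines. This is a sketch and not necessarily the page's original example: the file path and the id "ny1850" are taken from the printed metadata, while the exact sequence of calls is an assumption.

```r
library(textreuse)

# Sample file bundled with the package, as named in the printed metadata.
file <- system.file("extdata/legal/ny1850-match.txt", package = "textreuse")

# The id is set explicitly, matching the `id` shown in the output.
doc <- TextReuseTextDocument(file = file, meta = list(id = "ny1850"))

print(doc)          # summary with metadata and the start of the content
meta(doc)           # metadata as a named list
head(tokens(doc))   # NULL, since keep_tokens defaults to FALSE
head(hashes(doc))   # first few token hashes
content(doc)        # full text of the document
```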
TextReuseTextDocument
file : /usr/lib/R/site-library/textreuse/extdata/legal/ny1850-match.txt
hash_func : hash_string
id : ny1850
tokenizer : tokenize_ngrams
content : § 597. Every action must be prosecuted in the name
of the real party in interest, except as otherwise provided in section 599.
..a
5./imended Code, § 111.
§598. In the case of an assignmen
$file
[1] "/usr/lib/R/site-library/textreuse/extdata/legal/ny1850-match.txt"
$hash_func
[1] "hash_string"
$id
[1] "ny1850"
$tokenizer
[1] "tokenize_ngrams"
NULL
[1] 864021270 -1576256129 -659164900 -1012025761 45449552 -1175961819
§ 597. Every action must be prosecuted in the name
of the real party in interest, except as otherwise provided in section 599.
..a
5./imended Code, § 111.
§598. In the case of an assignment of a thing in
action, the action by the assignee is without prejudice
to any set-off or other defence existing at the time of, or
before notice of, the sssignment ; but this section does
not apply to a negotiable promissory note or bill of exchange transferred in good faith and upon good considerations, before due.
yfmended Code, § 112.
§ 599. An executor or administrator, a trustee of an
express trust, or a person expressly authorised by statute,
may sue without joining with him the persons for
whose benefit the action is prosecuted. A person with
whom, or in whose name, a contract is made, for the
benefit of another, is a trustee of an express trust, within the meaning of this section.
§ 602. When an infant is a party, he must appear by
guardian, who may be appointed by the court in which
the action is prosecuted, or by a judge thereof
Jlmended Code, § 115.
§ 603. The guardian must be appointed as -follows:
1. When the infant is plaintiff, upon the application
of the infant, if he be of the age of fourteen years, or if
under that age, upon the application of some other party
to the action, or of a relative or friend of the infant:
2. When the infant is defendant, upon the application
of the infant, if he be of the age of fourteen years, and
apply within twenty days after the service of the summons. If he be under the age of fourteen, or neglect
so to apply, then upon the application of any other
party to the action, or of a relative or friend of the infant.
§ 607. When a husband and father has deserted his
family, the wife and mother may prosecute or defend,
in his name, any action which he might have prosecuted or defended, and shall have the same powers and
rights therein as he might have had.
To provide for cases of great hardship, that sometimes
happen.
§ 608. All persons having an interest in the subject
of the action, and in obtaining the relief demanded,
may be joined as plaintiffs, except when otherwise provided in this title.
Jmended Code, § 117.
§609. Any person may be made a defendant, who
has or claims an interest in the controversy, adverse to
the plainti&', or who is a necessary party to a complete
determination or settlement of the question involved
therein.
§610. Of the parties to the action, those who are
united in interest must be joined as plaintiffs or defendants; but if the consent of any one, who should have
been joined as plaintiff, cannot be obtained, he may be
made a defendant, the reason thereof being stated in
the complaint: and when the question is one of a
common or general interest of many persons, or when
the parties are numerous and it is impracticable to
bring them all before the court, one or more may sue
or defend for the benefit of all.
Jimended Code, §119.
§ 611. Persons severally liable upon the same obligation or instrument, including the parties to bills of exchange and promissory notes, and sureties on the same
or separate instruments, may, all or any of them, be
included in the same action, at the option of the plaintiff.
Amended Code, §l20, amended.
§612. An action does not abate by the death, marriage or other disability of a party, or by the transfer of
any interest therein, if the cause of action survi've' or
continue. In case of the death, marriage, or other disabilityof a party, the court on motion, may allow the
action to be continued by or against his representative
or successor in interest. In case of any other transfer
of interest, the action may be continued in the name
of the original party; or the court may allow the person to whom the transfer is made to be substituted in
the action.
dmended Code, § 121.
§ 613. The court may determine any controversy between pilrties before it, when it can be done without
prejudice to the rights of others, or by saving their
rights; but when a complete determination of the controversy cannot be had without the presence of other
parties, the court must order them to be brought in.
And when, in an action for the recovery of real or personal property. a person, not a party to the action, but
having an interest in the subject thereof, makes application to the court to be made a party, it may order
him to be brought in by the proper amendment.