Nothing
#' qdapDictionaries
#'
#' A collection of dictionaries and Word Lists to Accompany the qdap Package
#' @docType package
#' @name qdapDictionaries
#' @aliases qdapDictionaries package-qdapDictionaries
NULL
#' Augmented List of Grady Ward's English Words and Mark Kantrowitz's Names List
#'
#' A dataset containing a vector of Grady Ward's English words augmented with
#' \code{\link[qdapDictionaries]{DICTIONARY}},
#' \href{http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names}{Mark Kantrowitz's names list},
#' other proper nouns, and contractions.
#'
#' @details A dataset containing a vector of Grady Ward's English words
#' augmented with proper nouns (U.S. States, Countries, Mark Kantrowitz's Names List,
#' and months) and contractions. That dataset is augmented for spell checking purposes.
#'
#' @docType data
#' @keywords datasets
#' @name GradyAugmented
#' @usage data(GradyAugmented)
#' @format A vector with 122806 elements
#' @references Moby Thesaurus List by Grady Ward \url{http://www.gutenberg.org} \cr \cr
#' List of names from Mark Kantrowitz \url{http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names/}.
#' A copy of the \href{http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names/readme.txt}{README}
#' is available \href{http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names/readme.txt}{here}
#' per the author's request.
NULL
#' Buckley & Salton Stopword List
#'
#' A stopword list containing a character vector of stopwords.
#'
#' @details \href{http://www.lextek.com/manuals/onix/stopwords2.html}{From Onix Text Retrieval Toolkit API Reference}:
#' "This stopword list was built by Gerard Salton and Chris Buckley for the
#' experimental SMART information retrieval system at Cornell University.
#' This stopword list is generally considered to be on the larger side and so
#' when it is used, some implementations edit it so that it is better suited
#' for a given domain and audience while others use this stopword list as it
#' stands."
#'
#' @note Reduced from the original 571 words to 546.
#'
#' @docType data
#' @keywords datasets
#' @name BuckleySaltonSWL
#' @usage data(BuckleySaltonSWL)
#' @format A character vector with 546 elements
#' @references \url{http://www.lextek.com/manuals/onix/stopwords2.html}
NULL
#' Nettalk Corpus Syllable Data Set
#'
#' A dataset containing syllable counts.
#'
#' @note This data set is based on the Nettalk Corpus but has some researcher
#' word deletions and additions based on the needs of the
#' \code{\link[qdap]{syllable_sum}} algorithm.
#'
#' @details
#' \itemize{
#' \item word. The word
#' \item syllables. Number of syllables
#' }
#'
#' @docType data
#' @keywords datasets
#' @name DICTIONARY
#' @usage data(DICTIONARY)
#' @format A data frame with 20137 rows and 2 variables
#' @references Sejnowski, T.J., and Rosenberg, C.R. (1987). "Parallel networks
#' that learn to pronounce English text" in Complex Systems, 1, 145-168.
#' Retrieved from: \url{http://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Nettalk+Corpus)}
#'
#' \href{http://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/nettalk/}{UCI Machine Learning Repository website}
NULL
#' Onix Text Retrieval Toolkit Stopword List 1
#'
#' A stopword list containing a character vector of stopwords.
#'
#' @details \href{http://www.lextek.com/manuals/onix/stopwords1.html}{From Onix Text Retrieval Toolkit API Reference}:
#' "This stopword list is probably the most widely used stopword list. It
#' covers a wide number of stopwords without getting too aggressive and
#' including too many words which a user might search upon."
#'
#' @note Reduced from the original 429 words to 404.
#'
#' @docType data
#' @keywords datasets
#' @name OnixTxtRetToolkitSWL1
#' @usage data(OnixTxtRetToolkitSWL1)
#' @format A character vector with 404 elements
#' @references \url{http://www.lextek.com/manuals/onix/stopwords1.html}
NULL
#' Fry's 100 Most Commonly Used English Words
#'
#' A stopword list containing a character vector of stopwords.
#'
#' @details Fry's Word List: The first 25 make up about one-third of all printed
#' material in English. The first 100 make up about one-half of all printed
#' material in English. The first 300 make up about 65\% of all printed
#' material in English."
#'
#'
#' @docType data
#' @keywords datasets
#' @name Top100Words
#' @usage data(Top100Words)
#' @format A character vector with 100 elements
#' @references Fry, E. B. (1997). Fry 1000 instant words. Lincolnwood, IL:
#' Contemporary Books.
NULL
#' Fry's 200 Most Commonly Used English Words
#'
#' A stopword list containing a character vector of stopwords.
#'
#' @details Fry's Word List: The first 25 make up about one-third of all printed
#' material in English. The first 100 make up about one-half of all printed
#' material in English. The first 300 make up about 65\% of all printed
#' material in English."
#'
#'
#' @docType data
#' @keywords datasets
#' @name Top200Words
#' @usage data(Top200Words)
#' @format A character vector with 200 elements
#' @references Fry, E. B. (1997). Fry 1000 instant words. Lincolnwood, IL:
#' Contemporary Books.
NULL
#' Fry's 25 Most Commonly Used English Words
#'
#' A stopword list containing a character vector of stopwords.
#'
#' @details Fry's Word List: The first 25 make up about one-third of all printed
#' material in English. The first 100 make up about one-half of all printed
#' material in English. The first 300 make up about 65\% of all printed
#' material in English."
#'
#' @docType data
#' @keywords datasets
#' @name Top25Words
#' @usage data(Top25Words)
#' @format A character vector with 25 elements
#' @references Fry, E. B. (1997). Fry 1000 instant words. Lincolnwood, IL:
#' Contemporary Books.
NULL
#' Fry's 1000 Most Commonly Used English Words
#'
#' A stopword list containing a character vector of stopwords.
#'
#' @details Fry's 1000 Word List makes up 90\% of all printed text.
#'
#' @docType data
#' @keywords datasets
#' @name Fry_1000
#' @usage data(Fry_1000)
#' @format A vector with 1000 elements
#' @references Fry, E. B. (1997). Fry 1000 instant words. Lincolnwood, IL:
#' Contemporary Books.
NULL
#' Leveled Dolch List of 220 Common Words
#'
#' Edward William Dolch's list of 220 Most Commonly Used Words by reading level.
#'
#' @details Dolch's Word List made up 50-75\% of all printed text in 1936.
#' \itemize{
#' \item Word. The word
#' \item Level. The reading level of the word
#' }
#'
#' @docType data
#' @keywords datasets
#' @name Leveled_Dolch
#' @usage data(Leveled_Dolch)
#' @format A data frame with 220 rows and 2 variables
#' @references Dolch, E. W. (1936). A basic sight vocabulary. Elementary School
#' Journal, 36, 456-460.
NULL
#' Dolch List of 220 Common Words
#'
#' Edward William Dolch's list of 220 Most Commonly Used Words.
#'
#' @details Dolch's Word List made up 50-75\% of all printed text in 1936.
#'
#' @docType data
#' @keywords datasets
#' @name Dolch
#' @usage data(Dolch)
#' @format A vector with 220 elements
#' @references Dolch, E. W. (1936). A basic sight vocabulary. Elementary School
#' Journal, 36, 456-460.
NULL
#' Small Abbreviations Data Set
#'
#' A dataset containing abbreviations and their qdap friendly form.
#'
#' @details
#' \itemize{
#' \item abv. Common transcript abbreviations
#' \item rep. qdap representation of those abbreviations
#' }
#'
#' @docType data
#' @keywords datasets
#' @name abbreviations
#' @usage data(abbreviations)
#' @format A data frame with 14 rows and 2 variables
NULL
#' Action Word List
#'
#' A dataset containing a vector of action words. This is a subset of the
#' Moby project: Moby Part-of-Speech.
#'
#' @details
#' From Grady Ward's Moby project:
#' "This second edition is a particularly thorough revision of the original Moby
#' Part-of-Speech. Beyond the fifteen thousand new entries, many thousand more
#' entries have been scrutinized for correctness and modernity. This is
#' unquestionably the largest P-O-S list in the world. Note that the many included
#' phrases means that parsing algorithms can now tokenize in units larger than a
#' single word, increasing both speed and accuracy."
#'
#' @docType data
#' @keywords datasets
#' @name action.verbs
#' @usage data(action.verbs)
#' @format A vector with 1569 elements
NULL
#' Adverb Word List
#'
#' A dataset containing a vector of adverbs words. This is a subset of the
#' Moby project: Moby Part-of-Speech.
#'
#' @details
#' From Grady Ward's Moby project:
#' "This second edition is a particularly thorough revision of the original Moby
#' Part-of-Speech. Beyond the fifteen thousand new entries, many thousand more
#' entries have been scrutinized for correctness and modernity. This is
#' unquestionably the largest P-O-S list in the world. Note that the many included
#' phrases means that parsing algorithms can now tokenize in units larger than a
#' single word, increasing both speed and accuracy."
#'
#' @docType data
#' @keywords datasets
#' @name adverb
#' @usage data(adverb)
#' @format A vector with 13398 elements
NULL
#' Contraction Conversions
#'
#' A dataset containing common contractions and their expanded form.
#'
#' @details
#' \itemize{
#' \item contraction. The contraction word.
#' \item expanded. The expanded form of the contraction.
#' }
#'
#' @docType data
#' @keywords datasets
#' @name contractions
#' @usage data(contractions)
#' @format A data frame with 70 rows and 2 variables
NULL
#' Emoticons Data Set
#'
#' A dataset containing common emoticons (adapted from
#' \href{http://www.lingo2word.com/lists/emoticon_listH.html}{Popular Emoticon List}).
#'
#' @details
#' \itemize{
#' \item meaning. The meaning of the emoticon
#' \item emoticon. The graphic representation of the emoticon
#' }
#'
#' @docType data
#' @keywords datasets
#' @name emoticon
#' @usage data(emoticon)
#' @format A data frame with 81 rows and 2 variables
#' @references
#' \url{http://www.lingo2word.com/lists/emoticon_listH.html}
NULL
#' Syllable Lookup Key
#'
#' A dataset containing a syllable lookup key (see \code{DICTIONARY}).
#'
#' @details For internal use.
#'
#' @docType data
#' @keywords datasets
#' @name key.syl
#' @usage data(key.syl)
#' @format A hash key with a modified DICTIONARY data set.
#' @references
#' \href{http://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/nettalk/}{UCI Machine Learning Repository website}
NULL
#' Amplifying Words
#'
#' A dataset containing a vector of words that amplify word meaning.
#'
#' @details
#' Valence shifters are words that alter or intensify the meaning of the polarized
#' words and include negators and amplifiers. Negators are, generally, adverbs
#' that negate sentence meaning; for example the word like in the sentence, "I do
#' like pie.", is given the opposite meaning in the sentence, "I do not like
#' pie.", now containing the negator not. Amplifiers are, generally, adverbs or
#' adjectives that intensify sentence meaning. Using our previous example, the
#' sentiment of the negator altered sentence, "I seriously do not like pie.", is
#' heightened with addition of the amplifier seriously. Whereas de-amplifiers
#' decrease the intensity of a polarized word as in the sentence "I barely like
#' pie"; the word "barely" deamplifies the word like.
#'
#' @docType data
#' @keywords datasets
#' @name amplification.words
#' @usage data(amplification.words)
#' @format A vector with 49 elements
NULL
#' De-amplifying Words
#'
#' A dataset containing a vector of words that de-amplify word meaning.
#'
#' @details
#' Valence shifters are words that alter or intensify the meaning of the polarized
#' words and include negators and amplifiers. Negators are, generally, adverbs
#' that negate sentence meaning; for example the word like in the sentence, "I do
#' like pie.", is given the opposite meaning in the sentence, "I do not like
#' pie.", now containing the negator not. Amplifiers are, generally, adverbs or
#' adjectives that intensify sentence meaning. Using our previous example, the
#' sentiment of the negator altered sentence, "I seriously do not like pie.", is
#' heightened with addition of the amplifier seriously. Whereas de-amplifiers
#' decrease the intensity of a polarized word as in the sentence "I barely like
#' pie"; the word "barely" deamplifies the word like.
#'
#' @docType data
#' @keywords datasets
#' @name deamplification.words
#' @usage data(deamplification.words)
#' @format A vector with 13 elements
NULL
#' Interjections
#'
#' A dataset containing a character vector of common interjections.
#'
#' @docType data
#' @keywords datasets
#' @name interjections
#' @usage data(interjections)
#' @format A character vector with 139 elements
#' @references
#' \url{http://www.vidarholen.net/contents/interjections/}
NULL
#' Negating Words
#'
#' A dataset containing a vector of words that negate word meaning.
#'
#' @details
#' Valence shifters are words that alter or intensify the meaning of the polarized
#' words and include negators and amplifiers. Negators are, generally, adverbs
#' that negate sentence meaning; for example the word like in the sentence, "I do
#' like pie.", is given the opposite meaning in the sentence, "I do not like
#' pie.", now containing the negator not. Amplifiers are, generally, adverbs or
#' adjectives that intensify sentence meaning. Using our previous example, the
#' sentiment of the negator altered sentence, "I seriously do not like pie.", is
#' heightened with addition of the amplifier seriously. Whereas de-amplifiers
#' decrease the intensity of a polarized word as in the sentence "I barely like
#' pie"; the word "barely" deamplifies the word like.
#'
#' @docType data
#' @keywords datasets
#' @name negation.words
#' @usage data(negation.words)
#' @format A vector with 23 elements
NULL
#' Negative Words
#'
#' A dataset containing a vector of negative words.
#'
#' @details
#' A sentence containing more negative words would be deemed a negative sentence,
#' whereas a sentence containing more positive words would be considered positive.
#'
#' @docType data
#' @keywords datasets
#' @name negative.words
#' @usage data(negative.words)
#' @format A vector with 4776 elements
#' @references Hu, M., & Liu, B. (2004). Mining opinion features in customer
#' reviews. National Conference on Artificial Intelligence.
#'
#' \url{http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html}
NULL
#' Positive Words
#'
#' A dataset containing a vector of positive words.
#'
#' @details
#' A sentence containing more negative words would be deemed a negative sentence,
#' whereas a sentence containing more positive words would be considered positive.
#'
#' @docType data
#' @keywords datasets
#' @name positive.words
#' @usage data(positive.words)
#' @format A vector with 2003 elements
#' @references Hu, M., & Liu, B. (2004). Mining opinion features in customer
#' reviews. National Conference on Artificial Intelligence.
#'
#' \url{http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html}
NULL
#' Preposition Words
#'
#' A dataset containing a vector of common prepositions.
#'
#' @docType data
#' @keywords datasets
#' @name preposition
#' @usage data(preposition)
#' @format A vector with 162 elements
NULL
#' Polarity Lookup Key
#'
#' A dataset containing a polarity lookup key (see \code{\link[qdap]{polarity}}).
#'
#'
#' @docType data
#' @keywords datasets
#' @name key.pol
#' @usage data(key.pol)
#' @format A hash key with words and corresponding values.
#' @references Hu, M., & Liu, B. (2004). Mining opinion features in customer
#' reviews. National Conference on Artificial Intelligence.
#'
#' \url{http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html}
NULL
#' Language Assessment by Mechanical Turk (labMT) Sentiment Words
#'
#' A dataset containing words, average happiness score (polarity), standard
#' deviations, and rankings.
#'
#' @details
#' \itemize{
#' \item word. The word.
#' \item happiness_rank. Happiness ranking of words based on average happiness
#' scores.
#' \item happiness_average. Average happiness score.
#' \item happiness_standard_deviation. Standard deviations of the happiness
#' scores.
#' \item twitter_rank. Twitter ranking of the word.
#' \item google_rank. Google ranking of the word.
#' \item nyt_rank. New York Times ranking of the word.
#' \item lyrics_rank. lyrics ranking of the word.
#' }
#'
#' @docType data
#' @keywords datasets
#' @name labMT
#' @usage data(labMT)
#' @format A data frame with 10222 rows and 8 variables
#' @references
#' Dodds, P.S., Harris, K.D., Kloumann, I.M., Bliss, C.A., & Danforth, C.M. (2011)
#' Temporal patterns of happiness and information in a global social network:
#' Hedonometrics and twitter. PLoS ONE 6(12): e26752.
#' doi:10.1371/journal.pone.0026752
#'
#' \url{http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0026752.s001}
NULL
#' Synonym Lookup Key
#'
#' A dataset containing a synonym lookup key.
#'
#'
#' @docType data
#' @keywords datasets
#' @name key.syn
#' @usage data(key.syn)
#' @format A hash key with 10976 rows and 2 variables (words and synonyms).
#' @references Scraped from:
#' \href{http://dictionary.reverso.net/english-synonyms/}{Reverso Online Dictionary}.
#' The word list fed to \href{http://dictionary.reverso.net/english-synonyms/}{Reverso}
#' is the unique words from the combination of \code{DICTIONARY} and
#' \code{labMT}.
NULL
#' First Names and Gender (U.S.)
#'
#' A dataset containing 1990 U.S. census data on first names.
#'
#' @details
#' \itemize{
#' \item name. A first name.
#' \item per.freq. Frequency in percent of the name by gender.
#' \item cum.freq. Cumulative frequency in percent of the name by gender.
#' \item rank. Rank of the name by gender.
#' \item gender. Gender of the combined male/female list (M/F).
#' \item gender2. Gender of the combined male/female list with "B" in place of
#' overlapping (M/F) names.
#' \item pred.sex. Predicted gender of the names with B's in \code{gender2}
#' replaced with the gender that had a higher \code{per.freq}.
#' }
#'
#' @docType data
#' @keywords datasets
#' @name NAMES
#' @usage data(NAMES)
#' @format A data frame with 5493 rows and 7 variables
#' @references \url{http://www.census.gov}
NULL
#' First Names and Predictive Gender (U.S.)
#'
#' A truncated version of the \code{NAMES} dataset used for predicting.
#'
#' @details
#' \itemize{
#' \item name. A first name.
#' \item gender2. Gender of the combined male/female list with "B" in place of
#' overlapping (M/F) names.
#' \item pred.sex. Predicted gender of the names with B's in \code{gender2}
#' replaced with the gender that had a higher \code{per.freq}.
#' }
#'
#' @docType data
#' @keywords datasets
#' @name NAMES_SEX
#' @usage data(NAMES_SEX)
#' @format A data frame with 5162 rows and 3 variables
#' @references \url{http://www.census.gov}
NULL
#' First Names and Predictive Gender (U.S.) List
#'
#' A list version of the \code{NAMES_SEX} dataset broken down by
#' first letter.
#'
#' @details Alphabetical list of dataframes with the following variables:
#' \itemize{
#' \item name. A first name.
#' \item gender2. Gender of the combined male/female list with "B" in place of
#' overlapping (M/F) names.
#' \item pred.sex. Predicted gender of the names with B's in \code{gender2}
#' replaced with the gender that had a higher \code{per.freq}.
#' }
#'
#' @docType data
#' @keywords datasets
#' @name NAMES_LIST
#' @usage data(NAMES_LIST)
#' @format A list with 26 elements
#' @references \url{http://www.census.gov}
NULL
#' Function Words
#'
#' A vector of function words from
#' \href{http://myweb.tiscali.co.uk/wordscape/museum/funcword.html}{John and Muriel Higgins's list}
#' used for the text game ECLIPSE. The lest is augmented with additional
#' contractions from \code{\link[qdapDictionaries]{contractions}}.
#'
#'
#' @docType data
#' @keywords datasets
#' @name function.words
#' @usage data(function.words)
#' @format A vector with 350 elements
#' @references \url{http://myweb.tiscali.co.uk/wordscape/museum/funcword.html}
NULL
#' Words that Indicate Strength
#'
#' A subset of the
#' \href{http://www.wjh.harvard.edu/~inquirer/inqdict.txt}{Harvard IV Dictionary}
#' containing a vector of words indicating strength.
#'
#' @docType data
#' @keywords datasets
#' @name strong.words
#' @usage data(strong.words)
#' @format A vector with 1474 elements
#' @references \url{http://www.wjh.harvard.edu/~inquirer/inqdict.txt}
NULL
#' Words that Indicate Weakness
#'
#' A subset of the
#' \href{http://www.wjh.harvard.edu/~inquirer/inqdict.txt}{Harvard IV Dictionary}
#' containing a vector of words indicating weakness.
#'
#' @docType data
#' @keywords datasets
#' @name weak.words
#' @usage data(weak.words)
#' @format A vector with 647 elements
#' @references \url{http://www.wjh.harvard.edu/~inquirer/inqdict.txt}
NULL
#' Words that Indicate Power
#'
#' A subset of the
#' \href{http://www.wjh.harvard.edu/~inquirer/inqdict.txt}{Harvard IV Dictionary}
#' containing a vector of words indicating power.
#'
#' @docType data
#' @keywords datasets
#' @name power.words
#' @usage data(power.words)
#' @format A vector with 624 elements
#' @references \url{http://www.wjh.harvard.edu/~inquirer/inqdict.txt}
NULL
#' Words that Indicate Submission
#'
#' A subset of the
#' \href{http://www.wjh.harvard.edu/~inquirer/inqdict.txt}{Harvard IV Dictionary}
#' containing a vector of words indicating submission.
#'
#' @docType data
#' @keywords datasets
#' @name submit.words
#' @usage data(submit.words)
#' @format A vector with 262 elements
#' @references \url{http://www.wjh.harvard.edu/~inquirer/inqdict.txt}
NULL
#' Strength Lookup Key
#'
#' A dataset containing a strength lookup key.
#'
#' @docType data
#' @keywords datasets
#' @name key.strength
#' @usage data(key.strength)
#' @format A hash key with strength words.
#' @references \url{http://www.wjh.harvard.edu/~inquirer/inqdict.txt}
NULL
#' Power Lookup Key
#'
#' A dataset containing a power lookup key.
#'
#'
#' @docType data
#' @keywords datasets
#' @name key.power
#' @usage data(key.power)
#' @format A hash key with power words.
#' @references \url{http://www.wjh.harvard.edu/~inquirer/inqdict.txt}
NULL
#' Alemany's Discourse Markers
#'
#' A dataset containing discourse markers
#'
#' @details A dictionary of \emph{discourse markers} from
#' \href{http://www.cs.famaf.unc.edu.ar/~laura/shallowdisc4summ/tesi_electronica.pdf}{Alemany (2005)}.
#' "In this lexicon, discourse markers are characterized by their structural
#' (continuation or elaboration) and semantic (revision, cause, equality,
#' context) meanings, and they are also associated to a morphosyntactic class
#' (part of speech, PoS), one of adverbial (A), phrasal (P) or conjunctive (C)...
#' Sometimes a discourse marker is \bold{underspecified} with respect to a
#' meaning. We encode this with a hash. This tends to happen with structural
#' meanings, because these meanings can well be established by discursive
#' mechanisms other than discourse markers, and the presence of the discourse
#' marker just reinforces the relation, whichever it may be." (p. 191).
#' \itemize{
#' \item marker. The discourse marker
#' \item type. The semantic type (typically overlaps with \code{semantic} except in the special types
#' \item structural. How the marker is used structurally
#' \item semantic. How the marker is used semantically
#' \item pos. Part of speech: adverbial (A), phrasal (P) or conjunctive (C)
#' }
#'
#' @docType data
#' @keywords datasets
#' @name discourse.markers.alemany
#' @usage data(discourse.markers.alemany)
#' @format A data frame with 97 rows and 5 variables
#' @references Alemany, L. A. (2005). Representing discourse for automatic text summarization via
#' shallow NLP techniques (Unpublished doctoral dissertation). Universitat de Barcelona, Barcelona.\cr
#' \cr
#' \url{http://www.cs.famaf.unc.edu.ar/~laura/shallowdisc4summ/tesi_electronica.pdf} \cr
#' \url{http://russell.famaf.unc.edu.ar/~laura/shallowdisc4summ/discmar/#description}
NULL
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.