brussels_reviews_anno | R Documentation |
Reviews of the AirBnB customerswhich are tokenised, POS tagged and lemmatised.
The data contains 1 row per document/token and contains the fields
doc_id, language, sentence_id, token_id, token, lemma, xpos.
Data has been converted from UTF-8 to ASCII as in iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
in order
to be able to comply to CRAN policies.
http://insideairbnb.com/brussels: information of 2015-10-03
brussels_reviews
, brussels_listings
## brussels_reviews_anno data(brussels_reviews_anno) head(brussels_reviews_anno) sort(table(brussels_reviews_anno$xpos)) ## Not run: ## ## If you want to construct a similar dataset as the ## brussels_reviews_anno dataset based on the udpipe library, do as follows ## library(udpipe) library(data.table) data(brussels_reviews) ## The brussels_reviews contains comments on Airbnb sites in 3 languages: es, fr and nl table(brussels_reviews$language) bxl_anno <- split(brussels_reviews, brussels_reviews$language) ## Annotate the Spanish comments m <- udpipe_download_model(language = "spanish-ancora") m <- udpipe_load_model(file = m$file_model) bxl_anno$es <- udpipe_annotate(object = m, x = bxl_anno$es$feedback, doc_id = bxl_anno$es$id) ## Annotate the French comments m <- udpipe_download_model(language = "french-partut") m <- udpipe_load_model(file = m$file_model) bxl_anno$fr <- udpipe_annotate(object = m, x = bxl_anno$fr$feedback, doc_id = bxl_anno$fr$id) ## Annotate the Dutch comments m <- udpipe_download_model(language = "dutch-lassysmall") m <- udpipe_load_model(file = m$file_model) bxl_anno$nl <- udpipe_annotate(object = m, x = bxl_anno$nl$feedback, doc_id = bxl_anno$nl$id) brussels_reviews_anno <- lapply(bxl_anno, as.data.frame) brussels_reviews_anno <- rbindlist(brussels_reviews_anno) str(brussels_reviews_anno) ## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.