```r
# ignore this fiddly bit and just leave it as is
knitr::opts_chunk$set(comment=NA)
library("stringr")
```
Let `z <- c()`. Concatenate `z` with some other numeric vectors and explain what `z` is.

YOUR DESCRIPTIVE ANSWER HERE
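One way to explore this is to ask R directly. The sketch below checks what kind of object `c()` returns and concatenates it on either side of a numeric vector; the vectors `1:3` and `c(10, 20)` are arbitrary choices, not part of the assignment.

```r
# c() with no arguments: what kind of object is it?
z <- c()
print(is.null(z))
print(length(z))

# concatenate z with some numeric vectors, on the left and on the right
left <- c(z, 1:3)
right <- c(c(10, 20), z)
print(left)
print(right)
```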
Let `Z <- ""`. Imagine that `s` is a character vector but you don't know anything about it. What is the value of `str_c(Z, s)`? How about `str_c(s, Z)`? Try a few example values of your own for `s`.

```r
# this code is not executed, so you can write abstract identities
str_c(Z, s) <- # fill in the blank
str_c(s, Z) <- # fill in the blank
```
Explain in words the difference between this and `paste(s, Z)`.
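A quick experiment along these lines, using a made-up value for `s` (any character vector would do). It compares the two `str_c()` calls with `paste()`, which inserts its own default separator between arguments.

```r
library(stringr)

Z <- ""
s <- c("call", "me", "Ishmael")   # an arbitrary example value for s

with_left  <- str_c(Z, s)   # Z (length 1) is recycled to the length of s
with_right <- str_c(s, Z)
with_paste <- paste(s, Z)   # paste() separates its arguments with its default sep

print(with_left)
print(with_right)
print(with_paste)
```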
(Optional.) Compare the pairs (`z`, `c`) and (`Z`, `str_c`).

OPTIONAL ANSWER HERE
What is the default value of the `sep` parameter to `str_c`? Demonstrate your answer with an example:

```r
x <- "Let us go then"
y <- "you and I"
DEFAULT <- "?????" # CHANGE THIS
str_c(x, y)
str_c(x, y, sep=DEFAULT) # value should be same as previous
```
(Bonus: what could the default value for `collapse` possibly be?)
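To see what `sep` and `collapse` each control, it can help to set them explicitly. The separators `", "`, `"_"`, `"="` and `"&"` below are arbitrary illustrations; they are not the defaults the questions ask about.

```r
library(stringr)

x <- "Let us go then"
y <- "you and I"

# sep is placed between corresponding elements of the separate arguments
joined <- str_c(x, y, sep = ", ")

# collapse joins all elements of the result into a single string
words <- c("Let", "us", "go")
collapsed <- str_c(words, collapse = "_")

# both at once: pair up elementwise with sep, then collapse the pairs
pairs <- str_c(c("a", "b"), c("1", "2"), sep = "=", collapse = "&")

print(joined)
print(collapsed)
print(pairs)
```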
What is the default value of `sep` for `paste`?

EXPLAIN HERE
Copy the `TextAnalysisWithR/data/plainText/melville.txt` file to the same directory as your homework `.Rmd` file. Then you can replace the longer path with a simple mention of `melville.txt`.
Copy into the following code block the code from Jockers for loading the text of the Gutenberg *Moby-Dick*:

```r
# INSERT HERE: code to read the text into a new variable
text.v <- "fix me"
```
EXPLAIN HERE
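If you want to see how `scan()` splits a file before pointing it at `melville.txt`, here is a small self-contained sketch. It writes two made-up lines to a temporary file and reads them back with `what = "character"` and `sep = "\n"`, so that each line of the file becomes one element of the vector.

```r
# write a tiny stand-in text file so the example runs anywhere
tmp <- tempfile(fileext = ".txt")
writeLines(c("Call me Ishmael.", "Some years ago"), tmp)

# what = "character" reads strings; sep = "\n" makes each line one element
lines_v <- scan(tmp, what = "character", sep = "\n", quiet = TRUE)
print(length(lines_v))
print(lines_v)
```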
Expression using `match`:
Single expression to strip paratexts:
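`match(x, table)` returns the position of the first element of `table` equal to `x`, or `NA` if there is none. A toy sketch of how that can locate the boundaries of a text; the vector and both marker strings are invented, and the real markers in the Gutenberg file have to be found by looking at the file itself.

```r
v <- c("Title page", "Contents", "CHAPTER 1", "Some text.", "More text.",
       "THE END", "License boilerplate")

start <- match("CHAPTER 1", v)   # first position equal to the start marker
end <- match("THE END", v)       # first position equal to the end marker

# keep everything from the start marker up to (not including) the end marker
main_text <- v[start:(end - 1)]
print(main_text)

# match() gives NA for a marker that is absent, so check before indexing
missing <- match("EPILOGUE", v)
print(missing)
```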
Go ahead and do the practice problem.
```r
# INSERT: fill in some code from Jockers here
# then:
top_word_counts <- 1:100 # REPLACE THIS with an expression for the
                         # frequencies of the top 100 words in Moby-Dick
plot(top_word_counts)
```
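A common pattern for word frequencies is `table()` followed by `sort(..., decreasing = TRUE)`; indexing the first n entries then gives the n most frequent words. A sketch on a toy vector (not the Jockers data), with 2 standing in for 100:

```r
words <- c("the", "whale", "the", "sea", "the", "whale")

# count each distinct word, then order the counts from largest to smallest
freqs <- sort(table(words), decreasing = TRUE)

# the top 2 words and their counts (use 1:100 for the top 100)
top_counts <- freqs[1:2]
print(top_counts)

# the counts can then be plotted, e.g. plot(as.vector(top_counts))
```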
Download the plain text of *The Sheik* from Project Gutenberg. Use "Save as" in your browser, put the file in the same directory as this homework, and give it the name `sheik-gutenberg.txt`.
Load it into a character vector in R. Rather than `scan`, use the simpler function `readLines`:

```r
# uncomment the next line when you have downloaded the file to the right place
# sheik_lines <- readLines("sheik-gutenberg.txt")
```
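A self-contained sketch of what `readLines()` returns, using a temporary file with made-up lines in place of `sheik-gutenberg.txt`. Unlike `scan(..., sep = "\n")` with its default `blank.lines.skip = TRUE`, `readLines()` keeps blank lines as empty strings.

```r
# write a tiny stand-in file, including one blank line
tmp <- tempfile(fileext = ".txt")
writeLines(c("THE SHEIK", "", "CHAPTER I"), tmp)

# one element per line of the file, blank lines included
demo_lines <- readLines(tmp)
print(length(demo_lines))
print(demo_lines)
```

If accented characters come out garbled, `readLines()` also takes an `encoding` argument for declaring the file's encoding (for example `encoding = "UTF-8"`).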
Find how many times the words `sheik`, `body`, `love`, and `the` occur in this novel, ignoring case, and the total number of words in the novel. Assign the word counts as named elements of a vector `sheik_counts`, and the total to `sheik_total`.

Let's stipulate that the text starts after the title page, with the word "CHAPTER," and ends before "THE END." Insert your code here:
```r
# fill in operations on sheik_lines...
sheik_counts <- numeric()
# fill in assignments to sheik_counts
sheik_counts["sheik"] <- 0
sheik_counts["body"] <- 0
# etc.
# and sheik_total
sheik_total <- 0
```
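A sketch of one possible approach on an invented miniature "novel"; the lines, the markers, and the two target words are all stand-ins, so check what the `CHAPTER` and `THE END` lines look like in the real file. It finds the first line beginning with "CHAPTER" with `str_detect()`, splits lines into words on runs of non-word characters, and lowercases everything so the counts ignore case.

```r
library(stringr)

demo <- c("Title page", "CHAPTER I", "The sheik rode; the Sheik smiled.",
          "THE END", "License text")

# first line that starts with CHAPTER, and the line reading THE END
start <- which(str_detect(demo, "^CHAPTER"))[1]
end <- match("THE END", demo)
body_lines <- demo[start:(end - 1)]

# split into words on runs of non-word characters, lowercase, drop empties
demo_words <- str_to_lower(unlist(str_split(body_lines, "\\W+")))
demo_words <- demo_words[demo_words != ""]

# named vector of counts, plus the total number of words
demo_counts <- c(sheik = sum(demo_words == "sheik"),
                 the = sum(demo_words == "the"))
demo_total <- length(demo_words)
print(demo_counts)
print(demo_total)
```

Splitting on `\\W+` also breaks contractions such as "don't" into two pieces, which affects the total.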
```r
# your code here
```
BRIEF DISCUSSION HERE
```r
story <- "For sale: baby shoes, unworn."
story_words <- unlist(str_split(story, "\\W+"))
story_words <- story_words[story_words != ""]
```
```r
story_bigrams <- str_c() # what goes here?
```
If your expression uses vectors written out with `c()`, replace them with sequences using `:`.

```r
story_bigrams <- str_c() # improved version here
```
Now generalize your expression using `length(story_words)`.

```r
story_bigrams <- str_c() # even more improved here
```
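For comparison, here is a generalized bigram construction on a different, made-up word vector. It pairs each word with its successor by offsetting two index ranges built from `length()`:

```r
library(stringr)

w <- c("call", "me", "Ishmael", "some", "years")
n <- length(w)

# elements 1..n-1 are the first words, elements 2..n the second words
demo_bigrams <- str_c(w[1:(n - 1)], w[2:n], sep = " ")
print(demo_bigrams)
```

One caveat with `:`: if `n` is 1, then `1:(n - 1)` is `1:0`, which is `c(1, 0)` rather than an empty sequence, so very short inputs need a guard (for example `seq_len(n - 1)` and `seq_len(n - 1) + 1`).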
```r
sheik_bigrams <- str_c() # what goes here?
```
```r
# your code for constructing the bigrams table and listing the top 10
```
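Counting and listing the most frequent bigrams can use `table()` and `sort()` just as for single words; `head()` then takes the first n entries. A toy sketch with invented bigrams (use `head(..., 10)` for the top 10), putting the result in a data frame for a tidy listing:

```r
bigrams <- c("the sheik", "said the", "the sheik", "the desert",
             "the sheik", "said the")

# count each distinct bigram and order from most to least frequent
bigram_table <- sort(table(bigrams), decreasing = TRUE)

# the two most frequent bigrams, listed as a data frame
top_bigrams <- head(bigram_table, 2)
top_df <- data.frame(bigram = names(top_bigrams),
                     count = as.vector(top_bigrams),
                     stringsAsFactors = FALSE)
print(top_df)
```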