Read .docx, .csv or .xlsx files into R.
file |
The name of the file which the data are to be
read from. Each row of the table appears as one line of
the file. If it does not contain an absolute path, the
file name is relative to the current working directory.
|
col.names |
A character vector specifying the column names of the transcript columns. |
text.var |
A character string specifying the name of
the text variable; supplying it ensures that variable is classed as
character. If NULL |
merge.broke.tot |
logical. If TRUE, and if the file being read in is a .docx with broken space within a single turn of talk, read.transcript will attempt to merge the pieces into a single turn of talk. |
header |
logical. If TRUE the file contains the names of the variables as its first line. |
dash |
A character string to replace the en dash and em dash special characters (default is to remove them). |
ellipsis |
A character string to replace the ellipsis special characters (default is text ...). |
quote2bracket |
logical. If TRUE replaces curly quotes with curly braces (default is FALSE). If FALSE curly quotes are removed. |
rm.empty.rows |
logical. If TRUE
|
na.strings |
A vector of character strings which are to be interpreted as NA values. |
sep |
The field separator character. Values on each
line of the file are separated by this character. The
default of NULL instructs
|
skip |
Integer; the number of lines of the data file to skip before beginning to read data. |
nontext2factor |
logical. If TRUE attempts to convert any non text to a factor. |
... |
Further arguments to be passed to
|
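The dash, ellipsis and quote2bracket arguments describe a cleanup of special characters that word processors tend to insert. The base R sketch below mimics that documented behaviour on a plain character vector. It is an illustration only: clean_special_chars is a hypothetical helper, not a qdap function, and it assumes that "curly quotes" means the left and right double quotation marks.

```r
# Hypothetical helper (not part of qdap): mimics the documented effect of the
# dash, ellipsis and quote2bracket arguments on a character vector.
clean_special_chars <- function(x, dash = "", ellipsis = "...",
                                quote2bracket = FALSE) {
  # Replace en dash (U+2013) and em dash (U+2014); the default removes them.
  x <- gsub("\u2013", dash, x, fixed = TRUE)
  x <- gsub("\u2014", dash, x, fixed = TRUE)
  # Replace the single-character ellipsis (U+2026) with plain text.
  x <- gsub("\u2026", ellipsis, x, fixed = TRUE)
  if (quote2bracket) {
    # Assumption: left curly quote becomes "{", right curly quote becomes "}".
    x <- gsub("\u201C", "{", x, fixed = TRUE)
    x <- gsub("\u201D", "}", x, fixed = TRUE)
  } else {
    # Otherwise the curly double quotes are removed.
    x <- gsub("\u201C", "", x, fixed = TRUE)
    x <- gsub("\u201D", "", x, fixed = TRUE)
  }
  x
}

# Example: defaults strip the dash and the quotes and spell out the ellipsis.
cleaned <- clean_special_chars("Well\u2014I think\u2026 \u201Cyes\u201D")
```

Passing dash = "-" keeps a visible marker where a dash was removed, which can help when pauses or interruptions matter for the analysis.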
Returns a dataframe of dialogue and people.
The output of read.transcript
may contain errors if
the file being read in is .docx. The researcher should
carefully inspect each transcript for errors before
further parsing the data.
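One way to begin such an inspection is to flag rows that look suspicious before any further parsing. The sketch below is a minimal example in base R, not a qdap feature: flag_transcript_rows is a hypothetical helper, and the column names person and dialogue are assumptions that should be changed to match the col.names actually used.

```r
# Hypothetical helper (not part of qdap): flags rows of a transcript data
# frame that deserve a manual look.
flag_transcript_rows <- function(dat, person = "person", text = "dialogue") {
  p <- as.character(dat[[person]])
  d <- as.character(dat[[text]])
  missing_person <- is.na(p) | !nzchar(trimws(p))
  missing_text <- is.na(d) | !nzchar(trimws(d))
  # Number of rows sharing each speaker label (0 for NA labels).
  n <- vapply(p, function(s) sum(p == s, na.rm = TRUE), integer(1),
              USE.NAMES = FALSE)
  data.frame(
    row = seq_len(nrow(dat)),
    missing_person = missing_person,
    missing_text = missing_text,
    # A label used only once may be a typo or a badly split line.
    rare_speaker = !missing_person & n == 1
  )
}

# Example with made-up data: one empty turn, one misspelled speaker, one NA.
dat <- data.frame(
  person = c("Sam", "Sam", "Sma", NA, "Teacher", "Teacher"),
  dialogue = c("Hi.", "", "Okay.", "Lost turn", "Yes.", "No."),
  stringsAsFactors = FALSE
)
flags <- flag_transcript_rows(dat)
flags[flags$missing_person | flags$missing_text | flags$rare_speaker, ]
```

The flags only narrow down where to look; they do not replace reading the transcript against the original document.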
If a transcript is a .docx file, read.transcript expects two columns (generally person and dialogue) with some sort of separator (default is colon separator). .doc files must be converted to .docx before reading in.
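To make the two-column expectation concrete, the sketch below splits plain text lines of the form "person: dialogue" at the first separator and, optionally, folds lines without a separator into the preceding turn. It is a rough base R illustration under stated assumptions, not qdap's implementation: split_turns and merge_broken are hypothetical names, and treating a separator-free line as the continuation of a broken turn is an assumption about what a broken turn looks like.

```r
# Hypothetical helper (not part of qdap): turns "person: dialogue" lines into
# a two-column data frame.
split_turns <- function(lines, sep = ":", merge_broken = TRUE) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]                      # drop empty lines
  # Position of the first separator on each line; -1 when there is none.
  pos <- as.integer(regexpr(sep, lines, fixed = TRUE))
  has_sep <- pos > 0
  person <- ifelse(has_sep, trimws(substr(lines, 1, pos - 1)), NA_character_)
  dialogue <- ifelse(has_sep,
                     trimws(substring(lines, pos + nchar(sep))),
                     lines)
  if (merge_broken && any(has_sep)) {
    # Assumption: a line without a separator continues the previous turn.
    turn <- cumsum(has_sep)
    keep <- turn > 0                                 # ignore text before the first speaker
    dialogue <- as.vector(tapply(dialogue[keep], turn[keep],
                                 paste, collapse = " "))
    person <- person[has_sep]
  }
  data.frame(person = person, dialogue = dialogue, stringsAsFactors = FALSE)
}

# Example with made-up lines; the second turn is broken across two lines.
lines <- c("Teacher: Good morning everyone.",
           "Sam: Morning. I did the",
           "reading last night.",
           "",
           "Teacher: Great: let's start.")
turns <- split_turns(lines)
```

Splitting only at the first separator keeps later colons inside the dialogue, which is why the choice of sep matters when speakers' names or the talk itself contain that character.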
Bryan Goodrich and Tyler Rinker <tyler.rinker@gmail.com>.
https://github.com/trinker/qdap/wiki/Reading-.docx-%5BMS-Word%5D-Transcripts-into-R
#Note: to view the document below use the path:
gsub("trans1.docx", "", system.file("extdata/trans1.docx", package = "qdap"))
(doc1 <- system.file("extdata/trans1.docx", package = "qdap"))
(doc2 <- system.file("extdata/trans2.docx", package = "qdap"))
(doc3 <- system.file("extdata/trans3.docx", package = "qdap"))
(doc4 <- system.file("extdata/trans4.xlsx", package = "qdap"))
dat1 <- read.transcript(doc1)
truncdf(dat1, 40)
dat2 <- read.transcript(doc1, col.names = c("person", "dialogue"))
truncdf(dat2, 40)
dat2b <- rm_row(dat2, "person", "[C") #remove bracket row
truncdf(dat2b, 40)
## read.transcript(doc2) #throws an error (need skip)
dat3 <- read.transcript(doc2, skip = 1); truncdf(dat3, 40)
## read.transcript(doc3, skip = 1) #throws an error; wrong sep
dat4 <- read.transcript(doc3, sep = "-", skip = 1); truncdf(dat4, 40)
dat5 <- read.transcript(doc4); truncdf(dat5, 40) #an .xlsx file
|