Retrieve and Process Textual Data from PTT

as_url: Turn PTT board name to URL
browse: Open web page in default browser
check_404: Check web page for 404 error
comment2corpus: Convert 'comment' list-column to 'corpus' list-column
down_html: Download HTML files to local directory
example_posts: Retrieve example data set of posts data frame
extr_post_category: Extract post category from title
get_post: Get all information from an individual PTT post
get_post_comment: Retrieve user comments from an individual PTT post
get_post_content: Retrieve content from an individual PTT post
get_post_meta: Retrieve metadata from an individual PTT post
get_ptt_dict: Get PTT dictionary
hotboards: Return a data frame with popular boards info
index2df: Extract data from multiple index pages of a PTT board
mutate_content_len: Word count of the 'content' column of get_post
mutate_content_url: Extract and remove URL from 'content' column of a data frame...
parse_comment_date: Parse dates to add year in PTT post comments
parse_post_date: Extract the publish date of a PTT post
ping2zh: Pinyin-to-character translation
post2corpus: Convert post data frame to corpus objects
post2df: Extract information from PTT posts
ptt: Get PTT info
read_html2: Read PTT pages with "over18-confirmation"
scrape-index: Helper functions for scraping index pages
segment: Word segmentation for PTT post content and comments
seg_ptt: Word segmentation for PTT post content
