Man pages for digital-geopolitics/dgblogs
Scrape text from blogs

default_get_article_urls: Extract article links from index page
default_get_index_urls: Get URLs for index pages
download_article: Download article
download_articles: Download all articles
download_blog: Download blog articles
get_node_text: Extract text from HTML/XML node
get_url: HTTP request
load_config: Load blog configuration from file
load_index_parsers: Import HTML parsing functions from file
merge_xp: Merge XPath expressions
parse_all_articles: Parse all articles
parse_article: Parse blog article
parse_blog_articles: Parse blog
parse_date: Parse date from string
pipe: Pipe operator
print_article_content: Print article content
scrape_article_urls: Find new article URLs on index pages and insert into db
scrape_blog: Scrape article URLs from blog
was_redirected: Check if download was redirected
digital-geopolitics/dgblogs documentation built on March 22, 2022, 6:40 p.m.