cas_download_legacy: Downloads html pages based on a vector of links
In giocomai/castarter: Content Analysis Starter Toolkit

cas_download_legacy

R Documentation

Downloads html pages based on a vector of links

Description

Downloads html pages based on a vector of links.

Usage

cas_download_legacy(
  url,
  type = "contents",
  custom_folder = NULL,
  custom_path = NULL,
  file_format = "html",
  url_to_download = NULL,
  size = 500,
  wget_system = FALSE,
  method = "auto",
  missing_pages = TRUE,
  start = 1,
  wait = 1,
  ignore_ssl_certificates = FALSE,
  use_headless_chromium = FALSE,
  headless_chromium_wait = 1,
  use_phantomjs = FALSE,
  create_script = FALSE,
  project = NULL,
  website = NULL,
  base_folder = NULL
)

Arguments

`url`	A character vector of urls, or a data frame with at least two columns named `id` and `url`.
`type`	Accepted values are "contents" (default) or "index".
`custom_folder`	Defaults to NULL. If given, overrides the "type" param and stores files in given path as a subfolder of project/website. Folder must already exist, and should be empty.
`url_to_download`	Defaults to NULL. If given, expected to be a logical vector to be applied to the given urls. If given, it takes precedence over `missing_pages` and `size`.
`size`	Defaults to 500. It represents the minimum size in bytes that downloaded html files should have: files that are smaller will be downloaded again. Used only when missing_pages == FALSE.
`wget_system`	Logical, defaults to FALSE. Calls wget as a system command through the system() function. Wget must be previously installed on the system.
`method`	Defaults to "auto". Passed to utils::download.file(); available options are "internal", "wininet" (Windows only), "libcurl", "wget" and "curl". For more information see ?utils::download.file
`missing_pages`	Logical, defaults to TRUE. If TRUE, checks whether a downloaded html file exists for each element in `url`; when there is no such file, it downloads it.
`start`	Integer. Only urls at position `start` or later in the url vector will be downloaded: `url[start:length(url)]`
`ignore_ssl_certificates`	Logical, defaults to FALSE. If TRUE, it uses wget to download the page and does not check whether the SSL certificate is valid. Useful, for example, for https pages with an expired or misconfigured SSL certificate.
`use_headless_chromium`	Logical, defaults to FALSE. If TRUE uses the `crrri` package to download pages. Useful in particular when web pages are generated via javascript. See in particular: https://github.com/RLesur/crrri#system-requirements
`headless_chromium_wait`	Numeric, in seconds. How long should headless chrome wait after loading page?
`create_script`	Logical, defaults to FALSE. Tested on Linux only. If TRUE, creates a downloadPages.sh executable file that can be used to download all relevant pages from a terminal.
`project`	Name of 'castarter2' project. Must correspond to the name of a folder in the current working directory.
`website`	Name of a website included in a 'castarter2' project. Must correspond to the name of a sub-folder of the project folder.
`path`	Defaults to NULL. If given, overrides the "type" and "custom_folder" params and stores files in the given path.
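
The selection-related arguments described above (`url`, `start`, `url_to_download`, `size`) can be illustrated with a minimal base R sketch. It does not call the package and is not its implementation; the example.com links, the temporary file and all object names are assumptions made up for illustration.

```r
# Hypothetical links, used only for illustration
links <- c(
  "https://example.com/page-1",
  "https://example.com/page-2",
  "https://example.com/page-3",
  "https://example.com/page-4"
)

# Data frame form of `url`: at least the columns `id` and `url`
url_df <- data.frame(
  id = seq_along(links),
  url = links,
  stringsAsFactors = FALSE
)

# `start`: keep the urls from position `start` onwards,
# following the documented expression url[start:length(url)]
start <- 3
from_start <- links[start:length(links)]

# `url_to_download`: a logical vector with one element per url
url_to_download <- c(TRUE, FALSE, FALSE, TRUE)
selected <- links[url_to_download]

# `size`: a file smaller than this many bytes counts as incomplete
size <- 500
small_file <- tempfile(fileext = ".html")
writeLines("<html></html>", small_file)
needs_redownload <- file.size(small_file) < size
```

Here `from_start` holds the third and fourth links, `selected` holds the first and fourth, and `needs_redownload` is TRUE because the stub file is far smaller than 500 bytes.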

Value

By default, returns nothing; it is used for its side effects (it downloads html files to the relevant folder). Downloaded files can then be imported into a vector with the function ImportHtml.
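
As a rough idea of what importing downloaded files into a vector involves, here is a base R sketch. It is not ImportHtml and makes no claim about how that function works; the folder name, the file names and the page contents are hypothetical stand-ins created in a temporary directory.

```r
# Hypothetical folder standing in for the folder where pages were saved
html_folder <- file.path(tempdir(), "contents_example")
dir.create(html_folder, showWarnings = FALSE)

# Two stub pages, written here only so the sketch is self-contained
writeLines("<html><body>Page 1</body></html>", file.path(html_folder, "1.html"))
writeLines("<html><body>Page 2</body></html>", file.path(html_folder, "2.html"))

# Locate the html files in the folder
html_files <- list.files(html_folder, pattern = "\\.html$", full.names = TRUE)

# Read each file into a single string, giving one element per page
pages <- vapply(
  html_files,
  function(f) paste(readLines(f, warn = FALSE), collapse = "\n"),
  character(1)
)
```

The result, `pages`, is a named character vector with one element per html file, named by file path.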

Examples

## Not run: 
if (interactive()) {
  # `url`: a character vector of links, or a data frame with `id` and `url` columns
  cas_download(url)
}

## End(Not run)

giocomai/castarter documentation built on April 23, 2024, 11:14 p.m.
