rorpy: R or Python? Classification of webpages by code content.


View source: R/rorpy.R

Description

This function fetches code chunks from a vector of urls (where code is assumed to be tagged <pre>, <code> or <textarea>). It then uses a pretrained random forest classifier to calculate the probability that the code is R, Python or neither.

Usage

rorpy(url, show_progress = TRUE)

Arguments

url

Either a character vector containing urls, or a list containing xml_document objects returned by xml2::read_html().

show_progress

Boolean flag, defaults to TRUE. Whether to show progress bar when multiple urls are provided.
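As a rough sketch of the second input form for url, the snippet below builds a list of xml_document objects with xml2::read_html(). The inline HTML strings are made-up stand-ins for real pages, and the rorpy() call is left commented out (as in the Examples) because it needs the package installed.

```r
library(xml2)

# Two tiny, made-up HTML pages used purely for illustration.
html_strings <- c(
  "<html><body><pre>x <- c(1, 2, 3)</pre></body></html>",
  "<html><body><code>import numpy as np</code></body></html>"
)

# read_html() accepts a literal HTML string as well as a url or file path.
pages <- lapply(html_strings, read_html)

# not run:
# rorpy(pages, show_progress = FALSE)
```

Parsing pages up front like this can be handy when the same documents are reused elsewhere, since they do not have to be fetched again.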

Value

A tibble containing the probability that the input url contains R code (column r), Python code (column py) or another code type (column other). If no code can be found, all three probabilities will be 0. If there are problems fetching data from a url, NAs will be returned instead of classification probabilities.
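The three cases described above (probabilities, all zeros, NAs) could be turned into a single label per url along the following lines. This is only a sketch: res is a hand-built data frame standing in for a rorpy() result, its values are invented, and label_row is a hypothetical helper, not part of the package.

```r
# Hypothetical stand-in for a rorpy() result; the numbers are made up.
res <- data.frame(
  r     = c(0.99, 0.01, 0, NA),
  py    = c(0.00, 0.98, 0, NA),
  other = c(0.01, 0.01, 0, NA)
)

# Hypothetical helper: map one row of probabilities to a label.
label_row <- function(p) {
  if (anyNA(p)) return("fetch_failed")  # NAs: the url could not be fetched
  if (all(p == 0)) return("no_code")    # all zeros: no code was found
  names(p)[which.max(p)]                # otherwise the most probable class
}

# Apply the helper to each row of the probability columns.
res$label <- apply(res[, c("r", "py", "other")], 1, label_row)
```

Checking for NAs before comparing against zero matters here, because comparisons involving NA would themselves return NA.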

Examples

# not run:
# rorpy("https://google.com") # no code here...
# rorpy("http://dplyr.tidyverse.org") # 99% sure it's R
# rorpy("https://keras.io") # also about 99% sure it's Python

alastairrushworth/rorpy documentation built on June 14, 2020, 2:18 p.m.