Web Scrape | John Hamilton Bradford, Ph.D.

Web Scraping Experiment

This is an attempt to collect metadata from links to academic articles. There are several R packages for web crawling and data extraction, including Rcrawler, rvest, and scrapeR. Among these, only Rcrawler can do both crawling and data extraction. I won’t need the crawling functionality here, since I already have a list of URLs to mine. Instead, I’m mostly interested in web usage mining and web content mining, the latter being the extraction of “valuable information from web content” (Khalil and Fakir 2017).
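To make the idea of content mining concrete, here is a minimal sketch using rvest of how metadata might be pulled from a single article page. The helper name `extract_meta` and the inline sample page are hypothetical, and the `citation_*` tag names are only an assumption about how a publisher might label its pages; real sites vary, so this is an illustration rather than the approach used in the rest of this post.

```r
library(rvest)

# Hypothetical helper: collect every <meta> tag that has both a name and a
# content attribute, and return them as a two-column data frame.
extract_meta <- function(page) {
  tags <- html_elements(page, "meta[name][content]")
  data.frame(
    name = html_attr(tags, "name"),
    content = html_attr(tags, "content"),
    stringsAsFactors = FALSE
  )
}

# Inline HTML stands in for a downloaded article page (assumed structure).
sample_page <- read_html('<html><head>
  <meta name="citation_title" content="An Example Article">
  <meta name="citation_author" content="A. Author">
  </head><body></body></html>')

meta <- extract_meta(sample_page)
print(meta)
```

With a real list of links, the same helper could be applied to each one, for example `lapply(urls, function(u) extract_meta(read_html(u)))`, where `urls` is a hypothetical character vector of article addresses.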