It's not just for stats: Mining the web with R

David Springate
May 2014

Summary

  1. What is web mining?
  2. Why use R for web mining?
  3. Web mining toolkit
  4. Example - Downloading multiple files
  5. Case study - Web mining for a new job
  6. Further info

What is web mining?

“Using data mining techniques to discover patterns from the web”

Getting hold of scientific data:

  • Using web APIs (PubMed, GenBank, Google Geocoder etc.)
  • Screen scraping (traversing and processing data from web pages)
  • Downloading files from the web (HTML, XML, CSV etc.)
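
As a rough illustration of the last two routes, here is a minimal base-R sketch. It is not taken from the talk: the HTML snippet, the file names and the CSV contents are all made up, and a temporary local file stands in for a file hosted on the web so that the code runs offline. It also assumes a Unix-like file path when building the file:// URL. Real screen scraping would normally use a proper HTML parser (for example the XML package) rather than a regular expression.

```r
# --- Screen scraping (toy version) ---------------------------------------
# Hypothetical HTML, standing in for a page fetched from the web.
html <- '<ul><li><a href="data/a.csv">A</a></li><li><a href="data/b.csv">B</a></li></ul>'

# Pull out every href="..." attribute, then strip the wrapper text.
links <- regmatches(html, gregexpr('href="[^"]+"', html))[[1]]
links <- gsub('href="|"', "", links)

# --- Downloading a file ---------------------------------------------------
# A tiny made-up CSV written to a temporary file plays the role of a remote file.
src <- tempfile(fileext = ".csv")
writeLines(c("id,value", "a,1", "b,2"), src)

# download.file() takes file:// URLs as well as http(s):// ones,
# so the same call would work with a real web address.
url  <- paste0("file://", src)   # assumes a Unix-like absolute path
dest <- tempfile(fileext = ".csv")
download.file(url, destfile = dest, quiet = TRUE)

# Read the downloaded copy into a data frame.
dat <- read.csv(dest, stringsAsFactors = FALSE)
```

Swapping `url` for an http(s) address is the only change needed to fetch a real file; web APIs follow the same pattern, with the query encoded in the URL.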

Why do web mining?

Baffled

“Let me Google that for you…”