what is scraping?

Posted by missnadia on February 5, 2018

convenient tool with ambiguous legality

i started learning about scraping from the flatiron curriculum a couple of days ago, and i've been researching it further to gain a better understanding.

what is it? there are three types of scraping: screen scraping, web scraping, and report mining. the type covered in the curriculum is web scraping. generally, web scraping is a programming technique that extracts targeted information from a webpage by reading its markup (HTML/XHTML) and pulling out the elements that hold the data you want. the scraped data then becomes the content of other websites, applications, or reports. common examples include product reviews, hotel/airfare/product prices, real estate listings, website indexing, weather reports, and data analytics.
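to make the idea concrete, here is a minimal sketch in python using only the standard library. the curriculum's code along uses its own Scraper class, so this is not that code. the HTML snippet, the `CourseTitleParser` class, and the `scrape_titles` function are all made up for illustration, and the snippet is not flatiron's actual markup.

```python
from html.parser import HTMLParser

# Hypothetical markup invented for this example; not a real page.
SAMPLE_HTML = """
<section class="courses">
  <article class="course"><h2>Web Development</h2><span>Full-Time</span></article>
  <article class="course"><h2>Data Science</h2><span>Part-Time</span></article>
</section>
"""


class CourseTitleParser(HTMLParser):
    """Collects the text found inside every <h2> tag."""

    def __init__(self):
        super().__init__()
        self.in_title = False  # True while the parser is inside an <h2>
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, data):
        # Keep only non-empty text that appears inside an <h2>.
        if self.in_title and data.strip():
            self.titles.append(data.strip())


def scrape_titles(html):
    """Return the list of <h2> texts found in the given HTML string."""
    parser = CourseTitleParser()
    parser.feed(html)
    return parser.titles


print(scrape_titles(SAMPLE_HTML))
```

the parser walks the markup tag by tag and keeps only the text inside the targeted element, which is the "extract targeted information using the markup" part in a nutshell. a real scraper would first download the page and would probably target elements by class or id rather than by tag name alone.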

the code along for the lesson on scraping demonstrated how to extract course information from flatiron’s website using the Scraper class. well, what can i do with that information? let’s say a prospective student wants to attend a coding bootcamp but isn’t sure which school to choose. a website or application that lets its users compare course information and offerings from various schools could use data like what we extracted in the code along as its content.

although scraping provides a convenient means of gathering data from multiple sources, websites use defensive measures to protect certain data from scrapers. for example, a website can publish a file called ‘robots.txt’ that tells bots which areas of the site should not be accessed. the file is only a set of instructions, so it depends on bots choosing to follow it.
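as a rough sketch of how a well-behaved scraper might check those instructions before fetching a page, python's standard library includes `urllib.robotparser`. the robots.txt content, the `my-scraper` user agent name, the example.com URLs, and the `is_allowed` helper below are all assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one lives at <site>/robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "my-scraper", "https://example.com/courses"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-scraper", "https://example.com/private/data"))
```

in a real scraper you would load the site's actual file (for example with the parser's `set_url` and `read` methods) and skip any URL the check rejects.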

is it legal? there are three major legal claims used to prevent web scraping: copyright infringement, violation of the Computer Fraud and Abuse Act, and trespass to chattel. however, the rules and precedent governing scraping are ambiguous because scraping is still a new concept in law. although scraping provides a convenient means of data collection, unauthorized scraping could violate privacy laws and produce uncertainty about how the collected information will be used.