Data Scraping – What is it?

The internet is filled with data everywhere you go. If you’re looking at an image on Instagram, there is data behind the scenes displaying that image for you. If you’re reading an article on a news platform, there is data behind the scenes displaying that article as well. If you’re reading this article, this might seem very elementary to you.

This data is used and shared throughout the web. I’m sure you know that when we want to access some data for an app or project we are working on, we look for an API to use. But what if we can’t find an API that provides the specific information we need? Ahhh, that is where data scraping comes in. Both approaches pull data from a source; they just go about it differently. An API (application programming interface) is an intermediary that allows one piece of software to talk to another. Basically, an API lets a provider open up selected data and functionality to other users and developers in a structured format. Scraping, by contrast, extracts the data from the pages a site serves to its visitors.
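To make the API side concrete, here is a minimal Python sketch of what consuming an API response tends to look like. The payload, its field names (`username`, `image_url`, `likes`), and the helper `summarize_post` are all invented for illustration; a real API defines its own schema, and the body would arrive over HTTP rather than from a string.

```python
import json

# Hypothetical JSON body, standing in for what an API might return.
SAMPLE_API_RESPONSE = """
{
  "post": {
    "username": "example_user",
    "image_url": "https://example.com/images/1.jpg",
    "likes": 42
  }
}
"""


def summarize_post(raw_json: str) -> dict:
    """Pick out the fields we care about from a structured API response."""
    data = json.loads(raw_json)
    post = data["post"]
    return {"username": post["username"], "likes": post["likes"]}


if __name__ == "__main__":
    print(summarize_post(SAMPLE_API_RESPONSE))
```

The point is that the provider has already decided which fields exist and how they are named, so reading them is a simple lookup.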

APIs are the most popular, and usually the easiest, way of gathering the data you need. However, they have some downsides. First of all, an API does not give you access to ALL of the data; it exposes only what the provider chooses. There are also limits on the number of times you can “call” the API in a given period. These rate limits vary from one API to another, and most of the time you have to pay a monthly or yearly fee to get more usage. There is also the question of legality. Arguably, it is not illegal to get data through an API, and the data itself is not copyrightable. However, the database where that data is stored can, arguably, be copyrighted. So this is something to consider when using APIs.
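If rate limits are new to you, the idea can be sketched in a few lines of Python. This is only an illustration of one common scheme (a fixed-window counter per API key); the class name `FixedWindowLimiter` and its parameters are hypothetical, and real providers each enforce limits in their own way.

```python
class FixedWindowLimiter:
    """Allow at most `max_calls` per `window_seconds` for each API key."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._windows = {}  # api_key -> (window_start, calls_so_far)

    def allow(self, api_key: str, now: float) -> bool:
        """Return True if this call fits in the key's current window."""
        start, count = self._windows.get(api_key, (now, 0))
        if now - start >= self.window_seconds:
            # A new window begins, so the quota resets.
            start, count = now, 0
        if count >= self.max_calls:
            self._windows[api_key] = (start, count)
            return False
        self._windows[api_key] = (start, count + 1)
        return True


if __name__ == "__main__":
    limiter = FixedWindowLimiter(max_calls=2, window_seconds=60)
    for t in (0, 1, 2, 60):
        print(t, limiter.allow("my-key", now=t))
```

Passing `now` in explicitly keeps the sketch easy to follow; a real service would read the clock itself and typically answer a rejected call with an HTTP 429 status.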

The other option for getting the data you need is web scraping, also called data scraping. It has several advantages. For one, you get the data as it currently appears on the site. You are not relying on the provider of an API to keep their information up to date.

Second, you can customize and specify the exact data you are looking for. Sometimes you just can’t find an API that exposes the information you need. With web scraping, you can pull anything that appears on the page.
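As a rough sketch of what “specifying exactly what you need” can look like, the Python below pulls only the prices out of a small page using the standard library’s `html.parser`. The HTML snippet, the `price` class name, and the `PriceScraper` and `scrape_prices` names are all made up for this example; a real page would be downloaded first and would have its own markup.

```python
from html.parser import HTMLParser

# Hypothetical page content, standing in for HTML fetched from a site.
SAMPLE_HTML = """
<html><body>
  <h1>Store</h1>
  <div class="product"><span class="name">Widget</span><span class="price">$9.99</span></div>
  <div class="product"><span class="name">Gadget</span><span class="price">$19.50</span></div>
  <p class="footer">Contact us</p>
</body></html>
"""


class PriceScraper(HTMLParser):
    """Collect the text of every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("class") == "price":
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        # Keep text only while we are inside a price span.
        if self._in_price:
            self.prices.append(data.strip())


def scrape_prices(html: str) -> list:
    parser = PriceScraper()
    parser.feed(html)
    parser.close()
    return parser.prices


if __name__ == "__main__":
    print(scrape_prices(SAMPLE_HTML))
```

Everything else on the page (the heading, the product names, the footer) is ignored, because the scraper only reacts to the one element it was told to look for.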

Third, there are no API rate limits or usage fees! You control the pace. You don’t have to pay someone to make more “calls” for the data or just to gain access. You are implementing the scraper and getting the data yourself. (That assumes you build the scraper rather than outsourcing to a web scraping service. Yes, you can do this… Lazy!) Keep in mind that a site can still throttle or block a scraper that hits it too hard.

Also, you can stay more anonymous. You can gather the data you need without providing your information to a website’s administrator. With an API, you usually must register an account and get your own key, and the provider can track every request you make. So it is practically impossible to stay anonymous while using an API.

Both options, APIs and web scraping, have their purpose and are very powerful tools. You must decide which one better suits your needs.

In the next post, I will go into detail on how to make your own web scraper in a Ruby on Rails app.
