DataScraper-API


on Dec. 14, 2017, 10:42 p.m.

A consumer goods data scraper built on a web spider that searches online stores and collects each item's name, price, ID, stock, and description, and determines whether the item is at a sale price. It uses Scrapy with CSS selectors and XPath expressions to pull the relevant links from a content page, then crawls those pages, again using CSS/XPath, to extract data from their elements. If a page relies on JavaScript, some data does not appear in the response to a plain GET request. To get around this, the project uses Selenium with a Google Chrome web driver as an automated browser.
Once the data is collected, it is piped into a styled .xlsx spreadsheet using openpyxl. The scraper was deployed on a server using Scrapyd and controlled through an API to run jobs at given times.
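
One way to start a job on a Scrapyd server is to POST to its `schedule.json` endpoint. The sketch below only builds that request with the standard library, so it can be inspected without a running server. The host, the project name `datascraper`, the spider name `store_spider`, and the `category` spider argument are placeholders, and the timing of the call is assumed to be handled by whatever invokes it (for example a cron entry), since this is not the project's actual control API.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(base_url, project, spider, **spider_args):
    """Build a POST request for Scrapyd's schedule.json endpoint.

    Extra keyword arguments are sent as form fields, which Scrapyd passes
    on to the spider as arguments.
    """
    payload = {"project": project, "spider": spider, **spider_args}
    data = urlencode(payload).encode("utf-8")
    url = f"{base_url.rstrip('/')}/schedule.json"
    return Request(url, data=data, method="POST")


def schedule_job(base_url, project, spider, **spider_args):
    """Send the request and return Scrapyd's JSON reply as a dict."""
    request = build_schedule_request(base_url, project, spider, **spider_args)
    with urlopen(request) as response:
        return json.load(response)


# Placeholder values; 6800 is Scrapyd's default port.
request = build_schedule_request(
    "http://localhost:6800/", "datascraper", "store_spider", category="toys"
)
```

Calling `schedule_job` with the same arguments would submit the job to a live Scrapyd instance.
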




  • Scrapy
  • Scrapyd
  • Requests
  • Selenium
  • openpyxl
  • PyQuery