About the project

The purpose of the project is to scrape news relevant to technology companies. We currently have three scrapers, each of which scrapes news from different websites relevant to a specific company. You can search the news by special tags to find the most relevant items. A basic service was created to manage the scrapers. It is similar to ScrapingHub, but with fewer features: you can set the list of companies to scrape, run or stop any scraper, and check logs. Scrapers can also be scheduled to run periodically.
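
To illustrate the management features described above (a company list per scraper, run/stop, and logs), here is a minimal in-memory sketch. It is not the project's actual service: the names `ScraperJob` and `ScraperManager`, their methods, and the log format are all hypothetical, and real scraping and scheduling are left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ScraperJob:
    """Hypothetical record for one managed scraper."""
    name: str
    companies: list = field(default_factory=list)
    running: bool = False
    logs: list = field(default_factory=list)

    def log(self, message):
        # Prefix each log line with a UTC timestamp and the scraper name.
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.logs.append(f"{stamp} {self.name}: {message}")


class ScraperManager:
    """Hypothetical manager: tracks state only, does not scrape anything."""

    def __init__(self):
        self.jobs = {}

    def register(self, name, companies):
        self.jobs[name] = ScraperJob(name, list(companies))

    def set_companies(self, name, companies):
        job = self.jobs[name]
        job.companies = list(companies)
        job.log(f"companies set to {job.companies}")

    def start(self, name):
        job = self.jobs[name]
        if job.running:
            return False  # already running, nothing to do
        job.running = True
        job.log("started")
        return True

    def stop(self, name):
        job = self.jobs[name]
        if not job.running:
            return False  # already stopped, nothing to do
        job.running = False
        job.log("stopped")
        return True

    def logs(self, name):
        # Return a copy so callers cannot mutate the stored log.
        return list(self.jobs[name].logs)
```

A caller would register a scraper once, for example `manager.register("news_a", ["Acme"])`, and then use `start`, `stop`, and `logs` with that name. Periodic runs could be layered on top by calling `start` from a scheduler.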

  • Duration
    6 months
  • Client
    Thibaut Mallet de Chauny
  • Category
    big data & analytics
  • Type
    Web Application

Python and Scrapy

We wrote the scraping scripts in pure Python and with the Scrapy framework, and built a simple admin panel for managing the gathered data with Flask. Pandas was used to clean the dataset.
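
As a rough idea of what cleaning scraped news with Pandas can look like, the sketch below trims titles, drops incomplete and duplicate rows, and parses dates. The column names `title`, `url`, and `published` and the function `clean_articles` are assumptions made for illustration; the project's real schema and cleaning rules are not described here.

```python
import pandas as pd


def clean_articles(df):
    """Return a cleaned copy of a scraped-articles frame (hypothetical schema)."""
    cleaned = df.copy()
    # Remove stray whitespace around titles.
    cleaned["title"] = cleaned["title"].str.strip()
    # Drop rows that are missing a title or a URL.
    cleaned = cleaned.dropna(subset=["title", "url"])
    # Drop rows whose title is empty after stripping.
    cleaned = cleaned[cleaned["title"] != ""]
    # Keep only the first article seen for each URL.
    cleaned = cleaned.drop_duplicates(subset="url", keep="first")
    # Parse publication dates; unparseable values become NaT.
    cleaned = cleaned.assign(
        published=pd.to_datetime(cleaned["published"], errors="coerce")
    )
    return cleaned.reset_index(drop=True)
```

Treating the URL as the unique key is one possible design choice; a different key, such as title plus source, may suit other datasets better.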
