First GitHub Scraper

A step-by-step introduction to free, automated web scraping with GitHub’s powerful Actions feature.

You will learn how to:

  • Create a GitHub repository to store your code

  • Use Python to scrape data from the web

  • Configure GitHub Actions to schedule the scrape

  • Automatically save the results to the repository

  • Send a Slack notification when new data arrive
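As a rough preview of the scraping and saving steps listed above, here is a minimal sketch that uses only Python's standard library. It is not the tutorial's own code: the names `TableParser`, `scrape_table`, `rows_to_csv` and `SAMPLE_HTML` are hypothetical, and the HTML is an inline stand-in for a page that a real scraper would download from the web.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read, if any
        self._cell = None   # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableParser()
    parser.feed(html)
    return parser.rows


def rows_to_csv(rows):
    """Serialize rows as CSV text, ready to be written to a file."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical sample page; a real scraper would fetch this over HTTP.
SAMPLE_HTML = """
<table>
  <tr><th>name</th><th>count</th></tr>
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td><td>2</td></tr>
</table>
"""

if __name__ == "__main__":
    print(rows_to_csv(scrape_table(SAMPLE_HTML)), end="")
```

In a scheduled run, the CSV text would be written to a file in the repository so that each scrape can be committed alongside the code.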

About

This guide was prepared for a training session at the 2022 conference of the National Institute for Computer-Assisted Reporting (NICAR) in Atlanta. The authors are Iris Lee, Aadit Tambe and Ben Welsh. The tutorial is published as open-source software.