1. Create a repository
This chapter will walk you through how to make a repository hosted by GitHub that holds code you can edit on your local computer.
The first step is to visit github.com.
Click the green button in the upper-left corner to create a new code repository.
On the next page, fill in a name for your repository. Something like
my-first-github-scraper will work, but you can name it anything.
Make sure the repo is public, which ensures your scraper will run for free. Then hit the green button at the bottom of the page.
There are several ways of interacting with the repository you created online in the previous step from your local machine.
Today we’ll use a tool called GitHub Desktop, a GUI developed by GitHub to make it easier to communicate with the repository you created.
Advanced users often use their computer’s command line for more control over managing their code on github.com. We cover this advanced method of cloning the repository in step 1.4, which requires using your computer’s command-line interface.
Head over to the GitHub Desktop website at desktop.github.com.
Next, depending on the type of computer you’re using — a Mac or a PC — download the software.
In the directory where you downloaded GitHub Desktop, you should see a newly downloaded zipped folder. Opening it will reveal the GitHub Desktop application.
Let’s open the application, and sign in. Open the settings window of the application.
In the “Preferences” window, click “Sign In.”
A window will show up prompting us to sign in to the account we created in [1.1] through the browser.
Assuming you’re already signed in to GitHub in your browser, this should automatically authenticate your account and direct you back to the GitHub Desktop application.
In the next step, we’ll use GitHub Desktop to download the repo we created in [1.1] to your local computer.
There are numerous methods for downloading the code in an online repository, which GitHub calls “cloning.” They are covered in GitHub’s documentation.
This tutorial will demonstrate how to use GitHub Desktop (which we installed in step [1.2]) to clone the repository (which we created in step [1.1]).
In the GitHub Desktop app, click on the “Add” button, and then on the “Clone Repository” option.
A window will show up asking us which repository associated with our GitHub account we would like to clone on our computer.
As you start typing the name of the repository — “my-first-github-scraper” — you will see it listed in the dropdown options.
In this window, you will see an option for specifying a “Local Path.” This is the directory on your computer in which the repository will be cloned. In my case, it is in a directory called
Click on the blue “Clone” button.
GitHub Desktop will clone the repository, and you will be taken to a screen that looks like this. Note, in the top left corner is the name of the repository we are currently in.
Next we’ll install a Python web scraper and start downloading data.
While there are numerous ways to interact with your repository on GitHub, advanced users generally prefer the command line for moving files between their local machines and the remote repository on GitHub, a process known as pulling and pushing code, because it offers more control.
This optional section will demonstrate how to use the
gh command-line utility to accomplish what we did in step [1.3]. If you don’t have it installed, visit cli.github.com and follow the instructions there.
1.4.1. Introduction to the command line
Whether you know it or not, there is a way to open a special window and directly issue commands to your operating system. Different systems give this tool slightly different names, but they all have some form of it.
On Windows this is called the “command prompt.” On macOS it is called the “terminal.” Others may call it the “command line.” They’re the same thing, just in slightly different shapes.
This is the tool we’ll use to make a copy of your repository on your computer. Depending on your operating system and personal preferences, open a terminal program so we can get started.
If you’re a Windows user, we recommend you avoid the standard command line provided by the operating system. Instead, you’d be well served by the Windows Subsystem for Linux, which will create a development environment better suited for open-source software work.
We recommend you install the Ubuntu distribution from the Windows Store. This will give you access to a generic terminal without the quirks of Windows.
Once you have your terminal open, it will start you off in your computer’s home directory, much like your file explorer.
Let’s verify that using a command called
pwd, which stands for print working directory. The output is the full path of your terminal’s current location in the file system. You should get back something like
Next let’s enter the
ls command to see all of its subdirectories. The terminal should print out the same list of folders you can see in your home directory via the file explorer.
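The two commands described above can be tried one after the other. This is only a minimal sketch: the exact output depends on your operating system and user name, so none is shown here.

```shell
# Print the full path of the terminal's current location.
pwd

# List the files and folders in that location.
ls
```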
1.4.2. Create a code directory
Using GitHub Desktop took care of this step for us. However, if we were to use
gh to clone the repository, we would need to first create a directory on the computer to store our code.
We will use the
mkdir command to create a new directory in the same style as the Desktop, Documents and Downloads folders included by most operating systems.
We will name this folder
Code. To verify the command worked, open the file explorer after running it and navigate to your home folder. You should see the new directory alongside the rest.
Now jump into the new directory with the
cd command, which operates the same as double clicking on a folder in your file explorer.
This is the location where we’ll download a copy of your repository.
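The steps in this section can be sketched as follows, assuming your terminal is still in your home directory. The `-p` flag is an addition for this sketch, not part of the steps above: it keeps `mkdir` from reporting an error if a `Code` folder already exists.

```shell
# Create the Code directory. With -p, mkdir does nothing
# (instead of failing) when the directory is already there.
mkdir -p Code

# Move into the new directory.
cd Code

# Confirm the new location; the printed path should end in Code.
pwd
```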
1.4.3. Clone the repository
In order to clone the repository, you need to make sure you have
gh installed by executing the following command, which should print out the version of gh you have installed.
The output should look something like this:
gh version 2.5.1 (2022-02-15)
If you get an error instead, open a fresh terminal and try again. If it’s still not working, revisit cli.github.com to make sure you’ve followed all the necessary steps.
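One way to run the version check so that a missing install produces a readable message rather than a raw shell error is sketched below. The `gh_status` variable is a hypothetical name used only in this example.

```shell
# Check whether gh is on the PATH before asking for its version.
if command -v gh >/dev/null 2>&1; then
    gh_status="installed"
    gh --version
else
    gh_status="missing"
    echo "gh was not found. Visit cli.github.com for installation instructions."
fi
```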
Next, use gh to log in to GitHub, which will verify that your computer has permission to access and edit the repositories owned by your account.
gh auth login
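If you want to confirm that the login worked, `gh auth status` reports which account, if any, is authenticated. The sketch below wraps it in a check; the `auth_state` variable is a hypothetical name used only for illustration.

```shell
# gh auth status exits with a non-zero status when no account is logged in,
# so it can be used directly as the condition of an if statement.
if command -v gh >/dev/null 2>&1 && gh auth status >/dev/null 2>&1; then
    auth_state="logged-in"
else
    auth_state="not-logged-in"
fi

echo "gh authentication: $auth_state"
```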
After you authenticate, it’s time to clone the new repository we created. Edit the code below by inserting your user name and repository. Then run it.
gh repo clone https://github.com/<your-username>/<your-repo>
In my case, the command looks like this:
gh repo clone https://github.com/aadittambe/my-first-github-scraper
After the clone completes, run the
ls command again. You should see a new folder created by gh. Use
cd to move into the directory, where we can begin work.
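Once the clone finishes, these last two steps might look like the sketch below. It assumes the repository is named `my-first-github-scraper`; the `REPO` variable is a placeholder to replace with the name you chose. The folder check is an addition that avoids an error if the clone did not complete.

```shell
# Hypothetical repository name; replace it with the name you chose.
REPO="my-first-github-scraper"

# List the current directory; the cloned folder should appear here.
ls

# Move into the repository only if the clone actually created it.
if [ -d "$REPO" ]; then
    cd "$REPO"
    echo "Now working in: $(pwd)"
else
    echo "No folder named $REPO here. Check that the clone finished."
fi
```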