1. Create a repository

This chapter will walk you through creating a repository hosted by GitHub that will hold code you can edit on your local computer.

1.1. Get started on GitHub

The first step is to visit github.com.

If you have an account, you should log in. If you don’t have an account, you’ll need to make one. Once that’s done, GitHub will take you to your personal homepage.

Click the green button in the upper-left corner to create a new code repository.

On the next page, fill in a name for your repository. Something like my-first-github-scraper will work, but you can name it anything.

Make sure the repo is public, which ensures your scraper will run for free. Then hit the green button at the bottom of the page.

1.2. Enter GitHub Desktop

There are several ways to work with the repository you created in the previous step from your local machine.

We’ll use GitHub Desktop, a graphical user interface (GUI) developed by GitHub, to communicate with the repository you just created.

Note

Advanced users often rely on their computer’s command line for more control over managing their code on github.com. We cover this method of cloning the repository in the optional step 1.4.

Head over to the GitHub Desktop website at desktop.github.com.

Next, download the version of the software for your computer, whether it’s a Mac or a PC.

In the folder where your browser saved the download, you should see a newly downloaded zipped folder. Opening it will reveal the GitHub Desktop application.

Let’s open the application and sign in. First, open the application’s settings window.

In the “Preferences” window, click “Sign In.”

A window will appear, prompting you to sign in through your browser to the GitHub account you used in step 1.1.

Assuming you’re already signed in to GitHub in your browser, the sign-in should authenticate your account automatically and send you back to the GitHub Desktop application.

Next, we’ll use GitHub Desktop to download the repository you created in step 1.1 onto your local computer.

1.3. Clone the repository

There are numerous methods for downloading the code in an online repository, which GitHub calls “cloning.” They are covered in GitHub’s documentation.

This tutorial will demonstrate how to use GitHub Desktop (which we installed in step 1.2) to clone the repository (which we created in step 1.1).

In the GitHub Desktop app, click the “Add” button, and then select the “Clone Repository” option.

A window will show up asking which repository associated with your GitHub account you would like to clone to your computer.

As you start typing the name of the repository — “my-first-github-scraper” — you will see it listed in the dropdown options.

In this window, you will see an option for specifying a “Local Path.” This is the directory on your computer to which the repository will be cloned. In my case, it is in a directory called Code.

Click on the blue “Clone” button.

GitHub Desktop will clone the repository and take you to the repository’s main screen. Note that the name of the repository you are currently in appears in the top-left corner.

Next we’ll install a Python web scraper and start downloading data.

1.4. Advanced Git (Optional)

While there are numerous ways to interact with your repository on GitHub, advanced users generally use the command line for more control over moving files between their local machines and the remote repository on GitHub, also known as pulling and pushing code.

This optional section will demonstrate how to use the gh command-line utility to accomplish what we did in step 1.3. If you don’t have it installed, visit cli.github.com and follow the instructions there.

1.4.1. Introduction to the command line

Whether you know it or not, there is a way to open a special window and directly issue commands to your operating system. Different systems give this tool slightly different names, but they all have some form of it.

On Windows, this is called the “command prompt.” On macOS, it is called the “terminal.” Others may call it the “command line.” They’re the same thing, just in slightly different shapes.

This is the tool we’ll use to make a copy of your repository on your computer. Depending on your operating system and personal preferences, open a terminal program so we can get started.

Note

If you’re a Windows user, we recommend you avoid the standard command line provided by the operating system. Instead, you’d be well served by the Windows Subsystem for Linux, which will create a development environment better suited for open-source software work.

We recommend you install the Ubuntu distribution from the Microsoft Store. This will give you access to a standard Linux terminal without the quirks of Windows.

Once your terminal is open, it will start you off in your computer’s home directory, much like a new file explorer window.

Let’s verify that using a command called pwd, which stands for print working directory. The output is the full path of your terminal’s current location in the file system. You should get back something like /Users/aadittambe.

pwd

Next, enter the ls command to list the contents of that directory. The terminal should print out the same list of folders you can see in your home directory via the file explorer.

ls
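
If you ever lose track of where you are, a short sketch like the one below returns you home and lists everything there. It goes slightly beyond the two commands above: cd ~ moves you to your home directory from anywhere, and the -a flag makes ls also show hidden entries whose names start with a dot. It assumes a Unix-style shell, such as the macOS terminal or Ubuntu under WSL.

```shell
# Jump back to your home directory from anywhere (a bare cd does the same)
cd ~
# Print the full path of where you are now
pwd
# List this folder's contents, including hidden entries that start with a dot
ls -a
```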

1.4.2. Create a code directory

Using GitHub Desktop took care of this step for us. However, if we were to use gh to clone the repository, we would need to first create a directory on the computer to store our code.

We will use the mkdir command to create a new directory in the same style as the Desktop, Documents and Downloads folders included by most operating systems.

We will name this folder Code. After the command has run, you can verify that it worked by opening your file explorer and navigating to your home folder, where the new directory should appear alongside the rest.

mkdir Code

Now move into the new directory with the cd command, which works the same way as double-clicking on a folder in your file explorer.

cd Code
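
A slightly more forgiving variant of these two commands, offered as a sketch: the -p flag tells mkdir not to complain if the folder already exists, and spelling out $HOME/Code makes both commands work no matter which directory your terminal is currently in.

```shell
# Create ~/Code if it does not exist yet; -p makes this safe to re-run
mkdir -p "$HOME/Code"
# Move into it, regardless of where the terminal currently is
cd "$HOME/Code"
```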

This is the location where we’ll download a copy of your repository.

1.4.3. Clone the repository

Before cloning the repository, make sure gh is installed by running the following command, which should print out the version of gh you have installed.

gh --version

The output should look something like this:

gh version 2.5.1 (2022-02-15)
https://github.com/cli/cli/releases/tag/v2.5.1

Note

If you get an error instead, open a fresh terminal and try again. If it’s still not working, revisit cli.github.com to make sure you’ve followed all the necessary steps.
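
If you would rather have the terminal tell you plainly whether gh is installed than read an error message, a sketch like the following checks your PATH first. The check_gh function name is our own invention, not part of gh.

```shell
# Hypothetical helper: report whether the gh executable is on your PATH
check_gh() {
  if command -v gh >/dev/null 2>&1; then
    # Prints something like "gh version 2.5.1 (2022-02-15)"
    gh --version
  else
    echo "gh not found: install it from cli.github.com, then open a new terminal"
  fi
}

check_gh
```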

Use gh to log in to GitHub, which will verify that your computer has permission to access and edit the repositories owned by your account.

gh auth login
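
To confirm the login worked, gh also provides gh auth status, which reports the account you are logged in with and exits with an error when there is none. The ensure_gh_login wrapper below is a hypothetical helper built on that behavior; it only prints a reminder and does not log you in itself.

```shell
# Hypothetical helper: check that gh is logged in before cloning
ensure_gh_login() {
  # gh auth status exits with a non-zero status when no account is logged in
  if gh auth status >/dev/null 2>&1; then
    echo "gh is authenticated"
  else
    echo "gh is not authenticated (or not installed): run gh auth login"
  fi
}

ensure_gh_login
```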

After you authenticate, it’s time to clone the new repository you created. Edit the command below by inserting your username and repository name. Then run it.

gh repo clone https://github.com/<your-username>/<your-repo>

In my case, the command looks like this:

gh repo clone https://github.com/aadittambe/my-first-github-scraper
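
If you expect to repeat this, the clone and cd steps can be combined into one shell function. This is a sketch under a few assumptions: the name clone_into_code is hypothetical, it always clones into the Code folder in your home directory, and it relies on gh falling back to your logged-in account when you pass a bare repository name instead of OWNER/REPO.

```shell
# Hypothetical helper, not part of gh: clone one of your repositories into
# ~/Code (unless it is already there), then move into its folder
clone_into_code() {
  local repo="$1"            # e.g. my-first-github-scraper or OWNER/REPO
  local name="${repo##*/}"   # folder name: the part after the last slash
  mkdir -p "$HOME/Code" || return 1
  cd "$HOME/Code" || return 1
  if [ -d "$name" ]; then
    echo "$name already exists in $HOME/Code, skipping the clone"
  else
    # Without an OWNER/ prefix, gh assumes the account you logged in with
    gh repo clone "$repo" || return 1
  fi
  cd "$name"
}
```

For example, running clone_into_code my-first-github-scraper would clone the repository if it isn’t there yet and leave you inside its folder either way.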

After the clone completes, run the ls command again. You should see a new folder, named after your repository, that gh created.

ls

Use cd to move into the directory, where we can begin work.

cd my-first-github-scraper
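
One way to double-check that this folder is connected to GitHub is to ask git which address its origin remote points to, since cloning sets up that remote. The show_origin helper below is a hypothetical name wrapping the standard git remote get-url command.

```shell
# Hypothetical helper: print the address a local clone's origin remote points to
show_origin() {
  # git -C runs git as if it were started in the given folder (default: here)
  git -C "${1:-.}" remote get-url origin
}
```

Run show_origin from inside my-first-github-scraper, or pass the folder’s path as an argument, and it should print the address of your repository on GitHub.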