Adding sites and bundles¶
This page explains how to add new sites and bundles to the project by making a code contribution to the open-source repository.
Adding a site¶
1. Add a record to sites.csv
Adding a new site requires adding a row to sources/sites.csv with, at a minimum, the site’s Twitter handle, URL, name, location, time zone, country and language.
Time zones should be provided as names from the IANA time zone database that Python understands, such as America/Los_Angeles. Countries should be provided as two-letter ISO 3166-1 alpha-2 codes, and languages as two-letter ISO 639-1 codes. You can also override the system’s default delay before the screenshot is taken by adding an optional attribute. If provided, its value is expected in milliseconds.
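To catch formatting mistakes before opening a pull request, you could check a new row with a short script like the one below. It is a sketch that is not part of the project: the column names in REQUIRED are assumptions, so compare them with the real header row of sources/sites.csv. It checks only the format of the codes, not whether a code is actually assigned, and it ignores the optional screenshot delay.

```python
import csv
import io
import re
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

# Hypothetical column names: check them against the header row of sources/sites.csv.
REQUIRED = ["handle", "url", "name", "location", "timezone", "country", "language"]


def validate_site(row: dict) -> list[str]:
    """Return a list of problems found in one sites.csv row (empty if none)."""
    problems = [f"missing {field}" for field in REQUIRED if not row.get(field)]

    tz = row.get("timezone")
    if tz:
        try:
            ZoneInfo(tz)  # accepts IANA names such as "America/Los_Angeles"
        except (ZoneInfoNotFoundError, ValueError, OSError):
            problems.append(f"unknown time zone {tz!r}")

    # ISO 3166-1 alpha-2 country codes and ISO 639-1 language codes are two letters.
    for field in ("country", "language"):
        value = row.get(field)
        if value and not re.fullmatch(r"[a-z]{2}", value.lower()):
            problems.append(f"{field} should be a two-letter code, got {value!r}")

    return problems


sample = (
    "handle,url,name,location,timezone,country,language\n"
    "example,https://example.com,Example News,Springfield,America/Chicago,US,en\n"
)
for row in csv.DictReader(io.StringIO(sample)):
    print(row["handle"], validate_site(row))
```

The sample row is invented for illustration. An empty list means the row passed every check.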
Test the screenshot¶
Next, verify that the site works by running the screenshot command with your site’s handle and inspecting the result.
pipenv run python -m newshomepages.screenshot your-handle
Hide ads and popups¶
There are two techniques for hiding ads, popups and other elements that obscure the page:
1. Adding a selector to the newshomepages.screenshot module. This should be done when the offending element appears to be generated by a third-party library that may also appear on other sites.
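2. Writing a site-specific JavaScript file that removes the offending elements before the screenshot is taken. For example, the script below removes any element with an id of ad_unit or a class of popup. If you identify the id or class of a page element you’d like to hide, you can insert it into the script.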
document.querySelectorAll(
  '#ad_unit, .popup' // <-- Put your page's selectors here. To target more than one element, separate them with commas.
).forEach(el => el.remove());
This approach can also accommodate more complex manipulations of the page. Consult the examples in the repository to explore other techniques for targeting and hiding page elements.
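When a site needs many selectors, it can help to generate the removal script rather than edit the selector string by hand. The helper below is hypothetical and not part of the project. It joins the selectors with commas, the same rule the script’s comment describes, and uses json.dumps so the combined string is a valid JavaScript string literal.

```python
import json


def removal_script(selectors: list[str]) -> str:
    """Build a JavaScript snippet that removes every element matching any selector.

    Hypothetical helper for illustration; the project does not ship it.
    """
    if not selectors:
        raise ValueError("provide at least one CSS selector")
    # A comma-separated selector list matches elements that match any of its parts.
    selector_list = ", ".join(selectors)
    # json.dumps yields a double-quoted string that is also a valid JavaScript literal.
    return f"document.querySelectorAll({json.dumps(selector_list)}).forEach(el => el.remove());"


print(removal_script(["#ad_unit", ".popup"]))
```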
Add to a bundle¶
Next, link the site’s row to one or more of the topical bundles defined in sources/bundles.csv by putting the slugs of the desired bundles into your site’s bundle field.
To link a site with more than one bundle, separate the slugs with |. For example, MSNBC is bundled with both national news outlets and left-wing sites, so its bundle field lists both slugs joined by |.
If a suitable bundle for your site does not exist, you can add one to sources/bundles.csv, as described below.
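A pipe-delimited field is easy to mistype. The sketch below splits a site’s bundle field and reports any slug that does not appear in bundles.csv. It assumes the relevant columns are named bundle in sites.csv and slug in bundles.csv, which you should confirm against the files; the slugs in the usage lines are hypothetical.

```python
import csv
import io


def parse_bundles(field: str) -> list[str]:
    """Split a pipe-delimited bundle field into a list of slugs, dropping blanks."""
    return [slug.strip() for slug in field.split("|") if slug.strip()]


def unknown_bundles(site_row: dict, bundle_slugs: set[str]) -> list[str]:
    """Return any slugs in the site's bundle field that bundles.csv does not define."""
    # Assumed column name "bundle"; a missing or empty value yields no slugs.
    field = site_row.get("bundle") or ""
    return [slug for slug in parse_bundles(field) if slug not in bundle_slugs]


# Hypothetical bundles.csv contents with a single bundle.
bundles_csv = "slug,name,location,timezone\nexample-national,Example national,United States,America/New_York\n"
known = {row["slug"] for row in csv.DictReader(io.StringIO(bundles_csv))}

site = {"handle": "example", "bundle": "example-national|example-left"}
print(unknown_bundles(site, known))  # ['example-left']
```

Here example-left is reported because the hypothetical bundles.csv does not define it, which is the situation where you would add a new bundle.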
Adding a bundle¶
Bundles are collections of sites that are grouped together for archiving, presentation and analysis. Adding a new bundle requires adding a row to sources/bundles.csv with a slug, name, location and time zone. When the bundle’s slug is entered in the bundle field of a site’s row in sites.csv, that site is considered part of the bundle.
While every site in our directory is archived at least twice a day, bundles can have additional archiving runs scheduled via GitHub Actions. This lets us time runs to suit the bundle’s local time, and each run also automatically posts a tweet to the @newshomepages account.
Adding a new batch run requires creating a new YAML file in the .github/workflows directory that calls a reusable workflow shared by similar files. It should be named archive-your-bundle-slug.yml, with your bundle’s slug in place of your-bundle-slug. If you’d like to schedule a new bundle run, submit a file like the following via pull request. You should only need to customize the name, the cron schedule and the bundle slug.
name: "Archive: Your bundle name"

on:
  workflow_dispatch:
  schedule:
    - cron: "0 18 * * *"  # <-- Your bundle's schedule goes here. GitHub evaluates cron times in UTC.

jobs:
  archive-bundle:
    name: Archive bundle
    uses: palewire/news-homepages/.github/workflows/reusable-archive-bundle-workflow.yml@main
    with:
      bundle: your-bundle-slug
    secrets: inherit
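Because GitHub Actions evaluates cron schedules in UTC, a run meant for a bundle’s local morning has to be translated from local time. The helper below is a hypothetical sketch, not part of the project, that does the conversion with Python’s zoneinfo module. The offset depends on the date you pass, since daylight saving time shifts it.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def utc_cron(local_hour: int, local_minute: int, tz_name: str, on: date) -> str:
    """Return a daily cron expression, in UTC, that fires at the given local time.

    The UTC offset is taken from the date `on`, so zones that observe daylight
    saving time give different results in winter and summer.
    """
    local = datetime.combine(on, time(local_hour, local_minute), tzinfo=ZoneInfo(tz_name))
    utc = local.astimezone(timezone.utc)
    # Cron fields: minute, hour, day of month, month, day of week.
    return f"{utc.minute} {utc.hour} * * *"


print(utc_cron(10, 0, "America/Los_Angeles", date(2024, 1, 15)))  # 0 18 * * *
print(utc_cron(10, 0, "America/Los_Angeles", date(2024, 7, 15)))  # 0 17 * * *
```

For zones that observe daylight saving time, a single fixed cron line will fire an hour earlier or later in local time for part of the year. Choosing which season to match is a judgment call.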