Adding sites and bundles

Adding a site

Add a record to the sites.csv file

Adding a new site requires adding a row to sources/sites.csv with, at a minimum, the target's Twitter handle, URL, name, location, time zone, country and language. Time zones should be names from the IANA time zone database, the format Python's standard library uses (for example, America/New_York). Countries should be provided as two-letter ISO 3166-1 alpha-2 codes, and languages as two-letter ISO 639-1 codes. Optionally, you can override the system's default delay before the screenshot is taken by providing a wait time in milliseconds.
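Before opening a pull request, it can help to check a new row against the rules above. The sketch below is a hypothetical helper, not part of the project. It assumes the column names handle, url, name, location, timezone, country, language and an optional wait column for the delay; these are guesses, so match them to the actual header row of sources/sites.csv. Time zones are checked with Python's standard zoneinfo module, which needs IANA time zone data installed on the machine.

```python
import csv
import io
import re
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

# Hypothetical column names: confirm them against the header of sources/sites.csv.
REQUIRED = ["handle", "url", "name", "location", "timezone", "country", "language"]
TWO_LETTER = re.compile(r"^[a-z]{2}$")


def check_site_row(row: dict[str, str]) -> dict[str, str]:
    """Return a mapping of field name -> problem for one sites.csv row."""
    errors: dict[str, str] = {}
    # Every required field must be present and non-blank.
    for field in REQUIRED:
        if not (row.get(field) or "").strip():
            errors[field] = "missing"
    # The time zone must be a name zoneinfo can load, such as America/New_York.
    tz = (row.get("timezone") or "").strip()
    if tz:
        try:
            ZoneInfo(tz)
        except (ZoneInfoNotFoundError, ValueError):
            errors["timezone"] = f"unknown time zone {tz!r}"
    # ISO 3166-1 alpha-2 and ISO 639-1 codes are both two letters.
    for field in ("country", "language"):
        value = (row.get(field) or "").strip()
        if value and not TWO_LETTER.match(value.lower()):
            errors[field] = f"expected a two-letter code, got {value!r}"
    # The optional delay (hypothetical "wait" column) is a whole number of milliseconds.
    wait = (row.get("wait") or "").strip()
    if wait and not wait.isdigit():
        errors["wait"] = "expected a whole number of milliseconds"
    return errors


if __name__ == "__main__":
    sample = io.StringIO(
        "handle,url,name,location,timezone,country,language\n"
        "examplenews,https://example.com/,Example News,Anytown,America/New_York,USA,en\n"
    )
    for row in csv.DictReader(sample):
        print(row["handle"], check_site_row(row))
```

In the sample row, the three-letter country code is the kind of mistake the checker is meant to flag.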

Test the screenshot

Next, verify that the site works by running the screenshot command, replacing your-handle with the handle you entered in sites.csv, and inspecting the result.

pipenv run python -m newshomepages.screenshot your-handle
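If you are adding several sites at once, a small wrapper can run the same command for each handle in turn. This is a hypothetical helper, not part of the project: it only shells out to the command shown above and records each exit code, so you still need to inspect every screenshot yourself.

```python
import shlex
import subprocess


def screenshot_command(handle: str) -> list[str]:
    """Build the documented command: pipenv run python -m newshomepages.screenshot <handle>."""
    return ["pipenv", "run", "python", "-m", "newshomepages.screenshot", handle]


def screenshot_all(handles: list[str]) -> dict[str, int]:
    """Run the screenshot command for each handle and return its exit code."""
    results: dict[str, int] = {}
    for handle in handles:
        cmd = screenshot_command(handle)
        # Echo the command so the output shows which site is being captured.
        print("$", shlex.join(cmd))
        completed = subprocess.run(cmd, check=False)
        results[handle] = completed.returncode
    return results
```

For example, `screenshot_all(["first-handle", "second-handle"])` runs the command twice; a nonzero exit code points to a site worth investigating first.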

Hide ads and popups

If there are popups or ads that interfere with the screenshot, our aim is to eliminate them via JavaScript.

There are two techniques for achieving this:

  1. Add a selector to the target_list in the newshomepages.screenshot module. Use this approach when the offending element appears to be generated by a third-party library that may also appear on other sites.

  2. Create a file in sources/javascript whose name is the slug of the site's Twitter handle. That snippet will run only for that site. Here's a generic example that removes any elements with the id ad_unit or the class popup. Once you've identified the id or class of a page element you'd like to hide, substitute it into the selector.

document.querySelectorAll(
  '#ad_unit,.popup' // <-- Put your page's identifiers here. To target more than one element, separate the selectors with commas.
).forEach(el => el.remove())

This method can also accommodate more complex manipulations of the page. Consult the examples in the repository for other techniques for targeting and hiding page elements.
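When all you need is to remove elements by selector, the second technique can be scripted. The helper below is a hypothetical sketch that writes a snippet in the same shape as the example above. It assumes the file uses a .js extension and that the slug is the handle lowercased with any leading @ removed; check the existing files in sources/javascript to confirm both conventions before relying on it.

```python
import json
from pathlib import Path


def slugify_handle(handle: str) -> str:
    """Assumed slug rule: strip whitespace and a leading @, then lowercase."""
    return handle.strip().lstrip("@").lower()


def removal_snippet(selectors: list[str]) -> str:
    """Return JavaScript that removes every element matching any of the selectors."""
    joined = ",".join(s.strip() for s in selectors)
    # json.dumps yields a safely quoted JavaScript string literal.
    return f"document.querySelectorAll({json.dumps(joined)}).forEach(el => el.remove())\n"


def write_site_script(
    handle: str, selectors: list[str], directory: Path = Path("sources/javascript")
) -> Path:
    """Write the snippet to <directory>/<slug>.js and return the path."""
    path = directory / f"{slugify_handle(handle)}.js"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(removal_snippet(selectors), encoding="utf-8")
    return path


if __name__ == "__main__":
    print(removal_snippet(["#ad_unit", ".popup"]), end="")
```

Anything beyond removing elements, such as scrolling or clicking a consent button, is better written by hand in the JavaScript file.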

Add to a bundle

Next, link the site's row to one or more of the topical bundles defined in sources/bundles.csv by putting the slugs of the desired bundles into the site's bundle field.

To link a site with more than one bundle, separate the slugs with a pipe character (|). For example, MSNBC is bundled with both national news outlets and left-wing sites, so its bundle field looks like this:

us-national|us-left-wing
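Because the bundle field is pipe-delimited, splitting on | is enough to recover a site's bundles. The sketch below assumes the columns are named handle and bundle, an assumption about the CSV header, and lists every site linked to a given bundle slug.

```python
import csv
import io
from collections.abc import Iterable


def bundle_slugs(field: str) -> list[str]:
    """Split a pipe-delimited bundle field into clean slugs, ignoring blanks."""
    return [slug.strip() for slug in field.split("|") if slug.strip()]


def sites_in_bundle(rows: Iterable[dict[str, str]], bundle_slug: str) -> list[str]:
    """Return the handles of rows whose bundle field includes bundle_slug."""
    return [
        row["handle"]
        for row in rows
        if bundle_slug in bundle_slugs(row.get("bundle") or "")
    ]


if __name__ == "__main__":
    print(bundle_slugs("us-national|us-left-wing"))  # ['us-national', 'us-left-wing']
    sample = io.StringIO(
        "handle,bundle\n"
        "site-a,us-national|us-left-wing\n"
        "site-b,us-national\n"
    )
    print(sites_in_bundle(csv.DictReader(sample), "us-left-wing"))  # ['site-a']
```

Matching whole slugs after splitting, rather than searching the raw string, keeps a slug like us-national from also matching a longer slug that happens to contain it.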

If no suitable bundle exists for your site, you can add one to sources/bundles.csv, as described below.

Adding a bundle

Bundles are collections of sites that are grouped together for archiving, presentation and analysis. Adding a new bundle requires adding a row to sources/bundles.csv with a slug, name, location and time zone. Any site whose bundle field in sites.csv contains that slug is considered part of the bundle.
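Since membership is defined only by slugs typed into sites.csv, a misspelled slug silently leaves a site out of its bundle. The hypothetical check below compares the two files. It assumes bundles.csv has a slug column and sites.csv has handle and bundle columns; adjust the names if the real headers differ.

```python
import csv
import io


def undefined_bundles(sites_text: str, bundles_text: str) -> dict[str, set[str]]:
    """Map each site handle to any bundle slugs it uses that bundles.csv does not define."""
    defined = {row["slug"].strip() for row in csv.DictReader(io.StringIO(bundles_text))}
    problems: dict[str, set[str]] = {}
    for row in csv.DictReader(io.StringIO(sites_text)):
        # The bundle field is pipe-delimited; blank entries are ignored.
        used = {s.strip() for s in (row.get("bundle") or "").split("|") if s.strip()}
        missing = used - defined
        if missing:
            problems[row["handle"]] = missing
    return problems


if __name__ == "__main__":
    bundles = "slug,name,location,timezone\nus-national,National,United States,America/New_York\n"
    sites = "handle,bundle\nsite-a,us-national|us-natonal\nsite-b,us-national\n"
    print(undefined_bundles(sites, bundles))  # {'site-a': {'us-natonal'}}
```

To run it against the repository, pass the contents of the real files, for example `Path("sources/sites.csv").read_text(encoding="utf-8")` and the same for sources/bundles.csv; an empty result means every referenced slug is defined.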

Scheduling actions

While all sites in our directory are archived at least twice a day, bundles can have additional archiving runs scheduled via GitHub Actions. This lets us time runs to suit the bundle's local time, and each run also automatically posts a tweet to the @newshomepages account.
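GitHub Actions evaluates schedule cron expressions in UTC, so a run meant for a bundle's local morning has to be converted. The hypothetical helper below does that conversion for a daily run with Python's datetime module. Because the UTC offset is computed for a single date, the result shifts by an hour when daylight saving time begins or ends; pick the date you care about or accept the drift.

```python
from datetime import date, datetime, time, timezone, tzinfo
from typing import Optional


def utc_cron_for_local_time(
    hour: int, minute: int, tz: tzinfo, on: Optional[date] = None
) -> str:
    """Return a daily cron expression, in UTC, that fires at hour:minute in tz on the given date."""
    on = on or date.today()
    local = datetime.combine(on, time(hour, minute), tzinfo=tz)
    utc = local.astimezone(timezone.utc)
    # Cron field order: minute, hour, day of month, month, day of week.
    return f"{utc.minute} {utc.hour} * * *"
```

For example, `utc_cron_for_local_time(7, 0, ZoneInfo("America/New_York"), date(2024, 1, 15))` returns `"0 12 * * *"`, since New York is five hours behind UTC in January (ZoneInfo comes from the standard zoneinfo module).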

Adding a new scheduled run requires creating a YAML file in the .github/workflows directory that calls a reusable workflow shared by similar files. Name it archive-your-bundle-slug.yml, replacing your-bundle-slug with the bundle's slug. To schedule a new bundle run, submit a file like the one below via pull request. You should only need to customize the name, the cron schedule and the bundle slug.

name: "Archive: Your bundle name"

on:
  workflow_dispatch:
  schedule:
    - cron: "0 18 * * *"  # <-- Your bundle's schedule goes here.

jobs:
  archive-bundle:
    name: Archive bundle
    uses: palewire/news-homepages/.github/workflows/<reusable-workflow>.yml@<ref>  # <-- Copy this value from an existing archive-*.yml workflow.
    with:
      bundle: your-bundle-slug
    secrets: inherit
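If you are adding several bundles, the file can be generated from a template. This hypothetical sketch reproduces the YAML above with the three substitutions it calls for (name, cron and bundle slug). The reusable workflow reference is a required argument: copy the exact uses: value from an existing archive-*.yml file in .github/workflows rather than guessing it. The slug check assumes slugs are lowercase words joined by hyphens, like us-national.

```python
import json
import re

WORKFLOW_TEMPLATE = """name: {title}

on:
  workflow_dispatch:
  schedule:
    - cron: "{cron}"

jobs:
  archive-bundle:
    name: Archive bundle
    uses: {reusable_workflow}
    with:
      bundle: {slug}
    secrets: inherit
"""

# Assumed slug convention: lowercase letters and digits joined by single hyphens.
SLUG = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def render_workflow(name: str, cron: str, slug: str, reusable_workflow: str) -> tuple[str, str]:
    """Return (filename, YAML text) for a scheduled bundle archive workflow."""
    if len(cron.split()) != 5:
        raise ValueError(f"cron needs five fields, got {cron!r}")
    if not SLUG.fullmatch(slug):
        raise ValueError(f"unexpected slug {slug!r}")
    # json.dumps produces a double-quoted string that YAML also accepts.
    title = json.dumps(f"Archive: {name}", ensure_ascii=False)
    text = WORKFLOW_TEMPLATE.format(
        title=title, cron=cron, reusable_workflow=reusable_workflow, slug=slug
    )
    return f"archive-{slug}.yml", text
```

To save the result, write the text to `Path(".github/workflows") / filename` and review the diff before submitting the pull request.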

Note

If you lack the technical skills or time to add sources yourself, you can ask the project's maintainers to do it by submitting a feature request on GitHub or by emailing b@palewi.re.