Adding sites and bundles
This page explains how to add new sites and bundles to the project by making a code contribution to the open-source repository.
Note
If you lack the technical skills or time to add sources yourself, you can always make a request of the project’s maintainers by filling out this form on GitHub or by emailing b@palewi.re.
Adding a site
1. Add record to sites.csv file
Adding a new site requires that a new row be added to sources/sites.csv with, at a minimum, the Twitter handle, URL, name, location, time zone, country and language of the target.
Time zones should be provided in Python’s standard formatting scheme, which is understood here to mean IANA-style names such as America/New_York. Countries should be provided as two-letter ISO 3166-1 alpha-2 codes. Languages should be provided as two-letter ISO 639-1 codes. You can override the system’s default time delay before the screenshot by adding an optional attribute, which, if provided, is expected in milliseconds.
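Before committing, it can help to sanity-check a new row against the rules above. The following is a minimal sketch, not part of the project’s code: the function name check_site_row and the column names timezone, country, language and wait are assumptions made for illustration, so adjust them to match the actual header of sources/sites.csv.

```python
from zoneinfo import ZoneInfo


def _is_two_letter_code(value: str) -> bool:
    """True if the value is exactly two ASCII letters, e.g. 'US' or 'en'."""
    return len(value) == 2 and value.isascii() and value.isalpha()


def check_site_row(row: dict) -> list[str]:
    """Return a list of human-readable problems found in one site row.

    The column names used here are hypothetical; match them to sites.csv.
    """
    problems = []

    # Time zone: must be a name that Python's zoneinfo module can load.
    timezone = (row.get("timezone") or "").strip()
    try:
        ZoneInfo(timezone)
    except (KeyError, ValueError, OSError):
        problems.append(f"unknown time zone: {timezone!r}")

    # Country: two letters, as in ISO 3166-1 alpha-2.
    country = (row.get("country") or "").strip()
    if not _is_two_letter_code(country):
        problems.append(f"country should be a two-letter code: {country!r}")

    # Language: two letters, as in ISO 639-1.
    language = (row.get("language") or "").strip()
    if not _is_two_letter_code(language):
        problems.append(f"language should be a two-letter code: {language!r}")

    # Optional screenshot delay: a whole number of milliseconds if present.
    wait = (row.get("wait") or "").strip()
    if wait and not (wait.isascii() and wait.isdigit()):
        problems.append(f"wait should be whole milliseconds: {wait!r}")

    return problems


row = {"timezone": "America/New_York", "country": "US", "language": "en", "wait": "5000"}
for problem in check_site_row(row):
    print(problem)
```

The check only confirms that each value has the right shape. It does not confirm that a two-letter code is actually assigned in the ISO lists, and the time zone lookup depends on time zone data being installed on your machine.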
Test the screenshot
After adding the row, verify that the site works by running the screenshot.py command and inspecting the result.
pipenv run python -m newshomepages.screenshot your-handle
Hide ads and popups
If there are popups or ads that interfere with the screenshot, our aim is to eliminate them via JavaScript.
There are two techniques for achieving this goal:
The first is adding a selector to the target_list in the newshomepages.screenshot module. This should be done in cases where the offending element appears to be generated by a third-party library that may occur on other sites.
The second is creating a file in sources/javascript with its name slugged to match the Twitter handle of the site. This snippet will be run for just that site. Here’s a generic example that would remove any elements with the id of ad_unit or class of popup. If you identify the id or class of a page element you’d like to hide, you can insert it into the selector string.
document.querySelectorAll(
  '#ad_unit,.popup' // <-- Put your page’s identifiers here. If there's more than one thing to target, you can comma-separate them.
).forEach(el => el.remove())
This method can also accommodate more complicated manipulations of the page. Consult the examples in the repository to explore other techniques for targeting and hiding page elements.
Add to a bundle
Next, link the site’s row to one or more of the topical bundles defined in sources/bundles.csv. This is done by putting the slugs of the desired bundles into your site’s bundle field.
If you’d like to link a site with more than one bundle, separate the slugs with |. For example, MSNBC is bundled with both national news outlets and left-wing sites, so its bundle field looks like:
us-national|us-left-wing
If a suitable bundle for your site does not exist, you can add one to the separate bundle data file, as described below.
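To spot a bundle slug that has not been defined yet, the pipe-separated field can be split and compared against the slugs in the bundle file. This is a minimal sketch under stated assumptions: the function name unknown_bundles is hypothetical, and the set of known slugs is hard-coded here rather than read from sources/bundles.csv.

```python
def unknown_bundles(bundle_field: str, known_slugs: set[str]) -> list[str]:
    """Return the slugs in a site's bundle field that are not known bundles."""
    # Split on the pipe separator and ignore empty or padded entries.
    slugs = [s.strip() for s in bundle_field.split("|") if s.strip()]
    return [s for s in slugs if s not in known_slugs]


# Illustrative values only; in practice, load the slugs from the bundle file.
known = {"us-national", "us-left-wing"}

print(unknown_bundles("us-national|us-left-wing", known))   # → []
print(unknown_bundles("us-national|my-new-bundle", known))  # → ['my-new-bundle']
```

Any slug that comes back from this check would need its own row in the bundle file before the site can be linked to it.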
Adding a bundle
Bundles are collections of sites that are grouped together for archiving, presentation and analysis. Adding a new bundle requires that a new row be added to sources/bundles.csv with a slug, name, location and time zone. When its slug value is entered in the bundle field of the sites.csv file, the site is considered a part of the bundle.
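If you build the new row programmatically, Python’s csv module takes care of quoting values that contain commas, such as a location. The sketch below is illustrative only: the function name bundle_row_csv and the column names and order (slug, name, location, timezone) are assumptions drawn from the description above, so check them against the real header of sources/bundles.csv.

```python
import csv
import io

# Assumed column names and order; confirm against the actual file header.
FIELDS = ["slug", "name", "location", "timezone"]


def bundle_row_csv(slug: str, name: str, location: str, timezone: str) -> str:
    """Render one bundle as a CSV line, quoting fields only where needed."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writerow(
        {"slug": slug, "name": name, "location": location, "timezone": timezone}
    )
    return buffer.getvalue()


# Placeholder values for illustration.
print(bundle_row_csv("your-bundle-slug", "Your bundle name", "Springfield, Illinois", "America/Chicago"), end="")
# → your-bundle-slug,Your bundle name,"Springfield, Illinois",America/Chicago
```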
Scheduling actions
While all sites in our directory are archived at least two times per day, bundles can have additional archiving runs scheduled via GitHub Actions. This allows for optimizing our runs for local time and also results in a tweet automatically posted to the @newshomepages account.
Adding a new batch run requires creating a new YAML file in the .github/workflows directory that inherits from a reusable workflow shared by similar files. It should be named archive-your-bundle-slug.yml. If you’d like to schedule a new bundle run, submit a file like this via pull request. You should only need to customize the name, the cron and the bundle.
name: "Archive: Your bundle name"

on:
  workflow_dispatch:
  schedule:
    - cron: "0 18 * * *" # <-- Your bundle's schedule goes here.

jobs:
  archive-bundle:
    name: Archive bundle
    uses: palewire/news-homepages/.github/workflows/reusable-archive-bundle-workflow.yml@main
    with:
      bundle: your-bundle-slug
    secrets: inherit
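When choosing the cron value, keep in mind that GitHub Actions schedules are, as far as we know, interpreted in UTC, so a local target time has to be shifted first. The helper below is a minimal sketch, not part of the project: the function name local_time_to_utc_cron is hypothetical, and it takes a fixed UTC offset as an assumption, so it does not follow daylight saving changes.

```python
def local_time_to_utc_cron(hour: int, minute: int, utc_offset_hours: float) -> str:
    """Build a daily cron expression in UTC from a local time and UTC offset.

    utc_offset_hours is the local zone's offset from UTC, e.g. -5 for a zone
    five hours behind UTC or 5.5 for one five and a half hours ahead.
    """
    local_minutes = hour * 60 + minute
    # Subtract the offset to move from local time to UTC, wrapping at midnight.
    utc_minutes = (local_minutes - round(utc_offset_hours * 60)) % (24 * 60)
    return f"{utc_minutes % 60} {utc_minutes // 60} * * *"


print(local_time_to_utc_cron(13, 0, -5))   # → 0 18 * * *
print(local_time_to_utc_cron(6, 30, 5.5))  # → 0 1 * * *
```

Because the offset is fixed, a bundle in a zone that observes daylight saving time will run an hour earlier or later in local terms for part of the year.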