Reference

Documentation for a selection of our system’s common internal tools

Commands

accessibility

Save the accessibility JSON of a single site.

accessibility [OPTIONS] HANDLE

Options

-o, --output-dir <output_dir>

Arguments

HANDLE

Required argument

analyze

Analyze our data extracts.

analyze [OPTIONS] COMMAND [ARGS]...

drudge

Analyze Drudge hyperlinks.

analyze drudge [OPTIONS]

lighthouse

Analyze Lighthouse scores.

analyze lighthouse [OPTIONS]

archive

Save a webpage screenshot to an archive.org collection.

archive [OPTIONS] HANDLE

Options

-i, --input-dir <input_dir>

Arguments

HANDLE

Required argument

batch

Print a batch of sites.

batch [OPTIONS] COMMAND [ARGS]...

sites-by-batch

Print site handles in the provided batch as a JSON list.

batch sites-by-batch [OPTIONS] BATCH

Options

-b, --batches <batches>

Arguments

BATCH

Required argument

sites-by-bundle

Print site handles in the provided bundle as a JSON list.

batch sites-by-bundle [OPTIONS] BUNDLE

Arguments

BUNDLE

Required argument

sites-by-country

Print site handles in the provided country as a JSON list.

batch sites-by-country [OPTIONS] COUNTRY

Arguments

COUNTRY

Required argument

discorder

Post images to Discord.

discorder [OPTIONS] COMMAND [ARGS]...

bundle

Post all images for a bundle.

discorder bundle [OPTIONS] SLUG

Options

-i, --input-dir <input_dir>

Arguments

SLUG

Required argument

country

Post all images for a country.

discorder country [OPTIONS] CODE

Options

-i, --input-dir <input_dir>

Arguments

CODE

Required argument

docs

Update templated documentation pages.

docs [OPTIONS] COMMAND [ARGS]...

accessibility-ranking

Create a page ranking sites by Lighthouse accessibility score.

docs accessibility-ranking [OPTIONS]

bundle-detail

Create bundle detail pages.

docs bundle-detail [OPTIONS]

bundle-list

Create bundle list.

docs bundle-list [OPTIONS]

country-detail

Create country detail pages.

docs country-detail [OPTIONS]

country-list

Create country list.

docs country-list [OPTIONS]

drudge-ranking

Create a page ranking sites by appearance on drudgereport.com.

docs drudge-ranking [OPTIONS]

language-detail

Create language detail pages.

docs language-detail [OPTIONS]

language-list

Create language list.

docs language-list [OPTIONS]

latest-screenshots

Create a page showing all of the latest screenshots.

docs latest-screenshots [OPTIONS]

performance-ranking

Create a page ranking sites by Lighthouse performance score.

docs performance-ranking [OPTIONS]

site-detail

Create source detail pages.

docs site-detail [OPTIONS]

site-detail-accessibility-chart

Create the JSON data file for the site detail page’s accessibility chart.

docs site-detail-accessibility-chart [OPTIONS]

site-detail-lighthouse-analysis-chart

Create the JSON data file for the site detail page’s Lighthouse analysis chart.

docs site-detail-lighthouse-analysis-chart [OPTIONS]

site-detail-lighthouse-chart

Create the JSON data file for the site detail page’s Lighthouse chart.

docs site-detail-lighthouse-chart [OPTIONS]

site-detail-screenshot-chart

Create the JSON data file for the site detail page’s screenshots chart.

docs site-detail-screenshot-chart [OPTIONS]

source-list

Create source list.

docs source-list [OPTIONS]

extract

Extract data from the Internet Archive collection.

extract [OPTIONS] COMMAND [ARGS]...

consolidate

Consolidate Internet Archive metadata into CSV files.

extract consolidate [OPTIONS]

download-accessibility

Download and parse the provided site’s accessibility files.

extract download-accessibility [OPTIONS] HANDLE

Arguments

HANDLE

Required argument

download-items

Download the full list of Internet Archive items as JSON.

extract download-items [OPTIONS]

Options

-y, --year <year>
--site <site>
--country <country>
--language <language>
--bundle <bundle>
--batch <batch>

download-lighthouse

Download and parse the provided site’s Lighthouse files.

extract download-lighthouse [OPTIONS]

Options

--site <site>
--country <country>
--language <language>
--bundle <bundle>
--days <days>
--output-path <output_path>

download-wayback

Download and parse the provided site’s Wayback Machine files.

extract download-wayback [OPTIONS] HANDLE

Arguments

HANDLE

Required argument

latest-files

Parse and consolidate the latest files for each site.

extract latest-files [OPTIONS]

Options

--init

mosaic

Create image mosaics.

mosaic [OPTIONS] COMMAND [ARGS]...

gif

Combine images into a mosaic GIF.

mosaic gif [OPTIONS]

Options

-i, --input-dir <input_dir>
-o, --output-dir <output_dir>

jpg

Combine images into JPGs ready for Twitter.

mosaic jpg [OPTIONS]

Options

-i, --input-dir <input_dir>
-o, --output-dir <output_dir>

rss

Create RSS feeds.

rss [OPTIONS] COMMAND [ARGS]...

bundles

Create bundle feeds.

rss bundles [OPTIONS]

countries

Create country feeds.

rss countries [OPTIONS]

opml

Create an OPML file with all site feeds.

rss opml [OPTIONS]

sites

Create site feeds.

rss sites [OPTIONS]

screenshot

Screenshot the provided homepage.

screenshot [OPTIONS] HANDLE

Options

-o, --output-dir <output_dir>
-w, --wait <wait>
-x, --width <width>
-y, --height <height>

Arguments

HANDLE

Required argument

slack

Post an image to a Slack channel.

slack [OPTIONS] ARTIFACT_PATH

Arguments

ARTIFACT_PATH

Required argument

telegrammer

Send a Telegram message.

telegrammer [OPTIONS] COMMAND [ARGS]...

bundle

Send a bundle of sources.

telegrammer bundle [OPTIONS] SLUG

Options

-i, --input-dir <input_dir>

Arguments

SLUG

Required argument

country

Send all sources from a single country.

telegrammer country [OPTIONS] CODE

Options

-i, --input-dir <input_dir>

Arguments

CODE

Required argument

mosaic

Send a mosaic GIF.

telegrammer mosaic [OPTIONS]

Options

-i, --input-dir <input_dir>

single

Send a single source.

telegrammer single [OPTIONS] HANDLE

Options

-i, --input-dir <input_dir>

Arguments

HANDLE

Required argument

tweet

Send a tweet.

tweet [OPTIONS] COMMAND [ARGS]...

accessibility-report

Tweet a periodic report on Lighthouse accessibility scores.

tweet accessibility-report [OPTIONS]

bundle

Tweet four sources as a single tweet.

tweet bundle [OPTIONS] SLUG

Options

-i, --input-dir <input_dir>

Arguments

SLUG

Required argument

country

Tweet four sources as a single tweet.

tweet country [OPTIONS] CODE

Options

-i, --input-dir <input_dir>

Arguments

CODE

Required argument

mosaic

Tweet a mosaic GIF.

tweet mosaic [OPTIONS]

Options

-i, --image-path <input_path>

performance-report

Tweet a periodic report on Lighthouse performance scores.

tweet performance-report [OPTIONS]

single

Tweet a single source.

tweet single [OPTIONS] HANDLE

Options

-i, --input-dir <input_dir>

Arguments

HANDLE

Required argument

status-report

Tweet a periodic status report.

tweet status-report [OPTIONS]

update-list

Update a Twitter list with all of our sources.

tweet update-list [OPTIONS]

Options

-n <number>

wayback

Archive a URL in the Wayback Machine.

wayback [OPTIONS] HANDLE

Options

-o, --output-dir <output_dir>

Arguments

HANDLE

Required argument

Utilities

The utils module contains a variety of functions used by our commands.

newshomepages.utils.batch(li: List, n: int)

Yield n sequential chunks from li.
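A minimal sketch of what batch does, assuming it splits li into n nearly equal, sequential slices and that the earlier slices absorb any remainder. This is an illustration of the idea, not the module's own source.

```python
from typing import Iterator, List


def batch(li: List, n: int) -> Iterator[List]:
    """Yield n sequential chunks from li whose sizes differ by at most one."""
    # Base size of every chunk, plus how many chunks receive one extra item.
    size, remainder = divmod(len(li), n)
    for i in range(n):
        # Chunks before index `remainder` are one item longer than the rest.
        start = i * size + min(i, remainder)
        end = (i + 1) * size + min(i + 1, remainder)
        yield li[start:end]


print(list(batch(list(range(10)), 3)))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Note the contrast with chunk: batch fixes the number of pieces, while chunk fixes the size of each piece.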

newshomepages.utils.chunk(iterable: List, length: int) → List[List]

Split the provided list into chunks of the provided length.

Parameters
  • iterable (list) – The master list to split.

  • length (int) – The size of each chunk.

Returns a list of lists.
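A minimal sketch of chunk, assuming the final chunk is simply shorter when the list does not divide evenly. It shows the behavior described above rather than reproducing the module's code.

```python
from typing import List


def chunk(iterable: List, length: int) -> List[List]:
    """Split iterable into consecutive slices of `length` items each."""
    # Step through the list in strides of `length`; the last slice may be shorter.
    return [iterable[i : i + length] for i in range(0, len(iterable), length)]


print(chunk(["a", "b", "c", "d", "e"], 2))  # [['a', 'b'], ['c', 'd'], ['e']]
```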

newshomepages.utils.get_accessibility_df() → pandas.core.frame.DataFrame

Get the full list of accessibility files from our extracts.

Returns a DataFrame.

newshomepages.utils.get_accessibility_list() → List[Dict[str, Any]]

Get the full list of accessibility records from our extracts.

Returns a list of dictionaries.

newshomepages.utils.get_bundle(slug: str) → Dict

Get the metadata for the provided bundle.

Parameters

slug (str) – The unique string identifier of the bundle.

Returns a dictionary.

newshomepages.utils.get_bundle_list() → List[Dict]

Get the full list of site bundles.

Returns a list of dictionaries.

newshomepages.utils.get_country(code: str) → Dict

Get the metadata for the provided country.

Parameters

code (str) – The two-letter alpha code of the country.

Returns a dictionary.

newshomepages.utils.get_country_df() → pandas.core.frame.DataFrame

Get the list of countries.

Returns a pandas DataFrame.

newshomepages.utils.get_country_list()

Get the full list of countries.

Returns a list of dictionaries.

Get the full list of hyperlink files from our extracts.

Returns a DataFrame.

Get the full list of hyperlinks from our extracts.

Returns a list of dictionaries.

newshomepages.utils.get_javascript(handle: str) → Optional[str]

Get the JavaScript file to run before the screenshot, if it exists.

Parameters

handle (str) – The Twitter handle of the site you want.

Returns a JavaScript string ready to be run, or None if no file exists.
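The optional-file lookup described above can be sketched as follows. The js_dir parameter, its default of "js", and the "<handle>.js" naming are hypothetical stand-ins; the real function takes only handle and its file location is not documented here.

```python
from pathlib import Path
from typing import Optional


def get_javascript(handle: str, js_dir: Path = Path("js")) -> Optional[str]:
    """Return the contents of <js_dir>/<handle>.js, or None if it is missing."""
    # Hypothetical layout: one JavaScript file per site, named after its handle.
    path = js_dir / f"{handle}.js"
    if not path.exists():
        return None
    return path.read_text(encoding="utf-8")
```

Returning None instead of raising lets the screenshot step skip the script for the many sites that need no custom JavaScript.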

newshomepages.utils.get_language_df() → pandas.core.frame.DataFrame

Get the list of languages.

Returns a pandas DataFrame.

newshomepages.utils.get_language_list()

Get the list of languages.

Returns a list of dictionaries.

newshomepages.utils.get_lighthouse_df() → pandas.core.frame.DataFrame

Get the full list of Lighthouse files from our extracts.

Returns a DataFrame.

newshomepages.utils.get_lighthouse_list() → List[Dict[str, Any]]

Get the full list of Lighthouse audits from our extracts.

Returns a list of dictionaries.

newshomepages.utils.get_screenshot_df() → pandas.core.frame.DataFrame

Get the full list of screenshot files from our extracts.

Returns a DataFrame.

newshomepages.utils.get_screenshot_list() → List[Dict[str, Any]]

Get the full list of screenshots from our extracts.

Returns a list of dictionaries.

newshomepages.utils.get_screenshots_by_site(site: Dict) → List[Dict]

Get the list of screenshots for the provided site.

Returns a list of dictionaries.

newshomepages.utils.get_site(handle: str) → Dict

Get the metadata for the provided site.

Parameters

handle (str) – The Twitter handle of the site you want.

Returns a dictionary.

newshomepages.utils.get_site_df() → pandas.core.frame.DataFrame

Get the full list of supported sites.

Returns a DataFrame.

newshomepages.utils.get_site_list()

Get the full list of supported sites.

Returns a list of dictionaries.

newshomepages.utils.get_sites_in_batch(batch_number: int, batches: int = 10) → List[Dict]

Get all the sites in the provided batch.

Parameters
  • batch_number (int) – The number of the batch to pull.

  • batches (int) – The total number of batches.

Returns a list of site dictionaries.
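A sketch of how a batch lookup like this might work, under several assumptions: batch numbers count from 1, the site list is passed in explicitly rather than loaded by the function, and split_into_batches and the "handle" key are hypothetical names used only for illustration.

```python
from typing import Dict, List


def split_into_batches(items: List, batches: int) -> List[List]:
    """Hypothetical helper: divide items into `batches` sequential, nearly equal slices."""
    size, remainder = divmod(len(items), batches)
    result = []
    start = 0
    for i in range(batches):
        # The first `remainder` slices take one extra item.
        end = start + size + (1 if i < remainder else 0)
        result.append(items[start:end])
        start = end
    return result


def get_sites_in_batch(site_list: List[Dict], batch_number: int, batches: int = 10) -> List[Dict]:
    """Return the sites in batch `batch_number`, assuming batches count from 1."""
    if not 1 <= batch_number <= batches:
        raise ValueError(f"batch_number must be between 1 and {batches}")
    return split_into_batches(site_list, batches)[batch_number - 1]


sites = [{"handle": f"site{i}"} for i in range(25)]  # hypothetical site records
print([s["handle"] for s in get_sites_in_batch(sites, 1)])  # ['site0', 'site1', 'site2']
```

Splitting the list this way means every site lands in exactly one batch, so running all batches covers the full list once.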

newshomepages.utils.get_sites_in_bundle(slug: str) → List[Dict]

Get all the sites in the provided bundle.

Parameters

slug (str) – The unique string identifier of the bundle.

Returns a list of site dictionaries.

newshomepages.utils.get_sites_in_country(slug: str) → List[Dict]

Get all the sites in the provided country.

Parameters

slug (str) – The two-letter alpha code of the country.

Returns a list of site dictionaries.

newshomepages.utils.get_sites_in_language(code: str) → List[Dict]

Get all the sites in the provided language.

Parameters

code (str) – The unique code of the language.

Returns a list of site dictionaries.

newshomepages.utils.get_user_agent() → str

Return a user agent string ready to pass to a browser.

newshomepages.utils.get_wayback_df() → pandas.core.frame.DataFrame

Get the full list of wayback files from our extracts.

Returns a DataFrame.

newshomepages.utils.intcomma(value)

Convert an integer to a string containing commas every three digits.

For example, 3000 becomes ‘3,000’ and 45000 becomes ‘45,000’.
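One way to get this behavior in Python is the thousands separator in the format-spec mini-language. This is a sketch, not necessarily how the module implements it; coercing the input with int() is an assumption so that numeric strings also work.

```python
def intcomma(value) -> str:
    """Format an integer with a comma between every group of three digits."""
    # The "," option in Python's format spec inserts thousands separators.
    return f"{int(value):,}"


print(intcomma(3000))   # 3,000
print(intcomma(45000))  # 45,000
```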

newshomepages.utils.numoji(number: int) → str

Convert a number into a series of emojis for Slack.

Parameters

number (int) – The number to convert into emoji.

Returns an emoji string.
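A sketch of one plausible approach, assuming numoji spells each digit of a non-negative integer with Slack's keycap shortcodes (:zero: through :nine:). The exact output format of the real function is not documented here.

```python
# Slack shortcodes for the keycap digit emoji.
DIGIT_EMOJI = {
    "0": ":zero:", "1": ":one:", "2": ":two:", "3": ":three:", "4": ":four:",
    "5": ":five:", "6": ":six:", "7": ":seven:", "8": ":eight:", "9": ":nine:",
}


def numoji(number: int) -> str:
    """Spell out each digit of a non-negative integer as a Slack emoji code."""
    # A negative sign has no mapping, so negative numbers raise KeyError.
    return "".join(DIGIT_EMOJI[digit] for digit in str(number))


print(numoji(42))  # :four::two:
```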

newshomepages.utils.parse_archive_artifact(url_list: List) → Dict

Parse the archive artifacts saved as JSON during our update runs.

newshomepages.utils.parse_archive_url(url: str)

Parse the handle and timestamp from an archive.org URL.
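A sketch of the parsing step, assuming a hypothetical file name of the form "<handle>-<timestamp>.<ext>" at the end of an archive.org download URL. The real naming scheme is not documented here. Because Twitter handles cannot contain hyphens, the first hyphen marks where the handle ends.

```python
from pathlib import PurePosixPath
from typing import Dict
from urllib.parse import urlparse


def parse_archive_url(url: str) -> Dict[str, str]:
    """Split an archive.org file URL into its handle and timestamp parts."""
    # Take the last path segment, e.g. "latimes-2022-03-14T12:00:00-04:00.jpg".
    filename = PurePosixPath(urlparse(url).path).name
    # Drop only the final extension so dots inside the timestamp survive.
    stem = filename.rsplit(".", 1)[0]
    # The handle ends at the first hyphen; the rest is the timestamp.
    handle, timestamp = stem.split("-", 1)
    return {"handle": handle, "timestamp": timestamp}


# Hypothetical URL used only to illustrate the assumed naming scheme.
url = "https://archive.org/download/example-item/latimes-2022-03-14T12:00:00-04:00.jpg"
print(parse_archive_url(url))
```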