Reference

Documentation for a selection of our system’s common internal tools.

Commands

accessibility

Save the accessibility tree of the provided site.

accessibility [OPTIONS] HANDLE

Options

-o, --output-dir <output_dir>
--timeout <timeout>

Arguments

HANDLE

Required argument

analyze

analyze [OPTIONS] COMMAND [ARGS]...

cli

Analyze the Drudge Report.

drudge-entities

Analyze Drudge entities.

analyze drudge-entities [OPTIONS]

Options

-o, --output-dir <output_dir>

cli

Analyze Lighthouse reports.

lighthouse

Analyze Lighthouse scores.

analyze lighthouse [OPTIONS]

Options

-o, --output-dir <output_dir>

cli

Analyze US right wing sources.

archive

Save assets to an archive.org collection.

archive [OPTIONS] HANDLE

Options

-i, --input-dir <input_dir>
--latest

Crosspost to the latest archive.org item

--verbose

Display the upload progress to archive.org

--timeout <timeout>

Arguments

HANDLE

Required argument

batch

Print a batch of sites.

batch [OPTIONS] COMMAND [ARGS]...

sites-by-batch

Print site handles in the provided batch as a JSON list.

batch sites-by-batch [OPTIONS] BATCH

Options

-b, --batches <batches>

Arguments

BATCH

Required argument

sites-by-bundle

Print site handles in the provided bundle as a JSON list.

batch sites-by-bundle [OPTIONS] BUNDLE

Arguments

BUNDLE

Required argument

sites-by-country

Print site handles in the provided country as a JSON list.

batch sites-by-country [OPTIONS] COUNTRY

Arguments

COUNTRY

Required argument

discorder

Post images to Discord.

discorder [OPTIONS] COMMAND [ARGS]...

bundle

Post all images for a bundle.

discorder bundle [OPTIONS] SLUG

Options

-i, --input-dir <input_dir>

Arguments

SLUG

Required argument

country

Post all images for a country.

discorder country [OPTIONS] CODE

Options

-i, --input-dir <input_dir>

Arguments

CODE

Required argument

single

Post a single source.

discorder single [OPTIONS] HANDLE

Options

-i, --input-dir <input_dir>

Arguments

HANDLE

Required argument

site

Update templated documentation pages.

site [OPTIONS] COMMAND1 [ARGS]... [COMMAND2 [ARGS]...]...

accessibility-ranking

Create page ranking sites by Lighthouse accessibility score.

site accessibility-ranking [OPTIONS]

bundle-detail

Create bundle detail pages.

site bundle-detail [OPTIONS]

bundle-list

Create bundle list.

site bundle-list [OPTIONS]

country-detail

Create country detail pages.

site country-detail [OPTIONS]

country-list

Create country list.

site country-list [OPTIONS]

drudge

Create page ranking sites by appearance on drudgereport.com.

site drudge [OPTIONS]

language-detail

Create languages detail pages.

site language-detail [OPTIONS]

language-list

Create language list.

site language-list [OPTIONS]

latest-screenshots

Create page showing all of the latest screenshots.

site latest-screenshots [OPTIONS]

performance-ranking

Create page ranking sites by Lighthouse performance score.

site performance-ranking [OPTIONS]

site-detail

Create source detail pages.

site site-detail [OPTIONS]

site-detail-accessibility-chart

Create the JSON data file for the site detail page’s accessibility chart.

site site-detail-accessibility-chart [OPTIONS]

site-detail-lighthouse-analysis-chart

Create the JSON data file for the site detail page’s lighthouse analysis chart.

site site-detail-lighthouse-analysis-chart [OPTIONS]

site-detail-lighthouse-chart

Create the JSON data file for the site detail page’s lighthouse chart.

site site-detail-lighthouse-chart [OPTIONS]

site-detail-screenshot-chart

Create the JSON data file for the site detail page’s screenshots chart.

site site-detail-screenshot-chart [OPTIONS]

source-list

Create source list.

site source-list [OPTIONS]

extract

extract [OPTIONS] COMMAND [ARGS]...

consolidate

Consolidate Internet Archive metadata into CSV files.

extract consolidate [OPTIONS]

Options

-o, --output-dir <output_dir>

items

Download items from our archive.org collection as JSON.

extract items [OPTIONS] HANDLE

Options

-y, --year <year>
-o, --output-dir <output_dir>

Arguments

HANDLE

Required argument

accessibility

Download and parse the provided site’s accessibility files.

extract accessibility [OPTIONS] HANDLE

Arguments

HANDLE

Required argument

cli

Download and parse the provided site’s hyperlinks files.

lighthouse

Download and parse the provided site’s Lighthouse files.

extract lighthouse [OPTIONS]

Options

--site <site>
--country <country>
--language <language>
--bundle <bundle>
--days <days>
-o, --output-path <output_path>

wayback

Download and parse the provided site’s Wayback Machine files.

extract wayback [OPTIONS] HANDLE

Arguments

HANDLE

Required argument

html

Save HTML for the provided homepage.

html [OPTIONS] HANDLE

Options

-o, --output-dir <output_dir>
-w, --wait <wait>

Arguments

HANDLE

Required argument

mosaic

Create image mosaics.

mosaic [OPTIONS] COMMAND [ARGS]...

gif

Combine images into a mosaic GIF.

mosaic gif [OPTIONS]

Options

-i, --input-dir <input_dir>
-o, --output-dir <output_dir>

jpg

Combine images into jpgs ready for Twitter.

mosaic jpg [OPTIONS]

Options

-i, --input-dir <input_dir>
-o, --output-dir <output_dir>

rss

Create RSS feeds.

rss [OPTIONS] COMMAND [ARGS]...

bundles

Create bundle feeds.

rss bundles [OPTIONS]

countries

Create country feeds.

rss countries [OPTIONS]

opml

Create an OPML file with all site feeds.

rss opml [OPTIONS]

sites

Create site feeds.

rss sites [OPTIONS]

screenshot

Screenshot the provided homepage.

screenshot [OPTIONS] HANDLE

Options

-o, --output-dir <output_dir>
-w, --wait <wait>
-x, --width <width>
-y, --height <height>
-f, --full-page

Screenshot the whole page

Arguments

HANDLE

Required argument

slack

Post image to Slack channel.

slack [OPTIONS] ARTIFACT_PATH

Arguments

ARTIFACT_PATH

Required argument

telegrammer

Send a Telegram message.

telegrammer [OPTIONS] COMMAND [ARGS]...

bundle

Send a bundle of sources.

telegrammer bundle [OPTIONS] SLUG

Options

-i, --input-dir <input_dir>

Arguments

SLUG

Required argument

country

Send all sources from a single country.

telegrammer country [OPTIONS] CODE

Options

-i, --input-dir <input_dir>

Arguments

CODE

Required argument

mosaic

Send a mosaic GIF.

telegrammer mosaic [OPTIONS]

Options

-i, --input-dir <input_dir>

single

Send a single source.

telegrammer single [OPTIONS] HANDLE

Options

-i, --input-dir <input_dir>

Arguments

HANDLE

Required argument

tweet

Send a tweet.

tweet [OPTIONS] COMMAND [ARGS]...

accessibility-report

Tweet a periodic report on Lighthouse accessibility scores.

tweet accessibility-report [OPTIONS]

bundle

Tweet four sources as a single tweet.

tweet bundle [OPTIONS] SLUG

Options

-i, --input-dir <input_dir>

Arguments

SLUG

Required argument

country

Tweet four sources as a single tweet.

tweet country [OPTIONS] CODE

Options

-i, --input-dir <input_dir>

Arguments

CODE

Required argument

mosaic

Tweet a mosaic GIF.

tweet mosaic [OPTIONS]

Options

-i, --image-path <input_path>

performance-report

Tweet a periodic report on Lighthouse performance scores.

tweet performance-report [OPTIONS]

single

Tweet a single source.

tweet single [OPTIONS] HANDLE

Options

-i, --input-dir <input_dir>

Arguments

HANDLE

Required argument

status-report

Tweet a periodic status report.

tweet status-report [OPTIONS]

update-list

Update a Twitter list with all of our sources.

tweet update-list [OPTIONS]

Options

-n <number>

wayback

Archive a URL in the Wayback Machine.

wayback [OPTIONS] HANDLE

Options

-o, --output-dir <output_dir>

Arguments

HANDLE

Required argument

Utilities

The utils module contains a variety of functions used by our commands.

newshomepages.utils.batch(li: List, n: int)

Yield n sequential chunks from li.
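
One way to picture this behavior is a generator that cuts the list into n contiguous pieces of near-equal size. The sketch below is an illustration only: the name split_into_n is hypothetical, and the choice to give any leftover items to the earliest chunks is an assumption, not something the description above confirms.

```python
from typing import Iterator, List


def split_into_n(items: List, n: int) -> Iterator[List]:
    """Yield n sequential chunks from items (hypothetical stand-in, not the library code)."""
    # divmod gives the base chunk size and how many chunks need one extra item.
    size, extra = divmod(len(items), n)
    start = 0
    for i in range(n):
        # Assumption: the first `extra` chunks each take one leftover item.
        end = start + size + (1 if i < extra else 0)
        yield items[start:end]
        start = end


print(list(split_into_n(list(range(10)), 3)))
# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Because this is a generator, callers that need all chunks at once would wrap the call in list().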

newshomepages.utils.chunk(iterable: List, length: int) → List[List]

Split the provided list into chunks of the provided length.

Parameters
  • iterable (list) – The master list to split.

  • length (int) – The size of the chunks you want.

Returns a list of lists.
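
A minimal sketch of fixed-length chunking, assuming the final chunk is simply shorter when the list does not divide evenly. The name chunk_sketch is hypothetical and the body is not taken from the package.

```python
from typing import List


def chunk_sketch(iterable: List, length: int) -> List[List]:
    """Split a list into pieces of at most `length` items (illustrative only)."""
    # Step through the list in strides of `length`, slicing out each piece.
    return [iterable[i : i + length] for i in range(0, len(iterable), length)]


print(chunk_sketch(["a", "b", "c", "d", "e"], 2))
# [['a', 'b'], ['c', 'd'], ['e']]
```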

newshomepages.utils.download_url(url: str, output_path: pathlib.Path, timeout: int = 180)

Download the provided URL to the provided path.
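
For readers who want a feel for what such a helper involves, here is a sketch built only on the standard library. It is an assumption that the real function behaves similarly; the name download_url_sketch, the use of urllib, and the creation of missing parent directories are illustrative choices, not documented behavior.

```python
import shutil
import urllib.request
from pathlib import Path


def download_url_sketch(url: str, output_path: Path, timeout: int = 180) -> None:
    """Download a URL to a local path (illustrative stand-in, not the library code)."""
    # Assumption: create the parent directory so the write cannot fail on a missing folder.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response body to disk rather than holding it all in memory.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(output_path, "wb") as f:
            shutil.copyfileobj(response, f)
```

A call might look like download_url_sketch("https://example.com/", Path("out/example.html"), timeout=30), where the URL and path are placeholders.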

newshomepages.utils.get_accessibility_df() → pandas.core.frame.DataFrame

Get the full list of accessibility files from our extracts.

Returns a DataFrame.

newshomepages.utils.get_accessibility_list() → List[Dict[str, Any]]

Get the full list of accessibility files from our extracts.

Returns a list of dictionaries.

newshomepages.utils.get_bundle(slug: str) → Dict

Get the metadata for the provided bundle.

Parameters

slug (str) – The unique string identifier of the bundle.

Returns a dictionary.

newshomepages.utils.get_bundle_list() → List[Dict]

Get the full list of site bundles.

Returns a list of dictionaries.

newshomepages.utils.get_country(code: str) → Dict

Get the metadata for the provided country.

Parameters

code (str) – The two-letter alpha code of the country.

Returns a dictionary.

newshomepages.utils.get_country_df() → pandas.core.frame.DataFrame

Get the list of countries.

Returns a pandas DataFrame.

newshomepages.utils.get_country_list() → List[Dict]

Get the full list of countries.

Returns a list of dictionaries.

newshomepages.utils.get_extract_df(name: str, **kwargs) → pandas.core.frame.DataFrame

Read in the requested extract CSV as a DataFrame.

Get the full list of hyperlink files from our extracts.

Returns a DataFrame.

Get the full list of hyperlinks from our extracts.

Returns a list of dictionaries.

newshomepages.utils.get_javascript(handle: str) → Optional[str]

Get the JavaScript file to run before the screenshot, if it exists.

Parameters

handle (str) – The Twitter handle of the site you want.

Returns a JavaScript string ready to be run. Or None, if no file exists.

newshomepages.utils.get_json_url(url: str)

Get JSON data from the provided URL.

newshomepages.utils.get_language_df() → pandas.core.frame.DataFrame

Get the list of languages.

Returns a pandas DataFrame.

newshomepages.utils.get_language_list() → List[Dict]

Get the list of languages.

Returns a list of dictionaries.

newshomepages.utils.get_lighthouse_df() → pandas.core.frame.DataFrame

Get the full list of Lighthouse files from our extracts.

Returns a DataFrame.

newshomepages.utils.get_lighthouse_list() → List[Dict[str, Any]]

Get the full list of Lighthouse audits from our extracts.

Returns a list of dictionaries.

newshomepages.utils.get_local_time(site: Dict) → datetime.datetime

Get the current time in the provided site’s timezone.

Parameters

site (dict) – A site’s data dictionary.

Returns the current time as a timezone-aware datetime object.

newshomepages.utils.get_screenshot_df() → pandas.core.frame.DataFrame

Get the full list of screenshot files from our extracts.

Returns a DataFrame.

newshomepages.utils.get_screenshot_list() → List[Dict[str, Any]]

Get the full list of screenshots from our extracts.

Returns a list of dictionaries.

newshomepages.utils.get_screenshots_by_site(site: Dict) → List[Dict]

Get the list of screenshots for the provided site.

Returns a list of dictionaries.

newshomepages.utils.get_site(handle: str) → Dict

Get the metadata for the provided site.

Parameters

handle (str) – The Twitter handle of the site you want.

Returns a dictionary.

newshomepages.utils.get_site_df() → pandas.core.frame.DataFrame

Get the full list of sites.

Returns a DataFrame.

newshomepages.utils.get_site_list() → List[Dict]

Get the full list of supported sites.

Returns a list of dictionaries.

newshomepages.utils.get_sites_in_batch(batch_number: int, batches: int = 10) → List[Dict]

Get all the sites in the provided batch.

Parameters
  • batch_number (int) – The number of the batch to pull.

  • batches (int) – The total number of batches.

Returns a list of site dictionaries.

newshomepages.utils.get_sites_in_bundle(slug: str) → List[Dict]

Get all the sites in the provided bundle.

Parameters

slug (str) – The unique string identifier of the bundle.

Returns a list of site dictionaries.

newshomepages.utils.get_sites_in_country(slug: str) → List[Dict]

Get all the sites in the provided country.

Parameters

slug (str) – The two-letter alpha code of the country.

Returns a list of site dictionaries.

newshomepages.utils.get_sites_in_language(code: str) → List[Dict]

Get all the sites in the provided language.

Parameters

code (str) – The two-letter alpha code of the language.

Returns a list of site dictionaries.

newshomepages.utils.get_url(url: str, timeout: int = 30)

Get the provided URL.

newshomepages.utils.get_user_agent() → str

Provide a user-agent string.

Returns a string ready to use as a header in a web request.

newshomepages.utils.get_wayback_df() → pandas.core.frame.DataFrame

Get the full list of Wayback Machine files from our extracts.

Returns a DataFrame.

newshomepages.utils.intcomma(value: Union[int, str]) → str

Convert an integer to a string containing commas every three digits.

For example, 3000 becomes ‘3,000’ and 45000 becomes ‘45,000’.

Parameters

value (int) – The integer to format

Returns a string with the result.
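
The formatting described above can be reproduced with Python's built-in thousands separator. This is a sketch under the assumption that string input holds a plain integer; the name intcomma_sketch is hypothetical and the package may implement the function differently.

```python
from typing import Union


def intcomma_sketch(value: Union[int, str]) -> str:
    """Format an integer with commas every three digits (illustrative only)."""
    # int() accepts both integers and numeric strings; "," adds the separators.
    return f"{int(value):,}"


print(intcomma_sketch(3000))     # 3,000
print(intcomma_sketch("45000"))  # 45,000
```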

newshomepages.utils.numoji(number: int) → str

Convert a number into a series of emojis for Slack.

Parameters

number (int) – The number to convert into emoji

Returns an emoji string.
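
As an illustration, a digit-by-digit mapping to Slack's number shortcodes could look like the sketch below. It assumes a non-negative integer and assumes the output is a run of shortcodes such as :four::two:; the exact format the real function produces is not stated here, and numoji_sketch is a hypothetical name.

```python
# Slack shortcode names for the digits 0 through 9.
DIGIT_NAMES = [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
]


def numoji_sketch(number: int) -> str:
    """Convert a non-negative integer into Slack digit shortcodes (illustrative only)."""
    # Walk the decimal digits and wrap each digit's name in colons.
    return "".join(f":{DIGIT_NAMES[int(digit)]}:" for digit in str(number))


print(numoji_sketch(42))  # :four::two:
```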

newshomepages.utils.parse_archive_url(url: str) → Dict

Parse the handle and timestamp from an archive.org URL.

Parameters

url (str) – An archive.org URL

Returns a dictionary with the identifier, handle and timestamp parsed out.

newshomepages.utils.safe_ia_handle(handle: str) → str

Sanitize a handle so it’s safe to use as an archive.org slug.

Parameters

handle (str) – The unique string identifier of the site.

Returns a lowercase string that’s ready to use.
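
The exact sanitizing rules are not spelled out above. The sketch below assumes two of them: lowercase the handle, and drop any leading characters that are not letters or digits, since a slug that begins with punctuation such as an underscore is a likely source of trouble. Both rules and the name safe_ia_handle_sketch are assumptions for illustration.

```python
import re


def safe_ia_handle_sketch(handle: str) -> str:
    """Lowercase a handle and trim leading non-alphanumerics (assumed rules, illustrative only)."""
    cleaned = handle.strip().lower()
    # Remove any run of characters at the start that is not a letter or digit.
    return re.sub(r"^[^a-z0-9]+", "", cleaned)


print(safe_ia_handle_sketch("_ExampleNews"))  # examplenews
```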

newshomepages.utils.write_csv(dict_list: List[Dict], path: pathlib.Path, verbose: bool = True) → None

Write a list of dictionaries to a CSV file at the provided path.

Parameters
  • dict_list (list) – The list of dictionaries to write as CSV rows.

  • path (Path) – The filesystem Path where the CSV should be written.

  • verbose (bool) – Whether or not to log the action prior to execution. (Default: True)

Returns nothing.
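
A sketch of how a helper with this signature might be written using the standard csv module. It assumes every dictionary shares the keys of the first one and that verbose logging is a simple print; write_csv_sketch is a hypothetical name, not the package's code.

```python
import csv
from pathlib import Path
from typing import Dict, List


def write_csv_sketch(dict_list: List[Dict], path: Path, verbose: bool = True) -> None:
    """Write a list of dictionaries to a CSV file (illustrative only)."""
    if verbose:
        # Log before writing, mirroring the documented verbose flag.
        print(f"Writing {len(dict_list)} rows to {path}")
    # Assumption: the first row's keys define the header for every row.
    fieldnames = list(dict_list[0].keys()) if dict_list else []
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(dict_list)
```

Opening the file with newline="" follows the csv module's own guidance and avoids blank lines between rows on some platforms.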

newshomepages.utils.write_json(data: Any, path: pathlib.Path, indent: int = 2, verbose: bool = True) → None

Write JSON data to the provided path.

Parameters
  • data (Any) – Any Python object ready to be serialized as JSON.

  • path (Path) – The filesystem Path where the object should be written.

  • indent (int) – The number of spaces to indent each level of the JSON. (Default: 2)

  • verbose (bool) – Whether or not to log the action prior to execution. (Default: True)

Returns nothing.
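
For comparison, a JSON writer with the same parameters can be sketched with the standard json module. Creating missing parent directories and logging with print are assumptions, and write_json_sketch is a hypothetical name rather than the package's implementation.

```python
import json
from pathlib import Path
from typing import Any


def write_json_sketch(data: Any, path: Path, indent: int = 2, verbose: bool = True) -> None:
    """Serialize data as JSON at the given path (illustrative only)."""
    if verbose:
        # Log before writing, mirroring the documented verbose flag.
        print(f"Writing JSON to {path}")
    # Assumption: create the parent directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=indent)
```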