Reference
Documentation for a selection of our system’s common internal tools.
Commands
accessibility
Save the accessibility tree of the provided site.
accessibility [OPTIONS] HANDLE
Options
-o, --output-dir <output_dir>
--timeout <timeout>
Arguments
HANDLE – Required argument.
analyze
analyze [OPTIONS] COMMAND [ARGS]...
cli
Analyze the Drudge Report.
cli
Analyze Lighthouse reports.
archive
Save assets to an archive.org collection.
archive [OPTIONS] HANDLE
Options
-i, --input-dir <input_dir>
--latest – Crosspost to the latest archive.org item.
--verbose – Display the upload progress to archive.org.
--timeout <timeout>
Arguments
HANDLE – Required argument.
batch
Print a batch of sites.
batch [OPTIONS] COMMAND [ARGS]...
sites-by-batch
Print site handles in the provided batch as a JSON list.
batch sites-by-batch [OPTIONS] BATCH
Options
-b, --batches <batches>
Arguments
BATCH – Required argument.
discorder
Post images to Discord.
discorder [OPTIONS] COMMAND [ARGS]...
bundle
Post all images for a bundle.
discorder bundle [OPTIONS] SLUG
Options
-i, --input-dir <input_dir>
Arguments
SLUG – Required argument.
site
Update templated documentation pages.
site [OPTIONS] COMMAND1 [ARGS]... [COMMAND2 [ARGS]...]...
accessibility-ranking
Create a page ranking sites by Lighthouse accessibility score.
site accessibility-ranking [OPTIONS]
latest-screenshots
Create a page showing all of the latest screenshots.
site latest-screenshots [OPTIONS]
performance-ranking
Create a page ranking sites by Lighthouse performance score.
site performance-ranking [OPTIONS]
site-detail-accessibility-chart
Create the JSON data file for the site detail page’s accessibility chart.
site site-detail-accessibility-chart [OPTIONS]
site-detail-hyperlink-chart
Create the JSON data file for the site detail page’s hyperlinks chart.
site site-detail-hyperlink-chart [OPTIONS]
site-detail-lighthouse-analysis-chart
Create the JSON data file for the site detail page’s Lighthouse analysis chart.
site site-detail-lighthouse-analysis-chart [OPTIONS]
site-detail-lighthouse-chart
Create the JSON data file for the site detail page’s Lighthouse chart.
site site-detail-lighthouse-chart [OPTIONS]
site-detail-screenshot-chart
Create the JSON data file for the site detail page’s screenshots chart.
site site-detail-screenshot-chart [OPTIONS]
extract
extract [OPTIONS] COMMAND [ARGS]...
cli
Consolidate Internet Archive metadata into CSV files.
cli
Download items from our archive.org collection as JSON.
cli
Download and parse the provided site’s accessibility files.
cli
Download and parse the provided site’s hyperlinks files.
cli
Download and parse the provided site’s Lighthouse files.
html
Save HTML for the provided homepage.
html [OPTIONS] HANDLE
Options
-o, --output-dir <output_dir>
-w, --wait <wait>
Arguments
HANDLE – Required argument.
hyperlinks
Save all hyperlinks as JSON for a site or bundle.
hyperlinks [OPTIONS] HANDLE
Options
-o, --output-dir <output_dir>
--timeout <timeout>
Arguments
HANDLE – Required argument.
mosaic
Create image mosaics.
mosaic [OPTIONS] COMMAND [ARGS]...
screenshot
Screenshot the provided homepage.
screenshot [OPTIONS] HANDLE
Options
-o, --output-dir <output_dir>
-w, --wait <wait>
-x, --width <width>
-y, --height <height>
-f, --full-page – Screenshot the whole page.
Arguments
HANDLE – Required argument.
slack
Post an image to a Slack channel.
slack [OPTIONS] ARTIFACT_PATH
Arguments
ARTIFACT_PATH – Required argument.
telegrammer
Send a Telegram message.
telegrammer [OPTIONS] COMMAND [ARGS]...
bundle
Send a bundle of sources.
telegrammer bundle [OPTIONS] SLUG
Options
-i, --input-dir <input_dir>
Arguments
SLUG – Required argument.
tweet
Send a tweet.
tweet [OPTIONS] COMMAND [ARGS]...
accessibility-report
Tweet a periodic report on Lighthouse accessibility scores.
tweet accessibility-report [OPTIONS]
bundle
Tweet four sources as a single tweet.
tweet bundle [OPTIONS] SLUG
Options
-i, --input-dir <input_dir>
Arguments
SLUG – Required argument.
country
Tweet four sources as a single tweet.
tweet country [OPTIONS] CODE
Options
-i, --input-dir <input_dir>
Arguments
CODE – Required argument.
performance-report
Tweet a periodic report on Lighthouse performance scores.
tweet performance-report [OPTIONS]
Utilities
The utils module contains a variety of functions used by our commands.
newshomepages.utils.batch(li: List, n: int)
Yield n sequential chunks from li.
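The body of `batch` is not documented here. A minimal sketch, assuming "n sequential chunks" means splitting `li` into `n` contiguous pieces whose sizes differ by at most one, might look like this; the real helper may distribute leftover items differently.

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def batch(li: List[T], n: int) -> Iterator[List[T]]:
    """Yield n contiguous chunks of li (sketch, not the module's code)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Every chunk gets `size` items; the first `extra` chunks get one more.
    size, extra = divmod(len(li), n)
    start = 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        yield li[start:end]
        start = end
```

Because this fixes the number of pieces rather than their size, it is the natural building block for dividing the site list into a set number of batches.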
newshomepages.utils.chunk(iterable: List, length: int) → List[List]
Split the provided list into chunks of the provided length.
Parameters
iterable (list) – The master list to split.
length (int) – The size of the chunks you want.
Returns a list of lists.
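A minimal sketch of `chunk` using list slicing, assuming the final chunk simply holds whatever items remain when the list does not divide evenly:

```python
from typing import List, TypeVar

T = TypeVar("T")


def chunk(iterable: List[T], length: int) -> List[List[T]]:
    """Split a list into consecutive chunks of `length` items (sketch)."""
    if length <= 0:
        raise ValueError("length must be a positive integer")
    # Step through the list `length` items at a time and slice each window.
    return [iterable[i : i + length] for i in range(0, len(iterable), length)]
```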
newshomepages.utils.download_url(url: str, output_path: pathlib.Path, timeout: int = 180)
Download the provided URL to the provided path.
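A standard-library sketch of `download_url`. It swaps in `urllib.request` as the HTTP client because the module's actual client is not documented here, and it streams the body to disk with `shutil.copyfileobj` so large files are not held in memory at once.

```python
import pathlib
import shutil
import urllib.request


def download_url(url: str, output_path: pathlib.Path, timeout: int = 180) -> None:
    """Stream the resource at `url` into `output_path` (sketch)."""
    # urlopen raises urllib.error.HTTPError for HTTP error statuses.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(output_path, "wb") as fh:
            # Copy in fixed-size blocks instead of reading everything first.
            shutil.copyfileobj(response, fh)
```

`urllib` also understands `data:` URLs, which makes a sketch like this easy to exercise without network access.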
newshomepages.utils.get_accessibility_df() → pandas.core.frame.DataFrame
Get the full list of accessibility files from our extracts.
Returns a DataFrame.
newshomepages.utils.get_accessibility_list() → List[Dict[str, Any]]
Get the full list of accessibility audits from our extracts.
Returns a list of dictionaries.
newshomepages.utils.get_bundle(slug: str) → Dict
Get the metadata for the provided bundle.
Parameters
slug (str) – The unique string identifier of the bundle.
Returns a dictionary.
newshomepages.utils.get_bundle_list() → List[Dict]
Get the full list of site bundles.
Returns a list of dictionaries.
newshomepages.utils.get_country(code: str) → Dict
Get the metadata for the provided country.
Parameters
code (str) – The two-letter alpha code of the country.
Returns a dictionary.
newshomepages.utils.get_country_df() → pandas.core.frame.DataFrame
Get the list of countries.
Returns a pandas DataFrame.
newshomepages.utils.get_country_list() → List[Dict]
Get the full list of countries.
Returns a list of dictionaries.
newshomepages.utils.get_extract_df(name: str, **kwargs) → pandas.core.frame.DataFrame
Read in the requested extract CSV as a DataFrame.
newshomepages.utils.get_hyperlink_df() → pandas.core.frame.DataFrame
Get the full list of hyperlink files from our extracts.
Returns a DataFrame.
newshomepages.utils.get_hyperlink_list() → List[Dict[str, Any]]
Get the full list of hyperlinks from our extracts.
Returns a list of dictionaries.
newshomepages.utils.get_javascript(handle: str) → Optional[str]
Get the JavaScript file to run before the screenshot, if it exists.
Parameters
handle (str) – The Twitter handle of the site you want.
Returns a JavaScript string ready to be run, or None if no file exists.
newshomepages.utils.get_json_url(url: str)
Get JSON data from the provided URL.
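A standard-library sketch of `get_json_url`, again swapping in `urllib.request` for whatever client the module really uses. The documented signature takes no timeout, so the `JSON_TIMEOUT` constant below is a hypothetical stand-in borrowed from `get_url`'s default.

```python
import json
import urllib.request
from typing import Any

# Hypothetical: the documented signature exposes no timeout parameter.
JSON_TIMEOUT = 30


def get_json_url(url: str) -> Any:
    """Fetch `url` and decode the body as JSON (sketch)."""
    with urllib.request.urlopen(url, timeout=JSON_TIMEOUT) as response:
        # json.load reads the bytes and detects UTF-8/16/32 automatically.
        return json.load(response)
```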
newshomepages.utils.get_language_df() → pandas.core.frame.DataFrame
Get the list of languages.
Returns a pandas DataFrame.
newshomepages.utils.get_language_list() → List[Dict]
Get the list of languages.
Returns a list of dictionaries.
newshomepages.utils.get_lighthouse_df() → pandas.core.frame.DataFrame
Get the full list of Lighthouse files from our extracts.
Returns a DataFrame.
newshomepages.utils.get_lighthouse_list() → List[Dict[str, Any]]
Get the full list of Lighthouse audits from our extracts.
Returns a list of dictionaries.
newshomepages.utils.get_local_time(site: Dict) → datetime.datetime
Get the current time in the provided site’s timezone.
Parameters
site (dict) – A site’s data dictionary.
Returns the current time as a timezone-aware datetime object.
newshomepages.utils.get_screenshot_df() → pandas.core.frame.DataFrame
Get the full list of screenshot files from our extracts.
Returns a DataFrame.
newshomepages.utils.get_screenshot_list() → List[Dict[str, Any]]
Get the full list of screenshots from our extracts.
Returns a list of dictionaries.
newshomepages.utils.get_screenshots_by_site(site: Dict) → List[Dict]
Get the list of screenshots for the provided site.
Returns a list of dictionaries.
newshomepages.utils.get_site(handle: str) → Dict
Get the metadata for the provided site.
Parameters
handle (str) – The Twitter handle of the site you want.
Returns a dictionary.
newshomepages.utils.get_site_df() → pandas.core.frame.DataFrame
Get the full list of sites.
Returns a DataFrame.
newshomepages.utils.get_site_list() → List[Dict]
Get the full list of supported sites.
Returns a list of dictionaries.
newshomepages.utils.get_sites_in_batch(batch_number: int, batches: int = 10) → List[Dict]
Get all the sites in the provided batch.
Parameters
batch_number (int) – The number of the batch to pull.
batches (int) – The total number of batches.
Returns a list of site dictionaries.
newshomepages.utils.get_sites_in_bundle(slug: str) → List[Dict]
Get all the sites in the provided bundle.
Parameters
slug (str) – The unique string identifier of the bundle.
Returns a list of site dictionaries.
newshomepages.utils.get_sites_in_country(slug: str) → List[Dict]
Get all the sites in the provided country.
Parameters
slug (str) – The two-letter alpha code of the country.
Returns a list of site dictionaries.
newshomepages.utils.get_sites_in_language(code: str) → List[Dict]
Get all the sites in the provided language.
Parameters
code (str) – The unique code of the language.
Returns a list of site dictionaries.
newshomepages.utils.get_url(url: str, timeout: int = 30)
Get the provided URL.
newshomepages.utils.get_user_agent() → str
Provide a user-agent string.
Returns a string ready to use as a header in a web request.
newshomepages.utils.get_wayback_df() → pandas.core.frame.DataFrame
Get the full list of Wayback Machine files from our extracts.
Returns a DataFrame.
newshomepages.utils.intcomma(value: Union[int, str]) → str
Convert an integer to a string containing commas every three digits.
For example, 3000 becomes ‘3,000’ and 45000 becomes ‘45,000’.
Parameters
value (int or str) – The integer to format.
Returns a string with the result.
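Python's format specification mini-language already supports a thousands separator, so a sketch of `intcomma` can be a one-liner. Accepting strings by passing them through `int()` is an assumption based on the `Union[int, str]` signature.

```python
from typing import Union


def intcomma(value: Union[int, str]) -> str:
    """Format an integer with a comma every three digits (sketch)."""
    # The "," option in the format spec inserts the thousands separator.
    return f"{int(value):,}"
```

This reproduces the documented examples: `intcomma(3000)` gives `'3,000'` and `intcomma(45000)` gives `'45,000'`.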
newshomepages.utils.numoji(number: int) → str
Convert a number into a series of emojis for Slack.
Parameters
number (int) – The number to convert into emoji.
Returns an emoji string.
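A sketch of `numoji`, assuming it maps each digit to Slack's `:zero:` through `:nine:` emoji shortcodes; the real helper may use different emoji names or handle negative numbers differently.

```python
# Assumed Slack shortcode names for the digits 0-9.
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def numoji(number: int) -> str:
    """Render a non-negative integer as a run of digit emoji (sketch)."""
    if number < 0:
        raise ValueError("numoji expects a non-negative integer")
    # Map each decimal digit to its shortcode and concatenate them.
    return "".join(f":{DIGIT_NAMES[int(digit)]}:" for digit in str(number))
```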
newshomepages.utils.parse_archive_url(url: str) → Dict
Parse the handle and timestamp from an archive.org URL.
Parameters
url (str) – An archive.org URL.
Returns a dictionary with the identifier, handle and timestamp parsed out.
newshomepages.utils.safe_ia_handle(handle: str) → str
Sanitize a handle so it’s safe to use as an archive.org slug.
Parameters
handle (str) – The unique string identifier of the site.
Returns a lowercase string that’s ready to use.
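The documentation only promises a lowercase result. This sketch additionally drops any character outside the letters, digits, underscores, periods and dashes that archive.org identifiers accept; treat that character set as an assumption rather than the module's actual rule.

```python
import re


def safe_ia_handle(handle: str) -> str:
    """Lowercase a handle and strip characters archive.org rejects (sketch)."""
    # Assumption: only a-z, 0-9, "_", "." and "-" survive.
    return re.sub(r"[^a-z0-9_.-]", "", handle.lower())
```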
newshomepages.utils.write_csv(dict_list: List[Dict], path: pathlib.Path, verbose: bool = True) → None
Write a list of dictionaries to a CSV file at the provided path.
Parameters
dict_list (list) – The list of dictionaries to write, one per row.
path (Path) – The filesystem Path where the CSV should be written.
verbose (bool) – Whether or not to log the action prior to execution. (Default: True)
Returns nothing.
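A sketch of `write_csv` built on `csv.DictWriter`, assuming the header and column order come from the keys of the first dictionary and that "log" means an INFO message through the standard `logging` module.

```python
import csv
import logging
import pathlib
from typing import Dict, List

logger = logging.getLogger(__name__)


def write_csv(dict_list: List[Dict], path: pathlib.Path, verbose: bool = True) -> None:
    """Write one CSV row per dictionary (sketch)."""
    if verbose:
        logger.info("Writing %d rows to %s", len(dict_list), path)
    # Assumption: the first row's keys define the header and column order.
    fieldnames = list(dict_list[0].keys()) if dict_list else []
    # newline="" lets the csv module control line endings, as its docs advise.
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(dict_list)
```

Rows containing keys missing from the first dictionary would raise `ValueError` under `DictWriter`'s default `extrasaction="raise"`.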
newshomepages.utils.write_json(data: Any, path: pathlib.Path, indent: int = 2, verbose: bool = True) → None
Write JSON data to the provided path.
Parameters
data (Any) – Any Python object ready to be serialized as JSON.
path (Path) – The filesystem Path where the object should be written.
indent (int) – The number of spaces used to indent each nesting level of the JSON. (Default: 2)
verbose (bool) – Whether or not to log the action prior to execution. (Default: True)
Returns nothing.
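A sketch of `write_json` using `json.dump`, with the same assumption that logging goes through the standard `logging` module at INFO level.

```python
import json
import logging
import pathlib
from typing import Any

logger = logging.getLogger(__name__)


def write_json(data: Any, path: pathlib.Path, indent: int = 2, verbose: bool = True) -> None:
    """Serialize `data` as indented JSON at `path` (sketch)."""
    if verbose:
        logger.info("Writing JSON to %s", path)
    with open(path, "w") as fh:
        # indent sets the number of spaces per nesting level.
        json.dump(data, fh, indent=indent)
```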