censusbatchgeocoder

A simple Python wrapper for the U.S. Census Geocoding Services API batch service

Installation

pipenv install censusbatchgeocoder

Basic usage

Importing the library

import censusbatchgeocoder

According to the official Census documentation, the input is expected to contain the following fields:

  • id: Your unique identifier for the record

  • address: Structure number and street name (required)

  • city: City name (required)

  • state: State (optional)

  • zipcode: ZIP Code (optional)

You can geocode a comma-delimited file from the filesystem. Results are returned as a list of dictionaries.

An example could look like this:

id,address,city,state,zipcode
1,1600 Pennsylvania Ave NW,Washington,DC,20006
2,202 W. 1st Street,Los Angeles,CA,90012

Which is then passed in like this:

results = censusbatchgeocoder.geocode("./my_file.csv")

The results are returned with the following columns from the Census:

  • id: The unique id provided with the record.

  • returned_address: The address of the match returned by the geocoder.

  • geocoded_address: The address that was submitted to the geocoder.

  • is_match: Whether or not the geocoder found a match.

  • is_exact: The precision of the match, such as Exact or Non_Exact.

  • coordinates: The longitude and latitude of the match together in a string.

  • longitude: The longitude of the match as a float.

  • latitude: The latitude of the match as a float.

  • tiger_line: The Census TIGER line of the match.

  • side: The side of the Census TIGER line of the match.

  • state_fips: The FIPS state code identifying the state of the match.

  • county_fips: The FIPS county code identifying the county of the match.

  • tract: The Census tract of the match.

  • block: The Census block of the match.

print(results)
[
    {
        "address": "1600 Pennsylvania Ave NW",
        "block": "1031",
        "city": "Washington",
        "coordinates": "-77.03535,38.898754",
        "county_fips": "001",
        "geocoded_address": "1600 Pennsylvania Ave NW, Washington, DC, 20006",
        "id": "1",
        "is_exact": "Non_Exact",
        "is_match": "Match",
        "latitude": 38.898754,
        "longitude": -77.03535,
        "returned_address": "1600 PENNSYLVANIA AVE NW, WASHINGTON, DC, 20502",
        "side": "L",
        "state": "DC",
        "state_fips": "11",
        "tiger_line": "76225813",
        "tract": "006202",
        "zipcode": "20006",
    },
    {
        "address": "202 W. 1st Street",
        "block": "1034",
        "city": "Los Angeles",
        "coordinates": "-118.24456,34.053005",
        "county_fips": "037",
        "geocoded_address": "202 W. 1st Street, Los Angeles, CA, 90012",
        "id": "2",
        "is_exact": "Exact",
        "is_match": "Match",
        "latitude": 34.053005,
        "longitude": -118.24456,
        "returned_address": "202 W 1ST ST, LOS ANGELES, CA, 90012",
        "side": "L",
        "state": "CA",
        "state_fips": "06",
        "tiger_line": "141618115",
        "tract": "207400",
        "zipcode": "90012",
    },
]
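Because each result is a plain dictionary, the is_match column can be used to separate the rows that geocoded from the ones that did not. The following is a minimal sketch, not part of the library: split_matches is a hypothetical helper, the sample rows are trimmed-down stand-ins for real output, and the "No_Match" value is an assumption about what unmatched rows contain.

```python
def split_matches(results):
    """Separate geocoder rows into matched and unmatched lists."""
    matched, unmatched = [], []
    for row in results:
        # "Match" is the value shown in the sample output above.
        # Treating every other value as unmatched is an assumption.
        if row.get("is_match") == "Match":
            matched.append(row)
        else:
            unmatched.append(row)
    return matched, unmatched


# Trimmed-down rows standing in for real geocoder output.
sample = [
    {"id": "1", "is_match": "Match", "latitude": 38.898754, "longitude": -77.03535},
    {"id": "2", "is_match": "No_Match", "latitude": None, "longitude": None},
]
matched, unmatched = split_matches(sample)
print([row["id"] for row in matched])
print([row["id"] for row in unmatched])
```

The unmatched list is a natural candidate for manual review or a second pass with corrected addresses.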

Any extra metadata fields included in the file are still present in the returned data.

So the my_metadata column here…

id,address,city,state,zipcode,my_metadata
1,1600 Pennsylvania Ave NW,Washington,DC,20006,foo
2,202 W. 1st Street,Los Angeles,CA,90012,bar

… is still there after you geocode.

censusbatchgeocoder.geocode("./my_file.csv")
[
    {
        "address": "1600 Pennsylvania Ave NW",
        "block": "1031",
        "city": "Washington",
        "coordinates": "-77.03535,38.898754",
        "county_fips": "001",
        "geocoded_address": "1600 Pennsylvania Ave NW, Washington, DC, 20006",
        "id": "1",
        "is_exact": "Non_Exact",
        "is_match": "Match",
        "latitude": 38.898754,
        "longitude": -77.03535,
        "returned_address": "1600 PENNSYLVANIA AVE NW, WASHINGTON, DC, 20502",
        "my_metadata": "foo",
        "side": "L",
        "state": "DC",
        "state_fips": "11",
        "tiger_line": "76225813",
        "tract": "006202",
        "zipcode": "20006",
    },
    {
        "address": "202 W. 1st Street",
        "block": "1034",
        "city": "Los Angeles",
        "coordinates": "-118.24456,34.053005",
        "county_fips": "037",
        "geocoded_address": "202 W. 1st Street, Los Angeles, CA, 90012",
        "id": "2",
        "is_exact": "Exact",
        "is_match": "Match",
        "latitude": 34.053005,
        "longitude": -118.24456,
        "returned_address": "202 W 1ST ST, LOS ANGELES, CA, 90012",
        "my_metadata": "bar",
        "side": "L",
        "state": "CA",
        "state_fips": "06",
        "tiger_line": "141618115",
        "tract": "207400",
        "zipcode": "90012",
    },
]
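Since the metadata columns travel with each row, the results can be written back out to a CSV that keeps them. This is one possible sketch using the standard library's csv module. The results_to_csv helper and the sample rows are hypothetical and only illustrate the shape of the data.

```python
import csv
import io


def results_to_csv(results, handle):
    """Write geocoded rows to an open text handle as CSV."""
    # Collect every key so extra metadata columns are kept.
    # Sorting gives a stable header order.
    fieldnames = sorted({key for row in results for key in row})
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(results)


# Trimmed-down rows standing in for real geocoder output.
sample = [
    {"id": "1", "is_match": "Match", "my_metadata": "foo"},
    {"id": "2", "is_match": "Match", "my_metadata": "bar"},
]
buffer = io.StringIO()
results_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead, the same helper could be handed a file opened with open(path, "w", newline="").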

Custom column names

If you have column headers that do not exactly match those expected by the geocoder you should override them.

So a file like this:

foo,bar,baz,bada,boom
1,521 SWARTHMORE AVENUE,PACIFIC PALISADES,CA,90272-4350
2,2015 W TEMPLE STREET,LOS ANGELES,CA,90026-4913

Can be mapped like this:

censusbatchgeocoder.geocode(
    "./my_file.csv", id="foo", address="bar", city="baz", state="bada", zipcode="boom"
)
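If you would rather normalize the headers yourself, one alternative is to rename the keys on a list of dictionaries before geocoding it. The sketch below assumes the data is already loaded as dictionaries. rename_columns is a hypothetical helper, not a function the library provides.

```python
def rename_columns(rows, mapping):
    """Return new dictionaries with keys renamed according to mapping."""
    # Keys that are absent from the mapping are kept unchanged.
    return [
        {mapping.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


weird_rows = [
    {
        "foo": "1",
        "bar": "521 SWARTHMORE AVENUE",
        "baz": "PACIFIC PALISADES",
        "bada": "CA",
        "boom": "90272-4350",
    },
]
mapping = {"foo": "id", "bar": "address", "baz": "city", "bada": "state", "boom": "zipcode"}
clean_rows = rename_columns(weird_rows, mapping)
print(clean_rows[0])
```

The renamed rows then use the default column names, so no keyword overrides would be needed.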

Optional columns

The state and ZIP Code columns are optional. If your data doesn’t have them, pass None for the corresponding keyword arguments.

censusbatchgeocoder.geocode("./my_file.csv", state=None, zipcode=None)

Lists of dictionaries

A list of dictionaries, like those created by the csv module’s DictReader, can also be geocoded.

my_list = [
    {
        "address": "521 SWARTHMORE AVENUE",
        "city": "PACIFIC PALISADES",
        "id": "1",
        "state": "CA",
        "zipcode": "90272-4350",
    },
    {
        "address": "2015 W TEMPLE STREET",
        "city": "LOS ANGELES",
        "id": "2",
        "state": "CA",
        "zipcode": "90026-4913",
    },
]
censusbatchgeocoder.geocode(my_list)
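For reference, here is a small sketch of how such a list might be produced with DictReader. The data is read from an in-memory string so the example is self-contained. The geocode call is left as a comment because it contacts the Census service.

```python
import csv
import io

raw = """id,address,city,state,zipcode
1,521 SWARTHMORE AVENUE,PACIFIC PALISADES,CA,90272-4350
2,2015 W TEMPLE STREET,LOS ANGELES,CA,90026-4913
"""

# DictReader yields one dictionary per data row, keyed by the header line.
my_list = [dict(row) for row in csv.DictReader(io.StringIO(raw))]
print(my_list[0]["address"])

# The list could then be geocoded as shown above:
# censusbatchgeocoder.geocode(my_list)
```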

pandas DataFrames

You can geocode a pandas DataFrame by converting it into a list of dictionaries.

result = censusbatchgeocoder.geocode(df.to_dict("records"))

Then convert it back into a DataFrame.

result_df = pd.DataFrame(result)

That’s it.

File objects

You can also geocode an in-memory file object of data in CSV format.

import io

my_data = """id,address,city,state,zipcode
1,1600 Pennsylvania Ave NW,Washington,DC,20006
2,202 W. 1st Street,Los Angeles,CA,90012"""
censusbatchgeocoder.geocode(io.StringIO(my_data))

Different encodings

If you are using Python 2 and your CSV file has an unusual encoding that’s causing problems, try explicitly passing in the encoding name.

censusbatchgeocoder.geocode("./my_file.csv", encoding="utf-8-sig")
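Another option, sketched here as an assumption rather than a documented workflow, is to decode the bytes yourself and hand the geocoder an in-memory file object. open_as_text is a hypothetical helper, and the sample bytes imitate a spreadsheet export that starts with a byte order mark.

```python
import io


def open_as_text(raw_bytes, encoding="utf-8-sig"):
    """Decode raw CSV bytes and wrap them in an in-memory text file."""
    # "utf-8-sig" drops a leading byte order mark if one is present.
    return io.StringIO(raw_bytes.decode(encoding))


# Bytes as they might come from an export that prepends a BOM.
raw = (
    b"\xef\xbb\xbfid,address,city,state,zipcode\n"
    b"1,1600 Pennsylvania Ave NW,Washington,DC,20006\n"
)
handle = open_as_text(raw)
print(handle.readline().strip())

# Rewind before handing the object to the geocoder.
handle.seek(0)
# censusbatchgeocoder.geocode(handle)
```

Decoding up front makes encoding errors surface before any request is sent.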

Other resources