6. Columns

In this chapter we’ll begin our analysis by learning how to inspect a column from a DataFrame.

6.1. Accessing a column

We’ll begin with the prop_name column, which stores the proposition each committee sought to influence.

To see the contents of a column separate from the rest of the DataFrame, type the DataFrame’s variable name, then a period, then the column’s name.

import pandas as pd
committee_list = pd.read_csv("https://raw.githubusercontent.com/california-civic-data-coalition/first-python-notebook/master/docs/src/_static/committees.csv")
committee_list.prop_name
0      PROPOSITION 051 - SCHOOL BONDS. FUNDING FOR K-...
1      PROPOSITION 051 - SCHOOL BONDS. FUNDING FOR K-...
2      PROPOSITION 051 - SCHOOL BONDS. FUNDING FOR K-...
3      PROPOSITION 051 - SCHOOL BONDS. FUNDING FOR K-...
4      PROPOSITION 052 - STATE FEES ON HOSPITALS. FED...
                             ...                        
97     PROPOSITION 067- REFERENDUM TO OVERTURN BAN ON...
98     PROPOSITION 067- REFERENDUM TO OVERTURN BAN ON...
99     PROPOSITION 067- REFERENDUM TO OVERTURN BAN ON...
100    PROPOSITION 067- REFERENDUM TO OVERTURN BAN ON...
101    PROPOSITION 067- REFERENDUM TO OVERTURN BAN ON...
Name: prop_name, Length: 102, dtype: object

That will list the column out as a Series, just like the ones we created from scratch in chapter two.

And, just as we did then, you can now start tacking on additional methods that will analyze the contents of the column.
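As a minimal sketch of what tacking on a method looks like, the example below builds a tiny DataFrame and chains two methods onto a column. The data is invented for illustration and only stands in for committee_list; head and nunique are standard pandas Series methods.

```python
import pandas as pd

# Hypothetical stand-in for committee_list, invented for this example
df = pd.DataFrame(
    {"prop_name": ["PROPOSITION 051", "PROPOSITION 051", "PROPOSITION 052"]}
)

# Pulling out a column gives you a Series
column = df.prop_name
print(type(column).__name__)

# Methods chain onto the column just as they do onto any other Series
first_two = column.head(2)   # the first two values
distinct = column.nunique()  # how many different values appear
print(distinct)
```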

Note

You can also access columns a second way, like this:

committee_list['prop_name']

This method isn’t as pretty, but it’s required if your column has a space in its name, which would break the simpler dot-based method.
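Here is a small sketch comparing the two styles. The DataFrame and its "committee name" column are made up for this example; they are not part of the committees file.

```python
import pandas as pd

# Hypothetical data, invented for this example only
df = pd.DataFrame(
    {
        "prop_name": ["PROPOSITION 051", "PROPOSITION 052"],
        "committee name": ["Committee A", "Committee B"],
    }
)

# Dot access works when the column name is a valid Python name
by_dot = df.prop_name

# Bracket access returns the same Series
by_bracket = df["prop_name"]
same = by_dot.equals(by_bracket)
print(same)

# A column name containing a space can only be reached with brackets
names = df["committee name"]
print(names.tolist())
```

Brackets are also the safer choice when a column’s name matches something the DataFrame already provides, such as count.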

6.2. Count a column’s values

In this case, the column is filled with text rather than numbers. So we don’t want to calculate statistics like the median and average, as we did before.

There’s another built-in pandas tool that will total up the frequency of values in a column. In this case that could be used to answer the question: Which proposition had the most committees?

The method is called value_counts and it’s just as easy to use as sum, min or max. All you need to do is add a period after the column name and chain it on the tail end of your cell.

Run the code and you should see the lengthy proposition names ranked by their number of committees.

committee_list.prop_name.value_counts()
PROPOSITION 057 - CRIMINAL SENTENCES. JUVENILE CRIMINAL PROCEEDINGS AND SENTENCING. INITIATIVE CONSTITUTIONAL AMENDMENT AND STATUTE.                           13
PROPOSITION 056 - CIGARETTE TAX TO FUND HEALTHCARE, TOBACCO USE PREVENTION, RESEARCH, AND LAW ENFORCEMENT. INITIATIVE CONSTITUTIONAL AMENDMENT AND STATUTE.    12
PROPOSITION 064- MARIJUANA LEGALIZATION. INITIATIVE STATUTE.                                                                                                   11
PROPOSITION 066- DEATH PENALTY. PROCEDURES. INITIATIVE STATUTE.                                                                                                 9
PROPOSITION 055 - TAX EXTENSION TO FUND EDUCATION AND HEALTHCARE. INITIATIVE CONSTITUTIONAL AMENDMENT.                                                          8
PROPOSITION 067- REFERENDUM TO OVERTURN BAN ON SINGLE-USE PLASTIC BAGS.                                                                                         7
PROPOSITION 062- DEATH PENALTY. INITIATIVE STATUTE.                                                                                                             7
PROPOSITION 059- SB 254 (CHAPTER 20, STATUTES OF 2016), ALLEN. CAMPAIGN FINANCE: VOTER INSTRUCTION                                                              6
PROPOSITION 058 - SB 1174 (CHAPTER 753, STATUTES OF 2014), LARA. ENGLISH LANGUAGE EDUCATION                                                                     4
PROPOSITION 063- FIREARMS. AMMUNITION SALES. INTIATIVE STATUTE.                                                                                                 4
PROPOSITION 054 - LEGISLATURE. LEGISLATION AND PROCEEDINGS. INITIATIVE CONSTITUTIONAL AMENDMENT AND STATUTE.                                                    4
PROPOSITION 053 - REVENUE BONDS. STATEWIDE VOTER APPROVAL. INITIATIVE CONSTITUTIONAL AMENDMENT.                                                                 4
PROPOSITION 051 - SCHOOL BONDS. FUNDING FOR K-12 SCHOOL AND COMMUNITY COLLEGE FACILITIES. INITIATIVE STATUTORY AMENDMENT.                                       4
PROPOSITION 052 - STATE FEES ON HOSPITALS. FEDERAL MEDI-CAL MATCHING FUNDS. INITIATIVE STATUTORY AND CONSTITUTIONAL AMENDMENT.                                  3
PROPOSITION 061- STATE PRESCRIPTION DRUG PURCHASES. PRICING STANDARDS. INITIATIVE STATUTE.                                                                      3
PROPOSITION 060- ADULT FILMS. CONDOMS. HEALTH REQUIREMENTS. INITIATIVE STATUTE.                                                                                 2
PROPOSITION 065- CARRY-OUT BAGS. CHARGES. INITIATIVE STATUTE.                                                                                                   1
Name: prop_name, dtype: int64
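To see how value_counts arrives at a ranking like this, it can help to try it on a handful of values you can count by hand. This is only a sketch; the short labels below are invented and are not real proposition names.

```python
import pandas as pd

# Hypothetical labels, one entry per committee
props = pd.Series(
    ["PROP A", "PROP B", "PROP A", "PROP C", "PROP A", "PROP B"],
    name="prop_name",
)

# Count how many times each distinct value appears, largest first
counts = props.value_counts()
print(counts)

# The first entry is the most common value
top_name = counts.index[0]
top_count = counts.iloc[0]
print(top_name, top_count)
```

Every row lands in exactly one bucket, so the counts always add up to the number of rows in the column.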

6.3. Reset a DataFrame

You may have noticed that even though the result has two columns, pandas did not return a clean-looking table in the same way as head did for our DataFrame.

That’s because our column, a Series, acts a little bit differently than the DataFrame created by read_csv.

In most instances, if you have an ugly Series generated by a method like value_counts and you want to convert it into a pretty DataFrame, you can do so by tacking the reset_index method onto the tail end.

committee_list.prop_name.value_counts().reset_index()
                                                 index  prop_name
0   PROPOSITION 057 - CRIMINAL SENTENCES. JUVENILE...         13
1   PROPOSITION 056 - CIGARETTE TAX TO FUND HEALTH...         12
2   PROPOSITION 064- MARIJUANA LEGALIZATION. INITI...         11
3   PROPOSITION 066- DEATH PENALTY. PROCEDURES. IN...          9
4   PROPOSITION 055 - TAX EXTENSION TO FUND EDUCAT...          8
5   PROPOSITION 067- REFERENDUM TO OVERTURN BAN ON...          7
6   PROPOSITION 062- DEATH PENALTY. INITIATIVE STA...          7
7   PROPOSITION 059- SB 254 (CHAPTER 20, STATUTES ...          6
8   PROPOSITION 058 - SB 1174 (CHAPTER 753, STATUT...          4
9   PROPOSITION 063- FIREARMS. AMMUNITION SALES. I...          4
10  PROPOSITION 054 - LEGISLATURE. LEGISLATION AND...          4
11  PROPOSITION 053 - REVENUE BONDS. STATEWIDE VOT...          4
12  PROPOSITION 051 - SCHOOL BONDS. FUNDING FOR K-...          4
13  PROPOSITION 052 - STATE FEES ON HOSPITALS. FED...          3
14  PROPOSITION 061- STATE PRESCRIPTION DRUG PURCH...          3
15  PROPOSITION 060- ADULT FILMS. CONDOMS. HEALTH ...          2
16  PROPOSITION 065- CARRY-OUT BAGS. CHARGES. INIT...          1
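The column labels that reset_index produces by default depend on your pandas version: releases from 2.0 onward label the two columns prop_name and count instead of index and prop_name. If you would rather pick the labels yourself, one possible approach is sketched below. The data and the committee_count label are invented for this example; rename_axis and the name argument of reset_index are standard pandas.

```python
import pandas as pd

# Hypothetical labels, one entry per committee
props = pd.Series(
    ["PROP A", "PROP B", "PROP A", "PROP C", "PROP A", "PROP B"],
    name="prop_name",
)

counts = props.value_counts()

# rename_axis labels the index, and name= labels the values,
# so the resulting DataFrame has the same headers on any pandas version
table = counts.rename_axis("prop_name").reset_index(name="committee_count")
print(table)
```

The result is an ordinary DataFrame, so you can work with it the same way you work with committee_list.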

Why do Series and DataFrames behave differently? Why does reset_index have such a weird name?

Like so much in computer programming, the answer is simply, “because the people who created the library said so.” It’s important to learn that all open-source programming tools are made by humans, and humans have their quirks. Over time you’ll see pandas has more than a few.

As a beginner, you should just accept the oddities and keep moving. As you get more advanced, if there’s something about the system you think could be improved, you should consider contributing to the Python code that operates the library.