---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: '0.8'
    jupytext_version: '1.4.1'
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

```{include} ../_templates/nav.html
```

# Columns

In this chapter we'll begin our analysis by learning how to inspect a column from a DataFrame.

## Accessing a column

We'll begin with the `prop_name` column, which stores the proposition each committee sought to influence.

To see the contents of a column separate from the rest of the DataFrame, add a period and the column's name after the DataFrame's variable.

```{code-cell}
:tags: [hide-cell]

import pandas as pd
committee_list = pd.read_csv("https://raw.githubusercontent.com/california-civic-data-coalition/first-python-notebook/master/docs/src/_static/committees.csv")
```

```{code-cell}
committee_list.prop_name
```

That will list the column out as a Series, just like the ones we created from scratch in {doc}`chapter three </pandas/index>`.

And, just as we did {doc}`then </pandas/index>`, you can now start tacking on additional methods that will analyze the contents of the column.

In this case, the column is filled with text rather than numbers, so statistics like the median and average that we calculated before don't apply.

You can also access columns a second way, like this:

```{code-cell}
committee_list['prop_name']
```

This method isn't as pretty, but it's required if your column has a space in its name, which would break the simpler dot-based method.
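As a minimal sketch of the difference, the small DataFrame below is made up for illustration (its column names and values are assumptions, not part of the committee data). One of its columns has a space in its name, so only the bracket style can reach it.

```python
import pandas as pd

# Hypothetical data, invented for this illustration only
example = pd.DataFrame({
    "committee": ["A", "B", "C"],
    "total raised": [100, 250, 75],
})

# Dot access works when the column name is a valid Python name
committees = example.committee

# Bracket access is needed when the name contains a space;
# writing example.total raised would be a syntax error
totals = example["total raised"]

# Both styles hand back the same kind of object: a pandas Series
print(type(committees).__name__, type(totals).__name__)
```

The bracket style works for every column, so it's a safe default whenever you're unsure about a name.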

## Counting a column's values

There's another built-in pandas tool that will total up the frequency of values in a column. In this case that could be used to answer the question: Which proposition had the most committees?

The method is called `value_counts` and it's just as easy to use as `sum`, `min` or `max`. All you need to do is add a period after the column name and chain it on the tail end of your cell.

```{code-cell}
committee_list.prop_name.value_counts()
```

Run the code and you should see the lengthy proposition names ranked by their number of committees.
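To see what `value_counts` is doing on a smaller scale, here is a sketch using a handful of made-up labels (the `props` Series and its values are assumptions for illustration, not the real proposition names). By default the method sorts its results from the most common value to the least.

```python
import pandas as pd

# Hypothetical proposition labels, invented for this illustration only
props = pd.Series(["Prop A", "Prop B", "Prop A", "Prop C", "Prop A", "Prop B"])

# Count how often each value appears, sorted from most to least frequent
counts = props.value_counts()
print(counts)

# The first label in the result's index is the most common value
most_common = counts.index[0]
print(most_common)
```

The same pattern applies to the real column: the first row of the output answers the question of which proposition had the most committees.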

## Resetting a DataFrame

You may have noticed that even though the result has two columns, pandas did not return a clean-looking table the way `head` did for our DataFrame.

That's because our column, a Series, acts a little bit differently than the DataFrame created by `read_csv`.

In most instances, if you have an ugly Series generated by a method like `value_counts` and you want to convert it into a pretty DataFrame, you can do so by tacking the `reset_index` method onto the tail end.

```{code-cell}
committee_list.prop_name.value_counts().reset_index()
```

Why do Series and DataFrames behave differently? Why does `reset_index` have such a weird name?

Like so much in computer programming, the answer is simply, "because the people who created the library said so." It's important to learn that all open-source programming tools are made by humans, and humans have their quirks. Over time you'll see pandas has more than a few.

As a beginner, you should just accept the oddities and keep moving. As you get more advanced, if there's something about the system you think could be improved, you should consider [contributing](https://pandas.pydata.org/pandas-docs/stable/development/contributing.html) to the Python code that operates the library.