4. Money in politics
In November 2016, California voters had a lot of decisions to make.
Millions of votes were cast across the state to choose who should be America’s next president, to send more than 50 members off to Congress, to select a new U.S. senator and to refill most of the seats in the Sacramento statehouse.
On top of all that, every ballot in the state included a list of propositions, the yes-or-no questions that give voters the power to directly change the law.
The slate of propositions varies with each election. In 2016, 17 different proposals were put to voters.
Should the state take out $9 billion in bonds to fund education? Should the cost of prescription drugs be limited? Should the cigarette tax be increased? Should recreational marijuana use be legalized? Should actors in pornographic films be required to wear condoms? Should the state stop administering the death penalty? Or should it instead speed up administering the death penalty?
And that’s just the start. The election guide the state sends to every registered voter set a new record for length at 224 pages.
By election day, political campaigns had spent nearly $500 million trying to persuade voters to vote yes or no on those 17 propositions.
We know that because every dollar that is raised or spent by political campaigns in the state of California has to be disclosed. That’s thanks to the Political Reform Act of 1974, which was itself enacted by voters via a proposition.
Groups that support or oppose propositions, known as “committees,” must register with the secretary of state and file periodic reports. Those reports list the names, occupations and employers of donors, as well as how campaign funds are spent.
Those reports are stored in a public database maintained by the government. Almost every state has one like it. In California, the database is called CAL-ACCESS.
The CAL-ACCESS website offers tools to inspect recent filings, as well as a bulk download of the raw data.
Unfortunately, the downloadable files are so jumbled, dirty and difficult that they are rarely used.
The situation is so bad that a sitting secretary of state condemned CAL-ACCESS as a Frankenstein monster of code.
In 2014, a team of journalists, academics and developers formed to solve the problem. They called themselves the California Civic Data Coalition.
Their project aimed to create an open-source pipeline that converted the raw data published by CAL-ACCESS into refined data files that a beginner, like yourself, can easily pick up and analyze.
The coalition’s effort has drawn hundreds of contributions from developers and journalists at dozens of news organizations and was named a winner of the Knight News Challenge.
Experimental versions of the coalition’s data files enabled the Los Angeles Times to calculate the $500 million figure quoted earlier in this chapter. The coalition’s work has also powered interactive graphics and several other investigations into the role of money in state politics.
You can review, install and contribute to the coalition’s open-source codebase on GitHub.
Currently, the coalition’s website archives the data published each day by the state and offers more complete documentation and easier access to the original files.
In the next chapter, we will show how to import that data into pandas and your notebook to start an analysis.