First Python Notebook

10. Sort¶

Another simple but common technique for analyzing data is sorting. This can be useful for ranking the DataFrame to show the first and last members of the table according to a particular column.

The sort_values method is how pandas does it. It expects you to provide it with the name of the column to sort by in quotes. Try sorting by our computed field.

merged_list.sort_values("per_100k_hours")

	latimes_make_and_model	accidents	total_hours	per_hour	per_100k_hours
1	AIRBUS 130	1	1053786	9.489593e-07	0.094896
11	SIKORSKY 76	2	915515	2.184563e-06	0.218456
10	SCHWEIZER 269	5	1139326	4.388560e-06	0.438856
2	AIRBUS 135	4	884596	4.521838e-06	0.452184
4	BELL 206	30	5501308	5.453249e-06	0.545325
0	AGUSTA 109	2	362172	5.522238e-06	0.552224
5	BELL 407	13	2113788	6.150096e-06	0.615010
8	ROBINSON R22	20	2970806	6.732180e-06	0.673218
3	AIRBUS 350	29	3883490	7.467510e-06	0.746751
6	HUGHES 369	13	1201688	1.081812e-05	1.081812
7	MCDONNELL DOUGLAS 369	6	550689	1.089544e-05	1.089544
9	ROBINSON R44	38	2359729	1.610354e-05	1.610354

Note that by default sort_values returns the DataFrame sorted in ascending order from lowest to highest. You can show the largest values first by passing in an optional keyword argument called ascending. When it is set to False, the DataFrame is sorted in descending order.

merged_list.sort_values("per_100k_hours", ascending=False)

	latimes_make_and_model	accidents	total_hours	per_hour	per_100k_hours
9	ROBINSON R44	38	2359729	1.610354e-05	1.610354
7	MCDONNELL DOUGLAS 369	6	550689	1.089544e-05	1.089544
6	HUGHES 369	13	1201688	1.081812e-05	1.081812
3	AIRBUS 350	29	3883490	7.467510e-06	0.746751
8	ROBINSON R22	20	2970806	6.732180e-06	0.673218
5	BELL 407	13	2113788	6.150096e-06	0.615010
0	AGUSTA 109	2	362172	5.522238e-06	0.552224
4	BELL 206	30	5501308	5.453249e-06	0.545325
2	AIRBUS 135	4	884596	4.521838e-06	0.452184
10	SCHWEIZER 269	5	1139326	4.388560e-06	0.438856
11	SIKORSKY 76	2	915515	2.184563e-06	0.218456
1	AIRBUS 130	1	1053786	9.489593e-07	0.094896

Congratulations. With that, you’ve re-created the heart of the analysis published in the Los Angeles Times and covered most of the basic skills necessary to access and analyze data with pandas.

Before we move on, here’s another quiz for you. You can answer all of these questions using only tricks we’ve learned thus far.

What’s the date of the most recent fatal helicopter accident in Texas?
How many fatalities occurred in Texas accidents?
What helicopter model logged the most flight hours?
Where did the accident with the NTSB number ERA13LA057 occur?