# Big Data

### Polishing Histograms

Let’s make this graph more descriptive. A title, an x-axis label, and a y-axis label tell us what we’re looking at, and they make the graph easier to understand for anyone else we show it to.

```python
import pandas  # work with big data
import matplotlib.pyplot as plt  # plotting

# Load data
human_data = pandas.read_csv('HumanHeightWeightData_1000.csv')

# Create a histogram of human height
fig2 = plt.figure()
plt.hist(human_data.height_inches)
plt.title('Human Height Distribution')  # add plot title
plt.xlabel('Height (inches)')  # x-axis label
plt.ylabel('Count')  # y-axis label
fig2.savefig('HeightHistogram2.png')
```

This histogram groups heights into ranges about 2 inches wide, called “bins,” so we can see how many people fall into each 2-inch range. What if instead we wanted to see how many people fall into each 1-inch bin (e.g., how many people are between 66 and 67 inches tall)?
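The bin width can be checked directly rather than read off the plot. The sketch below uses `numpy.histogram`, which computes the same kind of counts and bin edges that `plt.hist` draws. The heights are synthetic stand-in values (an assumption, so the example runs without the CSV file), not the tutorial's data.

```python
import numpy as np

# Synthetic stand-in for human_data.height_inches (assumption: not the real data)
rng = np.random.default_rng(0)
heights = rng.uniform(60, 76, size=1000)

# With no bins argument, 10 equal-width bins span the data's min to max
counts, edges = np.histogram(heights)
bin_width = edges[1] - edges[0]

print(len(counts))          # number of bins
print(round(bin_width, 2))  # width of each bin in inches
```

With real data, the width depends on the shortest and tallest person in the file, which is why the bins rarely land on whole inches by default.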

We can do that by changing the number of bins in the histogram. The previous histogram used 10 bins, the default. Now, let’s make 16 bins, since our heights range from 60 to 76 inches and 76 - 60 = 16. We will also set the range to 60 through 76 so that each bin covers exactly one inch.

```python
import pandas  # work with big data
import matplotlib.pyplot as plt  # plotting

# Load data
human_data = pandas.read_csv('HumanHeightWeightData_1000.csv')

# Create a histogram of human height
fig3 = plt.figure()
plt.hist(human_data.height_inches, bins=16, range=(60, 76))  # now there are 16 bins
plt.title('Human Height Distribution')
plt.xlabel('Height (inches)')
plt.ylabel('Count')
fig3.savefig('HeightHistogram3.png')
```
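To confirm that 16 bins over the range 60 to 76 really are one inch wide, and to read off the count for a single bin such as 66 to 67 inches, here is a minimal sketch using `numpy.histogram`. The heights are a small made-up list for illustration, not values from the CSV.

```python
import numpy as np

# Made-up heights in inches (assumption: illustrative only, not the CSV data)
heights = np.array([61.2, 64.8, 66.1, 66.5, 66.9, 67.3, 70.0, 75.4])

# 16 equal-width bins between 60 and 76 inches
counts, edges = np.histogram(heights, bins=16, range=(60, 76))

# Edges fall on whole inches: 60, 61, ..., 76
print(edges)

# Bin index 6 covers 66 to 67 inches (66 - 60 = 6)
print(counts[6])  # 3 heights in this example fall in that bin
```

Values outside the stated range are left out of the counts, so it is worth checking that the range actually covers the data before fixing it by hand.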