We can use Python to start to get a sense of what our data actually looks like.
One tool for visualizing data quickly is a graph called a histogram. Histograms display the frequency of different values in your data so that you can see which values are most and least common. Histograms are essentially bar plots where the x-axis represents the values in your data, grouped into ranges called bins, and the y-axis represents counts (i.e. how many values we observed in each bin).
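To make the idea of counting concrete, here is a minimal sketch of what a histogram does behind the scenes. The ten heights, the range of 60 to 70 inches, and the choice of five bins are all made up for illustration and do not come from our data set.

```python
# Made-up heights in inches, for illustration only (not from the data set)
heights = [60, 61, 63, 64, 64, 65, 66, 66, 66, 70]

low, high, n_bins = 60, 70, 5      # assumed range and number of bins
bin_width = (high - low) / n_bins  # 2 inches per bin

counts = [0] * n_bins
for h in heights:
    # Find which bin this value falls into; the top edge goes in the last bin
    index = min(int((h - low) / bin_width), n_bins - 1)
    counts[index] += 1

# Print a text version of the histogram, one row per bin
for i, count in enumerate(counts):
    left = low + i * bin_width
    print(f"{left:.0f} to {left + bin_width:.0f} inches: {'#' * count} ({count})")
```

Each row of the printout corresponds to one bar: the taller the bar, the more values landed in that bin.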
Histograms in Python are super simple! Making one takes just a few lines of code. Let’s make one to look at the distribution of height in our data set. In addition to the Pandas package we used earlier, we’re going to use another package (Matplotlib) to help us make visualizations like this one.
import pandas # work with big data
import matplotlib.pyplot as plt # plotting
# Load data
human_data = pandas.read_csv('HumanHeightWeightData_1000.csv')
# Create a histogram of human height
fig1 = plt.figure() # tell Python that we are about to make a graph
plt.hist(human_data.height_inches) # histogram of human height
fig1.savefig('HeightHistogram1.png') # save histogram
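If you want a more readable graph, one possible variation is to choose the number of bins yourself and label the axes. The sketch below is only an illustration: so that it runs on its own, it uses randomly generated stand-in heights (the mean of 67 and spread of 4 inches are assumptions, not values from our data), and the output filename HeightHistogram2.png is hypothetical. With the real data set, you would pass human_data.height_inches instead.

```python
import random
import matplotlib.pyplot as plt

# Stand-in data so the sketch runs on its own: 1000 made-up heights in inches.
# With the real data set, use human_data.height_inches instead.
random.seed(0)
heights = [random.gauss(67, 4) for _ in range(1000)]

fig2, ax = plt.subplots()                         # new figure with one set of axes
counts, bin_edges, _ = ax.hist(heights, bins=20)  # 20 bins instead of the default 10
ax.set_xlabel('Height (inches)')                  # label the x-axis
ax.set_ylabel('Count')                            # label the y-axis
ax.set_title('Distribution of human height')
fig2.savefig('HeightHistogram2.png')              # hypothetical output filename
plt.close(fig2)                                   # free the figure once it is saved
```

More bins show finer detail but make each bar noisier, so it is worth trying a few values to see which gives the clearest picture.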