Big Data
Importing Data
Now we’re going to load the data using the Pandas package we just imported. Again, you don't need to understand the syntax right now. We’re going to read in a file containing the heights and weights of 25,000 people and store it in an object called human_data. You can look at the raw data file, HumanHeightWeightData.csv, in the second tab of the coding console (you can also download it by clicking on the file name). As you can probably guess from looking at the data, the file type “csv” stands for “comma-separated values.”
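To make “comma-separated values” concrete, here is a minimal sketch that parses a tiny CSV written directly in the code. The column names (Height, Weight) and the numbers are made up for illustration and are not taken from HumanHeightWeightData.csv. The first line holds the column names, and every later line is one row whose values are separated by commas.

```python
import io
import pandas

# A tiny, hypothetical CSV: NOT the real HumanHeightWeightData.csv.
# Line 1 = column names; each following line = one row, values split by commas.
csv_text = """Height,Weight
65.8,112.9
71.5,136.5
69.4,153.0
"""

# read_csv accepts a file path or any file-like object;
# io.StringIO lets us treat the string above as if it were a file.
sample = pandas.read_csv(io.StringIO(csv_text))

print(sample)
print(sample.columns.tolist())
```

Reading the real file works the same way; the only difference is that read_csv is given a file name instead of a StringIO object.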
Let's import the required packages, load the data, and look at a couple of the data's properties:
import pandas # work with big data
import numpy # statistics functions
# Load data
human_data = pandas.read_csv('HumanHeightWeightData.csv')
# Print the top of the data file
print(human_data.head())
# Print the number of rows and columns in the data file
print(human_data.shape)
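If you want to see what head() and shape give back without the real file, the sketch below builds a hypothetical stand-in DataFrame (fake_data) with 25,000 rows of randomly generated heights and weights. The column names and values are invented. By default head() returns the first 5 rows (you can pass a different count), and shape is a (rows, columns) tuple that can be unpacked into two variables.

```python
import numpy
import pandas

# Hypothetical stand-in for human_data: 25,000 rows of made-up numbers,
# generated with a fixed seed so the result is reproducible.
rng = numpy.random.default_rng(seed=0)
fake_data = pandas.DataFrame({
    "Height": rng.normal(loc=68, scale=2, size=25_000),
    "Weight": rng.normal(loc=127, scale=12, size=25_000),
})

# head() returns the first 5 rows by default; head(3) returns the first 3.
first_rows = fake_data.head()
first_three = fake_data.head(3)

# shape is a (number_of_rows, number_of_columns) tuple.
n_rows, n_cols = fake_data.shape

print(first_three)
print(f"{n_rows} rows, {n_cols} columns")
```

Unpacking shape this way is handy later on, for example when you need the number of people in the data set.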
Note: Because we’re loading multiple packages and a large data set, please allow some time for the code to run (up to 1 minute).