
Big Data


Importing Data

Now we’re going to load the data using the Pandas package we just imported. Again, you don’t need to understand the syntax right now. We’re going to read in a file containing the heights and weights of 25,000 people and store it in an object called human_data. You can look at the raw data file in the second tab of the coding console, titled HumanHeightWeightData.csv (you can also download the data by clicking on the file name). As you can probably surmise from looking at the data, the file type “csv” stands for “comma-separated values.”
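To make the format concrete, here is a minimal sketch of what a comma-separated file looks like and how each line splits into values. The three rows and the column names Index, Height, and Weight are made up for illustration; the real file may use different headers and units.

```python
import csv
import io

# Hypothetical sample in CSV form: one header line, then one line per person.
# These names and numbers are illustrative, not taken from the real file.
sample_text = """Index,Height,Weight
1,65.8,113.0
2,71.5,136.5
3,69.4,153.0
"""

# csv.reader splits each line at the commas and returns a list of strings.
rows = list(csv.reader(io.StringIO(sample_text)))

header = rows[0]    # column names
records = rows[1:]  # one list of values per person

print(header)
print(records[0])
print(len(records), "rows,", len(header), "columns")
```

Every value comes back as text here; a package like Pandas goes one step further and converts numeric columns to numbers when it reads the file.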


Let's import the required packages, load the data, and look at a couple of the data's properties:

import pandas  # work with big data
import numpy  # statistics functions

# Load data
human_data = pandas.read_csv('HumanHeightWeightData.csv')

# Print the top of the data file
print(human_data.head())

# Print the number of rows and columns in the data file
print(human_data.shape)

Note: Because we’re loading multiple packages and a large data set, please allow some time for the code to run (up to 1 minute).
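As a rough sketch of what the two print statements report, the example below builds a tiny table in memory instead of reading the real file. The column names and values are invented for illustration and are not taken from HumanHeightWeightData.csv.

```python
import io
import pandas

# Hypothetical stand-in for the real file: 6 rows, 2 columns.
sample_text = """Height,Weight
65.8,113.0
71.5,136.5
69.4,153.0
68.2,142.3
67.8,144.3
68.7,123.3
"""

# read_csv accepts a file-like object as well as a file name.
sample_data = pandas.read_csv(io.StringIO(sample_text))

# head() returns the first 5 rows by default.
first_rows = sample_data.head()
print(first_rows)

# shape is a (rows, columns) tuple; the header line is not counted as a row.
n_rows, n_cols = sample_data.shape
print(n_rows, n_cols)
```

For the real data set, if the file holds one row per person, the first number in the shape should be 25,000.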
