Weather data visualization for San Francisco Bay Area – a Python Pandas and Matplotlib Tutorial


Weather data is a good input for learning data science tools and technologies. This project introduces the basics of the Pandas and Matplotlib Python libraries using data for San Francisco, San Mateo, Santa Clara, Mountain View and San Jose in California.

If you are interested in checking out the whole project, you can run it in your browser using our PLON Platform.


An Introduction to Geo Data Science with Python


Geographic data (geo data) science is a subset of data science that emphasizes location-based data. By location-based, I mean maps, descriptions of objects and their relationships in space. This article will help you get started in Geo Data Science with Python on the platform. It will walk you through analyzing and visualizing the world airports dataset. Specifically, we are going to create a web map of one type of world airport.

You can launch the project here – Getting started with Geo Data Science on

Getting the data

The data we are going to explore is the world airports geo data which can be downloaded from here.

Download the CSV file containing information on all airports on this site.

The data contains names and other details of airports around the world. What makes it geo data, and suitable for this tutorial, is the presence of the “latitude” and “longitude” columns; aside from that, it is just like any other dataset. So if you intend to follow along with your own dataset, be sure it is geo data (that is, it contains a latitude and longitude for each entry).


Geo data Python modules

There are several geo data processing modules in Python that perform different geo processing tasks. They include GeoPandas, Folium, Fiona, Rasterio, Geopy, Cartopy, Shapely, PySAL, etc. You can find more here.

However, in this tutorial we will only use one of them (Folium), in conjunction with Pandas, a powerful data wrangling module in the data science ecosystem. Folium is a Python geo data module that makes it possible to create beautiful HTML web maps with Leaflet.js & Python. Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. So basically, we manipulate our data in Python, and then visualize it on a Leaflet map via Folium.

Installing libraries in PLON is easy – just go to: Tools menu >> Libraries and install folium.

The code

First, we import the two modules (folium and pandas) we will use in this exercise, and then we do a test run by visualizing just one or two airports on the web map. This will allow you to understand the coding principle. After that, we will visualize an entire section of the airports list directly from the CSV file.

Creating the base map

The code for this step simply generates a map with the default base map (OpenStreetMap) centered at a location in New Mexico, United States. It does so by creating a map object variable “new_mexico” from the folium “Map” class.

* You can see other classes in folium by executing “dir(folium)”.

The map object accepts a number of parameters; the most important one is the “location” parameter, which is given as a list of latitude and longitude.

You can save the map into an HTML file by calling the “save()” method, or simply call the map object to render it on the browser platform.

Let’s add a zoom parameter to the map object. Here I will set the start zoom to six (zoom_start=6). You can play around with the zoom level to see what best suits your display.

Adding a point to the map

The map so far is plain, without much useful information. Let’s now add a marker/point and popup label to show the location of one of the airports (the Columbus Stockyards Airport in New Mexico, United States, with latitude and longitude values of (31.7918, -107.638001)).

This can be achieved by calling the “add_child()” method on the map object and passing it a “folium.Marker”. The marker takes parameters such as location, popup, and icon.

By default, the marker uses a blue color. We can change the color by adding the third parameter “icon” like this: icon=folium.Icon(color='red'). Here I set the color to red.

Adding multiple points to the map

Let’s add another airport (Luna Landing Airport, with latitude and longitude of 32.10010147, -107.822998). One way to do this is to duplicate the “add_child()” line while changing its parameters like this: new_mexico.add_child(folium.Marker(location=[32.10010147, -107.822998], popup="Luna Landing Airport", icon=folium.Icon(color='green')))

Here we changed the location, pop-up and icon parameters to suit the new airport.

Manually adding the airports isn’t efficient, especially if you have thousands of airports to add, as we have in the CSV file. So, we need an efficient way to add multiple markers. We will make use of a “for loop” to add multiple airports as follows.

First, let’s create three “Python lists” to hold the airports' details, and then loop through them to add each airport to the map.

In reality, the list of airports will be in a CSV or another file format, so you will have to read in the file to visualize the airports on the map. At this point, we will use the “pandas” module to read in our CSV file containing the list of airports and visualize all the airports on the map.

Explore the data using pandas

Here we will quickly explore the airport CSV data to get a basic understanding of its structure by taking a look at the summary of numerical fields using the pandas “describe()” function.

To keep the code clean, open a new file and load the CSV file into a pandas dataframe, then call the “describe()” function on it.
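A minimal sketch of this step is shown below. To keep it runnable without the download, it reads three made-up rows from an in-memory string; the “name” column and the file name “airports.csv” mentioned in the comment are assumptions.

```python
import io

import pandas as pd

# Hypothetical stand-in rows (not real airports). With the downloaded file
# you would call pd.read_csv("airports.csv") instead (assumed file name).
sample_csv = io.StringIO(
    "id,type,name,latitude_deg,longitude_deg,elevation_ft\n"
    "1,small_airport,Sample A,10.0,20.0,500\n"
    "2,heliport,Sample B,11.0,21.0,\n"
    "3,large_airport,Sample C,12.0,22.0,7000\n"
)
airports = pd.read_csv(sample_csv)

# describe() summarizes only the numerical columns by default.
summary = airports.describe()
print(summary)
```

The “count” row of the summary reveals missing values: a column with fewer non-null entries than the others, such as “elevation_ft” here, has gaps.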

For the full dataset, the summary shows that the data has four numerical fields (id, latitude_deg, longitude_deg & elevation_ft) and a total row count of 52,016 records (note that the “elevation_ft” column contains some missing values). You can also see other basic statistical measures for each of the numerical fields.

If you check the first ten records from the head of the data by calling “head()” function, you should see that it has 18 columns and one of them is the “type” column.

Now let’s group the table by “type” to explore it further. We call the “groupby()” function on the dataframe and then look at the summary of numerical fields by using the “describe()” function.

The result shows that we have seven types of airports (balloonport, closed, heliport, large_airport, medium_airport, seaplane_base, & small_airport), and the count for each type is also listed.
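The grouping step could look roughly like this. The rows are again made-up stand-ins, and the “name” column is an assumption.

```python
import io

import pandas as pd

# Hypothetical stand-in rows (not real airports).
sample_csv = io.StringIO(
    "id,type,name,latitude_deg,longitude_deg,elevation_ft\n"
    "1,small_airport,Sample A,10.0,20.0,500\n"
    "2,small_airport,Sample B,11.0,21.0,1500\n"
    "3,heliport,Sample C,12.0,22.0,\n"
    "4,large_airport,Sample D,13.0,23.0,7000\n"
)
airports = pd.read_csv(sample_csv)

# First rows of the table, to see which columns are available.
print(airports.head(10))

by_type = airports.groupby("type")

# Number of rows per airport type.
counts = by_type.size()
print(counts)

# Per-type summary statistics of the numerical fields.
print(by_type.describe())
```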

Geo visualizing and analyzing the “Large Airport” type

Assuming we are only interested in the “large_airport” type and we want to visualize its spatial/location distribution around the world, we need to extract those rows for further processing by using the pandas “loc[]” indexer.

Let’s save the extracted airports into a dataframe, then loop through the latitude and longitude columns while using folium to geo visualize the airports.

With the full dataset, the map shows the locations of the 500+ airports of the type “large_airport”. However, there is a minor issue with this visualization: the default markers aren’t the best choice when there are many locations to display at once. So, we need to use a more suitable marker, such as a point or circle marker.

Fortunately, folium has a marker called “CircleMarker”. Let’s use that so we can have a cleaner map.

Visually analyzing the airports by elevation

Let’s assign a color code to the airports according to their elevation value which is available in the “elevation_ft” column.

Elevations of less than 1000ft are assigned the green color, elevations between 1000ft and 6000ft are assigned the orange color, and elevations above 6000ft are assigned the red color.

We do this by defining a function with the condition above within it and assigning the returned color to the ‘color’ and ‘fill_color’ parameters.
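One way to write such a function is sketched below. The function name is hypothetical, and the gray fallback for missing elevations is my own assumption, added because the “elevation_ft” column contains missing values.

```python
import math


def elevation_color(elevation_ft):
    """Map an elevation in feet to a marker color."""
    # Assumption: missing elevations (None or NaN) get a neutral gray.
    if elevation_ft is None or math.isnan(elevation_ft):
        return "gray"
    if elevation_ft < 1000:
        return "green"
    if elevation_ft <= 6000:
        return "orange"
    return "red"


print(elevation_color(500), elevation_color(3000), elevation_color(7000))
```

Inside the marker loop, the returned value would then be passed twice, for example color=elevation_color(elev) and fill_color=elevation_color(elev), where “elev” is the current row's elevation.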

From the above, we can see that most of the airports classified as the “large_airport” type are located in North America and Europe. This can be difficult or impossible to see by looking at the CSV data directly. Also, among this type of airport, those with elevations of less than 1000ft (green color) are the most numerous.

You can launch the project here: Getting started with Geo Data Science on


I hope this article helps you get started efficiently with geo data science in Python. Python is a great tool and is becoming increasingly popular among geospatial data scientists, because it is easy to learn and integrates well with databases and tools like ArcGIS and QGIS. Above all, it is well suited to computationally intensive work and has powerful data analytics libraries.

Launch project on PLON: Getting started with Geo Data Science on

Article created by

Umar Yusuf,


The World Bank GDP Analysis using Pandas and Seaborn Python libraries


Pandas and Seaborn are two of the most useful data science Python libraries. The first provides easy-to-use, high-performance data structures and methods for data manipulation. The latter is built on top of matplotlib and provides a high-level interface for drawing attractive statistical graphics. How do they work?

Let’s check it out using World Bank GDP data from 10 European countries – Poland, Germany, Belarus, the Czech Republic, the Slovak Republic, Hungary, Estonia, France, Ukraine and the United Kingdom.

Just want to run the project? You can find it here: The World GDP Analysis Project

What are we looking for?

The question: how far behind in economic development are eastern European countries relative to developed countries like Germany and France?

To answer it we need to analyze four GDP indicators – GDP per capita (US$), GDP per capita growth (annual %), GDP growth (annual %) and GDP (current US$).

The data from the World Bank (from the World Development Indicators website, to be exact) are in an open format and have good historical records for many countries, covering a number of economic and social indicators.

We chose the years 1990 – 2016 because only these were available for the selected indicators.

You can find the data here.

The code

First, we load the data from the CSV file. Then we remove the last 5 lines, because they contain empty values and information about the date of the last data update. In addition, we have to remove the column for the year 2016 because, as it turned out, it is empty (no data). “gdp.replace” replaces the two dots (“..”), which mark an empty value, with NaN.
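These cleaning steps could be sketched as follows. The in-memory CSV is a made-up stand-in for the World Bank export: the column names, the footer text, and all values are assumptions, and with the real file you would pass its path to “pd.read_csv”.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the export: two data rows plus five footer rows.
raw = io.StringIO(
    "Country Name,Country Code,Series Name,Series Code,1990,1991,2016\n"
    "Country A,AAA,GDP (current US$),SERIES.CODE,..,110.0,..\n"
    "Country B,BBB,GDP (current US$),SERIES.CODE,200.0,210.0,..\n"
    ",,,,,,\n"
    ",,,,,,\n"
    ",,,,,,\n"
    "Data from database: World Development Indicators,,,,,,\n"
    "Last Updated: (date elided),,,,,,\n"
)
gdp = pd.read_csv(raw)

# Drop the five footer rows at the end of the file.
gdp = gdp[:-5]

# Drop the empty 2016 column.
gdp = gdp.drop(columns=["2016"])

# Turn the ".." placeholder into a proper missing value.
gdp = gdp.replace("..", np.nan)
```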

In the course of further work with the DataFrame I received mysterious errors, and at first I was not able to determine what was wrong. After some time I decided to check the types of the individual columns.

To my surprise, the columns for 1990 to 1995 had the data type object rather than float64, so to be sure, I decided to convert all of the year columns to numeric values. For this purpose, I selected the columns from 4 up to the end (that is, all of the years) and used the “apply” method to apply the function “pd.to_numeric”. It converts all years to floating point numbers.
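A small sketch of the type check and the conversion, on a made-up two-row frame (column names and values are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: "1990" holds strings, so pandas stores it as object.
gdp = pd.DataFrame(
    {
        "Country Name": ["Country A", "Country B"],
        "Country Code": ["AAA", "BBB"],
        "Series Name": ["GDP (current US$)", "GDP (current US$)"],
        "Series Code": ["SERIES.CODE", "SERIES.CODE"],
        "1990": ["1.5", np.nan],
        "1991": [2.0, 2.5],
    }
)

# Inspect the column types; "1990" shows up as object.
print(gdp.dtypes)

# Columns from position 4 onward are the years; convert them all.
year_cols = gdp.columns[4:]
gdp[year_cols] = gdp[year_cols].apply(pd.to_numeric)

print(gdp.dtypes)
```

Assigning back through the column labels replaces the columns, so the new float64 type sticks.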

Each row contained the name of the country, its code, the name of a World Bank data series, its code, and then the years in the subsequent columns. This arrangement of the data was not very convenient, so I decided to reindex the table using the “pivot_table” function.

This changed the dataframe from this form:

[Figure: World Bank pandas dataframe]

To this one:

[Figure: World Bank pivoted table in pandas]

That way I can pull any economic indicator and immediately have all the countries along with all the years.
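The exact call is not shown here, so the following is only one plausible way to get such a table: a sketch that assumes the series and country names become a two-level index while the years stay as columns. All names and values are made up.

```python
import pandas as pd

# Hypothetical stand-in data: two indicators for two countries.
gdp = pd.DataFrame(
    {
        "Country Name": ["Country A", "Country B", "Country A", "Country B"],
        "Series Name": [
            "GDP (current US$)",
            "GDP (current US$)",
            "GDP per capita (US$)",
            "GDP per capita (US$)",
        ],
        "1990": [100.0, 200.0, 1.0, 2.0],
        "1991": [110.0, 210.0, 1.1, 2.1],
    }
)

# Index by (indicator, country); the year columns become the values.
pivoted = pd.pivot_table(
    gdp,
    index=["Series Name", "Country Name"],
    values=["1990", "1991"],
)

# Pulling one indicator returns every country with every year.
gdp_current = pivoted.loc["GDP (current US$)"]
print(gdp_current)
```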

Now I can easily visualize the 4 selected indicators. For nicer graphs, import Seaborn and set the color palette so that each line on the graph is plotted with a different color. Try comparing charts with and without Seaborn.

Drawing directly with pandas is really simple: from our pivot table, choose the indicator of interest, then transpose the data (“.T”) and plot it (“.plot()”).
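As a sketch, plotting one indicator could look like this. The pivoted table is a made-up stand-in, the “husl” palette is an arbitrary choice, and the “Agg” backend line is only there so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; not needed in a notebook

import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the pivoted table (made-up values).
index = pd.MultiIndex.from_product(
    [["GDP (current US$)"], ["Country A", "Country B"]],
    names=["Series Name", "Country Name"],
)
pivoted = pd.DataFrame(
    {"1990": [100.0, 200.0], "1991": [110.0, 210.0], "1992": [125.0, 215.0]},
    index=index,
)

# One distinct color per country line (10 countries in the article).
sns.set_palette("husl", 10)

# Pick the indicator, transpose so years run along the x axis, then plot.
ax = pivoted.loc["GDP (current US$)"].T.plot(title="GDP (current US$)")
ax.set_xlabel("Year")
```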

The first two charts

1. GDP (current US$), data from the World Bank

[Figure: GDP plot, eastern Europe]

2. GDP per capita, data from the World Bank

[Figure: GDP per capita pandas plot, eastern Europe]

Let’s try to perform a simple regression on the GDP data to see if there is a chance that one day we can catch up with Germany. This time we will use the “lmplot” function from the Seaborn library, but first the data must be brought into a time-series form.

From the data in the form of a table with countries as columns, we need to create a table with only three columns [year, country, GDP]. We do this through a series of operations: resetting the index, because our table is initially indexed by year (unique rows), and renaming the columns. The key operation here is the “melt” function, which takes the data from the columns and stacks them into successive rows. That lets us make the following transformation. The attached images omit part of the columns and rows, but I hope it’s clear.
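The reshaping and the regression plot might be sketched like this. The wide table holds made-up values, and the column names “year”, “country” and “gdp” are my own choices.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; not needed in a notebook

import pandas as pd
import seaborn as sns

# Hypothetical wide table: years as the index, one column per country.
gdp_wide = pd.DataFrame(
    {"Country A": [100.0, 110.0, 125.0], "Country B": [200.0, 210.0, 215.0]},
    index=[1990, 1991, 1992],
)

# Move the year index into a column, then melt the country columns into rows.
gdp_long = (
    gdp_wide.reset_index()
    .rename(columns={"index": "year"})
    .melt(id_vars="year", var_name="country", value_name="gdp")
)
print(gdp_long)

# One linear regression line per country; ci=None skips the bootstrap bands.
grid = sns.lmplot(data=gdp_long, x="year", y="gdp", hue="country", ci=None)
```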


We should get a result similar to this:



Be sure to also check how you can launch the project in PLON after importing it into your PLON account.

Important link
