At first, Alan and I decided to do a data visualization, and we agreed on that quickly. We both wanted to work with a map, so we came up with the idea of visualizing traffic in New York. New York is known for its traffic jams and delayed subway trains, and we wanted to show the real conditions in the city.
Originally, we wanted to use real-time data as the feed for the project. However, we ran into some difficulties.
The road traffic data comes from the NYC DOT (Department of Transportation). However, when we extracted the data, it only gave us speeds on highways. Next, we tried data from the NYC Taxi & Limousine Commission. That data is not real-time, and its format has changed so that each taxi trip is marked only with a pickup and drop-off area.
Because real-time traffic data was hard to get, we turned to an example from deck.gl, a visualization library made by Uber. One of the provided examples shows taxi routes over 30 minutes, and its taxi data includes route coordinates. We therefore took the data from that example as our traffic data to represent traffic speed.
For the subway, the MTA provides real-time data for all trains. Unfortunately, it is not in JSON; it is in a format called GTFS instead. GTFS stands for General Transit Feed Specification, which was created by Google. The real-time feed uses Protocol Buffers to encode the data in a form that is more compact than JSON. However, decoding GTFS was more difficult than we thought. In the end, we found a solution on GitHub written in Python, and it is the only code with which we could successfully decode the data provided by the MTA.
The decoded data looks like this:
To help make sense of the data, the MTA provides a document that defines all the fields. Additionally, there are files that give the coordinates of each route and station.
The GTFS data from the MTA updates every 30 seconds. However, we didn't use a server to grab the data, so we could only download it to a local machine. For the project, we downloaded the data every 30 seconds for 30 minutes.
The main code can be divided into three parts: Setup, Analysis, and Draw.
This part gets all the data loaded and ready.
The first step is to set up the map. Our project uses Mapbox as the base map, and we use Mappa to call Mapbox and connect it to p5. However, there is a small issue with doing additional manipulation of Mapbox through Mappa: anything added to the map through Mapbox has to be event-driven. So if I want to draw something with Mapbox rather than p5, I can only do it in response to a mouse press or a key press.
Next, I load the data for traffic, subway stations, subway routes, and the subway GTFS feed.
For the input data, I create a class for each dataset: Traffic, MtaStation, MtaRoute, and Train.
Each Traffic object is simply one taxi, and each Train object is one train.
Each MtaStation object represents one station, and each MtaRoute represents one train route.
To convert the raw data into class objects, I use some simple regular expressions to extract and match the parts I want.
Moreover, the GTFS data is chronological, and I need to put all the data for the same train into the same object, so I need an efficient way to find where an existing train is in the array. To avoid the cost of looping through the array to search for the same train ID or route ID, I create several hash tables to make lookups easier and faster.
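The lookup idea can be sketched as follows. This is only a minimal illustration, not the project's actual code: the record shape (`trainId`, `timestamp`, `stopId`), the `updates` field, and the `addUpdate` helper are all assumptions made for the example.

```javascript
// Hypothetical record shape: { trainId, timestamp, stopId }.
// The real decoded GTFS records have more fields than this.
class Train {
  constructor(id) {
    this.id = id;
    this.updates = []; // position reports for this train, in feed order
  }
}

const trains = [];            // all Train objects, in order of first appearance
const trainIndex = new Map(); // hash table: train id -> index in `trains`

// Append one feed record to the matching Train, creating it if needed.
// The Map lookup replaces a linear search through `trains`.
function addUpdate(record) {
  let i = trainIndex.get(record.trainId);
  if (i === undefined) {
    i = trains.length;
    trains.push(new Train(record.trainId));
    trainIndex.set(record.trainId, i);
  }
  trains[i].updates.push({ timestamp: record.timestamp, stopId: record.stopId });
}

// Made-up sample feed, purely for illustration.
const feed = [
  { trainId: "A-001", timestamp: 0, stopId: "S1" },
  { trainId: "B-002", timestamp: 0, stopId: "S7" },
  { trainId: "A-001", timestamp: 30, stopId: "S2" },
];
feed.forEach(addUpdate);
```

With this layout, each new record costs one hash lookup instead of a scan over every train seen so far.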
Once all the data arrays are ready, I still need to do some sorting. For the traffic and GTFS data, I sort chronologically so that I know when to start drawing each train and taxi. For MtaRoute, I have to sort because the raw data records every route a train might run, and most of them are very similar, so I need to decide which lines to draw and which to skip to reduce the amount of data p5 has to handle.
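The chronological part of this step might look roughly like the sketch below. It assumes each trip carries a `path` array of points with a timestamp `t`; those names, and the sample data, are hypothetical.

```javascript
// Hypothetical trip shape: { id, path: [{ lng, lat, t }, ...] }.

// Time at which a trip first appears (the earliest timestamp in its path).
function startTime(trip) {
  return Math.min(...trip.path.map((p) => p.t));
}

// Return a new array of trips ordered by start time. Each trip's own
// path is also sorted by timestamp so it can be drawn point by point.
function sortTripsByStart(trips) {
  return trips
    .map((trip) => ({ ...trip, path: [...trip.path].sort((a, b) => a.t - b.t) }))
    .sort((a, b) => startTime(a) - startTime(b));
}

// Made-up sample data, purely for illustration.
const sampleTrips = [
  { id: "taxi-b", path: [{ lng: -73.99, lat: 40.73, t: 40 }, { lng: -73.98, lat: 40.74, t: 55 }] },
  { id: "taxi-a", path: [{ lng: -73.97, lat: 40.76, t: 12 }, { lng: -73.96, lat: 40.75, t: 5 }] },
  { id: "train-1", path: [{ lng: -74.0, lat: 40.71, t: 20 }, { lng: -74.0, lat: 40.72, t: 50 }] },
];
const sortedTrips = sortTripsByStart(sampleTrips);
```

The sort returns copies rather than mutating the loaded arrays, which keeps the raw data available if it is needed again.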
The last part is to draw the data on the map.
I draw the 3D buildings and stations through Mapbox, so they only appear when a key is pressed.
I draw the traffic data, train routes, and running trains with p5.
Drawing everything on the map at once would definitely hurt performance. Also, the data we are showing is time-based, so it should be drawn in time order.
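One way to draw in time order without touching every item on every frame is to keep a cursor into the sorted array. The sketch below is an assumption about how this could be done, not a copy of the project code; the `TimeCursor` class and the `start` and `end` fields are hypothetical.

```javascript
// Assumes `trips` is already sorted by `start` (hypothetical field names).
class TimeCursor {
  constructor(trips) {
    this.trips = trips;
    this.next = 0;    // index of the first trip that has not started yet
    this.active = []; // trips that should be drawn right now
  }

  // Move the clock to `now` and return the trips to draw this frame.
  advance(now) {
    // Activate every trip whose start time has been reached.
    while (this.next < this.trips.length && this.trips[this.next].start <= now) {
      this.active.push(this.trips[this.next]);
      this.next++;
    }
    // Drop trips that have already finished (an assumption of this sketch).
    this.active = this.active.filter((trip) => trip.end > now);
    return this.active;
  }
}

// Made-up sample data, purely for illustration.
const cursor = new TimeCursor([
  { id: "a", start: 0, end: 10 },
  { id: "b", start: 5, end: 20 },
  { id: "c", start: 15, end: 30 },
]);
```

In a p5 sketch, `advance` would be called once per frame from `draw()` with the current simulation time, and only the returned trips would be rendered.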
I use the timestamps of two consecutive points and the “lerp” function to interpolate along the line between them, so that trains and taxis run at different speeds. For example, if a taxi travels between two points in 10 seconds, I divide the line between those two points into 10 pieces and draw only one piece per frame, which means a segment that takes more time needs more frames to complete.
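The interpolation described above can be sketched like this. The `lerp` here is a plain re-implementation with the same arguments as p5's `lerp(start, stop, amt)` so the snippet runs on its own; the point shape (`x`, `y`, `t` in seconds) and the `segmentSteps` helper are assumptions for illustration.

```javascript
// Linear interpolation between a and b; amt = 0 gives a, amt = 1 gives b.
function lerp(a, b, amt) {
  return a + (b - a) * amt;
}

// Split the segment p0 -> p1 into one step per second of travel time.
// Returns the end position of each step, so drawing one entry per frame
// makes slower segments take more frames.
function segmentSteps(p0, p1) {
  const steps = Math.max(1, Math.round(p1.t - p0.t));
  const points = [];
  for (let i = 1; i <= steps; i++) {
    const amt = i / steps;
    points.push({ x: lerp(p0.x, p1.x, amt), y: lerp(p0.y, p1.y, amt) });
  }
  return points;
}

// A taxi that needs 10 seconds for this segment gets 10 steps;
// the same segment covered in 2 seconds gets only 2.
const slow = segmentSteps({ x: 0, y: 0, t: 0 }, { x: 100, y: 50, t: 10 });
const fast = segmentSteps({ x: 0, y: 0, t: 0 }, { x: 100, y: 50, t: 2 });
```

Both arrays end at the same position, but `slow` reaches it in ten frames and `fast` in two, which is what makes the speeds look different on screen.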
Short video demo: The Data Viz of NYC Traffic
The full code is here
The next thing we want to do is add a time display. Perhaps the animation could also be paused or slowed down. We also want to add more interactive elements. For example, the user could choose which train routes are shown by checking checkboxes, and we want popup windows that show some information when the user clicks on a station.