A Byte of my 2.2-lb Brain

Just sharing stuff…

Web Scraping the MMDA Traffic Updates

Metro Manila Traffic Rhythm

Thanks to a good friend and colleague**, I learned last week about the Metro Manila Development Authority (MMDA) Live Traffic Monitoring System. Below is a video clip of the traffic data generated by the MMDA platform. Each circle represents an MMDA metrobase, or station, where traffic along a specific road segment is assessed every 15 minutes. Traffic flow is classified into only three states: Light, Moderate, or Heavy.

The data cover the period of an (INC) mass gathering around the SM Megamall and Shaw Boulevard area, and the visualization actually captures this. In the viz, SM Megamall and Ortigas Avenue are mostly in a “heavy” traffic situation in both the northbound and southbound directions. For southbound traffic, the road segments before Shaw Boulevard sustained heavy traffic; after that, it was moderate to light all the way. For northbound traffic, on the other hand, everything before SM Megamall was jammed, and all bases after it were in either moderate or light traffic. This clearly shows where the bottleneck was.

Details behind the visualization are discussed in the next section.

Data Scraping and Mining

The Metro Manila Development Authority Live Traffic Monitoring System gives traffic flow reports at intersections around Metro Manila, and is updated every 10 to 15 minutes by the MMDA MetroBase. As I understand it, MMDA officers at their posts manually key in the status. The data do not directly provide traffic velocity; instead, they indicate whether the traffic flow along each road segment is Light, Moderate, or Heavy (LMH). Aside from the LMH statuses, the platform also gives alerts: whether there’s an accident, a road construction, or a rally happening, among others.

When I saw the site, I had the urge to revisit web scraping. I thought this would be a good personal project to “refresh” one’s web scraping skills, and data viz skills as well.

For web scraping, I used the Python packages urllib2 and BeautifulSoup. urllib2 is a library for opening and downloading URLs, and I used it to fetch the page. BeautifulSoup helps find tags in whatever HTML page one is looking at, allowing the extraction of only the necessary information on a page. If you want to mine data from a website whose address is stored in a variable named URL, just execute:

import urllib2  # Python 2 only; Python 3 renamed it urllib.request

# URL holds the address of the page to scrape
mypage = urllib2.Request(URL)
thepage = urllib2.urlopen(mypage).read()
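For readers on Python 3, where urllib2 no longer exists, the same download step might look like the sketch below. The function name `fetch_page`, the timeout, and the browser-like User-Agent header are my own assumptions for illustration, not part of the original scraper.

```python
# Python 3 sketch: urllib2 was split into urllib.request and urllib.error.
from urllib.request import Request, urlopen


def fetch_page(url, timeout=30):
    """Download `url` and return the response body as text."""
    # The User-Agent value is an assumption; some sites reject the default one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

With this in place, `thepage = fetch_page(URL)` would play the role of the two urllib2 lines above.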

Then use the BeautifulSoup package to “read” the page.

from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(thepage, "html.parser")
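To give a feel for what “finding tags” means, here is a small dependency-free sketch. It swaps BeautifulSoup for the standard library's `html.parser`, purely so the example runs without installing anything; with BeautifulSoup the equivalent step would be along the lines of `soup.find_all("td")`. The markup in `SAMPLE_HTML` is made up, since the real structure of the MMDA page is not shown in this post, and `CellCollector` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # cells of the row being read, or None
        self._in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append([cell.strip() for cell in self._row])
            self._row = None
            self._in_cell = False


# Made-up markup: one row per tower with northbound and southbound status.
SAMPLE_HTML = """
<table>
  <tr><td>Ortigas Ave.</td><td>Heavy</td><td>Moderate</td></tr>
  <tr><td>Shaw Blvd.</td><td>Heavy</td><td>Light</td></tr>
</table>
"""

parser = CellCollector()
parser.feed(SAMPLE_HTML)
print(parser.rows)
```

Each inner list is one tower row: name, northbound status, southbound status.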

Since the site is updated every fifteen minutes, the scraper needs to pause between iterations:

import time

def sleep():
    # Wait out one update cycle before scraping again
    print("Sleeping for 15 minutes")
    time.sleep(60 * 15)
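The pause only makes sense inside a loop that keeps scraping. A minimal sketch of such a loop follows, written for Python 3. The names `poll`, `scrape_once`, and `max_rounds` are hypothetical, and the error handling is my own assumption about what a long-running scraper would want, not what the original code does.

```python
import time


def poll(scrape_once, interval_seconds=15 * 60, max_rounds=None, sleep=time.sleep):
    """Call `scrape_once()` repeatedly, pausing `interval_seconds` between calls.

    `max_rounds=None` means run until interrupted; `sleep` is injectable so the
    loop can be exercised without actually waiting.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        try:
            scrape_once()
        except Exception as exc:  # keep the scraper alive across a bad fetch
            print("Scrape failed, will retry next round:", exc)
        rounds += 1
        # Skip the pause after the final round.
        if max_rounds is None or rounds < max_rounds:
            sleep(interval_seconds)
    return rounds
```

Passing a fake `sleep` (for example `waits.append`) lets you check the loop logic in a fraction of a second instead of waiting 15 minutes per round.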

All scraped (“streaming”) information is stored in a SQLite database. For each area (k, e.g. EDSA, C5, etc.), a tower (towerName) is assigned (this is probably the intersection) that gives both the northbound (nstat) and southbound (sstat) statuses along a specific road segment.

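# One database row per tower per scrape
entry = MMDATraffic()
entry.LINE = k                  # area, e.g. EDSA or C5
entry.TOWER_ID = towerID
entry.TOWER_NAME = towerName
entry.NB_STATUS = nstat         # northbound status
entry.SB_STATUS = sstat         # southbound status
entry.UPDATE_INFO = dt
entry.DATETIME = mytime

dbsession.add(entry)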
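The snippet above relies on an SQLAlchemy session and model that are not shown here. As a rough, self-contained alternative, the same kind of row could be written with the standard library's `sqlite3` module. The table name `mmda_traffic`, the column types, the `scraped_at` column name, and the sample values are all assumptions made for illustration.

```python
import sqlite3

# Column names mirror the attributes set on the entry in the post
# (scraped_at stands in for DATETIME). Table name and types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS mmda_traffic (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    line        TEXT,
    tower_id    TEXT,
    tower_name  TEXT,
    nb_status   TEXT,
    sb_status   TEXT,
    update_info TEXT,
    scraped_at  TEXT
)
"""


def save_reading(conn, line, tower_id, tower_name, nb_status, sb_status,
                 update_info, scraped_at):
    """Insert one tower reading; values are bound, not string-formatted."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO mmda_traffic "
            "(line, tower_id, tower_name, nb_status, sb_status, update_info, scraped_at) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (line, tower_id, tower_name, nb_status, sb_status, update_info, scraped_at),
        )


conn = sqlite3.connect(":memory:")  # use a file path to keep data between runs
conn.execute(SCHEMA)
# Made-up sample values, for illustration only.
save_reading(conn, "EDSA", "1", "Ortigas Ave.", "Heavy", "Moderate",
             "sample update text", "2015-08-29 17:00:00")
rows = conn.execute(
    "SELECT line, tower_name, nb_status, sb_status FROM mmda_traffic"
).fetchall()
print(rows)
```

An ORM like SQLAlchemy adds convenience on top of this, but for a single table the plain module may be enough.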

The final code is fewer than 100 lines! My colleague Ed is currently playing with SVG scripting to dynamically visualize the scraped data. While he is finishing the SVG rendering, I used visualization software to take a peek at the data.

Status: 1 – Light, 2 – Moderate, 3 – Heavy.


Visualization Snapshot. The orange plots show the northbound traffic; the blue plots show the southbound traffic. The snapshot is at 5 PM on August 29, 2015. Note that a rally was going on along EDSA at this hour, so one may consider these dynamics an anomaly (compared to regular Fridays). The subplots on top show the number of Light (1), Moderate (2), and Heavy (3) traffic flows along the given lines (EDSA, Ortigas Ave., C5, etc.). Each circle in the bottom plots is a “sensing tower”. Its color (light to dark) and size (small to big) indicate the status at that junction.
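The counts in the top subplots boil down to mapping each status to its numeric code and tallying per line. A minimal sketch of that step is below. The function name `count_statuses` and the sample readings are made up for illustration; only the 1/2/3 coding comes from the post.

```python
from collections import Counter

# Numeric codes as given in the post: 1 = Light, 2 = Moderate, 3 = Heavy.
STATUS_CODE = {"Light": 1, "Moderate": 2, "Heavy": 3}


def count_statuses(readings):
    """Return {line: Counter({code: n})} for an iterable of (line, status) pairs."""
    counts = {}
    for line, status in readings:
        # Normalize case and whitespace before looking up the code.
        code = STATUS_CODE[status.strip().title()]
        counts.setdefault(line, Counter())[code] += 1
    return counts


# Made-up readings, not actual MMDA data.
readings = [
    ("EDSA", "Heavy"), ("EDSA", "heavy"), ("EDSA", "Light"),
    ("C5", "Moderate"), ("C5", "Light"),
]
counts = count_statuses(readings)
print(counts)
```

Running this once per scrape timestamp would give the kind of per-line tallies that the subplots display over time.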

Acknowledgment

Thanks to Ed David for letting me use his database code, particularly the part that creates the SQL engine using SQLAlchemy.

** Follow Reina Reyes’s blog for updates on their work on the MMDA data and Manila traffic.

Information

This entry was posted on September 1, 2015 in Geek, Philippines.