Modeling Medellín’s Road Traffic Collisions Using Foursquare Data

Analyzing people’s behavioral patterns is key to understanding the places where people live, interact, and work, since those places in turn shape human behavior. In recent years, technological advances have given scientific researchers new and exciting opportunities to uncover and explore human behavioral patterns, especially across urban spaces. These advances have generated vast amounts of behavioral data, offering insights that small-scale studies, survey data, and field research cannot. With access to location data, researchers can investigate human activity across the globe and uncover key insights for urban transportation studies.

As the leading platform in the geolocation data industry, Foursquare builds proprietary technology for understanding the places people go and visit in real time, offering a trustworthy, privacy-first source of human activity data. Using location data from Foursquare, researchers from the University of Luxembourg and the National University of Colombia conducted a systematic spatial and exploratory analysis to identify road traffic collision (RTC) patterns in Medellín, Colombia. The researchers proposed a novel approach in which Medellín’s territorial administrative zones were grouped with a K-Means clustering model according to the composition of their venues, as recorded in Foursquare data, with the aim of analyzing RTC trends within each cluster. From this analysis, the researchers evaluated which strategies would best reduce RTC in Medellín, taking into account the differences between the city’s zones.

The novelty of this research lies in its use of check-ins and venue density from Foursquare’s location data. Foursquare’s location data include information about venues, check-ins, users, and more, which can be used in many ways to explore mobility patterns, since the Foursquare API supports real-time access to locations, mapping of users to specific locations, and geotagging. In an initial exploration of the dataset, the researchers observed that most RTC occur in the city center and along the main avenues of Medellín, which are the most crowded zones in the city. After defining the location of each territorial administrative zone, the researchers used the Foursquare API to collect data on check-ins and venue density in each zone, retrieving 2,769 venues in 241 categories, and identified the ten most common venue categories for each zone. They then applied K-Means clustering to group the zones into five clusters and used the geographic coordinates of each type of RTC to assign collisions to the clusters.
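The zone-level pipeline described above (venue categories per zone, the ten most common categories, and K-Means clustering of the zones) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the researchers’ code: it assumes the venues have already been retrieved from the Foursquare API as (zone, category) pairs, represents each zone by the share of each venue category among its venues, and implements plain Lloyd’s K-Means with random initialization instead of calling a library such as scikit-learn. The function names and the toy zones are hypothetical.

```python
import random
from collections import Counter

def category_profiles(venues):
    """Return (categories, profiles), where profiles[zone] is the share of
    each venue category among that zone's venues.

    `venues` is a list of (zone, category) pairs, assumed to have been
    collected beforehand from the Foursquare API (hypothetical format).
    """
    categories = sorted({category for _, category in venues})
    counts = {}
    for zone, category in venues:
        counts.setdefault(zone, Counter())[category] += 1
    profiles = {}
    for zone, counter in counts.items():
        total = sum(counter.values())
        profiles[zone] = [counter[c] / total for c in categories]
    return categories, profiles

def top_categories(venues, zone, n=10):
    """The n most common venue categories in one zone (the study used ten)."""
    counter = Counter(category for z, category in venues if z == zone)
    return [category for category, _ in counter.most_common(n)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's K-Means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_zones(venues, k=5, seed=0):
    """Group zones by venue composition (the study reports five clusters)."""
    _, profiles = category_profiles(venues)
    zones = list(profiles)
    labels = kmeans([profiles[z] for z in zones], k, seed=seed)
    return dict(zip(zones, labels))

# Hypothetical toy data: two park-heavy zones and two nightlife zones.
venues = [
    ("Zone A", "Park"), ("Zone A", "Park"), ("Zone A", "Restaurant"),
    ("Zone B", "Park"), ("Zone B", "Plaza"),
    ("Zone C", "Bar"), ("Zone C", "Gym"), ("Zone C", "Bar"),
    ("Zone D", "Food Truck"), ("Zone D", "Bar"),
]
print(top_categories(venues, "Zone A", n=2))  # ['Park', 'Restaurant']
print(cluster_zones(venues, k=2))  # Zones A and B share one label, C and D the other
```

Using category shares rather than raw counts keeps a zone with many venues from dominating the distances; whether the study normalized its features this way is not stated, so treat it as an assumption. Running K-Means from several seeds and keeping the tightest result is a common safeguard against a poor random initialization.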

The findings indicate that the largest cluster, made up of residential neighborhoods and covering 84.7% of Medellín’s area, has the largest number of RTC and the most collisions per zone. The first cluster comprises zones surrounded by landscapes and parks, where run overs (pedestrians struck by vehicles) are more common than fallen occupants and other RTC. The public spaces in this cluster are regularly visited by pedestrians, hence the large share of run overs. The same trend appears in the third and fourth clusters, which contain many parks, squares, and restaurants. The second cluster, by contrast, is dominated by food trucks, gyms, bars, and stores, and there fallen occupants and other RTC occur more often than run overs. Cluster 2 also has the most RTC per unit area: it is frequently visited at night, when factors such as poor visibility, poor lighting, and alcohol consumption come into play and drive up the collision rate per area. From these findings, the researchers concluded that zones with parks, restaurants, and stores, such as those in the third and fourth clusters, are key to reducing the severity of injuries to pedestrians, who are the most common victims of run overs.
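The per-cluster figures above (collisions per zone, collisions per unit area, and the share of each RTC type) reduce to simple counting once each collision is located in a zone. The sketch below is illustrative and rests on assumptions not taken from the study: collisions arrive as (longitude, latitude, type) tuples, zone boundaries are polygons whose coordinates are treated as planar (reasonable only over small areas), zone areas and cluster labels are supplied separately, and the type labels, function names, and all data are hypothetical. Point location uses the standard ray-casting test.

```python
from collections import Counter

def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a ray cast to the
    right of (x, y) crosses; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def cluster_summary(collisions, zone_polygons, zone_cluster, zone_area_km2):
    """Per-cluster RTC counts, rates, and type shares.

    collisions: list of (lon, lat, rtc_type) tuples (assumed format)
    zone_polygons: zone -> list of (lon, lat) vertices, treated as planar
    zone_cluster: zone -> cluster label from the K-Means step
    zone_area_km2: zone -> area in square kilometres
    """
    counts = {}
    for lon, lat, rtc_type in collisions:
        zone = next((z for z, poly in zone_polygons.items()
                     if point_in_polygon(lon, lat, poly)), None)
        if zone is None:
            continue  # collision falls outside every zone polygon
        counts.setdefault(zone_cluster[zone], Counter())[rtc_type] += 1

    summary = {}
    for cluster in sorted(set(zone_cluster.values())):
        zones = [z for z, c in zone_cluster.items() if c == cluster]
        by_type = counts.get(cluster, Counter())
        total = sum(by_type.values())
        area = sum(zone_area_km2[z] for z in zones)
        summary[cluster] = {
            "collisions": total,
            "per_zone": total / len(zones),
            "per_km2": total / area,
            "shares": {t: n / total for t, n in by_type.items()},
        }
    return summary

# Hypothetical toy data: two rectangular zones in two clusters.
zone_polygons = {
    "Zone A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "Zone B": [(1, 0), (3, 0), (3, 1), (1, 1)],
}
zone_cluster = {"Zone A": 0, "Zone B": 1}
zone_area_km2 = {"Zone A": 1.0, "Zone B": 2.0}
collisions = [
    (0.5, 0.5, "run over"), (0.2, 0.8, "run over"),
    (0.7, 0.3, "fallen occupant"), (2.0, 0.5, "other"),
    (1.5, 0.5, "fallen occupant"), (5.0, 5.0, "other"),  # last one is outside both zones
]
summary = cluster_summary(collisions, zone_polygons, zone_cluster, zone_area_km2)
print(summary[0]["per_km2"], summary[1]["per_km2"])  # 3.0 1.0
```

Normalizing by area rather than by zone count is what lets a compact cluster, like the nightlife cluster described above, stand out even when a larger cluster has more collisions in absolute terms. For real boundaries in geographic coordinates, a projected coordinate system and a GIS library would be more appropriate than this planar test.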

An important contribution of this study is the researchers’ use of Foursquare data and a K-Means clustering model to group all zones of Medellín by the composition of their venues. This matters because few studies have developed a detailed diagnosis of RTC across a city by identifying zones with specific characteristics as hot spots. The study presents a systematic and comprehensive approach that uses Foursquare’s location data as a geographic method to perform spatial analysis and characterize the infrastructure of each cluster. Ultimately, the study’s diagnosis gives public agencies a basis for evaluating and prioritizing strategies to reduce RTC according to how zones differ in the composition of their venues.

Foursquare believes in the power of location and its potential to positively influence the future of society. Our location technology solutions present unique opportunities for knowledge creation in urban planning and beyond. Contact us to learn more about leveraging our technology in scientific research, or check out the location data available for free on the AWS Marketplace.
