Technical Guide to Foursquare Places (Part 2): How does Foursquare Get Location Data Right?

Location data is hard to get right. In Part 1 of this Technical Guide, we outlined the importance of high-quality points of interest (POI) data and the factors to consider when investing in a location data provider. What sets Foursquare Places apart is the rigorous process we use to ensure the precision and reliability of our POI dataset.


Where does Foursquare POI data come from?

FSQ Places stands out as the sole provider of POI data with a comprehensive global representation of the world derived directly from firsthand information, drawing upon more than 14 billion distinct check-ins gathered through Foursquare’s consumer apps, City Guide and Swarm. These billions of check-ins power our dataset’s rich attributes, including tips, tastes, and photos. On top of our unique user-generated submissions, we programmatically crawl thousands of authoritative sources, including web resources, third-party partners, and feedback from business owners. To ensure near 100% coverage, we also team up with trusted listing syndicators who update millions of location records daily on behalf of brands, chains, and small business owners.


How does Foursquare ensure accuracy of data?

Extract and Clean

Once we’ve curated the initial data, billions of raw data points undergo extraction, structuring, cleaning, and validation. The process kicks off by identifying potential matches among the existing POIs in our database, using key attributes as indexing criteria. A model then generates a similarity score for each candidate pair. This model, responsible for resolving similarities between two POIs, is continually improved through training on datasets from various countries and languages, all with human-reviewed labels. We use the similarity score to link the input data to the place it most closely resembles. When a source input doesn’t correspond to any existing POI, we create a new POI entry.
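The match-or-create flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the blocking key, the 0.85 threshold, and the use of plain string similarity in place of the trained model are all hypothetical and are not taken from Foursquare's pipeline.

```python
from difflib import SequenceMatcher

# Hypothetical cut-off; the article does not state the real value.
MATCH_THRESHOLD = 0.85

def blocking_key(record):
    """Coarse index key: rounded coordinates plus the first letter of the name."""
    return (round(record["lat"], 2), round(record["lon"], 2), record["name"][:1].lower())

def similarity(a, b):
    """Stand-in for the trained model: string similarity between the two names."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()

def resolve(record, pois):
    """Link `record` to the most similar existing POI, or append a new entry.

    Returns (poi_id, created), where `created` is True for a new POI.
    """
    key = blocking_key(record)
    candidates = [p for p in pois if blocking_key(p) == key]
    scored = [(similarity(record, p), p) for p in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    if best is not None and best_score >= MATCH_THRESHOLD:
        return best["id"], False
    new_poi = dict(record, id=f"poi-{len(pois) + 1}")
    pois.append(new_poi)
    return new_poi["id"], True
```

Calling resolve with a record named "Joes Coffee" near an existing "Joe's Coffee" links the two, while a record with no candidate under the same key is added as a new POI.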

Cluster and Summarize

Our clustering process organizes the cleaned non-unique data points into unique and accurate POI entities. Within each cluster, every attribute value is assigned a confidence score. In this stage of the process, we employ two methods: a) weighted mode summarization; and b) model-based summarization. Weighted mode summarization employs consensus voting in a source cluster to pick the most frequently suggested value. Meanwhile, model-based summarization relies on spatial context such as adjacent roads or buildings to determine the geocode from the list of candidate inputs from various sources.
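As a rough illustration of weighted mode summarization, the sketch below sums a weight per suggested value and picks the value with the largest total. The per-source weights and the way confidence is derived (winning share of total weight) are assumptions made for this example, not Foursquare's actual scoring.

```python
from collections import defaultdict

def weighted_mode(suggestions):
    """Pick the value with the highest total source weight.

    `suggestions` is a non-empty list of (value, source_weight) pairs.
    Returns (winning_value, confidence), where confidence is the winner's
    share of the total weight in the cluster (an assumed definition).
    """
    totals = defaultdict(float)
    for value, weight in suggestions:
        totals[value] += weight
    winner = max(totals, key=totals.get)
    confidence = totals[winner] / sum(totals.values())
    return winner, confidence
```

For example, two sources with weights 0.9 and 0.6 agreeing on one phone number outvote a single source with weight 0.5 that suggests another, giving the winner a confidence of 0.75.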

Calibration and Filtration

Assessing the quality of every point of interest in the dataset is a crucial task when maintaining a dependable location dataset. Foursquare has engineered several models to gauge how accurately each place in our dataset matches its real-world counterpart. We utilize the following scores to consistently detect errors and refine our data.

  1. Venue Reality Score (VRS). The Venue Reality Score signifies how ‘real’ (i.e., accessible to the public at a fixed location) we think a place is. Our models generate a score based on various signals, inputs, and movement data. From there, each POI is scored from Very High to Low, indicating our confidence level. We regularly analyze places in this category to enhance the signal or identify miscalibrations by the model.
  2. Closed Score. Using time-sensitive features like check-ins, review/tip patterns, and feedback received from our API, the Closed Score denotes whether the place is currently open or closed. POIs are categorized under VeryLikelyClosed, LikelyClosed, Unsure, LikelyOpen, and VeryLikelyOpen.
  3. Attribute Accuracy Score. One of the newer measurements that we have recently started to leverage, Attribute Accuracy Score helps determine the accuracy of attributes attached to a place. The algorithm used to power this score rewards or penalizes attributes based on conflicting/agreeing sources and places them through a time decay factor to assign each attribute a confidence score. Attributes scoring in the top 99% help us to a) sort the POIs based on accuracy score of specific attributes and b) identify which sources are trustworthy and which are not.
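To make the idea of rewarding agreement, penalizing conflict, and applying a time decay more concrete, here is a minimal sketch. The exponential half-life of 180 days and the agree/(agree + conflict) formula are assumptions chosen for illustration; the article does not describe the actual algorithm.

```python
# Hypothetical half-life: an observation loses half its weight every 180 days.
HALF_LIFE_DAYS = 180.0

def decayed_weight(age_days):
    """Weight of a source observation that is `age_days` old."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def attribute_confidence(candidate, observations):
    """Score a candidate attribute value between 0 and 1.

    `observations` is a list of (value, age_days) pairs from different
    sources. Agreeing observations raise the score, conflicting ones
    lower it, and older observations count for less.
    """
    agree = sum(decayed_weight(age) for value, age in observations if value == candidate)
    conflict = sum(decayed_weight(age) for value, age in observations if value != candidate)
    total = agree + conflict
    return agree / total if total else 0.0
```

With this toy formula, a fresh observation and a six-month-old observation that agree on a value outweigh a year-old observation that conflicts with it.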

After assigning scores to all the place records in our dataset, we conduct a series of verifications to ensure that each record is eligible for inclusion in the final dataset we deliver. The factors or conditions we consider are: a) the key attributes of a place record are available; b) the reality score of a place and the accuracy of key attributes meet specific criteria; and c) each attribute has at least one credible source. We also verify that the details in a record logically fit together, such as the zip code matching the city. If a record successfully passes all these checks, we include it in the dataset we provide to our customers.
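The eligibility checks listed above could be expressed roughly as follows. This is a simplified sketch: the required attribute list, the reality score labels between Low and Very High, the threshold, and the zip-to-city lookup are all hypothetical, and the attribute accuracy criterion is omitted for brevity.

```python
# All constants below are illustrative assumptions, not Foursquare's rules.
REQUIRED_ATTRIBUTES = ("name", "latitude", "longitude")
REALITY_RANK = {"Low": 0, "Medium": 1, "High": 2, "VeryHigh": 3}
ZIP_TO_CITY = {"10001": "New York", "94103": "San Francisco"}  # toy lookup table

def is_eligible(record, min_reality="High"):
    """Return True if a place record passes every release check."""
    # a) key attributes are available
    if any(record.get(attr) in (None, "") for attr in REQUIRED_ATTRIBUTES):
        return False
    # b) the reality score meets the threshold
    if REALITY_RANK.get(record.get("reality"), -1) < REALITY_RANK[min_reality]:
        return False
    # c) each key attribute has at least one credible source
    sources = record.get("sources", {})
    if any(not sources.get(attr) for attr in REQUIRED_ATTRIBUTES):
        return False
    # consistency: when the zip code is known, it must match the city
    expected_city = ZIP_TO_CITY.get(record.get("zip"))
    return expected_city is None or expected_city == record.get("city")
```

A record is dropped as soon as any single check fails, which mirrors the "passes all these checks" wording above.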

Quality Assurance and Release

Foursquare performs meticulous QA checks at every step of the process. All data changes are evaluated by our automated QA workflows, which flag regressions for any significant changes. Events that trigger changes to the Places dataset include a) a new batch of data available from a specific source or a set of sources; and b) a new version of the summarization algorithms or the calibration models. Each of these changes is tested in a staging environment, where all the stages of the data quality process are run. Learn what’s new in Foursquare Places from this year’s releases.
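One simple way to picture an automated regression check is to compare quality metrics from a staging run against the current baseline and flag any that drop by more than a tolerance. The metric names and the 0.02 tolerance below are invented for illustration; the article does not describe how the QA workflows are implemented.

```python
def flag_regressions(baseline, candidate, tolerance=0.02):
    """Return the names of metrics that dropped by more than `tolerance`.

    `baseline` and `candidate` map metric names to values where higher is
    better. A metric missing from `candidate` is treated as 0.0.
    """
    return sorted(
        name
        for name, old_value in baseline.items()
        if old_value - candidate.get(name, 0.0) > tolerance
    )
```

A staging run that improves one metric while another falls from 0.93 to 0.88 would be flagged only for the second.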

Read this blog article that provides an in-depth explanation of the engineering process Foursquare employs to maintain our global dataset of more than 200 million POIs.


The FSQ Places Data Schema

The FSQ Places data schema includes 25 core attributes as well as more than 115 rich attributes. These core attributes are included in all packages, while rich attributes can be added on for an additional cost to any standard country or region package. This allows for customization and flexibility in accessing the desired location data.

We provide a high-level overview of some notable attributes below.

Core Attributes

Core attributes are the fundamental characteristics of a specific location or place. They provide the basic details needed to identify and describe a place. Core attributes include the POI’s unique identifier in the FSQ database, business name, latitude and longitude, geocodes, and address. Aside from this basic information, the FSQ Places core attributes also include information on:

Translated name [name_translated]

This column denotes the user-entered translated name(s) of a venue. The translated name also includes an ISO 639-1 language code and uses the following format: [{Translated Venue Name,language code(en for English, ja for Japanese, etc)}]. Generally, this attribute will only exist for very popular POIs. Knowing a place’s translated name can provide several benefits, including accessibility for foreign visitors, an enhanced user experience, and better search and discovery.

Designated Market Area [dma]

This attribute defines the Designated Market Area, as defined by Nielsen, that the POI is located in. This signifies a region where the population can receive similar TV and radio offerings in the USA. There are 210 DMAs in the United States.

Understanding a place’s designated market area (DMA) is vital for businesses, advertisers, and media companies seeking to tailor their marketing and advertising efforts for better reach of their target audience. For example, DMA data helps businesses and advertisers focus their marketing efforts on specific geographic regions where their target audience is concentrated. This precision in targeting can lead to more effective advertising campaigns.

Category IDs [fsq_category_ids] and Category Labels [fsq_category_labels]

Foursquare category ID denotes the most granular category (or categories) available for a POI. Meanwhile, the category label refers to the label (or labels) for the most granular category (or categories) available. The ten parent categories are: 

  • Arts and Entertainment
  • Business and Professional Services
  • Community and Government
  • Dining and Drinking
  • Event
  • Health and Medicine
  • Landmarks and Outdoors
  • Retail
  • Sports and Recreation
  • Travel and Transportation

See our Categories page for more details.

Knowing a POI’s category is essential for generating more specific and detailed search queries, developing apps that provide customers with richer and more informative experiences, and gaining deeper insights into market conditions and environments.

Chain ID [fsq_chain_id] and chain name [fsq_chain_name]

Foursquare has established explicit associations between national and local brands or franchises and their physical stores. Users have the option to directly search the Places dataset using the chain’s name or unique ID to receive a list of stores affiliated with that particular chain. Example chains are McDonald’s, Starbucks, Walgreens, and 7-Eleven.

Chain name and chain ID are the two chain-related attributes in Foursquare’s core attribute set. Chain name denotes the label that indicates which chain the POI is a member of. In comparison, chain ID is the unique identifier of the chain that the POI is a member of.

Having access to chain information can provide an array of benefits. App developers enable their end-users to easily find nearby chain locations or make personalized recommendations. For businesses, understanding the presence and distribution of chains can help in analyzing competitors and making informed decisions about market entry and expansion. 
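As a small illustration of working with the chain attributes, the sketch below filters POIs by chain ID and counts stores per chain name. It assumes each POI is a dict keyed by the column names fsq_chain_id and fsq_chain_name; the chain ID values used in any example data are made up.

```python
from collections import Counter

def stores_for_chain(pois, chain_id):
    """Return every POI whose fsq_chain_id matches `chain_id`."""
    return [p for p in pois if p.get("fsq_chain_id") == chain_id]

def stores_per_chain(pois):
    """Count POIs per chain name, skipping POIs that belong to no chain."""
    return Counter(p["fsq_chain_name"] for p in pois if p.get("fsq_chain_name"))
```

The per-chain counts give a quick view of chain presence in a region, which is the kind of input a competitor or market-entry analysis would start from.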

Parent ID [parent_id] and subvenue count [subvenue_count]

The Places dataset can provide information on whether a POI is a parent venue, meaning the main location where a subvenue is located. For example, a clothing store inside a mall is considered the subvenue, while the mall is the parent venue.

The parent ID attribute indicates the Foursquare ID of a POI’s parent venue. Meanwhile, if a POI is a parent POI (for example, a mall with several stores and restaurants), the subvenue count indicates the total number of subvenues within it. In the mall example, it denotes how many stores and restaurants are inside the mall.
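The relationship between the two attributes can be shown by deriving a subvenue count from parent IDs. This is a sketch only: it assumes POIs are dicts with a parent_id key, and the ID values in any sample data are hypothetical. In the delivered dataset, subvenue_count is already provided and does not need to be computed.

```python
from collections import Counter

def subvenue_counts(pois):
    """Map each parent venue ID to the number of POIs that point to it."""
    return Counter(p["parent_id"] for p in pois if p.get("parent_id"))
```

A mall referenced as the parent of two stores gets a count of 2, while a standalone venue does not appear in the result at all.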

Closed bucket [closed_bucket]

This attribute represents the probability that a given POI is no longer in business. Foursquare uses a machine-learning model to assess the current operational status of each POI. This closed-score model is trained on thousands of human annotations of Foursquare’s POIs and uses features that capture how recently sources for the POI were updated, such as the date of its last check-in, tip, or photo. Using the Closed Score that comes out of the model, we assign each POI to a closed_bucket. The buckets can be combined at the following precision levels:

“VeryLikelyOpen” bucket [Highest Precision >95%]

“VeryLikelyOpen+LikelyOpen” [High Precision >80%]

“VeryLikelyOpen+LikelyOpen+Unsure” [Precision >65%]
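The cumulative groupings above translate naturally into a filter that keeps every bucket up to a chosen cut-off. The sketch below assumes POIs are dicts with a closed_bucket key and orders the five bucket labels from most to least likely open; the function name and default cut-off are illustrative choices.

```python
# Bucket labels from the Closed Score section, ordered from most to least likely open.
BUCKET_ORDER = [
    "VeryLikelyOpen",
    "LikelyOpen",
    "Unsure",
    "LikelyClosed",
    "VeryLikelyClosed",
]

def keep_open(pois, loosest_bucket="LikelyOpen"):
    """Keep POIs whose closed_bucket is at least as confident as `loosest_bucket`."""
    cutoff = BUCKET_ORDER.index(loosest_bucket)
    allowed = set(BUCKET_ORDER[: cutoff + 1])
    return [p for p in pois if p["closed_bucket"] in allowed]
```

Choosing a stricter cut-off trades coverage for precision: "VeryLikelyOpen" alone returns fewer places, while including "Unsure" returns more places at a lower precision.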

Knowing whether a POI is open or permanently closed is useful for both consumers and businesses. For example, for consumers using location-based apps, knowing the operational status of a POI ensures a positive user experience. It prevents wasted trips to places that are closed, saving time and frustration.

Rich Attributes

The Foursquare Places rich attributes are additional and detailed pieces of information associated with POIs. These attributes go beyond the basic or core information and provide more comprehensive details about a place. 

Ratings, Hours, & Social Media

The Foursquare Places dataset delivers rich attributes, such as ratings, hours, and social media, to provide additional information on how consumers can engage with these POIs. With the ratings attribute, customers can access the rating of a POI (0-10) based on user votes as well as an internal score aggregated by likes/dislikes, tips, and visit traffic. Insights on price, whether the POI’s offerings are cheap, moderate, expensive, or very expensive, are also available. Additionally, the Places dataset offers both hours of operation and popular hours to provide data on traffic for consumers who want to avoid the crowd. Lastly, Places also helps deliver methods on how to reach out to the venue through social media information if available, such as Facebook, Twitter (X), and Instagram, as well as email, phone, and website.

Tips and Tastes

Tips and tastes are two unique attributes that are available in Foursquare Places through our multitude of user-generated content. Tips are free-text recommendations (or warnings) posted by users on our site and in our apps. Tips are ranked by the number of times a tip was liked and ordered by ranking alongside the TipID within the array. Meanwhile, Tastes are nouns or noun phrases that signify unique qualities of the POI. These are extracted using Natural Language Processing (NLP) from tips and shouts found on Foursquare’s apps. Tastes are ranked using a combination of affinity and frequency, and are ordered by ranking within the array.

Best Photos

The Photos attribute is another type of data that is uniquely available to Foursquare Places. Customers are provided with a URL to photos submitted by Foursquare users. Leveraging photos can help provide a visual depiction of the POI, offering a more comprehensive understanding than just textual information. This visual representation can be incredibly helpful for users trying to identify or recognize a location. Photos also add credibility and authenticity to the POI data. Seeing actual images of a location instills confidence in potential visitors or users, as they can visually verify the information provided.

Tags

Tags are a series of attributes that refer to descriptive labels or keywords assigned to a POI. These tags serve as metadata that help categorize, organize, and identify various characteristics associated with a location. They can describe a range of features, such as whether a POI is dog-friendly (true or false) or how well it accommodates a gluten-free diet (poor, average, or great). Read our documentation to see all the tags available in the FSQ Places dataset.

Store ID

Store IDs function as distinctive markers or codes used for monitoring and analyzing transactions at the individual store or location level. With Foursquare’s Store ID, clients can rectify inaccuracies in transaction data, address inconsistencies, fill in gaps in attribute details, and add context to these transactions. Matching on Store IDs also helps track and map transactions to real-world POIs at the individual store level.


How can you access POI data through Foursquare Places?

You can leverage Places data using the methods and frequencies that best suit your needs. Data can be delivered weekly or monthly. Our POI dataset is available through the Places API or as a flat file in JSON or TSV format from an FSQ-hosted S3 bucket. Foursquare has also partnered with leading ISVs and cloud providers, such as AWS Data Exchange, Snowflake Data Marketplace, Esri, Carto, and Korem, to help you quickly procure, access, and use the Places POI dataset in your preferred supported environment. Read this documentation page to learn more about how you can access Foursquare Places data.
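For the flat-file option, a TSV delivery can be read with nothing more than the Python standard library. The sketch below parses an in-memory sample rather than a real delivery; the sample rows are invented, and the presence of a header row with these column names is an assumption about the file layout.

```python
import csv
import io

# Invented sample rows; a real delivery would be a file downloaded from S3.
SAMPLE_TSV = (
    "name\tfsq_chain_name\tclosed_bucket\n"
    "Starbucks Reserve\tStarbucks\tVeryLikelyOpen\n"
    "Elm Street Bakery\t\tUnsure\n"
)

def load_places(tsv_file):
    """Read a tab-separated flat file into a list of dicts keyed by column name."""
    return list(csv.DictReader(tsv_file, delimiter="\t"))

places = load_places(io.StringIO(SAMPLE_TSV))
```

To read a file on disk instead, pass an open file handle, for example open(path, newline="", encoding="utf-8"), in place of the StringIO object.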

By tapping into the robust features, in-depth details, and global coverage that Foursquare Places delivers, organizations can enhance strategies, enrich applications, deliver engaging customer experiences, optimize resources, unlock new insights, and discover market opportunities, all while leveraging one of the most trusted and comprehensive sources of location data available today.


Getting started with Foursquare Places

In a world of rapid change and stringent data privacy regulations, acquiring and sustaining accurate points of interest data can be a formidable challenge for businesses. The landscape of locations and consumer behavior evolves rapidly, making it essential to have a trusted data partner that keeps pace. This is where Foursquare Places steps in as a dependable solution. With a dynamic and detailed database that’s constantly refreshed, Foursquare Places offers accurate and up-to-date data that businesses need to stay ahead of the curve. What’s more, Foursquare Places is committed to data privacy, ensuring that you not only have the most current location-based insights but that you also navigate the complex world of data privacy and compliance with confidence. To see Places in action, schedule a demo with our technical experts.

You may also download our sample dataset to explore information that is included with Foursquare Places. If you’d like to power your apps with POI data, get started by signing up for the Developer Console.


Authored by: Princess Guzman
