Creating an Event Scraper

Scrapers are the crawlers that run automatically to pull data from other websites, such as venue calendars, ticket listings, and event blogs. Each scraper has a script written specifically to read and pull data from its assigned site.

You can create a new scraper by:
  • Using an existing script for sites powered by the same platform, and therefore sharing an identical page setup, such as Ticketfly or Ticketweb.
  • Requesting a new script for a venue calendar, ticket listing, or event blog site.
There can be many venues that use the same script, but in most cases you won’t have a venue using multiple scripts. The goal is to use the source with the best information for a particular venue, typically the venue's own website or its main ticketing source.

Creating Scrapers

To create a new scraper:
  • Go to Radmin -> Events -> Scrapers
  • Click the 'New Scraper' button in the top right of the screen
  • After hitting that button, you will see this screen:


Let's talk through the different fields and what they all mean:
  • Kind: Please always leave this as 'Event' for now. We will likely have other types of scrapers available eventually, but for now, event scrapers are the only active ones.
  • 'Active': When the 'Active' box is checked, the scraper will run automatically every 25 hours or so. Almost every scraper should be marked 'Active'. The only exception to this would be a scraper that you only want to run manually, such as one that scrapes a spreadsheet that you manually update.
    • A note that you can manually run any scraper at any time, whether it be Active or not, by going to Radmin -> Events -> Scrapers, searching for the scraper you want, selecting it and clicking the 'Run' button.
  • 'Venue': Please select the venue to which you'd like this scraper to be attached.
    • Our scraping system will let you scrape the same venue with multiple scrapers and also scrape multiple venues with the same scraper, but we highly recommend avoiding both setups whenever you can, as they create more work for you and populate less accurate events to your site.
    • If you're ever unsure of how to handle a certain scraper setup for multiple venues, please reach out to Support.
  • 'URL to scrape': Please select the most complete and accurate source for the venue's events. This is very often a ticketing website or the venue's website. It could also be the Facebook Events feed for the venue.
    • A note that if the URL you would like to scrape is a Ticketmaster or Live Nation page, we ask that you stop your process here and reach out to Support for help. The reason for this is that we can write you a custom script that interfaces with Ticketmaster / Live Nation's API - and that script will be the best way to scrape these two ticket providers. The process is very quick and we will have your scraper up and running within 24 hours.
  • 'Category': This is the default category with which the events from this scraper will enter the pending queue. You can always change that category in the pending queue, and you can also ask the scraper to assign specific categories to certain types of events (more on that later). For now, just use the category that would most frequently be assigned to events from this scraper.
  • 'Multiple Venues': Check this box if you would like the scraper to pull in events from (you guessed it) multiple venues. Because this is often a risky move, we highly recommend reaching out to Support before creating a scraper of this type.
  • 'Ignore Bands': If this box is checked, the scraper will not automatically pull in any bands. This can be very helpful in either of the following situations:
    • If the events are the sort that do not include artists, bands or comedians - such as literary nights and sporting events.
    • If the URL you're scraping lists bands in such an inconsistent way that you would have to manually delete and re-add most of the bands yourself in the pending queue anyway. It can be easier to not pull bands in and manually type them in yourself in the queues.
  • 'Conference' and 'Conference Category': These won't apply to most of your scrapers (so you can leave them blank) - and will only be used if the scraper you're creating is pulling in events for a conference lens. If you are building a scraper for a conference lens, please reach out to Support for specific guidance as the strategy for these scrapers varies depending on the conference.
  • 'Script': If the scraper you're creating is pointed at a type of site for which there is already a script created, please select that script here. 
    • Examples of these sorts of scripts are ticketing websites like Ticketfly and TicketWeb, Facebook Event pages, and Google Spreadsheets.
    • If the scraper you're creating is pointed at a unique site, such as a venue website or a ticket blog that has not previously been scraped, then please leave this field blank.
    • Please note that there are a few more use cases for special types of scrapers listed at the bottom of this document. 
  • If you selected a script from the 'Script' dropdown:
    • Your next step is to click the 'Next' button to finish creating the scraper.
    • Once the scraper is created, click the 'Run' button to manually run the scraper and confirm that it is working correctly.
      • A note that all scrapers appear 'Broken' while they're running, so if you hit 'Run', refresh the page and see a 'Broken' error, that is expected. Rather than refreshing repeatedly, it is best to run the scraper, go do a different task and come back to it.
  • If you did not select a script from the 'Script' dropdown:
    • Check the 'Script Needed' box and then click 'Next' to proceed to the next page:
On this page, you must provide Taro, our master script writer, with an example of an event that you would like brought in by the scraper, so that he knows how to format the script. This form should be pretty self-explanatory, but here are some pro tips for filling it out:
  • Be sure to pick an event that is far enough in the future that it doesn't happen before Taro has a chance to look at it. Taro generally gets to scrapers within 24 hours, so one week out should be plenty of time.
  • Make sure to pick an event that includes all of the information that you would like pulled in when available. If your example event has no 'Presented By' field, but other events do, it will be hard for Taro to know how to handle this field when it comes up.
  • You do not need to include 'End Date' and 'End Time' if you do not want to, nor do you have to include 'Poster Image URL' if it is not available.
  • If you would like any information to appear in the 'Ticket URL' field and/or 'Description' for all events from this scraper, please add it to the respective field and then leave Taro a note about it in the 'Notes for Scripter' section.
    • An example of when you would do this would be if all events at a particular venue were free or all ages.
  • In the 'Notes for Scripter' section, you can indicate whether you'd prefer if Taro pulled event cover images from the URL you're scraping or didn't scrape in images. If the images on the site you're scraping are small, text-heavy or generally bad, you'll want to be sure to ignore images.
  • If there are any events on the URL you're scraping that you don't want pulled in, you can leave a note for Taro requesting that he build that into the script.
  • You can also request that certain categories be applied to certain events in the 'Notes for Scripter' section.
    • An example of this could be asking for events that include 'Happy Hour' to be categorized as 'Drink Specials' and for all other events to be categorized as 'Live Music'.
  • Your next step is to click the 'Next' button to finish creating the scraper.
    • This will alert Taro that he has a new scraper that he needs to build. 
    • He'll get started on that scraper within the next day. If he has any questions for you, he'll reach out to you via Radmin - and you'll receive an email alert when he does. Otherwise, your scraper will start running once it's built, and you'll notice events from this scraper populating in your queues.
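To make the category request above concrete: a rule like "events that include 'Happy Hour' get 'Drink Specials', everything else gets 'Live Music'" is the kind of logic Taro builds into the script. The sketch below is purely illustrative (the `categorize` helper is not part of Radmin or Taro's scripts); it just shows what such a rule boils down to.

```python
def categorize(event_title: str) -> str:
    """Illustrative category rule: 'Happy Hour' events become
    'Drink Specials'; all other events default to 'Live Music'."""
    if "happy hour" in event_title.lower():
        return "Drink Specials"
    return "Live Music"

print(categorize("Happy Hour Trivia"))  # Drink Specials
print(categorize("Open Mic Night"))     # Live Music
```

The clearer you spell out a rule like this in the 'Notes for Scripter' section, the less back-and-forth you'll have with Taro.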

Special scrapers and what to do with them:
  • Please note that you have the option of scraping directly from the API of Ticketfly and TicketWeb. This is preferable to scraping the front end of these ticketing websites because it allows us to access the raw, plain-text data with no markup, so the scraping process is quicker and more accurate. To set one of these scrapers up:
    • Find the venue ID of the venue you're scraping by Googling '(Venue Name) (Ticket Provider)', clicking on the venue's ticketing page and checking the URL for the venue ID. 
    • Append the venue ID to the appropriate URL:
      • For Ticketfly: http://www.ticketfly.com/api/events/upcoming.json?venueId={{IDHERE}}
        • For example, a venue with ID 1239 would give the resulting URL http://www.ticketfly.com/api/events/upcoming.json?venueId=1239.
      • For TicketWeb: http://api.ticketweb.com/snl/VenueAPI.action?key=OnTLfy5CJ7XX1mLwynRp&version=13&venueId={{IDHERE}}&method=json
    • Copy the resulting URL, start setting up a scraper as normal and use it as the 'URL to scrape'.
    • Select 'TicketWeb API' or 'Ticketfly API' as the script for your scraper.
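If it helps, the URL construction above is just a matter of filling the venue ID into the provider's template. The sketch below is illustrative only (the `api_scrape_url` helper is not part of Radmin); it builds the 'URL to scrape' from a venue ID using the templates shown above.

```python
# Hypothetical helper: fill a venue ID into the API URL templates above.
TEMPLATES = {
    "ticketfly": "http://www.ticketfly.com/api/events/upcoming.json?venueId={venue_id}",
    "ticketweb": ("http://api.ticketweb.com/snl/VenueAPI.action"
                  "?key=OnTLfy5CJ7XX1mLwynRp&version=13&venueId={venue_id}&method=json"),
}

def api_scrape_url(provider: str, venue_id: int) -> str:
    """Return the API 'URL to scrape' for the given provider and venue ID."""
    return TEMPLATES[provider.lower()].format(venue_id=venue_id)

print(api_scrape_url("ticketfly", 1239))
# http://www.ticketfly.com/api/events/upcoming.json?venueId=1239
```

Paste the resulting URL into the 'URL to scrape' field exactly as generated.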
  • If you would like to scrape events from Ticketmaster or Live Nation, please reach out to Support and we will build you a custom API script.
  • If you would like to scrape events from a Google Sheet:
    • Please make a copy of this sheet.
    • Create a scraper as normal using the steps above.
    • Select 'google spreadsheet' as your script.
    • Use the URL of the spreadsheet as the 'URL to scrape'.
      • Make sure to set your Sheet's share settings so that anyone with the link can view the sheet.
If you have any feedback about this article or would like anything added to it, please hit up Support!