Creating a Comment Scraper

I. Introduction

  • Scrapers are the crawlers that run automatically to pull data from other websites, such as venue calendars, ticket listings, and event blogs.

  • There are two types of scrapers:

    • Event scrapers - attached to a venue to pull in data about events.

    • Comment scrapers - attached to a user to pull in reviews and/or votes for upcoming events or bands.

  • Each scraper has a script that is specifically created to read and pull the data from its assigned site.

  • You can create a new scraper by:

    • Using an existing script for sites powered by the same source with an identical page setup, such as Ticketmaster, Live Nation, or Village Voice.

    • Requesting a new script for a venue calendar, ticket listing, or event blog site.


II. Navigation

Step 1: Under the “Main” tab in Admin, click “Scrapers.”

Note: The default view is your TOTAL list of scrapers, starting with your venue event scrapers in reverse alphabetical order.

Step 2: You can search for a particular scraper or group of scrapers in the right sidebar.

EX. You can filter by “Script” and select “songkick” to view how many of your venues still rely on Songkick to scrape events, or filter by “Kind” to view only Comment scrapers.

Step 3: To view a scraper’s details, click on the venue name under the “Venue” column.

Note: The comment scrapers’ links will read “Without Venue.”

Step 4: Review the scraper’s page for the URL it scrapes, the script it uses, the kind of scraper it is (whether event or comment), and the venue or user to which it is attached.

Step 5: To create a new scraper, you can click the “Create a new scraper” link in the sidebar.
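
EX. The details reviewed in Step 4 and the sidebar filters from Step 2 can be pictured as a simple record plus a filter. The Python below is only an illustrative sketch: the class name ScraperRecord, its field names, the filter_scrapers helper, and the sample URLs are all assumptions made for this example, not the Admin's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScraperRecord:
    """Hypothetical model of one row in the Scrapers list."""
    url: str                       # the URL the scraper reads
    script: Optional[str]          # e.g. "songkick"; None if no script yet
    kind: str                      # "event" or "comment"
    venue: Optional[str] = None    # set for event scrapers
    user_id: Optional[int] = None  # set for comment scrapers

    def venue_label(self) -> str:
        # Comment scrapers have no venue, so their link reads "Without Venue".
        return self.venue if self.venue else "Without Venue"


def filter_scrapers(scrapers, script=None, kind=None):
    """Return the scrapers that match every filter that was supplied."""
    return [
        s for s in scrapers
        if (script is None or s.script == script)
        and (kind is None or s.kind == kind)
    ]


# Made-up sample data for illustration only.
scrapers = [
    ScraperRecord("https://example.com/calendar", "songkick", "event", venue="Example Hall"),
    ScraperRecord("https://example.com/reviews", None, "comment", user_id=42),
]
comment_only = filter_scrapers(scrapers, kind="comment")
```

Filtering by kind="comment" here leaves only the second record, whose link would read “Without Venue.”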


III. Creating a Comment Scraper

Step 1: On the “Create New Scraper” page, select “Comment Scraper” from the “Kind” drop-down.

Step 2: Copy the URL you want to scrape for comments and paste it into the “Calendar URL” field.

Step 3: If the website is powered by another site we already scrape and is formatted identically, select that scraper’s script from the “Script” drop-down. Otherwise, leave this field blank.

EX. OhMyRockness and Hype Machine.

Step 4: Select the “Comment Kind” from the drop-down:

-> “Upcoming event” if the comments preview events - the user will "vote" on the event and the comment will appear on the event's page

-> “Band” if the comment reviews or lists bands - the comment will appear on each band's page

Step 5: If the user should be set to automatically follow whatever bands it scrapes or bands associated with the events it scrapes, check “Follow Bands.”

Note: If the user is set to auto-vote in its profile, this will cause the user to vote on any future events listed on your metro that those bands play.

Step 6: Copy and paste the ID of the user associated with the comments in the “Auto-Vote User ID” field.

Step 7: Check “Automatic.”

Step 8: Check “Auto-approve” ONLY if this comment scraper does not bring in comment text and serves solely as an automatic way to pull in user votes.

Step 9: Check “Prefs Cross Metro” if this comment scraper is relevant to all other metros, such as a new Pitchfork comment scraper. This applies only to "Band" comment scrapers. It will make the user and comments appear on every metro’s site.

Step 10: Check “Manual Interval” if you will run the scrape manually to approve comments, rather than having the scraper run automatically to pull votes or comments into the queue.

Step 11: If you selected a script above, click “Create.”

-OR- If you did not select a script above, check “Needs Script” and continue to the next set of instructions (IV for “Upcoming event” comments, V for “Band” comments).
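
EX. The choices in Steps 1 through 11 can be summarized as a set of form values plus two rules: “Prefs Cross Metro” applies only to “Band” comment scrapers, and a blank “Script” means a script request is needed. The Python below is a rough sketch under those assumptions. The CommentScraperForm class, its field names, and the check_form helper are hypothetical and do not describe how the Admin is actually implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommentScraperForm:
    """Hypothetical mirror of the "Create New Scraper" form fields."""
    calendar_url: str
    comment_kind: str                 # "Upcoming event" or "Band"
    auto_vote_user_id: int
    script: Optional[str] = None      # None means the drop-down was left blank
    follow_bands: bool = False
    automatic: bool = True
    auto_approve: bool = False
    prefs_cross_metro: bool = False
    manual_interval: bool = False


def check_form(form):
    """Return (next_action, warnings) based on the rules in the steps above."""
    warnings = []
    if form.comment_kind not in ("Upcoming event", "Band"):
        warnings.append("Comment Kind must be 'Upcoming event' or 'Band'.")
    if form.prefs_cross_metro and form.comment_kind != "Band":
        warnings.append("Prefs Cross Metro applies only to 'Band' comment scrapers.")
    # Step 11: with a script, create right away; without one, request a script.
    next_action = "create" if form.script else "needs_script"
    return next_action, warnings


# Made-up example: no script selected, and cross-metro checked on the wrong kind.
form = CommentScraperForm(
    "https://example.com/previews", "Upcoming event", 42, prefs_cross_metro=True
)
action, warnings = check_form(form)
```

In this example the result is a "needs_script" action with one warning, which corresponds to checking “Needs Script” and unchecking “Prefs Cross Metro” before continuing.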


IV. Creating a Comment Scraper - Requesting a New Upcoming Events Script

The script request defines how the scraper will read the data it pulls for your site. We cannot scrape events listed in Flash, in image posters, or in PDFs. Every scraped site must present its information in a consistent format for the scraper to read and pull the data.


Step 1: If the “Comment Kind” you selected above was “Upcoming event,” open a new tab or window to an example of the comment on an upcoming event you wish to scrape, either on the main URL you entered above or a link from that URL.

Step 2: Identify the START DATE + VENUE of the event.

Note: The comment scraper will match to an event in your database by the start date and venue, so these MUST be in a clear and consistent location for every comment.

Step 3: Fill in the “Start Date” and “Venue” fields in the script request.

Step 4: If you want the scraper to scrape the text of the comment that previews the event, copy and paste it into the “Comment Text” field.

Step 5: Add any particular instructions for this scraper in the “Notes for Scripter” field.

EXs. Instructions about pagination, or about scraping only a certain number of words for the “Comment Text.”

Step 6: Click “Create.” Your GM will receive an e-mail within a few days, once the scraper script has been made.
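
EX. The Note under Step 2 says a comment is matched to an event by start date and venue. The Python below sketches one way such a match could work. It is an assumption for illustration only: the find_event and normalize helpers, the dictionary layout, and the whitespace and capitalization handling are hypothetical, and the real scraper's matching rules may be stricter or looser.

```python
from datetime import date


def normalize(name):
    """Lowercase and collapse whitespace so minor formatting differences still match."""
    return " ".join(name.lower().split())


def find_event(events, start_date, venue):
    """Return the first event with the same start date and venue, or None."""
    for event in events:
        same_date = event["start_date"] == start_date
        same_venue = normalize(event["venue"]) == normalize(venue)
        if same_date and same_venue:
            return event
    return None


# Made-up events standing in for the metro's database.
events = [
    {"id": 1, "start_date": date(2024, 5, 3), "venue": "Example Hall"},
    {"id": 2, "start_date": date(2024, 5, 4), "venue": "Example Hall"},
]
match = find_event(events, date(2024, 5, 4), "  example  hall ")
```

If either the date or the venue cannot be read reliably from the page, find_event returns None, which is why both must sit in a clear and consistent location for every comment.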


V. Creating a Comment Scraper - Requesting a Band Script

The script request defines how the scraper will read the data it pulls for your site. We cannot scrape events listed in Flash, in image posters, or in PDFs. Every scraped site must present its information in a consistent format for the scraper to read and pull the data.


Step 1: If the “Comment Kind” you requested above was “Band,” open a new tab or window to an example of the comment you wish to scrape, either on the main URL you listed above or a link from that URL.

Step 2: Ignore the “Start Date” and “Venue” fields in the script request.

Step 3: Identify in the example the band or bands in the comment.

Note: The comment scraper will match to a band in your database by name, so the band name MUST be in a clear and consistent location for every comment.

Step 4: In the “Band name” field, enter the name of the band in your chosen example.

Step 5: In the "Comment Text" field, paste the comment text (if any).

Step 6: Add any particular instructions in the “Notes for Scripter” field.

EXs. Instructions about pagination, or about scraping only a certain number of words for the “Comment Text.”

Step 7: Click “Create.” Your GM will receive an e-mail within a few days, once the scraper script has been made.
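
EX. Two ideas from this section can be sketched in Python: matching scraped band names against the database (Step 3 Note) and limiting the “Comment Text” to a certain number of words (Step 6 example). Both helpers below, match_bands and truncate_words, are hypothetical names, and the case-insensitive comparison and trailing "..." are assumptions rather than the scraper's confirmed behavior.

```python
def truncate_words(text, max_words):
    """Keep at most max_words words, adding "..." if anything was cut."""
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + "..."


def match_bands(scraped_names, known_bands):
    """Split scraped band names into those found in the database and those not found."""
    known = {name.lower().strip(): name for name in known_bands}
    found, missing = [], []
    for name in scraped_names:
        key = name.lower().strip()
        if key in known:
            found.append(known[key])   # use the database's spelling
        else:
            missing.append(name)       # no match: the comment cannot be attached
    return found, missing


# Made-up band names for illustration only.
found, missing = match_bands(
    ["the examples ", "Unknown Act"], ["The Examples", "Sample Band"]
)
preview = truncate_words("A short review of a very long show", 4)
```

A name that lands in the missing list has no band page to appear on, so a consistent band name location on the source site matters as much as the comment text itself.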