Scrapers & Queues FAQ

Scrapers

1. Q: I requested a special script for a venue calendar scraper, but the venue has a Songkick scraper. Can I delete the Songkick scraper?

A: Yes. Delete extra scrapers and keep the ones that are the best source of information; otherwise they just bring in unnecessary dupes. Keeping more than one scraper for a venue is only useful when the extra scraper brings in additional events but is not comprehensive on its own.


2. Q: What kind of calendars can we not request scraper scripts for?

A: Any calendar that lists event info in:

  • An inconsistent format - the title, date, time and bands are not in the same place or order for each event

  • Images/posters

  • Flash

  • Twitter

  • Any site that requires a password (for password-protected Facebook pages, we created a special bookmarklet that automatically copies Facebook event pages into Admin; see question 3 below)


You can set up a manual scraper for these. Its purpose is to remind you regularly to check those sources and enter the events directly.


3. Q: How do you set up the Facebook bookmarklet to scrape Facebook events?

   A: In GOOGLE CHROME, follow these steps:


Step 1: Go to Bookmarks -> Bookmark Manager

Step 2: Right-click in the right column and click "Add page"

Step 3: Give it a name in the left column and paste this entire URL into the right column:


javascript:(function(){   var segments = window.location.href.split("/");   window.open("https://graph.facebook.com/oauth/authorize?type=user_agent&client_id=408165152569858&redirect_uri=http://admin.dostuffmedia.com/events/new/?facebook_event_id=" + segments[segments.length - 2]); }());


Step 4:  Have your Admin open in one tab or window and open another tab to the Event page in Facebook you want to add.

Step 5: On the Facebook event page, click the icon you just created for the bookmarklet.

Step 6: Allow DoStuff Admin permission to access your info.

Step 7: You will land on the create event page with the title, description, venue, event start, and event end date pre-populated.

 

*Note: You only need to add the bookmarklet and give it permission the first time you set it up.
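
For anyone curious what the bookmarklet in Step 3 does, here is a readable sketch of the same logic. The function name buildAuthorizeUrl and the event ID 123456789 are made up for illustration; the address and client ID are copied from the bookmarklet. It splits the page address on "/" and takes the second-to-last piece, so it only finds the event ID when the address ends with a trailing slash.

```javascript
// Readable sketch of the bookmarklet logic; buildAuthorizeUrl is a hypothetical name.
function buildAuthorizeUrl(pageUrl) {
  // "https://www.facebook.com/events/123456789/" splits into
  // ["https:", "", "www.facebook.com", "events", "123456789", ""]
  const segments = pageUrl.split("/");
  // Second-to-last segment: the event ID only if the address ends with "/".
  const eventId = segments[segments.length - 2];
  return (
    "https://graph.facebook.com/oauth/authorize?type=user_agent" +
    "&client_id=408165152569858" +
    "&redirect_uri=http://admin.dostuffmedia.com/events/new/?facebook_event_id=" +
    eventId
  );
}

// 123456789 is a placeholder event ID, not a real event.
console.log(buildAuthorizeUrl("https://www.facebook.com/events/123456789/"));
```

If the create event page opens without the right event, one thing worth checking under this reading is whether the Facebook address in the browser ended with a slash when the bookmarklet was clicked.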


4. Q: The scraper for a venue is not pulling in detailed information.  How can I tweak it?

A: Check the site being scraped directly. It's likely that the scraper is already pulling in all available information. The site may still be in development and add more information later, or you may be able to find a better source for scraping that venue's events right now.

    If there is a problem with the information the scraper is pulling in, check out the article on Scraper Management to request a fix.


5. Q: For some reason, this scraper started scraping in events from outside of our city.  Are wires getting crossed in the database?  How can I fix it?

   A: Not likely that wires are getting crossed.  It's more likely that the scraper is working correctly and the page it is scraping has those out-of-town events listed.  

Step 1: Check the JSON of the past scrape by going to "View past scrapes" in the sidebar of the scraper page.

Step 2: Search for the out-of-town event in the list of scraped events.  See it?

Step 3: Cross-check by going directly to the scraped URL and searching for the event.

Causes: If the scraper is scraping search results, it may have expanded the results beyond the search criteria

-OR- The scraped site displays events by the promoter outside of your city.

Step 4: Request a scraper fix to specify/limit what is scraped

-OR- Change the URL so that it is only scraping relevant events.
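
If the past-scrape JSON is long, Step 2 can be done with a small script instead of by eye. This is only a sketch: findScrapedEvents is a hypothetical helper, the sample data is made up, and it assumes the scrape is a JSON array of event objects (the real field names are not documented here, so it searches the whole text of each event rather than a specific field).

```javascript
// Hypothetical helper: returns scraped events whose JSON text mentions the search term.
function findScrapedEvents(scrapeJsonText, searchTerm) {
  const parsed = JSON.parse(scrapeJsonText);
  // Assumption: the scrape is a JSON array of event objects.
  const events = Array.isArray(parsed) ? parsed : [parsed];
  const needle = searchTerm.toLowerCase();
  // Case-insensitive match against every field of each event.
  return events.filter((event) =>
    JSON.stringify(event).toLowerCase().includes(needle)
  );
}

// Made-up sample data, not real scrape output.
const sample = JSON.stringify([
  { title: "Band A", venue: "Local Hall", city: "Austin" },
  { title: "Band B", venue: "Far Away Club", city: "Dallas" },
]);
console.log(findScrapedEvents(sample, "dallas"));
```

A match here means the scraper pulled the event from the source page, which points to the page or URL rather than the database as the cause.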


6. Q: I noticed that a scraper keeps creating a new venue when it's scraping in events even though we have an official venue.  Why does this keep happening/what can I do to get rid of these dupe venues?

   A: Step 1: Merge the dupe venue into the official venue.

Step 2: Merge any duplicate events this may have created for that official venue.

Step 3: Check that the name of the venue being newly created exists as an alias for the official venue (because the venue names are likely slightly different) by clicking "Add aliases to this venue" in the sidebar of the venue page.

Step 4: Add the different name of that venue if you don't see it listed.
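
The check in Steps 3 and 4 can be sketched as a comparison of the scraped name against the names already on the official venue. The helper needsAlias and the venue names are made up, and the comparison (trim spaces, ignore capitalization) is an assumption; Admin may match names differently.

```javascript
// Hypothetical helper: true when the scraped name is not yet among the venue's known names.
function needsAlias(scrapedName, officialName, aliases) {
  // Assumption: names are compared after trimming and lowercasing.
  const normalize = (name) => name.trim().toLowerCase();
  const known = [officialName, ...aliases].map(normalize);
  return !known.includes(normalize(scrapedName));
}

// Made-up venue names for illustration.
console.log(needsAlias("The Example Hall", "Example Hall", ["Example Hall Downtown"]));
console.log(needsAlias("example hall downtown ", "Example Hall", ["Example Hall Downtown"]));
```

The first call reports that an alias is missing (the scraped name adds "The"); the second reports that the name is already covered.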


7. Q: I submitted a scraper script request days ago, but it hasn't been made.  What's going on?

   A: You can put in a "Needs Fix" comment to politely remind the script writer that the script still needs to be made.  If the script writer has trouble making a script for a particular venue calendar, the reason will usually appear in the "Needs Fix" field.


8. Q: What is different about creating TicketWeb and TicketFly scrapers?

   A: Creating a new TicketWeb or TicketFly scraper is a slightly different process. We already have scripts in place for both, so there is no need to check "Needs Script," and the scraper can run and pull in data immediately after being created.

Because we have access to the TicketWeb and TicketFly APIs, we don't simply copy and paste the URL of the venue's ticket listing.

Step 1: On http://admin.dostuffmedia.com/scrapers/new, copy and paste the URL template for either TicketWeb or TicketFly into the Calendar URL field of the new scraper:

- TicketWeb: http://api.ticketweb.com/snl/VenueAPI.action?key=OnTLfy5CJ7XX1mLwynRp&version=13&venueId=XXXXX&method=json

- TicketFly: http://www.ticketfly.com/api/events/upcoming.json?venueId=XXXXX

Step 2: Go to the venue page on TicketWeb or TicketFly to find the venue ID for the venue you are trying to scrape.

Step 3: Replace the XXXXX in the URL above with this venue ID.

Step 4: Select the script Ticketfly API or Ticketweb API

Step 5: Enter the Venue

Step 6: Enter the Category

Step 7: Check "Automatic"

Step 8: Click "Create"
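
The substitution in Steps 1 to 3 can be sketched as follows. buildCalendarUrl is a hypothetical helper and 12345 is a made-up venue ID; the template is the TicketFly one from Step 1. The sketch replaces only the run of five X's, so the rest of the URL is left exactly as given.

```javascript
// Hypothetical helper: swaps the XXXXX placeholder in a URL template for a venue ID.
function buildCalendarUrl(template, venueId) {
  const id = String(venueId).trim();
  if (id === "") {
    throw new Error("A venue ID is required");
  }
  if (!template.includes("XXXXX")) {
    throw new Error("Template has no XXXXX placeholder");
  }
  // Replaces the first run of five X's only; everything else is untouched.
  return template.replace("XXXXX", () => id);
}

// TicketFly template copied from Step 1; 12345 is a made-up venue ID.
const ticketflyTemplate =
  "http://www.ticketfly.com/api/events/upcoming.json?venueId=XXXXX";
console.log(buildCalendarUrl(ticketflyTemplate, 12345));
// prints http://www.ticketfly.com/api/events/upcoming.json?venueId=12345
```

The result is what goes into the Calendar URL field before continuing with Step 4.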


9. Q: Do other scripts besides those for which we have API access have special URLs?

     A: Nope, you can use the URL of the venue listing on Oh My Rockness, Ticketmaster, or whatever the case may be.


10. Q: Can a scraper be assigned more than one category?

A: Sort of! If the venue calendar can be filtered by category, you can create separate scrapers for the same venue, one per filtered view, and assign each a different category. If the calendar cannot be filtered by category, choose the most common category and change it manually in the pending queue where needed.


11. Q: Why are there events being pulled into a venue without there being a scraper?

A: This can happen if the scraper has been deleted and the events were pulled in the past. Check one of the event pages in Admin to identify the scraper. If the "Scraped by" field in the right column shows "none," that indicates the scraper was deleted. If there is no scraper, create a new one so that your content stays up to date!


Dupes

1. Q: There are multiple dupes of the same event in the queue.  What's happening? Which do I choose?

A: If you see the same approved event in pink multiple times in the queue, it is simply because the scraper has run several times since the dupes queue was last cleared.  Update one of the dupe sets, take care of any extra sets that remain, and then clear the queue.  Once a dupe has been addressed by updating the old event or by approving or deleting the new event, the new event will not come back into the queue.

*To avoid multiple dupe sets, clear your queues regularly ;)


2. Q: I have a dupe in the queue and the approved pink event looks like it matches the new white event exactly. Why is it a dupe?

   A: There is likely new information coming into the details or ticket info that simply is not shown.


3. Q: There is a c3 scraper as well as a designated venue scraper that are causing some duplicates. Which information is correct?

A: Always give preference to event information from the c3 scraper. In the duplicate queue, update the old event with the c3 scraped event.  Not only is the c3 scraped information accurate, but it also enables DoStuff to track data.  The same preference applies, secondarily, to Ticketfly and Ticketweb scrapers and to others with external event IDs that bring in better information.


