
Using The TweetSentiments.com API

Posted on 28 Dec 2010 by Andy

Checking out my RSS feeds over the Christmas break, a post from ProgrammableWeb caught my eye - a sentiment analysis API for tweets. Digging a little deeper, a blog post from the authors showed that they had applied libSVM to build a great tool.

I had to try this out.

The API is incredibly easy to use - especially if you have used the Twitter API before. TweetSentiments.com essentially augment the JSON returned from a subset of the public Twitter API with sentiment data. So a search for tweets returns 20 tweets, each of which carries a sentiment grade, and the whole block carries an aggregated count and overall score.

Wines

I often test out SEO techniques using my girlfriend’s Wine Education site, so I thought I’d see what TweetSentiments.com thought of the most common wine varietals.

I whipped up a short perl script (a rough PHP equivalent is sketched after the list) to call the API for the following phrases:

  • Chardonnay
  • Sauvignon blanc
  • Semillon
  • Muscat
  • Pinot grigio
  • Pinot blanc
  • Riesling
  • Gewurztraminer
  • Syrah OR Shiraz
  • Merlot
  • Cabernet sauvignon
  • Malbec
  • Pinot noir
  • Zinfandel
  • Sangiovese
  • Barbera
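
For the curious, here’s roughly what the script does, sketched in PHP rather than my original perl. Note that the endpoint path and JSON field names below are illustrative placeholders, so check the TweetSentiments.com API documentation for the real ones.

<?php
// Rough PHP equivalent of my perl script. The endpoint path and the
// JSON field names are placeholders - consult the API docs.
$phrases = array('Chardonnay', 'Sauvignon blanc', 'Merlot', 'Pinot noir');

$out = fopen('wine_sentiment.csv', 'w');
fputcsv($out, array('phrase', 'score'));

foreach ($phrases as $phrase) {
    $url  = 'http://tweetsentiments.com/api/search.json?q=' . urlencode($phrase);
    $data = json_decode(file_get_contents($url), true);

    // Each response holds a sentiment grade per tweet plus an
    // aggregated count and overall score for the whole block.
    fputcsv($out, array($phrase, $data['aggregate']['score']));
}
fclose($out);
?>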

The results got dumped into a CSV file, which I then imported into a Google Spreadsheet for analysis (spreadsheet here).

The Results

The results could hardly be called scientifically rigorous, but that’s not the primary purpose of the exercise - I just wanted to play around with the TweetSentiments API.

The API found only twenty tweets for each of my search terms, so the small sample size has squeezed the results into a very narrow window. This graph shows that the wine with the highest sentiment score was Merlot, whilst the lowest was Pinot Noir.

Twitter Wine Sentiment Analysis Graph

Interesting choices for a period traditionally associated with eating roast turkey, when I’d have plumped for a fruity white such as a Chardonnay.

The next graph might add a little towards an explanation - opinion of the Merlot is not as divided as for the other wines:

Twitter Wine Sentiment Analysis Graph

So, what this tells us is that Americans really like Merlot (see the full results for Merlot at tweetsentiments.com).

Other wines, like Cabernet Sauvignon, got a larger number of tweets that were classified as neutral - sometimes incorrectly, sometimes because they used more technical terms to describe the wine. Here are the results for cab sav from tweetsentiments.com.

I used Google Trends to compare the top 5 and bottom 5 ranked wine varieties, and the results did not correlate very well with the tweetsentiments.com results.

That is to be expected - they do measure different things after all.

Conclusion

The tweetsentiments.com API is nicely constructed and easy to use but the results aren’t perfect (especially for this test).

It would certainly be interesting to track sentiment changes over time and a sharp swing towards negative sentiment would make a useful early warning indicator for brands in trouble.

One of the biggest take-aways from this exercise is that it is hard to extract meaningful data from the twitter stream. I have no doubt that there are people doing it, but they won’t be giving their data away via a free API.


More On Twitter Annotations

Posted on 26 Jun 2010 by Andy

Damon Cortesi (@dacort of Rowfeeder.com and Untitled Startup) was tweeting about the Twitter Annotations Hackfest (I was very jealous that I couldn’t participate):

Trying to come up with cool ideas for Twitter annotations - what do you wish a tweet could do?

After my last blog post on Twitter annotations was very well received, I’ve been thinking about how they might be used in practical applications.

Twitter Attachments

One of the immediate parallels to explore is attachments to emails. Multimedia attachments (MP3 files, video and pictures) are obvious and I’m sure that many, many developers are exploring those avenues. Other forms of email attachment offer some less crowded domains to play around in.

Sharing Microsoft Word or PowerPoint documents is pretty redundant over twitter: lots of tweets link to documents (in HTML), so uploading a document to share it has been done to death. Sharing privately via a public-private key exchange would be very useful though, and has similarities with attaching your PGP public key to emails (hat-tip to Ed Borasky for the original idea).

Another common use of email attachments is to exchange contact details and it’s a common pattern in twitter backgrounds (like mine). This thinking led me to tweet an idea:

@dacort I want to send a vCard (or hCard) with a tweet and easily add it to my email & phone contacts.

Rich twitter clients for smart phones could make great use of this, but for desktop or web-based clients we’d need a standard API for contact management.

Commenting Via Twitter

Another interesting application for Twitter annotations would be to allow users to attach a comment to any URI. Just like blog comments, but remember that in the time of the semantic web anything can be identified by a URI.

Many systems like Disqus and Intense Debate already pick out URL mentions from the Twitter public stream, but they are treated much like trackbacks rather than adding to the conversation. Twitter clients (and/or blog plugins) could use annotation metadata with technologies like PubSubHubbub or Salmon to integrate the conversation tightly back into the blog post, and twitterers could enhance their comments with tags, star ratings and more.

Just having one hundred and forty characters would severely restrict blog comments though - the average length of non-spam comments on this blog (HTML stripped) is 281 characters. Many are much longer. Mobile friendly blogs and tumblogs are likely to benefit more from Twitter comment integration.

How about the ability to pull up a stream of tweets about a product you’re thinking of buying, identified by its barcode? Or a restaurant you are stood outside? Hotels, movies, travel destinations and more could really benefit from the realtime semantic web.

There are many sites that present this kind of information already (Yelp, TripAdvisor etc) but they don’t offer the realtime element of twitter - nor the social features.

Here is a great presentation by Joshua Shinavier on the power of semantic twitter annotations.

Searching Twitter Annotations

A hugely important step in unleashing the full potential of twitter annotations will be powerful and comprehensive search.

Remember how annotations are specified by namespace, key and value? They need to be searchable on those fields too, but in a sensible manner.
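
To make that concrete, here’s a sketch of the sort of annotation payload a tweet might carry. The "review" namespace and its keys are my own invention for illustration - no standard exists yet:

<?php
// A hypothetical movie review annotation: a namespace ("review")
// containing key/value pairs. The names are invented, not a standard.
$annotations = array(
    'review' => array(
        'title'  => 'Twilight',
        'rating' => '2',
        'url'    => 'http://www.imdb.com/title/tt1099212/',
    ),
);
echo json_encode($annotations); // sent alongside the tweet text
?>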

Search must be available on namespace alone, or on namespace, key and value together. I cannot think of any practical application of searching on namespace and key (with no value) - if you can, please leave a comment!

Examples of searching on namespace alone are searching for all tweets annotated with RDF, or all tweets with video.

Search against namespace, key and value could return all tweets with reviews of the Twilight movie, or tweets where people checked into a particular restaurant.

There has been no mention from Twitter as to how annotations will be handled in their Search API, but I am confident that it will be available soon after the full launch of annotations. If twitter don’t provide it themselves, rest assured that someone else will step up.

Annotations Are For Humans

One of the really heartening aspects to emerge from the Annotations Hackfest and the developer documentation is that annotations are being used to enhance the user experience - not simply feed hungry bots.

Twitter have recommended that common attributes for annotations might include a title, URL and image. That means they want users to interact with the meta-data and that’s a good thing!

One of the Hackfest projects even put together a "Rich Tweet Format" including Twitter Style Sheets - CSS for tweets. This might be an awful thing - how long before your twitter stream is as ugly as a teenager’s MySpace page? - but it shows how developers are working to create and share standards that bring more functionality to the user.


Rowfeeder Produces Great Test Data

Posted on 27 Mar 2010 by Andy

When Damon Cortesi announced the Rowfeeder application a while back, my interest was piqued. I’d been meaning to try out a OneForty application and Rowfeeder looked like something I could make good use of.

OneForty is a directory of twitter applications that allows developers to receive donations or charge for their hard work - kind of like Apple’s App Store for the iPhone.

RowFeeder logo

Rowfeeder is an app that will dump the results of a twitter search into a Google Spreadsheet over a period of 48 hours. It provides a bit more information than the standard Twitter Search API as it includes the number of friends and followers for each tweeter. It would be pretty straightforward to hack together a script that does this, but since it only costs $2.49 (USD), I would say that writing your own is a waste of time.

The OneForty Experience

I signed up to oneforty.com and immediately purchased a search for wine as I know there are a good many wine drinkers and wineries that use twitter on a regular basis.

As usual, I created a new email address for the sign up, as I like to track where spam comes from and who sells their email lists. This turned out to be a mistake as the Google Spreadsheet was being shared with this new address, so I quickly created a new Google Account for it.

Payment was painless, with PayPal used to collect the money.

Rowfeeder started working almost immediately and I could watch the spreadsheet being populated as it was working. After the forty-eight hours were up, I had about 41,000 rows in the document.

The Results

As advertised, I had a spreadsheet with a couple of days’ worth of tweets mentioning the word "wine". One issue is that the timestamp of each tweet does not contain a timezone indicator (I hate that), but @dacort told me that they are PST. I exported the Google spreadsheet to a CSV file and used it to test a term extractor that I have been working on.

It turned out that my massive investment of $2.49 was well worth it - the data showed up a couple of nasty bugs in my code. With the bugs fixed, I can add another unit test to my suite.

There is some bad news which has nothing to do with Rowfeeder - a lot of twitter is just crap. Those of us with carefully curated streams of interesting people will be surprised at the gibberish that people tweet.

Got dat cleanup spell &dis shit need 2 wear off so I can wind dwn. Let me try sme wine.

Ummmm, what?

Any developers hoping to leverage the firehose to extract anything meaningful are going to need to build some kind of idiot filter.




Securing User Generated Content

Posted on 13 Nov 2009 by Andy

I noticed a post on Dom’s blog asking for suggestions on how to prevent exploits in user submitted HTML using PHP and thought that I’d post an in-depth response regarding the security practices that should be followed when designing and building a site that accepts UGC.

What Is UGC?

User generated content is at the heart of most web 2.0 sites, from Facebook to Delicious via Digg, Flickr and Twitter. All these sites generate loads of traffic from data that their userbase submits for free - which sounds like a great deal, until a malicious user discovers an exploit. Suddenly the site is awash with viagra spam, malware and popups, and legitimate users soon leave.

UGC simply means anything that your users enter which is later displayed on your website.

This includes usernames, comments, email addresses, blog entries (if you run a blogging platform), tweets etc. Note that this data doesn’t have to come from a form served by your server - JSON, XML, RSS feeds and third-party adverts can all contain undesirable markup.

Barbed wire

Types Of Exploits Found In UGC

Exploits can take many forms but two of the most common are cross site scripting (XSS) and SQL injection, both of which can and should be prevented in server-side code.

Note that it is no use simply relying on javascript to validate your forms in the user’s browser - not all users have javascript enabled and malicious users can bypass the browser altogether. Use server-side validation.

A Common XSS Attack

XSS attacks (almost) always involve the bad guys adding some javascript to your web application that will be executed on your users’ web browsers. This sort of vulnerability can be very damaging if your web application has a password protected area for users.

An example of an XSS exploit could be found on a social website (with features such as those found on Facebook or Bebo). A bad guy creates a profile and lists his homepage as:

javascript:alert("You have been infected with a virus. Visit www.crappyav.info to remove it!"); return false;

Then the bad guy runs a script that befriends every user he can find. Many of them will click on his homepage URL to see what he is all about. When they do, the javascript runs and they’re greeted with the fake virus alert.

More inventive attackers could use the exploit to automate friend messages, send spam, show viagra adverts or even scrape sensitive data - all of which is very damaging for the site’s reputation and means that the owner will be busy cleaning up for a long time.

Typical SQL Injection Attacks

Most UGC is stored in a relational database such as MySQL. SQL injection attacks exploit lazy (or naive) programmers who build up strings of SQL containing the raw user-supplied data, such as this example that searches for a user with the supplied name:

$sql = "SELECT * FROM users WHERE username = '" . $_POST['username'] . "'";

This will create a simple SQL select statement, like this (if I supply the username andy):

SELECT * FROM users WHERE username = 'andy'

A piece of working SQL, but it is vulnerable. An attacker might enter a username like '; DELETE FROM users; --, which would result in the following SQL being created:

SELECT * FROM users WHERE username = ''; DELETE FROM users; --';

Readers familiar with SQL should now be experiencing a deep sense of dread (and possibly awe at the inventiveness of some people). The SQL code would delete every single entry from the users table if it were executed, even though the author thought he was writing a read-only SELECT statement. Scary stuff!

Preventing SQL Injection Attacks

Preventing attacks like the SQL injection outlined above is quite straightforward if you use a feature common to almost all RDBMS clients - prepared statements with bind variables.

Using bind variables would mean that the SQL would change to:

SELECT * FROM users WHERE username = ?

NB: The actual syntax of bind variables can vary between RDBMS platforms

So, whatever strings you provide to the database will behave just as you expect - they cannot be interpreted as SQL. There are significant performance benefits too.
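
In PHP, for example, the vulnerable lookup from earlier can be rewritten with PDO prepared statements (the connection details here are placeholders):

<?php
// Safe version of the username lookup using a prepared statement.
$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'dbuser', 'dbpass');

$stmt = $pdo->prepare('SELECT * FROM users WHERE username = ?');
$stmt->execute(array($_POST['username']));
$user = $stmt->fetch(PDO::FETCH_ASSOC);

// Even a malicious username like "'; DELETE FROM users; --" is now
// treated as a plain string value, never parsed as SQL.
?>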

Guarding Against Cross Site Scripting Attacks

There is no simple action to prevent XSS attacks - you must analyse all possible user inputs and determine how they must be sanitised.

Regular expressions can help determine if a user supplies invalid data, but don’t just rely on regexps - specify minimum and maximum lengths too. Determining what format the user-supplied data will take is vital for performing further content-specific checks.

Plain Text Input

Usernames and status updates usually take the form of plain text, but you will need to allow only a restricted character set (don’t forget that hackers can send a string of backspace or escape characters).

Other issues to consider for plain text input are:

  • Is the data case sensitive?
  • Internationalisation - will you support accented characters, or even non-latin character sets?
  • Will a username be used to create a URL, as Twitter does?
  • Is the data displayed publicly? If so, it would be best to prevent people from using their email address as their username.

Remember to output your strings with correct HTML encoding. See PHP’s htmlentities(), Perl’s CGI module, Ruby’s CGI.escapeHTML() etc.
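
For example, in PHP:

<?php
// Encode user-supplied text at output time, so any markup it contains
// is displayed rather than interpreted by the browser.
echo htmlentities($username, ENT_QUOTES, 'UTF-8');
?>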

URL Input

It is common for social sites and blogs to allow members or commenters to link to their homepage. In this case, you probably want to restrict the protocol of the URL to just HTTP and HTTPS (definitely not javascript:).
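
A minimal protocol check in PHP might look like this (a sketch - combine it with the other checks discussed below):

<?php
// Reject any URL whose scheme isn't plain http or https.
function is_allowed_url($url) {
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return in_array(strtolower((string) $scheme), array('http', 'https'));
}

// is_allowed_url('javascript:alert(1)') => false
// is_allowed_url('http://example.com/') => true
?>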

Consider very carefully whether you should nofollow the links - if you don’t endorse the page, or it might be an affiliate link then you should.

You also need to have a blacklist of disallowed domains which should include most common URL shorteners. URL shorteners can be abused (get a list of them here).

Other domains are often targets for spammers, so make sure that your list can be easily edited. You might want to consider using regular expressions here too, so that you could (for example) block *.blogspot.com.
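
Here’s a sketch of that kind of easily edited blacklist - the patterns shown are examples only:

<?php
// Domain blacklist, kept in one easily edited place.
$blocked_patterns = array(
    '/(^|\.)bit\.ly$/i',        // an example URL shortener
    '/(^|\.)blogspot\.com$/i',  // blocks every *.blogspot.com subdomain
);

function is_blocked_domain($url, $patterns) {
    $host = parse_url($url, PHP_URL_HOST);
    if (empty($host)) {
        return true; // reject unparseable URLs outright
    }
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $host)) {
            return true;
        }
    }
    return false;
}
?>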

Another check that your legitimate users will find reassuring is to test the URL against Google’s Safe Browsing API.

File or Image Upload

If you allow users to upload their own avatars to your site you need to check them thoroughly. Only allow files in known formats, and check the MIME type, not the file extension.
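
In PHP, the fileinfo extension will sniff the actual file contents, for example:

<?php
// Determine the uploaded file's real MIME type - never trust the file
// extension or the client-supplied type in $_FILES.
$allowed = array('image/jpeg', 'image/png', 'image/gif');

$finfo = finfo_open(FILEINFO_MIME_TYPE);
$mime  = finfo_file($finfo, $_FILES['avatar']['tmp_name']);
finfo_close($finfo);

if (!in_array($mime, $allowed, true)) {
    die('Sorry, avatars must be JPEG, PNG or GIF images.');
}
?>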

While not strictly an XSS issue, multimedia files have been hacked before to cause buffer overruns so ensure that you keep your server packages up to date with the latest patches.

Ensure that images are within specified size parameters or resize them on the server (to prevent page widening).

HTML Input

Sanitising HTML input is a more complex process than other types of input, since HTML often contains unclosed tags, implicit attributes and general hackery.

Most HTML sanitisation methods involve building a parse tree from the input and traversing the tree to discard any elements and attributes that are deemed undesirable. Define a whitelist of allowed element/attribute combinations, NOT a blacklist.

Take extra care in preventing attributes like ONCLICK and ONMOUSEOVER in all HTML elements. Beware of the STYLE attribute too.

For most HTML user generated content, the only element that should be allowed an attribute is the <A> (anchor) element, and the only attribute that it is allowed is HREF (in some circumstances you might allow <IMG> with SRC and ALT). Take care with the URLs supplied to HREF and SRC attributes, see the section above on URL input for validation recommendations.

Look for HTML sanitizers for your preferred language - lots of other coders will have solved this problem before.
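
To illustrate the parse-tree approach, here is a deliberately minimal PHP sketch using DOMDocument. It allows only <A HREF>, <EM> and <STRONG>; a production site should reach for a maintained library such as HTML Purifier instead.

<?php
// Minimal whitelist sanitiser sketch - for illustration only.
function sanitise_html($dirty) {
    $allowed = array('a' => array('href'), 'em' => array(), 'strong' => array());

    $doc = new DOMDocument();
    @$doc->loadHTML('<div id="ugc">' . $dirty . '</div>'); // @ hides warnings from broken markup

    $xpath = new DOMXPath($doc);
    $nodes = array();
    foreach ($xpath->query('//div[@id="ugc"]//*') as $n) {
        $nodes[] = $n; // copy first, as we'll be modifying the tree
    }

    foreach ($nodes as $node) {
        if ($node->parentNode === null) {
            continue; // already dropped along with a disallowed ancestor
        }
        $tag = strtolower($node->nodeName);
        if (!isset($allowed[$tag])) {
            // Replace a disallowed element with its plain text content
            $node->parentNode->replaceChild($doc->createTextNode($node->textContent), $node);
        } else {
            // Strip every attribute not whitelisted for this tag - this
            // kills ONCLICK, ONMOUSEOVER, STYLE and friends
            for ($i = $node->attributes->length - 1; $i >= 0; $i--) {
                $name = $node->attributes->item($i)->name;
                if (!in_array(strtolower($name), $allowed[$tag])) {
                    $node->removeAttribute($name);
                }
            }
        }
    }

    // Serialise just the sanitised fragment (remember to vet any HREF
    // values as per the URL input section above)
    $clean = '';
    foreach ($xpath->query('//div[@id="ugc"]')->item(0)->childNodes as $child) {
        $clean .= $doc->saveHTML($child);
    }
    return $clean;
}
?>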

Checking for spam is also useful when accepting user supplied HTML content. Automattic’s Akismet has a great API and good third-party library support to help you out with this.

RSS Import

If you allow users to import an RSS feed from an untrusted domain, you need to do more than just validate it against the XML Schema Definition. You need to treat titles as plain text, descriptions as HTML and links as URLs, as discussed elsewhere in this post.

Other Precautions

Aside from the XSS and SQL injection issues, there are a number of other sensible precautions that web application developers can take to minimise the impact of malicious users. You might think that you won’t need these yet, but my advice is to get them in place before you do. If your site comes under sustained attack, you’ll have plenty on your plate without needing to code and test some defenses.

IP Blacklist

Recording and testing against a set of IP addresses that are banned from your application is an excellent precaution and will deter many script kiddies and wannabe blackhats. Place the check early in your application code to reduce server load.
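
The simplest form is just a lookup, run before anything expensive. A sketch - in practice the list would live in a database or config file:

<?php
// Ban check, run as early as possible in the request cycle.
$banned_ips = array('192.0.2.1', '198.51.100.7'); // example addresses

if (in_array($_SERVER['REMOTE_ADDR'], $banned_ips, true)) {
    header('HTTP/1.1 403 Forbidden');
    exit('Access denied.');
}
?>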

More sophisticated systems could use a variety of inputs (length of membership, country, IP address, number of previous posts, etc) together with a binary classifier to determine whether a user action is undesirable.

Rate Limiting

Scripts can type a lot faster than humans, so any user posting updates many times per second is likely to have an evil intent. You can modify your session management code to slow and eventually lock out such abuse.
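
A crude sliding-window sketch using PHP sessions - the window and limit values here are arbitrary examples:

<?php
session_start();

$window = 60;  // seconds
$limit  = 10;  // maximum posts allowed per window
$now    = time();

// Keep only the timestamps that fall inside the current window.
$history = isset($_SESSION['post_times']) ? $_SESSION['post_times'] : array();
$recent  = array();
foreach ($history as $t) {
    if ($now - $t < $window) {
        $recent[] = $t;
    }
}

if (count($recent) >= $limit) {
    exit('You are posting too quickly - please slow down.');
}

$recent[] = $now;
$_SESSION['post_times'] = $recent;
?>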

DDOS Protection

Talk to your hosting company about this (there are specialist consultants that can help you with this too).

Backups

This really goes without saying - get your backup strategy sorted out now and test it regularly! Then test it again.

Sensible, Helpful Error Messages

Remember that the vast majority of your users will have good intentions. They might mis-type an email address or not understand exactly what a URL is, so provide useful feedback when displaying an error message.

Use simple language and be very clear about just what is and isn’t allowed in each field.

This Is Lots Of Work

Yes it is, and it’s worth it. Taking solid, sensible precautions like these makes the difference between throwing something together and engineering.




Creative Commons licensed photo by Tancread.


Get The MMMeeja Toolbar For Your Browser

Posted on 17 Oct 2009 by Andy

I’ve been experimenting with the Conduit toolbar system and have come up with what I think is a useful offering (and I’ve been impressed with Conduit’s product in the process).

The MMMeeja toolbar works on Internet Explorer, Mozilla Firefox and Safari and on Windows, Mac or Linux. Like most browser toolbars, it adds a horizontal menu of extra functionality across the top of your browser pane.

The toolbar after installation

Functionality Offered By The Toolbar

All Conduit toolbars come with a Google search box built-in, as the Conduit company make money via an advertising revenue sharing deal with Google, but the rest of the toolbar is completely customisable by the creator. The tools that I chose to add are:

  • A feed of the latest posts from this blog (got to have some self-promotion!)
  • A ShareThis button, allowing you to quickly and easily share pages with a huge variety of social networks. Supported networks include:
    • Digg
    • Delicious.com
    • Reddit
    • Yahoo Buzz
    • Stumbleupon
    • Facebook
    • Twitter
    • And loads more
  • A gadgets menu that allows you to jot down quick notes as you surf. You can also customise your toolbar installation by adding more gadgets (there is a large range to choose from).
  • A domainers menu that provides quick and easy access to whois, DNS, ping and keyword lookups.
  • An SEO toolbar that can display the Alexa data, Google PageRank, Google/Yahoo/Bing indexed pages, backlinks etc for the current page.

Hopefully, readers of this blog will find the toolbar useful. Let me know via the comments if you have any suggestions for more tools.

Download The Toolbar

You can download our toolbar by clicking on this button:

FREE DOWNLOAD

Make Your Own Toolbar With Conduit

I found the whole process to be pretty easy and intuitive, although I haven’t tried building any custom applications or gadgets yet (again, any suggestions are welcome).

If you need a little help getting started, try searching YouTube as there are lots of screencasts and howto videos on there.


Use Robots.txt To Prevent Yahoo Pipes & YQL From Scraping Your Site

Posted on 10 Jul 2009 by Andy

Yahoo’s Open Search strategy is great news for mashup developers, but it could also be used by scrapers to grab your content and republish it on their own sites. Thankfully, Yahoo play by the rules and honour the robots exclusion protocol. This article will help you to block the services that can be used to scrape or remix your content.

None of the robots.txt changes will prevent die-hard blackhats, though, because they are unlikely to be using Yahoo tools for their nefarious activities. Most blackhats will have their own toolkits so you would need to go through your server logs to look for patterns of IP address, user agent, cookie use etc to block them effectively.

It should be noted that I fully support Yahoo’s efforts to open up the search results and that I only recommend blocking their crawlers if you’re specifically having problems with content theft. I haven’t implemented any of these techniques on this blog, so you can remix away to your heart’s content.

I do take content theft and scraping seriously though, so I regularly use tools like CopyScape and FairShare to check for plagiarism. I report all content theft to Google via its webmaster tools (which usually delists the site and stops the problem) and if problems persist I’ll contact the infringer’s ISP to get their site shut down. Luckily, I’ve not (yet) had to take any further action - probably a sign of the blog’s unpopularity :-)

Yahoo Pipes

One of the most common ways that your content gets scraped is via your blog’s RSS feed and Yahoo Pipes is a handy tool for tweaking and mashing RSS feeds. If you are publishing your feed through Feedburner, blocking Yahoo Pipes is easy, just follow these steps:

  1. Log in to your Feedburner account and choose the feed that you are concerned about
  2. Click the “publicize” tab and then the “NoIndex” service
  3. You can then choose to block Yahoo Pipes by clicking the second check box - don’t forget to activate the service

Block Yahoo Pipes from Feedburner

If you don’t use a service like Feedburner but serve your RSS feed yourself, you need to block using either server configuration changes or robots.txt. From the FAQ, we can tell that the Yahoo Pipes user agent is “Yahoo Pipes 1.0”, so add the following to your robots.txt file:

User-agent: Yahoo Pipes 1.0
Disallow: /

YQL

I found it quite hard to find any information on YQL’s user agent string and ended up asking on Twitter. @jonathantrevor provided the answer:

YQL uses "Yahoo Pipes 2.0 " for fetching robots to see if its allowed, and then uses mozilla for the content

My tests confirmed this as correct.

Yahoo BOSS

Yahoo BOSS allows you to create your own search engines using Yahoo’s data (like my recipe search engine).

As it uses Yahoo’s search index, the only way to prevent access to your content through BOSS is to block Yahoo’s search crawler, Slurp, which is probably not what you want.

There are rumours of closer integration between BOSS and some of the other services mentioned here, so if/when that happens the other blocking methods given here should apply.

Search Monkey Data Services

Search Monkey is a technology that allows developers to create widgets to embed into the Yahoo search results (which I mentioned before). It behaves differently to YQL and Pipes in that it does not use a web crawler, so you cannot use a robots.txt entry to deny access. Instead, you can modify your web server configuration to deny access to the user agent.

Search Monkey user agent strings are quite distinctive, so you can alter your httpd.conf or even .htaccess files (if your host allows) to deny access with the following code:

SetEnvIfNoCase User-Agent "Yahoo! SearchMonkey 1.0" noMonkey
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=noMonkey
</Limit>

This code was taken from the SearchMonkey user guide, which also lists an email address that you can contact to have your pages blocked at the source.

Putting It All Together

If you want to block all of these services, you’ll need to add the following to your robots.txt file:

User-agent: Yahoo Pipes 1.0
Disallow: /

User-agent: Yahoo Pipes 2.0
Disallow: /

Add this code to your server configuration too:

SetEnvIfNoCase User-Agent "Yahoo! SearchMonkey 1.0" noMonkey
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=noMonkey
</Limit>

There you have it, a few configuration changes to block opportunist scrapers.




How to Make A Recipe Search With Yahoo BOSS And PHP

Posted on 31 May 2009 by Andy

Recipe search engines are all the rage these days, with a new example cropping up every week. This in-depth tutorial will show you how to make your own and provides working code for you to download.

We’ll be using Yahoo’s BOSS to do the hard work for us, so the first thing you need to do is sign up for a Yahoo account, if you don’t have one already.

Yahoo BOSS is a great service that allows you to use their search engine and brand the results however you like. It is completely free for most uses, with fees only applying if you use it more than 10,000 times per day.

Get A Yahoo BOSS Application ID

This is pretty straightforward. Fill in the form here and make a note of the long, hexadecimal App ID that gets spat out afterwards. Here is an example of how to fill out the form:

Yahoo BOSS application form

We’ll be using BOSS to provide search results across a range of recipe sites, so we don’t need to do anything difficult (like creating a data service). It’s just like entering a site: operator into the normal Yahoo search.

I’ve chosen these sites to search, but you can easily modify the script to supply your own choices:

Build A Search Front-End

Next, we need a web front-end to accept user input, call the BOSS service and then display the results. There’s some ready-made PHP code for you to download here, which we’ll go through to get a thorough understanding of just what is going on. The snippets below are simplified sketches of the full script, so treat the parameter and field names as a guide and check them against the BOSS documentation.

The code performs the following actions:

  1. Display a header and search form
  2. Check if the form has been filled and submitted, if it has:
    1. Send the query to our BOSS service
    2. Parse the reply from BOSS
    3. If there were some search results
      1. Display the search results
    4. Else
      1. Print “No results found”
  3. That’s it!

Note: I’m going to assume that the reader is familiar with basic HTML and PHP code. If you’re new to this sort of stuff, there are some great tutorials here and here.

The results are output pretty much unstyled so you should put your HTML skills to work on prettifying the presentation.

So, on to the code...

Display A Header & Search Form

The header will extract and sanitise any input variables, then output an HTML header.

Note: You need to replace YOUR_APP_ID with the Yahoo BOSS application ID you got at the beginning of this tutorial.

The search form is quite straightforward - see this tutorial on HTML forms if you’re unsure of what is going on here.
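
In sketch form, the top of the script looks something like this:

<?php
// Replace YOUR_APP_ID with the BOSS application ID from earlier.
define('APP_ID', 'YOUR_APP_ID');

// Extract and sanitise the only input we care about.
$terms = isset($_GET['terms']) ? trim($_GET['terms']) : '';
?>
<html>
<head><title>Recipe Search</title></head>
<body>
<h1>Recipe Search</h1>
<form method="get" action="">
  <input type="text" name="terms"
         value="<?php echo htmlentities($terms, ENT_QUOTES, 'UTF-8'); ?>" />
  <input type="submit" value="Search" />
</form>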

Was The Form Filled Previously?

We just need to check for the presence of the search terms:
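
<?php
// Only call BOSS if the user actually submitted a query.
if ($terms !== '') {
    // ...fetch and display the results (see the next sections)
}
?>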

Send The Query To BOSS

It is a simple HTTP GET, so PHP’s file_get_contents() will work (unless your host has disabled it, in which case read this for a workaround).

The script builds a URL using our application ID, list of sites to search and the query string. This URL returns the results in JSON, which is easy to parse into a PHP structure.
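
Simplified, the request looks like this - I believe the sites parameter is what restricts results to the chosen domains, but verify the parameter names against the BOSS documentation:

<?php
// Build the BOSS v1 web search URL and decode the JSON reply.
// Substitute your own list of recipe sites.
$sites = 'recipes.example.com,food.example.org';

$url = 'http://boss.yahooapis.com/ysearch/web/v1/' . urlencode($terms)
     . '?appid=' . APP_ID
     . '&format=json'
     . '&sites=' . $sites;

$response = json_decode(file_get_contents($url), true);
?>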

Did We Get Some Results?

Error checking is important - we don’t want to confuse users if there was a problem - so we’ll check the output carefully. We must also check whether any results were found.
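
Something along these lines - the field names are from the BOSS JSON response as I remember it, so verify them against a live reply:

<?php
// Pull the result set out defensively - a missing key means either an
// error from BOSS or zero hits.
$results = array();
if (is_array($response) && isset($response['ysearchresponse']['resultset_web'])) {
    $results = $response['ysearchresponse']['resultset_web'];
}

if (empty($results)) {
    echo '<p>No results found</p>';
}
?>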

Loop Through The Results And Display Them

This is pretty easy but I’ve not styled the output in any way. There are CSS classes in the output though, so you could just add your own external stylesheet to the header.
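
The loop itself is short. The CSS class names are just the ones I happened to pick:

<?php
foreach ($results as $result) {
    echo '<div class="result">';
    echo '<a class="result-title" href="'
       . htmlentities($result['url'], ENT_QUOTES, 'UTF-8') . '">'
       . $result['title'] . '</a>';
    echo '<p class="result-abstract">' . $result['abstract'] . '</p>';
    echo '</div>';
}
?>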

Try It Out And Download The Script

You can see my script in action here and the source is available for download here.

When you try it, you’ll see that it is very rough but it should provide a reasonable foundation for you to build something upon. Here are some suggestions for improvement:

  • Style everything - add a CSS stylesheet and make the output pretty
  • Add some analytics - find out what your users are searching for
  • Show recent searches - if you want to dive into the code, you could show the last ten searches by recording them in a database
  • Display some adverts - make a bit of cash
  • Add an OpenSearch plug-in - maybe even break out the AJAX and provide search suggestions

I hope you find this useful, both as inspiration for building a recipe search and an introduction to the power of Yahoo BOSS. If you have any questions or suggestions then please leave a comment and don’t forget to subscribe to the feed for more articles like this.


Creative Commons licensed picture by silver marquis.


Yahoo Search Monkey App For Twitter Profiles

Posted on 05 May 2009 by Andy

I’ve been meaning to get to grips with Yahoo’s Search Monkey technology for a while but put it off because the documentation wasn’t great for beginners. Then I discovered this excellent resource on a Google Code wiki and had a few hours spare, so I got stuck in.

The result is a Search Monkey application that enhances the Yahoo search results with a Twitter user’s profile information when their profile appears as a result of your search:

Screen grab of the Search Monkey application in action

What Is Yahoo Search Monkey?

Search Monkey is Yahoo’s technology that allows developers and website owners to add widgets to a set of search results. Yahoo have really stolen a march on Google with this enhancement and hopefully, their open, developer-oriented approach will give them a competitive edge over the Big G.

Developers cannot force their Search Monkey applications on users; instead, a user chooses to enhance his/her results by picking applications from a gallery of available enhancements or by going to the application’s homepage. If the user is logged in to Yahoo, they can click the add button to enhance their Yahoo SERPs:

Screen grab of the Search Monkey application in action

SEO Aside: User choice in the enhanced SERPs is a pretty powerful differentiation from Google’s offering. A heavy social media user would choose an entirely different set of enhancements to an online shopper.

Building A Search Monkey App

There are two aspects to every Search Monkey app - the data and the presentation.

The data aspect provides the extra information that is used to enhance the SERPs. This can be gleaned by scraping the page or through a third-party API. For my application, I chose to scrape the Twitter profile page for the user’s:

  • Display name
  • Avatar
  • Follower count
  • Following count

Yahoo’s page scraping is pretty sweet - the page gets put through HTML Tidy to fix any validation issues and then it runs an XSL Transform to grab the data you want.

XSLT is not the easiest technology to understand, but most pages can be scraped with very simple transforms. One aspect that threw me at first is that HTML Tidy makes all element tags uppercase, so make sure your XSLT uses capitals in the XPath declarations.

Want To Try It Out?

Trying out my application is easy, just follow these instructions:

  1. Log in to your Yahoo account (or sign up if you don’t have one yet)
  2. Head over to the application’s homepage, by clicking this link
  3. Click the “Add” button to add the app to your search results
  4. Test it out by searching Yahoo for a Twitter username and look for the little bird icon below the Twitter profile page in the search results
  5. Click the down arrow to show extra information

I hope you like it. Let me know what you think by leaving a comment below, or if you have an idea for a Search Monkey application, leave a comment and maybe I’ll code it up for you.


How To Export An RSS Feed From Google Spreadsheets

Posted on 25 Apr 2009 by Andy

I’m a huge fan of RSS feeds and often use them to wire together simple mashups or power more complex web programs. Most online applications need some configuration data, such as a list of data sources, users, locations or timestamps and I don’t like to hard-code these into my programs. It’s much better to use a datastore for configuration information like this.

RSS

When building a mashup of data and services stored in the cloud, the configuration data should be stored (and edited) online too. You can collaborate with others and use delicious.com as a database of URLs, as described by Jon Udell, but if you want a nice online form for your configuration data, Google Spreadsheets is the way to go.

It is easy to get data into the spreadsheet by publishing a form, but how do you get an RSS feed out of the Google Spreadsheet? It’s not obvious, but Google Documents provides this functionality too. Follow these instructions to publish your spreadsheet as an RSS feed:

  1. Create your spreadsheet and save it, as normal
  2. Click the big blue “Share” button and choose “Publish as a web page”
  3. After publishing, the popup window will show a “Subscribe” link. This is the URL of your RSS feed
  4. Tick the option to automatically republish as changes are made. There is also an option to choose a specific sheet from your document

Popup window showing the RSS link

So there we have it. You can create a nice form to populate a Google Spreadsheet with data, then publish an RSS feed and use it in your mashups.
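
Consuming the feed from a mashup is then trivial. Here’s a sketch in PHP, assuming the RSS flavour of the published feed - drop in the URL that the “Subscribe” link gave you:

<?php
// Read configuration rows back out of the published spreadsheet feed.
$feed = simplexml_load_file('YOUR_SUBSCRIBE_LINK_URL');

foreach ($feed->channel->item as $item) {
    echo $item->title . "\n"; // one spreadsheet row per feed item
}
?>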

If you like RSS feeds, don’t forget to sign up to the MMMeeja RSS feed.


Creative Commons licensed photo by lumaxart.


Using A JSON Callback To Defeat The Same Origin Policy In AJAX

Posted on 12 Mar 2009 by Andy

So you’re writing a cool AJAXified mashup, grabbing data from various APIs and munging it together in the browser? If you’re seeing this Firefox error in Firebug, you want to read the rest of the article:

uncaught exception: Access to restricted URI denied (NS_ERROR_DOM_BAD_URI)

You’ll see similar errors in Opera and Safari, whilst Internet Explorer will happily service the request.

What Causes NS_ERROR_DOM_BAD_URI?

The exception gets raised when you send an HTTP GET request to a different domain via an XMLHttpRequest object.

If you load some javascript into your page from http://www.mydomain.com/code.js you can only make AJAX calls to URLs on the server at www.mydomain.com. Yes, subdomains count as different domains too!

This restriction is called the Same Origin Policy and it’s generally a very good thing™ as it makes our browsers much more secure. It does cause headaches for those of us working with JSON over XMLHttpRequest but don’t worry, help is at hand.

How To Fix NS_ERROR_DOM_BAD_URI?

You can install a proxy at www.mydomain.com to get the data and send it on to your page but this is an incredibly bad idea:

  • The security implications are manifold
  • It slows down data load
  • It increases server bandwidth

A much better solution is to programmatically add a new <script> tag into the page’s header to load your data.

You can do this with a bit of code like this:


// the callback that the API will wrap the returned JSON in
function handleStuff(data) { /* use the data here */ }

var script = document.createElement('script');
script.setAttribute('src', 'http://www.anotherdomain/api/stuff.json?callback=handleStuff');
document.getElementsByTagName('head')[0].appendChild(script);

Or you could use jQuery’s JSONP to do the hard work for you.

In both cases you need to supply a javascript callback function that will be executed when the request completes and the data is returned. This will only work for JSON APIs that include a callback parameter (in some cases it’s called json_callback). Thankfully there are a lot of APIs that do this.

JSON APIs That Support Callbacks

Pretty much every Yahoo API supports the callback parameter (that includes Flickr and del.icio.us too).

Google’s Javascript APIs tend to use wrappers to work around the problem (and constrain how you manipulate the results) so there’s little for you to do here. One exception to this is the YouTube JSON API, which does include a callback.

Music mashup authors will be pleased to note that Last.fm definitely provides callback support.

There are lots more APIs that support JSON callbacks - have a look through the excellent documentation at Programmable Web to get some ideas.

API Providers, Please Support Callbacks

JSON callbacks are fast becoming standard practice, so if you’re writing an API that supports JSON, I recommend adding callback support. It’s not a difficult thing to add but it can make a massive difference to mashup authors.
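
For example, in PHP it boils down to wrapping your JSON output in the caller-supplied function name - just validate the name first so you don’t open up an injection hole of your own:

<?php
// Output stage of a hypothetical JSON API endpoint.
$json = json_encode($data); // $data is whatever your API returns

if (isset($_GET['callback'])
    && preg_match('/^[a-zA-Z_][a-zA-Z0-9_.]*$/', $_GET['callback'])) {
    header('Content-Type: text/javascript');
    echo $_GET['callback'] . '(' . $json . ');';
} else {
    header('Content-Type: application/json');
    echo $json;
}
?>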


Creative Commons licensed photo by Dmitry Baranovskiy.


 
