Twitter’s Chirp developer conference had two big announcements:
- They are buying the Tweetie iPhone application
- The addition of meta data payloads to tweets - called annotations
The first announcement was greeted with dismay by many twitter application developers -
Oh noes! They is eatin our lunches! - but the more far-sighted commentators focussed on the possibilities that annotations will bring.
What Is An Annotation?
Simply, an annotation is some extra data that can be associated with each tweet - data that followers might never see.
Each annotation has three fields: a namespace, key and value - and each tweet can have several annotations.
The namespace explains what the annotation describes. It could be a book, a meal, a place or pretty much anything.
The key and value provide data within the context of the namespace - the author of the book, price of the meal, etc.
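As a rough illustration of that shape, here is a minimal Python sketch. The class names, field names and sample values are hypothetical; Twitter has not published a final format, so treat this as a model of the idea rather than the real API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Annotation:
    """One annotation: a namespace plus a key/value pair inside it."""
    namespace: str  # what the annotation describes, e.g. "book"
    key: str        # a property within that namespace, e.g. "author"
    value: str      # the data for that property


@dataclass
class Tweet:
    """A tweet carrying its visible text and any number of annotations."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)


# Illustrative values only.
tweet = Tweet(
    text="Just finished a great novel",
    annotations=[
        Annotation("book", "title", "Neuromancer"),
        Annotation("book", "author", "William Gibson"),
    ],
)

# Followers reading the text alone never see the annotation data.
authors = [a.value for a in tweet.annotations if a.key == "author"]
```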
It will be up to twitter clients to create and display annotations as they see fit. So all those Chirp attendees that moaned about Tweetie becoming the
official iPhone app for Twitter should stop worrying about dealing with just 140 characters because Twitter just gave them a huge new sandbox to play in.
Namespaces Are The Key
Namespaces are a means of describing the context of an annotation.
Early indications are that Twitter will allow any text as the namespace value, which has led some people to call for a centralised authority for namespace registration. This would be counter-productive for developers, sacrificing flexibility in exchange for consensus on what each namespace means.
Far better to take a leaf out of the semantic web’s playbook and have the namespace describe itself. Make your namespace a URL that points to an XML document describing the data (keys and values) that can exist within the namespace.
A big advantage of this approach is that schemas can be extended and combined with ease - and without having to ask permission from a central authority.
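To make the self-describing idea concrete, here is a small Python sketch. The XML layout, the example.org URL and the function names are all assumptions made for illustration; no such schema format has been agreed. The XML is held in a string so the example runs without a network fetch.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that a namespace URL might return when fetched.
SCHEMA_XML = """
<namespace uri="http://example.org/ns/book">
  <key name="title"/>
  <key name="author"/>
</namespace>
""".strip()


def allowed_keys(schema_xml: str) -> set:
    """Return the key names that the schema document declares."""
    root = ET.fromstring(schema_xml)
    return {key.get("name") for key in root.findall("key")}


def is_valid_key(key: str, schema_xml: str) -> bool:
    """Check an annotation key against the namespace's own description."""
    return key in allowed_keys(schema_xml)
```

A client that does not recognise a namespace could fetch its description and still decide which keys it knows how to display, with no registry involved.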
Semantic Web Annotations
Much of the semantic web is presented as RDF triples, which can be combined to describe almost anything.
A triple has a subject, predicate and an object. For example:
andymurd checked in at Rundle Mall, Adelaide
|subject||andymurd|
|predicate||checked in at|
|object||Rundle Mall, Adelaide|
Each of the subject, predicate and object can be represented as a URI - in the above example, the subject might be http://foursquare.com/user/andymurd.
But twitter will only provide key-value pairs, not triples, so we must fit our RDF ontologies into this model.
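One possible mapping, offered as an assumption rather than a settled convention, is to treat the tweet's author as the implied subject, split the predicate URI into a namespace and a key, and store the object as the value. The Python sketch below shows that; the predicate URI and all names are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def split_predicate(uri: str) -> Tuple[str, str]:
    """Split a predicate URI into (vocabulary, term) at the last '#' or '/'."""
    for sep in ("#", "/"):
        if sep in uri:
            vocabulary, _, term = uri.rpartition(sep)
            return vocabulary, term
    raise ValueError("predicate is not a URI with a separable term: " + uri)


def triple_to_annotation(triple: Triple) -> Dict[str, str]:
    """Flatten a triple into namespace/key/value.

    Assumption: the subject is dropped because the tweet's author stands in
    for it, which only works for statements about the person tweeting.
    """
    namespace, key = split_predicate(triple.predicate)
    return {"namespace": namespace, "key": key, "value": triple.obj}


checkin = Triple(
    subject="http://foursquare.com/user/andymurd",
    predicate="http://example.org/ns/checkin#checkedInAt",  # hypothetical vocabulary
    obj="Rundle Mall, Adelaide",
)
annotation = triple_to_annotation(checkin)
```

The cost of this approach is that statements about anything other than the tweet's author need an extra annotation to carry the subject.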
Many linked data tweeps are justifiably excited about the potential of embedding an RDF payload in tweets, and I think they are right!
Keep It Short
Twitter will be limiting the size of annotations (initially just 512 bytes) so we need to keep our meta-data succinct.
A lot of URLs for RDF ontologies are quite long, as they include versioning information, so I expect that many developers will make use of URL shorteners for annotations too.
It is also likely that a standard will emerge to abstract meta-data into an external document in order to overcome the size limitations. Some kind of "See Also..." for annotations. This would also allow editing of annotations (something which Twitter doesn’t plan to provide) but would also introduce security implications for application developers.
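A client could check the size budget before posting and fall back to a "See Also" pointer when the payload is too large. The Python sketch below assumes annotations are serialised as compact UTF-8 JSON, which Twitter has not confirmed, and uses a made-up see-also namespace.

```python
import json
from typing import Dict, List

MAX_ANNOTATION_BYTES = 512  # the initial limit mentioned above

# Hypothetical namespace for pointing at an external meta-data document.
SEE_ALSO_NAMESPACE = "http://example.org/ns/see-also"


def annotation_size(annotations: List[Dict[str, str]]) -> int:
    """Size in bytes, assuming compact UTF-8 JSON serialisation."""
    payload = json.dumps(annotations, separators=(",", ":"), ensure_ascii=False)
    return len(payload.encode("utf-8"))


def fit_or_reference(annotations: List[Dict[str, str]], see_also_url: str) -> List[Dict[str, str]]:
    """Return the annotations if they fit, otherwise a single pointer to them."""
    if annotation_size(annotations) <= MAX_ANNOTATION_BYTES:
        return annotations
    return [{"namespace": SEE_ALSO_NAMESPACE, "key": "url", "value": see_also_url}]
```

Anything reading the pointer has to fetch and trust an external document, which is where the security implications come in.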
Semantic data is produced for machines - typically search engine indexers or graph query tools, and now we can add twitter bots and clients to that list.
Open, discoverable standards are important for communication between these consumers and RDF has a broad base of support. Google already does a good job of indexing RDF and microformats and using the data to enhance its ten blue links with relevant information about product reviews, document authors and more. I really want to see that integrated with their realtime search results.
Yahoo technologies like YQL and BOSS can facilitate search mashups that make use of RDF too. Hopefully we’ll see some twitter SearchMonkey plugins shortly after annotations are released.
Twitter adds more data into the mix - tweets have authors, timestamps, replies, locations - as this excellent tweet infographic shows. One issue for data consumers to tackle is to decide whether these are relevant to the annotation.
We’re about to enter an era of much richer twitter clients. They will be capable of displaying video, photos, maps, playing mp3s and much more.
Developers will need to consider which annotation namespaces are deserving of being displayed to their users. Certainly some equivalent of the media RSS standard would be a prime candidate.
Other namespaces will gain authority as de facto standards with developer support and we should be looking to existing web meta-data formats to predict which will be implemented in twitter clients first. Microformats like hCard and hReview are an obvious first choice but new ontologies will be created to exploit the real-time nature of twitter.
How about a standard for location based services (Gowalla, FourSquare, BrightKite, et al)?
Pluggable twitter clients (like Seesmic) will become more common and a supplemental developer eco-system will emerge for third party plug-ins that manipulate annotations. Maybe we will eventually see a standard for twitter client plug-ins.
I would like to see web-based twitter clients (maybe even twitter.com) publishing RDFa (HTML & RDF mixed together) where the annotations are appropriate. It would also be great to see semantic data mixed into Google’s realtime search results.
Of course, spammers will try to exploit any security loopholes in a twitter client’s annotation handling, so annotations published on the web will need to be sanitised like any other user generated content.
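As a minimal sketch of that sanitising step, the Python below escapes every annotation field before placing it in HTML. The markup, attribute names and function name are illustrative assumptions, not a proposed standard.

```python
from html import escape


def render_annotation(namespace: str, key: str, value: str) -> str:
    """Render one annotation as an HTML fragment with every field escaped."""
    return '<span class="annotation" data-namespace="{}" data-key="{}">{}</span>'.format(
        escape(namespace, quote=True),
        escape(key, quote=True),
        escape(value, quote=True),
    )


# A hostile value is displayed as text instead of being run as script.
fragment = render_annotation("book", "title", "<script>alert(1)</script>")
```

Escaping only covers values shown as text or placed in quoted attributes. An annotation value used as a link target would also need its URL scheme checked.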
A very big job for twitter application developers will be building user interfaces to create annotation data. The semantic web is lacking a simple UI that makes it easy for everyone to create linkable data.
I don’t believe it is practical to automatically derive accurate semantic data from just the 140 characters of free text that make up a typical (manually created) tweet. However, many websites integrate with twitter already (@andymurd favorited a video on YouTube etc.) and they will be well-placed to automatically add annotations to tweets. Also RDF/SPARQL equivalents of twitterfeed and tweetmeme will emerge.
My hope is that application developers will rise to the challenge of providing simple user interfaces that allow everyone to easily create linked data and share it through twitter. All the semantic web authoring tools I’ve tried have been complex, unwieldy things that need in-depth technical knowledge to use effectively.
I think that the twitter developer community can change all that by focussing on the user experience.
Lots of people have been discussing the potential of annotations:
There’s a Google Group with some good ideas.
There are also quite a few blog posts:
- Doug Petkanics talks about sharing an API virally
- Matthew Markus argues for a walled garden of namespaces
- Venturebeat comes up with some ideas
- Marshall Kirkpatrick came up with a thoughtful post
Several initiatives have attempted to utilise twitter messages for transmitting semantic data. Every twitter user is aware of hashtags but interested readers should also check out what RoboCrunch, SemanticTwitter and TwitterFormats have been up to.
These are exciting times for twitter developers and semantic web proponents but there will be some big challenges ahead:
- How do we promote open, extensible namespaces?
- How are spammers likely to exploit annotations?
- How can we get users to love, create and use annotations?
- We’ll need an icon to indicate that a tweet has annotations!
All these challenges must be solved whilst remembering that twitter is a tool for humans. We must add value through annotations, value that makes people want to use the new breed of rich twitter clients that leverage this technology.
These problems are not beyond us and I believe that twitter could provide the impetus to make the semantic web a part of our daily lives.