MMMeeja

My Prediction for 2009

Posted on 29 Dec 2008 by Andy

Every good blogger must stick their neck out and guess at what next year will hold. Never one to buck a trend or miss a bandwagon, this is my prediction...

Are you ready? Here it is:

We’ll get closer to the Star Trek ideal... semantic search

You know what I mean, that perfect example of human-computer interaction: “Computer, where is Commander Riker?” “Commander Riker is on the holodeck on level four, sobbing uncontrollably.”

Ask a (simple) question, get an answer. We’re close now, but we’ll be a lot closer by the end of 2009, because next year will be the year that semantic web applications grow beyond academic exercises and finally become usable by real human beings. The least disruptive application, and therefore the first that will be in common use, is search.

Semantic Search

Search is not disruptive because the switch from the text/graph analysis algorithms we use now to semantic systems is invisible: users ask the same questions but start getting better answers.

They’ll continue to ask short, one-to-five-word questions but will get used to shorter answers. Instead of ten Google results per page, we’ll see 140-character, Twitter-style answers to queries - and trust them!

Who Are The Players?

Twitter have been building a huge database of short but content-rich tweets. Twine is building its user base quite successfully and Facebook is hoarding data like Scrooge McDuck hoards gold. All this data is worthless until it is analysed and made useful to us all.

And it’s not just startups looking towards this ideal - Google and Yahoo have been busy investigating latent semantic indexing (LSI) and hidden Markov models (HMM), and watching how people use public APIs. Yahoo pledged to index the semantic web and Google are looking towards speech recognition. Things are getting exciting; Web 3.0 is so close I can almost taste it!

It’s not all established firms, though. Even as the collapse of western financial markets continues as I write, small, agile startups with a solution will continue to secure investment and operating capital. During hard times, shoestring startups can find it easier to beat the mega-corps, but they have to have the best solution!

So, what are the next steps?

We’re going to see a convergence between personalised search, the document classification systems of Semantic Hacker and technologies like Yahoo’s term extractor. There are big bucks to be made by getting this right.

The winner will be able to tell what you’re searching for from the three or four words you type into the search box plus your web history, and then integrate heterogeneous sources of data to get the results. Contrast the searches “how much do elephants weigh?” and “how much could I earn in Santa Monica?”. The former is factual and will not change greatly over time, whereas the latter requires up-to-date information and knowledge of the searcher. Neither can be answered accurately using traditional search-engine algorithms that look for common textual patterns.

As always with disruptive technologies like semantic search, change and user acceptance will be gradual. To get a user to trust a 140 character answer will take time and peer reviews but I think that we’ll be starting down that road towards the end of 2009.

Am I right?

I’m putting my head above the parapet by making a bold prediction, and one that can be measured, but I think the pace of change on the internet is accelerating again and a load of clever people are looking at the possibilities of semantic search.

A load of greedy people (especially advertisers) are looking at these technologies, so one thing is for certain: interesting times ahead.

Am I too optimistic? Way off the mark? Leave a comment.

More on the semantic web.

Microsoft Embrace And Extend Microformats With hSlice

Posted on 28 Oct 2008 by Andy

Internet Explorer 8 beta has an interesting new feature: hSlice support.

An hSlice is a small chunk of a web page that you can subscribe to with your browser, like RSS but for a section of a page instead of the whole thing. The most obvious application is e-commerce: you can wait for prices to drop or stock levels to replenish before placing an order. I have no doubt that once the technology gains wider adoption, a range of interesting and unpredicted applications will emerge outside the e-commerce arena.

Will hSlices Be Adopted?

Yes. The idea is a good one and it should appeal to people who already use RSS. There are already a number of Firefox extensions bringing hSlice support to the geeks’ favourite browser.

Hopefully, RSS aggregators like Google Reader and NewsGator will add hSlice support soon and then widget systems like iGoogle will quickly follow. The other major browsers should be able to add hSlice support without too much development effort since they all have RSS support and Safari already does something quite similar with web snippets.

A hugely important difference between RSS/Atom feeds and hSlices is that the former are (usually) used to indicate new pages being added to web sites, whilst hSlices show changes to an existing page. Stop and think about that for a moment. That’s a massively important change and it will have an impact on every aspect of the web.

But It’s A Microsoft Standard And They Are Evil!

Calm down dear, your tinfoil hat is slipping.

Seriously, I think there is a need for this technology and, as I said, Apple are doing something similar with Safari’s web snippets. The microformats.org mail discussion list has given a tentative welcome to the new arrival, but pointed out that it would have been nicer to be involved in the naming/design.

It’s good to see Microsoft actually innovating on the web instead of playing catch-up with the open sourcerers, and it’s particularly gratifying to see that innovation taking the form of an open standard.

hSlice Markup

Adding hSlices to your HTML is pretty straightforward, though you’ll need to update the server code to serve updates. Just as with serving hAtom (and most other microformats), you add classes to standard HTML tags, like this:


<div class="hslice" id="1234">
  <p class="entry-title">Buy 1 doz eggs</p>
  <p class="entry-content"><img src="eggs.jpg" alt="Eggs"/> £2.68 per dozen</p>
  <a rel="feedurl" href="http://www.mmmeeja.com/slice-1234.xml">Subscribe to Feed</a>
</div>

There can be more than one slice per page but each must have a unique ID.

The “feedurl” anchor is optional and, if it is not present, clients are expected to download the entire page and extract the hSlice using its ID.
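To give a feel for that fallback, here is a minimal sketch of how a client might pull one slice out of a whole page by its ID. It uses only Python’s standard library; the HSliceExtractor class, the extract_hslice helper and the second (bread) slice are my own inventions for illustration, not part of the hSlice format or of any real client.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
FIELDS = ("entry-title", "entry-content")


class HSliceExtractor(HTMLParser):
    """Collect the title and content text of the hSlice with a given id."""

    def __init__(self, slice_id):
        super().__init__()
        self.slice_id = slice_id
        self.depth = 0        # 0 means "outside the target slice"
        self.field = None     # field currently being read, if any
        self.field_depth = 0  # nesting depth at which that field opened
        self.result = {name: "" for name in FIELDS}

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.depth == 0:
            # Only start collecting at the slice whose id matches.
            if "hslice" in classes and attrs.get("id") == self.slice_id:
                self.depth = 1
            return
        self.depth += 1
        if self.field is None:
            for name in FIELDS:
                if name in classes:
                    self.field, self.field_depth = name, self.depth
                    break

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in VOID_TAGS:
            return
        # Closing the element that opened the current field ends that field.
        if self.field is not None and self.depth == self.field_depth:
            self.field = None
        self.depth -= 1

    def handle_data(self, data):
        if self.field is not None:
            self.result[self.field] += data


def extract_hslice(html, slice_id):
    """Hypothetical helper: return the stripped fields of one slice."""
    parser = HSliceExtractor(slice_id)
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.result.items()}


# Two slices on one page; the second is made up to show selection by id.
PAGE = """
<div class="hslice" id="1234">
<p class="entry-title">Buy 1 doz eggs</p>
<p class="entry-content"><img src="eggs.jpg" alt="Eggs"/> £2.68 per dozen</p>
</div>
<div class="hslice" id="5678">
<p class="entry-title">Buy 1 loaf</p>
<p class="entry-content">£1.10 per loaf</p>
</div>
"""

print(extract_hslice(PAGE, "5678"))
```

A real client would also have to fetch the page on a schedule and compare the extracted text with the previous copy, which this sketch leaves out.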

So, my feeling on hSlice should be pretty clear by now. I don’t care that it’s from Microsoft; it’s going to make for some exciting new web tools and technologies.

DO WANT!


Creative Commons licensed photo by petoo.

7 Tools To Make The Most Of The Semantic Web

Posted on 17 Jul 2008 by Andy

The semantic web is starting to make progress, with new, useful applications appearing every day. Add to that Microsoft’s purchase of Powerset hitting the headlines and you can bet that you will need to know about semantic technologies before long.

We’re here to help get you started with this run-down of really useful tools.

The Operator Firefox Extension

Operator is a Firefox extension that adds a toolbar allowing you to explore the microformats and semantic data embedded in any web page as you surf.

Surfing with Operator is really eye-opening as it brings attention to just how much semantic data is already out there on the web. Technorati tags, blog author names, friend lists and feed subscriptions are common and often marked up with the right tags.

Freebase Semantic Database

Freebase provides a database of over 4 million topics, all semantically labelled and accessible through a comprehensive API. The API is REST-based and uses MQL to structure query requests.
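To show what “using MQL to structure a request” looks like, here is a rough sketch. MQL is query-by-example: you send a JSON object with the properties you know filled in and the ones you want back left empty. The endpoint below is a placeholder rather than a real Freebase URL, build_mql_read_url is a hypothetical helper, and the "query" envelope and the type and property names are given as I understand them, so check them against the Freebase documentation before relying on them.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder only: NOT a real Freebase URL.
HYPOTHETICAL_ENDPOINT = "https://example.invalid/mqlread"


def build_mql_read_url(query, endpoint=HYPOTHETICAL_ENDPOINT):
    """Wrap an MQL query in a {"query": ...} envelope and URL-encode it."""
    envelope = json.dumps({"query": query}, separators=(",", ":"))
    return endpoint + "?" + urlencode({"query": envelope})


# Query by example: type and name are constraints, and the empty list
# asks the service to fill in every matching album.
ALBUMS_QUERY = [{"type": "/music/artist", "name": "The Police", "album": []}]

url = build_mql_read_url(ALBUMS_QUERY)

# Decode the URL again to check the envelope survives the round trip.
sent = json.loads(parse_qs(urlsplit(url).query)["query"][0])
print(url)
print(sent == {"query": ALBUMS_QUERY})
```

The sketch stops at building the request; sending it and reading the JSON reply are left out because they depend on the live service.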

One of the greatest features of Freebase is that users can add their own ontologies (or topics), so if you have a load of data that you want to share and make discoverable, you can do so via a bulk upload. Combining your data with that already present in Freebase is then straightforward.

Semantic Hacker

Semantic Hacker caused controversy when they announced a competition to find the best use of their API with a prize of a measly million dollars!

Leaving the competition aside, their API will take unstructured text (or web pages) and attempt to classify them by placing them into categories that roughly match up to the DMoz directory categories. Their Bayesian algorithm is pretty well-trained and gives good results even when fed web pages that contain irrelevant information like adverts, menus and copyright notices.
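I don’t know the details of Semantic Hacker’s classifier, so the following is not their algorithm. It is a toy naive Bayes text classifier, a minimal sketch of the general Bayesian approach, with category names and training sentences that I made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesClassifier:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> training docs
        self.vocabulary = set()

    def train(self, category, text):
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in tokenize(text) if w in self.vocabulary]
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Work in log space: log prior plus log likelihood of each word.
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best, best_score = category, score
        return best


# Made-up categories and training text, loosely in the spirit of a directory.
classifier = NaiveBayesClassifier()
classifier.train("Computers", "search engine web browser semantic markup api")
classifier.train("Computers", "browser extension rss feed web page")
classifier.train("Shopping", "buy eggs price per dozen order stock")
classifier.train("Shopping", "price drop order shop basket")

print(classifier.classify("Subscribe to the RSS feed in your browser"))
print(classifier.classify("Eggs: price per dozen, plus adverts and menus"))
```

Because unseen words are ignored and rare words only nudge the score, a few stray terms from adverts or menus are not enough to flip the category, which is roughly why this family of classifiers copes with noisy pages.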

Semantic Search Engines

There are a number of search engines providing search facilities for semantic data. SWSE searches RDF data on the web and has a SPARQL search API.

Yahoo also have plans to provide search for the semantic web and have a research project already available online. When this matures and gets customised using their new BOSS technology, it should be a rich platform for developers.

Intellidimension also offer a semantic web search engine with a SPARQL interface.

In fact, there are lots of companies jumping into this area. There is no clear winner yet but SPARQL seems to be the technology of choice. Everybody is eyeing Google warily: if they make an announcement about their interest in this field, all these companies will see some very tough competition.
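For anyone who hasn’t met SPARQL yet, here is a small sketch of what talking to one of these search APIs might involve. The SPARQL protocol passes the query text in a "query" parameter, and results can come back in the SPARQL JSON results format. The endpoint is a placeholder, the two helper functions are hypothetical, and the sample response is hand-written for illustration; none of it is taken from SWSE, Yahoo or Intellidimension.

```python
import json
from urllib.parse import urlencode

# Placeholder only: NOT the address of any real search engine.
HYPOTHETICAL_ENDPOINT = "https://example.invalid/sparql"

# Ask for the name and homepage of anything described with FOAF.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?homepage
WHERE {
  ?person foaf:name ?name ;
          foaf:homepage ?homepage .
}
LIMIT 10
"""


def build_sparql_url(query, endpoint=HYPOTHETICAL_ENDPOINT):
    """Build a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


def rows_from_results(document):
    """Flatten a SPARQL JSON results document into plain dictionaries."""
    names = document["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in names if name in binding}
        for binding in document["results"]["bindings"]
    ]


# Hand-written sample in the SPARQL JSON results layout, not real data.
SAMPLE_RESPONSE = json.loads("""
{
  "head": {"vars": ["name", "homepage"]},
  "results": {"bindings": [
    {"name": {"type": "literal", "value": "Example Person"},
     "homepage": {"type": "uri", "value": "http://example.org/"}}
  ]}
}
""")

print(build_sparql_url(QUERY))
print(rows_from_results(SAMPLE_RESPONSE))
```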

Open Calais

This absolute jewel of a service can create semantically structured documents from unstructured text. Owned by Reuters, Open Calais is heavily slanted towards business and news gathering but the results are still very impressive.

When you feed in a chunk of text, it will identify places, dates, people, companies and so forth and even relations or events involving those entities.

It also offers Tagaroo, a tagging plug-in for WordPress blogs. If you have any experience with Tagaroo, I’d love to hear about it so please leave a comment.

Twine

Twine is a CMS with a semantic focus and could well become the WordPress of the next generation web. It is currently still in beta, so I haven’t been able to try it out.

Mash It All Up With Pipes

DERI offer an RDF equivalent to Yahoo Pipes, called simply Pipes. Their offering is not as full-featured as Yahoo’s but you can get the source and extend it, and even host it on your own server.

Given the wealth of amazing data provided by all the other services, no doubt you have some great ideas for mash-ups and new services, so Pipes can be a quick way of prototyping.

The future of the web is not so far away and we all need to learn about these technologies so we can be part of it.


Creative Commons licensed photo by jm3.


Copyright © 2006-2009 MMMeeja Ltd. All rights reserved.