MMMeeja


Ten Useful Wolfram|Alpha Searches

Posted on 18 May 2009 by Andy

So the Wolfram|Alpha engine launched last weekend to a great deal of fanfare, but the reactions from blogs and the twitterverse show that a lot of people just don’t get it.

This is not surprising: we’ve been conditioned by Google et al to type in a phrase and expect ten blue links. So when you first accessed the Wolfram|Alpha input box, what did you type? Your name? A natural language question?

I bet the answer to your first query was the disappointing:

Wolfram|Alpha isn't sure what to do with your input.

That’s because Wolfram|Alpha is not a search engine - it’s a knowledge inference engine, which is why so many people struggle to get the best out of it. It deals with facts, maths and statistics and it deals with them very well.

In this post, I’ll show you ten useful queries that should build the right kind of mindset and encourage you to experiment with the tool a bit further.

1. Stock Comparisons

How about a nice chart of Microsoft versus Google?

Microsoft Google stock chart

2. Complex Mathematics

Stephen Wolfram wrote the excellent Mathematica program so you can bet that Wolfram|Alpha will kick ass at maths.

It can solve quadratic equations and differentiate them, and even plot fractals.

Julia set plot

3. Date Manipulation

What is the date of the first Tuesday in May, next year?

Most of us would flick through a calendar to answer a question like this, but Wolfram|Alpha can save us time:

It’s the 4th!
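
If you’d rather see the working than take the answer on trust, the same sum can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a description of how Wolfram|Alpha works it out. The function name first_weekday_of_month is made up for this example, and the year 2010 is assumed because this post dates from 2009.

```python
import calendar
from datetime import date


def first_weekday_of_month(year: int, month: int, weekday: int) -> date:
    """Return the first date in the given month that falls on `weekday`.

    `weekday` follows the calendar module's convention (calendar.MONDAY == 0).
    """
    # Weekday of the 1st of the month: 0 = Monday ... 6 = Sunday.
    first_day_weekday = date(year, month, 1).weekday()
    # Number of days to step forward from the 1st to reach the wanted weekday.
    offset = (weekday - first_day_weekday) % 7
    return date(year, month, 1 + offset)


# Assumption: "next year" means 2010, since the post was written in 2009.
print(first_weekday_of_month(2010, 5, calendar.TUESDAY))  # 2010-05-04
```

The modulo step is the whole trick: it turns "how far is Tuesday from whatever day the 1st falls on" into a number between 0 and 6.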

4. Analyse Sports Statistics

Despite knowing nothing about baseball, I can check the histories of the Boston Red Sox and the New York Yankees. That’s probably useful to someone less geeky than me.

5. Find Out Flight Times

Enter two city names to see the distance between them and the average flight time. Here is London to New York.

6. Show Movie Casts

Not as detailed as IMDB but pretty handy all the same.

7. View Currency Fluctuations

Here is a query of definite use to me right now - the Australian dollar vs the UK pound.

5 year data for the Australian dollar vs UK pound

8. Show Website Traffic Estimates

Using Alexa traffic data, Wolfram estimates that apple.com receives 11,111 visits per minute.

9. Compare Chemical Compounds

This one is very useful for chemistry homework: here’s Methanol compared with Ethanol.

Thermodynamic comparison of Methanol and Ethanol

10. Calculate Your Mortgage Payments In Plain English

Who hasn’t wanted to cut through the financial gibberish and get to the bottom line quickly and easily?
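
For anyone curious about the sum behind the bottom line, here is a rough sketch of the standard fixed-rate repayment formula. It is an assumption that this is the kind of calculation involved; the post does not say what Wolfram|Alpha does internally. The function name and the loan figures are hypothetical, picked purely for illustration.

```python
def monthly_payment(principal: float, annual_rate_percent: float, years: int) -> float:
    """Monthly repayment on a fixed-rate loan, using the standard annuity formula."""
    monthly_rate = annual_rate_percent / 100 / 12
    payments = years * 12
    if monthly_rate == 0:
        # With no interest the loan is simply split evenly across the payments.
        return principal / payments
    growth = (1 + monthly_rate) ** payments
    return principal * monthly_rate * growth / (growth - 1)


# Hypothetical example: 200,000 borrowed at 6% a year over 30 years.
print(f"{monthly_payment(200_000, 6, 30):.2f}")  # 1199.10
```

It ignores fees, insurance and variable rates, so treat it as a ballpark figure only.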

The Wolfram|Alpha interface

Whilst researching this post, I hit a few limitations of the Wolfram|Alpha engine (and user interface) - in particular a bias towards placenames, not great when there are places like Dollar in Scotland and Pound in Wisconsin - but a bit of experimentation usually brings you to the correct syntax.

In all, I think Wolfram|Alpha has been a successful launch and I look forward to seeing just what future improvements are in store. In the medium term, I hope that the API is extended to allow third-party developers to use the inference engine to process their own data - that would be very cool.

Have you found any interesting queries that showcase Wolfram|Alpha’s engine? Leave a comment.


Evolving User Interfaces For Semantic Search Engines

Posted on 23 Feb 2009 by Andy

Semantic search engines are starting to appear on the fringes of the mainstream web and, thanks to Yahoo’s BOSS/SearchMonkey integrations, they are likely to become a lot more prevalent. However, a vital component needs to be overhauled before my mum is going to use them - the user interface.

Let’s start by having a look at current search engine UIs in common use today.

A Single Text Box

At present, search engines just employ a single text box for users to enter a summary of their goal. This works well for most text-based searches, not least because users have learned to modify their behaviour to get the most out of the search technology.

Most users type one to three noun phrases, examine the results and then either drill into the ten blue links or refine their query, often returning to previous queries before they reach their goal.

One of the biggest challenges for semantic search developers will be to modify the users’ learned behaviour. Longer queries give more accurate results, both for text search and semantic search.

Advanced Search

Many search engines offer an “advanced search” option, which takes the user to a lengthy web form comprising optional search fields. Such complexity makes for a horrendous user experience, as Google discovered when they found large numbers of users viewing the form but leaving before they entered any data.

Faceted Search

“Faceted Search” is a technical term for the filters that you often see on e-commerce sites. For example, a user might search for “adidas shoes” and then supply extra criteria by clicking filters for “Men’s shoes”, “Under £100”, “White” etc.

This is a great improvement in usability where the number of facets is low. However, multiple page reloads can be problematic for people with slow connections or using small devices like mobile phones.

Repeatability can be an issue for faceted search too. Remembering the search phrase and then the sequence of four clicks that got you to your favourite shoes is onerous for the user, but the “you recently viewed” feature of sites like Amazon really helps.

It’s not just e-commerce sites that have faceted search: Google’s Image, Blog, Finance etc are all search facets too.

Natural Language Search

From Ask.com to Powerset, there’s always been rather more hype than substance surrounding natural language search. The technology performs well for simple queries like “How old is Barack Obama?”

Ask.com search:

Ask.com search results for the query “how old is barack obama?” state Barack Obama is 47 years old

Powerset search:

Powerset search results for the query “how old is barack obama?” gives Barack Obama’s date of birth

...producing better results than the equivalent Google search.

Ask a question that has two or more facets, and these engines fall back to text searches. For example, “Which English philosophers were also classical liberals?” would require an intelligent engine to find the list of English philosophers, then find which are mentioned in the page on classical liberals, intersect the two sets and provide the results.
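
The intersection step itself is trivial once the two lists exist; the hard part is building them. As a rough sketch in Python, with small hand-typed sample sets standing in for what a real engine would have to pull from a knowledge base (the names and the function are chosen purely for illustration):

```python
# Hypothetical, hand-typed sample data. A real engine would have to build
# these two sets from its own knowledge base.
english_philosophers = {
    "Francis Bacon",
    "Jeremy Bentham",
    "John Locke",
    "John Stuart Mill",
    "Bertrand Russell",
}
classical_liberals = {
    "Adam Smith",
    "Frédéric Bastiat",
    "Jeremy Bentham",
    "John Locke",
    "John Stuart Mill",
}


def in_both(first: set, second: set) -> list:
    """Return the members common to both sets, sorted alphabetically."""
    return sorted(first & second)


print(in_both(english_philosophers, classical_liberals))
# ['Jeremy Bentham', 'John Locke', 'John Stuart Mill']
```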

SPARQL & MQL

Neither SPARQL nor Freebase’s MQL is user-friendly, but they’re not designed to be. They are designed to answer complex queries like “Which English philosophers were also classical liberals?”

If you’re interested, here is the SPARQL to perform that query against DBpedia:


# Namespace prefixes used in the query
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?name WHERE {
     # The person must be in both categories: this is the set intersection
     ?person skos:subject <http://dbpedia.org/resource/Category:English_philosophers> .
     ?person skos:subject <http://dbpedia.org/resource/Category:Classical_liberals> .
     # Fetch the name to display
     ?person foaf:name ?name .
     # Only keep people who have an English-language description
     ?person rdfs:comment ?description .
     FILTER (LANG(?description) = 'en') .
}
ORDER BY ?name

And here are the results:

{ "head": { "link": [], "vars": ["name"] },
  "results": { "distinct": false, "ordered": true, "bindings": [
    { "name": { "type":"literal", "value": "Herbert Spencer" }},
    { "name": { "type":"literal", "value": "Jeremy Bentham" }},
    { "name": { "type":"literal", "value": "John Locke" }},
    { "name": { "type":"literal", "value": "John Stuart Mill" }} ] } }

Horrible, but perfectly accurate and a real answer, not ten blue links.

The query syntax is this complex because there is a large number of possible facets to cover - far more than could reliably follow the e-commerce model of applying filters.
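
If the raw JSON response puts you off, a few lines of code will boil it down to a plain list of names. Here is a minimal Python sketch that parses the response shown above; the function name extract_values is made up for this example, and the response is pasted in as a string rather than fetched from a live endpoint.

```python
import json

# The response from the post, pasted in as a string (not fetched live).
RESPONSE_TEXT = """
{ "head": { "link": [], "vars": ["name"] },
  "results": { "distinct": false, "ordered": true, "bindings": [
    { "name": { "type":"literal", "value": "Herbert Spencer" }},
    { "name": { "type":"literal", "value": "Jeremy Bentham" }},
    { "name": { "type":"literal", "value": "John Locke" }},
    { "name": { "type":"literal", "value": "John Stuart Mill" }} ] } }
"""


def extract_values(response_text: str, variable: str) -> list:
    """Pull the plain values bound to one query variable out of the JSON results."""
    data = json.loads(response_text)
    return [
        binding[variable]["value"]
        for binding in data["results"]["bindings"]
        if variable in binding  # a variable can be unbound in some rows
    ]


for name in extract_values(RESPONSE_TEXT, "name"):
    print(name)
```

Each row of the results is a "binding" keyed by variable name, so a user interface only has to dig out the value field to get something presentable.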

The Coming Challenge

User interface designers are going to have to work hard to bridge the gap between users who are used to a single text box and the complex query syntax of SPARQL and MQL.

I think the first thing we all need to recognise is that that damn text box is not going away! Users know that’s how to interact with a search engine and you’re not going to change their minds anytime soon.

The ten blue links, however, can be scrapped if we can determine that the user is searching for a definitive answer (not just looking for “funny jokes” or “cat pictures”). Instead, a summary of the data available should be presented together with a number of appropriate facets to drill down into the results.

The interface components representing facets should be appropriate to the query - use date pickers for dates, drop-downs for lists of countries etc. Lead the user forward using visual cues that they are already comfortable with, like the timeline in this experimental Google search:

Here’s a very usable faceted search of Nobel Prize winners, although it could benefit from allowing use of the browser back button. You might also like to play around with MedStory, a clinical information search engine that has eye-catching filters, but is let down by the confusing pop-up interface.

Perhaps we can learn from off-line data analysis tools: I’m thinking that techniques like OLAP cubes or Excel’s pivot tables might be made more web-friendly. Maybe the Hollywood cliché 3D interface can help, but I doubt it.

There’s been some good progress made in semantic search technologies but now we need to start thinking about users. How will they interact with the tools? How will semantic search change the way we work online? What limitations will people hate?


Creative Commons licensed photo by dullhunk.


My Prediction for 2009

Posted on 29 Dec 2008 by Andy

Every good blogger must stick their neck out and guess at what next year will hold. Never one to buck a trend or miss a bandwagon, this is my prediction...

Are you ready? Here it is:

We’ll get closer to the Star Trek ideal... semantic search

You know what I mean: that perfect example of human-computer interaction: “Computer, where is Commander Riker?”, “Commander Riker is on the holodeck on level four, sobbing uncontrollably”.

Ask a (simple) question, get an answer. We’re close now but we’ll be a lot closer by the end of 2009, because next year will be the year that semantic web applications grow beyond academic exercises and are finally usable by real human beings. The least disruptive application, and therefore the first that will be in common use, is search.

Semantic Search

Search is not disruptive because the switch from the text/graph analysis algorithms that we use now to semantic systems is invisible - users ask the same questions but they’ll start getting better answers.

They’ll continue to ask short one to five word questions but get used to shorter answers. Instead of ten Google results per page, we’ll see 140-character Twitter-style answers to queries - and trust them!

Who Are The Players?

Twitter have been building a huge database of short but content-rich tweets. Twine is building its user base quite successfully and Facebook is hoarding data like Scrooge McDuck hoards gold. All this data is worthless until it is analysed and made useful to us all.

And it’s not just startups looking towards this ideal - Google & Yahoo have been busy investigating LSI (latent semantic indexing) and HMMs (hidden Markov models), and watching how people use public APIs. Yahoo pledged to index the semantic web and Google are looking towards speech recognition. Things are getting exciting: web 3.0 is so close I can almost taste it!

It’s not all established firms though. The collapse of western financial markets continues as I write, but small, agile startups with a solution will continue to secure investment and operating capital. During hard times shoestring startups can find it easier to beat the mega-corps, but they have to have the best solution!

So, what are the next steps?

We’re going to see a convergence between personalised search, the document classification systems of Semantic Hacker and technologies like Yahoo’s term extractor, and there are big bucks to be made by getting this right.

The winner will be able to tell what you’re searching for from the three or four words you type into the search box and your web history and then integrate heterogeneous sources of data to get the results. Contrast the searches of “how much do elephants weigh?” and “how much could I earn in Santa Monica?”. The former is factual and will not change greatly over time, whereas the second requires up-to-date information and knowledge of the searcher. Neither can be answered accurately using traditional search-engine algorithms that look for common textual patterns.

As always with disruptive technologies like semantic search, change and user acceptance will be gradual. To get a user to trust a 140-character answer will take time and peer reviews, but I think that we’ll be starting down that road towards the end of 2009.

Am I right?

I’m putting my head above the parapet by making a bold prediction, and one that can be measured, but I think the pace of change on the internet is accelerating again and a load of clever people are looking at the possibilities of semantic search.

A load of greedy people (especially advertisers) are looking at these technologies, so one thing is for certain: interesting times ahead.

Am I too optimistic? Way off the mark? Leave a comment.

More on the semantic web.


Microsoft Embrace And Extend Microformats With hSlice

Posted on 28 Oct 2008 by Andy

Internet Explorer 8 beta has an interesting new feature: hSlice support.

An hSlice is a small chunk of a web page that you can subscribe to with your browser, like RSS but for a section of a page instead of the whole thing. The most obvious application of this technology is for e-commerce sites, so you can wait for prices to drop or stock levels to replenish before placing an order. I have no doubt, though, that once the technology gains wider adoption, a wide range of interesting and unpredicted applications will emerge outside of the e-commerce arena.

Will hSlices Be Adopted?

Yes. The idea is a good one and it should appeal to people who already use RSS. There are already a number of Firefox extensions that are bringing hSlice support to the geeks’ favourite browser.

Hopefully, RSS aggregators like Google Reader and NewsGator will add hSlice support soon and then widget systems like iGoogle will quickly follow. The other major browsers should be able to add hSlice support without too much development effort since they all have RSS support and Safari already does something quite similar with web snippets.

A hugely important difference between RSS/Atom feeds and hSlices is that the former are (usually) used to indicate new pages being added to web sites, whilst hSlices show changes to an existing page. Stop and think about that for a moment: that’s a massively important change and it will have an impact on every aspect of the web.

But It’s A Microsoft Standard And They Are Evil!

Calm down dear, your tinfoil hat is slipping.

Seriously, I think there is a need for this technology and, as I said, Apple are doing something similar with Safari’s web snippets. The microformats.org mail discussion list has given a tentative welcome to the new arrival, but pointed out that it would have been nicer to be involved in the naming/design.

It’s good to see Microsoft actually innovating on the web, instead of playing catch-up with the open sourcerers and it’s particularly gratifying to see that innovation taking the form of an open standard.

hSlice Markup

Adding hSlices to your HTML is pretty straightforward, though you’ll need to update the server code to serve updates. Just as with serving hAtom (and most other microformats), you add classes to standard HTML tags, like this:


<div class="hslice" id="1234">
<p class="entry-title">Buy 1 doz eggs</p>
<p class="entry-content"><img src="eggs.jpg" alt="Eggs"/> £2.68 per dozen</p>
<a rel="feedurl" href="http://www.mmmeeja.com/slice-1234.xml">Subscribe to Feed</a>
</div> 

There can be more than one slice per page but each must have a unique ID.

The “feedurl” anchor is optional and, if it is not present, clients are expected to download the entire page and extract the hSlice using its ID.
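
To give a feel for what that extraction involves on the client side, here is a rough Python sketch using only the standard library’s html.parser. It is an illustration, not how Internet Explorer 8 does it: the class and function names are made up, the download step is left out, and the second slice in the sample page is hypothetical, added only to show two slices on one page.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are kept out of the depth count.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HSliceExtractor(HTMLParser):
    """Collect the entry-title text and feed URL of one hSlice, found by its id."""

    def __init__(self, slice_id):
        super().__init__()
        self.slice_id = slice_id
        self.depth = 0           # greater than 0 while inside the wanted slice
        self.title_depth = None  # depth at which the entry-title element opened
        self.title_parts = []
        self.feed_url = None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if self.depth == 0:
            # Outside the slice: only the matching hslice element is of interest.
            if "hslice" in classes and attr.get("id") == self.slice_id:
                self.depth = 1
            return
        self.depth += 1
        if "entry-title" in classes and self.title_depth is None and not self.title_parts:
            self.title_depth = self.depth
        if tag == "a" and "feedurl" in (attr.get("rel") or "").split():
            self.feed_url = attr.get("href")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        if self.title_depth == self.depth:
            self.title_depth = None  # the entry-title element has just closed
        self.depth -= 1

    def handle_data(self, data):
        if self.title_depth is not None:
            self.title_parts.append(data)


def extract_hslice(page_html, slice_id):
    """Return (title, feed_url) for the slice with the given id."""
    parser = HSliceExtractor(slice_id)
    parser.feed(page_html)
    parser.close()
    return "".join(parser.title_parts).strip(), parser.feed_url


# Sample page: the first slice is the example above, the second is hypothetical.
SAMPLE_PAGE = """
<div class="hslice" id="1234">
<p class="entry-title">Buy 1 doz eggs</p>
<p class="entry-content"><img src="eggs.jpg" alt="Eggs"/> £2.68 per dozen</p>
<a rel="feedurl" href="http://www.mmmeeja.com/slice-1234.xml">Subscribe to Feed</a>
</div>
<div class="hslice" id="5678">
<p class="entry-title">Buy 1 pint milk</p>
</div>
"""

print(extract_hslice(SAMPLE_PAGE, "1234"))
print(extract_hslice(SAMPLE_PAGE, "5678"))
```

When a feed URL comes back, a client would presumably poll that instead of the full page; when it is missing, re-running the extraction on the whole page is the fallback the paragraph above describes.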

So, my feeling on hSlice should be pretty clear by now. I don’t care that it’s from Microsoft: it’s going to make for some exciting new web tools and technologies.

DO WANT!


Creative Commons licensed photo by petoo.


7 Tools To Make The Most Of The Semantic Web

Posted on 17 Jul 2008 by Andy

The semantic web is starting to make progress, with new, useful applications appearing every day. Add to that Microsoft’s purchase of Powerset hitting the headlines and you can bet that you will need to know about semantic technologies before long.

We’re here to help get you started with this run-down of really useful tools.

Books about books

The Operator Firefox Extension

Operator is a Firefox extension that adds a toolbar allowing you to explore the microformats and semantic data embedded in any web page as you surf.

Surfing with Operator is really eye-opening, as it brings attention to just how much semantic data is already out there on the web. Technorati tags, blog author names, friend lists and feed subscriptions are common and often marked up with the right tags.

Freebase Semantic Database

Freebase provides a database of over 4 million topics, all semantically labelled and accessible through a comprehensive API. The API is REST-based and uses MQL to structure query requests.

One of the greatest features of Freebase is that users can add their own ontologies (or topics), so if you have a load of data that you want to share and make discoverable, you can do so via a bulk upload. Combining your data with that already present in Freebase is then straightforward.

Semantic Hacker

Semantic Hacker caused controversy when they announced a competition to find the best use of their API with a prize of a measly million dollars!

Leaving the competition aside, their API will take unstructured text (or web pages) and attempt to classify them by placing them into categories that roughly match up to the DMoz directory categories. Their Bayesian algorithm is pretty well-trained and gives good results even when fed web pages that contain irrelevant information like adverts, menus and copyright notices.

Semantic Search Engines

There are a number of search engines providing search facilities for semantic data. SWSE searches RDF data on the web and has a SPARQL search API.

Yahoo also have plans to provide search for the semantic web and have a research project already available online. When this matures and gets customised using their new BOSS technology, it should be a rich platform for developers.

Intellidimension also offer a semantic web search engine with a SPARQL interface.

In fact, there are lots of companies jumping into this area. There is no clear winner yet but SPARQL seems to be the technology of choice. Everybody is eying Google warily: if they make an announcement about their interest in this field, all these companies will see some very tough competition.

Open Calais

This absolute jewel of a service can create semantically structured documents from unstructured text. Owned by Reuters, Open Calais is heavily slanted towards business and news gathering but the results are still very impressive.

When you feed in a chunk of text, it will identify places, dates, people, companies and so forth and even relations or events involving those entities.

It also offers a tagging plug-in for Wordpress blogs, called Tagaroo. If you have any experience with it, I’d love to hear about it so please leave a comment.

Twine

Twine is a CMS with a semantic focus and could well become the Wordpress of the next generation web. It is currently still in beta, so I haven’t been able to try it out.

Mash It All Up With Pipes

DERI offer an RDF equivalent to Yahoo Pipes, called simply Pipes. Their offering is not as full-featured as Yahoo’s but you can get the source and extend it, and even host it on your own server.

Given the wealth of amazing data provided by all the other services, no doubt you have some great ideas for mash-ups and new services, so Pipes can be a quick way of prototyping.

The future of the web is not so far away and we all need to learn about these technologies so we can be part of it.


Creative Commons licensed photo by jm3.


Copyright © 2006-2009 MMMeeja Ltd. All rights reserved.