MMMeeja

blog :: web development

A Beginners Guide To REST Web Services

Posted on 19 Nov 2008 by Andy

This article series will take you through RESTful web services, introducing the technologies behind them, what they are, how to use them and how to implement them.

The series is presented in five parts, each of which will be posted over the next few days so sign up to the MMMeeja RSS feed to make sure you don't miss a single post!

REST (or Representational State Transfer) is fast becoming the de facto standard for exposing APIs on the web, beating more complex SOAP/RPC services. It's easy to understand why - it's a logical extension to how we all use the web already - as you'll see as the series continues.

I start off with a couple of introductory articles to the underlying technologies to get you up to speed:

1. What Is A URL?
2. An Introduction To HTTP

Then, let's get to the meat with:

3. Examples Of REST Interfaces
4. Using RESTful Web Services

And for the more advanced:

5. Designing And Implementing A RESTful Interface

So, the series will be a trip from n00b to l33t h4xx0r in five installments. Don't forget to subscribe.


Creative Commons licensed photo by Bobby ~ Lawcrow911.


Examples Of REST Interfaces

Posted on 19 Nov 2008 by Andy

Continuing our look at Representational State Transfer interfaces, part three lets us get our hands dirty with some real-life RESTful interfaces from BrightKite and del.icio.us.

But first...

What Is A RESTful Interface?

Three articles into the series and I finally get to explain this, thanks for sticking around!

Knitted body to laptop interface

REST stands for Representational State Transfer, a term first coined in Roy Fielding’s PhD thesis. It involves using HTTP to manipulate resources identified by URLs via a completely stateless protocol.

The URLs used are generally quite pretty. For example, an e-commerce store might sell a product identified by this URL:

http://mystore.com/products/1234

The store might have an API that supplied users with a list of products by performing a HTTP GET request on this URL:

http://mystore.com/products

The response might be supplied in an XML document, like this:

<?xml version="1.0"?>
<products>
    <product id="1">
         <url>http://mystore.com/products/1</url>
         <name>Red widget</name>
         <description>A large, left-handed widget</description>
         <stock-level>406</stock-level>
    </product>
    <product id="2">
         <url>http://mystore.com/products/2</url>
         <name>Blue widget</name>
         <description>A small, left-handed widget</description>
         <stock-level>12</stock-level>
    </product>
</products>
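A client would normally parse a response like this rather than read it by eye. Here is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample document mirrors the product listing above, and the function name parse_products is purely illustrative, not part of any real store API.

```python
import xml.etree.ElementTree as ET

# Sample response mirroring the fictional mystore.com product listing.
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<products>
    <product id="1">
        <url>http://mystore.com/products/1</url>
        <name>Red widget</name>
        <description>A large, left-handed widget</description>
        <stock-level>406</stock-level>
    </product>
    <product id="2">
        <url>http://mystore.com/products/2</url>
        <name>Blue widget</name>
        <description>A small, left-handed widget</description>
        <stock-level>12</stock-level>
    </product>
</products>
"""


def parse_products(xml_text):
    """Turn the <products> document into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    products = []
    for node in root.findall("product"):
        products.append({
            "id": node.get("id"),
            "url": node.findtext("url"),
            "name": node.findtext("name"),
            "description": node.findtext("description"),
            "stock_level": int(node.findtext("stock-level")),
        })
    return products


products = parse_products(SAMPLE_RESPONSE)
for product in products:
    print(product["name"], product["stock_level"])
```

Note that each product carries its own URL, so a client can follow it to fetch that single resource without building the address itself.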

Stateless Protocol

I mentioned that REST is a stateless protocol, so how can that work in an e-commerce context? People need to browse around the site and fill up their shopping basket!

Simple: a shopping basket is a resource too. Create a basket with a POST request to this URL:

http://mystore.com/baskets

And add items with POSTs like this:

http://mystore.com/baskets/1/products/2

You can use HTTP authentication to ensure that only the basket owner can view or manipulate the contents of the basket.
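Because the protocol is stateless, every request has to carry everything the server needs, including the credentials. The sketch below builds (but does not send) the two basket requests with an HTTP Basic Authorization header, using only Python's standard library. The function name basket_request and the alice/secret credentials are made up for illustration, and mystore.com is the fictional shop from above.

```python
import base64
import urllib.request


def basket_request(url, username, password):
    """Build a POST request that carries its own credentials.

    Nothing is remembered between requests, so the Authorization
    header is attached every single time.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url, data=b"", method="POST")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical user and the fictional basket URLs from the article.
create_basket = basket_request("http://mystore.com/baskets", "alice", "secret")
add_product = basket_request(
    "http://mystore.com/baskets/1/products/2", "alice", "secret"
)

for request in (create_basket, add_product):
    print(request.get_method(), request.full_url)
```

Basic authentication only encodes the password, it does not encrypt it, so a real shop would want to serve these URLs over https.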

Enough of fabricated example shops, let’s look at some RESTful APIs that are being used on the internet today...

Delicious

One of the first mainstream web applications to offer a RESTful interface was del.icio.us with its version 1 API. We’ll be using the del.icio.us API for this example, so if you’re following along be sure to get a del.icio.us account and download cURL, the tool that we’ll be using.


We’ll begin by using the API to get a list of tags that we’ve applied to our bookmarks. Open a command line terminal and type the following (changing username & password to your values):

curl --user username:password https://api.del.icio.us/v1/tags/get

This executes an authenticated GET request for your set of tags and the output will look something like this:


<?xml version="1.0" encoding="UTF-8"?>
<tags>
  <tag count="1" tag="Action"/>
  <tag count="1" tag="Bookmarklet"/>
  <tag count="1" tag="Day"/>
  <tag count="8" tag="EC2"/>
  <tag count="1" tag="GTD"/>
  <tag count="14" tag="IE"/>
  <tag count="14" tag="Maps"/>
  <tag count="1" tag="On"/>
  <tag count="3" tag="yahoo"/>
  <tag count="1" tag="youtube"/>
</tags>

Pretty cool, no?

Now, let’s examine the REST API to add a new bookmark:

curl --user username:password -d "url=http%3A%2F%2Fwww.mmmeeja.com%2Fblog%2F&tags=webdev%20web%20design" https://api.del.icio.us/v1/tags/add

The command above will perform a HTTP POST to add this blog :) to your del.icio.us account. Note that the url parameter is URL encoded and the tags parameter has its spaces replaced by %20. The POST body starts directly with the first parameter name; the question mark is only needed when parameters are appended to a URL.
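If you would rather not encode the POST body by hand, most languages can do it for you. This is a small sketch with Python's urllib.parse; the helper name bookmark_form_body is an invention for this example. Passing quote_via=quote makes spaces come out as %20, as in the curl command, instead of the + that urlencode produces by default.

```python
from urllib.parse import quote, urlencode


def bookmark_form_body(url, tags):
    """Encode the url and tags parameters as a form-style POST body."""
    fields = {"url": url, "tags": " ".join(tags)}
    # quote_via=quote encodes spaces as %20 rather than "+".
    return urlencode(fields, quote_via=quote)


body = bookmark_form_body(
    "http://www.mmmeeja.com/blog/", ["webdev", "web", "design"]
)
print(body)
```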

The observant reader will note that there’s something fishy about the two del.icio.us URLs we just used. The first ended in /get and the second in /add - this goes against the whole idea of using HTTP methods for REST! Yes, delicious have compromised by allowing people to use HTTP GET to create resources and the only way to do that is to break the HTTP model.

This is very common and I understand why they did it, but it’s a shame, especially when I’m using their API to teach about RESTful interfaces. Roy Fielding successfully argues that a resource should only ever have one URL no matter whether the API changes.

So, how about another API that’s a bit more compliant?

BrightKite

I had to search for a long time to find a well-known service that implemented a truly RESTful API, so hats off to BrightKite.


If you’ve not got a BrightKite account already, you might be out of luck because it’s invite-only at the moment, but there are plenty of invites around if you know where to look.

The BrightKite API documentation explains that people, places, notes, photos etc. are all resources available via their API. Place names are rarely unique or unambiguous, so places are assigned UUIDs. The URL for City Square in Leeds is:

http://brightkite.com/places/b1adc0a0b65e11dd8a90003048c0801e

That URL takes you to the web page, but you can get the information in XML format by performing a HTTP GET of the URL with “.xml” on the end:

curl http://brightkite.com/places/b1adc0a0b65e11dd8a90003048c0801e.xml

You should be able to access that part of the BrightKite API without an account, so try it and check out the results. You can also get the results in JSON format by substituting “.json” for “.xml”.

Posting Notes

Places can have notes associated with them. Get a list of the notes attached to City Square like this:

curl http://brightkite.com/places/b1adc0a0b65e11dd8a90003048c0801e/notes.xml

In correct RESTful fashion, we can create a note with an HTTP POST. We must be an authenticated user to be allowed to do this, so replace the username and password fields in the command below:

curl -u username:password -X POST -d "note[body]=My%20lovely%20note" http://brightkite.com/places/b1adc0a0b65e11dd8a90003048c0801e/notes

Finally, We Get Some REST!

That was part three of the series, only two more to go. I hope that it’s been worth getting up to speed on URLs and HTTP before we got into the REST examples.

In the next installment, I’ll cover using RESTful interfaces from software. Both client-side javascript and server-side scripts will be featured, so don’t forget to subscribe.


Creative Commons licensed photo by Bekathwia.


An Introduction To HTTP

Posted on 16 Nov 2008 by Andy

HTTP (HyperText Transfer Protocol) defines how resources get transferred across the web. In the previous article on URLs, I explained that there are other protocols but HTTP is the most important when it comes to RESTful interfaces.

HTTP Methods

When a client (like your web browser) asks for a resource from a web server, it uses one of:

  • GET
  • PUT
  • POST
  • DELETE
  • HEAD

HTTP GET

This method means “I’d like the resource, please”. The client or the server can override the request and grab a copy from a cache.

I’m going to say that again: the client can override the request and grab a copy from its cache. That means no network activity, so the server doesn’t even know that the resource was requested.

HTTP PUT & POST

Both of these can mean “here is a resource for you”. They could be an update to an existing resource or create a new one; the HTTP standard does not specify which. We will see later on that in RESTful interfaces a PUT usually means update, whilst POST means create.

POST is commonly used in web forms, so when you buy an item from Amazon or bid on an eBay auction, you are sending a POST request. POST requests should never be answered from a cache.

HTTP DELETE

I bet you can guess what this does!

HTTP HEAD

This is a more interesting method. It means “tell me about the resource, but don’t bother sending the resource”. It can be used to check whether a resource exists (by checking the response code), the type of the resource or how old it is (by checking the headers).
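To see the difference between GET and HEAD for yourself, here is a rough sketch that starts a throwaway web server on your own machine and asks it for the same resource with both methods. Everything in it (the ProductHandler class, the /products/1 path, the XML body) is invented for the demonstration, and it uses only Python's standard http.server and http.client modules.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# A made-up resource body for the demonstration.
BODY = b'<product id="1"><name>Red widget</name></product>'


class ProductHandler(BaseHTTPRequestHandler):
    """Serves the same resource for GET and HEAD."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)  # GET returns headers and the resource

    def do_HEAD(self):
        self._send_headers()  # HEAD returns the headers only

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(method, port, path="/products/1"):
    """Send one request and return (status, content type, body)."""
    connection = http.client.HTTPConnection("127.0.0.1", port)
    connection.request(method, path)
    response = connection.getresponse()
    result = (response.status, response.getheader("Content-Type"), response.read())
    connection.close()
    return result


def demo():
    """Run GET then HEAD against a temporary local server."""
    server = HTTPServer(("127.0.0.1", 0), ProductHandler)  # port 0 = any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("GET", port), fetch("HEAD", port)
    finally:
        server.shutdown()
        server.server_close()


get_result, head_result = demo()
print("GET ", get_result)
print("HEAD", head_result)
```

Both responses carry the same status code and headers, but only the GET response has a body.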

HTTP Response Codes

The server will answer a HTTP request with a code that indicates whether the request could be processed. I’m sure you know some of these codes already, like the dreaded 404!

There’s quite a bunch of them, so here are a select few that are applicable to RESTful APIs:

  • 200 OK
  • 201 Created
  • 202 Accepted
  • 304 Not Modified
  • 400 Bad Request
  • 401 Unauthorized
  • 403 Forbidden
  • 404 Not Found
  • 405 Method Not Allowed
  • 406 Not Acceptable
  • 409 Conflict
  • 410 Gone
  • 411 Length Required
  • 412 Precondition Failed
  • 413 Request Entity Too Large
  • 414 Request-URI Too Long
  • 415 Unsupported Media Type
  • 500 Internal Server Error
  • 501 Not Implemented
  • 503 Service Unavailable

I’ll explain how these might apply to a web service in a later article, when we cover REST in more detail.
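In the meantime, it helps to remember that the first digit of a code tells you its family. As a small sketch, Python's standard http.HTTPStatus enum knows the official phrase for each code, and the describe function below (a name made up for this example) adds the family.

```python
from http import HTTPStatus

# The first digit of a response code identifies its family.
FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return a code with its standard phrase and family."""
    status = HTTPStatus(code)
    family = FAMILIES[status.value // 100]
    return f"{status.value} {status.phrase} ({family})"


for code in (200, 201, 304, 404, 409, 503):
    print(describe(code))
```

A client can often decide what to do from the family alone: retry later on a 5xx, fix the request on a 4xx.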

HTTP Headers

Both requests and responses can have extra headers attached. If you’ve done any web development you’re bound to know a few common headers (like User-Agent and Referer) already but there are many more that are useful when we’re dealing with APIs.

Request Headers

A classic use of a request header comes about when a resource can be represented in many forms, such as a product description available as a HTML page, XML document, JSON or a photo. In this case, a client might send its GET request with an Accept header indicating that it wants JSON, like this:

Accept: application/json

Another useful request header can be used to get resources that have been updated recently:

If-Modified-Since: Sat, 25 Oct 2008 19:43:31 GMT
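Putting the two request headers together, here is a sketch that builds (without sending) a GET request asking for JSON, and only if the resource has changed since a given time. The function name conditional_json_request is made up, the URL is the fictional store from the REST examples, and application/json is the registered media type for JSON. Python's email.utils.format_datetime produces the date format that HTTP expects.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import urllib.request


def conditional_json_request(url, since):
    """Build a GET request for JSON that is only wanted if modified since `since`."""
    headers = {
        "Accept": "application/json",
        # usegmt=True yields the "GMT" suffix used in HTTP dates;
        # it requires a datetime in UTC.
        "If-Modified-Since": format_datetime(since, usegmt=True),
    }
    return urllib.request.Request(url, headers=headers)


since = datetime(2008, 10, 25, 19, 43, 31, tzinfo=timezone.utc)
request = conditional_json_request("http://mystore.com/products/1234", since)

for name, value in request.header_items():
    print(f"{name}: {value}")
```

If the resource has not changed, a well-behaved server answers with 304 Not Modified and no body, which saves bandwidth on both sides.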

Response Headers

Response headers contain meta-data about the requested resource. They can explain what format the data is (Content-Type), how long the data is valid for (the Expires header) or when it was last written (Last-Modified).

Don’t forget that you can use the HEAD method to get just the response code and headers back from the server.

End Of Part Two

Hopefully you now have a decent grasp on the magic protocol behind the web. I find it really helps if you can think in terms of objects or resources instead of pages, then you’ll get a better feel for exactly what your browser is doing when you ask for a URL.

As always, feel free to ask questions via the comments, I’ll do my best to answer.

This post is part of a series on REST so if you’ve found it useful, subscribe to catch the other articles in the series.


Anatomy Of A URL

Posted on 12 Nov 2008 by Andy

I am writing a series of articles on RESTful web services but some research showed that a surprising number of people don’t understand the basics of what a URL is. So, in the interest of learning to walk before we can run, I’ll explain what a URL is and how the various components are used in modern web design.

URL stands for Uniform Resource Locator, as defined by this W3C standard. A URL identifies a unique item on the web. The item might be a web page, an image, a database item (like an eBay auction), an MP3 file or anything else that can be represented on the web.

An item may have several different URLs, but each URL only points to one thing - although that thing may change over time.

A typical URL might look like this:

http://www.example.com/directory/page.html?param1=value1&param2=another%20param

That’s a pretty complex example, so let’s break it into its constituent components...

Protocol

The bit before the colon is the URL protocol, http in our case.

This tells the web browser how to talk to the server - how to ask for a resource, what will happen if the resource does not exist and so on.

There are several URL protocols available, but four are most common on the world wide web today:

  • http - HyperText Transfer Protocol is the protocol used by web servers. The page you are reading now was delivered via HTTP.
  • https - Secure HTTP is just the same as normal HTTP except that the transmissions are encrypted. If you enter passwords or your credit card details on a web site, you want to ensure that this protocol is used by checking for a padlock icon in your web browser.
  • mailto - This protocol allows for clickable email addresses.
  • ftp - File Transfer Protocol is used to manipulate files over the internet.

Double Slash

Our example has a double slash (//) after the protocol. This indicates that the URL is an absolute URL - that it does not need any context to resolve to a unique resource.

The opposite of an absolute URL is a relative URL. It makes no sense to type a relative URL into your browser’s address bar, but embedding one in a web page would indicate that the resource can be found relative to the URL of the page. If you want to know more about relative URLs, this article on UNIX relative paths should help, as relative URLs follow the same standards.
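To make that concrete, here is a short sketch of how a relative URL is resolved against the URL of the page that contains it, using urljoin from Python's standard library. The file names other.html, images/logo.png and index.html are invented for the example.

```python
from urllib.parse import urljoin

# The page that contains the relative links.
PAGE = "http://www.example.com/directory/page.html"

# Same directory as the page.
sibling = urljoin(PAGE, "other.html")
# ".." climbs one directory, just like a UNIX path.
parent = urljoin(PAGE, "../images/logo.png")
# A leading slash starts again from the top of the site.
from_root = urljoin(PAGE, "/index.html")

for resolved in (sibling, parent, from_root):
    print(resolved)
```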

Domain Name

The next bit of an absolute URL is the domain name, www.example.com in the example. This identifies a computer (or cluster of computers) on the internet that stores the resource you want. Domain names are not case sensitive so www.example.com and WwW.eXAMplE.COM are equivalent.

Path

The URL path tells the server how to find the resource that you require. The path in our example URL is directory/page.html. Unlike domain names, paths are case sensitive.

You might think that the .html indicates that the resource is a HTML file but that is not necessarily true! Your web browser will check the contents of the file to determine what kind of resource it is and how it should handle it and if you are writing any code that downloads from the internet, you should do the same.

CGI Parameters

The end of our URL has two parameters, param1 and param2.

Parameters are optional for all URLs and their presence is indicated by the question mark (?). An ampersand (&) is used to separate multiple parameters.

Parameters can also be assigned a value using an equals sign (=), as is the case in our example.

param1’s value is “value1” but param2’s value is a bit more complex - it contains a space!

URL components can only contain certain characters unescaped: letters (A-Z and a-z), digits (0-9), dashes, underscores, periods and tildes. All other characters must be “URL encoded”, that is, translated into a percent sign (%) followed by their ASCII hex value. A space, for example, becomes %20.

ASCII translation can also be used in other parts of the URL, but it’s not recommended - can you imagine reading out the URL over the phone and saying “H-T-T-P-colon-slash-slash-A-B-percent-twenty-C”? Ridiculous!
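You rarely need to pick a URL apart by hand. As a sketch, Python's urllib.parse can split the example URL into the components described so far and decode the parameters; the dissect function and its dictionary keys are just names chosen for this article.

```python
from urllib.parse import parse_qs, quote, urlsplit

EXAMPLE = (
    "http://www.example.com/directory/page.html"
    "?param1=value1&param2=another%20param"
)


def dissect(url):
    """Split a URL into protocol, domain, path and decoded parameters."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,  # hostname is lower-cased for us
        "path": parts.path,        # the path keeps its original case
        # parse_qs returns a list per parameter; keep the first value of each.
        "parameters": {
            name: values[0] for name, values in parse_qs(parts.query).items()
        },
    }


pieces = dissect(EXAMPLE)
print(pieces)

# Going the other way: URL encode a value that contains a space.
encoded = quote("another param")
print(encoded)
```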

Other URL Components

There are many other URL components that you might encounter that weren’t present in our example.

Port Numbers

Some URLs add a port number to the domain name, like this:

http://www.example.com:8080/page.html

Most web servers operate on default ports, but sometimes another port might be specified. The default ports are:

  • Port 80 - HTTP
  • Port 443 - HTTPS
  • Port 21 - FTP

Specifying a different port does not mean you can supply an incorrect protocol in the URL - trying to talk HTTP with an FTP server will fail.

Usernames & Passwords In URLs

It is possible, although rare, to specify a username and password inside a URL. In this case, the username and/or password are supplied before the domain name, like this:

ftp://username:password@hostname/

I hope I don’t have to tell you just how insecure it would be to embed a URL like this in a web page.
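The same standard-library parser exposes these extra components too. In this sketch, effective_port is a made-up helper that falls back to the default ports listed above when the URL does not name one, and the FTP URL is the placeholder from the example.

```python
from urllib.parse import urlsplit

# Default ports for the protocols listed in the article.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(url):
    """Return the port in the URL, or the protocol's default if none is given."""
    parts = urlsplit(url)
    return parts.port or DEFAULT_PORTS.get(parts.scheme)


explicit = effective_port("http://www.example.com:8080/page.html")
implicit = effective_port("https://www.example.com/page.html")

# Credentials placed before the domain name are parsed out separately.
ftp = urlsplit("ftp://username:password@hostname/")

print(explicit, implicit)
print(ftp.username, ftp.password, ftp.hostname)
```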

You Should Now Know All About URLs

That pretty much covers the basics of URLs. Feel free to experiment with your own web spaces - there’s nothing that can go wrong with asking a webserver for a resource via a URL. Ask questions in the comments too, I’ll do my best to answer.

This post is part of a series on REST so if you’ve found it useful, subscribe to catch the other articles in the series.


Creative Commons licensed photos by Laughing Squid and dailyinvention.


Programming Collective Intelligence

Posted on 28 Oct 2008 by Andy

Since the web first took off, I have found myself buying fewer and fewer computing textbooks as reference documentation moved online and blogs provided a wealth of how-to articles. I still sometimes scan the computing shelves of my local bookstores in idle moments and that is how I chanced across Programming Collective Intelligence by Toby Segaran.

Programming Collective Intelligence cover

The book is subtitled “Building Smart Web 2.0 Applications” and that is very appropriate. It is aimed squarely at web developers who, like me, are fascinated by the interactive nature of modern web applications and the use of machine-learning algorithms that make use of all the juicy data collected by the likes of eBay, Amazon and del.icio.us.

This is not a book that will teach you how to program or how to design a website - it is aimed squarely at competent, experienced back-end web developers who want to see the algorithms behind some of the world’s most successful websites.

Wonderful Examples

The examples contained within the book are its greatest strength.

Toby Segaran chose to use Python throughout the book, a wise choice. Although I have little Python experience, it is a very readable language and Toby deliberately avoids language-specific tricks and obscure libraries.

Each chapter introduces algorithms that solve a specific problem, including recommendation engines, categorisation, search engines, optimisation and more. Open APIs (such as the del.icio.us API) are used where possible and the example code is structured in a very modular, pluggable manner. Readers are encouraged to experiment via the Python shell.

Practical Introductions

The book introduces each algorithm with an overview that does not resort to intense mathematics, which was great for me since I promptly forgot most of my maths after graduating. Compare this from the book with the Wikipedia article on the same subject.

The author correctly surmises that most readers will not need to implement common algorithms from scratch but will use well-constructed third-party libraries, and so do not need to know a great deal of academic detail about each technique.

The book really lends itself to being used as a reference when searching for an appropriate algorithm in repositories like CPAN: chapter 12 provides a summary of the rest of the book with each algorithm’s strengths and weaknesses clearly presented.

Conclusion

A book for the hardcore geek? Yes, as my girlfriend pointed out when I showed her my purchase. But also a book for programmers with a healthy curiosity, as she later said “That sounds really interesting!”


Creative Commons licensed photo by vj_pdx.


A List Of Google Maps Top Level Domains

Posted on 23 Jun 2008 by Andy

Google Maps has a fantastic geolocation API that attempts to translate text representing a location into hard and fast latitude/longitude coordinates. A great service that is in use by many websites, not least Google Maps itself.

Some globes

For web developers who use the service from their servers, there’s an extra level of complexity to deal with - the geocoding API biases its results towards the country from which the request was made. So “Perth” could be the Scottish town to a user in Britain or the Western Australian capital if the request came from an IP down-under. When your servers are located in one country this can limit their usefulness to an international audience.

The solution is to call the API from an appropriate version of Google Maps, so if your server receives a request from a Japanese IP, call the Google API via maps.google.co.jp. Nice and easy, eh?

No, not quite, because not every country has its own Google Maps address. I searched for a canonical list of them, but could find nothing, so was forced to use a bit of Perl hackery to try every country domain preceded by “http://maps.google.(|co.|com.)”. These are my results:
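My original script was Perl, but the idea is easy to sketch in Python. The function below only generates the three candidate addresses for a country code; checking which of them actually respond is left out, and the name candidate_hosts is just for illustration.

```python
def candidate_hosts(country_code):
    """List the addresses to try for one ISO 3166-1 country code.

    Mirrors the pattern maps.google.(|co.|com.) followed by the code.
    """
    code = country_code.lower()
    return [f"http://maps.google.{prefix}{code}" for prefix in ("", "co.", "com.")]


for host in candidate_hosts("JP"):
    print(host)
```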

Country                   ISO 3166-1    URL
USA                       US/Default    http://maps.google.com
Austria                   AT            http://maps.google.at
Australia                 AU            http://maps.google.com.au
Bosnia and Herzegovina    BA            http://maps.google.com.ba
Belgium                   BE            http://maps.google.be
Brazil                    BR            http://maps.google.com.br
Canada                    CA            http://maps.google.ca
Switzerland               CH            http://maps.google.ch
Czech Republic            CZ            http://maps.google.cz
Germany                   DE            http://maps.google.de
Denmark                   DK            http://maps.google.dk
Spain                     ES            http://maps.google.es
Finland                   FI            http://maps.google.fi
France                    FR            http://maps.google.fr
Italy                     IT            http://maps.google.it
Japan                     JP            http://maps.google.jp
Netherlands               NL            http://maps.google.nl
Norway                    NO            http://maps.google.no
New Zealand               NZ            http://maps.google.co.nz
Poland                    PL            http://maps.google.pl
Russia                    RU            http://maps.google.ru
Sweden                    SE            http://maps.google.se
Taiwan                    TW            http://maps.google.tw
United Kingdom            UK/GB         http://maps.google.co.uk
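On the server, a list like this boils down to a lookup with a fallback. Here is a minimal sketch that uses a handful of rows from the table; the names MAPS_DOMAINS and maps_base_url are invented for the example, and working out the visitor's country from their IP address is assumed to happen elsewhere.

```python
# A few rows from the table above, keyed by ISO 3166-1 code.
MAPS_DOMAINS = {
    "AT": "http://maps.google.at",
    "AU": "http://maps.google.com.au",
    "DE": "http://maps.google.de",
    "GB": "http://maps.google.co.uk",
    "NZ": "http://maps.google.co.nz",
}

# Used for the USA and for any country without its own address.
DEFAULT_MAPS_DOMAIN = "http://maps.google.com"


def maps_base_url(country_code):
    """Pick the Google Maps address that matches the visitor's country."""
    return MAPS_DOMAINS.get(country_code.upper(), DEFAULT_MAPS_DOMAIN)


print(maps_base_url("au"))
print(maps_base_url("IE"))
```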

Hope you find the list useful, I’ll try to keep it up to date as Google rolls out more servers across the world, so don’t forget to bookmark it.

Do you know of a Google Maps domain that I’ve missed? Leave a comment below.


Creative Commons licensed photo by Hive.


 


Copyright © 2006-2008 MMMeeja Ltd. All rights reserved.