
Anatomy Of A URL

Posted on 12 Nov 2008

I am writing a series of articles on RESTful web services but some research showed that a surprising number of people don’t understand the basics of what a URL is. So, in the interest of learning to walk before we can run, I’ll explain what a URL is and how the various components are used in modern web design.

URL stands for Uniform Resource Locator, and its syntax is defined by RFC 3986 (which updated the older RFC 1738). A URL identifies a unique item on the web. The item might be a web page, an image, a database item (like an eBay auction), an MP3 file or anything that can be represented on the web.

An item may have several different URLs, but each URL only points to one thing - although that thing may change over time.

A typical URL might look like this:

http://www.example.com/directory/page.html?param1=value1&param2=another%20param

That’s a pretty complex example, so let’s break it into its constituent components...
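Before going through the pieces one by one, here is a minimal sketch of the same breakdown using Python's standard urllib.parse module. The variable names are my own, and the comments show what each attribute holds for our example URL.

```python
from urllib.parse import urlsplit

url = "http://www.example.com/directory/page.html?param1=value1&param2=another%20param"
parts = urlsplit(url)

print(parts.scheme)  # http
print(parts.netloc)  # www.example.com
print(parts.path)    # /directory/page.html
print(parts.query)   # param1=value1&param2=another%20param (still URL encoded)
```

Note that urlsplit only cuts the URL into pieces. It does not decode anything or check that the server exists.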

Protocol

The bit before the colon is the URL protocol (formally called the scheme), http in our case.

This tells the web browser how to talk to the server - how to ask for a resource, what will happen if the resource does not exist and so on.

There are several URL protocols available, but four are most common on the world wide web today:

  • http - HyperText Transfer Protocol is the protocol used by web servers. The page you are reading now was delivered via HTTP.
  • https - Secure HTTP is just the same as normal HTTP except that the transmissions are encrypted. If you enter passwords or your credit card details on a web site, you want to ensure that this protocol is used by checking for a padlock icon in your web browser.
  • mailto - This protocol allows for clickable email addresses.
  • ftp - File Transfer Protocol is used to manipulate files over the internet.
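If you are writing code that handles URLs, the protocol is usually the first thing to check. The following is a small sketch using Python's urllib.parse; the example addresses are made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical URLs, one for each of the four protocols listed above.
examples = [
    "http://www.example.com/",
    "https://www.example.com/login",
    "mailto:someone@example.com",
    "ftp://ftp.example.com/file.txt",
]

# The protocol is everything before the first colon.
schemes = [urlsplit(u).scheme for u in examples]
print(schemes)  # ['http', 'https', 'mailto', 'ftp']
```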

Double Slash

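Our example has a double slash (//) after the protocol. This indicates that the next component is the address of the server: the domain name, plus the optional login details and port number described later. Not every protocol uses it (mailto URLs have no double slash, for instance). Our example is also an absolute URL, meaning that it does not need any context to resolve to a unique resource.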

The opposite of an absolute URL is a relative URL. It makes no sense to type a relative URL into your browser’s address bar, but embedding one in a web page would indicate that the resource can be found relative to the URL of the page. If you want to know more about relative URLs, this article on UNIX relative paths should help, as relative URLs follow the same standards.
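To make that concrete, here is a rough sketch of how a relative URL is resolved against the URL of the page that contains it, using urljoin from Python's standard library. The relative paths are invented for the example.

```python
from urllib.parse import urljoin

# The URL of the page that contains the relative links.
base = "http://www.example.com/directory/page.html"

sibling = urljoin(base, "other.html")
parent = urljoin(base, "../images/a.png")
from_root = urljoin(base, "/index.html")

print(sibling)    # http://www.example.com/directory/other.html
print(parent)     # http://www.example.com/images/a.png
print(from_root)  # http://www.example.com/index.html
```

A browser does much the same thing every time it follows a relative link.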

Domain Name

The next bit of an absolute URL is the domain name, www.example.com in the example. This identifies a computer (or cluster of computers) on the internet that stores the resource you want. Domain names are not case sensitive so www.example.com and WwW.eXAMplE.COM are equivalent.
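If your code needs to compare domain names, it should ignore case too. As a small sketch, Python's urlsplit exposes a hostname attribute that is already lower-cased, whereas netloc keeps whatever was typed:

```python
from urllib.parse import urlsplit

a = urlsplit("http://www.example.com/directory/page.html")
b = urlsplit("http://WwW.eXAMplE.COM/directory/page.html")

print(b.netloc)                  # WwW.eXAMplE.COM (as written)
print(b.hostname)                # www.example.com (normalised to lower case)
print(a.hostname == b.hostname)  # True
```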

Path

The URL path tells the server how to find the resource that you require. The path in our example URL is /directory/page.html. Unlike domain names, paths should be treated as case sensitive, although some servers ignore case.

You might think that the .html indicates that the resource is an HTML file but that is not necessarily true! The server tells your web browser what kind of resource it is sending in a Content-Type header, and the browser uses that (sometimes along with a look at the content itself) to decide how to handle it. If you are writing any code that downloads from the internet, you should do the same.

CGI Parameters

The end of our URL, known as the query string, has two parameters, param1 and param2.

Parameters are optional for all URLs and their presence is indicated by the question mark (?). An ampersand (&) is used to separate multiple parameters.

Parameters can also be assigned a value using an equals sign (=), as is the case in our example.

param1’s value is “value1” but param2’s value is a bit more complex - it contains a space!
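Here is a minimal sketch of pulling the parameters out of our example URL with Python's urllib.parse. parse_qs splits on the ampersands and equals signs and decodes the values, returning a list per parameter because a parameter may legally appear more than once.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.example.com/directory/page.html?param1=value1&param2=another%20param"

query = urlsplit(url).query  # everything after the question mark
params = parse_qs(query)

print(params)               # {'param1': ['value1'], 'param2': ['another param']}
print(params["param2"][0])  # another param
```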

URL components can only contain certain characters unencoded: the letters A-Z and a-z, the digits 0-9, dashes, underscores, periods and tildes, plus reserved characters such as / ? & and = when they are acting as separators. All other characters must be “URL encoded”, that is, translated into a percent sign (%) followed by their two-digit hex value (20 for a space).

URL encoding can also be used in other parts of the URL, but it’s not recommended - can you imagine reading out the URL over the phone and saying “H-T-T-P-colon-slash-slash-A-B-percent-twenty-C”? Ridiculous!
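You rarely need to do the encoding by hand. As a sketch, Python's quote and unquote functions convert in each direction; the sample strings other than “another param” are my own.

```python
from urllib.parse import quote, unquote

encoded = quote("another param")
decoded = unquote("another%20param")
print(encoded)  # another%20param
print(decoded)  # another param

# Characters with a special meaning in a URL are encoded as well,
# so they can appear inside a parameter value safely.
tricky = quote("a&b=c")
print(tricky)   # a%26b%3Dc

# Non-ASCII characters are encoded as their UTF-8 bytes.
accented = quote("café")
print(accented)  # caf%C3%A9
```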

Other URL Components

There are many other URL components that you might encounter that weren’t present in our example.

Port Numbers

Some URLs add a port number to the domain name, like this:

http://www.example.com:8080/page.html

Most web servers operate on default ports, but sometimes another port might be specified. The default ports are:

  • Port 80 - HTTP
  • Port 443 - HTTPS
  • Port 21 - FTP

Specifying a different port does not mean you can supply an incorrect protocol in the URL - trying to talk HTTP with an FTP server will fail.
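The rule "use the port in the URL if there is one, otherwise the default for the protocol" can be sketched like this. The effective_port function and the DEFAULT_PORTS table are names I made up for this example, and the table only covers the three defaults listed above.

```python
from urllib.parse import urlsplit

# Default ports from the list above (not an exhaustive table).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def effective_port(url):
    """Return the explicit port if the URL has one, else the protocol default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)  # None for protocols not in the table

print(effective_port("http://www.example.com:8080/page.html"))  # 8080
print(effective_port("http://www.example.com/page.html"))       # 80
print(effective_port("https://www.example.com/"))               # 443
```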

Usernames & Passwords In URLs

It is possible, although rare, to specify a username and password inside a URL. In this case, the username and/or password are supplied before the domain name, like this:

ftp://username:password@hostname/

I hope I don’t have to tell you just how insecure it would be to embed a URL like this in a web page.
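If you do come across one of these URLs in your own code, it is worth stripping the login details before the URL ends up in a log file or a web page. A rough sketch in Python, where the safe_url variable is my own naming and the rebuilt URL deliberately keeps only the host name (any port number would be dropped too):

```python
from urllib.parse import urlsplit

parts = urlsplit("ftp://username:password@hostname/")

print(parts.username)  # username
print(parts.password)  # password
print(parts.hostname)  # hostname

# Rebuild the URL with only the host name in the server part.
safe_url = parts._replace(netloc=parts.hostname).geturl()
print(safe_url)        # ftp://hostname/
```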

You Should Now Know All About URLs

That pretty much covers the basics of URLs. Feel free to experiment with your own web spaces - a well-behaved webserver will not change anything just because you asked it for a resource via a URL. Ask questions in the comments too, I’ll do my best to answer.

This post is part of a series on REST so if you’ve found it useful, subscribe to catch the other articles in the series.

Copyright © 2006-2009 MMMeeja Pty. Ltd. All rights reserved.