Class 10 – Fancy URLs: Customizing Your Site’s URLs Using Mod_Rewrite

December 4th, 2009

Now that you know all the basic techniques of web development, it’s time to start thinking about aesthetics. One of the most obvious aesthetic choices you can make on your site is what domain name you choose, and what you name the files on that site. Domain names are something I can’t help you with, but the rest of the URL after the domain name, including the folder and file names, is something I can help you beautify.

This is an advanced topic, but one that can provide polish to your sites if you are comfortable with all we have covered so far.

The problem: ugly URLs

As you know, depending on what we call our files and how we use the query string to pass data from one page to another, we sometimes end up with URLs that look like this:

http://onepotcooking.com/index.php?post=19&view=rss

But you might rather have URLs that look like this:

http://onepotcooking.com/rss/post/19/

And actually, search engines sometimes prefer more descriptive URLs, so they can more easily determine what a page is about:

http://onepotcooking.com/rss_feed/why_urls_should_be_pretty.html

But you don’t want to change the structure of your folders and file names, and change the entire way you use the $_GET, $_POST, and $_REQUEST variables in PHP just to make the URLs pretty. When you’re coding the site, you’re usually thinking about functionality and getting the job done, not aesthetics.

The solution to URL woes: mod_rewrite

Apache, the most popular web server software for handling the requests and responses for web pages (and the software used by our class server and most other UNIX web servers), comes with a module called mod_rewrite that is used for creating custom URLs.

mod_rewrite lets you publish fancy URLs like:

http://onepotcooking.com/isnt_this_a_pretty_url.html

But have them actually get converted internally into ugly URLs like this, without the user ever seeing it:

http://onepotcooking.com/process_something.php?id=1884&to_do=something&this_is=ugly

You will be able to use the fancy URLs for any links to your pages, but your folders, filenames, and PHP code will not have to change, so long as you use mod_rewrite correctly.
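
To give a rough idea of what such a rule can look like, here is a minimal sketch that maps the pretty /rss/post/19/ URL from the first example onto the ugly index.php?post=19&view=rss URL. It assumes the rule lives in an .htaccess file in the root folder of the site and that the host allows mod_rewrite there; the rule your own site needs will almost certainly differ.

```apache
# Switch the rewrite engine on (needed once per .htaccess file)
RewriteEngine On

# /rss/post/19/  ->  index.php?post=19&view=rss
# ([0-9]+) captures the post number; $1 inserts it into the real URL
RewriteRule ^rss/post/([0-9]+)/$ index.php?post=$1&view=rss [L]
```

The L flag simply tells Apache to stop processing further rewrite rules once this one has matched.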

Rewriting vs. Redirecting

This process of having fancy URLs that get internally converted by the server into ugly URLs is known as URL rewriting. With a rewrite, since it only happens internally in the server, the user only ever sees the fancy URL. They will never see the ugly URL in the browser address bar.

However, the term redirect generally refers to the technique where the client, meaning the web browser, handles the redirecting: the server replies with the new address, and the browser then makes a second request for it. In the case of a client-side redirect, the user can see the final destination URL in the browser’s address bar after the redirect occurs. So they will ultimately see the ugly URL clearly in the address bar of the browser.
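
The difference can be sketched with two rules side by side. The file names here (fancy.html, moved.html, ugly.php) are made up for illustration, and the sketch assumes the .htaccess file sits in the root folder of the site.

```apache
RewriteEngine On

# Rewrite: handled entirely inside the server, so the address bar
# keeps showing fancy.html
RewriteRule ^fancy\.html$ ugly.php?id=1 [L]

# Redirect: the R flag makes Apache answer with an HTTP 302 response
# telling the browser to request /ugly.php?id=2, so the address bar changes
RewriteRule ^moved\.html$ /ugly.php?id=2 [R=302,L]
```

The only thing separating the two behaviors is the R flag; without it, mod_rewrite quietly swaps the URL on the server side.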

Another look at the client/server request/response relationship

To understand how mod_rewrite works, it’s important to understand where it fits into the whole request/response relationship. Here’s a very broad overview of just the relevant steps of what happens when a client requests a file from a server:

  • a user tries to load a web page in the browser (whether by going directly to a URL, clicking a link, submitting a form, or making an AJAX request)
  • the browser sends an HTTP request (typically GET or POST) for the file to the server
  • the server receives the request, and launches Apache’s request handler
  • Apache tries to figure out how to respond to the request
  • Apache first checks mod_rewrite settings to see if it should do any fancy processing of the URL of the file that the user is requesting
  • Then, if Apache determines that the requested file is a PHP script, it launches the PHP engine and sends any data that was passed along with the request to the PHP script that the browser requested
  • The PHP script runs and sends its output back to Apache
  • Apache sends a response to the web browser. The response contains an HTTP status code indicating some information about whether the request was processed properly or not, as well as any content that was output by the requested file, regardless of whether it’s a PHP script, HTML file, CSS file, Javascript file, or any other type of file.
  • The browser receives the response from Apache, and figures out how to display whatever content it received back from the server to the user.

As you can see, the mod_rewrite technique we will be discussing that allows sites to use fancy URLs will occur after the server has received the request from the browser, but before it has passed that request on to the PHP processor. It will be written in a language that Apache can understand, not in PHP, since when it is processed, the PHP engine hasn’t even been launched yet.

Apache configuration files: httpd.conf and .htaccess

When a user requests a URL like this:

http://onepotcooking.com/spring2010/test.php

the Apache server checks two sets of configuration files to see whether it should do something fancy with that URL.

First, Apache checks its main configuration file, called httpd.conf, which is usually buried somewhere obscure in the deep recesses of the server filesystem. Httpd.conf has global settings that apply to the entire server. If you have a shared hosting plan for your site, which most of you will have, you do not have access to this file.

After it has checked httpd.conf for any relevant settings, Apache then checks the directory-specific configuration files called .htaccess, which have settings that apply only to specific folders.

With the example URL above, Apache would have to check for the existence of both of these .htaccess files:

/.htaccess
/spring2010/.htaccess

Since the requested file is nested inside the spring2010/ folder, which is inside of the root / folder, either of those settings files could have an effect on how the request for the file is handled by the server.

We will be focusing on settings in the .htaccess files since these are the ones you will always have access to, regardless of your hosting setup. However, the same URL rewriting techniques will be applicable to settings in the httpd.conf file, with slight modifications.
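
One of those slight modifications is worth sketching. In an .htaccess file, the pattern is matched against the part of the URL that comes after the folder the file lives in; in httpd.conf, it is matched against the whole URL path, leading slash included. The fancy.html name below is hypothetical, and the httpd.conf version is left as a comment because it would not belong in the same file.

```apache
# In /spring2010/.htaccess (per-directory context):
# the folder prefix is stripped, so the pattern has no leading "spring2010/"
RewriteEngine On
RewriteRule ^fancy\.html$ test.php [L]

# A roughly equivalent rule in httpd.conf (server context):
# the pattern sees the whole URL path, including the leading slash
# RewriteRule ^/spring2010/fancy\.html$ /spring2010/test.php [PT,L]
```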

How to use .htaccess files to rewrite URLs

Rather than rewrite an entire tutorial on how to rewrite URLs (which I initially started to do), there is an excellent tutorial already written which covers all the basic types of rewriting you are likely to do:

http://corz.org/serv/tricks/htaccess2.php

Note: Although I don’t think it’s clearly described on this site, all of the example code written there is meant to go into a file called “.htaccess” located in the root folder of your project. So if your project is at http://onepotcooking.com/johnhancock/final_project/, you should create an .htaccess file located at /johnhancock/final_project/.htaccess, so you can create fancy URLs like http://onepotcooking.com/johnhancock/final_project/this-is-a-fancy-url.html

In other words, fancy URLs only work at the level at which you put an .htaccess file. If you want a fancy URL like http://onepotcooking.com/this-is-a-fancy-url.html, you need to put an .htaccess file in the root folder of the server, /.htaccess.
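
For the johnhancock project above, a starting point might look like the sketch below. The page.php script and its name parameter are invented for this illustration; substitute whatever script actually handles your pages.

```apache
# Saved as /johnhancock/final_project/.htaccess
RewriteEngine On

# Leave requests for files that really exist in this folder alone
RewriteCond %{REQUEST_FILENAME} !-f

# .../final_project/this-is-a-fancy-url.html
#   ->  .../final_project/page.php?name=this-is-a-fancy-url
RewriteRule ^([a-z0-9-]+)\.html$ page.php?name=$1 [L]
```

Because the file sits in the project folder, the rule only ever sees URLs inside that folder, which is why the pattern does not mention johnhancock/final_project/ at all.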

I highly recommend you read that otherwise well-written document linked above if you wish to use fancy URLs on your own sites.

An example page

I have created a single example PHP script which can be accessed by a number of fancy URLs by taking advantage of rewriting rules found in a .htaccess file in the same folder. The PHP script just outputs whatever data was passed to it in the query string along with the GET request.

In other words, there is an .htaccess file which is allowing a variety of fancy URLs to all internally point to the same PHP script. Each URL is meant to exhibit a slightly different aspect of URL rewriting that may be useful to you. Several of them focus on passing data through the query string even though there is no query string in the fancy URL.

You will definitely want to read that tutorial linked above before going in to read the code in this example.

The direct URL to the example script is http://onepotcooking.com/amosbloomberg/spring2010/class10/mod_rewrite/index.php

The fancy URLs that internally rewrite to that same script are:

And the following URL uses mod_rewrite to do a client-side redirect (not a rewrite):

Reminder: all the rules that allow these URLs to point to and pass data to the same index.php script are found in the .htaccess file in the same folder as the PHP script.

Class 10 – Spiffing up the browser address bar with Favicons and Fancy URLs

July 22nd, 2009

You should consider the URL address of your site to be part of its design.  A memorable URL and a nicely designed favicon are probably the first two things anyone sees of your work.

Intro to Favicons, Fancy URLs, and Search Engine Optimization

To read about what a favicon is, and how to create one, click that link.

To read about what I mean by fancy URLs, and how to create them, click that link.  A simpler example than those found on this link follows in this post.

Fancy URLs, meaning intuitive URLs that are easy to understand, are also important for Search Engine Optimization (SEO).  Click to read more about developing your web site with SEO in mind.

An example of Fancy URLs

I will now outline a relatively simple example of creating Fancy URLs.  Click to see  this example in action.

By clicking that link to see this example in action, your browser will bring you to this URL:

http://onepotcooking.com/amosbloomberg/summer2009/class9/mod_rewrite/animals/

If you click one of the animal names in that file, your browser will bring you to a URL that looks something like this:

http://onepotcooking.com/amosbloomberg/summer2009/class9/mod_rewrite/animals/15

The first thing to notice is that if you view the files in that project folder on the server, you’ll see that there is no subfolder called “animals/” in there.  So Fancy URLs are a euphemism for Fake URLs.

The .htaccess file

The file named .htaccess in this folder contains a few rules that make this trick possible.  The first rewrite rule in the file is this:

RewriteRule ^animals/$ index.php [QSA]

This says that if the browser requests the folder “animals/”, the server should run the file “index.php” and send its output to the browser instead.

The second rule looks like this:

RewriteRule ^animals/([0-9]+)$ index.php?animal_id=$1 [QSA]

This rule says that if the browser requests the folder “animals/” followed by any number, such as “animals/15”, then the server should convert that into a request for the file “index.php?animal_id=15” instead.

As you can see, in this second rule, part of the Fancy URL has been converted into a bit of data passed via the query string along with the request for the file.  This is a common trick to make it less obvious that data is being passed to the server with the request.
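
Neither rule explanation above mentions the [QSA] flag, so here is a short annotated sketch of the two rules together. The RewriteEngine line is an assumption on my part, since the complete file is not shown here, and the sort=name parameter is made up purely to show what QSA ("query string append") does.

```apache
# Assumed to appear once near the top of the file
RewriteEngine On

# The two rules from this example
RewriteRule ^animals/$ index.php [QSA]
RewriteRule ^animals/([0-9]+)$ index.php?animal_id=$1 [QSA]

# animals/15            ->  index.php?animal_id=15
# animals/15?sort=name  ->  index.php?animal_id=15&sort=name
#
# Without QSA, the second request would lose sort=name, because a
# replacement URL that has its own query string discards the original one.
```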

Class 11 – Introduction to Search Engine Optimization

May 6th, 2009

The techniques website developers and marketers use to promote their web sites are many and varied.  Promotions on the web are not so different from promotions in any other medium – you need to use any and all channels available to you for getting the word out.  What used to be known as guerrilla marketing is now the norm online.

If a tree falls in the woods…

If your site doesn’t show up on the first page of Google results, does it really exist?  In some cases, getting your site listed near the top of a search for a particular word, or phrase, is imperative to the success of your web site and/or your business. Hence the interest marketers have in Search Engine Optimization (SEO).

The search engines have a monopoly.  Many users will not bother to look at sites that are not listed on the first page of search results for a particular term.  Many will not even bother with sites that are not in the top 3 results.

An excellent introduction

This site has an excellent introduction to the concept of Search Engine Optimization.  I will highlight what I consider to be the key aspects of the information in that tutorial.

SEO is “politics by other means”

How you place in the search results depends in large part on how the search engines work.  Each has a set of secret algorithms that ultimately determine how far up your site falls in the search results for any search term.  However, each search engine also regularly modifies these algorithms.  So just because you are high up in the search results today doesn’t mean that you will be there tomorrow.  Large, well-funded sites will try to detect each change in the search engines’ algorithms, and will modify their own sites accordingly.

“Politics by other means” was how General von Clausewitz described war.  You should generally consider SEO to be akin to war, and should think strategically.  Given the huge number of websites on just about any topic, all vying for the attention of a finite group of potential viewers, how will your site get noticed?  Everyone in the game is battling to show up at the top of a search result for the relevant keywords, so your chances of winning any particular battle are slim.

You need to consider SEO a sustained campaign of attrition.  Unless your site is very niche-oriented, and involves very obscure keywords, a one-time shock-and-awe marketing strategy may work for you at first, but you will slowly slide down in the search results as the search algorithms evolve, and as the other players in the game indefatigably try to climb up to the top, pushing you down along the way.

It’s all about semantics

At a high level, the key to SEO is to make what your site is about clear to the search engines.  If your site is about cars, but you don’t use the word “car” in any headings or titles of pages, you will not be making a search engine’s job easy.

The search engines should be able to discover the main themes of your site automatically by crawling through the code of your site, seeing what other sites link to your site, seeing where your site links, and detecting the main words you use for things like the titles of pages, headings, and the text used in links.

So here are some very general but easy-to-implement tips:

  • inbound links: make partnerships, or friendships, with other sites and get them to link to your site.  You can even buy them.  The more thematically related the linking site is to your site, the better.  And ask them nicely to make the copy in the link text meaningful in some way to the content of your site.
  • outbound links: don’t be afraid to link to other related sites.  You want to show the search engines that you are part of the community of sites related to a specific topic.
  • picking keywords: if your site is about animals, you will need to come up with alternative keywords to use.  There are so many sites about animals that you will never make it to the top of the search results by optimizing for the word, “animals”.  Find variations or more specific keywords to use instead.
  • keyword density: if your site is about porpoise feeding habits, be sure to use the phrase “porpoise feeding habits” in as many places in your content as possible.
  • meaningful page titles: If your site is about mold colonies, put the words “mold colonies”, or related words, in the <title> tag of every page
  • meaningful page headings:  Make sure to use the phrase “cultural perspectives on aging”, or related keywords and phrases, in the <h1> – <h6> tags on your pages, if your site is about the cultural perspectives of the aging process.
  • meaningful link copy:  If your page about the health benefits of flax seed oil links to a page about bio-diesel car engines, put the words “flax seed oil will make your bio-diesel engine run quicker” somewhere in the link copy.  Of course, I’m being facetious, but you need to find creative ways to throw in the major keywords anywhere possible, even in the text you use for links.
  • semantic tags: use XHTML tags for what they were meant to be used for – don’t try to game the system (for now).  Use <h1> – <h6> tags for things that are truly headings of the content of your pages.  Use <p> tags for paragraphs, <th> tags for table headings, surround important words with the <strong> tag, use <label> tags for labels, etc.
  • don’t bury the content: use as few XHTML tags as possible to get the job done.  If you wrap <h1> tags within <divs> within <divs> within <divs> within <divs>, the search engine spider may give up trying to get to the real content of your page as it drills down through all the levels of your code.  Of course, efficient use of XHTML and CSS code comes with practice.
  • use meaningful URLs: if you feel comfortable with mod_rewrite and .htaccess files, convert your URLs to be semantically meaningful. For example, a page about artichoke recipes that has a URL like http://onepotcooking.com/recipes/artichokes is much more search engine friendly than http://onepotcooking.com/spring2009/class12/assigment6/recipes.php?cat=12
  • use <meta> tags in the <head> section of your document to explicitly include a description and keywords of your site.  Most search engines will actually ignore these when indexing your site, but it doesn’t hurt.
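
For the “use meaningful URLs” tip in the list above, the artichoke example could be handled with something like the following sketch. It assumes the .htaccess file sits in the root folder of the site and reuses the two URLs from that tip exactly as written; a real site would more likely look the category up by name than hard-code one rule per recipe category.

```apache
# Saved as /.htaccess in the root folder of the site
RewriteEngine On

# /recipes/artichokes  ->  the real script with its numeric category
RewriteRule ^recipes/artichokes/?$ spring2009/class12/assigment6/recipes.php?cat=12 [L]
```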

As you can see, there are some very practical things you can do to make your site more likely to be noticed by search engines.  How much you sacrifice in terms of design and creativity in order to appease the search engine gods is up to you and your specific needs.

More information

There are dozens of books available about this topic, and any of them will go into more detail about exactly what the differences are between the different search engines.  But each of them will most likely be focused at a high level on these fundamental concepts.

Furthermore, a simple search with the keywords, “search engine optimization” will bring up thousands of pages, blogs, message boards, and sites devoted to the topic.  Feel free to pick one from the top of the list.
