Overview
HTTP is the protocol that web browsers and web servers use to communicate: the client sends requests and the server sends back responses. We’ve seen that the browser uses HTTP GET and POST methods to request data from the server.
HTTP also provides a very basic level of authentication which you can use to password-protect your sites or certain folders within your sites. And Apache servers, such as our class server, make it possible to use this authentication system by simply writing a bit of special code in a file called .htaccess.
We have previously used .htaccess files for rewriting URLs to create Fancy URLs. The .htaccess file is a directory-specific configuration file – it can hold a variety of server settings that apply only to the folder in which you place it. This post is about one such setting.
Password-protecting a folder
To password protect a specific folder, we will create two files: one named .htaccess and another named .htpasswd.
.htaccess holds the server instructions indicating that the folder should be password protected. This file gets placed in the folder which you want to password protect.
.htpasswd holds the username/password combinations of users who are allowed to view the folder. Passwords are stored in encrypted (strictly speaking, hashed) form rather than as plain text. This file gets placed somewhere on the server where it is not accessible from the web – you don’t want people loading this file up directly in their web browsers.
The .htaccess file
The .htaccess file contains the following code.
AuthUserFile <the server path to the folder where your .htpasswd file will live>/.htpasswd
AuthGroupFile /dev/null
AuthName EnterPassword
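AuthType Basic
Require valid-user
Replace <the server path to the folder where your .htpasswd file will live> with the path to your own .htpasswd file. Ideally this will be somewhere outside of the web root of the server. On the class server, the web root is the folder /home/scps/onepotcooking.com/
The Require valid-user line is what actually switches the protection on: it tells Apache to let in only users who log in with a username and password found in the .htpasswd file. Without a Require line, the other settings have no effect.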
As an aside, saying this is the “web root” means that when a user goes to http://onepotcooking.com in their browser, they will by default view the files in the folder /home/scps/onepotcooking.com. The “server root” is /, the very topmost folder on the server.
So, if your name is George Washington, perhaps put your .htpasswd file at
/home/scps/passwords/georgewashington/.htpasswd
so it is outside of the web root, yet still somewhere you might be able to find it again if you ever went looking.
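If you want to double-check that a path you have picked really is outside the web root, a few lines of Python can do the comparison. This is only a sketch: the function name is made up for illustration, the web root is the class server's folder mentioned above, and the check looks at the path text only (it does not follow symlinks or ".." segments).

```python
from pathlib import PurePosixPath

# The class server's web root, as described above.
WEB_ROOT = PurePosixPath("/home/scps/onepotcooking.com")

def is_inside_web_root(path, web_root=WEB_ROOT):
    """Return True if `path` is the web root or sits somewhere beneath it."""
    try:
        PurePosixPath(path).relative_to(web_root)
        return True
    except ValueError:
        # relative_to() raises ValueError when `path` is not under `web_root`.
        return False

print(is_inside_web_root("/home/scps/passwords/georgewashington/.htpasswd"))
print(is_inside_web_root("/home/scps/onepotcooking.com/secret/.htpasswd"))
```

The first path is safe to use (the function returns False); the second would be reachable from a browser (the function returns True).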
The .htpasswd file
The .htpasswd file will contain one line in the following form for each user that has access to the protected folder:
<username>:<encrypted password>
Replace <username> with the username of the user you want to give access. And replace <encrypted password> with an encrypted password for that user.
How do you get an encrypted password? You use one of the many websites that encrypt your .htpasswd passwords for you for free, such as this one.
So, for example, if your username is “scps” and your encrypted password is “pnzpsMNdWW6aw”, you will put the following line in your .htpasswd file:
scps:pnzpsMNdWW6aw
And you will save this .htpasswd file into the folder that you indicated in the first line of your .htaccess file.
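If you are curious what those password-encrypting websites are doing, here is a minimal sketch in Python. It is an assumption-laden illustration, not the method used to produce the example line above: it uses Apache's "{SHA}" format (a base64-encoded SHA-1 digest), which is a different format from the one in that example. Apache accepts this format, but it is considered weak, so for a real site a stronger option such as the bcrypt mode of Apache's htpasswd command-line tool (htpasswd -B) is the better choice. The function name is hypothetical.

```python
import base64
import hashlib

def htpasswd_sha1_line(username, password):
    """Build one .htpasswd line in Apache's {SHA} format (weak; demo only)."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return f"{username}:{{SHA}}{encoded}"

# Prints: scps:{SHA}W6ph5Mm5Pz8GgiULbPgzG37mj9g=
print(htpasswd_sha1_line("scps", "password"))
```

The printed line could be pasted into an .htpasswd file as-is, one line per user.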
An example
See an example here. The username is our standard username, and the password is our standard password minus the last character. You’ll notice that I have been naughty and put the .htpasswd file in the same folder as the .htaccess file. On a real site you shouldn’t put it anywhere where a web browser can find it.
How it works
Here’s an overview of the steps that are happening behind the scenes to make this system work:
- Your client (most likely your web browser) makes a standard HTTP GET request for a password protected area of the server
- The server looks for any .htaccess file in the requested folder
- The server reads the .htaccess file and sees that the requested file or folder should be password protected
- The server responds to the client with an HTTP response code (401 Unauthorized) indicating that the requested file is password protected.
- The browser is built to know what to do with this response code: it pops up a dialog that the user must fill in with a username and password
- The user fills in the username and password and clicks submit
- The client sends another HTTP GET request to the server, but this time includes the login credentials in an extra HTTP header (named Authorization) along with the request.
- The server again looks at the .htaccess file and sees that the requested file or folder is password protected, but this time it finds the login credentials in the request and checks them against the .htpasswd file
- The server responds to the client with the requested page
- The client remembers the login credentials the user entered (browsers typically keep them in memory for the rest of the browsing session, loosely similar to a cookie) so that next time the page is requested, it doesn’t have to ask the user to enter them again. The client just sends them to the server in the HTTP headers automatically.
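To make the "credentials in the HTTP headers" step less mysterious, here is a small Python sketch of what the browser sends and how a server can read it back. With Basic authentication the client joins the username and password with a colon, base64-encodes the result, and sends it in a header named Authorization. The two function names are made up for this example; note that base64 is just an encoding, not encryption, so anyone who can see the request can recover the password unless the site is served over HTTPS.

```python
import base64

def basic_auth_header(username, password):
    """Return the value a client puts in the Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

def parse_basic_auth_header(value):
    """Recover (username, password) from an Authorization header value."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split on the first colon only: passwords may contain colons.
    username, _, password = decoded.partition(":")
    return username, password

header = basic_auth_header("Aladdin", "open sesame")
print("Authorization:", header)  # Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
print(parse_basic_auth_header(header))
```

The server would then compare the recovered username and password against the entries in the .htpasswd file.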
Now that you know all the basic techniques of web development, it’s time to start thinking about aesthetics. One of the most obvious aesthetic choices you can make on your site is what domain name you choose, and what you call the files on that site. Domain names are something I can’t help you with, but the rest of the URL after the domain name, including the folder and file names, is something I can help you beautify.
This is an advanced topic, but one that can provide polish to your sites if you are comfortable with all we have covered so far.
The problem: ugly URLs
As you know, depending on what we call our files and how we use the query string to pass data from one page to another, we sometimes end up with URLs that look like this:
http://onepotcooking.com/index.php?post=19&view=rss
But you might rather have URLs that look like this:
http://onepotcooking.com/rss/post/19/
And actually, search engines sometimes prefer more descriptive URLs, so they can more easily determine what a page is about:
http://onepotcooking.com/rss_feed/why_urls_should_be_pretty.html
But you don’t want to change the structure of your folders and file names, and change the entire way you use the $_GET, $_POST, and $_REQUEST variables in PHP just to make the URLs pretty. When you’re coding the site, you’re usually thinking about functionality and getting the job done, not aesthetics.
The solution to URL woes: mod_rewrite
Apache, the most popular software used by web servers to handle the requests and responses for web pages (and the software used by our class server and most other UNIX web servers), comes with a module called mod_rewrite that is used for creating custom URLs.
mod_rewrite lets you publish fancy URLs like:
http://onepotcooking.com/isnt_this_a_prety_url.html
But have them actually get converted internally into ugly URLs like this, without the user ever seeing it:
http://onepotcooking.com/process_something.php?id=1884&to_do=something&this_is=ugly
You will be able to use the fancy URLs for any links to your pages, but your folders, filenames, and PHP code will not have to change, so long as you use mod_rewrite correctly.
Rewriting vs. Redirecting
This process of having fancy URLs that get internally converted by the server into ugly URLs is known as URL rewriting. With a rewrite, since it only happens internally in the server, the user only ever sees the fancy URL. They will never see the ugly URL in the browser address bar.
However, the term redirect is generally used to refer to the technique where the client, meaning the web browser, handles the redirecting. In the case of a client-side redirect, the server tells the browser to request a different URL, so the user can see the final destination URL in the browser’s address bar after the redirect occurs. So they will ultimately see the ugly URL clearly in the address bar of the browser.
Another look at the client/server request/response relationship
To understand how mod_rewrite works, it’s important to understand where it fits into the whole request/response relationship. Here’s a very broad overview of just the relevant steps of what happens when a client requests a file from a server:
- a user tries to load a web page in the browser (whether by going directly to a URL, clicking a link, submitting a form, or making an AJAX request)
- the browser sends an HTTP request (either GET or POST) for the file to the server.
- the server receives the request, and launches Apache’s request handler
- Apache tries to figure out how to respond to the request
- Apache first checks mod_rewrite settings to see if it should do any fancy processing of the URL of the file that the user is requesting
- Then, if Apache determines that the requested file is a PHP script, it launches the PHP engine and sends any data that was passed along with the request to the PHP script that the browser requested
- The PHP script runs and sends its output back to Apache
- Apache sends a response to the web browser. The response contains an HTTP status code indicating whether the request was processed properly or not, as well as any content that was output by the requested file, regardless of whether it’s a PHP script, HTML file, CSS file, JavaScript file, or any other type of file.
- The browser receives the response from Apache, and figures out how to display whatever content it received back from the server to the user.
As you can see, the mod_rewrite technique we will be discussing, which allows sites to use fancy URLs, occurs after the server has received the request from the browser, but before it has passed that request on to the PHP processor. The rules are written in a language that Apache can understand, not in PHP, since at the moment they are processed, the PHP engine hasn’t even been launched yet.
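The ordering described above can be sketched as a toy Python model. Everything here is a stand-in invented for illustration: the single rewrite rule (which maps the pretty /rss/post/19/ style onto the ugly index.php?post=19&view=rss style from earlier), the function names, and the fake "PHP script". The point is only the order of events: the URL is rewritten first, and the script then receives the query string data as if the ugly URL had been requested directly.

```python
import re
from urllib.parse import parse_qs

# Hypothetical rule: rss/post/<number>/ becomes index.php?post=<number>&view=rss
REWRITE = (re.compile(r"^rss/post/([0-9]+)/$"), r"index.php?post=\1&view=rss")

def apply_rewrite(path):
    """Step 1: the server rewrites the requested path, if the rule matches."""
    pattern, target = REWRITE
    return pattern.sub(target, path) if pattern.match(path) else path

def fake_php_script(get):
    """Stand-in for index.php: reports the data it was given (like $_GET)."""
    return f"post={get.get('post')} view={get.get('view')}"

def handle_request(path):
    internal = apply_rewrite(path)                       # happens first
    script, _, query = internal.partition("?")
    get = {key: values[0] for key, values in parse_qs(query).items()}
    return script, fake_php_script(get)                  # script runs second

print(handle_request("rss/post/19/"))  # ('index.php', 'post=19 view=rss')
```

The script never sees the pretty URL's structure; it only sees the query string values that the rewrite produced.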
Apache configuration files: httpd.conf and .htaccess
When a user requests a URL like this:
http://onepotcooking.com/spring2010/test.php
the Apache server checks two sets of configuration files to see whether it should do something fancy with that URL.
First, Apache checks its main configuration file, called httpd.conf, which is usually buried somewhere obscure in the deep recesses of the server filesystem. The httpd.conf file has global settings that apply to your entire site. If you have a shared hosting plan for your site, which most of you will, you do not have access to this file.
After it has checked httpd.conf for any relevant settings, Apache then checks the directory-specific configuration files called .htaccess, which have settings that apply only to specific folders.
With the example URL above, Apache would have to check for the existence of both of these .htaccess files:
/.htaccess
/spring2010/.htaccess
Since the requested file is nested inside the spring2010/ folder, which is inside of the root / folder, either of those settings files could have an effect on how the request for the file is handled by the server.
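As a rough sketch of that lookup, the following Python function lists the .htaccess files that could apply to a requested path, from the outermost folder inward. It follows the simplified picture above, with paths written relative to the web root; the function name is made up for this example, and a real Apache server's behavior also depends on settings in httpd.conf.

```python
from pathlib import PurePosixPath

def htaccess_candidates(url_path):
    """List the .htaccess files that may affect a request, outermost first."""
    folder = PurePosixPath(url_path).parent
    # folder.parents runs from the nearest parent up to "/", so reverse it.
    folders = [folder, *folder.parents][::-1]
    return [str(f / ".htaccess") for f in folders]

print(htaccess_candidates("/spring2010/test.php"))
# ['/.htaccess', '/spring2010/.htaccess']
```

A file nested three folders deep would have four candidates, one for the root folder and one for each folder on the way down.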
We will be focusing on settings in the .htaccess files since these are the ones you will always have access to, regardless of your hosting setup. However, the same URL rewriting techniques will be applicable to settings in the httpd.conf file, with slight modifications.
How to use .htaccess files to rewrite URLs
Rather than write an entire tutorial on how to rewrite URLs (which I initially started to do), I will point you to an excellent tutorial that already covers all the basic types of rewriting you are likely to do:
http://corz.org/serv/tricks/htaccess2.php
Note: Although I don’t think it’s clearly described on this site, all of the example code written there is meant to go into a file called “.htaccess” located in the root folder of your project. So if your project is at http://onepotcooking.com/johnhancock/final_project/, you should create an .htaccess file located at /johnhancock/final_project/.htaccess, so you can create fancy URLs like http://onepotcooking.com/johnhancock/final_project/this-is-a-fancy-url.html
In other words, fancy URLs only work at the level at which you put an .htaccess file. If you want a fancy URL like http://onepotcooking.com/this-is-a-fancy-url.html, you need to put an .htaccess file in the root folder of the server, /.htaccess.
I highly recommend you read that otherwise well-written document linked above if you wish to use fancy URLs on your own sites.
An example page
I have created a single example PHP script which can be accessed by a number of fancy URLs by taking advantage of rewriting rules found in a .htaccess file in the same folder. The PHP script just outputs whatever data was passed to it in the query string along with the GET request.
In other words, there is an .htaccess file which is allowing a variety of fancy URLs to all internally point to the same PHP script. Each URL is meant to exhibit a slightly different aspect of URL rewriting that may be useful to you. Several of them focus on passing data through the query string even though there is no query string in the fancy URL.
You will definitely want to read that tutorial linked above before going in to read the code in this example.
The direct URL to the example script is http://onepotcooking.com/amosbloomberg/spring2010/class10/mod_rewrite/index.php
The fancy URLs that internally rewrite to that same script are:
And the following URL uses mod_rewrite to do a client-side redirect (not a rewrite):
Reminder: all the rules that allow these URLs to point to and pass data to the same index.php script are found in the .htaccess file in the same folder as the PHP script.
You should consider the URL of your site to be part of its design. A memorable URL and a nicely designed favicon are probably the first two things anyone sees of your work.
Intro to Favicons, Fancy URLs, and Search Engine Optimization
To read about what a favicon is, and how to create one, click that link.
To read about what I mean by fancy URLs, and how to create them, click that link. A simpler example than those found on this link follows in this post.
Fancy URLs, meaning intuitive URLs that are easy to understand, are also important for Search Engine Optimization (SEO). Click to read more about developing your web site with SEO in mind.
An example of Fancy URLs
I will now outline a relatively simple example of creating Fancy URLs. Click to see this example in action.
By clicking that link to see this example in action, your browser will bring you to this URL:
http://onepotcooking.com/amosbloomberg/summer2009/class9/mod_rewrite/animals/
If you click one of the animal names in that file, your browser will bring you to a URL that looks something like this:
http://onepotcooking.com/amosbloomberg/summer2009/class9/mod_rewrite/animals/15
The first thing to notice is that if you view the files in that project folder on the server, you’ll see that there is no subfolder called “animals/” in there. So Fancy URLs are a euphemism for Fake URLs.
The .htaccess file
The file named .htaccess in this folder contains a few rules that make this trick possible. The first rewrite rule in the file is this:
RewriteRule ^animals/$ index.php [QSA]
This says that if the browser requests the folder “animals/”, the server should respond as if the file “index.php” had been requested instead. The [QSA] flag at the end tells Apache to keep any query string that was on the original URL.
The second rule looks like this:
RewriteRule ^animals/([0-9]+)$ index.php?animal_id=$1 [QSA]
This rule says that if the browser requests the folder “animals/” followed by any number, such as “animals/15”, then the server should convert that into a request for the file “index.php?animal_id=15” instead.
As you can see, in this second rule, part of the Fancy URL has been converted into a bit of data passed via the query string along with the request for the file. This is a common trick to make it less obvious that data is being passed to the server with the request.
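If the pattern syntax in those two rules looks cryptic, it may help to see them mimicked with Python's regular expressions, which use very similar notation. This is a simplified sketch, not how Apache is implemented: the rewrite function is invented for illustration, it stops at the first rule that matches, and it imitates the [QSA] flag by tacking any original query string onto the end of the rewritten URL. In the Python replacement strings, \1 plays the role that $1 plays in the .htaccess file.

```python
import re

# The two rules from the .htaccess file above, as (pattern, target) pairs.
RULES = [
    (re.compile(r"^animals/$"), "index.php"),
    (re.compile(r"^animals/([0-9]+)$"), r"index.php?animal_id=\1"),
]

def rewrite(path, query=""):
    """Return the internal URL that a requested path (plus query) maps to."""
    for pattern, target in RULES:
        match = pattern.match(path)
        if match:
            new_url = match.expand(target)
            if query:  # imitate [QSA]: append the original query string
                new_url += ("&" if "?" in new_url else "?") + query
            return new_url
    # No rule matched: the request is left alone.
    return path + ("?" + query if query else "")

print(rewrite("animals/"))                # index.php
print(rewrite("animals/15"))              # index.php?animal_id=15
print(rewrite("animals/15", "sort=asc"))  # index.php?animal_id=15&sort=asc
```

A request such as “animals/cat” matches neither pattern, since ([0-9]+) only accepts digits, so it would pass through unchanged.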