Posts Tagged ‘htaccess’

Class 11 – HTTP Basic Authentication using .htaccess files

Tuesday, April 27th, 2010

Overview

HTTP is the protocol which web browsers and web servers use to communicate via client requests and server responses, respectively.  We’ve seen that the browser uses HTTP GET and POST methods to request data from the server.

HTTP also provides a very basic level of authentication which you can use to password-protect your sites or certain folders within your sites.  And Apache servers, such as our class server, make it possible to use this authentication system by simply writing a bit of special code in a file called .htaccess.

We have previously used .htaccess files for rewriting URLs to create Fancy URLs.  The .htaccess file is a directory-specific configuration file – it can hold a variety of server settings that apply only to the folder in which you place it.  This post is about one such setting.

Password-protecting a folder

To password protect a specific folder, we will create two files: one named .htaccess and another named .htpasswd.

.htaccess holds the server instructions indicating that the folder should be password protected.  This file gets placed in the folder which you want to password protect.

.htpasswd holds the username/password combinations of users who are allowed to view the folder.  Passwords are stored in an encrypted (hashed) form, not as plain text.  This file gets placed somewhere on the server where it is not accessible from the web – you don’t want people loading this file up directly in their web browsers.

The .htaccess file

The .htaccess file contains the following code.

AuthUserFile <the server path to the folder where your .htpasswd file will live>/.htpasswd
AuthGroupFile /dev/null
AuthName EnterPassword
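AuthType Basic
# Without a Require line Apache never asks for a password; valid-user admits any user listed in .htpasswd
Require valid-user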

Replace <the server path to the folder where your .htpasswd file will live> with the path to your own .htpasswd file.  Ideally this will be somewhere outside of the web root of the server.  On the class server, the web root is the folder /home/scps/onepotcooking.com/

As an aside, saying this is the “web root” means that when a user goes to http://onepotcooking.com in their browser, they will by default view the files in the folder /home/scps/onepotcooking.com.  The “server root” is /, the very topmost folder on the server.

So, if your name is George Washington, perhaps put your .htpasswd file at

/home/scps/passwords/georgewashington/.htpasswd

so it is outside of the web root, yet still somewhere you might be able to find it again if you ever went looking.

The .htpasswd file

The .htpasswd file will contain one line in the following form for each user that has access to the protected folder:

<username>:<encrypted password>

Replace <username> with the username of the user you want to give access.  And replace <encrypted password> with an encrypted password for that user.

How do you get an encrypted password?  You use one of the many websites that encrypt your .htpasswd passwords for you for free, such as this one.

So, for example, if your username is “scps” and your encrypted password is “pnzpsMNdWW6aw”, you will put the following line in your .htpasswd file:

scps:pnzpsMNdWW6aw

And you will save this .htpasswd file into the folder that you indicated in the first line of your .htaccess file.
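If you would rather not paste a password into a third-party website, a line for .htpasswd can also be generated locally. Here is a minimal Python sketch, assuming the {SHA} password format (a Base64-encoded SHA-1 digest), which is one of the formats Apache accepts in password files. The function name htpasswd_line is made up for illustration. The sample hash shown above looks like the older crypt() format instead, and unsalted SHA-1 is weak by modern standards, so treat this as an illustration of the file format rather than a recommendation.

```python
import base64
import hashlib


def htpasswd_line(username: str, password: str) -> str:
    """Build one .htpasswd line in the {SHA} format (hypothetical helper)."""
    if ":" in username:
        # The colon separates the username from the hash, so it cannot appear in the name
        raise ValueError("usernames cannot contain a colon")
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return f"{username}:{{SHA}}{encoded}"


if __name__ == "__main__":
    # Prints a line that could be pasted into .htpasswd
    print(htpasswd_line("scps", "password"))
```

Apache also ships with a command-line tool called htpasswd that creates and updates these files, if you have shell access to your server.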

An example

See an example here.  The username is our standard username, and the password is our standard password minus the last character.  You’ll notice that I have been naughty and put the .htpasswd file in the same folder as the .htaccess file.  On a real site you shouldn’t put it anywhere where a web browser can find it.

How it works

Here’s an overview of the steps that are happening behind the scenes to make this system work:

  1. Your client (most likely your web browser) makes a standard HTTP GET request for a password protected area of the server
  2. The server looks for any .htaccess file in the requested folder
  3. The server reads the .htaccess file and sees that the requested file or folder should be password protected
  4. The server responds to the client with an HTTP 401 (Unauthorized) response code indicating that the requested file is password protected.
  5. The browser is built to know what to do with this response code: it pops up a dialog that the user must fill in with a username and password
  6. The user fills in the username and password and clicks submit
  7. The client sends another HTTP GET request to the server, but this time includes the login credentials in an extra HTTP header (Authorization) along with the request.
  8. The server again looks at the .htaccess file and sees that the requested file or folder is password protected, but this time notices that the client included the necessary login credentials along with the request
  9. The server responds to the client with the requested page
  10. The client stores the login credentials the user entered somewhere on the client machine (similar to a cookie) so that next time the page is requested, it doesn’t have to ask the user to enter them again.  The client just sends them to the server in the HTTP headers automatically.
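The headers exchanged in steps 4, 7, and 8 can be sketched in a few lines of Python. This is a simplified model of HTTP Basic Authentication, not Apache's actual code, and the three function names are invented for illustration. The realm value corresponds to the AuthName setting in the .htaccess file.

```python
import base64


def challenge_headers(realm: str) -> dict:
    """Header the server sends along with the 401 response (step 4)."""
    return {"WWW-Authenticate": f'Basic realm="{realm}"'}


def authorization_header(username: str, password: str) -> str:
    """Header value the client adds to its second request (step 7)."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def parse_authorization(value: str) -> tuple:
    """Recover (username, password) from the header, as a server would (step 8)."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split on the first colon only, since passwords may contain colons
    username, _, password = decoded.partition(":")
    return username, password


if __name__ == "__main__":
    header = authorization_header("Aladdin", "open sesame")
    print(header)
    print(parse_authorization(header))
```

Notice that the credentials are only Base64-encoded, not encrypted: anyone who can read the request can decode them. That is why Basic Authentication is best used over an HTTPS connection.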

Class 10 – Fancy URLs: Customizing Your Site’s URLs Using Mod_Rewrite

Friday, December 4th, 2009

Now that you know all the basic techniques of web development, it’s time to start thinking about aesthetics. One of the most obvious aesthetic choices you can make on your site is what domain name you choose, and what you call the file names on that site. Domain names are something I can’t help you with, but the rest of the URL after the domain name, including the folder and file names, is something I can help you beautify.

This is an advanced topic, but one that can provide polish to your sites if you are comfortable with all we have covered so far.

The problem: ugly URLs

As you know, depending on what we call our files and how we use the query string to pass data from one page to another, we sometimes end up with URLs that look like this:

http://onepotcooking.com/index.php?post=19&view=rss

But you might rather have URLs that look like this:

http://onepotcooking.com/rss/post/19/

And actually, search engines sometimes prefer more descriptive URLs, so they can more easily determine what a page is about:

http://onepotcooking.com/rss_feed/why_urls_should_be_pretty.html

But you don’t want to change the structure of your folders and file names, and change the entire way you use the $_GET, $_POST, and $_REQUEST variables in PHP just to make the URLs pretty. When you’re coding the site, you’re usually thinking about functionality and getting the job done, not aesthetics.

The solution to URL woes: mod_rewrite

Apache, the most popular software used by web servers to handle the requests and responses for web pages (and the software used by our class server and most other UNIX/LINUX web servers) comes with a module called mod_rewrite that is used for creating custom URLs.

mod_rewrite lets you publish fancy URLs like:

http://onepotcooking.com/isnt_this_a_prety_url.html

But have them actually get converted internally into ugly URLs like this, without the user ever seeing it:

http://onepotcooking.com/process_something.php?id=1884&to_do=something&this_is=ugly

You will be able to use the fancy URLs for any links to your pages, but your folders, filenames, and PHP code will not have to change, so long as you use mod_rewrite correctly.
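To get a feel for what such a conversion involves, here is a small Python sketch that mimics the idea with a regular expression. It is only a model: real rules are written as RewriteRule directives in Apache's own configuration syntax, and the rule table and function name below are hypothetical. The example maps the fancy /rss/post/19/ URL from earlier onto the ugly index.php URL.

```python
import re

# Hypothetical rule table: (pattern for the fancy URL path, internal target)
# /rss/post/<number>/  ->  /index.php?post=<number>&view=rss
RULES = [
    (re.compile(r"^/rss/post/(\d+)/?$"), r"/index.php?post=\1&view=rss"),
]


def rewrite(path: str) -> str:
    """Return the internal path for a requested path, or the path unchanged."""
    for pattern, target in RULES:
        if pattern.match(path):
            # \1 in the target is replaced by the number captured from the fancy URL
            return pattern.sub(target, path)
    return path


if __name__ == "__main__":
    print(rewrite("/rss/post/19/"))  # /index.php?post=19&view=rss
    print(rewrite("/about.html"))    # /about.html (no rule matched)
```

The part in parentheses in the pattern captures the post number, which is how data ends up in the query string even though the fancy URL has no query string of its own.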

Rewriting vs. Redirecting

This process of having fancy URLs that get internally converted by the server into ugly URLs is known as URL rewriting. With a rewrite, since it only happens internally in the server, the user only ever sees the fancy URL. They will never see the ugly URL in the browser address bar.

However, the term redirect is generally used to refer to the technique where the client, meaning the web browser, handles the redirecting: the server’s response tells the browser to request a different URL. In the case of a client-side redirect, the user can see the final destination URL in the browser’s address bar after the redirect occurs. So they will ultimately see the ugly URL clearly in the address bar of the browser.
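The difference shows up in what the server sends back. The Python sketch below is a deliberately simplified model with made-up names (respond, the "served" key) and one hard-coded URL pair. It assumes a 302 status code for the redirect case, which is one of several redirect codes HTTP defines.

```python
FANCY = "/rss/post/19/"
UGLY = "/index.php?post=19&view=rss"


def respond(path: str, mode: str) -> dict:
    """Model the server's response to a request for the fancy URL."""
    if path != FANCY:
        return {"status": 404, "headers": {}, "served": None}
    if mode == "rewrite":
        # Internal rewrite: the ugly URL is served, but the browser is never told about it
        return {"status": 200, "headers": {}, "served": UGLY}
    if mode == "redirect":
        # Redirect: the browser is told to make a new request for the ugly URL
        return {"status": 302, "headers": {"Location": UGLY}, "served": None}
    raise ValueError("mode must be 'rewrite' or 'redirect'")


if __name__ == "__main__":
    print(respond(FANCY, "rewrite"))
    print(respond(FANCY, "redirect"))
```

In the rewrite case the browser gets its content in a single round trip and the address bar never changes. In the redirect case the browser has to make a second request to the URL in the Location header, and that URL is what the user ends up seeing.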

Another look at the client/server request/response relationship

To understand how mod_rewrite works, it’s important to understand where it fits into the whole request/response relationship. Here’s a very broad overview of just the relevant steps of what happens when a client requests a file from a server:

  • a user tries to load a web page in the browser (whether by going directly to a URL, clicking a link, submitting a form, or making an AJAX request)
  • the browser sends an HTTP request (either GET or POST) for the file to the server.
  • the server receives the request, and launches Apache’s request handler
  • Apache tries to figure out how to respond to the request
  • Apache first checks mod_rewrite settings to see if it should do any fancy processing of the URL of the file that the user is requesting
  • Then, if Apache determines that the requested file is a PHP script, it launches the PHP engine and sends any data that was passed along with the request to the PHP script that the browser requested
  • The PHP script runs and sends its output back to Apache
  • Apache sends a response to the web browser. The response contains an HTTP status code indicating some information about whether the request was processed properly or not, as well as any content that was output by the requested file, regardless of whether it’s a PHP script, HTML file, CSS file, Javascript file, or any other type of file.
  • The browser receives the response from Apache, and figures out how to display whatever content it received back from the server to the user.

As you can see, the mod_rewrite technique we will be discussing that allows sites to use fancy URLs will occur after the server has received the request from the browser, but before it has passed that request on to the PHP processor. It will be written in a language that Apache can understand, not in PHP, since when it is processed, the PHP engine hasn’t even been launched yet.

Apache configuration files: httpd.conf and .htaccess

When a user requests a URL like this:

http://onepotcooking.com/spring2010/test.php

the Apache server checks two sets of configuration files to see whether it should do something fancy with that URL.

First, Apache checks its main configuration file, called httpd.conf, which is usually buried somewhere obscure in the deep recesses of the server filesystem. Httpd.conf has global settings that apply to your entire site. If you have a shared hosting plan for your site, as most of you will, you do not have access to this file.

After it has checked httpd.conf for any relevant settings, Apache then checks the directory-specific configuration files called .htaccess, which have settings that apply only to specific folders.

With the example URL above, Apache would check for the existence of each of these two .htaccess files:

/.htaccess
/spring2010/.htaccess

Since the requested file is nested inside the spring2010/ folder, which is inside of the root / folder, either of those settings files could have an effect on how the request for the file is handled by the server.
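The list of .htaccess files that can affect a request follows directly from the folders in the URL path. Here is a short Python sketch of that idea. The function name is invented, and the paths it returns are relative to the web root, as in the two-file list above.

```python
import posixpath


def htaccess_candidates(url_path: str) -> list:
    """List the .htaccess files (relative to the web root) that can affect url_path."""
    directory = posixpath.dirname(url_path)  # e.g. "/spring2010"
    folders = [name for name in directory.split("/") if name]
    candidates = ["/.htaccess"]  # the web root always gets checked
    current = ""
    for name in folders:
        # Walk down one folder at a time, from the web root toward the requested file
        current += "/" + name
        candidates.append(current + "/.htaccess")
    return candidates


if __name__ == "__main__":
    print(htaccess_candidates("/spring2010/test.php"))
    # ['/.htaccess', '/spring2010/.htaccess']
```

The more deeply nested a file is, the more .htaccess files can have a say in how it is served.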

We will be focusing on settings in the .htaccess files since these are the ones you will always have access to, regardless of your hosting setup. However, the same URL rewriting techniques will be applicable to settings in the httpd.conf file, with slight modifications.

How to use .htaccess files to rewrite URLs

Rather than rewrite an entire tutorial on how to rewrite URLs (which I initially started to do), there is an excellent tutorial already written which covers all the basic types of rewriting you are likely to do:

http://corz.org/serv/tricks/htaccess2.php

Note: Although I don’t think it’s clearly described on this site, all of the example code written there is meant to go into a file called “.htaccess” located in the root folder of your project. So if your project is at http://onepotcooking.com/johnhancock/final_project/, you should create an .htaccess file located at /johnhancock/final_project/.htaccess, so you can create fancy URLs like http://onepotcooking.com/johnhancock/final_project/this-is-a-fancy-url.html

In other words, fancy URLs only work at the level at which you put an .htaccess file. If you want a fancy URL like http://onepotcooking.com/this-is-a-fancy-url.html, you need to put an .htaccess file in the root folder of the server, /.htaccess.

I highly recommend you read that otherwise well-written document linked above if you wish to use fancy URLs on your own sites.

An example page

I have created a single example PHP script which can be accessed by a number of fancy URLs by taking advantage of rewriting rules found in a .htaccess file in the same folder. The PHP script just outputs whatever data was passed to it in the query string along with the GET request.

In other words, there is an .htaccess file which is allowing a variety of fancy URLs to all internally point to the same PHP script. Each URL is meant to exhibit a slightly different aspect of URL rewriting that may be useful to you. Several of them focus on passing data through the query string even though there is no query string in the fancy URL.

You will definitely want to read that tutorial linked above before going in to read the code in this example.

The direct URL to the example script is http://onepotcooking.com/examples/class10/mod_rewrite/index.php

The fancy URLs that internally rewrite to that same script are listed on that page.