Class 12 – How to Make Money on the Web

December 19th, 2009 § 0

Start-ups

The web, at barely fifteen years old, is still an open field for innovation and entrepreneurial zeal.  This is in large part due to the low up-front cost of starting a web-based business.  The initial investment is relatively low compared to that of a traditional business, while the potential rewards, given the huge potential market of web visitors, are ever-increasing.  You don’t need to buy any raw materials to set up shop online – you don’t even need to be in the same country in which you sell your products or services.  You just need some understanding of web development and a sharp mind for business.

Agencies

But web developers need not be entrepreneurs.  Since the late 1990s, interactive agencies have popped up left and right to bring the marketing and advertising skills perfected at the ad agencies of the past into the new domain of the internet.  These businesses have transformed the originally ad-hoc process of building websites into a results-oriented industrial process.  As in any industrial process, web or interactive agencies divide interactive accounts into specialized work roles, where each person plays a small part in building and promoting brands on the web.  And it’s not just agencies – any medium to large corporation will have a web department handling a variety of functions.

Careers

Here is a short list of web-related jobs that one often sees advertised at interactive agencies, corporate web departments, and web design shops:

Creative Director
Interactive Director
Product Manager
Project Manager
Web Marketer
Web Producer
Account Manager
Web Strategist
Media Planner / Media Buyer
Information Architect
Web Developer
-Front-End Developer (client side)
-Back-End Developer (server side)
Flash Developer
Flex Developer
Web Designer
Graphic Designer
Q/A Engineer

Pay

A contract web developer can make anywhere from $15 to $150 per hour.  On the low end, you find newly minted developers with very little experience, as well as outsourcers from countries with a low cost of living.  On the high end, you get consultants with many years of experience and a proven track record of results with big-name clients.

As full-time employees, web developers can again make anywhere along a wide range of salaries: from $30k to $180k.  It depends, as with anything else, on experience and knowledge of the industry, as well as the type of company you work for.  Non-profits and start-ups tend to pay less than interactive agencies, but they often attempt to make up for that gap with perks such as interesting work, good benefits, idealistic projects, stock options, etc.

What to charge your first client

Not a lot.  Explain to your first few clients that you are trying to gain experience, and be fair in your pricing.   You want to be able to develop accurate estimates of how long various types of jobs would take you, and price accordingly.  But to start with, things will take you a long time, and so it may be better to price your work as a flat fee for the entire project, rather than by the hour.  Give your clients a break, so long as they are lenient with you.

Once you get comfortable with how quickly and well you are able to complete jobs, which will happen sooner than you think, start raising your prices.

Recruiters

Many technology and design recruiters and staffing firms exist to match job applicants up with companies in need of specialized help.  You should be aware that recruiters generally charge clients much more than they pay their employees.  It is not unusual for them to take a 30-50% cut.  So if you are hired by a recruiter to work a job for $60/hour, you should not be surprised to learn that the recruiter is charging the client $100/hour and keeping $40 for themselves for every hour you work.

If you use a W-4 tax form with the recruiter, where they withhold taxes on your behalf, they may provide insurance, retirement plans, and other perks similar to those of full-time employment.  However, if you have a corp-to-corp relationship and file your income from the recruiter on a 1099 form or similar, you do not get those perks, and you may be able to use that fact to negotiate a slightly higher rate, since the recruiter’s expenses in hiring you will be significantly lower.

Jobs sites

There is no replacement for networking anywhere and everywhere.  However, there are some popular jobs sites as well.  Here is a list of a few I know of:
http://newyork.craigslist.org
http://hotjobs.com
http://monster.com
http://dice.com
http://simplyhired.com/
http://elance.com

Trends

E-commerce is still in its infancy.  To those of us who have ordered all our holiday presents online for years, it may seem like e-commerce is a saturated market.  But in fact, industry analysts predict massive growth in e-commerce in the coming years as more and more businesses move onto the web and new markets open up as more and more people become web-enabled.

For years, the industry has also chattered a lot about convergence – the meeting of markets like e-commerce, mobile devices, publishing, telephony, television, and web marketing.  With the explosive growth of web-enabled phones like the BlackBerry, the iPhone, and Android devices, and of technologies like VoIP, the Kindle, and P2PTV, the web as we previously knew it is becoming more and more intertwined with the mobile device and with other broadcast technologies.  This may explain the enthusiasm with which investors and technologists embrace otherwise inconsequential products like Twitter, which seamlessly bridge the divide between web and mobile.

An interesting artifact of the lowering cost of mobile and web-based technologies is the developing world’s embrace of mobile phones, and SMS (text messages).  Since many developing countries lack the telecommunications infrastructure required for high speed land lines for internet connectivity, and given that mobile phones in many places other than the US are very cheap to begin using, more and more people are communicating and accessing information through mobile phones and SMS, which work on cellular and satellite networks.  These cell towers and satellite systems bypass the more traditional methods of accessing the internet that we in the US are accustomed to, such as DSL, cable, or T1 lines.  This difference in access may have interesting implications for the development of the web for a more global audience.

The use of online technology for education is also growing every year as more schools offer distance learning programs to attract wider student bodies.  And childhood education has embraced technologies like online resources and electronic whiteboards that allow students to interact with a teacher’s console using handheld devices, as educators find that under-performing students fare better when taught with technological devices.

Google

Google makes money almost entirely through advertising.  Almost every product and innovation they release is geared towards bringing more traffic to their search engine and analyzing traffic on all of their sites for metrics which can be used to sell targeted advertisements to their users.  By providing extremely useful applications like Google Maps and Gmail, Google drives users towards their search engine and their advertisements while collecting data on users’ interests and habits to better refine their targeted advertisements.

Web Advertising

Web advertising is traditionally priced in one of two ways:  CPM or CPA.  CPM stands for cost per mille, that is, Cost-Per-Thousand: the price an advertiser is willing to pay to have their advertisement viewed 1,000 times.  CPA stands for Cost-Per-Action, sometimes also known as Cost-Per-Acquisition or Cost-Per-Conversion.

It used to be that advertisers paid sites like Yahoo, Microsoft, and Google on a CPM basis to run their ads.  However, in recent years, advertisers have seen a drop in the number of people who view an ad and then actually click on that ad.  The click-through rate of web advertisements has dropped so much that today, only a few tenths of a percent of people will actually click ads they see on the major web portals.  Estimates vary, but click-through rates for web ads are said to be anywhere between 0.1% and 0.3%.

So many advertisers have switched to the CPA model, where the amount they pay depends on a more concrete metric, such as how many viewers perform a particular action that results from the advertisement.  An “action” in this context could be anything, such as registering for a site, signing up to receive more information, or buying something.  The term “conversion rate” refers to the percentage of viewers of an ad who end up taking such a follow-up action.  Web marketers spend a significant amount of their time worrying about these numbers.
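To make the pricing terms concrete, here is a small worked example in PHP. The campaign numbers (one million impressions, a $2.00 CPM, a 0.2% click-through rate) and the function names are made up for illustration; they are not figures from any real campaign.

```php
<?php
// Cost of a campaign priced per thousand impressions (CPM).
function cpmCost(int $impressions, float $cpm): float
{
    return $impressions / 1000 * $cpm;
}

// Expected number of clicks, given a click-through rate expressed as a percentage.
function expectedClicks(int $impressions, float $clickThroughPercent): float
{
    return $impressions * $clickThroughPercent / 100;
}

// Hypothetical campaign: 1,000,000 impressions, $2.00 CPM, 0.2% click-through rate.
$impressions = 1000000;
$cost = cpmCost($impressions, 2.00);
$clicks = expectedClicks($impressions, 0.2);

printf("Campaign cost: \$%.2f\n", $cost);
printf("Expected clicks: %.0f\n", $clicks);
printf("Effective cost per click: \$%.2f\n", $cost / $clicks);
```

Under a CPM deal the advertiser pays the full campaign cost no matter how few people click, which is one way to see why a falling click-through rate pushes advertisers towards CPA pricing.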

Targeted advertisement is all the rage, since a site that can collect the habits and preferences of a visitor can then run advertisements that are more likely to appeal to that visitor. Such a site can therefore charge advertisers more, since the click-through and conversion rates are likely to be higher.

Facebook & Twitter

Nobody really knows how Facebook, Twitter, or similar popular social network services will become profitable, if they ever do.  But the major social networks have hundreds of thousands to hundreds of millions of users, which means they have a huge number of eyeballs looking at their sites every day.  The potential for advertising revenue from those eyeballs is great, especially considering how many users have voluntarily indicated preferences and affiliations with their favorite brands.  This may explain some social networks’ high stock valuations.  However, such companies, which seem to rely on buzz and the good-will of their users, are extremely wary of alienating those users by overrunning the sites with advertisements.  Hence you will see them trying to advertise in roundabout ways, such as Facebook’s Beacon service, or selling user data to third parties rather than using it themselves for targeted advertisements.

By offering additional services like mail, applications, and photo sharing, Facebook is hoping to prevent people from abandoning their site in favor of the next big thing, as happened to its social network predecessors like Friendster and, to a lesser extent, MySpace.  The more effort people put into curating their online personalities and profiles, the less likely they are to abandon the site and thereby waste all that effort.  I would not be surprised to see Twitter start doing the same.

Some people argue that Facebook’s valuation may be an indicator of investors’ faith in its viability as a “Web OS”, an operating system that runs entirely on the web and upon which you can build more sophisticated web-enabled social networked software products.  Maybe the future will see Facebook transform into the next Microsoft.

Class 11 – Some links from today’s discussion

December 12th, 2009 § 0

Class 9 – Final Project Requirements

December 5th, 2009 § 1

DEADLINE

  • All final projects must be complete by the last day of class (12/19 – Class #12).
  • This means you have approximately 4 weeks to complete your projects.

REQUIREMENTS

  • projects must show your mastery of the technologies we have learned in this class: XHTML, CSS, Javascript (using jQuery), PHP, and MySQL.
  • projects must have a complete information architecture before you start programming
  • projects must involve at least 3 distinct web pages.
  • you are required to present your site to the class on December 19th – Class #12
  • all filenames must be all lowercase with no spaces or special characters except underscore “_”
  • all variable names in PHP and Javascript must be written in camelCase.
  • all CSS class names and IDs must be written in lowercase, with no special characters except the underscore “_” character.
  • final projects must be working and accessible on the web (not only on your client machine)
  • projects must be linked to from your blog.
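As a rough illustration of the naming rules above, the following PHP sketch checks file names against the lowercase-and-underscore rule. It assumes that digits are allowed and that a single dot introduces the file extension, neither of which the requirements spell out; the function name and sample file names are hypothetical.

```php
<?php
// Assumption: digits are allowed, and exactly one dot separates name and extension.
function isValidProjectFileName(string $fileName): bool
{
    return preg_match('/^[a-z0-9_]+\.[a-z0-9]+$/', $fileName) === 1;
}

// Variable names here are written in camelCase, as the requirements ask.
$candidateNames = ['register_view.php', 'Login View.php', 'index.php'];

foreach ($candidateNames as $candidateName) {
    $verdict = isValidProjectFileName($candidateName) ? 'ok' : 'rename';
    echo $candidateName . ': ' . $verdict . "\n";
}
```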

GRADING

Grades will be loosely based on your ability to exhibit mastery of Information Architecture and programming techniques.

  • Information Architecture (20%)
  • Programming (50%)
  • Ability to conceptualize and realize a fully-functioning, well thought-out site (30%)

PRESENTATIONS

You will be required to present your work to the class on December 19th – Class #12.

Presentations should be no more than 10 minutes.

Questions to answer in your presentation:

  • Who are you, and what is your background?
  • What’s the purpose of your site and why did you decide to build it?
  • Show your information architecture diagrams. Why did you choose this particular navigation structure and why did you place the various bits of information where you did in the wireframes?
  • Show your completed site. What do you think works and what doesn’t?
  • Explain the basic flow of information on the site – which page links to which, and show any parts where data is passed between pages.
  • Explain where you used Javascript on the site, and why.
  • What problems did you have, what solutions did you arrive at (or not), and what other issues did you encounter when building the site that may be interesting or helpful to others?
  • Is the site a good representation of what you had originally planned?
  • What are your plans for the site (if any) once the class is over?
  • What are your plans (if any) for continuing on with web development in the future?
  • What did you learn, or not learn, in this class that you had originally wanted to learn?

Class 10 – An MVC Social Network Example

December 5th, 2009 § 0

Let’s say that we are building a social network.  What we call a social network is a site that has a bunch of users, and those users can decide to be “friends” with any other user.

You can view this example live here.

The Views

There will be four pages that the user sees:

  • Register – where new users go to register to become users
  • Login – where registered users go to login
  • Home – a page that shows a list of the logged-in user’s friends, and a list of people who are not his/her friends.
  • Friend’s Profile – a page that shows details about another user

Each “page” requires a View in order to be displayed to the user in the browser.  Anytime there is something displayed to the user, we should know that there is at least one View used to create that interface.  So we can say that there are four Views in this application.  In our example application, the files that contain the templates for these Views are:

  • views/register_view.php – the template for the Register page
  • views/login_view.php – the template for the Login page
  • views/index_view.php – the template for the Home page
  • views/profile_view.php – the template for the Friend’s Profile page

The Controllers

  • We only want to let a user go to the Login page if they are not already logged in.  If they are already logged in, we need to redirect them to the Home page. Anytime we have a script that performs some logic like this, we should consider it a Controller.
  • Likewise, a user should only see the Register page if they are not already logged in.  Again, this logic is handled by a Controller.
  • Assuming a user is not already logged in, when they enter their username/password in the form on the Login page and click the submit button, there has to be some script that performs the logic to compare the username/password data the user entered in the Login page to the user data stored in the database. If the username/password matches what is found in the database, this script has to let the user in to the site.  This logic is the job of a Controller.
  • Likewise, when a user fills out the Register form and clicks submit, a script has to check to make sure they entered a valid username/password, and then if everything is ok, the script has to somehow create a new row in the database that stores that username/password.  Then the user should be redirected to the Home page.  This business logic is the job of a Controller.
  • The Home page needs to check to make sure the user is logged in.  It then needs to retrieve the list of friends of the logged-in user, the list of people who are not friends of the logged-in user, and then display that data to the user.  The decision of what data to retrieve from the Model, and the job of then forwarding that data to the View which displays the interface, is the job of a Controller.
  • The Friend’s Profile Page has the same type of Controller as the Home page.  Data must be retrieved from the Model, and then that data must be properly inserted into the View for this page.  So a Controller must be present to take care of this.
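The first rule in the list above can be sketched as a small decision function. This is only an illustration of the idea, not the code used in the example application: the function name and the shape of the returned array are hypothetical, while index.php and views/login_view.php are the file names used in this example.

```php
<?php
// Hypothetical helper: decide what the Login controller should do.
// Returns either a redirect target or the View template to include.
function resolveLoginRequest(bool $isLoggedIn): array
{
    if ($isLoggedIn) {
        // Already logged in: send the user on to the Home page controller.
        return ['action' => 'redirect', 'target' => 'index.php'];
    }
    // Not logged in: show the Login form.
    return ['action' => 'render', 'target' => 'views/login_view.php'];
}

$decision = resolveLoginRequest(false);
echo $decision['action'] . ' -> ' . $decision['target'] . "\n";
```

In a real Controller script, a "redirect" decision would typically become a call to header('Location: index.php') followed by exit, and a "render" decision would become an include of the View file.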

The Controller scripts that I have created for this application to handle these tasks are:

  • authenticate.php – handles all tasks related to Login and Register functionality
  • index.php – handles all tasks related to viewing the Home page
  • profile.php – handles all tasks related to viewing a Friend’s Profile
  • friendship.php – handles all tasks related to adding/removing friends

The Models

As should be clear by now, Models are necessary to handle the parts of this site where direct access to the database is needed:

  • compare any username/password combo to those already stored in the database (necessary for the Login and Register pages)
  • create a new user in the database (necessary for the Register page)
  • create/delete friend associations in the database (necessary for the Home and Friend’s Profile pages)
  • get a list of a user’s friends (necessary for the Home and Friend’s Profile pages)
  • get a list of people who are not a user’s friends (necessary for the Home and Friend’s Profile pages)

The Model scripts that do these tasks are:

  • models/User.class.php – handles any tasks related to creating, reading, updating, or deleting users
  • models/Authentication.class.php – handles any tasks related to logging in or registering a user
  • models/Friendship.class.php – handles any tasks related to friendships between two users
  • models/Santize.class.php – handles any tasks related to data sanitization

To be consistent and complete, I have added the standard CRUD functions to each Model script, as well as the functions which handle each of the database-related tasks listed above.
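To give a feel for what a Model class looks like, here is a simplified stand-in for a friendship Model. It is not the code from models/Friendship.class.php: a real Model would run SQL queries against MySQL, whereas this sketch keeps its data in an in-memory array so that it can run on its own. The class name, the method names, and the choice to treat friendships as one-directional are all assumptions made for illustration.

```php
<?php
// Illustrative stand-in for a Friendship model, backed by an array instead of MySQL.
class FriendshipSketch
{
    /** @var array map of user id => list of friend ids */
    private $friendIdsByUser = [];

    // Record that $friendId is a friend of $userId (ignoring duplicates).
    public function addFriend(int $userId, int $friendId): void
    {
        if (!in_array($friendId, $this->getFriends($userId), true)) {
            $this->friendIdsByUser[$userId][] = $friendId;
        }
    }

    // Remove $friendId from the friends of $userId, if present.
    public function removeFriend(int $userId, int $friendId): void
    {
        $this->friendIdsByUser[$userId] = array_values(
            array_diff($this->getFriends($userId), [$friendId])
        );
    }

    // The list of friend ids for a user (empty if they have none).
    public function getFriends(int $userId): array
    {
        return $this->friendIdsByUser[$userId] ?? [];
    }

    // Everyone in $allUserIds who is neither the user nor one of their friends.
    public function getNonFriends(int $userId, array $allUserIds): array
    {
        $excluded = array_merge([$userId], $this->getFriends($userId));
        return array_values(array_diff($allUserIds, $excluded));
    }
}

$friendships = new FriendshipSketch();
$friendships->addFriend(1, 2);
print_r($friendships->getNonFriends(1, [1, 2, 3, 4]));
```

The Controller for the Home page would call methods like getFriends() and getNonFriends() and hand the results to the View, without needing to know how the data is stored.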

It’s object-oriented

You can see that I have created separate class files for each Model.  I am combining MVC architecture concepts with object-oriented programming techniques.  I have created classes for each type of “object” or “entity” that I think may conceptually need specific actions taken on it.

Object-oriented programming is a separate concept from MVC architecture.  But I have used this example to exhibit both.

It uses a home-brewed framework

You can see that I have organized my code in a specific way.  All Model files are contained in the models/ folder.  All View files are contained in the views/ folder, and all Controller files are contained in the root folder.  Javascript files would go in the scripts/ folder.  Style sheets are in the styles/ folder, and database connection info is in the dbinfo/ folder.

One of the core features of a so-called “framework” is a clear organization of the files involved in a project.  So you could call the organization that I have come up with a sort of home-brewed framework.  It is very simple and crude, but it is effective at helping organize our MVC object-oriented application.

The popular frameworks that PHP developers use, such as Zend, CakePHP, Symfony, and CodeIgniter, do much more sophisticated things than just separate your code into folders.  So I doubt my framework will become the next big thing.  But it is useful for our purposes nonetheless.

What the user sees

It’s important to note that the users will be completely oblivious to our use of an object-oriented MVC architecture.  This is a good thing: you don’t want users to have to worry about how a site was developed.

When a user goes to the Login or Register pages, they will see the address authenticate.php in the browser address bar.  This is the Controller script that we know handles all tasks related to logging in and registering.  This controller figures out whether the user wants to see the Login Page or the Register Page, and loads up the appropriate View for either page.

When a user goes to the Home page, they see the address index.php.  This, as we know, is the Controller file for the tasks related to the Home page.  This Controller calls functions in the Model that get the data related to the Home page, and then this Controller loads up the View file for the Home page, which displays this data nicely.

Similarly, when a user goes to a Friend’s Profile page, they see the address profile.php in the browser’s address bar.  This is the Controller file for all tasks related to viewing the Friend’s Profile page.  This script gets all the data by calling functions in the Model, and then includes the appropriate View file to display that data.

Class 10 – Favicons

December 5th, 2009 § 0

Favicons are the little icons that show up in the address bar of the web browser for some websites. For example, the favicon for blogger.com is the little orange square with the B in it.

Favicons are totally optional, but a well-designed favicon can add a slightly more sophisticated feel to your site, even though the user may not ever realize it is there.

So of course, you will want to create a favicon for your own sites. Favicons are 16 pixels wide and 16 pixels tall.  They are saved in a special .ico image format that is not natively supported by graphic design programs like Photoshop. This article has a good explanation of how to download a Photoshop plugin that lets you save favicons in .ico format using Photoshop.

If you don’t have Photoshop, you can still create favicons by using a variety of website services that convert standard .jpg, .gif, or .png image files into .ico format. For example, here is one such site.

To use your favicon once you have created it, make sure the file is named favicon.ico, and put it in the root folder of your website. Then add the following code into the <head> of your XHTML code:

<head>
  ...
  <link rel="shortcut icon" href="./favicon.ico" />
  ...
</head>

Class 10 – Fancy URLs: Customizing Your Site’s URLs Using Mod_Rewrite

December 4th, 2009 § 0

Now that you know all the basic techniques of web development, it’s time to start thinking about aesthetics. One of the most obvious aesthetic choices you can make on your site is what domain name you choose, and what you call the files on that site. Domain names are something I can’t help you with, but the rest of the URL after the domain name, including the folder and file names, is something I can help you beautify.

This is an advanced topic, but one that can provide polish to your sites if you are comfortable with all we have covered so far.

The problem: ugly URLs

As you know, depending on what we call our files and how we use the query string to pass data from one page to another, we sometimes end up with URLs that look like this:

http://onepotcooking.com/index.php?post=19&view=rss

But you might rather have URLs that look like this:

http://onepotcooking.com/rss/post/19/

And actually, search engines sometimes prefer more descriptive URLs, so they can more easily determine what a page is about:

http://onepotcooking.com/rss_feed/why_urls_should_be_pretty.html

But you don’t want to change the structure of your folders and file names, and change the entire way you use the $_GET, $_POST, and $_REQUEST variables in PHP just to make the URLs pretty. When you’re coding the site, you’re usually thinking about functionality and getting the job done, not aesthetics.

The solution to URL woes: mod_rewrite

Apache, the most popular software used by web servers to handle the requests and responses for web pages (and the software used by our class server and most other UNIX web servers) comes with a module called mod_rewrite that is used for creating custom URLs.

mod_rewrite lets you publish fancy URLs like:

http://onepotcooking.com/isnt_this_a_prety_url.html

But have them actually get converted internally into ugly URLs like this, without the user ever seeing it:

http://onepotcooking.com/process_something.php?id=1884&to_do=something&this_is=ugly

You will be able to use the fancy URLs for any links to your pages, but your folders, filenames, and PHP code will not have to change, so long as you use mod_rewrite correctly.

Rewriting vs. Redirecting

This process of having fancy URLs that get internally converted by the server into ugly URLs is known as URL rewriting. With a rewrite, since it only happens internally in the server, the user only ever sees the fancy URL. They will never see the ugly URL in the browser address bar.

However, the term redirect is generally used to refer to the technique where the client, meaning the web browser, handles the redirecting: the server responds by telling the browser to request a different URL, and the browser then does so. In the case of a client-side redirect, the user can see the final destination URL in the browser’s address bar after the redirect occurs. So they will ultimately see the ugly URL clearly in the address bar of the browser.

Another look at the client/server request/response relationship

To understand how mod_rewrite works, it’s important to understand where it fits into the whole request/response relationship. Here’s a very broad overview of just the relevant steps of what happens when a client requests a file from a server:

  • a user tries to load a web page in the browser (whether by going directly to a URL, clicking a link, submitting a form, or making an AJAX request)
  • the browser sends an HTTP request (either GET or POST) for the file to the server.
  • the server receives the request, and launches Apache’s request handler
  • Apache tries to figure out how to respond to the request
  • Apache first checks mod_rewrite settings to see if it should do any fancy processing of the URL of the file that the user is requesting
  • Then, if Apache determines that the requested file is a PHP script, it launches the PHP engine and sends any data that was passed along with the request to the PHP script that the browser requested
  • The PHP script runs and sends its output back to Apache
  • Apache sends a response to the web browser. The response contains an HTTP status code indicating some information about whether the request was processed properly or not, as well as any content that was output by the requested file, regardless of whether it’s a PHP script, HTML file, CSS file, Javascript file, or any other type of file.
  • The browser receives the response from Apache, and figures out how to display whatever content it received back from the server to the user.

As you can see, the mod_rewrite technique we will be discussing, which allows sites to use fancy URLs, takes effect after the server has received the request from the browser, but before it has passed that request on to the PHP processor. The rules are written in a language that Apache can understand, not in PHP, since when they are processed, the PHP engine hasn’t even been launched yet.

Apache configuration files: httpd.conf and .htaccess

When a user requests a URL like this:

http://onepotcooking.com/spring2009/test.php

the Apache server checks two sets of configuration files to see whether it should do something fancy with that URL.

First, Apache checks its main configuration file, called httpd.conf, which is usually buried somewhere obscure in the deep recesses of the server filesystem. Httpd.conf has global settings that apply to your entire site. If you have a shared hosting plan for your site, as most of you will, you do not have access to this file.

After it has checked httpd.conf for any relevant settings, Apache then checks the directory-specific configuration files called .htaccess, which have settings that apply only to specific folders.

With the example URL above, Apache would have to check for the existence of each of these two .htaccess files:

/.htaccess
/spring2009/.htaccess

Since the requested file is nested inside the spring2009/ folder, which is inside of the root / folder, either of those settings files could have an effect on how the request for the file is handled by the server.
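The lookup described above can be expressed as a short PHP function. This is only a model of the idea, not how Apache is implemented: the function name is hypothetical, and it assumes the URL path ends in a file name rather than a folder.

```php
<?php
// List the .htaccess files that could apply to a requested URL path,
// from the site root down to the folder containing the requested file.
function htaccessCandidates(string $urlPath): array
{
    // Split the path into its non-empty parts.
    $segments = array_values(array_filter(
        explode('/', $urlPath),
        function ($segment) {
            return $segment !== '';
        }
    ));
    // Assumption: the last part is the requested file, so drop it.
    array_pop($segments);

    $candidates = ['/.htaccess'];
    $currentFolder = '';
    foreach ($segments as $folderName) {
        $currentFolder .= '/' . $folderName;
        $candidates[] = $currentFolder . '/.htaccess';
    }
    return $candidates;
}

print_r(htaccessCandidates('/spring2009/test.php'));
```

For the path /spring2009/test.php, the function returns the same two files listed above: /.htaccess and /spring2009/.htaccess.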

We will be focusing on settings in the .htaccess files since these are the ones you will always have access to, regardless of your hosting setup. However, the same URL rewriting techniques will be applicable to settings in the httpd.conf file, with slight modifications.

How to use .htaccess files to rewrite URLs

Rather than write an entire tutorial on how to rewrite URLs (which I initially started to do), I will point you to an excellent tutorial that already covers all the basic types of rewriting you are likely to do:

http://corz.org/serv/tricks/htaccess2.php

Note: Although I don’t think it’s clearly described on this site, all of the example code written there is meant to go into a file called “.htaccess” located in the root folder of your project. So if your project is at http://onepotcooking.com/johnhancock/final_project/, you should create an .htaccess file located at /johnhancock/final_project/.htaccess, so you can create fancy URLs like http://onepotcooking.com/johnhancock/final_project/this-is-a-fancy-url.html

In other words, fancy URLs only work at the level at which you put an .htaccess file. If you want a fancy URL like http://onepotcooking.com/this-is-a-fancy-url.html, you need to put an .htaccess file in the root folder of the server, /.htaccess.

I highly recommend you read that otherwise well-written document linked above if you wish to use fancy URLs on your own sites.

An example page

I have created a single example PHP script which can be accessed by a number of fancy URLs by taking advantage of rewriting rules found in a .htaccess file in the same folder. The PHP script just outputs whatever data was passed to it in the query string along with the GET request.

In other words, there is an .htaccess file which is allowing a variety of fancy URLs to all internally point to the same PHP script. Each URL is meant to exhibit a slightly different aspect of URL rewriting that may be useful to you. Several of them focus on passing data through the query string even though there is no query string in the fancy URL.
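A script of the kind described here might look roughly like the following. This is a sketch written for illustration, not the actual example script; the function name is hypothetical, and it assumes that every query string value is a simple string rather than an array.

```php
<?php
// Build a plain-text listing of the name/value pairs passed to the script.
// Assumption: each value is a scalar (no array-style parameters like a[]=1).
function describeQueryData(array $queryData): string
{
    if (count($queryData) === 0) {
        return "No data was passed in the query string.\n";
    }
    $lines = '';
    foreach ($queryData as $name => $value) {
        // Escape both parts, since query string data comes from the user.
        $lines .= htmlspecialchars((string) $name, ENT_QUOTES, 'UTF-8')
            . ' = '
            . htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8')
            . "\n";
    }
    return $lines;
}

// After a rewrite, $_GET holds whatever data the internal (ugly) URL carried.
echo describeQueryData($_GET);
```

Because the rewriting happens inside Apache before PHP is launched, the script cannot tell whether it was reached through a fancy URL or through the direct one; it simply sees the resulting query string data.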

You will definitely want to read that tutorial linked above before going in to read the code in this example.

The direct URL to the example script is http://onepotcooking.com/amosbloomberg/spring2009/class12/mod_rewrite/index.php

The fancy URLs that internally rewrite to that same script are:

And the following URL uses mod_rewrite to do a client-side redirect (not a rewrite):

Reminder: all the rules that allow these URLs to point to and pass data to the same index.php script are found in the .htaccess file in the same folder as the PHP script.

Class 9 – Sessions in PHP

November 23rd, 2009 § 0

In your readings, you may have come across mention of PHP Sessions. Sessions are another mechanism, in addition to the $_GET, $_POST, and $_COOKIE variables, that allows you to “maintain state”, meaning to pass data from one page to another.

Session variables are just like cookies, but easier

PHP provides a set of functions that allow you to read and write session variables. The basic idea is that session variables allow you to store data for as long as the user’s session is still alive.  Generally, a session is alive as long as the user’s browser is open, just like cookies.  These session variables can be accessed from any page on the site, just like cookies.

These are variables that are stored on the server, and last for a limited amount of time. They are functionally very similar to cookies, and in fact PHP does use cookies to perform most of the tasks involved with Sessions. But PHP hides the internal details of how Sessions work, which makes your job a little bit easier.

How to use sessions in PHP

Any script that uses session variables, either to read or write them, needs to call the session_start() built-in PHP function at the top of the script, before any output is sent to the browser.  This is just a command to tell PHP that you want to use sessions on this page.

Once you have done that, you can create a session variable like this:

//create a session variable called "test_variable"
$_SESSION['test_variable'] = "this is the value of the test variable";

Once you have created a session variable, any other page on your site can access that variable like so:

//echo the value of the session variable called "test_variable"
echo $_SESSION['test_variable'];
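
To see the two snippets working together in one page, here is a minimal sketch of a visit counter. The key name "visit_count" is made up for this illustration. Each time the page is requested during the same session, the number goes up by one.

```php
<?php
// Sketch of a visit counter kept in a session variable.
// "visit_count" is a hypothetical key chosen for this example.

session_start(); // must come before any output is sent to the browser

if (!isset($_SESSION['visit_count'])) {
    $_SESSION['visit_count'] = 0; // first page view of this session
}

$_SESSION['visit_count']++; // remember one more page view

echo "You have viewed " . $_SESSION['visit_count'] . " page(s) during this session.";
```

Reload the page a few times in the same browser and the count should keep climbing; close the browser and it normally starts over.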

Example Files

Here is an example of a script that writes a session variable, just like the example code above.

And this page reads that same variable and outputs it to the page.

Further reading

Here are some pages that cover sessions, and explain how to write PHP code to deal with them:

http://php.about.com/od/advancedphp/ss/php_sessions.htm
http://www.tizag.com/phpT/phpsessions.php
http://www.htmlgoodies.com/beyond/php/article.php/3472581
http://us3.php.net/session

Class 9 – Sanitizing User-Generated Content

November 23rd, 2009 § 0

As a general rule, any data that comes from a user is not to be trusted.  So any time you are dealing with data that may (or may not) have originated from a user, you need to sanitize that data before doing anything else with it.  Think of it as basic web hygiene, akin to washing your hands in the restroom.  To quote Google’s CEO, Eric Schmidt, the internet is a “cesspool”.  None of us needed him to tell us that – it’s obvious.

Any time your site deals with data that does not originate from your own code, you need to sanitize it before letting it touch the internal organs of your website.  When we talk about sanitizing, we’re not talking about removing bad words from the text; we’re talking about stopping malicious hackers from breaking into our website by sending data to the server that exploits faults in our code or weaknesses on the server.

User-generated content may often come from any of the following sources:

Practical sanitization

No need to get paranoid yet.  For our practical purposes, any data that you get from the $_REQUEST, $_GET, $_POST, or $_COOKIE arrays should be sanitized.

Let’s say you have code like this:

$dummyData = $_REQUEST['dummy_data'];

This is getting data from the $_REQUEST variable, which as we know is automatically populated with data from the query string in links, from form fields, or from cookies.  In other words, it’s potentially tainted.  And let’s say you are planning to store that $dummyData in a database table like so:

$myQuery = "INSERT INTO abloomberg_dummy (data) VALUES ('{$dummyData}')";
$result = mysql_query($myQuery);

You absolutely must sanitize it to prevent malicious things like SQL injection attacks before you run that query.
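
A complementary technique, not used in the class example, is to let the database driver keep the data separate from the SQL by using a prepared statement. The sketch below uses PHP's PDO extension with an in-memory SQLite database so that it can run on its own; both of those choices are assumptions made for illustration (the class examples talk to MySQL through mysql_query()), and only the table name comes from the text above.

```php
<?php
// Sketch only: PDO with in-memory SQLite stands in for the class's MySQL database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE abloomberg_dummy (data TEXT)");

// pretend this arrived in $_REQUEST['dummy_data']; it contains an injection attempt
$dummyData = "x'); DROP TABLE abloomberg_dummy; --";

// the ? placeholder keeps the value out of the SQL text entirely
$statement = $db->prepare("INSERT INTO abloomberg_dummy (data) VALUES (?)");
$statement->execute(array($dummyData));

// the hostile string was stored as ordinary text and the table still exists
$stored = $db->query("SELECT data FROM abloomberg_dummy")->fetchColumn();
echo ($stored === $dummyData) ? "stored safely" : "something went wrong";
```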

An example

This example uses PHP code to do just that.  It uses an object-oriented Sanitize class (as in classes and objects in object-oriented programming) that I based on another well-known (but not object-oriented) script.

To use this Sanitize class in your own PHP scripts, before you do anything else:

  1. download a copy of the zip archive, unzip it, and put the file Sanitize.class.php in the folder for your project.
  2. make sure your script includes this file by using require_once("Sanitize.class.php");

Once you have that set up, you’re ready to use this class.  Here is an example usage:

<?php
    //file: index.php
    //an example of using the Sanitize class

    //include the Sanitize class into this script
    require_once("Sanitize.class.php");

    //on a live site, you'd want to sanitize all data that you got from the user
    //in other words, any time you use data you got from the $_REQUEST, $_GET, $_POST, or $_COOKIE variables
    //For example, if the data was coming from a form or query string in a link:
    //$dirtyData = $_REQUEST['something'];

    //in this example, for simplicity, i'm just sanitizing the contents of a variable that's hardcoded
    $dirtyData = "this is  a test with an HTML tag <a href='#'>click me</a>";

    /*
    First choose how you want to sanitize the data.  The choices are:
    (PS: notice that these are constants of the Sanitize class - hence the :: symbol)

    Sanitize::HTML            //replaces any HTML tags with "HTML entities"
    Sanitize::SQL             //protects against SQL injection attacks
    Sanitize::UTF8            //makes sure data is in UTF8 format
    Sanitize::INT             //makes sure the data is an integer
    Sanitize::FLOAT           //makes sure the data is a float (decimal)
    Sanitize::LDAP            //protects against any LDAP code
    Sanitize::SYSTEM          //prevents any system commands from being run
    Sanitize::PARANOID        //all of the above
    */

    //set the $flags variable to be the sum of all the flags you want to use from the list above
    $flags = Sanitize::HTML + Sanitize::SQL; //this example removes any HTML or SQL commands from the string

    //now pass the data and the $flags variable to the sanitize function to sanitize it
    $cleanData = Sanitize::sanitize($dirtyData, $flags); //call the static method "sanitize" of the Sanitize class

    //now your data is clean
    echo $cleanData; //the text stored in this variable has been "sanitized"

    //you may want to "view source" in the browser to see what happened to the text
?>

Understanding the Sanitize::sanitize() method

The most important part to understand is the command that actually does the sanitizing:

  $cleanData = Sanitize::sanitize($dirtyData, $flags); //call the static method "sanitize" of the Sanitize class

This line calls the Sanitize::sanitize() function and passes it two arguments: the data to be sanitized, and the flags that indicate what type of sanitization you want to do.  The result of this sanitize() function is then put into the variable $cleanData, which now has the sanitized version of the data.

In this example, we have set the $flags variable to indicate that we want to remove any HTML or SQL code from the data:

  $flags = Sanitize::HTML + Sanitize::SQL; //this example removes any HTML or SQL commands from the string

We can use any combination of the available flags by adding them together.
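
Adding flags together works when each flag is a distinct power of two, which is a common convention for option flags. The sketch below illustrates that convention with a made-up DemoFlags class and a made-up hasFlag() helper. The numeric values are assumptions for illustration; they are not taken from Sanitize.class.php.

```php
<?php
// Hypothetical flag values for illustration; not copied from Sanitize.class.php.
class DemoFlags
{
    const HTML = 1; // binary 001
    const SQL  = 2; // binary 010
    const INT  = 4; // binary 100
}

// Returns true when $flag is one of the options packed into $flags.
function hasFlag(int $flags, int $flag): bool
{
    return ($flags & $flag) === $flag;
}

$flags = DemoFlags::HTML + DemoFlags::SQL; // 1 + 2 = 3, binary 011

var_dump(hasFlag($flags, DemoFlags::HTML)); // bool(true)
var_dump(hasFlag($flags, DemoFlags::SQL));  // bool(true)
var_dump(hasFlag($flags, DemoFlags::INT));  // bool(false)
```

With flags like these, adding only gives the right answer if each flag appears once in the sum. The bitwise OR operator (|) produces the same combined value and is harmless if a flag is accidentally repeated.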

Now that the data has been sanitized, you can safely store that data in a database without worrying about SQL injection attacks:

$myQuery = "INSERT INTO abloomberg_dummy (data) VALUES ('{$cleanData}')";
$result = mysql_query($myQuery);

Or do whatever else you want with it.  But rest assured it does not have any malicious HTML or SQL code in it.

Note that since this example is object oriented, we never have to look at the source code of Sanitize.class.php.  This is abstraction at work.

Class 7 – Advanced assignment & very advanced assignment

November 7th, 2009 § 0

If you have finished the in-class assignment of creating a message board, you are ready for the advanced assignment.

The advanced assignment is to add the ability for users to upload images along with their message posts.

add_post.php

So when users add new posts, they enter in their name, the title of the post, the body of the post, as well as the image that goes along with the post.

Here’s the updated wireframe for the add_post.php page:

updated add_post.php wireframe

You will want to check out the upload file example on the server here.

You use an <input type="file" …> tag to allow users to select a file to upload.

Make sure your form has the “enctype” attribute set to “multipart/form-data”.  This indicates to the server that it should expect to be receiving binary data for the file.

process_post.php

When the server receives the data that the user submits along with the HTTP POST request for process_post.php, it has to store the data in the database, as you have already done in the first part of this assignment.

You will want to create a new “image_path” field in your messages table in the database where you will store the path where you uploaded the image.  You can get the path where the image is uploaded by going through the file upload example code linked above.
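
As a rough idea of what the upload handling in process_post.php might look like, here is a minimal sketch. It is not the file upload example from the server; the form field name "image", the "uploads" folder, and the buildImagePath() helper are all hypothetical. It uses PHP's built-in $_FILES array and move_uploaded_file() function.

```php
<?php
// Sketch: decide where an uploaded image should be stored.
// The "image" field name and "uploads" folder are hypothetical.

// Returns the destination path for an upload, or null if it should be rejected.
function buildImagePath(array $file, string $uploadDir): ?string
{
    if (!isset($file['error']) || $file['error'] !== UPLOAD_ERR_OK) {
        return null; // no file was sent, or the upload failed
    }
    // basename() drops any folder names hidden in the submitted file name
    $name = basename($file['name']);
    $extension = strtolower(pathinfo($name, PATHINFO_EXTENSION));
    if (!in_array($extension, array('jpg', 'jpeg', 'png', 'gif'), true)) {
        return null; // only accept common image types
    }
    // prefix with a unique id so two uploads with the same name do not collide
    return rtrim($uploadDir, '/') . '/' . uniqid() . '_' . $name;
}

if (isset($_FILES['image'])) {
    $imagePath = buildImagePath($_FILES['image'], 'uploads');
    if ($imagePath !== null && move_uploaded_file($_FILES['image']['tmp_name'], $imagePath)) {
        // $imagePath is the value to store in the image_path field
        echo "Saved image to " . htmlspecialchars($imagePath);
    }
}
```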

So when you create the new row in the database table, you will be storing the author, post title, image path, and post body in the database table.

index.php

And when a user views the list of all the messages, if there is an image that has been uploaded along with a particular message, that image shows up next to the message.  If there is no image associated with a post, the layout should adjust accordingly.
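
One way to make the layout adjust is to output the image tag only when the row actually has an image path. The sketch below assumes each database row has been fetched into an array; the keys author, title, and body are hypothetical column names (only image_path comes from the assignment), and the renderPost() function is made up for illustration.

```php
<?php
// Sketch: build the markup for one message, with or without an image.
// Array keys other than image_path are hypothetical column names.
function renderPost(array $row): string
{
    $html = '<div class="post">';
    if (!empty($row['image_path'])) {
        // only output the <img> tag when an image was uploaded with the post
        $html .= '<img src="' . htmlspecialchars($row['image_path']) . '" alt="" />';
    }
    $html .= '<h2>' . htmlspecialchars($row['title']) . '</h2>';
    $html .= '<p>' . htmlspecialchars($row['body']) . '</p>';
    $html .= '<p class="author">posted by ' . htmlspecialchars($row['author']) . '</p>';
    $html .= '</div>';
    return $html;
}

// one post with an image and one without
echo renderPost(array('author' => 'Ann', 'title' => 'Hello', 'body' => 'First post', 'image_path' => 'uploads/cat.jpg'));
echo renderPost(array('author' => 'Bob', 'title' => 'No picture', 'body' => 'Text only', 'image_path' => null));
```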

Here is the updated index.php wireframe:

updated index.php wireframe

Very advanced assignment

If this was too easy for you, here is a very advanced assignment.  Require the users to register with your site before they can post a message.

Users should be able to view the home page whether they are registered or not.

But only registered users should be able to post a new message.  If a user has not registered, they should be redirected to the index.php page if they try to view the add_post.php or process_post.php pages.
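
The check itself can be quite short if you keep track of logged-in users with a session variable. The sketch below assumes your own login script stores something in $_SESSION['user_id'] once a user has registered and logged in; that key name and the redirectTargetFor() function are hypothetical.

```php
<?php
// Sketch of a guard for add_post.php and process_post.php.
// The "user_id" session key is hypothetical: it assumes your login script sets it.

// Returns the page to redirect to, or null if the visitor may stay.
function redirectTargetFor(array $session): ?string
{
    return isset($session['user_id']) ? null : 'index.php';
}

// At the top of each protected page you could then write:
//   session_start();
//   $target = redirectTargetFor($_SESSION);
//   if ($target !== null) {
//       header("Location: " . $target); // send the visitor to the home page
//       exit;                           // stop so the rest of the page never runs
//   }

// Demonstration with two pretend sessions:
var_dump(redirectTargetFor(array()));                // string(9) "index.php"
var_dump(redirectTargetFor(array('user_id' => 42))); // NULL
```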

Class 7 – Assignment

November 7th, 2009 § 0

Your assignment this class is to create a message board.  The message board allows users to view all of the messages that have been posted to the board so far.  It also allows users to post new messages.

Here is the user flow of the site:

User flow of message board site

index.php – the message list page

When the user first comes to the site, they see the main page, index.php.  This page shows them a list of all of the messages on the board in reverse chronological order.  The page reads all of the rows of data from the database table and displays them.

This page also has a link to “add a new post”.  When the user clicks that link, she is brought to add_post.php.

index.php wireframe

You will want to use code similar to what is available in the read.php example available on the server here.

Also, you will eventually want to format the dates that you retrieve from the created field of the database table.  When you get to that point, read this post about beautifying MySQL timestamps.
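
As a quick illustration of the idea, the sketch below uses PHP's built-in strtotime() and date() functions to reformat a timestamp string. The sample value and the prettyDate() function name are made up; in your script the value would come from the created field.

```php
<?php
// Sketch: turn a MySQL timestamp string into something friendlier.
// The sample value is made up; in practice it would come from the "created" field.
function prettyDate(string $mysqlTimestamp): string
{
    $seconds = strtotime($mysqlTimestamp); // parse "YYYY-MM-DD HH:MM:SS"
    if ($seconds === false) {
        return $mysqlTimestamp; // leave unparseable values untouched
    }
    return date('F j, Y, g:i a', $seconds);
}

echo prettyDate('2009-11-07 14:30:00'); // November 7, 2009, 2:30 pm
```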

add_post.php – the post page

add_post.php consists of a form the user can fill out in order to post a new message.  This form has three fields: the user’s name, the post title, and the post body.  When the user clicks submit, this page makes an HTTP GET request for process_post.php, and passes along the data that the user entered into the form as part of the query string of the URL of the request.

add_post.php wireframe

This page can be a simple XHTML page.  It’s ok to name it add_post.php even if it just has XHTML code inside of it.

process_post.php – the process post page

The process_post.php script receives the data that was passed in the GET request to the server by using the built-in PHP $_GET or $_REQUEST variables, and enters that information as a new row in the database table.

You will want to use code that is similar to the PHP used in the create.php example available on the server here.

This script will then redirect the user back to the main page, index.php.  Check out this example of how to redirect a user from one page to another.  You will be using the built-in header() function in PHP in order to pass a special “Location” HTTP header from the server to the client that instructs it to go make a request for a different page.

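For example, this code redirects a user to the nytimes.com website.  Note that the URL needs to include the http:// part, otherwise the browser treats it as a path on your own site, and that header() must be called before your script sends any output:

header("Location: http://www.nytimes.com/"); //redirect to another page
exit; //stop the script so nothing else runs after the redirect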