
Class 9 – Sanitizing User-Generated Content

Monday, November 23rd, 2009

As a general rule, any data that comes from a user is not to be trusted.  So any time you are dealing with data that may (or may not) have originated from a user, you need to sanitize that data before doing anything else with it.  Think of it as basic web hygiene, akin to washing your hands in the restroom.  As Google’s CEO, Eric Schmidt, put it, the internet is a “cesspool”.  None of us needed him to tell us that; it’s obvious.

Any time your site deals with data that does not originate from your own code, you need to sanitize it before letting it touch the internal organs of your website.  When we talk about sanitizing, we’re not talking about removing bad words from the data.  We’re talking about stopping malicious hackers from breaking into our website by sending the server data crafted to exploit flaws in our code or weaknesses on the server.

User-generated content most often comes from the query string in links, from form fields, or from cookies.

Practical sanitization

No need to get paranoid yet.  For our practical purposes, any data that you get from the $_REQUEST, $_GET, $_POST, or $_COOKIE arrays should be sanitized.

Let’s say you have code like this:

$dummyData = $_REQUEST['dummy_data'];

This gets data from the $_REQUEST array, which, as we know, is automatically populated with data from the query string in links, from form fields, or from cookies.  In other words, it’s potentially tainted.  And let’s say you are planning to store that $dummyData in a database table like so:

$myQuery = "INSERT INTO abloomberg_dummy (data) VALUES ('{$dummyData}')";
$result = mysql_query($myQuery);

Before you run that query, you absolutely must sanitize $dummyData to protect against attacks such as SQL injection.
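To see why, here is a minimal sketch that builds the same query string from a hostile value without sending it to any database. The input value is made up for illustration; it is simply what an attacker might type into the dummy_data field.

```php
<?php
// Hypothetical attacker input for the dummy_data field.
// No database is contacted in this sketch; we only build the SQL text.
$dummyData = "x'), ('injected";

// Build the query exactly as in the unsafe example above:
// the value is pasted straight into the SQL string.
$myQuery = "INSERT INTO abloomberg_dummy (data) VALUES ('{$dummyData}')";

echo $myQuery, "\n";
// The apostrophe in the input closes the string literal early, so the
// attacker, not our code, now decides the shape of the statement:
// INSERT INTO abloomberg_dummy (data) VALUES ('x'), ('injected')
```

Here the damage is just an extra row, but other crafted values can do far worse, which is why the value has to be neutralized before it reaches the query.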

An example

This example uses PHP code to do just that.  It uses a Sanitize class (a class in the object-oriented programming sense) that I based on another well-known, but not object-oriented, script.

To use this Sanitize class in your own PHP scripts, before you do anything else:

  1. download a copy of the zip archive, unzip it, and put the file Sanitize.class.php in the folder for your project.
  2. make sure your script includes this file by using require_once("Sanitize.class.php");

Once you have that set up, you’re ready to use this class.  Here is an example usage:

<?php
//file: index.php
//an example of using the Sanitize class

//include the Sanitize class into this script
require_once("Sanitize.class.php");

//on a live site, you'd want to sanitize all data that you got from the user
//in other words, any time you use data you got from the $_REQUEST, $_GET, $_POST, or $_COOKIE variables
//For example, if the data was coming from a form or query string in a link:
//$dirtyData = $_REQUEST['something'];

//in this example, for simplicity, I'm just sanitizing the contents of a hardcoded variable
$dirtyData = "this is a test with an HTML tag <a href='#'>click me</a>";

/* First choose how you want to sanitize the data.
The choices are:    (PS: notice that these are constants of the Sanitize class, hence the :: symbol)
Sanitize::HTML            //replaces any HTML tags with "HTML entities"
Sanitize::SQL             //protects against SQL injection attacks
Sanitize::UTF8            //makes sure data is in UTF8 format
Sanitize::INT             //makes sure the data is an integer
Sanitize::FLOAT           //makes sure the data is a float (decimal)
Sanitize::LDAP            //protects against LDAP injection
Sanitize::SYSTEM          //prevents any system commands from being run
Sanitize::PARANOID        //all of the above
*/

//set the $flags variable to be the sum of all the flags you want to use from the list above
$flags = Sanitize::HTML + Sanitize::SQL; //this example escapes HTML tags and guards against SQL injection

//now pass the data and the $flags variable to the sanitize function to sanitize it
$cleanData = Sanitize::sanitize($dirtyData, $flags); //call the static method "sanitize" of the Sanitize class

//now your data is clean
echo $cleanData; //the text stored in this variable has been "sanitized"

//you may want to "view source" in the browser to see what happened to the text

?>
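The source of Sanitize.class.php isn't reproduced here, so as a point of comparison, here is a minimal sketch of what the HTML and INT flags describe, done with PHP's built-in htmlspecialchars() and filter_var() functions. This is not necessarily how the Sanitize class itself does it.

```php
<?php
$dirtyData = "this is a test with an HTML tag <a href='#'>click me</a>";

// Convert <, >, &, " and ' into HTML entities so the browser shows the
// tag as plain text instead of rendering a link.
$safeHtml = htmlspecialchars($dirtyData, ENT_QUOTES, 'UTF-8');
echo $safeHtml, "\n";
// this is a test with an HTML tag &lt;a href=&#039;#&#039;&gt;click me&lt;/a&gt;

// Accept a value only if the whole string is an integer;
// filter_var() returns false when validation fails.
$age = filter_var("42", FILTER_VALIDATE_INT);    // int(42)
$bad = filter_var("42abc", FILTER_VALIDATE_INT); // false
var_dump($age, $bad);
```

Viewing the page source after running it shows the same kind of entity-encoded text the Sanitize example produces for its HTML flag.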

Understanding the Sanitize::sanitize() method

The most important part to understand is the command that actually does the sanitizing:

  $cleanData = Sanitize::sanitize($dirtyData, $flags); //call the static method "sanitize" of the Sanitize class

This line calls the static method Sanitize::sanitize() and passes it two arguments: the data to be sanitized, and the flags that indicate which types of sanitization you want.  The method returns the sanitized version of the data, which is stored in the variable $cleanData.
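Since the source of Sanitize.class.php isn't shown here, the following is a hypothetical sketch of how a static method that takes data plus flags might be structured. The class name MiniSanitize, its constant values, and what each flag does are all assumptions made for illustration.

```php
<?php
// Hypothetical stand-in for a flag-driven sanitizer; the name, constant
// values, and behavior are assumptions, not the real Sanitize.class.php.
class MiniSanitize
{
    // Each flag is a distinct power of two so flags can be combined.
    const HTML = 1;
    const INT  = 2;

    // A static method is called on the class itself
    // (MiniSanitize::sanitize(...)), so no object has to be created first.
    public static function sanitize($data, $flags)
    {
        // Apply each transformation only if its flag bit is set.
        if ($flags & self::HTML) {
            $data = htmlspecialchars($data, ENT_QUOTES, 'UTF-8');
        }
        if ($flags & self::INT) {
            // A blunt cast: "42abc" becomes 42, "abc" becomes 0.
            $data = (int) $data;
        }
        return $data; // the caller receives the cleaned copy
    }
}

$cleanData = MiniSanitize::sanitize("<b>hi</b>", MiniSanitize::HTML);
echo $cleanData, "\n"; // &lt;b&gt;hi&lt;/b&gt;
```

The original $dirtyData is never modified; the method works on its own copy and hands back the result, which is why the example assigns it to a new variable.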

In this example, we have set the $flags variable to request both HTML and SQL sanitization:

  $flags = Sanitize::HTML + Sanitize::SQL; //this example escapes HTML tags and guards against SQL injection

We can use any combination of the available flags by adding them together.
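Adding flags works as long as each flag is a distinct power of two (a single bit) and no flag is added twice. That is an assumption about how Sanitize.class.php defines its constants, since its source isn't shown. Here is a sketch with made-up values showing that + then gives the same result as PHP's bitwise OR operator |, and how code can test for a flag with &.

```php
<?php
// Made-up flag values for illustration; the real class may differ.
const FLAG_HTML = 8; // binary 1000
const FLAG_SQL  = 2; // binary 0010

$sum = FLAG_HTML + FLAG_SQL; // 10, binary 1010
$or  = FLAG_HTML | FLAG_SQL; // also 10, because the bits don't overlap
var_dump($sum === $or);      // bool(true)

// Testing for a flag: a non-zero result means that bit is set.
var_dump(($sum & FLAG_SQL) !== 0); // bool(true)
var_dump(($sum & 4) !== 0);        // bool(false)

// Pitfall: adding the same flag twice carries into a different bit,
// while | leaves it unchanged.
var_dump(FLAG_SQL + FLAG_SQL); // int(4), a different flag entirely
var_dump(FLAG_SQL | FLAG_SQL); // int(2)
```

This is why many PHP libraries combine flags with | rather than +: the result is the same for distinct flags, and | is harmless if a flag is listed twice.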

Now that the data has been sanitized, you can safely store that data in a database without worrying about SQL injection attacks:

$myQuery = "INSERT INTO abloomberg_dummy (data) VALUES ('{$cleanData}')";
$result = mysql_query($myQuery);

Or do whatever else you want with it.  But rest assured it does not have any malicious HTML or SQL code in it.

Note that because the sanitizing logic is wrapped up in the Sanitize class, we never have to look at the source code of Sanitize.class.php to use it.  This is abstraction at work.