Class 11 – Intro to Privacy on the Web

May 1st, 2010

Despite a very vocal minority of concerned citizens, privacy does not seem to be anywhere near as big an issue in the news as it could be.

You should assume that just about everything you do online can be tracked and traced by anyone willing to put in the effort.  And some people are putting in that effort.

Children’s Privacy & COPPA Compliance

A topic that has received some attention is children’s privacy.  The Children’s Online Privacy Protection Act of 1998 (COPPA) defines a set of compliance guidelines for sites that collect personal information from children under the age of 13.

The act itself is a short read.  In summary, it declares that websites dealing with children’s information must do their best to obtain parental consent before storing any personally identifiable information about a child or communicating directly with the child.  Parents must also be allowed to request a copy of all the information the site has stored about their children, to have that data deleted, and to stop any further collection of data about their children.  The website must disclose how it uses that information: whether for direct marketing, awarding prizes in competitions, sharing with third parties, etc.

In practice, parental consent is often obtained with a checkbox on the page that could easily be clicked by someone other than the parent.  Sometimes the parent’s email address is required in order to register with the site, and an email is sent to the parent with a link to approve the collection of information about the child.  In general, the burden falls on the website operator to do their best to comply with COPPA.  If a site runs into legal problems, it is evaluated on a case-by-case basis.
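The email-approval approach can be sketched as a signed link: the site signs the child's account ID and the parent's email address with a secret key and accepts only approvals whose signature checks out. This is a minimal Python sketch assuming HMAC-SHA256 signing; the secret key, the function names and the example.com approval URL are all hypothetical, and COPPA does not prescribe this particular mechanism.

```python
import hashlib
import hmac
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical server-side secret. A real site would load a fixed key from
# its configuration; generating one at startup just keeps this sketch runnable.
SECRET_KEY = secrets.token_bytes(32)

def sign(child_account_id: str, parent_email: str) -> str:
    """HMAC-SHA256 signature tying a child's account to the parent's email."""
    message = f"{child_account_id}|{parent_email}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def make_consent_link(child_account_id: str, parent_email: str) -> str:
    """Build the approval link that would be emailed to the parent."""
    query = urlencode({
        "child": child_account_id,
        "parent": parent_email,
        "sig": sign(child_account_id, parent_email),
    })
    # example.com and /coppa/approve are placeholders, not a real endpoint
    return f"https://example.com/coppa/approve?{query}"

def verify_consent_link(link: str) -> bool:
    """True only if the clicked link was issued by this site and not altered."""
    params = parse_qs(urlparse(link).query)
    try:
        child = params["child"][0]
        parent = params["parent"][0]
        sig = params["sig"][0]
    except KeyError:
        return False  # a required parameter is missing
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(sig.encode("utf-8"), sign(child, parent).encode("utf-8"))
```

Note what this does and does not prove: a valid click shows only that someone who can read that inbox approved, not that the person is actually the parent, which is the same weakness the checkbox approach has. A production version would also want the link to expire.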

Network Eavesdropping

Like all telecommunications, the Internet holds a risk that your communication will be intercepted while en route between you and the intended counter-party, and the data that you assumed was private will be picked up by a third party, be that the government, a hacker, a neighbor, or an employer.

When you visit a website, the data packets that constitute your client request and the server’s response pass through a variety of network nodes on the way to get to their intended destination.

Wi-fi vs. Wired

The first hop in a typical home or office setup may be between your computer and a router.  If you are using a wireless router, your computer’s radio transmitter is broadcasting data to anyone within range, and that range can be quite large.  Even if you use an encrypted connection to your wireless router, your data is not necessarily safe: WEP encryption can be cracked by a hacker with very little skill using free software readily available online (AirSnort, Aircrack, WEPCrack, etc.), and WPA can be broken the same way if it is protected by a weak passphrase.  Once the encryption is broken, a packet sniffer such as Ethereal can capture everything you send.

With a wired connection to a router, a hacker would need to physically tap into the wires carrying your connection, which reduces the risk significantly.

Your Employer

If you use the Internet at work, your employer generally has the legal right to view emails you send using their email system.  They also have the right to track which websites you visit using their network.  Your employer may or may not choose to exercise that right.

Your employer no doubt knows your identity, so they can link your Internet usage to you personally without difficulty.

Your Internet Service Provider

At home or at work, you probably pay an Internet Service Provider (ISP) for Internet service.  When you stop paying for service, they cut it off.  If you download large numbers of illegal copies of movies, your ISP may send you a warning that you must stop or face the legal consequences.

They are able to do all this because all your Internet traffic passes through network nodes controlled by the ISP; they are the gateway for all your Internet data.  The ISP also assigns your connection an identifier called an IP address, so they know the traffic is yours and not someone else’s.  The ISP may be (and undoubtedly is, to some extent) analyzing your Internet usage.

To sign up for service, you supplied your name, address, phone number, credit card number, and other personally identifiable information to your ISP.  So they can tie your IP address, and thus your Internet usage, to your identity without difficulty.

Your IP address is not just seen by your ISP.  It is included in the header of every network packet you send, so every web server you make an HTTP request to can see it.  It is standard operating procedure for most sites to log the IP addresses of their visitors.
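To see how little effort IP logging takes on the site's side, here is a minimal Python sketch that groups requested pages by visitor IP address. It assumes log lines in Apache's combined log format, which begins with the client's IP address; the sample entries are invented and use IP addresses reserved for documentation.

```python
import re

# Matches the start of an Apache common/combined log line:
# client IP, identd, user, [timestamp], "METHOD /path PROTOCOL"
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*"')

def parse_line(line):
    """Return the IP, time, method and path of one log line, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    ip, timestamp, method, path = match.groups()
    return {"ip": ip, "time": timestamp, "method": method, "path": path}

def pages_by_ip(lines):
    """Group requested paths by visitor IP address."""
    visits = {}
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            visits.setdefault(entry["ip"], []).append(entry["path"])
    return visits

# Invented sample entries
SAMPLE_LOG = [
    '203.0.113.7 - - [01/May/2010:10:15:32 -0400] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.23 - - [01/May/2010:10:16:02 -0400] "GET /recipes/chili.html HTTP/1.1" 200 8830 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [01/May/2010:10:17:45 -0400] "GET /privacy.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for ip, paths in pages_by_ip(SAMPLE_LOG).items():
        print(ip, paths)
```

Combined with the ISP's records linking an IP address to an account, a log like this is enough to reconstruct which pages a particular person read.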

Email

When you send an email, both your email program and the recipient’s email program end up with copies of it.  If you both use email programs on your own computers, and the email does not pass through a webmail service, then the main risk to your privacy is interception of the email while it is in transit between your computer and the recipient’s computer, including while it passes through the mail servers that relay it.

If, however, either you or the recipient uses a webmail system, copies of your email reside on servers owned and operated by some other entity, such as Google, Yahoo, a university, an employer, etc.  So your email is only as private as the weaker of the two email service providers’ privacy policies.

Search & Single sign-on

If you use webmail services provided by companies that also offer search or other online services, logging in to your webmail account also logs you in to those other services.  So if you search while logged in, your search queries are tied to your email address, which is most likely tied to other aspects of your online and personal identity.
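One common way single sign-on works across services that share a parent domain is a login cookie scoped to that whole domain, which the browser then sends to every subdomain. The Python sketch below is a simplified model: the hosts mail.example.com and search.example.com and the cookie named session are hypothetical, and the domain check is a simplified version of the cookie domain-matching rule in RFC 6265. Providers spanning several unrelated domains use additional techniques, such as redirects through a central login server.

```python
from http.cookies import SimpleCookie

def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain matching: the host equals the cookie's
    domain or is a subdomain of it."""
    request_host = request_host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)

# Hypothetical login cookie set when the user signs in to webmail
login = SimpleCookie()
login["session"] = "user-42-token"
login["session"]["domain"] = ".example.com"  # scoped to the whole domain
login["session"]["path"] = "/"

def cookies_sent_to(host: str, jar: SimpleCookie) -> dict:
    """Return the cookies a browser would attach to a request for host."""
    return {
        name: morsel.value
        for name, morsel in jar.items()
        if domain_matches(host, morsel["domain"])
    }
```

With this cookie, a request to search.example.com carries the same session value as one to mail.example.com, so the search service knows exactly which mail account is searching.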

For example, if you use Gmail, your email account and the contents of all the emails in it are tied to the blogs you read in your Google Reader account, all the Google searches you’ve done while logged in (and probably some while logged out), and anything else you do with other Google services, such as Google Maps, Google Wave, Blogger, and YouTube.  The same goes for the ads you click on that are served by DoubleClick, and for any websites you operate that use Google Analytics to track usage or Google AdSense to serve advertising.  Because Google’s advertising reaches so much of the web, it most likely has a large data set about your behavior online.

If you’re wondering whether Google is obligated to keep any of this information private and to remove your personally identifiable information if they choose to analyze it, you should probably read their Privacy Policy.

Privacy Policies, as you have probably experienced, change frequently.  What is written there today may be gone tomorrow.  We have all received mail from our banks announcing changes to their policies.  Social networks, webmail applications, and other online services do the same: as their business and legal needs change, so do their Privacy Policies.  You may receive a letter in the mail, see a screen pop up on their website, or get some more subtle indication that something has changed.  Usually, you implicitly agree to the new terms simply by clicking a button or closing the window.

So just because a site promises to keep your data private today doesn’t mean that it will always be so.  How vigilant you want to be about these policies is up to you, obviously.

Social networking

It goes without saying that social networking sites collect personally identifiable data.  That is their primary business.  All actions you take on any major social networking site, such as Facebook, MySpace, LinkedIn, Twitter, and others, are logged for later analysis.  Whether these sites are obligated to keep these records private is governed by their Terms of Service and Privacy Policies, which almost nobody reads.

Facebook, in particular, has a strong advertising revenue model.  Like a mini-Google, they target ads directly to the individual viewer so the viewer is more likely to find any given ad relevant and click it.  This is done by profiling users, analyzing their likes and dislikes, and predicting what sorts of products and services they may be interested in.
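The profiling idea can be illustrated with a toy Python sketch: tally the interest categories of the pages a user has liked, then rank ads by how well their targets overlap that tally. Facebook's real system is not public and is surely far more sophisticated; the pages, categories, ads and scoring rule here are all invented for illustration.

```python
from collections import Counter

# Invented data: pages a user has liked, tagged with interest categories
LIKED_PAGES = {
    "Trail Runners Club": ["running", "outdoors"],
    "Camping Gear Deals": ["outdoors", "shopping"],
    "Marathon Training Tips": ["running", "fitness"],
}

# Invented ads, each aimed at a set of interest categories
ADS = {
    "running shoes": {"running", "fitness"},
    "tent sale": {"outdoors", "camping"},
    "car insurance": {"cars", "finance"},
}

def build_profile(liked_pages):
    """Count how often each interest category appears in the user's likes."""
    profile = Counter()
    for categories in liked_pages.values():
        profile.update(categories)
    return profile

def rank_ads(profile, ads):
    """Order ads by how strongly their targets overlap the user's profile."""
    # A Counter returns 0 for categories the user has never liked
    scores = {name: sum(profile[c] for c in targets) for name, targets in ads.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Here the running-shoe ad wins because two liked pages mention running and one mentions fitness; every new like sharpens the profile.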

Like other ad-based online services, they collect as much behavioral data as possible.  It is not especially paranoid to assume that they could be analyzing all links posted to profiles, the content of all messages sent between users, all items a user has clicked “Like” on, and all behavior gathered from third-party sites that integrate the Facebook API and the Facebook Social Plugins.

Facebook has just launched a major push to integrate its social networking features into third-party sites across the web.  This is mostly a repackaging of something they have been doing for a while now.  As Facebook content is integrated into more third-party sites, Facebook will have more data to tie to personal accounts and analyze for potential revenue streams.
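Why do embedded widgets produce data? When a page includes a widget from another site, the visitor's browser requests it from that site, typically sending along the provider's own cookies and a Referer header naming the page being read. This Python sketch models the provider's side under those assumptions; the cookie name user_id, the handler function and the example URLs are hypothetical, and browsers may trim or omit the Referer header depending on their settings.

```python
from collections import defaultdict
from http.cookies import SimpleCookie

# Hypothetical log kept by the widget provider: user -> pages where the widget loaded
views_by_user = defaultdict(list)

def handle_widget_request(headers: dict) -> None:
    """Record which page embedded the widget, keyed by the viewer's login cookie."""
    cookie = SimpleCookie(headers.get("Cookie", ""))
    user = cookie["user_id"].value if "user_id" in cookie else "anonymous"
    page = headers.get("Referer", "unknown page")
    views_by_user[user].append(page)

# Requests a logged-in user's browser might send while reading two unrelated sites
handle_widget_request({"Cookie": "user_id=alice", "Referer": "https://news.example.com/story"})
handle_widget_request({"Cookie": "user_id=alice", "Referer": "https://recipes.example.net/chili"})
```

The user never clicked anything: merely loading the two pages was enough for the provider to link both visits to the same account.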

Of course, sites like Facebook are wary of breaching their users’ trust: if users distrust the site, they will stop using it.  For this reason, if no other, they are unlikely to expose most of the data they collect.  However, there is no telling how they will operate in the future.  Again, it depends on the Privacy Policy and Terms of Service legalese that nobody reads.

If and when any of these sites begin to decline in popularity and users start to leave anyway, the sites may look for ways to monetize the information they have stored about user behavior and tendencies.  Perhaps this will include personally identifiable information, perhaps not.  Better read that Privacy Policy.


Class 11 – HTTP Basic Authentication using .htaccess files

April 27th, 2010

Overview

HTTP is the protocol which web browsers and web servers use to communicate via client requests and server responses, respectively.  We’ve seen that the browser uses HTTP GET and POST methods to request data from the server.

HTTP also provides a very basic level of authentication that you can use to password-protect your sites or certain folders within them.  Apache servers, such as our class server, make it possible to use this authentication system simply by adding a few special configuration lines to a file called .htaccess.

We have previously used .htaccess files for rewriting URLs to create Fancy URLs.  The .htaccess file is a directory-specific configuration file – it can hold a variety of server settings that apply only to the folder in which you place it.  This post is about one such setting.

Password-protecting a folder

To password protect a specific folder, we will create two files: one named .htaccess and another named .htpasswd.

.htaccess holds the server instructions indicating that the folder should be password protected.  This file gets placed in the folder which you want to password protect.

.htpasswd holds the username/password combinations of users who are allowed to view the folder.  The passwords are stored in encrypted (more precisely, hashed) form.  This file gets placed somewhere on the server where it is not accessible from the web – you don’t want people loading this file directly in their web browsers.

The .htaccess file

The .htaccess file contains the following code.

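AuthUserFile <the server path to the folder where your .htpasswd file will live>/.htpasswd
AuthType Basic
# The text shown in the browser's login dialog (quoted because it contains a space)
AuthName "Enter Password"
# Only users listed in the .htpasswd file may view this folder
Require valid-user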

Replace <the server path to the folder where your .htpasswd file will live> with the path to your own .htpasswd file.  Ideally this will be somewhere outside the web root of the server.  On the class server, the web root is the folder /home/scps/onepotcooking.com/.

As an aside, calling this the “web root” means that when a user goes to http://onepotcooking.com in their browser, they will by default see the files in the folder /home/scps/onepotcooking.com.  The root of the server’s file system, by contrast, is /, the very topmost folder on the server.

So, if your name is George Washington, perhaps put your .htpasswd file at

/home/scps/passwords/georgewashington/.htpasswd

so it is outside of the web root, yet still somewhere you might be able to find it again if you ever went looking.
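The web-root rule can be sketched in Python: a simple static file server maps the path part of a URL onto a file under the web root, so a file outside that folder has no URL at all. This is a simplified model that assumes no aliases, rewrite rules or percent-encoded paths (real Apache configurations can map URLs elsewhere); the function names are hypothetical, and the paths are the ones from the class server example.

```python
import posixpath

WEB_ROOT = "/home/scps/onepotcooking.com"

def url_path_to_file(url_path: str, web_root: str = WEB_ROOT) -> str:
    """Map the path part of a URL onto a file under the web root."""
    candidate = posixpath.normpath(posixpath.join(web_root, url_path.lstrip("/")))
    # Refuse paths such as "/../passwords" that would climb out of the web root
    if posixpath.commonpath([candidate, web_root]) != web_root:
        raise ValueError(f"{url_path!r} escapes the web root")
    return candidate

def is_web_accessible(file_path: str, web_root: str = WEB_ROOT) -> bool:
    """True if the file sits inside the web root, i.e. some URL could reach it."""
    file_path = posixpath.normpath(file_path)
    return posixpath.commonpath([file_path, web_root]) == web_root
```

Under this model, /home/scps/passwords/georgewashington/.htpasswd is not web accessible, while anything under /home/scps/onepotcooking.com is.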

The .htpasswd file

The .htpasswd file contains one line in the following format for each user who has access to the protected folder:

<username>:<encrypted password>

Replace <username> with the username of the user you want to give access.  And replace <encrypted password> with an encrypted password for that user.

How do you get an encrypted password?  You can use one of the many websites that encrypt .htpasswd passwords for free, or the htpasswd command-line utility that comes with Apache.
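You can also generate the entry yourself, which avoids typing your password into someone else's website. The example below (pnzpsMNdWW6aw) looks like a classic crypt() hash; Python's crypt module is deprecated and was removed in Python 3.13, so this minimal sketch instead uses Apache's {SHA} format, the Base64-encoded SHA-1 digest of the password, which is what htpasswd -s produces. The {SHA} format is unsalted and weak; where available, htpasswd -B (bcrypt) is a better choice. The function name is hypothetical.

```python
import base64
import hashlib

def htpasswd_sha_line(username: str, password: str) -> str:
    """Return a .htpasswd line in Apache's {SHA} format:
    username, a colon, then {SHA} + Base64(SHA-1(password))."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return f"{username}:{{SHA}}{base64.b64encode(digest).decode('ascii')}"

print(htpasswd_sha_line("scps", "password"))
# scps:{SHA}W6ph5Mm5Pz8GgiULbPgzG37mj9g=
```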

So, for example, if your username is “scps” and your encrypted password is “pnzpsMNdWW6aw”, you will put the following line in your .htpasswd file:

scps:pnzpsMNdWW6aw

And you will save this .htpasswd file into the folder that you indicated in the first line of your .htaccess file.

An example

See an example here.  The username is our standard username, and the password is our standard password minus the last character.  You’ll notice that I have been naughty and put the .htpasswd file in the same folder as the .htaccess file.  On a real site you shouldn’t put it anywhere where a web browser can find it.

How it works

Here’s an overview of the steps that are happening behind the scenes to make this system work:

  1. Your client (most likely your web browser) makes a standard HTTP GET request for a password-protected area of the server.
  2. The server looks for any .htaccess file in the requested folder (and in its parent folders).
  3. The server reads the .htaccess file and sees that the requested file or folder should be password protected.
  4. The server responds to the client with an HTTP 401 Unauthorized response code, along with a WWW-Authenticate header indicating that the requested file is password protected.
  5. The browser is built to know what to do with this response code: it pops up a dialog asking the user for a username and password.
  6. The user fills in the username and password and clicks submit.
  7. The client sends another HTTP GET request to the server, but this time includes the login credentials in an Authorization header.  The username and password are only Base64-encoded, not encrypted.
  8. The server again looks at the .htaccess file and sees that the requested file or folder is password protected, but this time notices that the client included the necessary login credentials along with the request.
  9. The server responds to the client with the requested page.
  10. The client remembers the login credentials the user entered (usually until the browser is closed) so that the next time a page in that area is requested, it doesn’t have to ask the user to enter them again.  The client just sends them to the server in the HTTP headers automatically.
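The exchange above can be sketched from both sides in Python. The key detail is that Basic authentication only Base64-encodes the username and password, which anyone who intercepts the request can decode, so it should be used over HTTPS. This is a minimal model, not how Apache is implemented: the in-memory password table (using the {SHA} format Apache accepts in .htpasswd files), the realm name and the function names are hypothetical.

```python
import base64
import hashlib
import hmac

def sha_entry(password: str) -> str:
    """Password hash in Apache's {SHA} .htpasswd format."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return "{SHA}" + base64.b64encode(digest).decode("ascii")

# Hypothetical contents of a .htpasswd file, as username -> stored hash
HTPASSWD = {"scps": sha_entry("password")}

def make_authorization_header(username: str, password: str) -> str:
    """Step 7, client side: 'Basic ' + Base64('username:password')."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

def handle_request(headers: dict) -> tuple:
    """Server side: return (status code, response headers, body)."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Basic "):
        try:
            decoded = base64.b64decode(auth[len("Basic "):], validate=True).decode("utf-8")
        except ValueError:  # covers both bad Base64 and invalid UTF-8
            decoded = ""
        username, _, password = decoded.partition(":")
        stored = HTPASSWD.get(username)
        # Steps 8-9: the credentials match, so serve the page
        if stored is not None and hmac.compare_digest(stored, sha_entry(password)):
            return 200, {}, "<h1>Members only</h1>"
    # Step 4: no valid credentials, so tell the browser to prompt the user
    return 401, {"WWW-Authenticate": 'Basic realm="Restricted"'}, "Unauthorized"
```

For instance, make_authorization_header("Aladdin", "open sesame") returns "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==", the example used in the HTTP Basic authentication specification; decoding it back takes one line.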

Class 11 – Intro to Security on the Web

May 1st, 2009

Security risks on the web fall into 3 general categories:

  1. Server-side risks
  2. Client-side risks
  3. Network eavesdropping

Server-side risks

Every web server is a security risk.  When you publish a website, you are letting anyone in the world connect to your server to access your files, run scripts, upload files, and run queries on and store data in your database.  The more complicated your setup, both in terms of the server setup and your code, the more likely you are to have bugs, which in turn makes it more likely you have holes in your security.  This is true not only of the code you write, but also of all the products you use to help make your website work.  Common risks include the theft of confidential information and the installation of malicious scripts on your servers.

A common example of what hackers will do once they compromise servers is a distributed denial of service (DDoS) attack.  Hackers gain access to many insecure servers and install scripts that do nothing but make requests to a particular web server.  With thousands of these scripts running concurrently on many compromised machines, a setup known as a botnet, hackers can easily generate so much traffic that the target web server is brought to its knees and cannot respond to all the requests.  This happens all the time to the most popular sites.  Many sites run software that detects attempted DDoS attacks and blocks requests from machines that appear to be taking part in one.
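The blocking mechanism can be as simple as a per-IP rate limit. This Python sketch assumes a sliding window (at most limit requests per IP in any window of seconds); the class and parameter names are hypothetical. It also shows why botnets are effective: thousands of machines each sending a modest number of requests never trip a per-IP limit, so real DDoS defenses work at a much larger scale than this.

```python
import time
from collections import defaultdict, deque
from typing import Optional

class RateLimiter:
    """Allow at most `limit` requests per client IP within any `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.history = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        recent = self.history[ip]
        # Forget requests that have slid out of the window
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # too many recent requests: block this one
        recent.append(now)
        return True
```

Blocked requests are not recorded, so a client that backs off regains access once its earlier requests age out of the window.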

Another common attack is SQL injection, in which hackers try to gain access to your database by submitting input that contains SQL code.  If you are not careful, they can easily steal private information, such as credit card numbers, this way.  This is the primary reason why you should ALWAYS sanitize user input before using it in queries to the database.  Make sure what the user has submitted does not contain any unexpected code, and that it is of the type you expected (e.g. if you expect a phone number, make sure the user actually entered a phone number).
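Here is a Python sketch using the standard library's sqlite3 module and an in-memory database with fake customer data. It shows an injection succeeding when input is pasted into the SQL text, and the commonly recommended fix: validate the input's type, then pass it as a query parameter so the database never treats it as SQL code. The table, the functions and the phone-number pattern (NNN-NNN-NNNN) are assumptions made for illustration.

```python
import re
import sqlite3

def setup():
    """Create an in-memory database holding fake customer records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, phone TEXT, card TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [("Alice", "555-123-4567", "4111111111111111"),
         ("Bob", "555-987-6543", "5500000000000004")],
    )
    return conn

def find_by_phone_unsafe(conn, phone):
    # DANGEROUS: user input is pasted straight into the SQL text
    return conn.execute(f"SELECT name, card FROM customers WHERE phone = '{phone}'").fetchall()

PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")

def find_by_phone_safe(conn, phone):
    # 1. Validate: reject anything that is not shaped like a phone number
    if not PHONE_PATTERN.fullmatch(phone):
        raise ValueError("not a phone number")
    # 2. Parameterize: the driver passes phone as data, never as SQL code
    return conn.execute("SELECT name, card FROM customers WHERE phone = ?", (phone,)).fetchall()
```

Given the input ' OR '1'='1, the unsafe version builds WHERE phone = '' OR '1'='1' and returns every customer's card number; the safe version rejects it outright.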

Client-side risks

Attackers may also target the client in a variety of ways. Each web browser runs as an application on your local client machine. This means the browser software has access to your file system and everything on it. Since the information that the browser uses to display content from the web is usually coming from servers on the web, there’s a chance that a hacker will be able to use a server to send instructions to your browser that may install malicious software, or force the client to do things like upload personal information to the hacker’s server.

Anti-virus software is a must on both PC and Mac to keep malware from running on your computer.  Because the web is a high-risk environment, the major web browsers and email clients are thoroughly tested, but no software is perfectly secure: all of them issue security updates from time to time to fix security problems found in their software.

Certain web technologies, such as Java applets, ActiveX, Silverlight, Flash, and Adobe PDF, are not natively supported by most web browsers.  They must be run by separate plug-ins rather than by the web browser itself (even though they show up in the browser window), so these technologies carry their own security risks that their developers must constantly mitigate.  Like browsers, these technologies are so commonly used that security holes are usually discovered quickly, and updates are released that patch them.  But bugs do exist, and hackers are always trying to find new ones.  Search Google for “flash vulnerabilities” and you will see examples of exploits that hackers have created using Flash.

Phishing scams are another major client-side risk that you should be aware of.  Scammers could create a website, for example, that looks exactly like Amazon.com’s checkout page but is actually run by hackers in Nigeria.  If for some reason you find yourself on this site thinking it is Amazon.com, you may enter your credit card information, which the hackers then use to buy gifts for themselves (or other more nefarious things).  Phishing scams are also commonly used for identity theft: the phishing sites trick users into revealing personal information, which is then used to apply for credit cards, obtain passports, buy weapons, etc.

Most web browsers and email clients (e.g. Microsoft Outlook, Mozilla Thunderbird, Apple Mail, etc.), as well as client security programs (e.g. Norton AntiVirus), have ways of trying to identify phishing scams.  But hackers are constantly figuring out new ways to bypass or compromise every new tool that developers create, so all of this software should be updated regularly to keep it secure.
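One thing these tools check is whether a link's real host name matches the site it claims to be. This Python sketch is a deliberately naive version of that check (real phishing filters also rely on blocklists, reputation data and much more); the function name and the lookalike URLs are hypothetical.

```python
from urllib.parse import urlsplit

def is_really_from(url: str, trusted_domain: str) -> bool:
    """True only if the URL's host is trusted_domain or one of its subdomains."""
    # .hostname drops any "user@" prefix and port, leaving the real destination
    host = (urlsplit(url).hostname or "").lower()
    trusted_domain = trusted_domain.lower()
    return host == trusted_domain or host.endswith("." + trusted_domain)
```

A URL such as https://amazon.com.secure-checkout.example/login starts with "amazon.com" but actually points at secure-checkout.example, and in https://www.amazon.com@203.0.113.5/checkout the real host is 203.0.113.5; both fail the check.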

Network eavesdropping

Any time a client communicates with a server, the data is physically transmitted either via electric current in a wire or via radio waves in the air. There are ways hackers can intercept either of these means of communication.

Wireless communication is notoriously insecure.  Anyone with a wifi card in their laptop can easily intercept unencrypted data passed between a wireless router and other laptops.  So some people encrypt the data passed between the two.  The thinking goes that even if someone intercepts the signal, they won’t be able to understand it, since it’s encrypted.  However, WEP, the most commonly used encryption protocol available on wireless routers, is known to be very weak.  WPA2 with a strong passphrase is considerably more secure, if it is available on your router.  Another way to secure your wireless network is to set up your wireless router to accept connections only from computers with particular MAC addresses.  Each network card has a MAC address assigned by its manufacturer, but an attacker can easily spoof an allowed address, so MAC filtering only keeps out casual intruders.  Most new routers will have all of these options.

Wired communication, via Ethernet cable or other types of wire, can also be intercepted by someone who plugs into the same network as either the client or the server.  Since communication between client and server travels over wires that are also used by other clients and servers, it’s not crazy to imagine that someone could find a way to intercept and listen in on your conversation.

Like wireless communication, there are methods of encrypting communication over the wires so that even if someone does intercept communications, they won’t be able to easily decipher them.

Many web servers, especially for e-commerce sites, are called “secure servers”. Secure servers use the HTTPS protocol instead of the regular HTTP, so the URL will look like https://something.com, for example. Often, the checkout pages of online stores, or any page that asks the user to enter confidential information will be hosted on a secure server.

HTTPS encrypts the communication between the client and the server using the SSL encryption protocol (or its successor, TLS).  So a “secure server” is actually just encrypting the network communication between client and server, not protecting the server itself against attacks.  The server and the client still have the same security risks as any other client or server.  As with all encryption methods, SSL (and thereby HTTPS) can be attacked; a common exploit is the man-in-the-middle attack, in which an attacker secretly sits between the client and the server and relays, reads, or alters their traffic.
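On the client side, the main defense against a man-in-the-middle attack is checking the server's certificate: an attacker who intercepts the connection should not be able to present a valid certificate for the real site's name unless they control a certificate authority the client trusts. This Python sketch uses the standard library's ssl module, whose default client context turns on both certificate verification and host name checking; the function names are hypothetical, and fetch_homepage needs network access to run.

```python
import socket
import ssl

def make_client_context() -> ssl.SSLContext:
    """A TLS client context that verifies the server's certificate and host name."""
    return ssl.create_default_context()

def fetch_homepage(host: str) -> bytes:
    """Open an HTTPS connection to host and return the first chunk of the response."""
    context = make_client_context()
    with socket.create_connection((host, 443), timeout=10) as raw_sock:
        # wrap_socket performs the TLS handshake and raises
        # ssl.SSLCertVerificationError if the certificate is invalid
        # or was not issued for host
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls_sock.sendall(request.encode("ascii"))
            return tls_sock.recv(4096)
```

Code that "fixes" certificate errors by setting check_hostname to False and verify_mode to ssl.CERT_NONE switches these checks off and reopens the door to exactly this attack.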

