Class 11 – Intro to Privacy on the Web

Posted: May 1st, 2010 | Filed under: facebook, privacy, security

Despite a very vocal minority of concerned citizens, privacy does not seem to be anywhere near as big an issue in the news as it could potentially be.

You should assume that just about everything you do online can be tracked and traced, if someone were to put the effort into doing so.  And some people are putting in that effort.

Children’s Privacy & COPPA Compliance

A topic that has received some attention is children’s privacy.  The Children’s Online Privacy Protection Act of 1998 (COPPA) defines a set of compliance guidelines for sites that collect personal information from children under the age of 13.

The act itself is a short read.  In summary, it requires websites that handle children’s information to make reasonable efforts to obtain verifiable parental consent before collecting or storing any personally identifiable information from children or communicating directly with them.  Parents must also be allowed to request a copy of all the information the site has stored about their children, to have that data deleted, and to stop any further collection.  The website must also disclose how it uses the information: for direct marketing, for awarding prizes in contests, for sharing with third parties, and so on.
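The three parental rights summarized above (get a copy, delete, stop further collection) map naturally onto operations a site’s data store has to support.  The sketch below is a hypothetical Python illustration, assuming a simple in-memory store; the class and method names are invented here and do not come from COPPA or any compliance library, and it does not cover consent or disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ChildRecordStore:
    """Hypothetical store supporting the parental rights COPPA describes."""
    records: Dict[str, Dict[str, str]] = field(default_factory=dict)
    blocked: Set[str] = field(default_factory=set)  # children whose parents opted out

    def collect(self, child_id: str, key: str, value: str) -> bool:
        # Refuse to store anything once a parent has withdrawn permission.
        if child_id in self.blocked:
            return False
        self.records.setdefault(child_id, {})[key] = value
        return True

    def export_for_parent(self, child_id: str) -> Dict[str, str]:
        # A copy of everything held about the child, for the parent to review.
        return dict(self.records.get(child_id, {}))

    def delete_and_block(self, child_id: str) -> None:
        # Erase stored data and refuse further collection.
        self.records.pop(child_id, None)
        self.blocked.add(child_id)


store = ChildRecordStore()
store.collect("kid42", "nickname", "Sam")
print(store.export_for_parent("kid42"))                # {'nickname': 'Sam'}
store.delete_and_block("kid42")
print(store.collect("kid42", "school", "Elm School"))  # False
print(store.export_for_parent("kid42"))                # {}
```

Keeping the "blocked" set separate from the records matters: deleting the data alone would let the next form submission quietly start collecting again.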

In practice, parental consent is often obtained with a checkbox on the page that anyone, not just the parent, could easily click.  Sometimes the parent’s email address is required to register with the site, and the site sends the parent an email with a link to approve the collection of information about their child.  In general, the burden falls on the website operator to make a good-faith effort to comply with COPPA, and if a site runs into legal problems, it is evaluated on a case-by-case basis.
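The email-with-a-link approach can be sketched as a one-time-token flow.  This is a minimal illustration under stated assumptions: the class name, the example.com URL, and the in-memory storage are all hypothetical, and a real site would actually send the email rather than return the link.

```python
import hmac
import secrets
from typing import Dict, Set, Tuple


class ConsentRequests:
    """Hypothetical email-link parental consent flow (illustration only)."""

    def __init__(self) -> None:
        self._pending: Dict[str, Tuple[str, str]] = {}  # child_id -> (parent_email, token)
        self.approved: Set[str] = set()

    def request(self, child_id: str, parent_email: str) -> str:
        token = secrets.token_urlsafe(16)  # unguessable, single-use
        self._pending[child_id] = (parent_email, token)
        # A real site would email this link to parent_email; here it is returned.
        return f"https://example.com/consent?child={child_id}&token={token}"

    def approve(self, child_id: str, token: str) -> bool:
        entry = self._pending.get(child_id)
        # compare_digest avoids leaking how much of the token matched via timing.
        if entry is None or not hmac.compare_digest(entry[1], token):
            return False
        del self._pending[child_id]  # the link works only once
        self.approved.add(child_id)
        return True


consent = ConsentRequests()
link = consent.request("kid42", "parent@example.com")
token = link.split("token=")[1]
print(consent.approve("kid42", "guess"))  # False
print(consent.approve("kid42", token))    # True
print(consent.approve("kid42", token))    # False: already used
```

A successful click only proves that whoever clicked can read that inbox.  Like the checkbox, it cannot prove the clicker is actually the parent.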

Network Eavesdropping

Like all telecommunications, the Internet carries a risk that your communication will be intercepted in transit between you and the intended recipient, and that data you assumed was private will be picked up by a third party, whether that is the government, a hacker, a neighbor, or an employer.

When you visit a website, the data packets that make up your request and the server’s response pass through a variety of network nodes on the way to their destination.
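To make this concrete: unless the connection is encrypted, a request is just bytes, and every node that forwards it can read them.  The sketch below builds a minimal plain-HTTP form submission in Python; the host, the /login path, and the field names are hypothetical, and real browsers add more headers, but the form fields travel the same way.

```python
from urllib.parse import urlencode


def build_login_request(host: str, username: str, password: str) -> bytes:
    """Minimal plain-HTTP form POST, as it would travel over the network."""
    body = urlencode({"username": username, "password": password})
    head = (
        "POST /login HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return (head + body).encode("ascii")


print(build_login_request("example.com", "alice", "hunter2").decode("ascii"))
```

Any router or ISP node along the route, or anyone sniffing the same Wi-Fi network, could read `username=alice&password=hunter2` straight out of these bytes.  HTTPS wraps the same request in TLS, so intermediate nodes see encrypted data, though they can still see which server you are talking to.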

Wi-Fi vs. Wired

The first hop in a typical home or office setup is often between your computer and a router.  If you are using a wireless router, your traffic is broadcast over the air to anyone within range, and that range can be quite large.  Encrypting the wireless connection helps less than you might expect: WEP can be cracked quickly by someone with very little skill using free software readily available online (AirSnort, Aircrack, WEPCrack), and WPA with a weak passphrase is vulnerable to dictionary attacks.  Once an attacker is on the network, a packet sniffer such as Ethereal (now called Wireshark) can capture your traffic.

With a wired connection to a router, an attacker would need physical access to the cables carrying your connection in order to tap them, which reduces the risk significantly.

Your Employer

If you use the Internet at work, your employer generally has the legal right to view emails you send using their email system.  They also have the right to track which websites you visit using their network.  Your employer may or may not choose to exercise that right.

Your employer no doubt knows your identity, so they can easily link your Internet usage to you personally.

Your Internet Service Provider

At home, you probably pay an Internet Service Provider (ISP) for Internet access; at work, your employer does.  When you stop paying for service, they cut it off.  If you download large numbers of illegal copies of movies, your ISP may send you a warning that you must stop or face the legal consequences.

They are able to do all this because all your Internet traffic passes through network nodes controlled by the ISP; they are the gateway through which all your Internet data flows.  The ISP also assigns your connection an IP address, which identifies it on the network, so they know the traffic came from your account and not someone else’s.  The ISP may be (and undoubtedly is, to some extent) analyzing your Internet usage.

When you signed up for service, you supplied your name, address, phone number, credit card number, and other personally identifiable information to set up your account.  So the ISP can tie your IP address, and thus your Internet usage, to your identity without difficulty.
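Residential ISPs often assign addresses dynamically, so the same IP address can belong to different customers at different times.  What ties an address back to a person is the ISP’s record of which account held which address and when.  The sketch below models that lookup; the record layout, names, and addresses (from the 203.0.113.0/24 documentation range) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Lease:
    account: str       # identity from the sign-up form (hypothetical data)
    ip: str            # address the ISP assigned to that connection
    start: datetime
    end: datetime


def who_had(ip: str, when: datetime, leases: List[Lease]) -> Optional[str]:
    """Return the account that held `ip` at time `when`, or None."""
    for lease in leases:
        if lease.ip == ip and lease.start <= when < lease.end:
            return lease.account
    return None


leases = [
    Lease("Jane Doe, 12 Elm St.", "203.0.113.7",
          datetime(2010, 5, 1), datetime(2010, 5, 2)),
    Lease("John Roe, 9 Oak Ave.", "203.0.113.7",
          datetime(2010, 5, 2), datetime(2010, 5, 3)),
]
print(who_had("203.0.113.7", datetime(2010, 5, 1, 21, 30), leases))
# Jane Doe, 12 Elm St.
```

Note that this identifies the account holder, not necessarily the individual at the keyboard: everyone on a shared home connection typically appears under the same address.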

Your IP address is not seen only by your ISP.  It travels as the source address in the header of every packet you send, so every web server you make an HTTP request to sees it.  It is standard operating procedure for most sites to log the IP addresses of their visitors.
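Here is a small demonstration of how little effort logging visitors takes.  This minimal sketch uses Python’s standard `http.server` module; the `VISITOR_LOG` list and the `/about` path are hypothetical stand-ins for a real site’s access log and pages.  The server learns the address from the connection itself, not from anything the browser chooses to send.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List

VISITOR_LOG: List[str] = []  # hypothetical stand-in for an access log


class LoggingHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # client_address is (ip, port) of the TCP connection; the server learns
        # it from the packets themselves, not from anything the browser sends.
        VISITOR_LOG.append(f"{self.client_address[0]} GET {self.path}")
        body = b"hello\n"
        self.send_response(200)  # also writes a log line with the IP to stderr
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def demo() -> str:
    """Start the server on a free local port, make one request, return the log entry."""
    server = HTTPServer(("127.0.0.1", 0), LoggingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    # An empty ProxyHandler ensures the request goes straight to the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    opener.open(f"http://127.0.0.1:{port}/about").read()
    server.shutdown()
    server.server_close()
    return VISITOR_LOG[-1]


print(demo())  # 127.0.0.1 GET /about
```

Even without the explicit list, `BaseHTTPRequestHandler` writes the client address to stderr for every request by default, which mirrors how most web servers keep an access log out of the box.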

Email

When you send an email, your email program and the recipient’s email program both have copies of that email.  If you are both using email programs on your own computers, and the email does not pass through a webmail service, then the main risk to your privacy involves potential interception of that email while it is in transit between your computer and the recipient’s computer.

If, however, either you or the recipient uses a webmail system, those copies of your email reside on servers owned and operated by some other entity, such as Google, Yahoo, a university, an employer, etc.  So your email is only as private as the weaker of the two privacy policies: yours or your recipient’s email provider’s.

Search & Single sign-on

If your webmail provider also offers other services, such as search, logging in to your webmail account also logs you in to those other services.  So if you search while logged in, your search queries are tied to your email address, which is most likely tied to other aspects of your online and personal identities.
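One way this kind of single sign-on can work is that every service checks the same session: after you log in to mail, your browser presents the same session cookie (for example, one scoped to a shared parent domain) to the search service too.  The toy model below is hypothetical; the names `SESSIONS`, `log_in`, and `search` are invented, and no real provider’s internals are implied.  It only shows how a query switches from anonymous to attributed once a session exists.

```python
import secrets
from typing import Dict, List, Optional, Tuple

# Hypothetical shared account system: every service checks the same session
# store, so one login covers mail, search, and everything else.
SESSIONS: Dict[str, str] = {}           # session token -> account email
SEARCH_LOG: List[Tuple[str, str]] = []  # (who, query) pairs kept by the search service


def log_in(email: str) -> str:
    """Called when you sign in to webmail; the token ends up in a browser cookie."""
    token = secrets.token_hex(16)
    SESSIONS[token] = email
    return token


def search(query: str, session: Optional[str] = None) -> None:
    """Search service: a valid session cookie attaches the query to the account."""
    user = SESSIONS.get(session, "anonymous") if session else "anonymous"
    SEARCH_LOG.append((user, query))


search("cheap flights")                # before logging in
token = log_in("alice@example.com")    # open webmail
search("flu symptoms", session=token)  # later search in the same browser
print(SEARCH_LOG)
# [('anonymous', 'cheap flights'), ('alice@example.com', 'flu symptoms')]
```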

For example, if you use Gmail, your email account and the contents of all the emails in it are tied to the blogs you read in your Google Reader account, all Google searches you’ve done while logged in (and probably some while logged out), and your behavior on any other Google service, such as Google Maps, Google Wave, Blogger, or YouTube.  The same goes for the ads you click that are served by DoubleClick, and for any websites you operate that use Google Analytics to track usage or Google AdSense to serve advertising.  Because its advertising reaches so much of the web, Google most likely has a large data set about your online behavior.
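Part of that reach comes from code embedded on other people’s sites.  When the same company’s ad or analytics script appears on many unrelated pages, your browser contacts that company’s server from each of them, typically sending the company’s own cookie and a Referer header naming the page you are on.  The sketch below is a hypothetical model of what such a third-party server could assemble; it does not depict Google’s actual systems, and the cookie ID and site names are invented.

```python
from collections import defaultdict
from typing import DefaultDict, List
from urllib.parse import urlparse


class ThirdPartyTracker:
    """Hypothetical ad/analytics server whose code is embedded on many sites."""

    def __init__(self) -> None:
        # tracker cookie ID -> list of sites where that browser was seen
        self.profiles: DefaultDict[str, List[str]] = defaultdict(list)

    def handle_embed_request(self, cookie_id: str, referer: str) -> None:
        # The browser fetches the embedded script or ad from the tracker,
        # sending the tracker's own cookie and (usually) a Referer header
        # that names the page being viewed.
        self.profiles[cookie_id].append(urlparse(referer).netloc)


tracker = ThirdPartyTracker()
for page in ["http://news.example/politics",
             "http://shop.example/shoes",
             "http://health.example/diabetes"]:
    tracker.handle_embed_request("uid-123", page)
print(tracker.profiles["uid-123"])
# ['news.example', 'shop.example', 'health.example']
```

None of the three sites shared anything with each other, yet the tracker ends up with a cross-site browsing history for one browser.  Linking that cookie to a logged-in account is then a single join.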

If you’re wondering whether Google is obligated to keep any of this information private and to remove your personally identifiable information if they choose to analyze it, you should probably read their Privacy Policy.

Privacy Policies, as you have probably experienced, change frequently.  What is written there today may be gone tomorrow.  We have all received mail from our banks announcing changes to their policies.  Social networks, webmail applications, and other online services do the same.  As their business and legal needs change, so do their Privacy Policies.  You may receive a letter in the mail about the change, see a screen pop up on their website, or notice some more subtle indicator that something has changed.  Usually, you implicitly agree to the new terms by clicking a button or simply closing the window.

So just because a site promises to keep your data private today doesn’t mean that it will always be so.  How vigilant you want to be about these policies is up to you, obviously.
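If you do want to keep an eye on a policy, one low-effort approach is to save a copy of the text whenever you agree to it and compare versions when it changes.  A minimal sketch using Python’s standard `difflib` module; the policy snippets are invented for illustration.

```python
import difflib
from typing import List


def policy_changes(old: str, new: str) -> List[str]:
    """List lines removed ('- ') from or added ('+ ') to a saved policy."""
    return [line for line in difflib.ndiff(old.splitlines(), new.splitlines())
            if line.startswith(("- ", "+ "))]


old = "We do not share your data.\nYou may delete your account."
new = "We may share your data with partners.\nYou may delete your account."
for line in policy_changes(old, new):
    print(line)
# - We do not share your data.
# + We may share your data with partners.
```

Filtering out unchanged lines matters for real policies, which run to pages of legalese in which a one-sentence change is easy to miss.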

Social networking

It goes without saying that social networking sites collect personally identifiable data.  That is their primary business.  All actions you take on any major social networking site, such as Facebook, MySpace, LinkedIn, Twitter, and others are logged for later analysis.  Whether or not these sites are obligated to maintain the privacy of these records is of course regulated by their Terms of Service and Privacy Policies, which almost nobody reads.

Facebook, in particular, has a strong advertising revenue model.  Like a mini-Google, it targets ads at the individual viewer so the viewer is more likely to find any given ad relevant and click it.  This is done by profiling users, analyzing their likes and dislikes, and predicting what sorts of products and services they may be interested in.
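The core idea of profiling for ad targeting can be shown with a toy scoring rule: count what a user has liked, then pick the ad whose keywords best overlap those interests.  This is a deliberately simplified, hypothetical sketch; real ad systems are far more sophisticated, Facebook’s actual method is not described here, and the topics and ads below are invented.

```python
from collections import Counter
from typing import Dict, List


def pick_ad(liked_topics: List[str], ads: Dict[str, List[str]]) -> str:
    """Choose the ad whose keywords overlap most with the user's liked topics."""
    interests = Counter(liked_topics)  # topic -> how often the user liked it

    def score(ad: str) -> int:
        return sum(interests[keyword] for keyword in ads[ad])

    return max(ads, key=score)  # ties go to the first ad listed


likes = ["running", "running", "travel", "coffee"]
ads = {
    "Marathon shoes": ["running", "fitness"],
    "Cheap flights": ["travel"],
    "Office chairs": ["furniture"],
}
print(pick_ad(likes, ads))  # Marathon shoes
```

Every additional "Like" or posted link sharpens the counts, which is why an ad-supported site has an incentive to collect as much behavioral data as it can.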

Like other ad-based online services, they collect as much behavioral data as possible.  It is not especially paranoid to assume that they could be analyzing all links posted to profiles, the content of all messages sent between users, all posts that a user has clicked to indicate they “Like” them, and all behavior gathered from third-party sites that integrate the Facebook API and the Facebook Social Plugins.

Facebook has just launched a major push to integrate its social networking features into third-party sites across the web, although this is mostly a repackaging of something it has been doing for a while now.  As Facebook content is integrated into more third-party sites, Facebook will have more data to tie to personal accounts and analyze for potential revenue streams.

Of course, sites like Facebook are wary of breaching the trust of their users.  If users distrust the site, they will no longer use it.  For this reason, if none other, they are unlikely to expose most of the data they collect.  However, who knows how they will operate in the future.  Again, it depends on the Privacy Policy and Terms of Service legalese that nobody reads.

If and when any of these sites begin to decline in popularity and users start to leave anyway, social networking sites may look for ways to monetize the information they have stored about user behavior and tendencies.  Perhaps this will include personally identifiable information, perhaps not.  Better read that Privacy Policy.
