Class 11 – Intro to Privacy on the Web

May 1st, 2010

Despite a very vocal minority of concerned citizens, privacy does not seem to be anywhere near as big an issue in the news as it could potentially be.

You should assume that just about everything you do online can be tracked and traced, if someone were to put the effort into doing so.  And some people are putting in that effort.

Children’s Privacy & COPPA Compliance

A topic that has received some attention is children’s privacy.  The Children’s Online Privacy Protection Act of 1998 (COPPA) defines a set of compliance guidelines for sites that collect personal information from children under the age of 13.

The act itself is a short read.  In summary, it declares that websites dealing with children’s information must do their best to obtain parental consent before storing any personally identifiable information or communicating directly with children.  Parents must also be allowed to request a copy of all the information the site has stored about their children, and to request that the data be deleted and that no further data be collected.  The website must disclose how it uses that information: for direct marketing, for awarding prizes in competitions, for sharing with third parties, and so on.

In practice, parental consent is often obtained through a checkbox on the page, which could easily be clicked by someone other than the parent.  Sometimes the parent’s email address is required in order to register with the site, and an email is sent to the parent with a link to approve the collection of information about their child.  In general, the burden falls on the website operator to do their best to comply with COPPA.  A site that runs into legal problems is evaluated on a case-by-case basis.
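One way the email approval step could work is with a random, single-use token embedded in the link.  The following is only a sketch: the function names and the URL are hypothetical, and a real site would also store the token alongside the child’s pending account and expire it after use.

```php
<?php
// Hypothetical helper: create a one-time token for a parental approval link.
function make_consent_token(): string
{
    // 16 random bytes from a secure source, written as 32 hex characters
    return bin2hex(random_bytes(16));
}

// Hypothetical helper: compare the token from the clicked link with the stored one.
function consent_token_matches(string $stored, string $supplied): bool
{
    // hash_equals() compares in constant time, which avoids leaking
    // information about the stored token through response timing
    return hash_equals($stored, $supplied);
}

$token = make_consent_token();
// Placeholder URL; this is the link that would be emailed to the parent.
$link = 'http://example.com/approve.php?token=' . $token;
```

Note that this only proves that someone with access to the supplied email address clicked the link.  It does not prove that the person is actually a parent.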

Network Eavesdropping

Like all telecommunications, the Internet holds a risk that your communication will be intercepted while en route between you and the intended counter-party, and the data that you assumed was private will be picked up by a third party, be that the government, a hacker, a neighbor, or an employer.

When you visit a website, the data packets that constitute your client request and the server’s response pass through a variety of network nodes on the way to get to their intended destination.

Wi-Fi vs. Wired

The first leg of transmission in a typical home or office setup is usually between your computer and a router.  If you are using a wireless router, your radio transmitter is broadcasting data to anyone within your router’s transmission radius, which can be quite large.  Encrypting the wireless connection helps less than you might expect.  WEP can be cracked in minutes by someone with very little skill, using free software readily available online (AirSnort, Aircrack, WEPCrack, etc.), and the captured traffic can then be read with a packet sniffer such as Ethereal (now called Wireshark).  WPA is considerably stronger, but a weak passphrase can still be guessed with a dictionary attack.

With a wired connection to a router, the hacker would have to have access to tap into the actual wires involved in your connection, which reduces the risk significantly.

Your Employer

If you use the Internet at work, your employer generally has the legal right to view emails you send using their email system.  They also have the right to track which websites you visit using their network.  Your employer may or may not choose to exercise that right.

Your employer no doubt knows your identity, so they are able to link your Internet usage to your personal identity without any problem.

Your Internet Service Provider

At home or at work, you probably pay an Internet Service Provider (ISP) to provide you with Internet service.  When you stop paying for service, they cut it off.  If you download a lot of illegal copies of movies, your ISP may send you a warning that you must stop or face legal consequences.

They are able to do all this because all your internet traffic goes through network nodes controlled by the ISP.  They are the gateway through which all your internet data passes. And the network connection your computer uses has a unique identifier called an IP address, so they know it’s you and not someone else. The ISP may be (and undoubtedly is to some extent) analyzing your Internet usage.

To sign up for service, you supplied your name, address, phone number, credit card number, and other personally identifiable information to your ISP.  So they are able to tie your IP address, and thus your Internet usage, to your identity without difficulty.

Your IP address is not just seen by your ISP.  It is attached to every packet your computer sends, so every web server you make an HTTP request to can see it.  It is standard operating procedure for most sites to log the IP addresses of their visitors.
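To give a sense of how little effort this logging takes, here is a minimal sketch in PHP.  PHP exposes the visitor’s address to every script as $_SERVER['REMOTE_ADDR'].  The function name and the log format are made up for illustration, and fixed values are passed in so the result is predictable.

```php
<?php
// Build one log line for a request.
// $server stands in for PHP's built-in $_SERVER array.
function format_visit_log(array $server, int $timestamp): string
{
    $ip  = $server['REMOTE_ADDR'] ?? 'unknown';  // the visitor's IP address
    $uri = $server['REQUEST_URI'] ?? '/';        // the page they asked for
    return gmdate('Y-m-d H:i:s', $timestamp) . ' ' . $ip . ' ' . $uri;
}

// On a real page you would call format_visit_log($_SERVER, time())
// and append the result to a file or a database table.
$line = format_visit_log(
    ['REMOTE_ADDR' => '203.0.113.7', 'REQUEST_URI' => '/index.php'],
    0
);
echo $line . "\n"; // prints "1970-01-01 00:00:00 203.0.113.7 /index.php"
```

Most web servers write a line like this for every request by default, without the site owner writing any code at all.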

Email

When you send an email, your email program and the recipient’s email program both have copies of that email.  If you are both using email programs on your own computers, and the email does not pass through a webmail service, then the main risk to your privacy involves potential interception of that email while it is in transit between your computer and the recipient’s computer.

If, however, either you or the recipient uses a webmail system, those copies of your email reside on servers owned and operated by some other entity, such as Google, Yahoo, a university, an employer, etc.  So your email is only as private as the poorest privacy policy of either your own or the recipient’s email service provider.

Search & Single sign-on

If you use a webmail service provided by a company that also offers search or other online services, logging in to your webmail account also logs you in to those other services.  So if you search as a logged-in user, your search queries are tied to your email address, which is most likely tied to other aspects of your online and personal identities.

For example, if you use Gmail, your email account and the contents of all the emails inside it are tied to the blogs you read in your Google Reader account, all Google searches you’ve done while logged in (and probably some while logged out), and your usage of any other Google service, such as Google Maps, Google Wave, Blogger, and YouTube.  The same goes for the ads you click on that are operated by DoubleClick, and the content of any websites you operate that use Google Analytics to track usage or Google AdSense to serve advertising.  Because its advertising reaches so far across the web, Google most likely has a large data set about your behavior online.

If you’re wondering whether Google is obligated to keep any of this information private and to remove your personally identifiable information if they choose to analyze it, you should probably read their Privacy Policy.

Privacy Policies, as you have probably experienced, change frequently.  What is written there today may be gone tomorrow.  We have all received mail from our banks indicating changes to their policies.  Social networks, webmail applications, and other online services do the same.  As their business and legal needs change, so do their Privacy Policies.  You may receive a letter in the mail indicating this, or a screen that pops up on their website, or some more subtle indicator that something has changed.  Usually, you implicitly agree to their new terms by either clicking a button or closing the window.

So just because a site promises to keep your data private today doesn’t mean that it will always be so.  How vigilant you want to be about these policies is up to you, obviously.

Social networking

It goes without saying that social networking sites collect personally identifiable data.  That is their primary business.  All actions you take on any major social networking site, such as Facebook, MySpace, LinkedIn, Twitter, and others are logged for later analysis.  Whether or not these sites are obligated to maintain the privacy of these records is of course regulated by their Terms of Service and Privacy Policies, which almost nobody reads.

Facebook, in particular, has a strong advertising revenue model.  Like a mini-Google, they target ads directly to the individual viewer so the viewer is more likely to find any given ad relevant and click it.  This is done by profiling users, analyzing their likes and dislikes, and predicting what sorts of products and services they may be interested in.

Like other ad-based online services, they collect as much behavioral data as possible.  It is not especially paranoid to assume that they could be analyzing all links posted to profiles, the content of all emails sent between users, all posts that a user has clicked to indicate they “Like” it, and all behavior gathered from third-party sites that integrate the Facebook API and the Facebook Social Plugins.

Facebook has just launched a major push to integrate its social networking features on third-party sites across the web.  Actually, this is mostly just a repackaging of something they have been doing for a while now.  As Facebook content is integrated into more third-party sites, Facebook will have more data to tie to personal accounts and analyze for potential revenue streams.

Of course, sites like Facebook are wary of breaching the trust of their users.  If users distrust the site, they will no longer use it.  For this reason, if none other, they are unlikely to expose most of the data they collect.  However, who knows how they will operate in the future.  Again, it depends on the Privacy Policy and Terms of Service legalese that nobody reads.

If and when any of these sites begin to decline in popularity, and the users start to leave anyway, social networking sites will perhaps look for ways to monetize the information they have stored about user behavior and tendencies.  Perhaps this will include personally identifiable information, perhaps not.  Better read that Privacy Policy.

Class 11 – Brief Intro to Facebook Application Development

May 2nd, 2009

As we saw in class today, Facebook Application development is not very different from the sort of XHTML, CSS, PHP, and MySQL development we have been covering in class.

The main difference is that in addition to data that you store and retrieve from your own database, you have access to “social graph data” that comes from the Facebook Platform.

The official documentation for building Facebook Apps is at http://developers.facebook.com

Application setup

To initially set up an application on Facebook, you’ll need to go to your personalized developer home page at http://facebook.com/developers.  There you need to click the “Set Up New Application” button, which will bring you to a page where you fill in a few details about your application.

[Figure: Set up new application button]

The two most important pieces of information that you need to fill in are your application’s “canvas URL” and “callback URL”.

[Figure: Canvas settings]

Canvas URL

The term “canvas URL” refers to the URL that your application will have on the Facebook site.  For example, http://apps.facebook.com/webdevspring/

Callback URL

The term “callback URL” refers to the actual location of your application if you were to access it directly in the browser.  For example, http://onepotcooking.com/amosbloomberg/spring2009/class11/facebook/.  However, users will never go to this page directly in their browser.  Instead, they will view your application as if it were on the Facebook site itself at the canvas URL.

When a user loads the canvas URL in their browser, Facebook serves as a sort of proxy.  Behind the scenes, Facebook loads the page from your callback URL, parses the code that it finds there and replaces any FBML it finds in that code with its XHTML equivalent.  Then it places the result of that parsing process into the main section of the Facebook page template.  So your page looks like it is hosted on the Facebook site, although you and I know that it is on our own server.

API Key & Secret

Once you have filled in all the required fields for the Application setup, Facebook will show a page that has your new application’s API Key and Application Secret.

[Figure: API key and application secret]

These two bits of information are used for authentication, and are necessary for allowing your application’s code to make requests to Facebook’s Platform server.

You will need to use these two encoded strings to make a secure connection to Facebook’s servers on every page where your code interacts with the Facebook Platform.
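The client library takes care of the authentication details, so you will not normally write this yourself.  As a rough sketch of the idea, and as an assumption about the scheme rather than a reference for it, each request carries the API Key plus a signature that only someone holding the Application Secret could have produced.  The function name and the placeholder values below are hypothetical.

```php
<?php
// Sketch of an MD5 request signature: sort the parameters by name,
// join them as name=value with no separator, append the secret,
// and hash the result.
function sign_request(array $params, string $secret): string
{
    ksort($params); // a fixed order means both sides hash the same string
    $base = '';
    foreach ($params as $name => $value) {
        $base .= $name . '=' . $value;
    }
    return md5($base . $secret);
}

// Placeholder values; a real app would use its own key and secret.
$params = ['method' => 'friends.get', 'api_key' => 'YOUR_API_KEY', 'v' => '1.0'];
$params['sig'] = sign_request($params, 'YOUR_APP_SECRET');
```

The secret itself is never sent over the network.  The server repeats the same calculation with its own copy of the secret and rejects the request if the signatures differ, which is why the Application Secret must stay private.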

FBML

Facebook’s proprietary markup language, FBML, is an XML-based language, like XHTML.  From the “fb:” prefix on every FBML tag, you should recognize that the tags use their own XML namespace.

One of the reasons Facebook created FBML is to make it easy for independent developers to place users’ photos, first and last names, and other bits of “social data” on a page, without Facebook having to give any random developer access to the database where that data is actually stored.  Giving independent developers database access would obviously be bad for the security and performance of their site.

As an example of a typical use of FBML, to place a user’s profile photo on one of your application pages, you would use FBML like this somewhere embedded in your XHTML:

<fb:profile-pic uid="112233" size="square" />

When Facebook parses the code from your callback URL, it will see that you are using FBML code, and it will replace this bit of code with the profile photo of user #112233, assuming there is such a user.

So when a user views the source code of any Facebook Application page, they will not see the FBML code, because the Facebook server has already parsed and replaced it.
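To make the replacement step concrete, here is a toy version of it in PHP.  This is not how Facebook’s parser works internally.  It handles only one tag, written in one exact attribute order, and the image URL pattern is invented for the example.

```php
<?php
// Toy illustration only: swap each <fb:profile-pic> tag for an <img> tag.
// The image URL pattern is made up for this example.
function render_profile_pics(string $markup): string
{
    $pattern = '/<fb:profile-pic\s+uid="(\d+)"\s+size="(\w+)"\s*\/>/';
    return preg_replace_callback($pattern, function (array $m): string {
        // $m[1] is the user id, $m[2] is the requested size
        return '<img src="http://example.com/photos/' . $m[1] . '_' . $m[2] . '.jpg" alt="" />';
    }, $markup);
}

$html = render_profile_pics('<p><fb:profile-pic uid="112233" size="square" /></p>');
echo $html . "\n";
// prints: <p><img src="http://example.com/photos/112233_square.jpg" alt="" /></p>
```

The important point is that your code never learns where the photo lives or who the user is.  You supply only a user id, and the lookup happens on Facebook’s side.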

Facebook PHP Client

To get your code talking to the Facebook Platform, you will need to download the Facebook PHP Client Library, which is an object-oriented set of classes that provide some easy-to-use methods and properties for accessing Facebook user data, and some other common tasks that relate to your application interacting with their site.

You will need to include the Facebook PHP Client Library into your scripts by using the same require_once() function that we have been using to include our own files into our scripts.

Our example page

Let’s go line by line through a very simple application page that will display the profile photos of the logged-in user’s list of friends.

The first command just includes the Facebook PHP Client Library.

//include the Facebook API Client
require_once("facebook_client/facebook.php");

The next few lines set up our basic communication channel with the Facebook Platform, using the API Key and Application Secret that we got when we set up the application on Facebook.

You will recognize that we are creating an object from the Facebook class that is defined in the Facebook PHP Client we downloaded and included in this script.

//when we set up a new application on Facebook, they give us an API Key and API Secret for this app
//these will be different for each app
//we store them in variables and use these to set up communication with the Facebook API
$fb_api_key = "5a8c964c7f38f6aa53035f91133321d5";
$fb_api_secret = "00d997c03f7d342906b3aaa5da7956e5";

//INSTANTIATE FB API
//this creates a Facebook object, which we call $FB
//you can see that we are passing the API Key and Application Secret as required parameters to the Facebook class's constructor function
//this essentially authenticates a connection to the Facebook Platform and allows us to communicate with it in code
$FB = new Facebook($fb_api_key, $fb_api_secret);

So now we have an object called $FB, which is a Facebook object.  This object has all the properties and methods that Facebook has defined in their Facebook class definition.

One of the methods of the Facebook object requires the current user to be logged in to the Facebook site.  We run that to make sure that users are logged in.

//REQUIRE USER TO BE LOGGED IN
$FB->require_login();

Then, we call a built-in method of the Facebook object that returns an array of all the user ids of the current user’s friends.  We store that array of friends’ ids in a variable called $arrFriendIds.

//GET A LIST OF THIS USER'S FRIENDS
$arrFriendIds = $FB->api_client->friends_get();

If you wanted to see the raw data that comes back from the Facebook Platform, you could, of course, use print_r() to output the contents of the $arrFriendIds array.

//print_r($arrFriendIds);

In our example, we then use the data we have gathered from the $FB object and display it using XHTML and FBML:

<div class="container">
  <h1>Welcome, <fb:name uid="<?= $FB->user ?>" useyou="false" /></h1>
  <p>Here are your friends</p>
  <div id="friend_container">
    <?php foreach ($arrFriendIds as $friendId) : ?>
      <fb:profile-pic uid="<?= $friendId ?>" size="square" />  
    <?php endforeach ?>
  </div>
</div>

The Facebook property $FB->user always holds the user id of the current user.

Notice that this XHTML & FBML code is just a code “snippet”, not a full XHTML document.  This is because Facebook will take this code and place it inside their own XHTML document, so we should not redefine the <head>, <body>, or other basic tags that they will be creating for us on the page that shows our application.

We wrap the XHTML & FBML snippet inside a div with class="container", just as we would on any other web page and for the same reasons: it makes our part of the page easier to style and the layout easier to manage.

Conclusion

Obviously, this is just the beginning of developing for Facebook.  The interesting part of developing applications occurs when you combine the data that Facebook gives you through its Platform APIs with the user-generated content that you store in your own database.

For example, you could port your blog assignments to become Facebook applications by changing your code so that every time a user makes a post, you store the post along with their Facebook user id, found in the $FB->user property, instead of the user id that you automatically assigned to users when you made your stand-alone blog site.
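Here is a minimal sketch of what that change might look like.  The table name, column names, and helper function are hypothetical, so substitute whatever your own blog assignment uses.  The user id is kept as a string because Facebook user ids can be larger than a 32-bit integer can hold, so a BIGINT or VARCHAR column is a safer choice than INT.

```php
<?php
// Hypothetical table: posts(fb_user_id, title, body, created_at).
const INSERT_POST_SQL =
    'INSERT INTO posts (fb_user_id, title, body, created_at) '
    . 'VALUES (:fb_user_id, :title, :body, :created_at)';

// Build the values for the prepared statement above.
function build_post_params($fbUserId, string $title, string $body, int $now): array
{
    return [
        ':fb_user_id' => (string) $fbUserId,           // from $FB->user
        ':title'      => $title,
        ':body'       => $body,
        ':created_at' => gmdate('Y-m-d H:i:s', $now),  // UTC timestamp
    ];
}

// With a PDO connection in $db and the Facebook object in $FB:
//   $stmt = $db->prepare(INSERT_POST_SQL);
//   $stmt->execute(build_post_params($FB->user, $_POST['title'], $_POST['body'], time()));
$example = build_post_params('112233', 'Hello', 'My first post', 0);
```

Using a prepared statement with named placeholders keeps user-submitted text out of the SQL string itself, which matters just as much inside a Facebook app as it does on a stand-alone site.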

You would not have to perform any user authentication (i.e. registration or login), since Facebook would do all that for you when a user signs up for their site, so you would not need a “users” table at all in your database.  All user information is obtained through the Facebook Platform, and your database just stores everything else except that.

As another example, our earlier homemade social network example would be totally redundant if you ported it over to Facebook, since all “friend” information for Facebook apps is handled by the Facebook Platform, not by your own code.  So you wouldn’t have to keep track in your database of who is friends with whom.  You could leave those tasks to Facebook and concentrate on building out more interesting and compelling functionality on top of that.

You will be surprised at how many apps are just glorified message boards that store data in much the same way as some of your previous assignments.

For further reading, I recommend exploring the documentation linked to from the “Get Started” section of the Facebook Developers site.
