Author: pde

  • Sony Steals Feature From Your PlayStation 3

    If the messages in EFF’s inbox today are anything to go by, a lot of people are upset and angry — with good reason — over Sony’s announcement that it is going to disable a feature that allows people to run GNU/Linux and other operating systems on their PlayStation 3 consoles.

    Sony had included a hypervisor feature called “Other OS” on the PS3, which meant that you didn’t need to break all the DRM on the device in order to install a Linux kernel or other custom software. But on Sunday, Sony announced that the Other OS feature will be disabled in the next firmware release. In short, Sony is effectively downgrading PS3s already sold and in the hands of consumers — when you bought it, it could play games, play Blu-ray discs, and run GNU/Linux. After April 1, it’s an inferior product.

    The backstory is that Sony provided the Other OS feature in order to support IBM’s Cell Project, which produced the PS3’s CPU and made it practical to use PS3 consoles as compute nodes for a scientific supercomputer. The U.S. Army did just that, buying more than 2,000 PS3s to build a supercomputer. Lots of hobbyists also made use of the Other OS feature, using it to write their own games and creatively repurpose their PS3s.

    Recently, however, a hobbyist named Geohot announced that he was able to use the Other OS feature along with a bit of soldering in a manner that gave him more control over the PS3 hardware than Sony had intended. Sony responded with the “upgrade” that removes the Other OS feature.

    PlayStation 3 owners aren’t technically required to upgrade their firmware. However, Sony has built a vast and sticky web of DRM restrictions that will kick in to make life miserable for anyone who declines the “upgrade”:

    • It will be impossible to play PS3 games online.
    • It will be impossible to play new PS3 games.
    • It will be impossible to watch new Blu-ray videos.
    • New Blu-ray discs could even disable the Blu-ray drive entirely if they contain an AACS Host Revocation List that affects the old firmware version.
    • Videos on DTCP-IP media servers will be disabled.

So, as an owner of an affected PS3, how can you keep all the features that Sony sold you? Well, Geohot is reportedly working on custom firmware that would preserve the Other OS feature while avoiding the DRM meltdowns mentioned above. At that point, we’ll see whether Sony brings in lawyers brandishing the anticircumvention provisions of the Digital Millennium Copyright Act (a tactic that backfired when Sony tried it on Aibo robot dog hobbyists a few years ago).

    This is just the latest example of the way in which digital rights management hurts consumers — at the end of the day, hardware that includes DRM is always silently waiting to protect someone else’s interests, at the expense of your own.

  • FTC to Internet Companies: Start Using SSL

    HTTPS is the backbone of web security. The protocol, which is also commonly known as the Secure Sockets Layer (SSL), is what guarantees we can use the web to transmit sensitive information — financial, medical, or other — with relative confidence that it won’t be intercepted or stolen. EFF has been arguing for years that best practices demand that all sensitive data be sent exclusively over SSL.

Unfortunately, most major providers of web-based email and other sensitive web-based services do not even give their users the option of using SSL, let alone turn it on by default. As a result, countless terabytes of sensitive data are transmitted over the Internet insecurely every day, greatly contributing to online fraud, data theft, and surveillance by authoritarian regimes.

Now, the Federal Trade Commission has officially put these companies on notice. In a speech before an FTC roundtable yesterday, outgoing FTC Commissioner Pamela Jones Harbour called on web services like Yahoo!, Facebook and Hotmail to start using HTTPS/SSL encryption.

    Google has recently shown leadership in this space, by enabling HTTPS for Gmail, as well as making it the default behavior so that even users who don’t understand security will be protected. It’s time for other services (including Google Search!) to catch up with Gmail.

    As Commissioner Harbour put it:

    These vulnerabilities are easily preventable. Security needs to be a default in the cloud.

    We couldn’t agree with her more.

  • Help EFF Research Web Browser Tracking

    What fingerprints does your browser leave behind as you surf the web?

    Traditionally, people assume they can prevent a website from identifying them by disabling cookies on their web browser. Unfortunately, this is not the whole story.

    When you visit a website, you are allowing that site to access a lot of information about your computer’s configuration. Combined, this information can create a kind of fingerprint — a signature that could be used to identify you and your computer. But how effective would this kind of online tracking be?

    EFF is running an experiment to find out. Our new website Panopticlick will anonymously log the configuration and version information from your operating system, your browser, and your plug-ins, and compare it to our database of five million other configurations. Then, it will give you a uniqueness score — letting you see how easily identifiable you might be as you surf the web.
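The basic idea behind such a fingerprint can be sketched in a few lines: concatenate a handful of browser-reported configuration values and hash them into a single identifier. (The attribute names and values below are made-up examples for illustration, not Panopticlick’s actual code.)

```python
import hashlib

# A toy fingerprint: join a few browser-reported values and hash them
# into one identifier. The same configuration always yields the same
# fingerprint, so it can re-identify a browser even without cookies.
config = {
    "user_agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3)",
    "timezone": "UTC-8",
    "screen": "1280x800x24",
    "plugins": "Shockwave Flash 10.0; QuickTime 7.6",
}
fingerprint = hashlib.sha1("|".join(config.values()).encode()).hexdigest()
print(fingerprint)
```

The rarer the combination of values, the more identifying the resulting fingerprint is.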

    Adding your information to our database will help EFF evaluate the capabilities of Internet tracking and advertising companies, who are already using techniques of this sort to record people’s online activities. They develop these methods in secret, and don’t always tell the world what they’ve found. But this experiment will give us more insight into the privacy risk posed by browser fingerprinting, and help web users to protect themselves.

    To join the experiment:
    http://panopticlick.eff.org/

    To learn more about the theory behind it:
    http://www.eff.org/deeplinks/2010/01/primer-information-theory-and-priva…

  • Browser Versions Carry 10.5 Bits of Identifying Information on Average

    This is part 3 of a series of posts on user tracking on the modern web. You can also read part 1 and part 2.

    Whenever you visit a web page, your browser sends a “User Agent” header to the website saying precisely which operating system and web browser you are using. This information could help distinguish Internet users from one another because these versions differ, often considerably, from person to person. We recently ran an experiment to see to what extent this information could be used to track people (for instance, if someone deletes their browser cookies, would the User Agent, alone or in combination with some other detail, be unique enough to let a site recognize them and re-create their old cookie?).

Our experiment to date has shown that the browser User Agent string usually carries 5-15 bits of identifying information (about 10.5 bits on average). That means that on average, only one person in about 1,500 (2^10.5) will have the same User Agent as you. On its own, that isn’t enough to recreate cookies and track people perfectly, but in combination with another detail like geolocation to a particular ZIP code or having an uncommon browser plugin installed, the User Agent string becomes a real privacy problem.
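The correspondence between bits and crowd sizes is a simple exponential, which is easy to check:

```python
import math

# 10.5 bits of identifying information singles you out of a crowd of
# 2**10.5 people: the "one in about 1,500" figure above.
print(round(2 ** 10.5))           # 1448

# Conversely, a detail shared by one person in 1,500 carries:
print(round(math.log2(1500), 2))  # 10.55 bits
```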

    User Agents: An Example of Browser Characteristics Doubling As Tracking Tools

    When we analyze the privacy of web users, we usually focus on user accounts, cookies, and IP addresses, because those are the usual means by which a request to a web server can be associated with other requests and/or linked back to an individual human being, computer, or local network.

    Typical advice for improving your privacy as you surf the web might include blocking or deleting cookies (and supercookies), and using proxy servers or tools like Tor to hide your IP address.

    It’s not intuitively obvious that a User Agent poses a similar risk to a unique tracking cookie. After all, cookies were designed, in part, to help web sites distinguish and recognize individual browsers, and User Agents weren’t. And there could be millions of people out there who use the same browser and operating system that you do. But let’s examine the matter more closely. A typical User Agent string looks something like this:

    Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)

    In fact, that was the most common user agent string among browsers visiting the EFF website during the test period: Firefox 3.5.3 running on Windows XP. Notice that the operating system and browser versions are extremely specific and that the User Agent also includes the user’s preferred language. There are a lot of things that can vary inside that string, and those variations can be used to distinguish and track people as they browse the Web.

    Our Results to date on User Agent Identifiability

    We ran an experiment to measure precisely how identifying the User Agent strings would have been among a 36-hour anonymized sample of requests to the EFF website. The following table shows different classes of browser, with the number of bits for best and average case User Agents within that class:

    Identifying information in various classes of browsers

Browser class | Avg. identifying information | Minimum identifying information | Least identifying user agent
Modern Windows Desktops | 10.3–11.3 bits | 4.6–5.0 bits | Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)
Internet Explorer | 13.2–13.5 bits | 6.3–7.2 bits | Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Firefox | 8.6–9.4 bits | 4.6–5.0 bits | Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)
Chrome | 7.5–8.5 bits | 5.7–6.2 bits | Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/532.0 (KHTML, like Gecko) Chrome/3.0.195.27 Safari/532.0
Linux | 11.8–13.15 bits | 6.6–7.9 bits | Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.14) Gecko/2009090216 Ubuntu/9.04 (jaunty) Firefox/3.0.14
Ubuntu | 9.6–11.7 bits | 6.6–7.8 bits | Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.14) Gecko/2009090216 Ubuntu/9.04 (jaunty) Firefox/3.0.14
Debian | 13.5–15.3 bits | 10.5–11.7 bits | Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.14) Gecko/2009091010 Iceweasel/3.0.6 (Debian-3.0.6-3)
Macintosh | 8.8–9.3 bits | 5.8 bits | Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3
iPhone | 10.8–11.3 bits | 8.7–9.3 bits | Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_1 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7C144 Safari/528.16
Blackberry | 14.7–15.5 bits | 12.0–12.7 bits | BlackBerry9530/4.7.0.148 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/105
Android | 14.4 bits | 12.2–12.4 bits | Mozilla/5.0 (Linux; U; Android 1.6; en-us; T-Mobile G1 Build/DRC83) AppleWebKit/528.5+ (KHTML, like Gecko) Version/3.1.2 Mobile Safari/525.20.1

    There are several remarkable facts about this dataset. Overall, it’s amazing how identifying User Agent strings are. 10.5 bits is about one-third of the total information required to identify an Internet user.

It’s also surprising that platforms like Firefox and Ubuntu, which have lower market penetration, are on average comparably or even less identifying than Windows and Microsoft Internet Explorer, which have very large userbases and should therefore offer larger crowds to hide in. Part of this may be that visitors to the EFF website over-represent the former groups, but it’s also clear that a large part of it is that Internet Explorer has a very high level of variation in its User Agent strings, with typical examples looking something like this:

    Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30618)

    All of the different library and component versions there essentially function as partial tracking tokens.
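As a rough illustration, the parenthesized details can be pulled apart and counted, each one being a partial tracking token (a toy tokenizer, not how real trackers parse User Agents):

```python
# The Internet Explorer User Agent string quoted above.
ua = ("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SLCC1; "
      ".NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; "
      ".NET CLR 3.0.30618)")

# Split out the parenthesized components; each can vary between users.
inner = ua[ua.index("(") + 1 : ua.rindex(")")]
tokens = [t.strip() for t in inner.split(";")]
print(len(tokens))  # 9 separate components in this one string
print(tokens[:3])   # ['compatible', 'MSIE 8.0', 'Windows NT 6.0']
```

Every optional component that is present (or absent) narrows the crowd of users whose browsers produce the same string.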

    We’ve launched a project called Panopticlick to collect a new dataset that extends this analysis from User Agents to the full browser plugin and configuration space. You can use Panopticlick to receive a uniqueness measurement for your own browser, and help EFF’s privacy research efforts at the same time!

    Methodology

During September 2009, we took a 36-hour sample of anonymized requests to the eff.org web server by hashing the IP address of each request with a random salt, and throwing away the salt. We then calculated the amount of identifying information conveyed by each browser. Identifying information is measured in “bits of entropy”, and indicates how large a crowd the information would single you out of. Browsers usually convey between 5 and 15 bits of identifying information, about 10.5 bits on average. 10 bits of identifying information would allow you to be picked out of a crowd of 2^10, or 1,024 people; 10.5 bits can identify a person within a crowd of just under 1,448.
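The salted-hash anonymization step can be sketched as follows (illustrative, not EFF’s actual logging code; the IP address is a documentation example):

```python
import hashlib
import os

# Generate one random salt for the whole sampling window...
salt = os.urandom(16)

def anonymize(ip: str) -> str:
    """Hash an IP address with the session salt: identical IPs map to the
    same token, but the token cannot be reversed to recover the IP."""
    return hashlib.sha256(salt + ip.encode()).hexdigest()

token_a = anonymize("198.51.100.7")
token_b = anonymize("198.51.100.7")
assert token_a == token_b  # repeat visits stay linkable within the sample

# ...then throw the salt away. Without it, even a brute-force scan of all
# four billion IPv4 addresses cannot be matched against the stored tokens.
del salt
```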

    Because we did not use cookies or any other mechanism to distinguish between repeat and new visitors, each measurement of bits of identifying information lies between an upper and lower bound.1

1. One bound is based on a count in which each hashed IP address is counted for only one request; the other bound is based on treating each hit as a unique browser. In almost all cases, the true amount of identifying information pertaining to the browser should lie between these two values.

  • A Primer on Information Theory and Privacy

    If we ask whether a fact about a person identifies that person, it turns out that the answer isn’t simply yes or no. If all I know about a person is their ZIP code, I don’t know who they are. If all I know is their date of birth, I don’t know who they are. If all I know is their gender, I don’t know who they are. But it turns out that if I know these three things about a person, I could probably deduce their identity! Each of the facts is partially identifying.

There is a mathematical quantity which allows us to measure how close a fact comes to revealing somebody’s identity uniquely. That quantity is called entropy, and it’s often measured in bits. Intuitively, you can think of entropy as a generalization of the number of different possibilities there are for a random variable: if there are two possibilities, there is 1 bit of entropy; if there are four possibilities, there are 2 bits of entropy, etc. Adding one more bit of entropy doubles the number of possibilities.1

Because there are around 7 billion humans on the planet, the identity of a random, unknown person contains just under 33 bits of entropy (two to the power of 33 is about 8.6 billion). When we learn a new fact about a person, that fact reduces the entropy of their identity by a certain amount. There is a formula to say how much:

    ΔS = – log2 Pr(X=x)

    Where ΔS is the reduction in entropy, measured in bits,2 and Pr(X=x) is simply the probability that the fact would be true of a random person. Let’s apply the formula to a few facts, just for fun:

    Starsign: ΔS = – log2 Pr(STARSIGN=capricorn) = – log2 (1/12) = 3.58 bits of information
    Birthday: ΔS = – log2 Pr(DOB=2nd of January) = -log2 (1/365) = 8.51 bits of information

    Note that if you combine several facts together, you might not learn anything new; for instance, telling me someone’s starsign doesn’t tell me anything new if I already knew their birthday.3

In the examples above, each starsign and birthday was assumed to be equally likely.4 The calculation can also be applied to facts which have non-uniform likelihoods. For instance, the likelihood that an unknown person’s ZIP code is 90210 (Beverly Hills, California) is different from the likelihood that their ZIP code would be 40203 (part of Louisville, Kentucky). As of 2007, there were 21,733 people living in the 90210 area, only 452 in 40203, and around 6.625 billion on the planet.

    Knowing my ZIP code is 90210: ΔS = – log2 (21,733/6,625,000,000) = 18.21 bits
    Knowing my ZIP code is 40203: ΔS = – log2 (452/6,625,000,000) = 23.81 bits
    Knowing that I live in Moscow: ΔS = -log2 (10,524,400/6,625,000,000) = 9.30 bits
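These values can be checked directly from the formula, using the population counts given above:

```python
import math

def surprisal(p: float) -> float:
    """Delta-S = -log2 Pr(X=x): bits of identity revealed by a fact of probability p."""
    return -math.log2(p)

WORLD = 6_625_000_000  # approximate 2007 world population, as above

print(round(surprisal(21_733 / WORLD), 1))      # ZIP 90210 -> 18.2 bits
print(round(surprisal(452 / WORLD), 1))         # ZIP 40203 -> 23.8 bits
print(round(surprisal(10_524_400 / WORLD), 1))  # Moscow    -> 9.3 bits
```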

    How much entropy is needed to identify someone?

    As of 2007, identifying someone from the entire population of the planet required:

    S = -log2 (1/6,625,000,000) = 32.6 bits of information.

    Conservatively, we can round that up to 33 bits.

    So for instance, if we know someone’s birthday, and we know their ZIP code is 40203, we have 8.51 + 23.81 = 32.32 bits; that’s almost, but perhaps not quite, enough to know who they are: there might be a couple of people who share those characteristics. Add in their gender, that’s 33.32 bits, and we can probably say exactly who the person is.5
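Because these particular facts are (roughly) independent, their bits simply add, so checking whether a combination crosses the roughly 33-bit threshold is one sum (a sketch of the arithmetic above):

```python
import math

WORLD = 6_625_000_000
threshold = math.log2(WORLD)        # about 32.6 bits identifies one person

birthday = math.log2(365)           # about 8.51 bits
zip_40203 = math.log2(WORLD / 452)  # about 23.81 bits
gender = math.log2(2)               # 1 bit

print(round(birthday + zip_40203, 2))           # 32.32: perhaps not quite enough
print(round(birthday + zip_40203 + gender, 2))  # 33.32: probably identifying
```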

    An Application To Web Browsers

    Now, how would this paradigm apply to web browsers? It turns out that, in addition to the commonly discussed “identifying” characteristics of web browsers, like IP addresses and tracking cookies, there are more subtle differences between browsers that can be used to tell them apart.

One significant example is the User-Agent string, which contains the name, operating system and precise version number of the browser, and which is sent to every web server you visit. A typical User Agent string looks something like this:

    Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6

    As you can see, there’s quite a lot of “stuff” in there. It turns out that that “stuff” is quite useful for telling different people apart on the net. In another post, we report that on average, User Agent strings contain about 10.5 bits of identifying information, meaning that if you pick a random person’s browser, only one in 1,500 other Internet users will share their User Agent string.

    EFF’s Panopticlick project is a privacy research effort to measure how much identifying information is being conveyed by other browser characteristics. Visit Panopticlick to see how identifying your browser is, and to help us in our research.

1. Entropy is actually a generalization of counting the number of possibilities, to account for the fact that some of the possibilities are more likely than others. You can find a pretty version of the formula here.
2. This quantity is called the “self-information” or “surprisal” of the observation, because it is a measure of how “surprising” or unexpected the new piece of information is. It is really measured with respect to the random variable that is being observed (perhaps, a person’s age or where they live), and a new, reduced, entropy for their identity can be calculated in the light of this observation.
3. What happens when facts are combined depends on whether the facts are independent. For instance, if you know someone’s birthday and gender, you have 8.51 + 1 = 9.51 bits of information about their identity, because the probability distributions of birthday and gender are independent. But the same isn’t true for birthdays and starsigns. If I know someone’s birthday, then I already know their starsign, and being told their starsign doesn’t increase my information at all. We want to calculate the change in conditional entropy of the person’s identity given all the observed variables, and we can do that by making the probabilities for new facts conditional on all the facts we already know. Hence we see ΔS = -log2 Pr(Gender=Female|DOB=2nd of January) = -log2 (1/2) = 1 bit, and ΔS = -log2 Pr(Starsign=Capricorn|DOB=2nd of January) = -log2 (1) = 0 bits. In-between cases are also possible: if I knew that someone was born in December, and then I learn that they are a Capricorn, I still gain some new bits of information, but not as much as I would have if I hadn’t known their month of birth: ΔS = -log2 Pr(Starsign=Capricorn|Month=December) = -log2 (10/31) = 1.63 bits.
4. Actually, in the birthday example, we should have accounted for the possibility that someone was born on the 29th of February during a leap year; allowing for leap years, each ordinary date has probability 1/365.25, so ΔS = -log2 (1/365.25).
5. If you’re paying close attention, you might have said, “Hey, that doesn’t sound right; sometimes there will be only one person in ZIP code 40203 who has a given birthday, in which case you don’t need gender to identify them, and it’s possible (but unlikely) that ten people in 40203 were all born on the 2nd of January.” The correct way to formalize these issues would be to use the real frequency distribution of birthdays in the 40203 ZIP code.

  • Some Lessons from the AT&T/Facebook Switcheroo

    Over the weekend, there was an odd story about people using AT&T’s wireless network trying to log in to Facebook, and suddenly finding themselves logged in to somebody else’s Facebook account. What could have caused such a strange phenomenon to occur? What does it tell us about the innards of the mobile web, and what lessons might it convey for network and application design?

    Ars Technica had a good post documenting some of the possibilities, and AT&T has now made some public statements containing a few key clues about the problem. We have a few things to add.

    [Warning – this post gets fairly technical]

    1. Facebook. Facebook needs to start using HTTPS for everything! Without HTTPS and secure cookies, the private and sensitive information in their users’ accounts is vulnerable to being mixed up by ISPs’ proxy servers, logged, eavesdropped or pilfered by hackers.1 Google now uses HTTPS by default for every interaction with Gmail, and there’s no excuse for Facebook not to do the same.
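The “secure cookies” mentioned above refer to a flag on the Set-Cookie header: with Secure set, browsers will only send the cookie over HTTPS, so it cannot be sniffed from plain-HTTP traffic. A minimal sketch using Python’s standard library (the cookie name and value are made up):

```python
from http import cookies

c = cookies.SimpleCookie()
c["session_id"] = "d41d8cd98f00b204"
c["session_id"]["secure"] = True    # never sent over plain HTTP
c["session_id"]["httponly"] = True  # not readable from page JavaScript

# The resulting header attributes, e.g.:
# session_id=d41d8cd98f00b204; HttpOnly; Secure
print(c["session_id"].OutputString())
```

Without the Secure flag, a session cookie set over HTTPS is still transmitted in the clear whenever the browser makes a plain-HTTP request to the same site.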

    2. AT&T. Here, the story is more complicated, but the short summary is that AT&T (and all other ISPs) really need to migrate away from using proxy and gateway servers to perform complicated software tasks.

    The problem at the ISP’s end appears to have been a manifestation of an engineering hangover from WAP 1.0, which was the first attempt to bring the Web to mobile phones. WAP made a number of design decisions intended to work around the limitations of 1990s-era cell phones, including tiny storage space, limited bandwidth, and small keypads. In retrospect, some of those design decisions appear to have been unwise. A relevant example was the decision to involve the wireless carrier in website authentication. Where the normal HTTP Web stores authentication cookies on users’ computers, early versions of WAP specified that cookies should be stored on proxy servers called WAP gateways, operated by wireless carriers.2 Another practice was to try to avoid ever having to make the user type a username and password with only a numeric keypad, by circulating URLs that contained automatic authentication parameters.

    It was this WAP tradition of getting ISPs intimately involved in authentication that led to a situation today where a malfunction on AT&T’s proxies could let one user log in to another’s Facebook account. This situation is bad for the privacy and security of mobile web users, and it carries some important lessons about the division of responsibility between ISPs and web and application providers.

    Wherever possible, ISPs should try to avoid solving complicated problems — like web authentication — by using proxy and gateway servers on their network. Inevitably, having an extra machine in the loop raises the complexity of the solution and increases the number of possible points of failure. If this had been a problem with a website smaller than Facebook, the chances are that it would have remained undiagnosed and unfixed for much longer.

    There is a lot of engineering controversy about whether it’s ever appropriate for complex application functions to be performed by proxies, gateways or transcoders operated by ISPs. One key argument is that if the ISPs pick a poor solution, or don’t all implement exactly the same thing, then developers and users will be worse off than if the ISP had done nothing at all.

Whether or not this is true in all cases, it’s clear, at the very least, that ISPs need to be extremely cautious in this space. They should deploy a proxy-type solution only when it is certain that clients and servers can’t solve the problem for themselves. They need to be transparent: follow well-established standards, clearly document their practices, and answer technical questions promptly. Lastly, they should offer users and application providers a standardised way to opt out of the proxies if they might cause technical or security problems.

    Even as mobile phones and mobile browsers are approaching the sophistication of desktop PCs, many mobile carriers are continuing to play strange and undocumented tricks with subscribers’ data communications.

    And AT&T in particular still has a way to go with respect to transparency. Their public statements indicated that they had deployed some new security measures in the wake of the Facebook affair. When we asked them what those measures were, their spokesperson’s response was:

    In terms of the new security measures AT&T has put into place, due to security sensitivity, we aren’t providing specifics.

    AT&T’s disappointing response is to retreat to security through obscurity. But long experience teaches that security through obscurity is usually no security at all.

1. Unlike the main Facebook site, some mobile versions of Facebook do now use HTTPS for parts of the login process. But until they wrap their entire sessions in encryption and set the “secure” flag for authentication cookies, it will remain possible to eavesdrop on Facebook communications, and to perform numerous cookie hijacking and JavaScript injection attacks to hack into an account.
2. In practice, this made cookie authentication unusable in WAP, because the way that WAP gateways were implemented and configured was insufficiently standardized, and because many developers realised that it was unacceptable to trust carriers’ gateway servers with so much of their authentication housekeeping. This meant that websites had to fall back to a practice known as “URL rewriting” or “URL decoration”: adding an authentication token to every URL. In practice, this is frequently equivalent to putting the user’s password in the URL.

  • Gmail Takes the Lead on Email Security

    Last night, Google announced that Gmail sessions will now be fully encrypted with HTTPS by default. This is excellent news — EFF congratulates Google for taking this significant step to safeguard their users’ privacy and security.

    Previously, it was possible to encrypt your access to Gmail, but it required altering the default configuration. Now every Gmail user will get the benefits of encryption without needing to know that they need it.

    With this development, Google has taken a clear two-step lead over its competition: other major hubs for personal communication such as Facebook, Yahoo! mail, Hotmail, and LiveJournal do not even make the use of HTTPS possible, let alone the default. A handful of smaller, specialist webmail providers do offer HTTPS, but Google is alone in bringing basic email security to the mainstream Web.

    Frankly, it’s time for Facebook, Yahoo!, Microsoft, and company to raise their game. If you are using those email services, then anyone using the same local network as you can read your communications or break into your account. And that’s just not good enough.

P.S.: A great next step for Google would be to implement HTTPS for Google Search. Until that happens, the only way to get private, encrypted searches is by using an HTTPS search engine like Ixquick or a third-party proxy to Google like ssl.scroogle.org, which requires users to trust the proxy operator. We understand that there are some latency costs to delivering search over HTTPS, and while new standards are needed to solve that problem, there’s no reason not to offer optional search encryption in the meantime.