Edward Tufte forum: Not spying on users should be a feature: The keep-it-and-lose-it hypothesis


Not spying on users should be a feature: The keep-it-and-lose-it hypothesis

Two provocative reports:

news.com on Google

NY Times on Google

The New York Times report, by Randall Stross, is alas very damaging to the once-pristine image of Google.

-- Edward Tufte

Response to Search engines and personal privacy: not spying on users should be a feature

SEARCH ENGINES AND PERSONAL PRIVACY: NOT SPYING ON USERS SHOULD BE A FEATURE

These articles prompt some questions and speculations, after a definition:

"An IP address (Internet Protocol address) is a unique number, similar in concept to a telephone number, used by machines (usually computers) to refer to each other when sending information through the Internet. This allows machines passing the information onwards on behalf of the sender to know where to send it next, and for the machine receiving the information to know that it is the intended destination." Wikipedia
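To make the "unique number" in that definition concrete, here is a minimal Python sketch using the standard `ipaddress` module. The address 192.0.2.1 comes from a range reserved for documentation and is chosen purely for illustration; it does not appear anywhere in this thread.

```python
import ipaddress

# 192.0.2.1 is a reserved documentation address (an assumption for
# illustration only, not an address mentioned in this thread).
addr = ipaddress.ip_address("192.0.2.1")

# The dotted form is just a readable spelling of one 32-bit integer.
as_number = int(addr)
print(as_number)

# Converting the integer back yields the same dotted address.
round_trip = ipaddress.ip_address(as_number)
print(round_trip)
```

Anything logged against that one number can later be collated, which is what makes the questions below worth asking.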

Is recording every search and the resulting direct links made from the search (which are known, tracked, and archived by Google) by IP address similar to reference librarians recording the social security number of clients along with their questions?

Should a given IP address holder be able to obtain or delete their lifetime Google data file? Just think, a potted history of all your searches, where you came from to get to Google, and linkings from those searches. Some of this information is apparently now available from Google, which has turned their observation of your searches into a product. What other uses might be made of that product? And if you can see it, who else can? If the CEO of Google is upset about the results of a standard Google search on his name, and is thereby induced to make a public relations gaffe, imagine the excitement induced by publication or a leak of the activities of his home IP address.

Google seeks to provide access to all the world's information, and "we mean ALL" said Google's CEO, so how about access to everyone's personal search history? See Mick Jagger's Google search and link history! See Karl Rove's! See your neighbor's! See your own! A new measure of Zeitgeist celebrity then results: the number of visits to see a celebrity's search history. And then who looked at whose search histories . . . in an endless echo of turning user activities into products. Why should we expect the only information that remains sacred and private to be that information collected by Google about its users?

What is the market value (say, to tabloid newspapers or to an extortionist) of seeing every search request and the resulting links made by [name of celebrity, name of politician, name of whoever here]? Or the uses in commercial espionage? What about lawyers seeking evidence for a trial? Or others? Imagine a divorce lawyer, a prosecutor, a stalker, a psychobiographer, a reporter in action.

How often do government security agencies request from Google or Microsoft Search the complete search and linkage history of a given IP address? How many people at Google or Microsoft devote full time to handling information requests from governments? Phone companies have always been tightly linked to government security agencies (to the point of swapping employees), what about search-engine companies?

Surely government spy agencies the world over have long been real busy with IP addresses and their histories. Now imagine that a White House political operative obtains some information from a spy agency and leaks an opponent's IP address history to an accommodating journalist in order to smear the opponent. Just a fantasy. Nowadays in politics, "opposition research" may well involve a Google search of the opponent's IP address. Or perhaps working with a rogue employee of a search-engine company to get the really good stuff.

How do we balance off tracking down malicious internet activities and evil internet users with maintaining the privacy of the personal histories of the non-malicious? How many false positives should be tolerated to obtain one true positive? How do the malicious mask their IP addresses?
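The false-positive question can be made concrete with a little base-rate arithmetic. The sketch below is only an illustration: the base rate, the detection rate, and the false-alarm rate are all hypothetical numbers chosen for the example, not measurements of any real system.

```python
def false_positives_per_true_positive(base_rate, sensitivity, false_positive_rate):
    """Expected number of innocent users flagged for each malicious user flagged."""
    true_positives = base_rate * sensitivity
    false_positives = (1.0 - base_rate) * false_positive_rate
    return false_positives / true_positives


# Hypothetical inputs: 1 in 100,000 users is malicious; the screen catches
# 99% of them and wrongly flags 1% of everyone else.
ratio = false_positives_per_true_positive(
    base_rate=1 / 100_000,
    sensitivity=0.99,
    false_positive_rate=0.01,
)
print(round(ratio))
```

With these made-up inputs, roughly a thousand non-malicious users are flagged for every malicious one found. The rarer the malicious user, the worse that ratio becomes, even for a screen that sounds accurate.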

For a given active IP address, what are the chances that the data are all jumbled up with multiple users, spoofs, noise, and the like? For a given user, what are the chances their search and resulting links will be tracked and archived via a distinctively identifying IP address?

Why would any organization mainly motivated by money-making ultimately defend privacy or defend any principle other than their ability to maximize stock price, income, market share, and profits? For me, a turning point on these matters came from (1) Google's recent activities described in the New York Times article, and (2) when Microsoft, as a matter they claimed was routine policy about "host countries", blocked the use of the words "freedom," "democracy," and similar naughty words from their MSN services in China at the request of the People's government.

Instead of being turned into a commodity, should IP address logs be erased by search engine companies? Should IPs fade away from all records after their immediate use? Could IP addresses have copyright protection kicking in after their immediate intended use? Would lawsuits regulate search engine company breaches of privacy? Lawsuits, individual or class actions, will work to get the attention of search engine companies, and some of them certainly do have deep pockets.

The other side of the coin is that every site you visit has logged, perhaps permanently, your IP address. Some websites even post their logs, which show the IP address of every visitor to that website. You might try a Google search on your own IP address (put the address number in quotes).

Perhaps the only rational response to all this is to agree with Scott McNealy's remark: "You have zero privacy anyway. Get over it."

The Silicon Valley Libertarians, who ferociously protect their own privacy, find that their philosophy is relevant only to themselves, not to others.

TWO STRAIGHTFORWARD PROPOSALS

(1) Should Google, on their frontpage, inform users that Google tracks and archives all user IP address activities and seeks to turn those activities into commodities? Should users be given a choice about whether to opt-in to Google tracking? (Opt-in is appropriate, opt-out is evasive corporate mischief.) What proportion of current Google users, without special incentives, would opt-in to Google's current monitoring and archiving? How about 1 in 20? What proportion of current Google users even know that search engines are monitoring and archiving all their IP address activities? 1 in 5,000?

Even better, in the competition among search engines, how about if one search engine did NOT monitor their users? Not spying on users would be a feature! In such a market, users could decide what they want. Since there are few relevant differences in the search quality of the various search engines these days, here's a chance for some real product differentiation.

(2) And should users routinely mask their IP addresses?

-- Edward Tufte

Response to Search engines and personal privacy

The government doesn't need to subpoena IP logs from Google and Yahoo. Echelon has been spying on electronic communications for some years.

-- Thaths

Response to Search engines and personal privacy

The Electronic Frontier Foundation is funding the development of a system called "Tor" to help people browse the web without revealing their IP address. The system is still considered alpha, but I use it daily, and it is quite stable. See http://tor.eff.org for more details.

They are actually holding a contest to get the best, most usable GUI (http://tor.eff.org/gui/ ), and I'm sure some of those who read and contribute here would have great ideas to put forth. As one of the judges for the contest, I'd love to have your participation.

Adam

-- Adam Shostack (email)

Response to Search engines and personal privacy

IP addresses are not reliable for tracking a user. Today's dialup Internet connections (including DSL) assign the IP number dynamically to users' systems, giving each logon a new IP number from a pool of numbers shared by all customers of a provider.
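A toy simulation shows why a per-IP log blurs users together when addresses come from a shared pool. Everything in it is an assumption for illustration: the customer names, the five documentation addresses, and the uniformly random assignment (real providers use lease-based schemes).

```python
import random


def simulate_logons(pool, users, logons, seed=0):
    """Return (user, ip) pairs, one per logon, drawing each IP from a shared pool."""
    rng = random.Random(seed)
    return [(rng.choice(users), rng.choice(pool)) for _ in range(logons)]


pool = [f"198.51.100.{n}" for n in range(1, 6)]   # hypothetical provider pool
users = ["alice", "bob", "carol", "dave"]          # hypothetical customers
log = simulate_logons(pool, users, logons=200)

# Group the log the way a server that only sees IP numbers would.
users_per_ip = {}
for user, ip in log:
    users_per_ip.setdefault(ip, set()).add(user)

for ip in sorted(users_per_ip):
    print(ip, sorted(users_per_ip[ip]))
```

In this sketch each address ends up associated with several customers, so a history keyed on the IP number alone describes no single person.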

IP addresses of users behind proxy servers are not directly visible to the server; HTTP header analysis is necessary. Systems like EFF's Tor, mentioned above, aim to hide the connection and its origin at the IP address level. IP addresses are only useful for reliably recognising permanently connected systems and for identifying the geographic region of a given system.

Users are tracked with the help of cookies, by Google, Amazon and all the others. This works with Tor, through firewalls, through proxies, and with dynamic IP addresses. As long as the browser and its set of stored cookies remain the same, a server operator is able to track what a user is doing, and is able to log it.

This works, by using clever methods, very well for the advertising industry. They set cookies with each advertisement, and are able to track a user visiting different sites, even of competing companies, if they use the same advertising service.
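The mechanism can be sketched in a few lines. This is a toy model, not any real advertising service: the `AdNetwork` class, the `adnet_id` cookie name, and the site names are all hypothetical, and a plain dictionary stands in for the browser's cookie store.

```python
import itertools


class AdNetwork:
    """Toy ad server: one cookie per browser, one log entry per ad request."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.log = []  # (cookie_id, site that embedded the ad)

    def serve_ad(self, browser_cookies, embedding_site):
        # Reuse the browser's existing cookie for this network, or set a new one.
        cookie_id = browser_cookies.get("adnet_id")
        if cookie_id is None:
            cookie_id = next(self._next_id)
            browser_cookies["adnet_id"] = cookie_id
        self.log.append((cookie_id, embedding_site))
        return cookie_id

    def history(self, cookie_id):
        """Every site on which this cookie was seen, in order."""
        return [site for cid, site in self.log if cid == cookie_id]


network = AdNetwork()
browser_a = {}  # cookie store of one browser
browser_b = {}  # cookie store of another browser

for site in ["news.example", "shop.example", "rival-shop.example"]:
    network.serve_ad(browser_a, site)
network.serve_ad(browser_b, "news.example")

print(network.history(browser_a["adnet_id"]))
```

The ad network never needs an IP address here. The cookie alone ties together visits to unrelated and even competing sites, for as long as the browser keeps it.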

Amazon makes the use of amazon.com and A9 cookies attractive by offering the PI/2 reward. This way the shopping and searching habits of the Amazon customer are linked with the A9 search habits of the same user. A9 identifies an amazon.com customer by reading the cookie set by amazon.com, and vice versa. As A9 cooperates with Google...

The 'possibilities' of this mechanism are endless; we have just started to scratch the surface. Cookies are helpful for maintaining sessions and _identifying_ users at a web shop, but could also be used for evil.

Hix

-- hix (email)

Response to Search engines and personal privacy

The above contribution makes it clear that cookies should be ruthlessly deleted from one's own computer.

Apple's Safari browser has a first-level menu item "Reset Safari," which in one click clears out cookies, cache, Google search entries, history, saved names and passwords, other autofill text, and download links.

-- Edward Tufte

Response to Search engines and personal privacy

People should not get the impression that routinely deleting all cookies, or even routinely emptying the browser's history, cache, etc. is adequate to prevent identifying information from leaking to web sites they visit.

Systems I have designed, including Tor, separate identification from routing by anonymizing the "pipe" data flows through. Sanitizing the data is a separate issue. Sometimes one wants to give various kinds of identifying information to the visited site, e.g., if it is your physician, your employer, etc.

If one wants to keep all identifying information from the responding site, then filtering or sanitizing cookies, blocking java and javascript, and other techniques offered by browser settings and proxies such as Privoxy will help. But they will also prevent many current Web sites from functioning. And, as ever more functionality is passed through the data stream, it becomes ever harder to preserve privacy unless the functionality is designed with security and privacy in mind from the start.

-- Paul Syverson (email)

Response to Search engines and personal privacy

Better than deleting cookies is not to store them in the first place. Most browsers have settings that will turn cookies off entirely; a compromise position usually allows you to reject cookies from third-party sites (this is when you browse to a certain site, but the cookie comes from a different site). Third-party cookies are often used for tracking purposes (Webtrends, Doubleclick, hitbox, etc.).

More advanced techniques allow you to turn cookies on and off for specific sites or specific hosts/domains, which allows you to avoid many tracking cookies while still using sites which rely on cookies to work properly (notably, on-line shopping).
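For a sense of what a per-domain cookie rule looks like in practice, here is a minimal sketch using Python's standard `http.cookiejar` module rather than any particular browser's settings. The domain names are hypothetical; a leading dot in a blocklist entry also matches subdomains.

```python
from http.cookiejar import CookieJar, DefaultCookiePolicy

# Hypothetical tracker domains. ".tracker.example" also matches its
# subdomains; "ads.example" matches only that exact host.
policy = DefaultCookiePolicy(blocked_domains=[".tracker.example", "ads.example"])

# A jar built with this policy neither stores nor returns cookies
# for the blocked domains.
jar = CookieJar(policy=policy)

for domain in ["pixel.tracker.example", "ads.example", "shop.example"]:
    verdict = "blocked" if policy.is_blocked(domain) else "accepted"
    print(domain, verdict)
```

The shop's own cookies still work, so the site keeps functioning, while the listed trackers get nothing to read back.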

Personal firewall software often can be configured to inspect traffic of "known" types (unencrypted web traffic, most often) and remove personal information. This technique is usually effective, if suboptimal. Sometimes, though, it removes information you really need to transmit to use the site in question.

-- Scott Zetlan (email)

Response to Search engines and personal privacy: not spying on users should be a feature

Regarding a security/privacy seal of approval: It's been tried (e.g. TRUSTe), and it failed. Like many such certifications (SPF and Sender-ID for email, as recent examples), they are readily adopted by the least trustworthy, and the value of the seal/brand becomes diluted and/or co-opted.

Additionally, privacy and security policies can change without notice and without repercussion.

-- Chris Palmer (email)

Response to Search engines and personal privacy: not spying on users should be a feature

Frederick Mosteller, the Harvard statistician, once made an interesting argument about interventions in complex situations. Given some half reasonable assumptions, the theory claims that the best strategy is to make small improvements on many variables rather than a big improvement on one variable. This theory involves the usually suspect assumption of approximate independence of causes, but, at any rate, the idea might be helpful here. The list of approaches described above might all be helpful: a combination of a sturdy clean browser, masked IPs, going after those who trade on collecting user behavior, hoping that search engine competition leads to an engine with less surveillance, legal action, more opting-in required for surveillance, resistance to surveillance, rewards for those websites avoiding surveillance archives, and so on.

There may be an inherent defect here, however, since search results are largely a byproduct of seeking opportunities to sell advertising and market the users of the search engine. Could marketing-free and surveillance-free search be financed by a user fee of $10 per year?

-- Edward Tufte

Response to Search engines and personal privacy: not spying on users should be a feature

Apropos to approaching this complex issue on multiple fronts: This is certainly being done already in many of the ways you describe and others as well, but there is a long way to go in all respects. I generally favor making the technology such that it is not necessary to motivate through markets, legislation, and courts. But those will always be important as well. For a more specific analog to your point see Richard Clayton's panel presentation from Financial Cryptography 2005 "Who'd phish from the summit of Kilimanjaro?" Abstract: www.cl.cam.ac.uk/~rnc1/kilimanjaro.pdf Slides: www.cl.cam.ac.uk/~rnc1/talks/050228-Kilimanjaro.pdf

The issue of financial incentives is especially tricky in anonymity and privacy. There are several papers cited on my homepage <http://www.syverson.org> discussing this. See also the important papers by Andrew Odlyzko on privacy and price discrimination, which also draw striking analogies between the internet and 19th C railroads. See also Alessandro Acquisti's work showing the effects of myopia and immediate gratification on the rationality of privacy decision making.

-- Paul Syverson (email)

Response to Search engines and personal privacy: not spying on users should be a feature

See John C. Dvorak: http://www.pcmag.com/article2/0,1759,1853324,00.asp?

-- Edward Tufte

Response to Search engines and personal privacy: not spying on users should be a feature

At last, phone records and customer billing records are sold online:

http://informationweek.com/story/showArticle.jhtml?articleID=170101910

The first Grand Principle of surveillance: If someone collects the information, someone will sell it.

-- Edward Tufte

Response to Search engines and personal privacy: not spying on users should be a feature

The costs of computing and storage are sufficiently low, and continue to decrease so rapidly, and the costs of searching storage so low and the algorithms for searching and analyzing it improving so rapidly, that you can't expect information to be erased, lost, hidden, or even undiscovered, unless you make a concerted effort to do so (or unless it's information you actually wanted and Murphy's Law overrode Moore's Law for you.) David Brin's book "The Transparent Society" argues that this is also happening to video cameras and video recording. Even ephemeral networks tend to become recorded and searchable - some of my rants on Usenet in the early 1980s are still around.

The Cypherpunks movement spent a lot of time looking at how to protect privacy and civil liberties in this environment, and while it's difficult to implement on a large scale without popular perception of the issues, there are technologies that can help. Some of these became businesses back during the Internet Boom - the www.anonymizer.com web proxy service is one of the best known examples that's still in business, and it's particularly well-suited to letting you use search engines without disclosing personal information, and there are half a dozen other proxy services in business. Email remailers perform similar functions for email. Decentralized solutions can provide much stronger privacy, but they require either enough popular support for free networks to work, or else successful business models for commercial services, most of which failed back during the boom.

An unfortunate discovery has been that technologies which are good at protecting privacy are extremely useful to spammers. Unlike the civil-liberties supporters, who need technology that's good enough to protect their peace-movement groups from FBI infiltration or democracy advocates in Burma from military government spies, spammers can get by with relatively weak protection, because they're mostly protecting disposable zombies from low-intensity attackers. And the techniques used to deter spammers - such as user-registration with image recognition to deter spammer robots from signing up, or credit-card verification for user accounts, or email handshakes - tend to increase the amount of tracing information that's available to search engines. On the other hand, even relatively weak privacy protection, such as web anonymizers, can be a start in reducing the information that's easily available from search engine logs.

-- Bill Stewart (email)

Response to Search engines and personal privacy: not spying on users should be a feature

The Dvorak article nicely illustrates some issues that arise as more functionality is added to more devices in more contexts. It is worth asking whether some of this functionality is actually needed; however, he misplaces the problem. It is not the proliferation of addresses or of devices with addresses per se. Rather it is the conflation of identities, names, addresses, routes, and authenticators. For example, using IP addresses as authenticators of valid users is a now common instance of function creep. This misuse is a handy quick fix, but it facilitates a longterm problem. It is a common assumption that all increases in functionality require a concomitant gathering of security and/or privacy reducing information. But, many of us have devised technologies to facilitate secure and private communication, transactions, etc. Much more is possible than is generally understood. Gus Simmons is a polymath who among other things invented a sensor that makes the airbag go off in your car and a device to test for water on Mars that now resides in the Smithsonian. He once said of cryptography that it has more I'll-be-damned's in it than any other field he had worked in.

As someone who has been designing and deploying private and anonymous communication systems for over a decade I find Bill Stewart's comments on decentralized systems for communication privacy unnecessarily pessimistic. Our latest version of Onion Routing, Tor, has been deployed for two years and has been steadily gaining users and servers. Checking just now, there are over 250 servers on five continents and the network continuously processes about 15 megabytes/sec of traffic. We don't know how many users there are because the system hides that even from us, but a reasonable estimate puts it in the tens of thousands. This is a volunteer system; while not commercial, a fair amount of economic and social analysis has gone into its "business model". The JAP service developed at TU Dresden has a different approach to diffusing trust, so does not have as many servers as Tor, but it also has thousands of users.

The comments about the relation between privacy and spam especially require a response. Systems like Tor or the Mixmaster and Mixminion remailer networks are not useful for spammers. They would be very inefficient ways to distribute spam. In fact, even though it would not be useful for spam, nearly all Tor servers block any email (i.e., block exiting to Port 25 for the technorati) just to avoid the ubiquitous misperception of spam as facilitated by anonymity technology. As Bill says, the primary source of spam is a worldwide market of disposable zombies (botnets) numbering at least in the hundreds of thousands. The spammers don't need anonymous communication networks. More significantly, anonymizing the communication (i.e., the pipe/wire/circuit data passes over) is not at all incompatible with authentication techniques. They are orthogonal issues. While the inability to keep the personally identifiable information (PII) in databases out of range of even simple googlehacks is troubling, anonymizing communication helps protect this information. For example, if it is difficult for a network observer to see what financial institution I make encrypted connections to online and when, then it will be more difficult for them to construct targeted phishing attacks or to otherwise attack my financial and personal data. The insider threat is still probably larger at the moment, but this is still one advantage. To reduce the insider threat one could use various types of anonymous, pseudonymous, or only ephemerally stored credentials, private information retrieval, etc. But that is a whole 'nother very large topic.

-- Paul Syverson (email)

Response to Search engines and personal privacy: not spying on users should be a feature

No wonder the colors have been a little bit off!

Tracking Code Discovered in Color Printers Mike Musgrove Washington Post Wednesday, October 19, 2005

"It sounds like a conspiracy theory, but it isn't. The pages coming out of your color printer may contain hidden information that could be used to track you down if you ever cross the U.S. government.

Last year, an article in PC World magazine pointed out that printouts from many color laser printers contained yellow dots scattered across the page, viewable only with a special kind of flashlight. The article quoted a senior researcher at Xerox Corp. as saying the dots contain information useful to law-enforcement authorities, a secret digital "license tag" for tracking down criminals.

The content of the coded information was supposed to be a secret, available only to agencies looking for counterfeiters who use color printers.

Now, the secret is out.

Yesterday, the Electronic Frontier Foundation, a San Francisco consumer privacy group, said it had cracked the code used in a widely used line of Xerox printers, an invisible bar code of sorts that contains the serial number of the printer as well as the date and time a document was printed.

With the Xerox printers, the information appears as a pattern of yellow dots, each only a millimeter wide and visible only with a magnifying glass and a blue light.

The EFF said it has identified similar coding on pages printed from nearly every major printer manufacturer, including Hewlett-Packard Co., though its team has so far cracked the codes for only one type of Xerox printer.

The U.S. Secret Service acknowledged yesterday that the markings, which are not visible to the human eye, are there, but it played down the use for invading privacy.

"It's strictly a countermeasure to prevent illegal activity specific to counterfeiting," agency spokesman Eric Zahren said. "It's to protect our currency and to protect people's hard- earned money."

It's unclear whether the yellow-dot codes have ever been used to make an arrest. And no one would say how long the codes have been in use. But Seth Schoen, the EFF technologist who led the organization's research, said he had seen the coding on documents produced by printers that were at least 10 years old.

"It seems like someone in the government has managed to have a lot of influence in printing technology," he said.

Xerox spokesman Bill McKee confirmed the existence of the hidden codes, but he said the company was simply assisting an agency that asked for help. McKee said the program was part of a cooperation with government agencies, competing manufacturers and a "consortium of banks," but would not provide further details. HP said in a statement that it is involved in anti-counterfeiting measures and supports the cooperation between the printer industry and those who are working to reduce counterfeiting.

Schoen said that the existence of the encoded information could be a threat to people who live in repressive governments or those who have a legitimate need for privacy. It reminds him, he said, of a program the Soviet Union once had in place to record sample typewriter printouts in hopes of tracking the origins of underground, self-published literature.

"It's disturbing that something on this scale, with so many privacy implications, happened with such a tiny amount of publicity," Schoen said.

And it's not as if the information is encrypted in a highly secure fashion, Schoen said. The EFF spent months collecting samples from printers around the world and then handed them off to an intern, who came back with the results in about a week.

"We were able to break this code very rapidly," Schoen said."

-- Edward Tufte

Response to Search engines and personal privacy: not spying on users should be a feature

See http://www.concurringopinions.com/archives/2005/11/the_google_empi.html

-- Edward Tufte

Response to Search engines and personal privacy: not spying on users should be a feature

For a try at GoogleAnon, see

http://mamamusings.net/archives/2005/11/21/google_and_anonymity.php

Also the Apple Safari browser on OS10 has a helpful "Reset Safari" which clears out all at once cookies, browsing history, the cache, the downloads window, Google search entries, and autofill passwords.

-- Edward Tufte

Response to Search engines and personal privacy: not spying on users should be a feature

Forget your cell phone calls? For $110 to $170, you can get a list of all your cell phone calls for a month. So can anyone else.

Another example of my Keep-It-and-Leak-It Hypothesis: if user information is stored beyond the time of the initial transaction, that information will be eventually leaked, sold, or turned over to the government.

http://www.suntimes.com/output/news/cst-nws-privacy05.html

An excerpt:

YOUR CELL PHONE RECORDS ARE FOR SALE By Frank Main, Crime Reporter, Chicago Sun-Times, January 5, 2006

The Chicago Police Department is warning officers their cell phone records are available to anyone -- for a price. Dozens of online services are selling lists of cell phone calls, raising security concerns among law enforcement and privacy experts. Criminals can use such records to expose a government informant who regularly calls a law enforcement official. Suspicious spouses can see if their husband or wife is calling a certain someone a bit too often. And employers can check whether a worker is regularly calling a psychologist -- or a competing company . . . In some cases, telephone company insiders secretly sell customers' phone-call lists to online brokers, despite strict telephone company rules against such deals.

-- Edward Tufte

Response to Not spying on users should be a feature: The keep-it/leak-it hypothesis

MSN, Yahoo, AOL give the Feds internet search results; Google says no. Here are two accounts:

The Washington Post:

http://www.washingtonpost.com/wp-dyn/content/article/2006/01/19/AR2006011903331.html

The New York Times:

http://nytimes.com/2006/01/20/technology/20google.html?

This is a data mining adventure by Feds seeking to hack search engine data.

Many such data dumps from MSN, Yahoo, Google to the Feds have already surely taken place under the Patriot Act--the only uncertainty is what power of ten the number of data dumps to the Feds has been. Those dumps to the Feds cannot be revealed by the search engine companies, under the 1984-style provisions of the Patriot Act. That the government seizures of user information are substantial is suggested by the government's looseness in defining "a credible terrorist threat," a term applied to a trivial anti-war demonstration at the University of California at Santa Cruz. Would this justify getting the IP number and the Google search record of everyone who searched on "UCSC demo" or similar?

-- Edward Tufte

Here's a Wired piece about search engine snoops. I was on a little committee that helped evaluate new interfaces for Tor, an anonymizer described in the article on the second page.

http://www.wired.com/news/technology/0,70051-1.html?tw=wn_story_page_next1

-- Edward Tufte

What seems to be under discussion here is the notion of who can record our attention. The organization AttentionTrust has been working to construct a specific set of principles around attention ownership and attention management.

Check out their principles at http://www.attentiontrust.org/about#principles.

-- Tim Chambers (email)

Keeping Secrets A simple prescription for keeping Google's records out of government hands. By Tim Wu

http://www.slate.com/id/2134670/

-- ET

Deep in a NYTimes (2006 January 26) story by Adam Liptak:

"According to a 2004 decision of a federal court in Virginia, America Online alone responds to about 1,000 criminal warrants each month. AOL, Google and other Internet companies also receive subpoenas in divorce, libel, fraud and other types of civil cases. With limited exceptions, they are required by law to comply"

That was 2004. Also it presumably does not include all the secret requests through the Patriot Act, requests which are likely to be sweeping in breadth. Or evoked by automatic pilot where certain searches go straight to the government; consider "aljazeera.net" for example.

As a first guess, 250,000 to 500,000 individual user files will be released by software houses in 2006. That would average out to be about 1 of every 400 US households. The error limits are, say, between 1 in 200 and 1 in 1000 households.
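The households arithmetic can be checked with a few lines. The household count below is a rough assumption supplied for illustration (the post does not state one), and the sketch treats every released file as belonging to a different household.

```python
US_HOUSEHOLDS = 110_000_000  # rough assumption; not a figure given in the post


def one_in_n(released_files, households=US_HOUSEHOLDS):
    """Households per released user file, i.e. the N in '1 in N'."""
    return households / released_files


low = one_in_n(500_000)   # more files released gives a smaller N
high = one_in_n(250_000)  # fewer files released gives a larger N
print(f"1 in {low:.0f} to 1 in {high:.0f} households")
```

Under that assumption the guess works out to between 1 in 220 and 1 in 440 households, which is in line with "about 1 of every 400" and falls inside the stated error limits of 1 in 200 to 1 in 1000.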

Perhaps someone more knowledgeable could refine the numbers. It is difficult to know what counts as a released piece of information.

Data-mining is a strategy when you don't have any ideas; garbage in, garbage out. There is here also a larger government strategy of intimidation, to make people real careful about what they say, what they search, what they read.

On this matter, the software moguls are way over their heads, their skills, and ultimately, their values, for nearly every decision they make will be a what's-good-for-business decision. They are unable to make any kind of privacy guarantee to their customers and users. That will be the case as long as they continue to retain information about their users and customers. Why doesn't someone make not spying on users a feature?

http://nytimes.com/2006/01/26/technology/26privacy.html?hp&ex=1138338000&en=166d09f833a74d52&ei=5094&partner=homepage

-- ET

Here's a collection of things I've looked into over the years that get to the nitty gritty technical concepts involved.

First, the privacy policy of an organization that thinks about this issue a great deal: The Electronic Frontier Foundation's privacy policy.

IP address tracers are readily available free services on the net and will generally lead the investigator to an internet service provider. Providers fundamentally have to track IP addresses and associate them with the people who are paying the bills. If the government is interested in an IP address, it can subpoena the billing records from the internet service provider and then send an agent to the physical address to pick up the person.

If one really wants to get into the weeds, some search phrases to start with are border gateway protocol and root server.

One may say "Why can't I just use a whois search to find the registrar of any IP address?" Well, not any more. As many more machines continue to be added to the internet, the original global routing table scheme filled up. Most ISPs now control the delivery of packets to their subscribers through randomly assigned IP addresses that they register en bloc. This saves on registration costs (fewer than all subscribers have machines online at any given time), slows growth of the global routing tables, and does reduce the odds that too much information will be associated with one IP address. It also makes it harder to trace attacks, but it doesn't make it harder for governments to issue subpoenas.

Cookies, which the search engines also use, are a different story. Philip Greenspun has an excellent write-up on the spying potential of cookies (scroll down to the napkin drawing).

While resistance to a subpoena is probably argued on the assumption that the matter will end up in court (otherwise, why the subpoena?), merely delivering a subpoena can be very coercive. Many people and businesses would decide it is in their best interest to cooperate, rather than spend time and money resolving the issue.

The big picture remains the same: if the information is recorded, the government can get it unless it's privileged communication, that is, the witness's relationship to the client would have to be spousal, attorney-client, clergy-parishioner, or psychotherapist-client. Even these few privileges only come into play in court, and only bear on what is actually admitted as evidence. Nothing prevents the government from using the subpoenaed information for something else, once the information is in hand.

-- Niels Olson (email)

Permalink to the above New York Times story on Google.

-- Niels Olson (email)

The report referenced by ET on 26 January is interesting, and, of course, I agree with the general tenor of the discussion. Nonetheless, I wonder if at least part of the problem is a purely technical one of how Google handles badly spelled queries. Take the following note, for example, which appears on the page in question:

This Web site appears when searching for "Reporters Without Borders" but not for its French name, "Reporters Sans Frontiers".

The problem is that that is not its French name, which is "Reporters Sans Frontières." In fact the usual Google search that I use (not via China) does indeed find the right page and lists it first if I search for "Reporters Sans Frontiers" (including the quotes in the search), but I could forgive it if it did not, because in fact the phrase "Reporters Sans Frontiers" occurs nowhere on the page, not even hidden in a <meta> element. So clearly the ordinary Google does use a bit of intelligence to correct frequent errors (I'm talking about the missing e at the end, not the missing accent), but maybe the Chinese version has not yet reached the stage where it can do that.

None of this undermines the main point that ET and the news story are making, but it does remind us that we should not be too quick to attribute to ill will problems that could also be explained by technical difficulties or (though probably not in this case) stupidity.

-- Athel Cornish-Bowden (email)

From ET:

The first Grand Principle of surveillance: If someone collects the information, someone will sell it.

Not always. The European Union has strict data protection laws making it illegal (or at least finable) to harvest and sell personal information. As a result, I get almost no junk mail in England, perhaps two pieces every week. In the land of the free I get maybe 5-10 per day. Of the last ten years I've lived seven in England and three in freedom, so the ratio should be closer to the reverse, and it would be without the Data Protection Act.

Of course, if the state itself feels threatened, no data protection laws will protect the citizen. The European Union recently, and outrageously, decided to violate its own law and allow its airlines to hand over passengers' personal data (credit card numbers, date of birth, meal preferences, religion, ...) to US immigration authorities.

-- Sanjoy Mahajan (email)

Thought you might find this interesting: IRS plans to allow preparers to sell data

-- Cameroon (email)

IRS To Allow Preparers to Sell Tax Return Data

The text of the Federal Register entry cited in the article above. The hearing on this will be held on 4 April. The URL of the link above looks squirrelly to me, and I don't know how long it will survive, so here's the link to the Federal Register's table of contents for 8 December 2005. Scroll down to Internal Revenue Service - Procedure and Administration.

-- Cameroon (email)

From the fifth paragraph of Background in the Federal Register entry:

these proposed regulations allow tax return preparers to obtain consents to use tax return information for solicitation of services or facilities furnished by any person

H&R Block will be able to sell your information to eBay, Yahoo, possibly the Chinese government if it can conjure up a sham business that purports to sell a product or service to Americans. Note the information available for sale is not limited to name and address. Presumably eBay will be able to target customers based on income, distribution of assets, etc. Will preparers do it? Is there some market pressure resisting it? Not that I know of.

In addition, the preparers may only need implied consent. I'm not really savvy enough with tax law to know which phrases in here are terms of art. So it may be that if H&R Block even thinks you're okay with the sale of your tax return to the Chinese government, then they might be able to do it legally. If I were working at H&R Block and looking for a promotion, I might be thinking really hard about how to write the consent document as obtusely as possible, and embed as much of it as possible in the use-of-service contract.

-- Cameroon (email)

If you think spying is bad, what about paying folks to sell out their business or personal contacts? Already there is a good deal of outcry on the net over this news posting.

http://www.techcrunch.com/2006/03/23/jigsaw-is-a-really-really-bad-idea/

-- Don (email)

An astonishing report about the daily monitoring by Microsoft of Windows users:

http://www.groklaw.net/article.php?story=20060608002958907

-- Edward Tufte

It seems Google wants to literally "listen in" too! Beginning with monitoring your television.

http://news.bbc.co.uk/2/hi/technology/5084870.stm

-- Daniel Meatte (email)

See "Google to adopt new privacy measures:"

http://www.huffingtonpost.com/huff-wires/20070314/google-privacy

Why keep track of searches at all?

For marketing and usage analysis, can searches be analyzed live--as each search arrives-- and then the identifying IP discarded?

-- Edward Tufte
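The question above, whether searches can be analyzed as they arrive and the identifying IP then discarded, can be sketched roughly as follows. This is a minimal illustration, not a description of how any search engine works; `SearchEvent` and `aggregate_live` are hypothetical names introduced for the example.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class SearchEvent(NamedTuple):
    """One incoming search (hypothetical record layout)."""
    ip: str
    query: str


def aggregate_live(events: Iterable[SearchEvent]) -> Counter:
    """Update aggregate term counts as each search arrives.

    The IP address is available while the event is handled but is
    never written into the returned statistics.
    """
    term_counts: Counter = Counter()
    for event in events:
        for term in event.query.lower().split():
            term_counts[term] += 1
        # event.ip goes out of scope here; nothing retains it.
    return term_counts


stream = [
    SearchEvent("10.0.0.1", "Tufte sparklines"),
    SearchEvent("10.0.0.2", "tufte books"),
]
counts = aggregate_live(stream)
```

Only the term counts survive the loop. Query text can itself identify a person, so dropping the IP address is a necessary step rather than a sufficient one.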

Update, March 22, 2007

It appears that Google has not in fact improved its user surveillance policies. See Victoria Shannon, "Google's Privacy Policy is Clearer, Not Tougher."

http://www.nytimes.com/iht/2007/03/22/technology/IHT-22ptend22.html

-- Edward Tufte

Spying on road users!

http://www.pipstechnology.com/alpr/demonstrations/

Click on the bottom video "PAGIS Real Life Application."

-- Andrew Nicholls (email)

Readers of this board may be familiar with some of the recent court cases that have targeted the long-term holding of search histories by the search engines. In a novel development, Google is apparently seeking the cooperation of their users in tracking not only their search history, but their entire Web History.

-- Niels Olson (email)

Bringing together John Snow and the recasting of privacy as a feature:

Brownstein J. S., Cassa C. A., Mandl K. D. No Place to Hide -- Reverse Identification of Patients from Published Maps. N Engl J Med 2006; 355:1741-1742, Oct 19, 2006

-- Niels Olson (email)

In Five must-have security/privacy extensions for Firefox, Chris Soghoian points out some of the leading concerns for today's web in a readable way and even references a couple of primary sources. The extensions he cites are intended for Firefox users but no browser is safe from these attacks as shipped.

-- Niels Olson (email)

In A Story of Surveillance, Washington Post reporter Ellen Nakashima writes about Mark Klein, the AT&T tech that the Electronic Frontier Foundation is bringing to Washington to speak against telecom immunity for participating in the NSA Internet collection program.

Hear NPR's Robert Siegel interview Mr Klein here.

Here are some YouTube videos Google found of Mr Klein.

-- Niels Olson (email)

Robert Vamosi reports for CNET on a feature for Firefox 3 that isn't ready yet and didn't ship with the new release: Private Browsing.

The feature, Private Browsing, would have disabled all caching, cookie downloads, history records, and form data used during the current session. In essence, you could surf the Web and leave no fingerprints.
"It basically said to the browser: I would like what I'm about to do to not be logged anywhere," said Johnathan Nightingale, Mozilla's "human shield," aka its security user interface designer.
He described the private browsing process as this: you hit a button and everything past that point isn't logged. Then, at some point in the future, you hit the button again and it's as though what you just did never happened.

This would not replace the Five must-have security/privacy extensions for Firefox that Chris Soghoian reported on, but could work with them quite nicely.

-- Niels Olson (email)

Some former Google search engineers have introduced their own search engine, Cuil. Their privacy policy seems to be pretty user-conscious.

-- Niels Olson (email)

Permit Cookies is an extension for Firefox that I've been using for some time now. It allows you to set a default-deny policy for HTTP cookies and then selectively enable them for specific sites. Not only does it satisfy the paranoid, it's quite effective as a litmus test for Web development acumen. Many sites assume users have cookies enabled — and that's not counting those that actually make use of sessions and/or chide cookie-deniers for their transgressions. Needless to say, interesting things occur.

Although I've found it useful overall, it goes without saying that my browsing experience is significantly compromised, especially when paired with NoScript. And still, there is room for significant improvement, such as:

Filtering out cookies on a per-key basis (advertising/tracking products tend to use the same cookie keys)
Denying attempts to set or retrieve cookies through embedded resources, so that only the page in the location bar has access to do so (e.g. block cookies for requests to resources in img tags, object, embed, iframe, link, script etc.)
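A default-deny policy with the two improvements listed above might look roughly like this. It is a sketch of the idea only, not the Permit Cookies extension's actual logic; `CookiePolicy`, its fields, and the sample host and cookie names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CookiePolicy:
    """Default-deny cookie policy (illustrative, not the extension's code)."""
    allowed_hosts: set = field(default_factory=set)  # sites explicitly enabled
    blocked_keys: set = field(default_factory=set)   # cookie names always refused

    def permits(self, host: str, cookie_name: str, is_top_level: bool) -> bool:
        # Embedded resources (img, iframe, script, ...) never get cookie access.
        if not is_top_level:
            return False
        # Default deny: only hosts the user has enabled may use cookies.
        if host not in self.allowed_hosts:
            return False
        # Per-key filtering: refuse known tracking keys even on allowed hosts.
        if cookie_name in self.blocked_keys:
            return False
        return True


policy = CookiePolicy(allowed_hosts={"example.org"}, blocked_keys={"__utma"})
```

With this policy, `policy.permits("example.org", "session", True)` is allowed, while the same cookie requested through an embedded resource, from a host that was never enabled, or under a blocked key is refused.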

I suspect as well that when enough people get significantly bitten by the (lack-of) privacy bug, there will be more demand for information exchange auditing tools, and more pressure on organizations to cooperate.

As Bruce Schneier and others are wont to say, such information is a natural byproduct that is generated as a result of other activities, and it costs almost nothing to keep it — all of it — around. When that concept hits the mainstream, I'm sure there will be more focus on privacy all around.

Ultimately, however, I don't expect any company to make good on any promise of throwing such potentially valuable information in the trash, unless it was somehow worth more to them to do so.

-- Dorian Taylor (email)

Email scams

I know there was discussion somewhere on this forum some time ago about having email addresses hanging out freely on the web, in places such as this forum. So a change was made in how email addresses were stored, so as to prevent bots from running around grabbing email addresses and sending all the contributors all manner of email junk.

But this morning I received the following email:

Subject: Hospital (sent through edwardtufte.com)
From: jcloud003@yahoo.se [jcloud003@yahoo.se]
To:  Donahue, Rafe 

Dearest One

I am Mr Jean Cloud, born 18 January 1936, hailed from Canada. I am
suffering from Throat Cancer and i am suffering very bitterly as i
write you this message. My doctor just informed me that my days are
counted considering my health status. My matrimonial situation is that
i lost my wife and Two kids in Motor accident on 25 december 2007, So I
have neither a wife nor children now whom i can WILL my heritage.  This
is why i wanted a gracious way and a way to help the less-privileged,
wished to give you this very heritage amounting the sum of Four
Millions Canadian Dollars for the Charity. I would want to have your
full names, your telephone and fax number while responding to this
message. I count on your sincere willingness to employ this fund in a
Godly manner. I will wait to get your news soon, accept my cordial
salutations.

Mr Jean Cloud.

So, either someone is manually trolling through ET-land and sending us emails or they have figured ways around the precautions put in place. There may be other possibilities but I don't know what they are.

Regardless, I thought that someone at ET's site might want to know that this is going on. Has anyone else gotten this scam for this site?

Rafe

ps. I looked at the Cancer Survival Rates thread and found out that "Throat Cancer" is not on the list. If Mr Cloud is translating thyroid as throat, he might need a second opinion; however, if he is translating esophagus as throat, I get first dibs on his four million Canadian dollars for my favorite charity...

-- rafe donahue (email)

I received the above email overnight (13/14 Aug) as well.

-- Andrew Nicholls (email)

Email spamming of board

Our webmaster reports about the spamming of the board's contributors:

"The good news is that we're not giving out anyone's email address. A spammer can send email through the web form, but he doesn't see the address it's going to.

From going through the log, it appears that this person was manually sending emails by clicking on the email link and filling out the web form repeatedly. This takes a little time to do, and in fact it looks like it was taking nearly 1 minute per email to send. I see records for 69 emails in 51 minutes.

We have some code in place that prevents someone from sending too many emails in a short period of time, but he was sending them slowly enough to not trigger this feature. The limit was 5 emails in 90 seconds. I'll decrease that to 5 emails in 5 minutes.

Even with this change, it will allow anyone to send 5 free emails before we stop them. If this is still a problem, we could change the mail system to delay sending each email for 5 minutes; if someone queues up more than 5 emails in that 5 minutes, we can discard them instead of sending them out."

-- Edward Tufte
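The webmaster's rate limit can be illustrated with a sliding-window sketch. This is not the site's code: `SlidingWindowLimiter` and `count_blocked` are hypothetical names, and the send times assume the 69 emails were spaced evenly over the 51 minutes, which the log summary does not state.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_events sends within any window_seconds span."""

    def __init__(self, max_events: int, window_seconds: float):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of sends that were allowed

    def allow(self, now: float) -> bool:
        # Forget sends that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_events:
            return False  # window is full: refuse this send
        self._timestamps.append(now)
        return True


def count_blocked(limiter: SlidingWindowLimiter, send_times: list) -> int:
    """Count how many of the attempted sends the limiter refuses."""
    return sum(not limiter.allow(t) for t in send_times)


# Assumption: 69 emails evenly spaced over 51 minutes, about 44 seconds apart.
send_times = [i * (51 * 60 / 69) for i in range(69)]

old_blocked = count_blocked(SlidingWindowLimiter(5, 90), send_times)   # 5 per 90 s
new_blocked = count_blocked(SlidingWindowLimiter(5, 300), send_times)  # 5 per 5 min
```

At that pace no more than three sends ever fall inside a 90-second window, so the old limit never triggers. The 5-minute window fills after five sends and begins refusing some of the rest, though the first five still go through, as the webmaster notes.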

Response to (Recent board spam) and Not spying on users should be a feature: The keep-it-and-lose-it hypothesis

I've gotten a few of these from your site over the years, maybe five to ten. From what I've seen of the site's code, I would be inclined to agree with the webmaster: it's very, very unlikely that someone has gotten access to the email addresses. If you're looking for a spam-resistant content management system, ArsDigita is probably pretty good. Now, the fact that it's written in Tcl, well, that's a different story . . .

-- Niels Olson (email)

While many people are rightfully concerned about Google's privacy policies with regard to how long they keep data around, this is certainly news: Google plans to renegotiate its position in China after finding that China has systematically attacked Gmail, seeking the information of dissidents.

-- Niels Olson (email)

In an expectedly insightful article, Bruce Schneier introduces a useful talking point: "...it's bad civic hygiene to build technologies that could someday be used to facilitate a police state."

-- Niels Olson (email)

Two search engines with good privacy policies:

- StartPage (aka Ixquick): http://startpage.com

- Clusty: http://clusty.com

StartPage doesn't even record IP addresses:

"Startpage and its EU brand Ixquick has long been among the search engine industry's leaders when it comes to protecting your privacy. These services do not record the IP addresses of legitimate users and abolished the use of Unique ID cookies already in 2006. [...]"

(Source: Surf the web anonymously w/Startpage.com: http://www.pandia.com/sew/2535-surf-the-web-anonymously.html)

-- John Galada (email)

Here's a nice FOIA form letter creator for your data from the FBI, its field offices, the NSA, DSA, DIA, CIA, US Marshals, Secret Service, and Army CID. I've heard Facebook provides a similar service in those jurisdictions where it is required by law.

http://www.getmyfbifile.com/form.php

-- Niels Olson (email)

DuckDuckGo.com prides itself on its policy of not tracking or bubbling its users.

Here's their privacy policy.

I'm not affiliated with DuckDuckGo.

-- Chris Pudney (email)
