Not everyone knows what quote marks mean in a search entry

From The New York Times, February 12, 2006, article by Randall Stross on Amazon:

"Something in Amazon's secretive investor relations office has wafted through the air into its customer service department. The company has long provided a toll-free number for customers who want to speak with a human representative to solve a problem. Yet it does not mention the number on any help page on its site. If this guess-our-phone-number game is intended to curtail calls and to keep associated costs low, it seems unlikely to thwart anyone. A search for "Amazon 800 number" takes Google's servers an eighth of a second to produce 3.3 million results."

Putting exact-phrase quotation marks into this search yields a grand total of 760 results for the phrase, a number which differs from 3.3 million. But the whole idea in the quoted paragraph is wrong, since the number of results has little to do with thwarting a search: in this case there is no practical difference in the ease of finding the Amazon 800 number whether hundreds or billions of pages are returned.

No doubt everyone here knows that searching AMAZON 800 NUMBER produces all those pages containing the words AMAZON and 800 and NUMBER. Searching "AMAZON 800 NUMBER" returns all those pages that contain the exact phrase AMAZON 800 NUMBER.
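The distinction can be sketched in a few lines of Python. This is only an illustration of the two matching rules described above, not how Google actually works: the function names (`tokenize`, `matches_all_words`, `matches_exact_phrase`) and the three sample pages are made up for the example, and a real search engine adds stemming, stop words, ranking, and much else.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_all_words(query, page):
    """Unquoted search: every query word appears somewhere on the page."""
    page_words = set(tokenize(page))
    return all(word in page_words for word in tokenize(query))

def matches_exact_phrase(query, page):
    """Quoted search: the query words appear adjacent and in order."""
    q = tokenize(query)
    p = tokenize(page)
    n = len(q)
    return n > 0 and any(p[i:i + n] == q for i in range(len(p) - n + 1))

# Made-up pages, purely for illustration.
pages = [
    "Call the Amazon 800 number for customer service.",
    "Amazon shipped 800 books; the tracking number is below.",
    "A page about something else entirely.",
]

query = "Amazon 800 number"
unquoted = sum(matches_all_words(query, p) for p in pages)
quoted = sum(matches_exact_phrase(query, p) for p in pages)
print(f"{query}: {unquoted} pages with all words, {quoted} with the exact phrase")
```

Here two of the three sample pages contain all three words, but only one contains them as a phrase.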

I once was talking to web designer A who said that he was more famous than some other web designer B. Designer A claimed some vast number of results in Google, far more than the wretched B. The number seemed large, so I asked A if he put quote marks around his name when ego-surfing. "No, that makes B higher than me." I almost blurted out something about designer arithmetic, but instead bit my tongue.

What is the biggest possible difference in results for words in quotes and not in quotes? Here's a guess. The exact phrase "the and" yields 0 documents. Without quotes, the phrase yields 8,200,000,000. This will hold at least until Google spiders this thread.

-- Edward Tufte

Different Google servers return different numbers of documents. "the and" just returned 0 and then, 10 minutes later, probably hitting another server or an update, 288,000, then 270,000, then 227,000, 300,000, 278,000, 209,000, 305,000. And then a few more minutes later 0 (zero) again.

According to our logs, the googlebot spider did stop by as usual last night and picked up this thread. Perhaps a few of the "the and" returns were newly generated by this thread.

Remember, however, that the puzzle is about the maximum difference between quote-marked and quote-unmarked results, and not that the quote-marked result is zero. And that the point of the puzzle is to highlight the importance of using exact phrase quotes in your search requests.

Regular use of exact phrase quotes will also simplify the surveillance and review of your searches by the Feds, Google, the Chinese government, stalkers, and your opponent's lawyers.

-- ET

formatting query strings

As for the typographical problem of formatting queries, Matt Cutts' blog mentions Google's in-house formatting practice: [ and ] to mark the beginning and end of queries. So in this example [the and] returns 8.2 billion results, and ["the and"] returns, er, a roll of the dice.

"In-house practice" pretty much equals "jargon", but the square-bracket notation is reasonably concise and, after the briefest explanation, entirely unambiguous -- obviating any need to ever again say "without the quotation marks", especially since Google ignores the brackets when parsing its search terms.
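As a small sketch of how the bracket notation separates the query from the surrounding prose, the following Python wraps and unwraps a query. The helper names (`format_query`, `parse_query`, `is_exact_phrase`) are hypothetical, invented for this example; nothing here is Google's code. The one design choice worth noting is that exactly one outer pair of brackets is stripped, so whatever sits inside, quote marks included, is kept verbatim.

```python
def format_query(query):
    """Wrap a query in square brackets to mark where it starts and ends."""
    return f"[{query}]"

def parse_query(marked):
    """Strip exactly one outer pair of brackets, leaving the query verbatim."""
    if len(marked) < 2 or marked[0] != "[" or marked[-1] != "]":
        raise ValueError(f"not a bracketed query: {marked!r}")
    return marked[1:-1]

def is_exact_phrase(query):
    """True when the whole query is wrapped in double quotes."""
    return len(query) >= 2 and query[0] == '"' and query[-1] == '"'

for marked in ['[the and]', '["the and"]']:
    query = parse_query(marked)
    kind = "exact phrase" if is_exact_phrase(query) else "all words"
    print(f"{marked} -> {kind}: {query}")
```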

-- Orion Montoya

Studying cardiac physiology tonight, it occurred to me that Google's informal use of brackets may work within computer science, but tends to conflict with all the other hard sciences (biology, chemistry, physics, medicine) because concentrations of chemical species are typically abbreviated with brackets, e.g., the concentration of ionized calcium in a solution is abbreviated [Ca2+]. This isn't entirely trivial. Some authors of review articles have taken to reporting what search strings they looked for in which databases. Did the author search for Ca2+ or [Ca2+]? A review article on calcium channel blockers (commonly prescribed blood pressure medications) could get tricky. Meanwhile, any scientist who also happens to use a computer has probably come across the code convention.

-- Niels Olson

Don't Make Me Think! (Steve Krug)

Perhaps Google needs to add a bit more intelligence to its results. What is the likelihood that I will enter three completely unrelated search criteria in one search? It is far more likely that I have entered a bunch of words that I think best identify the pages I want to see.

Following Steve Krug's wonderful book title, "Don't Make Me Think", Google should do the thinking for me. If I "punch these up in the computer", then the computer should do the thinking. If the search words I enter combine to drastically reduce the result set, then that narrower variant of the results should be presented to the user along with the full result set.

Google is already doing a similar thing with Froogle. Searches generate numerous result vectors.

Rather than forcing the user to be syntax aware, make the software flexible and intelligent.
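A rough sketch of that suggestion in Python, under stated assumptions: the search runs the all-words match, checks whether the same words taken as a phrase shrink the result set sharply, and if so returns the narrower set alongside the full one. The function names, the sample pages, and the `max_ratio` threshold for "drastically" are all invented for illustration; this is not a description of anything Google does.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(words, page_words):
    """True when `words` occurs as a contiguous run inside `page_words`."""
    n = len(words)
    return n > 0 and any(
        page_words[i:i + n] == words for i in range(len(page_words) - n + 1)
    )

def search(query, pages, max_ratio=0.5):
    """Return (full, narrowed): all-words hits plus an optional phrase subset.

    `max_ratio` is an arbitrary illustrative threshold: the phrase hits are
    offered only when they number at most that fraction of the full set.
    """
    words = tokenize(query)
    full = [p for p in pages if set(words) <= set(tokenize(p))]
    phrase = [p for p in full if contains_phrase(words, tokenize(p))]
    narrowed = phrase if phrase and len(phrase) <= max_ratio * len(full) else None
    return full, narrowed

# Made-up pages, purely for illustration.
pages = [
    "Call the Amazon 800 number for customer service.",
    "Amazon shipped 800 books; the tracking number is below.",
    "Order number 800 was placed on Amazon yesterday.",
    "A page about something else entirely.",
]

full, narrowed = search("Amazon 800 number", pages)
print(len(full), "pages contain all the words")
if narrowed is not None:
    print(len(narrowed), "of them contain the exact phrase")
```

The user types the same unquoted words either way; the software decides whether the phrase variant is worth showing.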

-- Steven Chalmers

Greenspun's blog has a funny: google latex and compare the search returns on the left to the ads on the right!

-- Niels Olson

