Books 'R' Google

By Robert Darnton
The New York Review of Books
Volume 56, Number 2, February 12, 2009

Edited by Andy Ross

Google has digitized millions of books and made the texts searchable online.

When fields of knowledge turned into professions and university departments, professional journals sprouted throughout the fields. Commercial publishers made a fortune by selling subscriptions to the journals. They could ratchet up prices without causing cancellations, because the libraries paid for the subscriptions and the professors did not. And the professors provided free labor: they wrote the articles, refereed submissions, and served on editorial boards.

When businesses like Google look at libraries, they see potential assets, content, ready to be mined. Built up over centuries at an enormous expenditure of money and labor, library collections can be digitized en masse at relatively little cost. To digitize collections and sell the product in ways that fail to guarantee wide access would be to repeat the mistake that was made when publishers exploited the market for scholarly journals, but on a much greater scale.

Four years ago, Google began digitizing books from research libraries, providing full-text searching and making books in the public domain available on the Internet at no cost to the viewer. Google collected revenue from some discreet advertising attached to the service. Google also digitized an ever-increasing number of library books that were protected by copyright in order to provide search services that displayed small snippets of the text. In September and October 2005, a group of authors and publishers brought a class action suit against Google. In October 2008, the opposing parties announced agreement on a settlement.

The settlement creates a registry to represent the interests of the copyright holders. Google will sell access to a gigantic data bank composed primarily of copyrighted, out-of-print books. Organizations will be able to subscribe via an institutional license for access to the data bank. A public access license will make this material available to public libraries. Individuals will be able to access and print out digitized versions of the books by purchasing a consumer license. Google will retain 37 percent of the revenue, and the registry will distribute 63 percent among the copyright holders.
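The revenue split described above is simple arithmetic, and a short sketch makes it concrete. Only the 37 and 63 percent shares come from the text. The function name and the $1,000 figure are hypothetical, chosen for illustration.

```python
from decimal import Decimal

GOOGLE_SHARE = Decimal("0.37")  # Google's share, as stated in the text


def split_revenue(total):
    """Return (google_amount, registry_amount) for a revenue total.

    The registry amount is computed as the remainder, so the two
    parts always add up to the original total.
    """
    total = Decimal(str(total))
    google = (total * GOOGLE_SHARE).quantize(Decimal("0.01"))
    registry = total - google  # the 63 percent passed on to copyright holders
    return google, registry


# Hypothetical example: $1,000 of license revenue.
google, registry = split_revenue(1000)
print(google, registry)
```

Decimal arithmetic is used here rather than floats so that currency amounts round predictably.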

Of the seven million books digitized by November 2008, one million are works in the public domain, one million are in copyright and in print, and five million are in copyright but out of print. Google will continue to make books in the public domain available for users to read, download, and print, free of charge. Many of the books in copyright and in print will not be available in the data bank unless the copyright owners opt to include them. They will be sold as printed books and perhaps also as digitized copies via the consumer license. Most of the books covered by the institutional license are in copyright but out of print.

The proposal could result in the world's largest digital library. Google could also become the world's largest book business. Virtually all books will be brought within the reach of anyone with access to the Internet. Not only will Google bring books to readers, it will also open up extraordinary opportunities for research.

Google did not set out to create a monopoly. But the class action character of the settlement makes Google invulnerable to competition. Most book authors and publishers who own US copyrights are automatically covered by the settlement. No new digitizing enterprise can get off the ground without winning their assent.

This outcome was not anticipated at the outset. We missed a great opportunity. We could have created a National Digital Library. It is too late now. Not only have we failed to realize that possibility, but we are allowing the control of access to information to be determined by a private lawsuit.

Google will enjoy a monopoly of access to information. Google has no serious competitors. Google alone has the wealth to digitize on a massive scale. And having settled with the authors and publishers, it can exploit its financial power from within a protective legal barrier. No new entrepreneurs will be able to digitize books within that fenced-off territory. Only Google will be protected from copyright liability.

This is a tipping point in the development of the information society.

Google Book Search

By Robert Darnton
The New York Review of Books
Volume 56, Number 20, December 17, 2009

Edited by Andy Ross

Google has by now digitized some ten million books. On what terms will it make those texts available to readers? The terms of the settlement will have a profound effect on the book industry for the foreseeable future.

Google plans to enable consumers to purchase access to millions of copyrighted books currently in print, with payment going to authors and publishers as well as Google. Books covered by copyright but out of print, at least seven million in all, will be available through subscriptions paid for by institutions such as universities. The database, along with books in the public domain that Google has already digitized, will constitute a gigantic digital library.

But Google's dominance of access to books will reinforce its power over access to other kinds of information, raising concerns about privacy, competition, and commitment to the public good. As a commercial enterprise, Google's first duty is to provide a profit for its shareholders, and the settlement leaves no room for representation of the public.

Google Book Search (GBS) will certainly be challenged by groups and individuals who claim they were not fairly represented in the classes of authors and publishers. The case may take years to work its way through the courts. As the first step toward a resolution, the filing on November 13 suggested just how far Google is willing to go in modifying the original settlement.

The governments of France and Germany urged the court to reject the settlement. Far from seeing any potential public good in it, they condemned it for creating an "unchecked, concentrated power" over the digitization of a vast amount of literature and for doing so by a "commercially driven" agreement negotiated "in secrecy." In contrast to the commercial character of Google's enterprise, both governments stressed the higher values represented by their national literatures.

The French emphasized the unique character of the book, which, they claimed, would be compromised by Google's commitment to commercialization. The Germans spoke in the name of "the land of poets and thinkers," but they laid most stress on the right of privacy, which, they argued, Google could threaten. Both governments then listed a series of subsidiary arguments:

1. The settlement gives Google a virtual monopoly over orphan works, even though it has no claim to their copyrights.

2. Its opt-out provision, which means that authors will be deemed to have accepted the settlement unless they notify Google to the contrary, violates the rights inherent in authorship.

3. It contains a provision that prevents a potential competitor from obtaining better terms than Google in any new commercial uses of the digitized books. The terms of such future enterprises will be determined by a Book Rights Registry composed of representatives of the authors and publishers.

4. It gives Google the power to censor its database by excluding up to 15 percent of the digitized works.

5. Its guidelines for pricing will promote Google's commercial interests, not the good of the public, through the use of algorithms created by Google according to Google's secret methods.

6. It favors secrecy in general, hiding audit procedures, preventing the public from attending meetings in which Google and the Registry will discuss library matters, and even requiring Google, the authors, and publishers to destroy all documents relevant to their agreement on the settlement.

Above all, the French and Germans condemned the settlement for sanctioning the "uncontrolled, autocratic concentration of power in a single corporate entity," which threatened the "free exchange of ideas through literature."

The same points were made in a hearing before the European Commission in September by the International Federation of Library Associations (IFLA), the European Bureau of Library, Information and Documentation Associations (EBLIDA), and the Ligue des Bibliothèques Européennes de Recherche (LIBER).

All three stressed the danger that "a large proportion of the world's heritage of books in digital format will be under the control of a single corporate entity." They summoned up the prospect of a digital library of 30 million books and concluded that Google would exercise something close to hegemony in the book world. They appealed to the European Commission to defend the interests of the public.

The U.S. Department of Justice pointed to serious difficulties with the settlement and suggested the following changes:

1. Require rights-holders of out-of-print books to participate in the settlement by opting in instead of operating from the assumption that they had agreed to participate unless they opted out.

2. Do not distribute the profits from the sale of orphan books to the parties of the settlement but rather use the money to fund a thorough search for the unknown rights-holders.

3. Appoint guardians to protect the interests of orphan rights-holders by serving on the registry.

4. Find some mechanism by which potential competitors to Google could gain access to orphan works without exposure to suits for infringement of copyright.

5. Prevent Google from using out-of-print works in new commercial products without the owner's permission.

The revised settlement, or GBS 2.0, released on November 13, reads as if Google and the plaintiffs took most of their cues from the DOJ recommendations. GBS 2.0 provides that the Registry will include a court-appointed guardian to represent the rights-holders of unclaimed books. But Google alone would enjoy immunity from prosecution by any rights-holders.

As to revenue from the sale of orphan books, GBS 2.0 accepts that the money will not go to Google and the plaintiffs but will be spent on efforts to search for the unidentified rights-holders. GBS 2.0 also allows Google's competitors to license out-of-print books in retail enterprises, although Google would maintain exclusive control of the institutional subscriptions to its gigantic database.

How the prices will be set remains unclear. GBS 2.0 contains no effective mechanism to prevent price gouging, no provision for a public authority to monitor prices, and no way to protect the public from excessive pricing should Google be taken over in the future by rapacious speculators.

GBS 2.0 does not therefore differ in essentials from GBS 1.0. It largely ignores the objections of foreign governments, except by narrowing the scope of GBS to books published in the United States, the United Kingdom, Canada, and Australia. GBS will not cover books published in countries like France and Germany.

One can imagine two general solutions to the problems posed by GBS, one maximal, one minimal.

The most ambitious solution would transform Google's digital database into a truly public library. An act of Congress would clear up a messy legal landscape and give the American people a national digital library equal to the needs of the twenty-first century.

A minimal solution could be devised for the private sector. Congress would legislate to protect the digitization of orphan works from lawsuits, but it would not appropriate funds. To avoid conflict with market interests, the database would include only books in the public domain and orphan works. At the rate of a million books a year, we would have a great library, free and accessible to everyone, within a decade.

The Future Of Publishing

By Jason Epstein
The New York Review of Books
Volume 57, Number 4, March 11, 2010

Edited by Andy Ross

The digitization of the book publishing industry is now irreversible. The publishing industry's capital stock faces dissolution within a vast cloud in which all the world's books will eventually reside as digital files to be downloaded instantly title by title wherever on earth connectivity exists.

Digitization makes possible a world in which anyone can be a publisher and anyone can be an author. In this world, the traditional filters will have melted into air and only the human inability to read what is unreadable will remain to winnow what is worth keeping. Amid the chaos, readers will be guided by the imprints of reputable publishers. The more adaptable of today's general publishers will survive.

The difficult, solitary work of literary creation demands rare individual talent and in fiction is almost never collaborative. Until it is ready to be shown to a trusted friend or editor, a writer's work in progress is intensely private. Informed critical writing of high quality on general subjects will be as rare and as necessary as ever and will survive as it always has in print and online for discriminating readers.

The cost of entry for future publishers will be minimal, requiring only the upkeep of the editorial group and its immediate support services but without the expense of traditional distribution facilities and multilayered management. Traditional territorial rights will become superfluous and a worldwide, uniform copyright convention will be essential. Protecting content from unauthorized file sharers will remain a vexing problem. If I were a publisher today, I would consider a renewable rental model for all e-book downloads.

Literary form has been remarkably conservative throughout its long history. Actual books, printed and bound, will continue to be the irreplaceable repository of our collective wisdom. My rooms are piled from floor to ceiling with books. I mention this so that you will know the prejudice with which I celebrate the inevitability of digitization.

Googled

By John Lanchester
The Observer, February 21, 2010

Edited by Andy Ross

Googled: The End of the World as We Know It
By Ken Auletta
Virgin Books, 400 pages

No company in history has grown as fast as Google. Within 400 weeks of its founding, it was earning revenues of $20 billion a year. The 1998 start-up has reached deep into the everyday experience of millions, put itself in the centre of the internet culture that is defining the new century, and had a disruptive impact on some industries and a potentially terminal one on others. Google is one of the wonders of the world.

Since Google's motto is "Don't be evil", people hold it to a high standard. Sergey Brin and Larry Page don't ask for permission: they do what they want to do, and rely on the fact that people will understand the point of it afterwards. The basic move in Google's rise to dominance was copying stuff without asking. Don't ask for permission, and rely on the fact that people will love the results when they see them. This model has stood the company in very good stead, but it plainly involves an attitude in which innocence and arrogance are emulsified together.

Auletta looks at the company in its pomp, and sees problems and threats everywhere. At one point in 2008, Google was offering 150 products. Only targeted advertising made real money. YouTube lost $500m in 2009. Google's programme to digitise books has caused a bitter backlash. That was an example of the no-permission policy going badly wrong, because as Brin told Auletta, if they had asked authors and publishers, "we might not have done the project".

Google's mission is "to organise the world's information and make it universally accessible and useful", but that doesn't extend to its own intellectual property, which it guards with ferocity. As its share prospectus says: "Our patents, trademarks, trade secrets, copyrights and all of our other intellectual property rights are important assets for us ... any significant impairment to our intellectual property rights could harm our business or our ability to compete." It's hypocritical to pretend that the same isn't true for everybody else.

Google and Money

By Charles Petersen
The New York Review of Books, December 9, 2010

Edited by Andy Ross

Google's search engine remains its single largest source of revenue. Stanford graduate students Sergey Brin and Larry Page launched Google in 1998 with a new algorithm, called PageRank, that made use of the links between sites to determine relevancy. Google became the best search engine available, but at first it had almost no source of revenue.

Google grew desperate for funding during the dot-com bust. Aside from page views, one of the few easily measured statistics on the early Web was click-throughs, the number of times visitors to a site found an ad displayed enticing enough to click on it, and then be taken to the advertiser's own website, where the product or service in question might be purchased or used.

Google realized that ads on search engines reach users when they are looking for something specific. The Google advertising system charges advertisers for each time a user clicks on an ad that is displayed next to related search engine results. Google developed programs to link specific ads to millions of different search terms and to ensure that the ads sold were priced fairly. The system provides the vast majority of Google's billions of dollars in revenues.

Google's approach to advertising is unlike the page-view model of its competitors. Google's success depends on finding ways to produce results of such high quality that users need not worry about clicking unnecessarily.

Google has had other challenges. The Internet, as originally conceived, gave the same priority to every piece of data that passed through the network. As the Internet has developed, this principle of net neutrality has largely been retained. In August 2010, Google executives claimed that they would continue to support net neutrality on traditional cable and telephone services, but they dropped their support for net neutrality for wireless devices.

Google's ad exchange lets advertisers target individual people and buy access to them in real time as they surf the Web. In August 2010, Google proposed to become a clearinghouse for everyone's data, too. Google would be at the center of the trade in other people’s data.

Google's proposed data clearinghouse would target ads more precisely by bringing together all the private information that companies have gathered on users in one place. These personally targeted ads will be intrusive and pervasive, allowing advertisers to coordinate campaigns across a single user’s computer, e-reader, and cell phone, as well as other devices with wireless connections. An efficient data clearinghouse will enable marketers to update these campaigns instantaneously.

As advertising becomes more personalized and pervasive, it seems likely that more and more users will want to opt out of the system. Google executives have considered allowing users to pay Google the amount that advertisers would otherwise offer the company to reach them, in exchange for receiving an ad-free service. The next obvious step would be to provide well-off users with greater privacy, at a price.

Google tracks information about users not just to target advertisements but to provide better services. We have always traded a bit of our privacy in order to receive better service. Google executives habitually speak of privacy in terms of these kinds of trade-offs.

Regulators should impose a Chinese Wall between the private data that sites need for personalized services and the private data that sites may use for commercial purposes. A Chinese Wall would make it harder for sites to profit online but it might also protect our privacy.

Googleplex

TechRepublic.com
April 2011

Edited by Andy Ross

In the Plex
By Steven Levy

Over a two-year period, Levy got unprecedented access to people, places, and meetings at the Google headquarters in Silicon Valley. His new book tells all.

Early on, co-founders Larry Page and Sergey Brin listed all the smartest and most influential people in computer science and then tried to hire them all.

Once Page and Brin hired a bunch of smart people, they asked them to turn Google into an artificial intelligence learning machine.

When Google created its AdWords and AdSense programs, it hired statisticians and mathematicians to predict user behavior. This information is a critical part of the auctions for various ads.

When the company went public, Page and Brin told investors that sometimes they would forgo profits to do the right thing for humanity.

When Google launched Gmail, a lot of users freaked out about contextual ads because they thought people were reading their mail. Google just used a search engine to scan the messages.

Google dreams of "zero query search" where Google anticipates what you want and gives it to you before you ask. This could be based on location or on search history.

Page said he's surprised that people aren't more ambitious because there are so many possibilities for doing things that have never been done before.

Google calls its big, ambitious projects moonshots.

Page and Brin continue to see Google Books as something that Google is doing for the good of humanity.

Google lets its employees try lots of different projects on the principle that if they aren't having enough failures then they aren't taking enough risks.

At Google, the job of the lawyers is to figure out how to say yes to the things that Page and Brin want to do.

Levy says that when Google went into China, China changed Google more than Google changed China.

Levy: "Google is very worried about Facebook. It's going through a Facebook panic right now."

How Google Dominates Us

By James Gleick
The New York Review of Books, August 18, 2011

In the Plex: How Google Thinks, Works, and Shapes Our Lives
By Steven Levy
Simon and Schuster, 424 pages

I'm Feeling Lucky: The Confessions of Google Employee Number 59
By Douglas Edwards
Houghton Mifflin Harcourt, 416 pages

The Googlization of Everything (and Why We Should Worry)
By Siva Vaidhyanathan
University of California Press, 265 pages

Search & Destroy: Why You Can't Trust Google Inc.
By Scott Cleland with Ira Brodsky
Telescope, 329 pages

Google is where we go for answers. Most of the time Google does not actually have the answers. Google is the oracle of redirection. Google defines its mission as organizing the world's information.

Google dominates the information economy. Google has many secrets but the main ingredients of its success have not been secret at all. Steven Levy has visited Google’s headquarters periodically since 1999, talking with its founders, Larry Page and Sergey Brin.

Google's single greatest innovation was the algorithm called PageRank, developed by Page and Brin when they were Stanford graduate students. The algorithm assigns every page a rank, depending on how many other pages link to it. All links are not valued equally. A recommendation is worth more when it comes from a page that has a high rank itself. Page and Brin patented PageRank and published the details. It is one of those ideas that seem obvious after the fact.
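The idea that a link counts for more when it comes from a highly ranked page can be sketched in a few lines. This is a minimal illustration of the published PageRank idea, not Google's production code. The damping factor of 0.85, the iteration count, and the toy link graph are assumptions that do not appear in the text.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by repeated redistribution of rank along links.

    links maps each page to the list of pages it links to.
    damping and iterations are assumed values for illustration.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "x" and "y" each have exactly one inbound link,
# but the link to "x" comes from "hub", which three pages link to.
web = {
    "p1": ["hub"],
    "p2": ["hub"],
    "p3": ["hub"],
    "hub": ["x"],
    "leaf": ["y"],
}
ranks = pagerank(web)
print(ranks["x"] > ranks["y"])
```

In this toy graph, x ends up ranked above y even though each has a single recommendation, because the page recommending x is itself well recommended.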

The Google founders, Larry and Sergey, did everything their own way. Even in the unbuttoned culture of Silicon Valley they stood out from the start as originals. As they saw it, their mission encompassed not just the Internet but all the world's books and images. Google Translate has achieved more in machine translation than the rest of the world's artificial intelligence experts combined.

Google owns and operates a constellation of giant server farms spread around the globe — huge windowless structures, resembling aircraft hangars or power plants, some with cooling towers. The server farms stockpile the exabytes of information and operate an array of staggeringly clever technology. This is Google's share of the cloud.

Google's business is advertising. Google makes more from advertising than all the nation's newspapers combined. Doug Edwards interviewed for a job as marketing manager in 1999. As Google employee number 59, he is the first Google insider to have published his memoir.

The merchandise of the information economy is attention. When information is cheap, attention becomes expensive. Attention is what we give to Google, and our attention is what Google sells.

Siva Vaidhyanathan: "We are not Google's customers: we are its product. We — our fancies, fetishes, predilections, and preferences — are what Google sells to advertisers."

The evolution of this unparalleled money machine piled one brilliant innovation atop another, in fast sequence:

1 Early in 2000, Google sold premium sponsored links: simple text ads assigned to particular search terms. They charged according to how many people saw each ad.

2 Late that year, engineers devised an automated self-service system, dubbed AdWords. Suddenly thousands of small businesses were buying their first Internet ads.

3 Google learned to charge per click rather than per view, and to let advertisers bid for keywords against one another in fast online auctions. Pay-per-click auctions opened a cash spigot.

4 Google had instant knowledge of which ads were succeeding and which were not. It could view click-through rates as a measure of ad quality. An effective ad would get better placement. By 2003, AdWords Select was making so much money that Google was deliberately hiding its success from the press and from competitors.

5 Google expanded its platform outward. The aim was to develop a form of artificial intelligence that could analyze chunks of text — websites, blogs, e-mail, books — and match them with keywords. Given a text, it could predict which advertisements would be effective.
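Steps 3 and 4 above, bidding per click and rewarding ads that get clicked, can be combined in a small sketch. The text says only that click-through rate served as a measure of quality and that effective ads got better placement. Scoring each ad as bid multiplied by click-through rate is an assumption made here for illustration, not a description of Google's actual formula, and the ad names and numbers are invented.

```python
def click_through_rate(ad):
    """Fraction of impressions that led to a click (0.0 if never shown)."""
    if ad["impressions"] == 0:
        return 0.0
    return ad["clicks"] / ad["impressions"]


def rank_ads(ads):
    """Order ads by an assumed score: bid per click times click-through rate.

    Under pay-per-click pricing this score is the expected revenue
    from showing the ad once.
    """
    return sorted(
        ads,
        key=lambda ad: ad["bid_per_click"] * click_through_rate(ad),
        reverse=True,
    )


# Hypothetical ads competing for the same keyword.
ads = [
    {"name": "high_bid_low_ctr", "bid_per_click": 2.00,
     "clicks": 10, "impressions": 1000},  # score 2.00 * 0.01 = 0.02
    {"name": "low_bid_high_ctr", "bid_per_click": 0.50,
     "clicks": 80, "impressions": 1000},  # score 0.50 * 0.08 = 0.04
]
ordered = [ad["name"] for ad in rank_ads(ads)]
print(ordered)
```

Under this assumed scoring, the cheaper ad that users actually click takes the better position, which is one way an effective ad could earn better placement.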

Google called its program AdSense. For anyone hoping to monetize their content, it was the Holy Grail. Anyone could now add a few lines of code to their website, automatically display Google ads, and start cashing monthly checks. Vast tracts of the Web that had been free of advertising now became Google partners.

Search and advertising thus become the matched edges of a sharp sword. The perfect search engine, as Sergey and Larry imagine it, reads your mind and produces the answer you want. The perfect advertising engine does the same: it shows you the ads you want. Anything else wastes your attention.

Google began tracking the behavior of individual users from one Internet site to the next. They observe our every click and they measure in milliseconds how long it takes us to decide. If they didn't, their results wouldn't be so uncannily effective. They have no rival in the depth and breadth of their data mining.

The Google corporate motto is "Don't be evil." The Googlers believed a corporation should behave ethically. But when Google embarked on its program to digitize copyrighted books and copy them onto its servers, it deceived publishers. Google knew that the copying bordered on illegal but it considered its intentions honorable and the law outmoded. Eric Schmidt: "Evil is what Sergey says is evil."

Google did some evil in China. It collaborated in censorship. Beginning in 2004, it arranged to tweak and twist its algorithms and filter its results to omit results unwelcome to the government. Yet Google pushed back against the government. When results were blocked, Google insisted on alerting users with a notice at the bottom of the search page. The company now serves China only from Hong Kong, with results censored not by Google but by government filters.

Scott Cleland: "There is evidence that Google is not all puppy dogs and rainbows." Google's corporate mascot is a replica of a Tyrannosaurus Rex skeleton on display outside the corporate headquarters. T. Rex was a terrifying predator.

Google's founders are visionaries. Google's business competitors charge that the company manipulates its search results to favor its friends and punish its enemies. Google seems to be everywhere and seems to know everything and offends against cherished notions of privacy.

The rise of social networking upends the equation again. Users of Facebook choose to reveal aspects of their private lives, at least to "friends." On Twitter, every remark can be seen by the whole world. The Library of Congress is archiving all tweets. Now Google is rolling out its social-networking platform Google+. Are the social networks our friends?

AR February 2009: I guess Google will work in the perceived public interest, either so as not to be evil or because the public authorities demand it. In the latter case, the public interest will be American. We won't have a globally effective legal framework for such issues for a while yet.

November 2009: The issue is big enough to take very seriously. We cannot merely hope that Google will always do the right thing. I guess Darnton's "ambitious" solution is the best — perhaps then we can hope that the European Union will get on board and make the result a truly global repository.

February 2010: Publishers will need to do deals with Google and Amazon. That's no problem — publishers have always done deals to secure their business. And Google will have to grow up. That may be a bigger problem.

November 2010: The Chinese Wall proposal seems like a good idea to me.

April 2011: Levy writes well so the book may be good.

August 2011: I need to deploy AdSense on my blog.