Cloudy Judgment

By Paul Boutin
Slate, April 3, 2008

Edited by Andy Ross

There's now a flood of Web-based applications that serve as simplified versions of popular desktop software. Google Docs, the in-your-browser competitor to Microsoft Office, is probably the best example. Still, the more time I spend using Web-based apps like Google Docs, the more I appreciate my desktop computer.

First, networks are flaky. Part of what makes the Internet so powerful is that it doesn't have to maintain a live, nonstop, real-time connection. As long as your mail gets transferred and Web pages download within a reasonable amount of time, you don't notice if your connection briefly goes down once in a while. If you're using that connection to edit photos, you do notice.

Second, today's network apps run inside another application: your Web browser. That makes them slower, and it limits the possibilities for the apps' user interface. The Google Docs slide-show editor has the same functionality as an early-1990s version of Microsoft PowerPoint and has just as many bugs in the way it formats text.

The people who build browsers need to do a better job, too. Don't even get me started on the daily hell wherein I hit a Web site that locks up Firefox, killing all of my browser windows. Even Microsoft Word doesn't crash that often anymore.

In theory, Web-based apps, also known as cloud computing, are the future of computers. That view ignores the huge progress in personal computers that sit on your desktop, in your lap, or in your pocket. Multi-core processors, touch screens, motion sensors: all major computing advances, none of which are happening in the cloud.

For me, it'll be years before Photoshop Express can become powerful enough to replace my desktop version, or before Google Docs gets me to uninstall Microsoft Office. One of the nice things about Word and Photoshop is that once I fire them up and start working, I can forget all about the Internet for a few hours.
 

The Google Cloud

By Stephen Baker
Business Week, December 13, 2007

Edited by Andy Ross

Google's globe-spanning network of computers blitzes through mountains of data faster than any machine on earth. Most of this hardware isn't on the Google campus. It's just out there, somewhere, whirring away in big refrigerated data centers. Folks at Google call it the cloud.

In 2006, Google launched a course at the University of Washington to introduce programming at the scale of a cloud. Call it Google 101. It led to an ambitious partnership with IBM to plug universities around the world into Google-like computing clouds.

As this concept spreads, it promises to expand Google's footprint in industry far beyond search, media, and advertising, leading the giant into scientific research and perhaps into new businesses. In the process Google could become, in a sense, the world's primary computer.

Google's cloud is a network made of maybe a million cheap servers. It stores staggering amounts of data, including numerous copies of the World Wide Web. This makes search faster, helping ferret out answers to billions of queries in a fraction of a second.

Cloud computing, with Google's machinery at the very center, fits neatly into the company's grand vision, established a decade ago by founders Sergey Brin and Larry Page: "to organize the world's information and make it universally accessible."

For small companies and entrepreneurs, clouds mean opportunity — a leveling of the playing field in the most data-intensive forms of computing. To date, only a select group of cloud-wielding Internet giants has had the resources to scoop up huge masses of information and build businesses upon it. A handful of companies — the likes of Google, Yahoo, or Amazon — transform the info into insights, services, and, ultimately, revenue.

This status quo is already starting to change. In the past year, Amazon has opened up its own networks of computers to paying customers, introducing new players, large and small, to cloud computing. In November, Yahoo opened up a small cloud for researchers at Carnegie Mellon University. And Microsoft has deepened its ties to communities of scientific researchers by providing them access to its own server farms.

For clouds to reach their potential, they should be easy to program and navigate. This should open up growing markets for cloud search and software tools — a natural business for Google and its competitors. Google CEO Eric E. Schmidt won't say how much of its own capacity Google will offer to outsiders, or under what conditions or at what prices. "Typically, we like to start with free," he says, adding that power users "should probably bear some of the costs." And how big will these clouds grow? "There's no limit," Schmidt says.

Google is poised to take on a new role in the computer industry. Not so many years ago scientists and researchers looked to national laboratories for the cutting-edge research on computing. Now, says Daniel Frye, vice-president of open systems development at IBM, "Google is doing the work that 10 years ago would have gone on in a national lab."

MapReduce is the software at the heart of Google computing. While the company's famous search algorithms provide the intelligence for each search, MapReduce delivers the speed and industrial heft. It divides each job into hundreds or thousands of smaller tasks and distributes them to legions of computers. In a fraction of a second, as each computer comes back with its nugget of information, MapReduce assembles the responses into an answer. MapReduce was developed by University of Washington alumnus Jeffrey Dean.
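The divide, distribute, and assemble pattern described above can be sketched in a few lines of Python. This is only a toy illustration, not Google's implementation: the word-counting job, the function names, and the use of threads on a single machine in place of "legions of computers" are all assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group the emitted values by key so each key can be reduced on its own."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks, workers=4):
    """Split the job into one task per chunk, run the tasks, assemble the answer."""
    # Assumption: worker threads stand in for separate machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    chunks = ["the cloud is big", "the grid is fast", "the web is tiny"]
    print(word_count(chunks))
```

Each chunk is processed independently, which is what would let a real system spread the map tasks across many machines; only the grouping and reduce steps need to see results from more than one task.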

Students rushed to sign up for Google 101 as soon as it appeared in the U-Dub syllabus. Within weeks the students were learning how to configure their work for Google machines and designing ambitious Web-scale projects, from cataloguing the edits on Wikipedia to crawling the Internet to identify spam.

Luck descended on the Googleplex in the person of IBM Chairman Samuel J. Palmisano. The winter day was chilly, but Palmisano and his team sat down with Schmidt and a handful of Googlers and discussed cloud computing. It was no secret that IBM wanted to deploy clouds to provide data and services to business customers.

Over the next three months they worked together at Google headquarters. The work involved integrating IBM's business applications and Google servers, and equipping them with a host of open-source programs. In February they unveiled the prototype for top brass in Mountain View, California, and for others on video from IBM headquarters in Armonk, New York.

The Google cloud got the green light. The plan was to spread cloud computing first to a handful of U.S. universities within a year and later to deploy it globally. The universities would develop the clouds, creating tools and applications while producing legions of new computer scientists.

Yahoo Research Chief Prabhakar Raghavan says that in a sense, there are only five computers on earth: Google, Yahoo, Microsoft, IBM, and Amazon.

Tony Hey, vice-president for external research at Microsoft, says research clouds will function as huge virtual laboratories, with a new generation of librarians curating troves of data, opening them to researchers with the right credentials.

Mark Dean, head of IBM research in Almaden, California, says that the mixture of business and science will lead, in a few short years, to networks of clouds that will tax our imagination: "Compared to this, the Web is tiny. We'll be laughing at how small the Web is."
 

The Grid

By Jonathan Leake
The Sunday Times, April 6, 2008

Edited by Andy Ross

The scientists who pioneered the internet have now built a much faster replacement — the grid. At speeds about 10,000 times faster than a typical broadband connection, the grid is the latest spin-off from CERN, the particle physics centre that created the web.

David Britton, professor of physics at Glasgow University and a leading figure in the grid project, believes grid technologies could revolutionise society: "With this kind of computing power, future generations will have the ability to collaborate and communicate in ways older people like me cannot even imagine."

The grid will be activated at the same time as the Large Hadron Collider (LHC) at CERN, based near Geneva. Scientists at CERN started the grid computing project seven years ago when they realised the LHC would generate many petabytes of data per year.

The grid has been built with dedicated fibre optic cables and modern routing centres, meaning there are no outdated components to slow the deluge of data. The 55,000 servers already installed are expected to rise to 200,000 within the next two years.

Professor Tony Doyle, technical director of the grid project, said: "We need so much processing power, there would even be an issue about getting enough electricity to run the computers if they were all at CERN. The only answer was a new network powerful enough to send the data instantly to research centres in other countries."

That network is now built, using fibre optic cables that run from CERN to 11 centres in the United States, Canada, the Far East, Europe and around the world. From each centre, further connections radiate out to a host of other research institutions using existing high-speed academic networks.

Ian Bird, project leader for CERN's high-speed computing project, said grid technology could make the internet so fast that people would stop using desktop computers to store information and entrust it all to the internet: "It will lead to what's known as cloud computing, where people keep all their information online and access it from anywhere."

Although the grid itself is unlikely to be directly available to domestic internet users, many telecoms providers and businesses are already introducing its pioneering technologies, such as dynamic switching, which creates a dedicated channel for internet users downloading large volumes of data such as films.
 

The LHC Computing Grid

CERN

Edited by Andy Ross

The Large Hadron Collider (LHC), currently being built at CERN near Geneva, is the largest scientific instrument on the planet. When it begins operations in 2007, it will produce roughly 15 petabytes of data annually, which thousands of scientists around the world will access and analyse. The mission of the Worldwide LHC Computing Grid (LCG) project is to build and maintain a data storage and analysis infrastructure for the entire high energy physics community that will use the LHC.
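To give a feel for what 15 petabytes a year means as a sustained data rate, the short calculation below converts the annual volume into an average per-second figure. It is a back-of-the-envelope sketch under two assumptions not stated in the text: a petabyte is taken as 10^15 bytes, and the data is averaged evenly over a full 365-day year, although the accelerator would not actually run continuously.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600   # assumption: even spread over a 365-day year
BYTES_PER_PETABYTE = 10 ** 15        # assumption: decimal petabytes


def average_rate_bytes_per_s(petabytes_per_year):
    """Average data rate in bytes per second for a given annual volume."""
    return petabytes_per_year * BYTES_PER_PETABYTE / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = average_rate_bytes_per_s(15)
    print(f"{rate / 1e6:.0f} MB/s, or {rate * 8 / 1e9:.1f} Gbit/s, on average")
```

Under these assumptions the average works out to roughly 476 megabytes per second. Because the machine does not run all year, the rate while data is actually being taken would be higher than this average.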

The data from the LHC experiments will be distributed around the globe, according to a four-tiered model. A primary backup will be recorded on tape at CERN, the Tier-0 centre of LCG. After initial processing, this data will be distributed to a series of Tier-1 centres, large computer centres with sufficient storage capacity and with round-the-clock support for the grid.

The Tier-1 centres will make data available to Tier-2 centres, each consisting of one or several collaborating computing facilities, which can store sufficient data and provide adequate computing power for specific analysis tasks. Individual scientists will access these facilities through Tier-3 computing resources, which can be local clusters in university departments.
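The four tiers described above can be written down as a small data structure. This is an illustrative sketch only: the class, the role summaries, and the helper functions are hypothetical names chosen for the example, not part of any LCG software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    level: int
    role: str


# Role summaries paraphrase the description in the text.
LCG_TIERS = [
    Tier(0, "CERN: primary backup recorded on tape"),
    Tier(1, "large computer centres with storage and round-the-clock grid support"),
    Tier(2, "collaborating facilities with storage and computing for specific analysis tasks"),
    Tier(3, "local resources, such as university department clusters, used by individual scientists"),
]


def distribution_path(tiers=LCG_TIERS):
    """Return the tier levels in the order data flows outward from CERN."""
    return [tier.level for tier in sorted(tiers, key=lambda tier: tier.level)]


def upstream_of(level, tiers=LCG_TIERS):
    """Return the tier that feeds the given level, or None for Tier-0."""
    by_level = {tier.level: tier for tier in tiers}
    return by_level.get(level - 1)


if __name__ == "__main__":
    for tier in LCG_TIERS:
        print(f"Tier-{tier.level}: {tier.role}")
```

The model is a strict chain in this sketch: each tier receives data from the one numbered just below it, so a scientist working at Tier-3 reaches the data through a Tier-2 facility.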

When the LHC accelerator is running optimally, access to experimental data needs to be provided for the 5000 scientists in some 500 research institutes and universities worldwide that are participating in the LHC experiments. In addition, all the data needs to be available over the 15-year estimated lifetime of the LHC. The analysis of the data, including comparison with theoretical simulations, requires of the order of 100 000 CPUs at 2006 measures of processing power.

A globally distributed grid for data storage and analysis provides several key benefits. The costs are more easily handled in a distributed environment, where individual institutes and participating national organisations can fund local computing resources and retain responsibility for these. Also, there are fewer single points of failure. Multiple copies of data and automatic reassigning of computational tasks to available resources ensure load balancing of resources and facilitate access to the data for all the scientists involved.

There are also some challenges. These include ensuring adequate levels of network bandwidth between the contributing resources, maintaining coherence of software versions installed in various locations, coping with heterogeneous hardware, managing and protecting the data so that it is not lost or corrupted over the lifetime of the LHC, and providing accounting mechanisms so that different groups have fair access.

The major computing resources for LHC data analysis are provided by the Worldwide LHC Computing Grid Collaboration — comprising the LHC experiments, the accelerator laboratory and the Tier-1 and Tier-2 computer centres. The computing centres providing resources for LCG are embedded in different operational grid organisations. The LCG project is also following developments in industry, where leading IT companies are testing and validating cutting-edge grid technologies using the LCG environment.