
Cloudy Judgment
By Paul Boutin
Slate, April 3, 2008
Edited by Andy Ross
There's now a flood of Web-based applications that serve as simplified
versions of popular desktop software. Google Docs, the in-your-browser
competitor to Microsoft Office, is probably the best example. Still, the
more time I spend using Web-based apps like Google Docs, the more I
appreciate my desktop computer.
First, networks are flaky. Part of
what makes the Internet so powerful is that it doesn't have to maintain a
live, nonstop, real-time connection. As long as your mail gets transferred
and Web pages download within a reasonable amount of time, you don't notice
if your connection briefly goes down once in a while. If you're using that
connection to edit photos, you do notice.
Second, today's network
apps run inside another application — your Web browser. That makes them
slower, and it limits the possibilities for the apps' user interface. The
Google Docs slide-show editor has the same functionality as an early-1990s
version of Microsoft PowerPoint and has just as many bugs in the way it
formats text.
The people who build browsers need to do a better job,
too. Don't even get me started on the daily hell wherein I hit a Web site
that locks up Firefox, killing all of my browser windows. Even Microsoft
Word doesn't crash that often anymore.
In theory, Web-based apps —
also known as cloud computing — are the future of computers. That ignores
the huge progress in personal computers that sit on your desktop, in your
lap, or in your pocket. Multi-core processors, touch screens, motion sensors
— all major computing advances, none of which are happening in the cloud.
For me, it'll be years before Photoshop Express can become powerful
enough to replace my desktop version, or before Google Docs gets me to
uninstall Microsoft Office. One of the nice things about Word and Photoshop
is that once I fire them up and start working, I can forget all about the
Internet for a few hours.
The Google Cloud
By Stephen Baker
Business Week, December 13, 2007
Edited by Andy Ross
Google's globe-spanning network of computers blitzes through mountains of data
faster than any machine on earth. Most of this hardware isn't on the Google
campus. It's just out there, somewhere, whirring away in big refrigerated
data centers. Folks at Google call it the cloud.
In 2006, Google
launched a course at the University of Washington to introduce programming
at the scale of a cloud. Call it Google 101. It led to an ambitious
partnership with IBM to plug universities around the world into Google-like
computing clouds.
As this concept spreads, it promises to expand
Google's footprint in industry far beyond search, media, and advertising,
leading the giant into scientific research and perhaps into new businesses.
In the process Google could become, in a sense, the world's primary
computer.
Google's cloud is a network made of maybe a million cheap
servers. It stores staggering amounts of data, including numerous copies of
the World Wide Web. This makes search faster, helping ferret out answers to
billions of queries in a fraction of a second.
Cloud computing, with
Google's machinery at the very center, fits neatly into the company's grand
vision, established a decade ago by founders Sergey Brin and Larry Page: "to
organize the world's information and make it universally accessible."
For small companies and entrepreneurs, clouds mean opportunity — a
leveling of the playing field in the most data-intensive forms of computing.
To date, only a select group of cloud-wielding Internet giants has had the
resources to scoop up huge masses of information and build businesses upon
it. A handful of companies — the likes of Google, Yahoo, or Amazon —
transform the info into insights, services, and, ultimately, revenue.
This status quo is already starting to change. In the past year, Amazon
has opened up its own networks of computers to paying customers, introducing
new players, large and small, to cloud computing. In November, Yahoo opened
up a small cloud for researchers at Carnegie Mellon University. And
Microsoft has deepened its ties to communities of scientific researchers by
providing them access to its own server farms.
For clouds to reach
their potential, they should be easy to program and navigate. This should
open up growing markets for cloud search and software tools — a natural
business for Google and its competitors. Google CEO Eric E. Schmidt won't
say how much of its own capacity Google will offer to outsiders, or under
what conditions or at what prices. "Typically, we like to start with free,"
he says, adding that power users "should probably bear some of the costs."
And how big will these clouds grow? "There's no limit," Schmidt says.
Google is poised to take on a new role in the computer industry. Not so
many years ago scientists and researchers looked to national laboratories
for the cutting-edge research on computing. Now, says Daniel Frye,
vice-president of open systems development at IBM, "Google is doing the work
that 10 years ago would have gone on in a national lab."
MapReduce
is the software at the heart of Google computing. While the company's famous
search algorithms provide the intelligence for each search, MapReduce
delivers the speed and industrial heft. It divides each task into hundreds
or thousands of tasks and distributes them to legions of computers. In a
fraction of a second, as each one comes back with its nugget of information,
MapReduce quickly assembles the responses into an answer. It was developed
by University of Washington alumnus Jeffrey Dean.
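The divide-and-assemble pattern described above can be illustrated with a toy word count. This is only a sketch of the general idea, not Google's implementation: the function names (map_chunk, reduce_pairs, word_count) are hypothetical, and local threads stand in for the legions of computers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_pairs(mapped_chunks):
    """Reduce step: sum the counts emitted for each word across all chunks."""
    totals = defaultdict(int)
    for pairs in mapped_chunks:
        for word, count in pairs:
            totals[word] += count
    return dict(totals)


def word_count(chunks, workers=4):
    """Split the job into per-chunk tasks, run them in parallel, then merge."""
    # Assumption: threads on one machine play the role of separate servers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_pairs(mapped)


# Hypothetical input: three small chunks standing in for a large data set.
chunks = ["the cloud is big", "the grid is fast", "the cloud grows"]
counts = word_count(chunks)
print(counts["the"], counts["cloud"], counts["grid"])
```

Each chunk is processed independently, which is what lets the work spread across many machines; only the final merge needs to see every partial result.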
Students rushed to
sign up for Google 101 as soon as it appeared in the U-Dub syllabus. Within
weeks the students were learning how to configure their work for Google
machines and designing ambitious Web-scale projects, from cataloguing the
edits on Wikipedia to crawling the Internet to identify spam.
Luck
descended on the Googleplex in the person of IBM Chairman Samuel J.
Palmisano. The winter day was chilly, but Palmisano and his team sat down
with Schmidt and a handful of Googlers and discussed cloud computing. It was
no secret that IBM wanted to deploy clouds to provide data and services to
business customers.
Over the next three months they worked together
at Google headquarters. The work involved integrating IBM's business
applications and Google servers, and equipping them with a host of
open-source programs. In February they unveiled the prototype for top brass
in Mountain View, California, and for others on video from IBM headquarters
in Armonk, New York.
The Google cloud got the green light. The plan
was to spread cloud computing first to a handful of U.S. universities within
a year and later to deploy it globally. The universities would develop the
clouds, creating tools and applications while producing legions of new
computer scientists.
Yahoo Research Chief Prabhakar Raghavan says
that in a sense, there are only five computers on earth: Google, Yahoo,
Microsoft, IBM, and Amazon.
Tony Hey, vice-president for external
research at Microsoft, says research clouds will function as huge virtual
laboratories, with a new generation of librarians curating troves of data,
opening them to researchers with the right credentials.
Mark Dean,
head of IBM research in Almaden, California, says that the mixture of
business and science will lead, in a few short years, to networks of clouds
that will tax our imagination: "Compared to this, the Web is tiny. We'll be
laughing at how small the Web is."
The Grid
By Jonathan Leake
The Sunday Times, April 6, 2008
Edited by Andy Ross
The scientists who pioneered the internet have now built a much faster
replacement — the grid. At speeds about 10,000 times faster than a typical
broadband connection, the grid is the latest spin-off from CERN, the
particle physics centre that created the web.
David Britton,
professor of physics at Glasgow University and a leading figure in the grid
project, believes grid technologies could revolutionise society: "With this
kind of computing power, future generations will have the ability to
collaborate and communicate in ways older people like me cannot even
imagine."
The grid will be activated at the same time as the Large
Hadron Collider (LHC) at CERN, based near Geneva. Scientists at CERN started
the grid computing project seven years ago when they realised the LHC would
generate many petabytes of data per year.
The grid has been built
with dedicated fibre optic cables and modern routing centres, meaning there
are no outdated components to slow the deluge of data. The 55,000 servers
already installed are expected to rise to 200,000 within the next two years.
Professor Tony Doyle, technical director of the grid project, said:
"We need so much processing power, there would even be an issue about
getting enough electricity to run the computers if they were all at CERN.
The only answer was a new network powerful enough to send the data instantly
to research centres in other countries."
That network is now built,
using fibre optic cables that run from CERN to 11 centres in the United
States, Canada, the Far East, Europe and around the world. From each centre,
further connections radiate out to a host of other research institutions
using existing high-speed academic networks.
Ian Bird, project
leader for CERN's high-speed computing project, said grid technology could
make the internet so fast that people would stop using desktop computers to
store information and entrust it all to the internet: "It will lead to
what's known as cloud computing, where people keep all their information
online and access it from anywhere."
Although the grid itself is
unlikely to be directly available to domestic internet users, many telecoms
providers and businesses are already introducing its pioneering
technologies, such as dynamic switching, which creates a dedicated channel
for internet users downloading large volumes of data such as films.
The LHC Computing Grid
CERN
Edited by Andy Ross
The Large Hadron Collider (LHC), currently being built at CERN near Geneva,
is the largest scientific instrument on the planet. When it begins
operations in 2007, it will produce roughly 15 petabytes of data annually,
which thousands of scientists around the world will access and analyse. The
mission of the Worldwide LHC Computing Grid (LCG) project is to build and
maintain a data storage and analysis infrastructure for the entire high
energy physics community that will use the LHC.
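To put 15 petabytes a year in perspective, the short calculation below converts it into an average sustained data rate. It is a back-of-the-envelope sketch that assumes decimal petabytes and data spread evenly over a full calendar year; real output would come in bursts while the accelerator is running.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds in a non-leap year
PETABYTE = 10**15                   # assumption: decimal petabyte, in bytes

annual_bytes = 15 * PETABYTE
avg_bytes_per_second = annual_bytes / SECONDS_PER_YEAR
avg_gigabits_per_second = avg_bytes_per_second * 8 / 10**9

print(f"{avg_bytes_per_second / 10**6:.0f} MB/s")
print(f"{avg_gigabits_per_second:.1f} Gbit/s")
```

Under these assumptions the average works out to roughly 476 megabytes per second, or about 3.8 gigabits per second, around the clock.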
The data from the LHC
experiments will be distributed around the globe, according to a four-tiered
model. A primary backup will be recorded on tape at CERN, the Tier-0 centre
of LCG. After initial processing, this data will be distributed to a series
of Tier-1 centres, large computer centres with sufficient storage capacity
and with round-the-clock support for the grid.
The Tier-1 centres
will make data available to Tier-2 centres, each consisting of one or
several collaborating computing facilities, which can store sufficient data
and provide adequate computing power for specific analysis tasks. Individual
scientists will access these facilities through Tier-3 computing resources,
which can be local clusters in university departments.
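The four-tiered model can be pictured as a tree with CERN at the root. The sketch below is purely illustrative: the Centre class, the helper distribution_paths, and every centre name other than CERN are hypothetical and do not describe actual LCG sites or software.

```python
from dataclasses import dataclass, field


@dataclass
class Centre:
    name: str
    tier: int
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a centre one tier below this one and return it."""
        if child.tier != self.tier + 1:
            raise ValueError("a centre may only feed the next tier down")
        self.children.append(child)
        return child


def distribution_paths(centre, path=()):
    """Return every route data can take from this centre down to a leaf."""
    path = path + (centre.name,)
    if not centre.children:
        return [path]
    paths = []
    for child in centre.children:
        paths.extend(distribution_paths(child, path))
    return paths


# Hypothetical layout: one Tier-1 centre feeding two Tier-2 centres.
cern = Centre("CERN", 0)
tier1 = cern.add(Centre("Tier-1 centre A", 1))
tier2_a = tier1.add(Centre("Tier-2 centre A", 2))
tier2_b = tier1.add(Centre("Tier-2 centre B", 2))
tier2_a.add(Centre("University cluster", 3))

for route in distribution_paths(cern):
    print(" -> ".join(route))
```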
When the LHC
accelerator is running optimally, access to experimental data needs to be
provided for the 5000 scientists in some 500 research institutes and
universities worldwide that are participating in the LHC experiments. In
addition, all the data needs to be available over the 15 year estimated
lifetime of the LHC. The analysis of the data, including comparison with
theoretical simulations, requires of the order of 100 000 CPUs at 2006
measures of processing power.
A globally distributed grid for data
storage and analysis provides several key benefits. The costs are more
easily handled in a distributed environment, where individual institutes and
participating national organisations can fund local computing resources and
retain responsibility for these. Also, there are fewer single points of
failure. Multiple copies of data and automatic reassigning of computational
tasks to available resources ensure load balancing of resources and
facilitate access to the data for all the scientists involved.
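How multiple copies and automatic reassignment remove single points of failure can be shown with a small scheduling sketch. It assumes a simple least-loaded rule and uses hypothetical site and dataset names; it is not the LCG's actual scheduler.

```python
def assign_tasks(tasks, replicas, available):
    """Send each task to the least-loaded available site holding its data.

    tasks:     list of (task_id, dataset) pairs
    replicas:  dict mapping dataset -> list of sites that store a copy
    available: set of sites currently up
    """
    load = {site: 0 for site in available}
    assignment = {}
    for task_id, dataset in tasks:
        candidates = [s for s in replicas.get(dataset, []) if s in available]
        if not candidates:
            assignment[task_id] = None  # no reachable copy of the data
            continue
        # Least-loaded site wins; ties are broken by name for determinism.
        site = min(candidates, key=lambda s: (load[s], s))
        assignment[task_id] = site
        load[site] += 1
    return assignment


# Hypothetical example: each dataset is stored at two sites.
replicas = {"run1": ["site_a", "site_b"], "run2": ["site_b", "site_c"]}
tasks = [("t1", "run1"), ("t2", "run1"), ("t3", "run2")]

all_up = assign_tasks(tasks, replicas, {"site_a", "site_b", "site_c"})
degraded = assign_tasks(tasks, replicas, {"site_b", "site_c"})  # site_a down
print(all_up)
print(degraded)
```

With every site up, the three tasks spread across three sites. When site_a is removed, its work moves to site_b, which holds the second copy of the same data, so no task is lost.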
There
are also some challenges. These include ensuring adequate levels of network
bandwidth between the contributing resources, maintaining coherence of
software versions installed in various locations, coping with heterogeneous
hardware, managing and protecting the data so that it is not lost or
corrupted over the lifetime of the LHC, and providing accounting mechanisms
so that different groups have fair access.
The major computing
resources for LHC data analysis are provided by the Worldwide LHC Computing
Grid Collaboration — comprising the LHC experiments, the accelerator
laboratory and the Tier-1 and Tier-2 computer centres. The computing centres
providing resources for LCG are embedded in different operational grid
organisations. The LCG project is also following developments in industry,
where leading IT companies are testing and validating cutting-edge grid
technologies using the LCG environment.

