Sunday, March 06, 2005

Google - How It Works

According to Urs Hoelzle at EclipseCon 2005, Google's scaling strategy was integral to their search dominance and their later emergence as a major player in this net-based software era.

They use cheap, commodity hardware, not high-end servers. This lowers the cost per CPU, which allows for greater redundancy, but it also increases potential maintenance.

They run Linux.

To win, they planned to fail. That seems to break the basic rules for success, but it's actually smart: when you have hundreds or thousands of servers, hardware failures are expected at any time, and responding to them efficiently is far from trivial.

Urs took us through a few unbelievable, humorous slides showing their hardware progression from the late 90s on.

Redundancy is a core Google value. Not losing data is central to Google's business, so it makes sense. It requires reliable infrastructure building blocks, and to achieve this Google built several useful abstractions:

Google File System (GFS)

  • GFS Master manages metadata, which is replicated
  • 64 MB file 'chunks' are managed by Chunkservers and replicated 3X for fault tolerance
  • GFS client servers directly access the GFS Master and Chunkservers
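
To make the chunk and replica arithmetic above concrete, here is a minimal Python sketch. It is not Google's code: the function names, the server names and the round-robin placement are all assumptions made for illustration. Only the 64 MB chunk size and the 3X replication come from the talk.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as described in the talk
REPLICAS = 3                   # each chunk is stored 3 times


def chunk_count(file_size):
    """Number of 64 MB chunks needed to hold file_size bytes."""
    return -(-file_size // CHUNK_SIZE)  # ceiling division


def place_chunks(file_size, servers):
    """Toy 'master' metadata: chunk index -> chunkservers holding a replica.

    Round-robin placement is a hypothetical policy, chosen only because
    it is easy to follow.
    """
    if len(servers) < REPLICAS:
        raise ValueError("need at least as many servers as replicas")
    table = {}
    for index in range(chunk_count(file_size)):
        table[index] = [servers[(index + r) % len(servers)]
                        for r in range(REPLICAS)]
    return table


def locate(table, offset):
    """What a client would ask the master: who holds the byte at offset?"""
    return table[offset // CHUNK_SIZE]
```

For example, a 200 MB file on five chunkservers needs four chunks, and a read at the 130 MB mark falls in chunk 2. The point of the split is that the master only answers the small "where is it?" question, while the bulk data lives on the chunkservers.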

Basic Computing Cluster

  • Needed massive parallelization and distribution that are easy to use
  • MapReduce solves the problem. MapReducing = mapping + reduction.

Map: take an input key/value (k/v) pair and produce a set of intermediate k/v pairs
Reduce: emit the final, condensed k/v pair - these are sorted, merged search results
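
As a rough illustration of the two steps, here is a single-process word count in Python. It is only a sketch of the idea: the names `map_fn`, `reduce_fn` and `map_reduce` are mine, and a real cluster would spread the map and reduce calls across many machines.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Map: one input k/v pair -> intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: all values for one key -> one condensed k/v pair."""
    return word, sum(counts)


def map_reduce(inputs, map_fn, reduce_fn):
    """Run map over every input, group by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    # Sorting the keys stands in for the sort/merge step between the phases.
    return [reduce_fn(key, groups[key]) for key in sorted(groups)]
```

Calling `map_reduce([("d1", "the cat"), ("d2", "the dog")], map_fn, reduce_fn)` groups the two "the" pairs under one key before reducing. Because each map call and each reduce call only sees its own input, they can run on different machines without talking to each other.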

MapReduce is so redundant that, in one unplanned test, they lost 90% of their reduction 'worker' servers and all of the reduction tasks still completed. Now that's fault tolerance!
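
My notes don't record how the recovery works. One plausible mechanism, and the one assumed in this Python sketch, is that any task assigned to a dead worker is simply run again on a surviving one. The function name, worker names and the "sum a list" stand-in for real work are all hypothetical.

```python
def run_tasks(tasks, workers, dead):
    """Run every task, rescheduling any whose assigned worker has died.

    tasks:   dict of task id -> list of numbers to sum (stand-in for real work)
    workers: list of worker names
    dead:    set of worker names that failed during the job
    """
    alive = [w for w in workers if w not in dead]
    if not alive:
        raise RuntimeError("no surviving workers; the job cannot finish")

    results = {}
    rescheduled = 0
    for i, (task_id, numbers) in enumerate(tasks.items()):
        worker = workers[i % len(workers)]  # initial round-robin assignment
        if worker in dead:
            # Re-running is safe because the task has no side effects.
            worker = alive[rescheduled % len(alive)]
            rescheduled += 1
        results[task_id] = (sum(numbers), worker)
    return results, rescheduled
```

With ten workers and nine of them dead, twenty tasks still all finish; the job just takes longer because the single survivor does the work. That only holds because each task can be repeated without changing the answer.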

Regarding query frequency, he showed several successive graphs where the frequency of:

  • "eclipse" searches, before Eclipse.org, peaked every three years
  • "world series" searches peaked every year
  • "watermelon" searches peaked during the summer

Funny, but more importantly, Google uses these patterns to learn from the data. This learning process is broken into two basic steps: establishing relationships between searched data, then clustering the related documents to produce relevant search results.

It was an interesting talk. Their scaling approach isn't mind-bending, but it's sooo effective. What's most fascinating to me is that they had the audacity and foresight to tackle the problem at the beginning. For more information on How Google Works, go here.

The tiger in front

Recently I read this interesting article in The Economist. In it, Simon Long argues, "India can learn much from China's breakneck economic expansion. But it has valuable lessons for China, too."

HOME to nearly two-fifths of humanity, two neighbouring countries, India and China, are two of the world's fastest-growing economies. The world is taking notice. In December, a report by America's National Intelligence Council likened their emergence in the early 21st century to the rise of Germany in the 19th and America in the 20th, with “impacts potentially as dramatic”.

Comparisons between the two are inevitable. Both are poor, largely agricultural, countries that have made great strides in reducing poverty, especially since embarking on radical, liberalising economic reform. But India and China, always very different civilisations, have followed very different paths to growth. Under reform, they have converged somewhat in the past two decades, but will remain distinctive.

Take the way the two countries reacted to the recent deaths of two reformist leaders. India's P.V. Narasimha Rao, who died in December, was prime minister of India in 1991, when his government rescued the country from financial crisis and launched India's economic reforms. He served until 1996, but was later convicted of corruption. Although he won an appeal, the taint never quite left his name. His death, however, was marked by a state funeral and seven days of official mourning. The media vigorously debated his legacy.

When Zhao Ziyang, a former Chinese prime minister and head of the Communist Party, died three weeks later, he got just a couple of lines from the official news agency. He had been out of favour since siding with student protesters in Beijing's Tiananmen Square against hardliners in his own party in 1989. Dissidents were prevented from attending his funeral. It took two weeks to negotiate an official obituary. His successors, nervous that his memory might evoke the bloody suppression of the protests, did their best to erase it.

That India is an open society and China is not is one of the most glaring differences between the two. Some people in both countries are tempted to use it to explain another: that China's economy has grown much faster. This survey will argue that this view is simplistic and misleading.

Some of the main reasons for China's better performance have nothing to do with the political system. When China started its reforms, in 1978, it was poorer than India. Part of the gap now is due simply to that earlier start. But also, unreformed China seems to have done a more impressive job than India did in educating and providing health care for its poor. Reforms benefited from what economists call “good human capital”, and from a bulge in the working-age population that India itself is now experiencing.

In terms of integration into the global economy, the Chinese reforms have gone much further than India's have, and reaped bigger rewards. But India and China still face similar challenges. When George Fernandes, an Indian opposition politician who was defence minister in the previous government, visited China in 2003, he asked China's prime minister, Wen Jiabao, to list his economic priorities. The answers—unemployment, regional disparities and the enduring poverty of farmers—applied just as much to India. Mr Fernandes, once known as a critic of China, concluded: “We are both sailing in the same boat.”

The two countries have much else in common. Both have massive populations with correspondingly massive needs for resources, especially land, water and energy. Both need to find ways of stemming environmental decay. Both suffer under-reported HIV infection rates. Both face potentially destabilising external disputes: China with America over Taiwan, India with Pakistan over Kashmir.