(Sorry – sing “Shaboom Shaboom” and it makes sense)
ESG’er Bob Laliberte attended the Hadoop show in NY last week and had this to report:
“Last Friday I attended Hadoop World in NYC. It was just like VMworld, only there were 12,500 fewer people and very few vendors, but otherwise it was very similar. :-) Considering it was the first year, it was a pretty good turnout. Attendees ranged from college students to representatives from GE, SIAC and McKinsey, and the show seemed to have a pretty good buzz. Vendor tables were busy and presenters were tackled in the hall at breaks. There were also some consulting firms and more than a few press attendees.
Hadoop has some pretty big implications for our world, specifically around the rapid growth of data: how you manage it better and leverage it to create better business outcomes. Hadoop helps with a lot of that, and there are some pretty impressive examples included in the attached notes.
What is it? An open source project from the Apache Software Foundation that provides a software framework for distributing and running applications on clusters of servers.
Why do we need Hadoop? To change the way web companies think about data. However, it’s not just for web companies; other large enterprises are using it as well. With PBs of data becoming more common, there are more challenges in utilizing and effectively leveraging storage, and this is occurring across all verticals now.
How was it started? Google initiated it in 2004 with a paper on MapReduce/GFS, and in 2005 there was a Hadoop prototype. However, it really took off when Yahoo made a commitment to Hadoop in 2006. By 2007 it was handling PBs of data and thousands of services, and in 2008 it hit the TeraSort benchmark.
Yahoo example – Search Hints – the box that pops up with suggestions when you start typing
- Big index behind that; it looks at 3 yrs of data and takes 20 MapReduce jobs
- Before Hadoop it took 26 days to run on the fastest box; with Hadoop it takes 20 minutes
IBM M&A example – Business, mergers and acquisitions – need to look at patent data
- Pull all of the patent data – 1.4 million patents – extract patent entities
- Run it by attorneys – 1 week to go through 1.4 million files via Excel
- IBM’s M2 (powered by Hadoop) does it in 5 minutes”
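For anyone who hasn't seen the MapReduce model Bob mentions, here is a rough sketch of the idea, loosely themed on the search-hints example. It is a single-process toy in plain Python, not Hadoop code and certainly not Yahoo's actual pipeline; the function names (map_phase, shuffle, reduce_phase, search_hints) and the sample log are made up for illustration. On a real cluster the map and reduce steps would run in parallel across many servers.

```python
from collections import defaultdict


def map_phase(queries):
    """Map step: emit a (prefix, query) pair for every prefix of each logged query."""
    for query in queries:
        for end in range(1, len(query) + 1):
            yield query[:end], query


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle/sort stage."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(queries, top_n=3):
    """Reduce step: count the queries seen for one prefix and keep the most frequent."""
    counts = defaultdict(int)
    for query in queries:
        counts[query] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [query for query, _ in ranked[:top_n]]


def search_hints(log, top_n=3):
    """Run map, shuffle and reduce in sequence over a list of query strings."""
    groups = shuffle(map_phase(log))
    return {prefix: reduce_phase(qs, top_n) for prefix, qs in groups.items()}


if __name__ == "__main__":
    # Hypothetical query log, invented for this example.
    sample_log = ["hadoop", "hadoop", "hadoop world", "hats"]
    print(search_hints(sample_log)["ha"])
```

The point of the split is that each map call and each reduce call only needs its own slice of the data, which is what lets a framework spread the work over a cluster.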
More to follow, but it looks like a cool movement...
Tags: data management, Hadoop, search







