| By Jnan Dash | Article Rating: |
|
| July 11, 2011 04:25 PM EDT | Reads: |
3,493 |
The phrase “Big Data” is thrown around a lot these days. What exactly is referred to by this phrase? When I was part of IBM’s DB2 development team, the largest size limit of a DB2 Table was 64 Gigabytes (GB) and I thought who on earth can use this size of a database. Thirty years later, that number looks so small. Now you can buy a 1 Terabyte external drive for less than $100.
Let us start with a level set on the unit of storage. In multiples of 1000, we go from Byte – Kilobyte (KB) – Megabyte (MB) – Gigabyte (GB) – Terabyte (TB) – Petabyte (PB) – Exabyte (EB) – Zettabyte (ZB) – Yottabyte (YB). The last one YB is 10 to the power of 24. A typed page is 2KB. The entire book collection at the US Library of Congress is 15TB. The amount of data processed in one hour at Google is 1PB. The total amount of information in existence is around 1.27ZB. Now you get some context to these numbers.
When we say Big Data, we enter the petabyte space (1000 Terabytes). There is talk of “personal petabyte” to store all your audio, video, and pictures. The cost has come down from $2M in 2002 to $2K in 2012 – real Moore’s law in disk storage technology here. This is not the stuff for current commercial database products such as DB2 or Oracle or SQLServer. Such RDBMS’s handle maximum of 10 to 100 Terabyte sizes. Anything bigger would cause serious performance nightmares. These large databases are mostly in the decision support and data warehousing applications. Walmart is known to have its main retail transaction data warehouse at 100 plus terabytes in a Teradata DBMS system.
Most of the growth in data is in “files”, not in DBMS. Now we see huge volumes of data in social networking sites like Facebook. At the beginning of 2010, Facebook was handling more than 4TB per day (compressed). Now that it has gone to 750M users, that number is at least 50% more. The new Zuck’s (Zuckerberg) law is , “Shared contents double every 24 months”. The question is how to deal with such volumes.
Google pioneered the algorithm called MapReduce to process massive amounts of data via parallel processing through hundreds of thousands of commodity servers. A simple Google query you type, probably touches 700 to 1000 servers to yield that half-second response time. MapReduce was made an open source under the Apache umbrella and was released as Hadoop (by Doug Cutting, former Xerox Parc, Apple, now at Cloudera). Hadoop has a file store called HDFS besides the MapReduce computational process. Hadoop therefore is a “flexible and available architecture for large scale computation and data processing on a network of commodity servers”. What is Redhat to Linux is Cloudera (new VC funded company) to Hadoop.
While Hadoop is becoming a defacto standard for big data, it’s pedigree is batch. For near-real-time analytics, better answers are needed. Yahoo, for example, has a real time analytics project called S4. Several other innovations are happening in this area of realtime or near realtime analytics. Visualization is another hot area for big data.
Big Data offers many opportunities for innovation in next few years.
Read the original blog entry...
Published July 11, 2011 Reads 3,493
Copyright © 2011 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Jnan Dash
Jnan Dash is Senior Advisor at EZShield Inc., Advisor at ScaleDB and Board Member at Compassites Software Solutions. He has lived in Silicon Valley since 1979. Formerly he was the Chief Strategy Officer (Consulting) at Curl Inc., before which he spent ten years at Oracle Corporation and was the Group Vice President, Systems Architecture and Technology till 2002. He was responsible for setting Oracle's core database and application server product directions and interacted with customers worldwide in translating future needs to product plans. Before that he spent 16 years at IBM. He blogs at http://jnandash.ulitzer.com.
- Cloud People: A Who's Who of Cloud Computing
- Google Compute enters the IaaS market
- Cloud Expo NY: Environmental Pressures Drive an Evolution in File Storage
- The Software Freedom Conservancy – Fundraising Campaign: Non-Profit Accounting Software
- Cloud Expo NY: Interconnected Machines and the Future of Energy
- Cloud Conversations: AWS EBS, Glacier and S3 Overview | Part 3
- Healthcare Data on the Cloud – The Reality of Sensitive Information Online
- Cloud Business Solutions, Social Media, and Platform Systems of Engagement Market Shares, Strategies, and Forecasts, Worldwide, 2013 to 2019
- Google Submits Concessions to EC; Gets Sued in the UK
- Step-by-Step: Extend Your Network to the Cloud with Windows Azure Virtual Networks
- Cloud Expo New York | Storage & Archive: Are Existing Offerings Relevant?
- Shadow IT – The Reality Is Here
- Cloud People: A Who's Who of Cloud Computing
- Cloud Expo New York: How to Use Google Apps Script
- Apple Ordered to Pay VirnetX $333K a Day
- Google Compute enters the IaaS market
- Cloud Expo NY: Environmental Pressures Drive an Evolution in File Storage
- The Software Freedom Conservancy – Fundraising Campaign: Non-Profit Accounting Software
- Cloud Expo NY: Interconnected Machines and the Future of Energy
- Cavalry Rides into Oracle’s Java Suit
- Samsung Uses Centrify for Safer Android Platform
- Cloud Conversations: AWS EBS, Glacier and S3 Overview | Part 3
- Google Maps May Be Banned in Germany
- Healthcare Data on the Cloud – The Reality of Sensitive Information Online
- Where Are RIA Technologies Headed in 2008?
- Personal Branding Checklist
- The Top 250 Players in the Cloud Computing Ecosystem
- AJAXWorld 2006 West Power Panel with Google's Adam Bosworth
- Why Microsoft Loves Google's Android
- Google's OpenSocial: A Technical Overview and Critique
- Cloud People: A Who's Who of Cloud Computing
- Wal-Mart To Sell $399 Ubuntu Linux-based Laptop with Google Operating System
- Cloud Expo New York Call for Papers Now Open
- Dolphin Announces Open API With Over 50 Add-ons Including Dropbox and Wikipedia
- i-Technology Blog: Google Trends on Java, McNealy, AJAX, and SOA Give Pause For Thought
- i-Technology Blog: Is There Life Beyond Google?























