|By Jerry Held||
|August 5, 2008 01:15 PM EDT||
My belief is that cloud computing will change the economics of business intelligence (BI) and enable a variety of new analytic data management projects and business possibilities. It does so by making the hardware, networking, security, and software needed to create data marts and data warehouses available on demand with a pay-as-you-go approach to usage and licensing.
A computing cloud, such as the Amazon Elastic Compute Cloud, is composed of thousands of commodity servers running multiple virtual machine instances (VMs) of the applications hosted in the cloud. As customer demand for those applications changes, new servers are added to the cloud or idled and new VMs are instantiated or terminated.
Cloud computing infrastructure differs dramatically from the infrastructure underlying most in-house data warehouses and data marts. There are no high-end servers with dozens of CPU cores, SANs, replicated systems, or proprietary data warehousing appliances available in the cloud. Therefore, a new DBMS software architecture is required to enable large volumes of data to be analyzed quickly and reliably on the cloud's commodity hardware. Recent DBMS innovations make this a reality today, and the best cloud DBMS architectures will include:
- Shared-nothing, massively parallel processing (MPP) architecture. In order to drive down the cost of creating a utility computing environment, the best cloud service providers use huge grids of identical (or similar) computing elements. Each node in the grid is typically a compute engine with its own attached storage. For a cloud database to successfully "scale out" in such an environment, it is essential that the database have a shared-nothing architecture utilizing the resources (CPU, memory, and disk) found in server nodes added to the cluster. Most databases popularly used in BI today have shared-everything or shared-storage architectures, which will limit their ability to scale in the cloud.
- Automatic high availability. Within a cloud-based analytic database cluster, node failures, node changes, and connection disruptions can occur. Given the vast number of processing elements within a cloud, these failures can be made transparent to the end user if the database has the proper built-in failover capabilities. The best cloud databases will replicate data automatically across the nodes in the cloud cluster, be able to continue running in the event of 1 or more node failures ("k-safety"), and be capable of restoring data on recovered nodes automatically -- without DBA assistance. Ideally, the replicated data will be made "active" in different sort orders for querying to increase performance.
- Ultra-high performance. One of the game-changing advantages of the cloud is the ability to get an analytic application up quickly (without waiting for hardware procurement). However, there can be some performance penalty due to Internet connectivity speeds and the virtualized cloud environment. If the analytic performance is disappointing, the advantage is lost. Fortunately, the latest shared-nothing columnar databases are designed specifically for analytic workloads, and they have demonstrated dramatic performance improvements over traditional, row-oriented databases (as verified by industry experts, such as Gartner and Forrester, and by customer benchmarks). This software performance improvement, coupled with the hardware economies of scale provided by the cloud environment, results in a new economic model and competitive advantage for cloud analytics.
- Aggressive compression. Since cloud costs are typically driven by charges for processor and disk storage utilization, aggressive data compression will result in very large cost savings. Row-oriented databases can achieve compression factors of about 30% to 50%; however, the addition of necessary indexes and materialized views often swells databases to 2 to 5 times the size of the source data. But since the data in a column tends to be more similar and repetitive than attributes within rows, column databases often achieve much higher levels of compression. They also don't require indexes. The result is normally a 4x to 20x reduction in the amount of storage needed by columnar databases and a commensurate reduction in storage costs.
- Standards-based connectivity. While there are a number of special-purpose file systems that have been developed for the cloud environment that can provide high performance, they lack the standard connectivity needed to support general-purpose business analytics. The broad base of analytic users will use existing commercial ETL and reporting software that depend on SQL, JDBC, ODBC, and other DBMS connectivity standards to load and query cloud databases. Therefore, it's imperative for cloud databases to support these connection standards to enable widespread use of analytic applications.
- "Scaling out," as the cloud itself does
- Running fast without high-end or custom hardware
- Providing high availability in a fluid computing environment
- Minimizing data storage, transfer, and CPU utilization (to keep cloud computing fees low)
- Google’s Enterprise Problem
- Building Video Calling with PubNub and WebRTC
- DataStax Announces New Startup Programme Offering Free Software, As Well As Free Training Courses For Cassandra Users And New Developer Tool
- Get Ready to Think Out (C)loud With Cloud Sherpas’ Upcoming Webinar Series
- Evaluation Report on Virtual Backup Software
- New PubNub App Template for WebRTC
- Strategic Enough to Matter, Code Halos and Mobile Apps
- GAMA : Quatre acteurs clefs, quatre stratégies différentes !
- 7 Christmas Gifts For Your Business
- Box and NSI Partnership Brings the Cloud to Businesses in the Middle East
- Wowza Joins Google Cloud Platform with Google Compute Engine
- The Master Plan for Enterprise Mobility
- WebRTC Summit at Cloud Expo Agenda Announced
- OneLogin Raises $13M to Power Expansion
- Cloud Security Alliance Releases Cloud Controls Matrix, Version 3.0
- Survey Finds Large Enterprises Adopting WebRTC
- WebRTC Summit | WebRTC: Test then Disrupt
- WebRTC Summit Speaker Submissions Open
- WSO2 Expands Identity Management Capabilities Across Cloud, Mobile and Web Applications With the Launch of WSO2 Identity Server 4.5
- Twilio and LiveOps to Deliver WebRTC Deployments
- BMC Software to Exhibit at Cloud Expo Silicon Valley
- Oracle Demonstrates WebRTC Solution with CounterPath's Bria
- OpenStack for the Enterprise – Showcasing the OpenStack Ecosystem
- GENBAND Showcases WebRTC and Cloud
- Where Are RIA Technologies Headed in 2008?
- The Top 250 Players in the Cloud Computing Ecosystem
- Dolphin Announces Open API With Over 50 Add-ons Including Dropbox and Wikipedia
- Personal Branding Checklist
- AJAXWorld 2006 West Power Panel with Google's Adam Bosworth
- Why Microsoft Loves Google's Android
- Google's OpenSocial: A Technical Overview and Critique
- Cloud Expo New York Call for Papers Now Open
- Wal-Mart To Sell $399 Ubuntu Linux-based Laptop with Google Operating System
- i-Technology Blog: Google Trends on Java, McNealy, AJAX, and SOA Give Pause For Thought
- i-Technology Blog: Is There Life Beyond Google?
- Android: Who Hates Google Over the Phone?