Welcome!

Release Management Authors: Pat Romanski, Elizabeth White, David H Deans, Liz McMillan, Jnan Dash

Related Topics: @CloudExpo, Artificial Intelligence, @BigDataExpo, @ThingsExpo

@CloudExpo: Blog Feed Post

“Unlearn” to Unleash Your #DataLake | @CloudExpo @Schmarzo #BigData #AI #DX

The Data Science Process is about exploring, experimenting, and testing new data sources and analytic tools quickly

It takes years – sometimes a lifetime – to perfect certain skills in life: hitting a jump shot off the dribble, nailing that double high C on the trumpet, parallel parking a Ford Expedition. Malcolm Gladwell wrote a book, “Outliers,” discussing the amount of work – 10,000 hours – required to perfect a skill (while the exactness of 10,000 hours has come under debate, it is still a useful point that people need to invest considerable time and effort to master a skill). But once we get comfortable with something that we feel that we have mastered, we become reluctant to change. We are reluctant to unlearn what we’ve taken so long to master.

Changing your point of release on a jump shot or your embouchure for playing lead trumpet is dang hard! Why? Because it is harder to unlearn that it is to learn. It is harder to un-wire all those synoptic nerve endings and deep memories than it was to wire them in the first place. It’s not just a case of thinking faster, smaller or cheaper; it necessitates thinking differently.

For example, why did it take professional basketball so long to understand the game changing potential of the 3-point shot? The 3-point shot was added to the NBA during the 1979-1980 season, but for decades the 3-point shot was more a novelty then a serious game strategy. Pat Riley, the legendary coach of the 3-pointer’s first decade in the league (won NBA Championships in 1982, 1985, 1987 and 1988), called it a “gimmick.” Larry Bird, one of that era’s top players said: “I really don’t like it.”

It’s only been within the past 3 years where the “economics of the 3-point shot” have changed the fundamentals of how to win an NBA Championship (see Figure 1).

Figure 1: NBA 3-point Baskets per Season

NBA Coaches and General Managers just didn’t comprehend the “economics of the 3-point shot” and how the 3-point shot could turn a good shooter into a dominant player; that a 40% 3-point shooting percentage is equivalent to a 60% 2-point shooting percentage from a points / productivity perspective. The economics of the 3-point shot (coupled with rapid ball movement to create uncontested 3-point shots) wasn’t full exploited until the 2015-2016 season by the Golden State Warriors. Their success over the past 3 seasons (3 trips to the NBA finals with 2 championships) shows how much the game of basketball has been changed.

Sometimes it’s necessary to unlearn long held beliefs (i.e. 2-point shooting in a predominately isolation offense game) in order to learn new, more powerful, game changing beliefs (i.e., 3-point shooting in a rapid ball movement offense).

Sticking with our NBA example, Phil Jackson is considered one of the greatest NBA coaches, with 11 NBA World Championships coaching the Chicago Bulls and the Los Angeles Lakers. Phil Jackson mastered the “Triangle Offense” that played to the strengths of the then dominant players Michael Jordan (Chicago Bulls) and Kobe Bryant (Los Angeles Lakers) to win those 11 titles.

However, the game passed Phil Jackson as the economics of the 3-point shot changed how to win. Jackson’s tried-and-true “Triangle Offense” failed with the New York Knicks leading to the team’s dramatic under-performance and ultimately his firing. It serves as a stark reminder of how important it is to be ready to unlearn old skills in order to move forward.

And what holds true for sports, holds even more so for technology and business.

The Challenge of Unlearning
For the first two decades of my career, I worked to perfect the art of data warehousing. I was fortunate to be at Metaphor Computers in the 1980’s where we refined the art of dimensional modeling and star schemas. I had many years working to perfect my star schema and dimensional modeling skills with data warehouse luminaries like Ralph Kimball, Margy Ross, Warren Thornthwaite, and Bob Becker. It became engrained in every customer conversation; I’d built a star schema and the conformed dimensions in my head as the client explained their data analysis requirements.

Then Yahoo happened to me and soon everything that I held as absolute truth was turned upside down. I was thrown into a brave new world of analytics based upon petabytes of semi-structured and unstructured data, hundreds of millions of customers with 70 to 80 dimensions and hundreds of metrics, and the need to make campaign decisions in fractions of a second. There was no way that my batch “slice and dice” business intelligence and highly structured data warehouse approach was going to work in this brave new world of real-time, predictive and prescriptive analytics.

I struggled to unlearn engrained data warehousing concepts in order to embrace this new real-time, predictive and prescriptive world. And this is one of the biggest challenge facing IT leaders today – how to unlearn what they’ve held as gospel and embrace what is new and different. And nowhere do I see that challenge more evident then when I’m discussing Data Science and the Data Lake.

Embracing The “Art of Failure” and The Data Science Process
Nowadays, Chief Information Officers (CIOs) are being asked to lead the digital transformation from a batch world that uses data and analytics to monitor the business to a real-time world that exploits internal and external, structured and unstructured data, to predict what is likely to happen and prescribe recommendations. To power this transition, CIO’s must embrace a new approach for deriving customer, product, and operational insights – the Data Science Process (see Figure 2).

Figure 2:  Data Science Engagement Process

The Data Science Process is about exploring, experimenting, and testing new data sources and analytic tools quickly, failing fast but learning faster. The Data Science process requires business leaders to get comfortable with “good enough” and failing enough times before one becomes comfortable with the analytic results. Predictions are not a perfect world with 100% accuracy. As Yogi Berra famously stated:

“It’s tough to make predictions, especially about the future.”

This highly iterative, fail-fast-but-learn-faster process is the heart of digital transformation – to uncover new customer, product, and operational insights that can optimize key business and operational processes, mitigate regulatory and compliance risks, uncover new revenue streams and create a more compelling, more prescriptive customer engagement. And the platform that is enabling digital transformation is the Data Lake.

The Power of the Data Lake
The data lake exploits the economics of big data; coupling commodity, low-cost servers and storage with open source tools and technologies, is 50x to 100x cheaper to store, manage and analyze data then using traditional, proprietary data warehousing technologies. However, it’s not just cost that makes the data lake a more compelling platform than the data warehouse. The data lake also provides a new way to power the business, based upon new data and analytics capabilities, agility, speed, and flexibility (see Table 1).

Data Warehouse Data Lake
Data structured in heavily-engineered structured dimensional schemas Data structured as-is (structured, semi-structured, and unstructured formats)
Heavily-engineered, pre-processed data ingestion Rapid as-is data ingestion
Generates retrospective reports from historical, operational data sources Generates predictions and prescriptions from a wide variety of internal and external data sources
100% accurate results of past events and performance “Good enough” predictions of future events and performance
Schema-on-load to support the historical reporting on what the business did Schema-on-query to support the rapid data exploration and hypothesis testing
Extremely difficult to ingest and explore new data sources (measured in weeks or months) Easy and fast to ingest and explore new data sources (measured in hours or days)
Monolithic design and implementation (water fall) Natively parallel scale out design and implementation (scrum)
Expensive and proprietary Cheap and open source
Widespread data proliferation (data warehouses and data marts) Single managed source of organizational data
Rigid; hard to change Agile; relatively ease to change

Table 1:  Data Warehouse versus Data Lake

The data lake supports the unique requirements of the data science team to:

  • Rapidly explore and vet new structured and unstructured data sources
  • Experiment with new analytics algorithms and techniques
  • Quantify cause and effect
  • Measure goodness of fit

The data science team needs to be able perform this cycle in hours or days, not weeks or months. The data warehouse cannot support these data science requirements. The data warehouse cannot rapidly exploration the internal and external structured and unstructured data sources. The data warehouse cannot leverage the growing field of deep learning/machine learning/artificial intelligence tools to quantify cause-and-effect. Thinking that the data lake is “cold storage for our data warehouse” – as one data warehouse expert told me – misses the bigger opportunity. That’s yesterday’s “triangle offense” thinking. The world has changed, and just like how the game of basketball is being changed by the “economics of the 3-point shot,” business models are being changed by the “economics of big data.”

But a data lake is more than just a technology stack. To truly exploit the economic potential of the organization’s data, the data lake must come with data management services covering data accuracy, quality, security, completeness and governance. See “Data Lake Plumbers: Operationalizing the Data Lake” for more details (see Figure 3).

Figure 3:  Components of a Data Lake

If the data lake is only going to be used another data repository, then go ahead and toss your data into your unmanageable gaggle of data warehouses and data marts.

BUT if you are looking to exploit the unique characteristics of data and analytics –assets that never deplete, never wear out and can be used across an infinite number of use cases at zero marginal cost – then the data lake is your “collaborative value creation” platform. The data lake becomes that platform that supports the capture, refinement, protection and re-use of your data and analytic assets across the organization.

But one must be ready to unlearn what they held as the gospel truth with respect to data and analytics; to be ready to throw away what they have mastered to embrace new concepts, technologies, and approaches. It’s challenging, but the economics of big data are too compelling to ignore. In the end, the transition will be enlightening and rewarding. I know, because I have made that journey.

The post “Unlearn” to Unleash Your Data Lake appeared first on InFocus Blog | Dell EMC Services.

Read the original blog entry...

More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business”, is responsible for setting the strategy and defining the Big Data service line offerings and capabilities for the EMC Global Services organization. As part of Bill’s CTO charter, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He’s written several white papers, avid blogger and is a frequent speaker on the use of Big Data and advanced analytics to power organization’s key business initiatives. He also teaches the “Big Data MBA” at the University of San Francisco School of Management.

Bill has nearly three decades of experience in data warehousing, BI and analytics. Bill authored EMC’s Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute’s faculty as the head of the analytic applications curriculum.

Previously, Bill was the Vice President of Advertiser Analytics at Yahoo and the Vice President of Analytic Applications at Business Objects.

@ThingsExpo Stories
High-velocity engineering teams are applying not only continuous delivery processes, but also lessons in experimentation from established leaders like Amazon, Netflix, and Facebook. These companies have made experimentation a foundation for their release processes, allowing them to try out major feature releases and redesigns within smaller groups before making them broadly available. In his session at 21st Cloud Expo, Brian Lucas, Senior Staff Engineer at Optimizely, will discuss how by using...
In this strange new world where more and more power is drawn from business technology, companies are effectively straddling two paths on the road to innovation and transformation into digital enterprises. The first path is the heritage trail – with “legacy” technology forming the background. Here, extant technologies are transformed by core IT teams to provide more API-driven approaches. Legacy systems can restrict companies that are transitioning into digital enterprises. To truly become a lead...
SYS-CON Events announced today that CAST Software will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CAST was founded more than 25 years ago to make the invisible visible. Built around the idea that even the best analytics on the market still leave blind spots for technical teams looking to deliver better software and prevent outages, CAST provides the software intelligence that matter ...
SYS-CON Events announced today that Daiya Industry will exhibit at the Japanese Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Ruby Development Inc. builds new services in short period of time and provides a continuous support of those services based on Ruby on Rails. For more information, please visit https://github.com/RubyDevInc.
As businesses evolve, they need technology that is simple to help them succeed today and flexible enough to help them build for tomorrow. Chrome is fit for the workplace of the future — providing a secure, consistent user experience across a range of devices that can be used anywhere. In her session at 21st Cloud Expo, Vidya Nagarajan, a Senior Product Manager at Google, will take a look at various options as to how ChromeOS can be leveraged to interact with people on the devices, and formats th...
SYS-CON Events announced today that Yuasa System will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Yuasa System is introducing a multi-purpose endurance testing system for flexible displays, OLED devices, flexible substrates, flat cables, and films in smartphones, wearables, automobiles, and healthcare.
SYS-CON Events announced today that Taica will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Taica manufacturers Alpha-GEL brand silicone components and materials, which maintain outstanding performance over a wide temperature range -40C to +200C. For more information, visit http://www.taica.co.jp/english/.
Enterprises have taken advantage of IoT to achieve important revenue and cost advantages. What is less apparent is how incumbent enterprises operating at scale have, following success with IoT, built analytic, operations management and software development capabilities – ranging from autonomous vehicles to manageable robotics installations. They have embraced these capabilities as if they were Silicon Valley startups. As a result, many firms employ new business models that place enormous impor...
SYS-CON Events announced today that SourceForge has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. SourceForge is the largest, most trusted destination for Open Source Software development, collaboration, discovery and download on the web serving over 32 million viewers, 150 million downloads and over 460,000 active development projects each and every month.
SYS-CON Events announced today that Dasher Technologies will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Dasher Technologies, Inc. ® is a premier IT solution provider that delivers expert technical resources along with trusted account executives to architect and deliver complete IT solutions and services to help our clients execute their goals, plans and objectives. Since 1999, we'v...
As popularity of the smart home is growing and continues to go mainstream, technological factors play a greater role. The IoT protocol houses the interoperability battery consumption, security, and configuration of a smart home device, and it can be difficult for companies to choose the right kind for their product. For both DIY and professionally installed smart homes, developers need to consider each of these elements for their product to be successful in the market and current smart homes.
SYS-CON Events announced today that MIRAI Inc. will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. MIRAI Inc. are IT consultants from the public sector whose mission is to solve social issues by technology and innovation and to create a meaningful future for people.
SYS-CON Events announced today that Massive Networks, that helps your business operate seamlessly with fast, reliable, and secure internet and network solutions, has been named "Exhibitor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. As a premier telecommunications provider, Massive Networks is headquartered out of Louisville, Colorado. With years of experience under their belt, their team of...
SYS-CON Events announced today that TidalScale, a leading provider of systems and services, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. TidalScale has been involved in shaping the computing landscape. They've designed, developed and deployed some of the most important and successful systems and services in the history of the computing industry - internet, Ethernet, operating s...
Coca-Cola’s Google powered digital signage system lays the groundwork for a more valuable connection between Coke and its customers. Digital signs pair software with high-resolution displays so that a message can be changed instantly based on what the operator wants to communicate or sell. In their Day 3 Keynote at 21st Cloud Expo, Greg Chambers, Global Group Director, Digital Innovation, Coca-Cola, and Vidya Nagarajan, a Senior Product Manager at Google, will discuss how from store operations...
Widespread fragmentation is stalling the growth of the IIoT and making it difficult for partners to work together. The number of software platforms, apps, hardware and connectivity standards is creating paralysis among businesses that are afraid of being locked into a solution. EdgeX Foundry is unifying the community around a common IoT edge framework and an ecosystem of interoperable components.
In a recent survey, Sumo Logic surveyed 1,500 customers who employ cloud services such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). According to the survey, a quarter of the respondents have already deployed Docker containers and nearly as many (23 percent) are employing the AWS Lambda serverless computing framework. It’s clear: serverless is here to stay. The adoption does come with some needed changes, within both application development and operations. Tha...
SYS-CON Events announced today that IBM has been named “Diamond Sponsor” of SYS-CON's 21st Cloud Expo, which will take place on October 31 through November 2nd 2017 at the Santa Clara Convention Center in Santa Clara, California.
In his Opening Keynote at 21st Cloud Expo, John Considine, General Manager of IBM Cloud Infrastructure, will lead you through the exciting evolution of the cloud. He'll look at this major disruption from the perspective of technology, business models, and what this means for enterprises of all sizes. John Considine is General Manager of Cloud Infrastructure Services at IBM. In that role he is responsible for leading IBM’s public cloud infrastructure including strategy, development, and offering ...
Infoblox delivers Actionable Network Intelligence to enterprise, government, and service provider customers around the world. They are the industry leader in DNS, DHCP, and IP address management, the category known as DDI. We empower thousands of organizations to control and secure their networks from the core-enabling them to increase efficiency and visibility, improve customer service, and meet compliance requirements.