United States Supply Chain Management Council

There’s No Need to Choose

by usscmc
January 13, 2021

(Halawi/Shutterstock)

With the industry spotlight on Snowflake following its recent IPO, there’s no shortage of discussion around cloud data warehouses, cloud data lakes, and how the two overlap – or don’t. For many enterprise data and analytics professionals trying to modernize to support ML and AI, there’s still a good deal of confusion about what each type of data solution offers and where the key differences lie.

In this primer, I’ll look at the strengths of each data platform and what each is built to excel at. While cloud data warehouses and cloud data lakes may solve different problems, they can – and, executed well, should – complement one another. Used in tandem and backed by the power of the cloud, these two architectures can capture a more complete data and analytics picture and deliver the value and business insight that enterprises continue to seek.

Entering the Cloud Data Warehouse

Data warehouses are a decades-old technology that enables analytics using a mostly relational processing engine, structuring data in tables and columns. They are generally categorized as schema-on-write: writes to the data warehouse must adhere to a previously established schema. This is true for any deployment style, including cloud data warehouses.
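To make schema-on-write concrete, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for a warehouse engine, and the `sales` table, its columns, and the `try_write` helper are hypothetical names chosen for illustration. Writes that don't match the established schema are rejected until the schema itself is changed.

```python
import sqlite3

# In-memory SQLite stands in for a warehouse engine (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT NOT NULL, amount REAL NOT NULL)")

# A write that matches the established schema succeeds.
conn.execute("INSERT INTO sales (city, amount) VALUES (?, ?)", ("Boston", 120.0))


def try_write(sql, params):
    """Attempt a write and report whether the schema accepted it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.Error as exc:
        return f"rejected: {type(exc).__name__}"


# A write that uses a column the schema does not define is rejected.
new_shape = (
    "INSERT INTO sales (city, amount, channel) VALUES (?, ?, ?)",
    ("Austin", 75.0, "web"),
)
extra_column = try_write(*new_shape)

# A write that omits a required column is rejected as well.
missing_value = try_write("INSERT INTO sales (city) VALUES (?)", ("Denver",))

# The schema has to change first; only then does the new shape load.
conn.execute("ALTER TABLE sales ADD COLUMN channel TEXT")
after_change = try_write(*new_shape)

row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(extra_column, missing_value, after_change, row_count)
```

In a production warehouse the `ALTER TABLE` step is where the design work and change control come in; the sketch only shows why the write path depends on it.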

Naturally, SQL is the universal language of cloud data warehouses. SQL extensions for JSON and similar solutions can also allow for semi-structured data and schema-on-read functionality. However, these solutions carry strict ACID transaction overhead, which can be prohibitive. Many non-SQL workloads do not require this: schema-on-read can naturally support these applications while using less stringent transaction semantics and delivering better performance.

Cloud data warehouses also require data to be cleaned and structured in close alignment with the questions and analysis that business applications are meant to address. Any necessary schema change requires a long, intensive, manual process that includes design work and landing the data in preparation for analysis.

A data warehouse sources data from an operational data store and additional files to produce business intelligence

Data Warehouse User-Defined Extensions (UDX)

Data warehouse relational engines enable advanced analysis by allowing application developers to write user-defined functions (UDFs) and user-defined aggregates (UDAs) – collectively known as user-defined extensions (UDXs). Leveraging UDXs can equip business analytics with a feature set surpassing what can be accomplished using standard SQL. UDXs are used in the same way as other standard SQL functions and aggregates in SQL statements. UDXs offer a full range of use cases and levels of complexity, from simply validating URLs all the way through to statistical functions, encryption/decryption, and compression/decompression.
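As a rough illustration of UDXs, the sketch below registers one UDF and one UDA and then calls both from plain SQL. It assumes Python's built-in `sqlite3` module as a stand-in for a warehouse engine; real warehouses have their own registration mechanisms. The `is_valid_url` function, the `value_range` aggregate, and the `clicks` table are hypothetical names for this example.

```python
import sqlite3
from urllib.parse import urlparse


def is_valid_url(value):
    """UDF: return 1 when the text parses as an http(s) URL with a host, else 0."""
    if not isinstance(value, str):
        return 0
    parts = urlparse(value)
    return int(parts.scheme in ("http", "https") and bool(parts.netloc))


class ValueRange:
    """UDA: return the maximum minus the minimum across the aggregated rows."""

    def __init__(self):
        self.low = None
        self.high = None

    def step(self, value):
        # Called once per row; NULLs are skipped.
        if value is None:
            return
        self.low = value if self.low is None else min(self.low, value)
        self.high = value if self.high is None else max(self.high, value)

    def finalize(self):
        # Called once at the end to produce the aggregate result.
        if self.low is None:
            return None
        return self.high - self.low


conn = sqlite3.connect(":memory:")
conn.create_function("is_valid_url", 1, is_valid_url)
conn.create_aggregate("value_range", 1, ValueRange)

conn.execute("CREATE TABLE clicks (url TEXT, latency_ms REAL)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [
        ("https://example.com/a", 120.0),
        ("not a url", 300.0),
        ("http://example.org", 80.0),
    ],
)

# Both extensions are used exactly like built-in SQL functions and aggregates.
valid_count, latency_spread = conn.execute(
    "SELECT SUM(is_valid_url(url)), value_range(latency_ms) FROM clicks"
).fetchone()
print(valid_count, latency_spread)
```

Once registered, the extensions sit alongside `SUM` in an ordinary `SELECT`, which is the property the paragraph above describes.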

Utilizing a Data Warehouse for Business Intelligence

Cloud data warehouses are commonly tapped to analyze historical data, support business intelligence applications, and fulfill business analysts’ needs for interactive reporting and other ad hoc tasks. For example, a data warehouse might enable a vendor to analyze their product inventory and sales by location, drilling down into data by country, region, and city. The organization can then leverage those insights to better optimize its supply chain and sales processes.
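The drill-down in that example can be sketched as a series of progressively finer `GROUP BY` queries. This is a minimal sketch, again assuming in-memory SQLite in place of a warehouse; the `sales` table, its sample rows, and the `units_by` helper are hypothetical.

```python
import sqlite3

# Location hierarchy from coarsest to finest level.
HIERARCHY = ("country", "region", "city")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (country TEXT, region TEXT, city TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("US", "West", "Seattle", 40),
        ("US", "West", "Portland", 25),
        ("US", "East", "Boston", 30),
        ("CA", "Ontario", "Toronto", 20),
    ],
)


def units_by(depth):
    """Total units grouped by the first `depth` levels of the hierarchy."""
    if not 1 <= depth <= len(HIERARCHY):
        raise ValueError("depth must be between 1 and the number of levels")
    # Column names come only from the fixed HIERARCHY tuple, never from user input.
    cols = ", ".join(HIERARCHY[:depth])
    query = f"SELECT {cols}, SUM(units) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(query).fetchall()


by_country = units_by(1)  # coarsest view
by_region = units_by(2)   # drill down one level
by_city = units_by(3)     # finest view
print(by_country)
print(by_region)
print(by_city)
```

Each call answers the same question at a finer grain, which is the kind of interactive reporting the schema was designed around in advance.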

Diving into the Cloud Data Lake

Cloud data lakes are generalized data processing platforms. They support modernization with a broader range of data and analytics processing needs than SQL-based data warehouses. Data lakes are schema-on-read: data is stored as it arrives, and a schema is applied when the data is read rather than when it is written. Data lakes are built to handle structured, semi-structured, or unstructured data. For enterprises, data lakes offer a single unified platform that serves a wider swath of use cases, from data science to data engineering, machine learning, and reporting. Data lakes are able to tap into this breadth of tools, analysis, and data options by leveraging SQL, NoSQL, Apache Spark, Apache Flink, and other data processing engines.
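Schema-on-read can be sketched in a few lines. The example below is illustrative only: it assumes raw records are stored as JSON lines, and the `read_with_schema` helper, field names, and sample records are hypothetical. The same raw data is read twice, each time through a different schema chosen by the reader.

```python
import json

# Raw records land as-is; each line can have a different shape.
raw_lines = [
    '{"device": "a1", "temp_c": 21.5}',
    '{"device": "b2", "temp_c": "22.1", "firmware": "1.4"}',
    '{"device": "c3", "humidity": 40}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time.

    `schema` maps field names to cast functions. Fields that are absent
    or cannot be cast come back as None instead of blocking the read.
    """
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            try:
                row[field] = cast(value) if value is not None else None
            except (TypeError, ValueError):
                row[field] = None
        yield row


# Two readers, two schemas, one copy of the raw data.
temperature_view = list(
    read_with_schema(raw_lines, {"device": str, "temp_c": float})
)
firmware_view = list(
    read_with_schema(raw_lines, {"device": str, "firmware": str})
)
print(temperature_view)
print(firmware_view)
```

Nothing was rejected at write time; the cost is that each reader has to decide how to handle missing or inconsistent fields.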

Importantly, data lakes are often incorrectly categorized as simple data stores like AWS S3 or Azure ADLS (and early data lakes did focus on large data storage). In fact, cloud data lakes now provide the range of storage, processing capabilities, and tools necessary for a complete analytical environment for enterprise modernization. Within cloud data lakes, widely used SQL processing engines offer support for modern advanced analytics, along with newer open source SQL engines like Impala, Presto, Arrow and others. Apache Spark is popular for its in-memory model and its ability to process large data sets quickly.

Support for broader data types, use cases, and advanced analytics make data lakes a popular platform for all types of analytics

Cloud data lakes are increasingly deployed for data science, advanced analytics, ML and AI. For example, data can be pre-processed with Python or R to interact with an Apache Spark framework, then fed to a data science application to enable machine learning or predictive analytics. A manufacturer might use a cloud data lake to land IoT sensor data from its products in the field, and then analyze that data to predict failures and enable proactive remediation. In a case like that, a cloud data lake is essential to harness semi-structured data and apply the advanced, timely analytics required for success.
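The manufacturer scenario can be sketched roughly as follows. This is not a machine learning model: it uses a simple moving-average threshold as a stand-in for failure prediction, and the `flag_at_risk` function, the device names, the field names, and the 80-degree limit are all assumptions made for illustration.

```python
import json
from collections import defaultdict, deque


def flag_at_risk(lines, window=3, limit_c=80.0):
    """Flag devices whose moving-average temperature exceeds `limit_c`.

    A simple heuristic standing in for a real predictive model.
    """
    recent = defaultdict(lambda: deque(maxlen=window))
    flagged = set()
    for line in lines:
        reading = json.loads(line)
        temp = reading.get("temp_c")
        if temp is None:
            # Semi-structured data: not every reading carries a temperature.
            continue
        history = recent[reading["device"]]
        history.append(float(temp))
        if len(history) == window and sum(history) / window > limit_c:
            flagged.add(reading["device"])
    return sorted(flagged)


# Readings as they might land in the lake, with varying fields per record.
readings = [
    '{"device": "pump-1", "temp_c": 70}',
    '{"device": "pump-2", "temp_c": 78, "vibration": 0.2}',
    '{"device": "pump-1", "temp_c": 72}',
    '{"device": "pump-2", "temp_c": 83}',
    '{"device": "pump-1", "status": "ok"}',
    '{"device": "pump-2", "temp_c": 88}',
    '{"device": "pump-1", "temp_c": 71}',
]

at_risk = flag_at_risk(readings)
print(at_risk)
```

In practice the heuristic would be replaced by a trained model and the list of strings by files on the lake's object store, but the flow of landing raw readings and analyzing them afterwards is the same.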

Comparing Cloud Data Warehouse and Cloud Data Lake Governance Models

With schema-on-write data warehouses, schema changes are…expensive. There’s really no great way to get around that. Therefore, data warehouses are governed by strict change control processes, overseeing all schema changes or data additions. By contrast, data lakes tend to have flexible governance models. In practice, a data lake might use a strict governance model for ingestion of core data alongside a more lenient model for quickly-ingested data, such as ad hoc datasets used for more exploratory analysis.

Data Lake and Warehouse Co-Existence

The cloud enables wholly separate provisioning and scaling of storage and compute resources. A new generation of cloud data warehouses and cloud data lakes now makes the most of those capabilities to provide analytics in a flexible, scalable and, just as importantly, cost-efficient way. Data on a cloud object store can commonly be shared across both a cloud data warehouse and a cloud data lake, with no duplication necessary for data ingestion and transformation to proceed.

With the cloud democratizing data and the pressure to modernize mounting, enterprises don’t need to choose between the strengths of cloud data warehouses and cloud data lakes: co-existence models allow them to realize the best of both worlds.

About the author: Venkat Chandra is a Data Architect at Cazena, which provides instant cloud data lakes for enterprises. Prior to Cazena, he was a Senior Engineer at IBM, working on data warehouses.
