
There’s No Need to Choose

by usscmc
January 13, 2021

(Halawi/Shutterstock)

Particularly with the industry spotlight on Snowflake following its recent IPO, there’s no shortage of discussion right now around cloud data warehouses, cloud data lakes, and how the two overlap – or don’t. For many enterprise data and analytics professionals trying to modernize to support ML and AI, there’s still a good deal of confusion on what each type of data solution offers and where the key differences lie.

In this primer, I’ll look at the strengths of each data platform and what each is built to excel at. While cloud data warehouses and cloud data lakes may solve disparate problems, they can – and, executed right, should – complement one another. Used in tandem and backed by the power of the cloud, these two architectures can capture a more complete data and analytics picture and deliver the value and business insight that enterprises continue to seek out.

Entering the Cloud Data Warehouse

Data warehouses are a decades-old technology that enables analytics by using a mostly relational processing engine, structuring data in tables and columns. They are generally categorized as schema-on-write: writes to the data warehouse must adhere to a previously established schema. This is true for any deployment style, including cloud data warehouses.
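
To make schema-on-write concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a warehouse engine. SQLite is not a data warehouse, and the sales table and its columns are hypothetical; the point is only that the schema exists before any data arrives and that a non-conforming write is rejected.

```python
import sqlite3

# In-memory SQLite database standing in for a warehouse (an assumption for illustration).
conn = sqlite3.connect(":memory:")

# The schema is declared up front, before any data is written.
conn.execute(
    """
    CREATE TABLE sales (
        order_id INTEGER PRIMARY KEY,
        city     TEXT NOT NULL,
        amount   REAL NOT NULL CHECK (amount >= 0)
    )
    """
)


def try_insert(row):
    """Attempt a write; return True if the schema accepts it, False otherwise."""
    try:
        conn.execute(
            "INSERT INTO sales (order_id, city, amount) VALUES (?, ?, ?)", row
        )
        return True
    except sqlite3.IntegrityError:
        # Raised when a NOT NULL or CHECK constraint is violated.
        return False


accepted = try_insert((1, "Boston", 120.50))  # conforms to the schema
rejected = try_insert((2, None, 75.00))       # violates NOT NULL on city
```

Only the first row lands in the table; the second never makes it past the schema check.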

Naturally, SQL is the universal language of cloud data warehouses. SQL extensions for JSON and similar solutions can also allow for semi-structured data and schema-on-read functionality. However, these solutions still carry strict ACID transaction overhead, which can be prohibitive. Many non-SQL workloads do not require this: schema-on-read can naturally support these applications while using less stringent transaction semantics and delivering better performance.

Cloud data warehouses also require data to be cleaned and structured in close alignment with the questions that business applications are meant to answer. Any schema change requires a long, intensive, and manual process that includes design work and landing the data in preparation for analysis.

A data warehouse sources data from an operational data store and additional files to produce business intelligence

Data Warehouse User-Defined Extensions (UDX)

Data warehouse relational engines enable advanced analysis by allowing application developers to write user-defined functions (UDFs) and user-defined aggregates (UDAs) – collectively known as user-defined extensions (UDXs). Leveraging UDXs can equip business analytics with a feature set surpassing what can be accomplished using standard SQL. UDXs are used in the same way as other standard SQL functions and aggregates in SQL statements. UDXs offer a full range of use cases and levels of complexity, from simply validating URLs all the way through to statistical functions, encryption/decryption, and compression/decompression.
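
As a rough illustration of the UDF/UDA idea, the sketch below registers one scalar function and one aggregate with SQLite through Python's sqlite3 module, then calls both from ordinary SQL. SQLite is only a stand-in here; each warehouse has its own UDX registration mechanism. The names is_valid_url, value_range, and the clicks table are hypothetical.

```python
import sqlite3
from urllib.parse import urlparse


def is_valid_url(value):
    """Scalar UDF: return 1 if value looks like an http(s) URL, else 0."""
    if not isinstance(value, str):
        return 0
    parts = urlparse(value)
    return int(parts.scheme in ("http", "https") and bool(parts.netloc))


class ValueRange:
    """Aggregate UDA: max minus min over the non-NULL values in a group."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def step(self, value):
        # Called once per input row.
        if value is None:
            return
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def finalize(self):
        # Called once per group to produce the result.
        return None if self.lo is None else self.hi - self.lo


conn = sqlite3.connect(":memory:")
conn.create_function("is_valid_url", 1, is_valid_url)
conn.create_aggregate("value_range", 1, ValueRange)

conn.execute("CREATE TABLE clicks (url TEXT, latency_ms REAL)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [
        ("https://example.com/a", 120.0),
        ("not a url", 80.0),
        ("http://example.org", 200.0),
    ],
)

# The extensions are used exactly like built-in SQL functions and aggregates.
valid_count, latency_range = conn.execute(
    "SELECT SUM(is_valid_url(url)), value_range(latency_ms) FROM clicks"
).fetchone()
```

Once registered, the extensions appear in the query alongside the built-in SUM, which is what makes UDXs convenient for analysts who already work in SQL.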

Utilizing a Data Warehouse for Business Intelligence

Cloud data warehouses are commonly tapped to analyze historical data, support business intelligence applications, and fulfill business analysts’ needs for interactive reporting and other ad hoc tasks. For example, a data warehouse might enable a vendor to analyze their product inventory and sales by location, drilling down into data by country, region, and city. The organization can then leverage those insights to better optimize its supply chain and sales processes.
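
The drill-down described above amounts to grouping the same fact table at progressively finer levels. Here is a small sketch, again with sqlite3 as a stand-in engine; the sales table, its rows, and the units_by helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (country TEXT, region TEXT, city TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("US", "Northeast", "Boston", 40),
        ("US", "Northeast", "New York", 60),
        ("US", "West", "Seattle", 30),
        ("UK", "England", "London", 50),
    ],
)

# Column names are interpolated into SQL, so restrict them to a known set.
ALLOWED_LEVELS = ("country", "region", "city")


def units_by(*levels):
    """Total units sold, grouped by the given location levels."""
    for level in levels:
        if level not in ALLOWED_LEVELS:
            raise ValueError(f"unknown level: {level}")
    cols = ", ".join(levels)
    query = f"SELECT {cols}, SUM(units) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(query).fetchall()


by_country = units_by("country")
by_region = units_by("country", "region")
by_city = units_by("country", "region", "city")
```

Each call answers the same question at a finer grain, which is the interaction a BI tool typically drives behind a drill-down report.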

Diving into the Cloud Data Lake

Cloud data lakes are generalized data processing platforms. They support modernization with a broader range of data and analytics processing needs than SQL-based data warehouses. Data lakes are schema-on-read: data is stored as it arrives, and a schema is applied when the data is read. Data lakes are built to handle structured, semi-structured, or unstructured data. For enterprises, data lakes offer a single unified platform that serves a wider swath of use cases, from data science to data engineering, machine learning, and reporting. Data lakes are able to tap into this breadth of tools, analysis, and data options by leveraging SQL, NoSQL, Apache Spark, Apache Flink, and other data processing engines.
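
Schema-on-read can be sketched in a few lines of plain Python: raw records are stored untouched, and each consumer projects them onto the fields and types it cares about at read time. The event records and the read_with_schema helper below are hypothetical, and a real data lake would do this with an engine such as Spark rather than a loop.

```python
import json

# Raw records are kept exactly as they arrived; their shapes differ.
raw_events = [
    '{"device": "pump-1", "temp_c": 71.5}',
    '{"device": "pump-2", "temp_c": "68.0", "firmware": "2.1"}',
    '{"device": "pump-3"}',
]


def read_with_schema(lines, schema):
    """Project each raw record onto schema, a mapping of field name to caster.

    Missing or uncastable values become None instead of failing the read.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            try:
                row[field] = cast(value) if value is not None else None
            except (TypeError, ValueError):
                row[field] = None
        rows.append(row)
    return rows


# Two consumers apply two different schemas to the same stored data.
temperature_view = read_with_schema(raw_events, {"device": str, "temp_c": float})
firmware_view = read_with_schema(raw_events, {"device": str, "firmware": str})
```

Nothing was rejected at write time; the cost is that each reader has to decide how to handle missing or oddly typed fields.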

Importantly, data lakes are often incorrectly categorized as simple data stores like AWS S3 or Azure ADLS (and early data lakes did focus on large data storage). In fact, cloud data lakes now provide the range of storage, processing capabilities, and tools necessary for a complete analytical environment for enterprise modernization. Within cloud data lakes, widely used SQL processing engines offer support for modern advanced analytics, along with newer open source SQL engines like Impala, Presto, Arrow and others. Apache Spark is popularly used for its in-memory model and ability to process large data sets quickly.

Support for broader data types, use cases, and advanced analytics makes data lakes a popular platform for all types of analytics

Cloud data lakes are increasingly deployed for data science, advanced analytics, ML and AI. For example, data can be pre-processed with Python or R to interact with an Apache Spark framework, then fed to a data science application to enable machine learning or predictive analytics. A manufacturer might use a cloud data lake to land IoT sensor data from its products in the field, and then analyze that data to predict failures and remediate them proactively. In a case like that, a cloud data lake is essential to harness semi-structured data and apply the advanced, timely analytics required for success.
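
The IoT scenario can be sketched as follows. This is a deliberately simple stand-in: a rolling-average threshold takes the place of a trained model, plain Python takes the place of Spark, and the readings, the vibration field, and both constants are assumptions made up for the example.

```python
import json
from collections import defaultdict, deque

# Semi-structured sensor readings as they might land in the lake.
readings = [
    '{"device": "pump-1", "vibration": 2.0}',
    '{"device": "pump-2", "vibration": 4.0}',
    '{"device": "pump-1", "vibration": 2.2}',
    '{"device": "pump-2", "vibration": 5.5}',
    '{"device": "pump-1", "status": "ok"}',
    '{"device": "pump-2", "vibration": 6.0}',
    '{"device": "pump-1", "vibration": 2.1}',
    '{"device": "pump-2", "vibration": 6.5}',
]

WINDOW = 3       # number of most recent readings to average (assumed)
THRESHOLD = 5.0  # average vibration treated as a failure risk (assumed)


def flag_at_risk(lines, window=WINDOW, threshold=THRESHOLD):
    """Return devices whose last `window` vibration readings average above threshold."""
    recent = defaultdict(lambda: deque(maxlen=window))
    for line in lines:
        record = json.loads(line)
        vibration = record.get("vibration")
        if vibration is None:
            continue  # tolerate records without the field
        recent[record["device"]].append(float(vibration))
    return sorted(
        device
        for device, values in recent.items()
        if len(values) == window and sum(values) / window > threshold
    )


at_risk = flag_at_risk(readings)
```

In practice the flagged devices would feed a maintenance workflow, and the threshold rule would be replaced by a model trained on historical failures.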

Comparing Cloud Data Warehouse and Cloud Data Lake Governance Models

With schema-on-write data warehouses, schema changes are…expensive. There’s really no great way to get around that. Therefore, data warehouses are governed by strict change control processes, overseeing all schema changes or data additions. By contrast, data lakes tend to have flexible governance models. In practice, a data lake might use a strict governance model for ingestion of core data alongside a more lenient model for quickly-ingested data, such as ad hoc datasets used for more exploratory analysis.
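
One way to picture the two governance models side by side is an ingestion gate that enforces a schema for core data but only records the same findings for ad hoc data. The sketch below is an assumption about how such a gate might look, not a description of any particular product; CORE_SCHEMA, the zone names, and the ingest function are all hypothetical.

```python
# Expected fields and types for governed data (hypothetical).
CORE_SCHEMA = {"order_id": int, "city": str, "amount": float}


def ingest(record, zone):
    """Return (accepted, issues) for a record entering the given zone.

    "core" rejects any record with schema issues; "adhoc" accepts the
    record and reports the issues so they can be reviewed later.
    """
    issues = [
        f"{field}: expected {expected.__name__}"
        for field, expected in CORE_SCHEMA.items()
        if not isinstance(record.get(field), expected)
    ]
    if zone == "core":
        return (not issues, issues)
    if zone == "adhoc":
        return (True, issues)
    raise ValueError(f"unknown zone: {zone}")


clean = {"order_id": 1, "city": "Boston", "amount": 10.0}
messy = {"order_id": "1", "city": "Boston"}

core_clean = ingest(clean, "core")
core_messy = ingest(messy, "core")
adhoc_messy = ingest(messy, "adhoc")
```

The same record is turned away from the core zone and admitted to the ad hoc zone, with the issues attached either way.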

Data Lake and Warehouse Co-Existence

The cloud enables wholly separate provisioning and scaling of storage and compute resources. A new generation of cloud data warehouses and cloud data lakes now makes the most of those capabilities to provide analytics in a flexible, scalable and, just as importantly, cost-efficient way. Data on a cloud object store can commonly be shared by both a cloud data warehouse and a cloud data lake, so ingestion and transformation can proceed without duplicating the data.

With the cloud democratizing data and pressure to modernize, enterprises don’t need to choose between the strengths of cloud data warehouses and cloud data lakes: co-existence models allow them to easily realize the best of both worlds.

About the author: Venkat Chandra is a Data Architect at Cazena, which provides instant cloud data lakes for enterprises. Prior to Cazena, he was a Senior Engineer at IBM, working on data warehouses.


 

© 2025 www.usscmc.com
