The Future of the Data Lakehouse – Open


Source: cio.com

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission-critical, large-scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. In recent years, the term “data lakehouse” was coined to describe this architectural pattern of tabular analytics over data in the data lake. In a rush to own this term, many vendors have lost sight of the fact that the openness of a data architecture is what guarantees its durability and longevity.

On data warehouses and data lakes

Data lakes and data warehouses both unify large volumes and varieties of data in a central location, but with vastly different architectural worldviews. Warehouses are vertically integrated for SQL analytics, whereas lakes prioritize flexibility of analytic methods beyond SQL.

In order to realize the benefits of both worlds (flexibility of analytics in data lakes, and simple and fast SQL in data warehouses), companies often deployed data lakes to complement their data warehouses, with the data lake feeding a data warehouse system as the last step of an extract, transform, load (ETL) or ELT pipeline. In doing so, they’ve accepted the resulting lock-in of their data in warehouses.

But there was a better way: enter the Hive Metastore, one of the sleeper hits of the data platform of the last decade. As use cases matured, we saw the need for both efficient, interactive BI analytics and transactional semantics to modify data.

Iterations of the lakehouse

The first generation of the Hive Metastore attempted to address the performance considerations to run SQL efficiently on a data lake. It provided the concept of a database, schemas, and tables for describing the structure of a data lake in a way that let BI tools traverse the data efficiently. It added metadata that described the logical and physical layout of the data, enabling cost-based optimizers, dynamic partition pruning, and a number of key performance improvements targeted at SQL analytics.
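
To make the role of that metadata concrete, here is a minimal sketch of partition pruning. It is a toy in-memory model, not the Hive Metastore API: `ToyCatalog`, `TableMetadata`, `Partition`, `prune_partitions`, the `sales.orders` table and its paths are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Partition:
    values: Dict[str, str]   # partition column -> value, e.g. {"day": "2023-01-01"}
    location: str            # directory that holds this partition's files


@dataclass
class TableMetadata:
    name: str
    columns: Dict[str, str]      # logical layout: column name -> type
    partition_keys: List[str]    # physical layout: how files are grouped
    partitions: List[Partition] = field(default_factory=list)


class ToyCatalog:
    """Hypothetical in-memory stand-in for a metastore (database -> tables)."""

    def __init__(self) -> None:
        self._tables: Dict[Tuple[str, str], TableMetadata] = {}

    def create_table(self, database: str, table: TableMetadata) -> None:
        self._tables[(database, table.name)] = table

    def get_table(self, database: str, name: str) -> TableMetadata:
        return self._tables[(database, name)]


def prune_partitions(
    table: TableMetadata, predicate: Callable[[Dict[str, str]], bool]
) -> List[str]:
    """Return only the locations whose partition values satisfy the predicate."""
    return [p.location for p in table.partitions if predicate(p.values)]


catalog = ToyCatalog()
catalog.create_table("sales", TableMetadata(
    name="orders",
    columns={"order_id": "bigint", "amount": "double", "day": "string"},
    partition_keys=["day"],
    partitions=[
        Partition({"day": "2023-01-01"}, "/lake/sales/orders/day=2023-01-01"),
        Partition({"day": "2023-01-02"}, "/lake/sales/orders/day=2023-01-02"),
        Partition({"day": "2023-01-03"}, "/lake/sales/orders/day=2023-01-03"),
    ],
))

orders = catalog.get_table("sales", "orders")
# A filter such as WHERE day >= '2023-01-02' only needs two of the three directories.
to_scan = prune_partitions(orders, lambda v: v["day"] >= "2023-01-02")
print(to_scan)
```

The point of the sketch is that the engine decides which directories to read from catalog metadata alone, before touching any data files.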

The second generation of the Hive Metastore added support for transactional updates with Hive ACID. The lakehouse, while not yet named, was very much thriving. Transactions enabled the use cases of continuous ingest and inserts/updates/deletes (or MERGE), which opened up data warehouse style querying, capabilities, and migrations from other warehousing systems to data lakes. This was enormously valuable for many of our customers.
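
The insert/update/delete semantics of a MERGE can be sketched in a few lines. This is a simplified illustration in plain Python, not Hive ACID itself: the `merge` function, the `op` values and the row layout are assumptions made for the example. The all-or-nothing behavior is approximated by building a new row set and leaving the input untouched if anything fails.

```python
from typing import Dict, List


def merge(target: List[Dict], changes: List[Dict], key: str = "id") -> List[Dict]:
    """Apply upserts and deletes to a copy of target; target itself is never mutated."""
    merged = {row[key]: dict(row) for row in target}
    for change in changes:
        op, row = change["op"], change["row"]
        row_key = row[key]
        if op == "upsert":
            # Update the matching row, or insert it when no match exists.
            merged[row_key] = {**merged.get(row_key, {}), **row}
        elif op == "delete":
            merged.pop(row_key, None)
        else:
            # Failing here discards the partial result, so nothing is applied.
            raise ValueError(f"unknown operation: {op!r}")
    return list(merged.values())


inventory = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]
incoming = [
    {"op": "upsert", "row": {"id": 2, "qty": 9}},   # update existing row
    {"op": "upsert", "row": {"id": 3, "qty": 1}},   # insert new row
    {"op": "delete", "row": {"id": 1}},             # delete existing row
]
result = merge(inventory, incoming)
print(result)
```

A real transactional table additionally has to handle concurrent writers and readers, which this sketch deliberately leaves out.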

Projects like Delta Lake took a different approach to solving this problem. Delta Lake added transaction support to the data in a lake. This allowed data curation and made it possible to run data warehouse-style analytics on the data lake.

Somewhere along this timeline, the name “data lakehouse” was coined for this architecture pattern. We believe the term is a great way to succinctly describe this pattern, and it has gained mindshare very quickly among customers and the industry.

What have customers been telling us?

In the last few years, as new data types have emerged and newer data processing engines have appeared to simplify analytics, companies have come to expect that the best of both worlds truly does require analytic engine flexibility. If an enterprise manages large and valuable data, the business must be free to choose different analytic engines, or even vendors.

The lakehouse pattern, as implemented, had a critical contradiction at heart: while lakes were open, lakehouses were not.

The Hive Metastore followed a Hive-first evolution before adding engines such as Impala and Spark. Delta Lake had a Spark-heavy evolution; customer options dwindle rapidly if they need the freedom to choose an engine other than the one primary to the table format.

Customers demanded more from the start: more formats, more engines, more interoperability. Today, the Hive Metastore is used from multiple engines and with multiple storage options: Hive and Spark of course, but also Presto, Impala, and many more. The Hive Metastore evolved organically to support these use cases, so integration was often complex and error-prone.

An open data lakehouse designed with this need for interoperability addresses this architectural problem at its core. It will make those who are “all in” on one platform uncomfortable, but community-driven innovation is about solving real-world problems in pragmatic ways with best-of-breed tools, and overcoming vendor lock-in whether they approve or not.

An open lakehouse, and the birth of Apache Iceberg

Apache Iceberg was built from inception with the goal of being easily interoperable across multiple analytic engines and at cloud-native scale. Netflix, where this innovation was born, is perhaps the best example of a 100 PB scale S3 data lake that needed to be built into a data warehouse. The cloud native table format was open sourced into Apache Iceberg by its creators.

Apache Iceberg’s real superpower is its community. Organically, over the last three years, Apache Iceberg has added an impressive roster of first-class integrations with a thriving community:

  • Data processing and SQL engines: Hive, Impala, Spark, PrestoDB, Trino, Flink
  • Multiple file formats: Parquet, Avro, ORC
  • Large adopters in the community: Apple, LinkedIn, Adobe, Netflix, Expedia and others
  • Managed services with AWS Athena, Cloudera, EMR, Snowflake, Tencent, Alibaba, Dremio, Starburst

What makes this varied community thrive is the collective need of thousands of companies to ensure that data lakes can evolve to subsume data warehouses, while preserving analytic flexibility and openness across engines. This enables an open lakehouse: one that offers unlimited analytic flexibility for the future.

How are we embracing Iceberg?

At Cloudera, we are proud of our open-source roots and committed to enriching the community. Since 2021, we have contributed to the growing Iceberg community with hundreds of contributions across Impala, Hive, Spark, and Iceberg. We extended the Hive Metastore and added integrations to our many open-source engines to leverage Iceberg tables. In early 2022, we enabled a Technical Preview of Apache Iceberg in Cloudera Data Platform, allowing Cloudera customers to realize the value of Iceberg’s schema evolution and time travel capabilities in our Data Warehousing, Data Engineering and Machine Learning services.
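
The idea behind time travel can be illustrated with a toy snapshot log. The sketch below is not the Apache Iceberg API or its on-disk format; `ToySnapshotTable`, its methods and the file names are hypothetical. It only assumes that every commit records an immutable list of data files, so a reader can ask for the table as of any earlier snapshot.

```python
from typing import Iterable, List, Optional, Tuple


class ToySnapshotTable:
    """Hypothetical table that keeps every committed file list as a snapshot."""

    def __init__(self) -> None:
        # Snapshot 0 is the empty table; each commit appends a new snapshot.
        self._snapshots: List[Tuple[str, ...]] = [()]

    def commit(self, add: Iterable[str] = (), remove: Iterable[str] = ()) -> int:
        """Create a new snapshot from the latest one and return its id."""
        removed = set(remove)
        current = self._snapshots[-1]
        new_files = tuple(f for f in current if f not in removed) + tuple(add)
        self._snapshots.append(new_files)
        return len(self._snapshots) - 1

    def files(self, snapshot_id: Optional[int] = None) -> List[str]:
        """List data files for a given snapshot, or the latest if none is given."""
        if snapshot_id is None:
            snapshot_id = len(self._snapshots) - 1
        if not 0 <= snapshot_id < len(self._snapshots):
            raise KeyError(f"no such snapshot: {snapshot_id}")
        return list(self._snapshots[snapshot_id])


table = ToySnapshotTable()
first = table.commit(add=["data/a.parquet"])
second = table.commit(add=["data/b.parquet"], remove=["data/a.parquet"])

print(table.files())        # current state of the table
print(table.files(first))   # the table as it was after the first commit
```

Because older snapshots are never modified, reading a past state is just a lookup, which is what makes this style of time travel cheap.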

Our customers have consistently told us that analytic needs evolve rapidly, whether it is modern BI, AI/ML, data science, or more. Choosing an open data lakehouse powered by Apache Iceberg gives companies the freedom of choice for analytics.

If you want to learn more, join us on June 21 for our webinar with Ryan Blue, co-creator of Apache Iceberg, and Anjali Norwood, Big Data Compute Lead at Netflix.
