Review: Databricks Lakehouse Platform | InfoWorld

Data lakes and data warehouses used to be completely different animals, but now they seem to be merging. A data lake was a single data repository that held all your data for analysis. The data was stored in its native form, at least initially. A data warehouse was an analytic database, usually relational, created from two or more data sources. The data warehouse was typically used to store historical data, most often using a star schema or at least a large set of indexes to support queries.Data lakes contained a very large amount of data and usually resided on Apache Hadoop clusters of commodity computers, using HDFS (Hadoop Distributed File System) and open source analytics frameworks. Originally, analytics meant MapReduce, but Apache Spark made a huge improvement in processing speed. It also supported stream processing and machine learning, as well as analyzing historic data. Data lakes didn’t impose a schema on data until it was used—a process known as schema on read.Data warehouses tended to have less data but it was better curated, with a predetermined schema that was imposed as the data was written (schema on write). Since they were designed primarily for fast analysis, data warehouses used the fastest possible storage, including solid-state disks (SSDs) once they were available, and as much RAM as possible. That made the storage hardware for data warehouses expensive.Databricks was founded by the people behind Apache Spark, and the company still contributes heavily to the open source Spark project. Databricks has also contributed several other products to open source, including MLflow, Delta Lake, Delta Sharing, Redash, and Koalas.This review is about Databricks’ current commercial cloud offering, Databricks Lakehouse Platform. Lakehouse, as you might guess, is a portmanteau of data lake and data warehouse. The platform essentially adds fast SQL, a data catalog, and analytics capabilities to a data lake. It has the functionality of a data warehouse without the need for expensive storage.

Source link

Review: Databricks Lakehouse Platform | InfoWorld

Byadmin

admin

Related Post

Ephemeral environments in cloud-native development

Why JavaScript’s still on top in 2025

Researchers build a bridge from C to Rust and memory safety

You missed

Savannah Marshall wants Claressa Shields rematch – for undisputed heavyweight world championship | Boxing News

Hyper Light Breaker: Watch 2 New Boss Fight Gameplay Videos

Next Week on Xbox: New Games for January 13 to 17

Robert Trent Jones golf club in Virginia to host LIV event | Headlines