Databricks makes bringing data into its ‘lakehouse’ easier

Databricks today announced the launch of its new Data Ingestion Network of partners and the launch of its Databricks Ingest service. The idea here is to make it easier for businesses to combine the best of data warehouses and data lakes into a single platform, a concept Databricks likes to call ‘lakehouse.’

At the core of the company’s lakehouse is Delta Lake, Databricks’ Linux Foundation-managed open-source project. It adds a new storage layer to data lakes that helps users manage the lifecycle of their data and ensures data quality through schema enforcement, log records and more. Databricks users can now work with the first five partners in the Ingestion Network (Fivetran, Qlik, Infoworks, StreamSets and Syncsort) to automatically load their data into Delta Lake. To ingest data from these partners, Databricks customers don’t have to set up any triggers or schedules; instead, data automatically flows into Delta Lake.

“Until now, companies have been forced to split up their data into traditional structured data and big data, and use them separately for BI and ML use cases. This results in siloed data in data lakes and data warehouses, slow processing and partial results that are too delayed or too incomplete to be effectively utilized,” says Ali Ghodsi, co-founder and CEO of Databricks. “This is one of the many drivers behind the shift to a Lakehouse paradigm, which aspires to combine the reliability of data warehouses with the scale of data lakes to support every kind of use case. In order for this architecture to work well, it needs to be easy for every type of data to be pulled in. Databricks Ingest is an important step in making that possible.”

Databricks VP of Product Marketing Bharath Gowda also tells me that this will make it easier for businesses to perform analytics on their most recent data and hence be more responsive when new information comes in. He also noted that users will be able to better leverage their structured and unstructured data to build better machine learning models, and to perform more traditional analytics on all of their data instead of just the small slice that’s available in their data warehouse.

from TechCrunch https://ift.tt/38XI01I