How Open Will Snowflake Go at Data Cloud Summit?


Snowflake is holding its Data Cloud Summit 24 conference next week, and the company is expected to make a slew of announcements, which you will be able to find on these Datanami pages. But among the most closely watched questions is how far Snowflake will go in embracing the Apache Iceberg table format and opening itself up to external query engines. And is it possible that Snowflake could try to “out open” its rival Databricks, whose conference is the following week?

Snowflake has evolved considerably since it burst onto the scene a handful of years ago as a cloud data warehouse. Founded by engineers experienced in developing analytics databases, the company delivered a top-flight data warehouse in the cloud with full separation of compute and storage, which was truly novel at the time. Companies frustrated with Hadoop flocked to Snowflake, where they found a much more welcoming and friendly user experience.

But while Snowflake lacked the technical complexity of Hadoop, it also lacked the openness of Hadoop. That was a tradeoff that many customers were willing to make back in 2018, when frustration with Hadoop was nearing its peak. But customers perhaps are not so willing to make that tradeoff in 2024, particularly as the high cost of cloud computing has become an issue with many CFOs.

Snowflake executives initially boasted about the increased revenue that its lock-in created, but the company soon realized that customers were genuinely concerned about being locked in to a proprietary format. It took concrete action to address the lock-in in February 2022, when it announced support for Apache Iceberg, although Snowflake has yet to make Iceberg support generally available.

The battle for lakehouse market dominance is being waged atop the open table formats

Iceberg, of course, is the open table format developed at Netflix to address data correctness concerns when accessing data stored in Parquet files using multiple query engines, including Hive and Presto. The metadata managed by table formats like Iceberg provides ACID transactionality to data interactions, alleviating concerns that queries would return incorrect data.

Iceberg wasn’t the first table format; that honor goes to Apache Hudi, which engineers developed at Uber to manage data in their Hadoop stack. Meanwhile, the folks at Databricks created their own table format called Delta in 2017 and named the data platform it created the data lakehouse.

Two years into the Iceberg experiment, Snowflake has some decisions to make. While it allows customers to store data in externally managed Iceberg tables, it doesn’t offer many of the data management capabilities that you would expect from a full-fledged data lakehouse, such as table partitioning, data compaction, and cleanup. Will the company announce these during Data Cloud Summit next week?

If Snowflake goes all-in on Iceberg, it would help differentiate it from Databricks, which has put its chips on Delta (although it announced some capabilities to support Iceberg and Hudi with its Universal Format unveiled last June). The community appears to be coalescing around Iceberg from a popularity standpoint, which could bolster Snowflake’s position as a top-tier Iceberg-based data lakehouse.

Another question is whether Snowflake will allow external SQL engines to query Iceberg data that it manages. Snowflake’s proprietary SQL engine is highly performant when run on data stored in the original Snowflake table format, and the company has benchmark results to back that up. But Snowflake doesn’t provide a lot of options when it comes to querying data with other engines.

Snowflake offers the Snowpark API, which lets you express queries using Python, Java, and Scala, but this is designed more for data engineering and building machine learning models than for SQL query processing. It also offers an Apache Spark connector that lets you read from and write to Snowflake using Spark 3.2 through 3.4 (it also offers an Apache Kafka connector). But what customers may really want is the ability to run another SQL query engine against their data.

Snowflake returns to San Francisco for Data Cloud Summit 24

One person who will be closely watching Snowflake next week is Justin Borgman, the CEO and founder of Starburst, which does a lot of work developing the Trino query engine that forked from Presto and runs its own lakehouse offering that supports Iceberg and Trino. Borgman notes that many of the first workloads run by Netflix after developing Iceberg used Presto.

“We feel like it kind of resets the playing field,” Borgman says of the impact of Iceberg. “It’s almost like the battlefield moves from this very traditional ‘get data ingested into your proprietary database and then you have your customer locked in,’ to more of a free for all where the data is unlocked, which is undoubtedly best for customers. And then we’ll battle it out on the query engine layer, the execution layer rather than on the storage layer. And I think that’s just a really interesting development.”

Borgman is understandably eager to get into Snowflake’s huge 9,800-strong customer base via Iceberg. He claims benchmark tests show Trino outperforming Snowflake’s query engine on Iceberg tables at about one-third of the cost. Starburst has a number of large customers, such as Lyft, LinkedIn, and Netflix, using the combination of Trino and Iceberg, he says.

Snowflake could go all-in on Iceberg and open itself up to external query engines, but it could still exercise some control over customer workloads through other means. For instance, it could require that customers access data through its data catalog, Borgman predicts.

“They may try to lock you in on a few peripheral features,” he tells Datanami. “But I think at the end of the day, customers won’t tolerate that. I think Iceberg is undoubtedly in their best interests, and I think that they’re going to just start moving tons of data into Iceberg format from where they have the opportunity to choose a different query engine.”

Snowflake CEO Sridhar Ramaswamy must balance the company’s openness with growth

If Snowflake does go fully open and allows external query engines such as Trino, Presto, Dremio, and even Spark SQL to access data that it manages for customers in Iceberg tables, Snowflake customers won’t likely move all their data to Iceberg at once, says Borgman, who was a 2023 Datanami Person to Watch. They’ll likely move their lowest SLA (service level agreement) queries into Iceberg first, while keeping the more critical data in Snowflake’s native format and using Snowflake’s native query engine, which is faster but more expensive.

That sets up an interesting dynamic where Snowflake could potentially be hurting its ability to generate revenues while giving customers what they want, which is more openness. But on the flip side, customers may actually reward Snowflake by moving more data into Iceberg and letting Snowflake manage it for them. That could generate greater revenues for the Bozeman, Montana company, although probably not at the same per-customer rate as if they kept all the data locked into a proprietary format. That’s a rate-of-growth factor that new Snowflake CEO Sridhar Ramaswamy must account for.

When Ramaswamy replaced Frank Slootman in February, it was expected that the company would shift some focus to AI, where it was seen as trailing its rival Databricks. The company’s April launch of its Arctic large language models (LLMs) shows the company is able to move quickly on that front, but its core competitive advantage over Databricks remains with SQL-based analytics and data warehousing workloads.

The changing nature of data warehousing presents both challenges and opportunities for Snowflake. “Basically, the data warehouse now is fully decoupled,” Borgman says. “Snowflake talked a lot about that 12 to 15 years ago, whenever they first came out, of separation of storage and compute. But the key was it was always their storage and their compute.”

Whatever Snowflake chooses to do next week, the cloud giants will undoubtedly be watching. So far, they haven’t really picked sides in the open table format war that’s being waged between Iceberg and Delta, with Hudi in a distant third (although the Apache XTable format developed by Hudi-backer Onehouse threatens to make it all moot). If the market solidifies behind Iceberg, AWS, Microsoft Azure, and Google Cloud could try to cut out the middleman by offering their own soup-to-nuts data lakehouse offering.

Borgman says this looks like a replay of the mid 2010s, when Teradata’s large data warehousing installed base was slowly eaten into by Hadoop, but with one big difference.

“I think you’re going to see a similar kind of model play out where customers are motivated to try to reduce their data warehouse costs and using more of this lake model,” says Borgman, who was the CEO of Hadoop software vendor Hadapt when it was acquired by Teradata in 2014. “But I think one thing that’s different this time around is that the engines themselves, like Trino and [Presto], have improved dramatically over what Hive or Impala was back then.”

Related Items:

Snowflake Looks to AI to Bolster Growth

Onehouse Breaks Data Catalog Lock-In with More Openness

Teradata Acquires Revelytix, Hadapt

 
