How PepsiCo established an enterprise-grade data intelligence platform powered by Databricks Unity Catalog


This blog is authored by Bhaskar Palit, Senior Director, Data & Analytics, PepsiCo, and Sudipta Das, Data Architect Senior Manager, PepsiCo

 

PepsiCo has woven itself into the fabric of our daily life. Our products are enjoyed by consumers more than a billion times a day in more than 200 countries and territories around the world. PepsiCo generated more than $91 billion in net revenue in 2023, driven by a complementary beverage and convenient foods portfolio that includes Lay’s, Doritos, Cheetos, Gatorade, Pepsi-Cola, Mountain Dew, Quaker and SodaStream.

PepsiCo has more than 200,000 products. We operate across the globe and manage a large number of warehouses and suppliers, which all adds up to a huge amount of data. Having that level of data detail allows us to be more efficient across our business supply chain, helping reduce food waste, save fuel costs, and stay ahead of customer demand. Four years ago, we embarked on a journey to establish an enterprise-grade data platform encompassing six critical components: data modeling, data ingestion, data serving, data quality, data cataloging, and data monitoring across 30+ digital products. Our goal was to improve data quality and governance, which is how we found Databricks Unity Catalog. In this blog we’re sharing our progress and success to date.

To hear more, check out our session at the Data + AI Summit 2024.

The Shift from Siloed Analytics to Unified Data Intelligence

Over the years, PepsiCo has expanded its product portfolio, which resulted in data being spread across multiple systems. This separation, in some cases, led to data sprawl and duplication, a common challenge in large organizations. To address these issues, PepsiCo planned to unify all its global data under a single data architecture. This strategic move has had a groundbreaking impact, with data, analytics, and AI enabling employees to enhance their performance. For example, by centralizing data, sales teams can access up-to-date information during store visits, improving customer service and enabling immediate product recommendations to boost sales.

Additionally, PepsiCo aimed to advance its analytics capabilities by moving from descriptive to predictive and prescriptive analytics with machine learning and artificial intelligence. At PepsiCo, data and AI have become vital tools for the business and our employees. They are a fundamental part of PepsiCo’s digital transformation, enhancing our digital resources across the board, from the optimal time to plant potatoes to predicting the number of Doritos bags to stock on store shelves.

We selected Microsoft Azure as our cloud provider to meet these specific requirements. Given our need to process large volumes of data efficiently, Databricks emerged as a natural choice because of its seamless integration within the Azure environment. This integration is crucial as it enhances our data processing capabilities. The choice was also influenced by the widespread use of Apache Spark™ in the data engineering space and the availability of skilled professionals familiar with Databricks. Additionally, Databricks’ open and cloud-agnostic nature adds an extra layer of flexibility, allowing us to operate across various cloud environments without constraints.

Transforming Data Management and Governance with Databricks Unity Catalog

PepsiCo is improving its business operations from seed to shelf by leveraging millions of data points each day as products are packaged and transported across roughly 1.3 billion miles worldwide, reaching our consumers over a billion times a day. As we manage diverse data from numerous global sources, we’re continuously improving our centralized data governance system to ensure data accuracy and reliability. By streamlining the environment for our data engineers, we aim to boost operational efficiency and scalability, supporting our commitment to delivering quality products to our customers.

To address these requirements, we turned to Databricks Unity Catalog, which provided the stringent security and sophisticated access controls we needed. Databricks Unity Catalog is now an integral part of the PepsiCo Data Foundation, our centralized global system that consolidates over 6 petabytes of data worldwide. It streamlines the onboarding process for more than 1,500 active users and enables unified data discovery for our 30+ digital product teams across the globe, supporting both business intelligence and artificial intelligence applications. For example, we leverage data to connect with farmers, who play a crucial role in the PepsiCo Positive (pep+) ambition to promote regenerative farming practices across 7 million acres by 2030. By providing them with enhanced data and analytics, we help farmers use their land and water more efficiently, ultimately improving our supply chain at its source.

Figure: PepsiCo Global Data Platform Architecture (*HMS = Hive Metastore, UC = Unity Catalog, DBK = Databricks)
Figure: Consumption Pattern

With Unity Catalog, we have realized benefits in the following areas specifically:

Data security:

  • Implemented table-level access control, replacing schema-based access in Hive Metastore (HMS), which aligns with the least-privilege access control policy and removes the need to maintain 64 AD groups for storage container access.
  • Enabled granular row- and column-level access for over 50 restricted tables across Finance, HR, and R&D data domains.
  • Established volume-level access control, eliminating the exposure risk of over 100 unsecured DBFS locations.
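
To make the three kinds of control above more concrete, here is a minimal sketch, not our production tooling, of how such permissions might be expressed as Unity Catalog SQL statements generated from Python. Every catalog, schema, table, volume, group, and function name below is hypothetical. The sketch only builds the statement strings; in a Databricks notebook they would typically be passed to spark.sql, and the row filter and column mask functions would have to be created separately as SQL functions.

```python
# Sketch only: builds Unity Catalog SQL strings; nothing is executed here.
# All object, group, and function names are hypothetical placeholders.

def grant_table_select(table: str, group: str) -> str:
    """Table-level access: grant SELECT on a single table to one group."""
    return f"GRANT SELECT ON TABLE {table} TO `{group}`"


def set_row_filter(table: str, filter_fn: str, column: str) -> str:
    """Row-level access: attach an existing filter function to a table."""
    return f"ALTER TABLE {table} SET ROW FILTER {filter_fn} ON ({column})"


def set_column_mask(table: str, column: str, mask_fn: str) -> str:
    """Column-level access: attach an existing mask function to a column."""
    return f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {mask_fn}"


def grant_volume_read(volume: str, group: str) -> str:
    """Volume-level access: grant read access on a Unity Catalog volume."""
    return f"GRANT READ VOLUME ON VOLUME {volume} TO `{group}`"


def build_statements() -> list[str]:
    """Assemble one example statement for each kind of control."""
    table = "finance.restricted.payroll"  # hypothetical restricted table
    return [
        grant_table_select(table, "finance-analysts"),
        set_row_filter(table, "finance.security.region_filter", "region"),
        set_column_mask(table, "salary", "finance.security.mask_salary"),
        grant_volume_read("finance.raw.landing_files", "finance-engineers"),
    ]


if __name__ == "__main__":
    for statement in build_statements():
        print(statement)
```

Granting on individual tables and volumes, rather than on whole schemas or storage containers, is what lets access follow a least-privilege policy.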

Auditability:

  • Provided insights into queries run by identities, allowing the platform admin team to monitor over 5,000 queries each day.
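
As an illustration of this kind of per-identity monitoring, the sketch below counts queries per identity for a single day. It assumes the audit events have already been exported into plain Python dictionaries with hypothetical "identity" and "day" keys; it does not reflect the actual schema of the Databricks audit logs or our internal tooling.

```python
from collections import Counter
from datetime import date

# Hypothetical record shape: {"identity": str, "day": date, "statement": str}


def queries_per_identity(events: list[dict], day: date) -> Counter:
    """Count how many queries each identity ran on the given day."""
    return Counter(e["identity"] for e in events if e["day"] == day)


def busiest_identities(events: list[dict], day: date, top_n: int = 3) -> list[tuple[str, int]]:
    """Return the top_n identities by query count, highest first."""
    return queries_per_identity(events, day).most_common(top_n)


if __name__ == "__main__":
    sample = [
        {"identity": "etl-service", "day": date(2024, 6, 1), "statement": "SELECT 1"},
        {"identity": "etl-service", "day": date(2024, 6, 1), "statement": "SELECT 2"},
        {"identity": "analyst@example.com", "day": date(2024, 6, 1), "statement": "SELECT 3"},
    ]
    print(busiest_identities(sample, date(2024, 6, 1)))
```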

Monitoring and Observability:

  • Integrated with Databricks APIs for end-to-end data lineage, enabling the creation of lineage for over 7,000 bronze tables and 1,000 silver tables from 150 different data sources.
  • Enabled command-level overview of cost consumption for over 2,000 notebooks and generated alerts for notebooks exceeding cost thresholds.
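
The two ideas above, tracing lineage back to source systems and flagging notebooks that exceed a cost threshold, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not our implementation: it assumes lineage has already been fetched as (source, target) pairs and notebook costs as a simple name-to-cost mapping, and all table and notebook names are hypothetical.

```python
from collections import defaultdict


def upstream_sources(edges: list[tuple[str, str]], target: str) -> set[str]:
    """Return every table or source that feeds `target`, directly or indirectly.

    `edges` is a list of (source, target) lineage pairs.
    """
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)

    seen: set[str] = set()
    stack = [target]
    while stack:  # iterative depth-first walk up the lineage graph
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def notebooks_over_threshold(costs: dict[str, float], threshold: float) -> list[str]:
    """Return the names of notebooks whose cost is strictly above `threshold`."""
    return sorted(name for name, cost in costs.items() if cost > threshold)


if __name__ == "__main__":
    lineage = [
        ("source.orders", "bronze.orders"),
        ("bronze.orders", "silver.orders"),
        ("source.customers", "bronze.customers"),
        ("bronze.customers", "silver.orders"),
    ]
    print(sorted(upstream_sources(lineage, "silver.orders")))
    print(notebooks_over_threshold({"nb_ingest": 120.0, "nb_report": 30.0}, 100.0))
```

The `seen` set keeps the walk from revisiting tables, so it also terminates if the lineage data happens to contain a cycle.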

Faster Onboarding with Databricks Unity Catalog

Based on our experience, Databricks Unity Catalog has proven to be a scalable solution for centralized access management, data governance, and data lineage management. Transitioning to Unity Catalog has streamlined our access control processes, reducing onboarding time by 30% and improving cost management. Additionally, with comprehensive data lineage capabilities, we have increased confidence in our data by being able to trace its origins and track any changes in real time. This transparency allows us to maintain high data integrity and reliability.

Ultimately, Databricks has enabled us to achieve greater levels of security, governance, and efficiency in an evolving and complex data and AI landscape.

To learn more about our journey, join our session, PepsiCo’s Low-Code, Global Data Platform powered by Unity Catalog, at the Data + AI Summit 2024.
