Rubrik Launches Annapurna for Unstructured Data AI

Rubrik (NYSE: RBRK), a Security and AI Operations company, has introduced the next stage of Rubrik Annapurna. This specialized unstructured data layer is designed to connect enterprise data to the customer's Data Intelligence platform of choice by scanning and cataloging unstructured data in place across distributed systems. The technology aims to eliminate the data duplication and Extract, Transform, Load (ETL) overhead that has historically kept enterprise unstructured data out of AI pipelines. The scale of the problem is significant: according to the announcement, unstructured data represents 90% of most modern enterprise footprints, yet it often remains siloed, untracked, and unreachable by AI applications because of the limitations of infrastructure-heavy legacy architectures.

Rubrik Annapurna and the Unstructured Data Layer

Rubrik Annapurna operates on the Rubrik Security Cloud management plane to auto-discover, scan, and index billions of files across S3, NAS, and object stores. Rather than requiring weeks of manual engineering, the system can publish a queryable catalog of file metadata directly into a lakehouse in a matter of hours. This allows organizations to identify and pull only the specific subsets of data required for inference, fine-tuning, and training.

According to the company, this approach addresses a long-standing problem: organizations were forced to duplicate entire environments into data lakes just to surface the small fraction of data, often less than 10%, actually needed for AI operations. Rubrik states that, because data is activated where it resides, pipeline costs can scale 1:1 with actual consumption rather than with the total size of the data estate. Anneka Gupta, Chief Product Officer at Rubrik, noted that this inverts the traditional model of moving, transforming, and storing data twice, allowing enterprises to align infrastructure costs directly to consumption.
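To make the cost argument concrete, the short Python sketch below compares the two models. All figures and function names are hypothetical and chosen only for illustration; they are not Rubrik pricing, and a flat per-terabyte rate is an assumption.

```python
# Illustrative arithmetic only: the estate size, pulled volume, and the
# flat per-TB rate are hypothetical, not figures from Rubrik.

def full_copy_cost(estate_tb: int, cost_per_tb: int) -> int:
    """Cost when the entire estate is duplicated into a data lake."""
    return estate_tb * cost_per_tb


def consumption_cost(pulled_tb: int, cost_per_tb: int) -> int:
    """Cost when only the data actually pulled is paid for (1:1 scaling)."""
    return pulled_tb * cost_per_tb


if __name__ == "__main__":
    estate_tb = 1000   # hypothetical total unstructured estate
    pulled_tb = 80     # hypothetical subset needed for AI (under 10%)
    rate = 20          # hypothetical cost units per TB

    print("full copy:  ", full_copy_cost(estate_tb, rate))
    print("consumption:", consumption_cost(pulled_tb, rate))
```

Under these assumptions, doubling the size of the estate doubles the full-copy cost but leaves the consumption cost unchanged, which is the behavior the 1:1 claim describes.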

Technical Integration and Governance Framework

The Annapurna layer integrates with existing storage and lakehouse environments without requiring new agents or additional infrastructure. It uses a demand-driven economic model in which customers pay for the data they pull rather than for duplicating their full estate. The published index also streamlines the handoff to downstream applications on the Data Intelligence platform: data engineers can query it to pinpoint exact file targets and stage only those subsets for their workflows.
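As a rough illustration of that query-then-stage workflow, the Python sketch below uses an in-memory SQLite table as a stand-in for a lakehouse catalog. The table name `file_catalog`, its columns, and the sample paths are all assumptions made for this example; the announcement does not describe the actual catalog schema or query interface.

```python
import sqlite3

# Hypothetical stand-in for a metadata catalog published into a lakehouse.
# The schema and sample rows are invented for illustration.


def build_demo_catalog() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_catalog ("
        "path TEXT PRIMARY KEY, source TEXT, size_bytes INTEGER, "
        "extension TEXT, modified TEXT)"
    )
    rows = [
        ("s3://claims/2024/a.pdf", "s3", 120000, "pdf", "2024-03-01"),
        ("nas://hr/policies.docx", "nas", 45000, "docx", "2023-11-12"),
        ("s3://claims/2023/b.pdf", "s3", 98000, "pdf", "2023-06-30"),
        ("nas://eng/build.log", "nas", 900000, "log", "2024-05-20"),
    ]
    conn.executemany("INSERT INTO file_catalog VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def select_targets(conn: sqlite3.Connection, extension: str, modified_since: str):
    """Return (path, size_bytes) for files matching the filter.

    Only metadata is read here; the source files themselves are untouched.
    ISO-8601 date strings compare correctly as plain text.
    """
    cur = conn.execute(
        "SELECT path, size_bytes FROM file_catalog "
        "WHERE extension = ? AND modified >= ? ORDER BY path",
        (extension, modified_since),
    )
    return cur.fetchall()


if __name__ == "__main__":
    catalog = build_demo_catalog()
    for path, size in select_targets(catalog, "pdf", "2024-01-01"):
        print(f"stage {path} ({size} bytes)")
```

The point of the sketch is the shape of the workflow: the filter runs against metadata, and only the matching paths would then be staged for inference, fine-tuning, or training.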

To maintain security and compliance, the system preserves native source-system access controls within the catalog. This is intended to prevent the security gaps that occur when traditional ETL processes strip access permissions during transit, allowing platforms to continuously enforce controls in downstream workflows. Additionally, the tool leverages a Zero Trust foundation to provide an immutable chain of custody. This ensures that files staged into the managed object store maintain verifiable lineage and versioning from source through AI output. Rubrik stated these provenance capabilities are designed to support compliance programs, including the General Data Protection Regulation (GDPR).
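One common way to make a custody record tamper-evident is a hash chain, in which each event stores the hash of the event before it. The Python sketch below shows that general technique with access-control groups carried inside each record. It is not a description of Rubrik's implementation, and the field names (`path`, `action`, `allowed_groups`) are hypothetical.

```python
import hashlib
import json

# Generic hash-chain sketch. Field names are hypothetical and this is not
# Rubrik's actual mechanism, which the announcement does not detail.

GENESIS = "0" * 64  # placeholder "previous hash" for the first event


def record_hash(body: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the record."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_event(chain: list, path: str, action: str, allowed_groups: list) -> None:
    """Append a custody event linked to the previous event's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {
        "path": path,
        "action": action,
        "allowed_groups": sorted(allowed_groups),  # ACL travels with the record
        "prev_hash": prev_hash,
    }
    chain.append({**body, "hash": record_hash(body)})


def verify_chain(chain: list) -> bool:
    """Return True only if every link and every stored hash is intact."""
    prev_hash = GENESIS
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev_hash or record_hash(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    chain = []
    append_event(chain, "nas://claims/a.pdf", "indexed", ["claims-team"])
    append_event(chain, "nas://claims/a.pdf", "staged", ["claims-team"])
    print("intact:", verify_chain(chain))

    chain[0]["allowed_groups"] = ["everyone"]  # simulate stripped permissions
    print("after tampering:", verify_chain(chain))
```

Because the permission list is part of the hashed record, silently widening access at any step breaks verification, which is the kind of guarantee a verifiable lineage is meant to provide.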

Enterprise Application in Financial Services

The company highlighted the utility of the tool for organizations managing petabytes of distributed and regulated data. Corey West, CTO of Piper Sandler & Co., stated that managing such highly distributed and siloed unstructured data across legacy and modern platforms had previously been an operational constraint. West noted that Annapurna's automated approach to mapping, governing, and indexing the estate reduces the friction created by cross-functional configurations and data sovereignty requirements, without requiring another ETL stack or compromising the organization's compliance posture.

Key Takeaways

Annapurna indexes files across NAS, S3, and object stores and publishes a queryable metadata catalog into a lakehouse without moving source files.
The system preserves native source-system access controls and provides verifiable lineage and versioning to support GDPR and other compliance programs.
The architecture is designed to scale pipeline costs 1:1 with data consumption, removing the need to duplicate entire data environments for AI pipelines.

TechInsyte's Take

Rubrik is positioning Annapurna as a way to bypass the costly ETL processes that typically hinder the use of unstructured data in AI. For infrastructure leaders, the primary value lies in the claim of maintaining native access controls and lineage during the AI staging process. Buyers should monitor how this integration performs across diverse legacy storage environments and whether the "1:1" cost scaling holds true at extreme petabyte scales.

Source: Business Wire