HonuDB Documentation

HonuDB is the first AI-native distributed database, built for AI developers who need to manage multi-modal datasets with snapshots that map to models and model training. A replicated document database, HonuDB provides rapid data ingestion and collection management for a range of MIME types, including JSON, Parquet, images, and video. Designed with privacy in mind from the start, HonuDB offers data governance features such as provenance and lineage tracking (including by geographic location) and fine-grain access controls. Data scientists and machine learning engineers can rely on Honu to manage datasets of any size, from small to extremely large, replicated across multiple geographic regions.

Get Started Now!

Feature overview

Collections & Datasets

Collections allow you to manage related data together; Datasets are snapshots of collections that indicate exactly what data was used to train a model.

Full Versioning

All objects in the database are fully versioned, so an update cannot change how a dataset looks from a model's perspective.
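
To make the relationship between versioning and datasets concrete, here is a minimal in-memory sketch in Python. It is not the HonuDB API: the `Collection` class, its `put`, `get`, and `snapshot` methods, and the `read_dataset` helper are hypothetical names invented for illustration. The idea it shows is that a dataset can be represented as a set of pinned object versions, so later updates do not alter what a model was trained on.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Toy in-memory collection that keeps every version of every object."""

    # key -> list of values, where index i holds version i + 1
    versions: dict = field(default_factory=dict)

    def put(self, key, value):
        """Append a new version of `key` and return its version number."""
        history = self.versions.setdefault(key, [])
        history.append(value)
        return len(history)

    def get(self, key, version=None):
        """Return the given version of `key`, or the latest if none is given."""
        history = self.versions[key]
        return history[-1] if version is None else history[version - 1]

    def snapshot(self):
        """Pin the current version of every key (a stand-in for a dataset)."""
        return {key: len(history) for key, history in self.versions.items()}


def read_dataset(collection, pinned):
    """Materialize a dataset from pinned versions, ignoring later updates."""
    return {key: collection.get(key, version) for key, version in pinned.items()}


images = Collection()
images.put("cat.json", {"label": "cat"})
training_set = images.snapshot()             # pins cat.json at version 1
images.put("cat.json", {"label": "kitten"})  # a later update creates version 2

print(read_dataset(images, training_set))  # {'cat.json': {'label': 'cat'}}
print(images.get("cat.json"))              # {'label': 'kitten'}
```

The snapshot stores only version numbers, not copies of the data, which is one plausible way to keep snapshots cheap; how HonuDB stores datasets internally is not described here.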

Provenance Awareness

Regions and unique writers are tracked across all updates so you can monitor how data is changing in your system and implement privacy controls.
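
As a rough illustration of what tracking regions and writers on every update makes possible, the Python sketch below keeps a small update log and answers two questions from it. The `Update` record, its field names, and both helper functions are assumptions made for this example and do not reflect HonuDB's actual metadata format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """One write to an object, tagged with where and by whom it was made."""

    key: str
    version: int
    region: str  # hypothetical region label
    writer: str  # hypothetical writer identifier


def writes_by_region(log):
    """Count how many updates originated in each region."""
    return Counter(update.region for update in log)


def keys_touched_outside(log, allowed_regions):
    """List keys with at least one update from a region that is not allowed."""
    return sorted({u.key for u in log if u.region not in allowed_regions})


log = [
    Update("a", 1, "eu-west", "writer-1"),
    Update("a", 2, "us-east", "writer-2"),
    Update("b", 1, "eu-west", "writer-1"),
]

print(writes_by_region(log))
print(keys_touched_outside(log, {"eu-west"}))  # ['a']
```

A privacy control could use a check like `keys_touched_outside` to flag objects that have been modified from outside an approved set of regions.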

Smart Replication

Honu uses reinforcement-learning-based anti-entropy to maximize consistency and scale replication to hundreds of nodes without increasing your cloud costs.
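
For readers unfamiliar with anti-entropy, the sketch below shows the basic mechanism in Python: replicas periodically pair up and exchange data so that both end with the newest version of every key. Two simplifying assumptions apply. Peers are chosen uniformly at random here, which stands in for (and is much simpler than) the reinforcement learning approach mentioned above, and each key carries a single integer version that is assumed to be unique per write. The function names are hypothetical.

```python
import random


def sync(a, b):
    """One bilateral exchange: both replicas keep the newest version of each key.

    Each replica maps key -> (version, value).
    """
    for key in set(a) | set(b):
        newest = max(
            a.get(key, (0, None)),
            b.get(key, (0, None)),
            key=lambda entry: entry[0],  # compare by version number only
        )
        a[key] = newest
        b[key] = newest


def gossip_round(replicas, rng):
    """Each replica synchronizes with one randomly chosen peer."""
    for replica in replicas:
        peer = rng.choice([r for r in replicas if r is not replica])
        sync(replica, peer)


replicas = [
    {"x": (1, "old")},
    {"x": (2, "new")},
    {"y": (1, "only-here")},
]

rng = random.Random(0)  # seeded so repeated runs behave the same way
for _ in range(50):
    gossip_round(replicas, rng)

print(replicas[0])
```

With uniform random selection, every replica eventually receives every update, but some exchanges are wasted on peers that are already in sync. Choosing peers more deliberately is where a learned policy could reduce unnecessary traffic.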

Fine-Grain Access Control

Collections, objects, and datasets have a hierarchical permission model designed specifically for AI workloads, including separate permissions for training and inference.
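
One way to picture a hierarchical permission model is that a grant on a parent resource also applies to everything beneath it. The Python sketch below illustrates that idea under stated assumptions: the slash-separated resource paths, the permission names `read`, `train`, and `infer`, and the `is_allowed` function are all hypothetical and are not taken from HonuDB's access control API.

```python
def is_allowed(grants, user, resource, permission):
    """Check `resource` and each of its ancestor paths for a matching grant."""
    parts = resource.split("/")
    # Walk from the most specific path up to the root, for example
    # "images/cats/1.png" -> "images/cats" -> "images".
    for end in range(len(parts), 0, -1):
        path = "/".join(parts[:end])
        if permission in grants.get((user, path), set()):
            return True
    return False


# (user, resource path) -> set of granted permissions
grants = {
    ("alice", "images"): {"read", "train"},    # whole collection
    ("bob", "images/cats/1.png"): {"infer"},   # a single object
}

print(is_allowed(grants, "alice", "images/cats/1.png", "train"))  # True
print(is_allowed(grants, "bob", "images/cats/1.png", "train"))    # False
print(is_allowed(grants, "bob", "images/dogs/2.png", "infer"))    # False
```

Keeping training and inference as distinct permissions means a user can be allowed to serve predictions from an object without being allowed to train on it.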

Model Context Protocol

Honu supports the Model Context Protocol so that you can directly add data to your LLM contexts using semantic similarity indexes.
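
The retrieval step behind a semantic similarity index can be sketched in a few lines of Python. This example does not implement the Model Context Protocol or any HonuDB interface. It only shows the underlying idea of ranking stored items by cosine similarity to a query embedding, using tiny hand-written vectors in place of real embeddings. The `index` contents and function names are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the ids of the k items most similar to the query vector."""
    ranked = sorted(
        index,
        key=lambda doc_id: cosine_similarity(query, index[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# document id -> toy embedding (a real system would use a learned model)
index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}

print(top_k([1.0, 0.0, 0.0], index))  # ['doc-a', 'doc-b']
```

The selected documents would then be passed to the LLM as context. A production index would typically use an approximate nearest-neighbor structure rather than scanning every vector.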