Security & Privacy

Data Security FAQ

Last updated: October 2025 • Version 2.0 (Vanna v2.0.0 Agent Framework)

Overview

This FAQ addresses data security and privacy questions for developers and technical teams implementing Vanna AI. Vanna operates as a modular agent framework that can be deployed in multiple configurations, from fully self-hosted open-source installations to cloud-managed premium services.

Architecture & Deployment Models

What is the architecture of Vanna v2.0?

Vanna v2.0 is a modular agent framework built on clean abstractions:

  • Core Agent: Orchestrates LLM interactions with tool execution loops, conversation management, and streaming support
  • Tool System: Extensible tool registry with group-based access control
  • Storage Layer: Abstract interfaces for conversations, audit logs, and observability data
  • User Management: User resolution with group-based permissions (RBAC)
  • LLM Services: Pluggable integrations for Anthropic Claude, OpenAI GPT, and other providers

The framework provides 6 extensibility points: lifecycle hooks, middlewares, error recovery, context enrichers, conversation filters, and observability providers.
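To make the abstraction boundaries concrete, here is a minimal sketch of what these interfaces might look like as Python Protocols; the names and method signatures are illustrative, not the exact v2.0 API.

from typing import Any, Dict, List, Protocol

class LlmService(Protocol):
    """Pluggable LLM integration (Anthropic Claude, OpenAI GPT, custom)."""
    def complete(self, messages: List[Dict[str, Any]]) -> str: ...

class ConversationStore(Protocol):
    """Abstract conversation persistence (in-memory, database, or cloud)."""
    def save(self, conversation_id: str, messages: List[Dict[str, Any]]) -> None: ...
    def load(self, conversation_id: str) -> List[Dict[str, Any]]: ...

class AuditLogger(Protocol):
    """Abstract audit sink (local file, SIEM, or premium backend)."""
    def log_event(self, event_type: str, payload: Dict[str, Any]) -> None: ...

Swapping implementations behind abstractions like these is what distinguishes the deployment models below: the same agent code can point at local, hybrid, or cloud-backed components.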

What deployment models does Vanna support?

| Model | Description | Data Location | Use Case |
|---|---|---|---|
| Self-Hosted | Open-source Python package on your infrastructure | All data stays local | Maximum control, sensitive data, air-gapped environments |
| Cloud Premium | Fully managed Vanna premium services | Vanna cloud infrastructure | Rapid deployment, managed observability |
| Hybrid | Python runs locally, premium services for telemetry | Conversations local, telemetry in cloud | Balance between control and managed services |

What are the premium backend services?

Vanna's premium backend (written in Go) provides managed services:

  • Observability: Metrics and distributed tracing for monitoring agent performance
  • Audit Logging: Centralized audit event storage with query capabilities
  • Tool Registry: Shared tool schemas and templates across teams
  • Agent Memory: Semantic search over historical tool usage patterns
  • Conversation Management: Cloud-based conversation persistence
  • Analytics: Dashboard aggregations and usage statistics

Current Status: The premium backend is in development/demo stage with in-memory storage. Production deployment requires additional hardening (see "Production Considerations" section below).

Data Handling & Privacy

What data does the open-source Python package handle?

When running self-hosted, the Python package handles:

  1. Database Connections: Connection strings and credentials (stored locally, never transmitted)
  2. Training Data: DDL statements, documentation, SQL examples, and question-SQL pairs
  3. Conversation History: User messages and AI responses
  4. Tool Execution Data: Tool invocations, parameters, and results
  5. Audit Logs: Security events, access checks, and tool usage

Important: In self-hosted mode with local storage (e.g., MemoryConversationStore), all data stays on your infrastructure. No data is transmitted to Vanna servers or third-party services unless you explicitly configure premium integrations.
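To illustrate why this holds, an in-memory conversation store keeps everything inside the local Python process. The sketch below mimics the intent of MemoryConversationStore; the real class in the Vanna package may expose a different interface.

from typing import Any, Dict, List

class MemoryConversationStore:
    """Conversation data lives only in this process; nothing is transmitted."""
    def __init__(self) -> None:
        self._conversations: Dict[str, List[Dict[str, Any]]] = {}

    def append(self, conversation_id: str, message: Dict[str, Any]) -> None:
        self._conversations.setdefault(conversation_id, []).append(message)

    def get(self, conversation_id: str) -> List[Dict[str, Any]]:
        return list(self._conversations.get(conversation_id, []))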

What data is sent to Vanna's premium services?

When using premium backend services (opt-in), the following data may be transmitted via HTTPS:

| Data Type | Sent to Premium? | Purpose |
|---|---|---|
| Database credentials | Never | Always stored locally only |
| Training data (DDL, docs, SQL) | Yes (opt-in) | Enable semantic search and retrieval augmentation |
| Conversation messages | Yes (opt-in) | Persist conversations across sessions |
| Tool execution metadata | Yes (opt-in) | Centralized audit logging and analytics |
| Observability metrics/traces | Yes (opt-in) | Performance monitoring and debugging |

Authentication: All premium API requests use Authorization: Bearer {api_key} headers and X-Organization-ID for multi-tenancy isolation.
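The sketch below shows what such a request looks like at the HTTP level; the endpoint URL, payload shape, and environment variable names are placeholders, while the two header names come from the description above.

import os
import requests

response = requests.post(
    "https://premium.vanna.example/v1/audit-events",  # placeholder URL
    headers={
        "Authorization": f"Bearer {os.environ['VANNA_API_KEY']}",  # API key auth
        "X-Organization-ID": os.environ["VANNA_ORG_ID"],           # tenant isolation
    },
    json={"event_type": "tool_invocation", "tool": "run_sql"},     # example payload
    timeout=10,
)
response.raise_for_status()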

What data is sent to third-party LLMs (Anthropic, OpenAI)?

Vanna integrates with external LLM providers to generate responses. The following data is transmitted to the LLM provider you configure:

Always Sent:

  • User questions/messages
  • System prompts (including tool schemas)
  • Tool execution results (to provide context for follow-up responses)
  • Conversation history (for context)

Conditionally Sent:

  • Training data: DDL statements, documentation snippets, and example SQL from your retrieval augmentation layer

Never Sent:

  • Database credentials
  • Raw database connection strings

Transmission Security: All LLM API requests are sent via HTTPS. Vanna does not store this data in transit; it flows directly from your Python environment to the LLM provider and is then subject to that provider's retention policies (see the Anthropic and OpenAI privacy policies).
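For illustration, a request to the provider roughly contains the pieces below. The structure follows common chat-completions conventions; the exact format Vanna assembles may differ.

conversation_history = [
    {"role": "assistant", "content": "Previous answer about the orders table..."},
]
retrieved_context = [
    "CREATE TABLE orders (id INT, customer_id INT, total NUMERIC);",  # DDL from retrieval
    "Documentation: the 'total' column is in USD.",
]
llm_request = {
    "system": "You are a SQL assistant. Tools: run_sql(sql: str)",    # system prompt + tool schemas
    "messages": conversation_history + [
        {"role": "user", "content": "What were total sales last month?"},
    ],
    "context": retrieved_context,  # conditionally sent training data
}
# Note what is absent: no connection strings and no credentials.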

How are database credentials managed?

Database credentials are:

  1. Stored Locally: Credentials remain in your Python environment (environment variables, config files, or passed programmatically)
  2. Never Transmitted: Credentials are never sent to Vanna servers, premium backend, or LLM providers
  3. Not Logged: Audit logging automatically sanitizes sensitive parameters (password, secret, token, api_key, credential, etc.) before recording events

Best Practice: Use environment variables or secret management services (AWS Secrets Manager, HashiCorp Vault) rather than hardcoding credentials.
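A minimal sketch of that practice, assuming a Postgres connection and the hypothetical environment variable names shown; check your Vanna version for the exact connection helper before adapting it.

import os

db_config = {
    "host": os.environ["DB_HOST"],
    "dbname": os.environ["DB_NAME"],
    "user": os.environ["DB_USER"],
    "password": os.environ["DB_PASSWORD"],  # never hardcode this
}
# vn.connect_to_postgres(**db_config)  # credentials stay inside your process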

Security Controls

What user isolation and access control mechanisms exist?

Vanna v2.0 implements group-based access control (RBAC):

User Model:

from typing import List

class User:
    id: str                          # Unique user identifier
    username: str
    email: str
    group_memberships: List[str]     # e.g., ["admin", "analyst", "viewer"]

Access Control:

  1. Tool Access: Each tool specifies access_groups. Users can only invoke tools where their group memberships intersect with the tool's allowed groups (see the sketch after this list).
  2. UI Feature Access: Sensitive UI features (e.g., viewing tool arguments, error details) can be restricted by group.
  3. Conversation Isolation: All conversation storage operations validate that conversation.user.id == requesting_user.id.
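A minimal sketch of the group-intersection check, using illustrative field and function names rather than the framework's actual API:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Tool:
    name: str
    access_groups: List[str] = field(default_factory=list)

def can_invoke(user_groups: List[str], tool: Tool) -> bool:
    """A user may invoke a tool only if their groups intersect the tool's groups."""
    return bool(set(user_groups) & set(tool.access_groups))

run_sql = Tool(name="run_sql", access_groups=["admin", "analyst"])
print(can_invoke(["analyst"], run_sql))  # True
print(can_invoke(["viewer"], run_sql))   # False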

How does audit logging work?

Vanna provides comprehensive audit logging with automatic parameter sanitization:

Events Logged:

  • Tool Access Checks: User attempts to access tools (granted/denied)
  • Tool Invocations: Tool name, sanitized parameters, execution timestamp
  • Tool Results: Success/failure status, execution time, error messages
  • UI Feature Access: Which users accessed restricted UI features
  • AI Responses: Response metadata (length, hash, model used)

Parameter Sanitization:

The audit system automatically redacts sensitive patterns:

  • password, secret, token, api_key
  • credential, auth, private_key, access_key
  • Values replaced with [REDACTED]
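The sketch below captures the spirit of this redaction; the real implementation inside Vanna may use different patterns or hooks.

from typing import Any, Dict

SENSITIVE_PATTERNS = (
    "password", "secret", "token", "api_key",
    "credential", "auth", "private_key", "access_key",
)

def sanitize(params: Dict[str, Any]) -> Dict[str, Any]:
    """Replace values of sensitive-looking keys with a redaction marker."""
    return {
        key: "[REDACTED]" if any(p in key.lower() for p in SENSITIVE_PATTERNS) else value
        for key, value in params.items()
    }

print(sanitize({"sql": "SELECT 1", "db_password": "hunter2"}))
# {'sql': 'SELECT 1', 'db_password': '[REDACTED]'}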

What observability and monitoring capabilities exist?

The framework includes built-in observability:

Metrics:

  • Tool execution latency
  • LLM request duration
  • Error rates by tool
  • Conversation length statistics

Distributed Tracing:

  • Request-level tracing
  • Tool execution traces
  • LLM interaction traces
  • Custom span attributes

Providers: Local (in-memory), Premium (cloud-based), or Custom (implement ObservabilityProvider for Datadog, Prometheus, etc.)
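As a sketch of the Custom option, the provider below forwards metrics and spans to standard logging; the method names are assumptions, so consult the actual ObservabilityProvider interface before implementing one for Datadog or Prometheus.

import logging
from typing import Dict, Optional

logger = logging.getLogger("vanna.observability")

class LoggingObservabilityProvider:
    """Illustrative provider that writes metrics and spans to standard logging."""

    def record_metric(self, name: str, value: float,
                      labels: Optional[Dict[str, str]] = None) -> None:
        logger.info("metric %s=%s labels=%s", name, value, labels or {})

    def record_span(self, name: str, duration_ms: float,
                    attributes: Optional[Dict[str, str]] = None) -> None:
        logger.info("span %s took %.1fms attrs=%s", name, duration_ms, attributes or {})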

What extensibility points exist for custom security controls?

Vanna v2.0 provides multiple integration points for custom security:

  1. Custom User Resolver: Implement authentication (OAuth, JWT, SAML)
  2. Custom Audit Logger: Route audit events to your SIEM (Splunk, Datadog, etc.)
  3. Custom Conversation Store: Implement encrypted storage
  4. Lifecycle Hooks: Inject custom validation and security checks
  5. Middlewares: Intercept requests and responses for rate limiting and similar controls (see the sketch below)
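A minimal rate-limiting sketch of the kind a middleware could apply; the before_request hook signature is an assumption and should be adapted to the framework's actual middleware interface.

import time
from collections import defaultdict, deque

class RateLimitMiddleware:
    """Reject requests once a user exceeds max_requests within a rolling window."""

    def __init__(self, max_requests: int = 30, window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)

    def before_request(self, user_id: str) -> None:
        now = time.monotonic()
        hits = self._hits[user_id]
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()  # drop hits that fell outside the window
        if len(hits) >= self.max_requests:
            raise RuntimeError(f"Rate limit exceeded for user {user_id}")
        hits.append(now)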

Production Deployment Considerations

What are the current limitations of the premium backend?

The premium backend is currently in development/demo stage with the following limitations:

| Limitation | Impact | Production Requirement |
|---|---|---|
| In-memory storage | Data lost on restart | Persistent database (PostgreSQL, MongoDB) |
| No encryption at rest | Unencrypted data | Database encryption |
| Wide-open CORS | CSRF risks | Restrict to known domains |
| No rate limiting | DoS vulnerability | Redis-backed rate limiting |
| No auth middleware | Open endpoints | JWT/OAuth authentication |

Recommendation: For production deployments, use self-hosted mode until premium services complete security hardening, or implement additional security layers (API gateway, VPN, etc.).

What security hardening is recommended for production?

Self-Hosted Deployments:

  • TLS/HTTPS for all API endpoints
  • Encryption at rest for conversation storage
  • Deploy behind firewall/VPN
  • Use secret management services (AWS Secrets Manager, Vault)
  • Enable comprehensive audit logging
  • Regular security log reviews

Premium/Hybrid Deployments (additional):

  • API key rotation
  • IP whitelisting
  • Rate limiting
  • Data retention policies aligned with GDPR/CCPA

How can I ensure GDPR/CCPA compliance?

Self-Hosted Deployments: You have full control and responsibility for compliance.

  • Data Minimization: Only collect necessary data
  • Right to Access: Implement endpoints to export user data
  • Right to Erasure: Implement deletion via conversation store APIs
  • Data Retention: Configure automatic cleanup of old conversations (see the sketch after this list)
  • Privacy Policy: Clearly disclose what data is sent to LLM providers
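A sketch of such a retention cleanup job; the store methods (list_conversations, delete) and the updated_at field are placeholders for whatever your conversation store actually exposes.

from datetime import datetime, timedelta, timezone

def purge_old_conversations(store, max_age_days: int = 90) -> int:
    """Delete conversations older than the retention window and return the count."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    deleted = 0
    for conversation in store.list_conversations():
        if conversation.updated_at < cutoff:
            store.delete(conversation.id)
            deleted += 1
    return deleted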

Premium Services Roadmap: Vanna will provide GDPR-compliant data retention, deletion APIs, and data processing agreements for production premium services.

Quick Reference

| Question | Self-Hosted | Premium Services |
|---|---|---|
| Where are database credentials stored? | Locally only | Never sent to premium |
| Where are conversations stored? | Local storage (your control) | Vanna cloud (opt-in) |
| Is data encrypted in transit? | HTTPS (your config) | HTTPS to Vanna APIs |
| Is data encrypted at rest? | Your implementation | Roadmap (not current) |
| Can I delete my data? | Yes (via API) | Yes (via API; UI on roadmap) |
| Is it production-ready? | Yes (with hardening) | No (development stage) |

Getting Help

Security Issues

Please do not open public GitHub issues for security vulnerabilities.

  • Report via GitHub Security Advisories
  • Email: security@vanna.ai
  • Response time: 48 hours

Ready to get started?

Deploy Vanna with confidence. Choose self-hosted for maximum control or try our managed services.