Data Security FAQ
Last updated: October 2025 • Version 2.0 (Vanna v2.0.0 Agent Framework)
Overview
This FAQ addresses data security and privacy questions for developers and technical teams implementing Vanna AI. Vanna operates as a modular agent framework that can be deployed in multiple configurations, from fully self-hosted open-source installations to cloud-managed premium services.
Architecture & Deployment Models
What is the architecture of Vanna v2.0?
Vanna v2.0 is a modular agent framework built on clean abstractions:
- Core Agent: Orchestrates LLM interactions with tool execution loops, conversation management, and streaming support
- Tool System: Extensible tool registry with group-based access control
- Storage Layer: Abstract interfaces for conversations, audit logs, and observability data
- User Management: User resolution with group-based permissions (RBAC)
- LLM Services: Pluggable integrations for Anthropic Claude, OpenAI GPT, and other providers
The framework provides six extensibility points: lifecycle hooks, middlewares, error recovery, context enrichers, conversation filters, and observability providers.
What deployment models does Vanna support?
| Model | Description | Data Location | Use Case | 
|---|---|---|---|
| Self-Hosted | Open-source Python package on your infrastructure | All data stays local | Maximum control, sensitive data, air-gapped environments | 
| Cloud Premium | Fully managed Vanna premium services | Vanna cloud infrastructure | Rapid deployment, managed observability | 
| Hybrid | Python package runs locally; premium services handle telemetry | Conversations local, telemetry in cloud | Balance between control and managed services | 
What are the premium backend services?
Vanna's premium backend (written in Go) provides managed services:
- Observability: Metrics and distributed tracing for monitoring agent performance
- Audit Logging: Centralized audit event storage with query capabilities
- Tool Registry: Shared tool schemas and templates across teams
- Agent Memory: Semantic search over historical tool usage patterns
- Conversation Management: Cloud-based conversation persistence
- Analytics: Dashboard aggregations and usage statistics
Current Status: The premium backend is in a development/demo stage and uses in-memory storage. Production deployment requires additional hardening (see "Production Deployment Considerations" below).
Data Handling & Privacy
What data does the open-source Python package handle?
When running self-hosted, the Python package handles:
- Database Connections: Connection strings and credentials (stored locally, never transmitted)
- Training Data: DDL statements, documentation, SQL examples, and question-SQL pairs
- Conversation History: User messages and AI responses
- Tool Execution Data: Tool invocations, parameters, and results
- Audit Logs: Security events, access checks, and tool usage
Important: In self-hosted mode with local storage (e.g., MemoryConversationStore), all data stays on your infrastructure. No data is transmitted to Vanna servers or third-party services unless you explicitly configure premium integrations.
What data is sent to Vanna's premium services?
When using premium backend services (opt-in), the following data may be transmitted via HTTPS:
| Data Type | Sent to Premium? | Purpose | 
|---|---|---|
| Database credentials | Never | Always stored locally only | 
| Training data (DDL, docs, SQL) | Yes (opt-in) | Enable semantic search and retrieval augmentation | 
| Conversation messages | Yes (opt-in) | Persist conversations across sessions | 
| Tool execution metadata | Yes (opt-in) | Centralized audit logging and analytics | 
| Observability metrics/traces | Yes (opt-in) | Performance monitoring and debugging | 
Authentication: All premium API requests use Authorization: Bearer {api_key} headers and X-Organization-ID for multi-tenancy isolation.
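As a minimal sketch of that header scheme, the snippet below builds the two headers with the Python standard library only. The helper name `premium_headers`, the placeholder endpoint, and the sample key and organization values are illustrative assumptions, not part of Vanna's client API.

```python
import urllib.request


def premium_headers(api_key: str, organization_id: str) -> dict:
    """Build the auth headers described above (hypothetical helper)."""
    if not api_key or not organization_id:
        raise ValueError("api_key and organization_id are both required")
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Organization-ID": organization_id,
    }


# Placeholder endpoint: substitute the URL your premium deployment documents.
request = urllib.request.Request(
    "https://premium.example.invalid/audit-events",
    data=b"{}",
    method="POST",
    headers=premium_headers("demo-api-key", "org-123"),
)
# The request is only constructed here; nothing is sent over the network.
```

Keeping header construction in one place makes it easier to rotate keys and to verify that every premium call carries the organization identifier.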
What data is sent to third-party LLMs (Anthropic, OpenAI)?
Vanna integrates with external LLM providers to generate responses. The following data is transmitted to the LLM provider you configure:
Always Sent:
- User questions/messages
- System prompts (including tool schemas)
- Tool execution results (to provide context for follow-up responses)
- Conversation history (for context)
Conditionally Sent:
- Training data: DDL statements, documentation snippets, and example SQL from your retrieval augmentation layer
Never Sent:
- Database credentials
- Raw database connection strings
Transmission Security: All LLM API requests are sent over HTTPS. Vanna does not store this data in transit; it flows directly from your Python environment to the LLM provider, where it is subject to that provider's retention policies (see the Anthropic and OpenAI privacy policies).
How are database credentials managed?
Database credentials are:
- Stored Locally: Credentials remain in your Python environment (environment variables, config files, or passed programmatically)
- Never Transmitted: Credentials are never sent to Vanna servers, premium backend, or LLM providers
- Not Logged: Audit logging automatically sanitizes sensitive parameters (password, secret, token, api_key, credential, etc.) before recording events
Best Practice: Use environment variables or secret management services (AWS Secrets Manager, HashiCorp Vault) rather than hardcoding credentials.
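One way to follow this practice is to read credentials from the environment at startup and fail fast when any are missing. The sketch below assumes hypothetical variable names (`VANNA_DB_HOST`, `VANNA_DB_USER`, `VANNA_DB_PASSWORD`); use whatever names your deployment defines.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential variable is not set."""


def load_db_credentials(env=os.environ, prefix: str = "VANNA_DB_") -> dict:
    """Read database credentials from environment variables.

    The variable names are illustrative assumptions, not names Vanna requires.
    """
    required = ("HOST", "USER", "PASSWORD")
    missing = [prefix + name for name in required if not env.get(prefix + name)]
    if missing:
        # Report only the variable names, never any values.
        raise MissingCredentialError(
            "missing environment variables: " + ", ".join(missing)
        )
    return {name.lower(): env[prefix + name] for name in required}
```

Passing `env` as a parameter lets tests supply a plain dictionary instead of touching the real process environment.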
Security Controls
What user isolation and access control mechanisms exist?
Vanna v2.0 implements group-based access control (RBAC):
User Model:
from typing import List

class User:
    id: str                          # Unique user identifier
    username: str
    email: str
    group_memberships: List[str]     # e.g., ["admin", "analyst", "viewer"]

Access Control:
- Tool Access: Each tool specifies access_groups. Users can only invoke tools where their group memberships intersect with the tool's allowed groups.
- UI Feature Access: Sensitive UI features (e.g., viewing tool arguments, error details) can be restricted by group.
- Conversation Isolation: All conversation storage operations validate that conversation.user.id == requesting_user.id.
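The two checks above, group intersection for tools and owner matching for conversations, can be sketched as follows. The `SketchUser` and `SketchTool` classes are simplified stand-ins for illustration, not Vanna's actual classes, and denying access when `access_groups` is empty is an assumption of this sketch rather than documented framework behavior.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchUser:
    id: str
    group_memberships: List[str] = field(default_factory=list)


@dataclass
class SketchTool:
    name: str
    access_groups: List[str] = field(default_factory=list)


def can_invoke(user: SketchUser, tool: SketchTool) -> bool:
    """Allow the call only if the user shares at least one group with the tool."""
    return bool(set(user.group_memberships) & set(tool.access_groups))


def check_conversation_owner(conversation_user_id: str, requester: SketchUser) -> None:
    """Raise PermissionError unless the requester owns the conversation."""
    if conversation_user_id != requester.id:
        raise PermissionError("conversation belongs to a different user")
```

For example, a user in `["analyst"]` can invoke a tool restricted to `["admin", "analyst"]`, while a user in `["viewer"]` cannot.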
How does audit logging work?
Vanna provides comprehensive audit logging with automatic parameter sanitization:
Events Logged:
- Tool Access Checks: User attempts to access tools (granted/denied)
- Tool Invocations: Tool name, sanitized parameters, execution timestamp
- Tool Results: Success/failure status, execution time, error messages
- UI Feature Access: Which users accessed restricted UI features
- AI Responses: Response metadata (length, hash, model used)
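Logging response metadata rather than the response text keeps the audit trail useful without copying conversation content into it. A minimal sketch of such a record is shown below; the field names and the choice of SHA-256 are assumptions for illustration, since the hash algorithm Vanna uses is not stated here.

```python
import hashlib


def response_metadata(text: str, model: str) -> dict:
    """Summarize an AI response for auditing without storing its content."""
    return {
        "length": len(text),
        # SHA-256 is an assumed choice; any stable digest would serve here.
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "model": model,
    }
```

The digest lets an auditor confirm that a stored response matches the logged event without the log itself exposing the text.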
Parameter Sanitization:
The audit system automatically redacts sensitive patterns:
- password
- secret
- token
- api_key
- credential
- auth
- private_key
- access_key

Matching values are replaced with [REDACTED].
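The redaction rule can be illustrated with a short recursive function. This is a sketch of the idea, not Vanna's implementation: it assumes matching is done by case-insensitive substring on parameter names, which is why a pattern such as `auth` would also catch a key like `author`.

```python
SENSITIVE_PATTERNS = (
    "password", "secret", "token", "api_key",
    "credential", "auth", "private_key", "access_key",
)
REDACTED = "[REDACTED]"


def is_sensitive(key) -> bool:
    """Return True if the parameter name contains a sensitive pattern."""
    lowered = str(key).lower()
    return any(pattern in lowered for pattern in SENSITIVE_PATTERNS)


def sanitize(params):
    """Return a copy of params with sensitive values replaced, recursing into dicts and lists."""
    if isinstance(params, dict):
        return {
            key: REDACTED if is_sensitive(key) else sanitize(value)
            for key, value in params.items()
        }
    if isinstance(params, list):
        return [sanitize(item) for item in params]
    return params
```

Because the function returns a new structure, the original tool parameters remain available to the tool while only the sanitized copy reaches the audit log.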
What observability and monitoring capabilities exist?
The framework includes built-in observability:
Metrics:
- Tool execution latency
- LLM request duration
- Error rates by tool
- Conversation length statistics
Distributed Tracing:
- Request-level tracing
- Tool execution traces
- LLM interaction traces
- Custom span attributes
Providers: Local (in-memory), Premium (cloud-based), or Custom (implement ObservabilityProvider for Datadog, Prometheus, etc.)
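To make the metrics above concrete, here is a small in-memory recorder that tracks latency and error rate per operation. It is a standalone illustration with hypothetical class and method names; it does not implement Vanna's `ObservabilityProvider` interface, whose exact methods are not described here.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class InMemoryMetrics:
    """Record call durations and failures per operation name."""

    def __init__(self) -> None:
        self.durations = defaultdict(list)
        self.errors = defaultdict(int)

    @contextmanager
    def timed(self, name: str):
        start = time.perf_counter()
        try:
            yield
        except Exception:
            # Count the failure, then let the caller see the original error.
            self.errors[name] += 1
            raise
        finally:
            self.durations[name].append(time.perf_counter() - start)

    def error_rate(self, name: str) -> float:
        calls = len(self.durations[name])
        return self.errors[name] / calls if calls else 0.0
```

Wrapping a tool call in `with metrics.timed("run_sql"):` records its latency whether it succeeds or fails, which keeps error rates and latency statistics based on the same set of calls.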
What extensibility points exist for custom security controls?
Vanna v2.0 provides multiple integration points for custom security:
1. Custom User Resolver
Implement authentication (OAuth, JWT, SAML)
2. Custom Audit Logger
Route audit events to your SIEM (Splunk, Datadog, etc.)
3. Custom Conversation Store
Implement encrypted storage
4. Lifecycle Hooks
Inject custom validation/security checks
5. Middlewares
Request/response interception for rate limiting, etc.
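As an example of what a rate-limiting middleware might delegate to, the sketch below implements a per-user token bucket. The class names and the `allow` method are hypothetical and would need to be wired into whatever middleware interface your Vanna version exposes.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `refill_per_second`."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerUserRateLimiter:
    """Keep one independent bucket per user id."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self._args = (capacity, refill_per_second, clock)
        self._buckets = {}

    def allow(self, user_id: str) -> bool:
        bucket = self._buckets.setdefault(user_id, TokenBucket(*self._args))
        return bucket.allow()
```

This version keeps state in process memory, so it only suits a single-process deployment; a shared store would be needed to enforce limits across several workers.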
Production Deployment Considerations
What are the current limitations of the premium backend?
The premium backend is currently in development/demo stage with the following limitations:
| Limitation | Impact | Production Requirement | 
|---|---|---|
| In-memory storage | Data lost on restart | Persistent database (e.g., PostgreSQL or MongoDB) | 
| No encryption at rest | Unencrypted data | Database encryption | 
| Wide-open CORS | Cross-origin request risks (any site can call the API from a browser) | Restrict to known domains | 
| No rate limiting | DoS vulnerability | Redis-backed rate limiting | 
| No auth middleware | Open endpoints | JWT/OAuth authentication | 
Recommendation: For production deployments, use self-hosted mode until premium services complete security hardening, or implement additional security layers (API gateway, VPN, etc.).
What security hardening is recommended for production?
Self-Hosted Deployments:
- TLS/HTTPS for all API endpoints
- Encryption at rest for conversation storage
- Deploy behind firewall/VPN
- Use secret management services (AWS Secrets Manager, Vault)
- Enable comprehensive audit logging
- Regular security log reviews
Premium/Hybrid Deployments (additional):
- API key rotation
- IP whitelisting
- Rate limiting
- Data retention policies aligned with GDPR/CCPA
How can I ensure GDPR/CCPA compliance?
Self-Hosted Deployments: You have full control and responsibility for compliance.
- Data Minimization: Only collect necessary data
- Right to Access: Implement endpoints to export user data
- Right to Erasure: Implement deletion via conversation store APIs
- Data Retention: Configure automatic cleanup of old conversations
- Privacy Policy: Clearly disclose what data is sent to LLM providers
Premium Services Roadmap: Vanna will provide GDPR-compliant data retention, deletion APIs, and data processing agreements for production premium services.
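For self-hosted deployments, the access, erasure, and retention items above reduce to three operations on the conversation store. The sketch below shows them against a simple in-memory store; the class and method names are hypothetical, not Vanna's conversation store API, and a real deployment would apply the same logic to its own storage backend.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class StoredConversation:
    id: str
    user_id: str
    updated_at: datetime


class InMemoryConversationArchive:
    """Illustrative store supporting export, erasure, and retention cleanup."""

    def __init__(self) -> None:
        self._items: Dict[str, StoredConversation] = {}

    def save(self, conversation: StoredConversation) -> None:
        self._items[conversation.id] = conversation

    def export_user(self, user_id: str) -> List[StoredConversation]:
        """Right to access: return everything held for one user."""
        return [c for c in self._items.values() if c.user_id == user_id]

    def erase_user(self, user_id: str) -> int:
        """Right to erasure: delete one user's conversations, return the count."""
        doomed = [c.id for c in self._items.values() if c.user_id == user_id]
        for conversation_id in doomed:
            del self._items[conversation_id]
        return len(doomed)

    def purge_older_than(self, max_age: timedelta, now: Optional[datetime] = None) -> int:
        """Retention: delete conversations not updated within max_age."""
        cutoff = (now or datetime.now(timezone.utc)) - max_age
        stale = [c.id for c in self._items.values() if c.updated_at < cutoff]
        for conversation_id in stale:
            del self._items[conversation_id]
        return len(stale)
```

Running `purge_older_than` on a schedule (for example, daily with a 90-day window) is one way to implement automatic cleanup; the right window depends on your own retention policy.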
Quick Reference
| Question | Self-Hosted | Premium Services | 
|---|---|---|
| Where are database credentials stored? | Locally only | Never sent to premium | 
| Where are conversations stored? | Local storage (your control) | Vanna cloud (opt-in) | 
| Is data encrypted in transit? | HTTPS (your config) | HTTPS to Vanna APIs | 
| Is data encrypted at rest? | Your implementation | Roadmap (not current) | 
| Can I delete my data? | Yes (via API) | Yes (via API, roadmap: UI) | 
| Is it production-ready? | Yes (with hardening) | No (development stage) | 
Getting Help
Security Issues
Please do not open public GitHub issues for security vulnerabilities.
- Report via GitHub Security Advisories
- Email: security@vanna.ai
- Response time: 48 hours