Vanna.AI Data Security FAQ

Vanna AI Architecture

Vanna AI Core Python Package

The Vanna AI Core Python Package provides tools for connecting to various databases, generating SQL queries with AI, running those queries, generating visualizations, and related tasks. It is designed to be extensible, so users can add or modify functionality as needed.

To function, the Python package needs two major components: a large language model (LLM) and a retrieval augmentation layer. The LLM generates SQL queries from natural-language questions, and the retrieval augmentation layer supplies the LLM with relevant context. The retrieval augmentation layer is trained on a combination of DDL statements, documentation strings, SQL statements, and question-SQL pairs.
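To make the division of labor concrete, here is a minimal, self-contained sketch of what a retrieval augmentation layer does with those four kinds of training data. The class ToyRetrievalLayer and its word-overlap ranking are hypothetical illustrations, not Vanna's implementation; production systems typically rank context with embeddings stored in a vector database. Note that only training data and the question go into the assembled prompt; no rows from the database are involved.

```python
import re
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Lowercase word tokens, used as a crude relevance signal."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


@dataclass
class ToyRetrievalLayer:
    """Hypothetical in-memory stand-in for a retrieval augmentation layer.

    It holds the four kinds of training data and ranks them by word overlap
    with the question; real systems usually rank with embeddings instead.
    """
    ddl: list[str] = field(default_factory=list)
    documentation: list[str] = field(default_factory=list)
    sql: list[str] = field(default_factory=list)
    question_sql: list[tuple[str, str]] = field(default_factory=list)

    def top_k(self, question: str, items: list[str], k: int) -> list[str]:
        """Return up to k items that share at least one word with the question."""
        q = tokens(question)
        ranked = sorted(items, key=lambda s: len(q & tokens(s)), reverse=True)
        return [s for s in ranked[:k] if q & tokens(s)]

    def build_prompt(self, question: str, k: int = 2) -> str:
        """Assemble the context an LLM would receive alongside the question."""
        pairs = [f"{q}\n{s}" for q, s in self.question_sql]
        return "\n".join([
            "-- DDL:", *self.top_k(question, self.ddl, k),
            "-- Documentation:", *self.top_k(question, self.documentation, k),
            "-- SQL:", *self.top_k(question, self.sql, k),
            "-- Example question-SQL pairs:", *self.top_k(question, pairs, k),
            f"-- Question: {question}",
            "-- Write one SQL query that answers the question.",
        ])
```

Irrelevant training items (for example, a table the question never mentions) are simply left out of the prompt, which is why only a subset of the training data reaches the LLM on any given call.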

You can use the Vanna AI Core Python Package with your own LLM and retrieval augmentation layer, or you can use the Vanna AI Hosted Services, which provide access to both an LLM and a retrieval augmentation layer.
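Because the two components are separable, either one can be swapped out independently. The sketch below illustrates that idea with hypothetical interfaces (ContextStore, LanguageModel, TextToSQL, ListStore, EchoLLM); the actual class structure of the Vanna package is not reproduced here. The self-hosted stand-ins keep everything inside the local process, and EchoLLM is a fake model that marks where a real LLM client would plug in.

```python
from abc import ABC, abstractmethod


class ContextStore(ABC):
    """Hypothetical interface for a retrieval augmentation layer."""

    @abstractmethod
    def add(self, item: str) -> None: ...

    @abstractmethod
    def related(self, question: str) -> list[str]: ...


class LanguageModel(ABC):
    """Hypothetical interface for whichever LLM you choose to run."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...


class TextToSQL:
    """Combines any ContextStore with any LanguageModel."""

    def __init__(self, store: ContextStore, llm: LanguageModel) -> None:
        self.store = store
        self.llm = llm

    def generate_sql(self, question: str) -> str:
        context = "\n".join(self.store.related(question))
        return self.llm.complete(f"{context}\n-- Question: {question}\n")


class ListStore(ContextStore):
    """Self-hosted stand-in: keeps training items in a local list."""

    def __init__(self) -> None:
        self.items: list[str] = []

    def add(self, item: str) -> None:
        self.items.append(item)

    def related(self, question: str) -> list[str]:
        words = set(question.lower().split())
        return [i for i in self.items if words & set(i.lower().split())]


class EchoLLM(LanguageModel):
    """Fake model that records the prompt; swap in a real LLM client here."""

    def __init__(self) -> None:
        self.last_prompt = ""

    def complete(self, prompt: str) -> str:
        self.last_prompt = prompt
        return "SELECT 1"
```

Replacing ListStore or EchoLLM with a different implementation does not require any change to TextToSQL, which is the property that lets you choose between your own components and hosted ones.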

Code Integrity

The core Python package is an open-source project, and the code is available on GitHub. Code that is contributed to the project is reviewed by the Vanna AI team before being merged into the main codebase. The code is also subject to automated testing and linting to ensure that it meets the project's standards.

Vanna AI Hosted Services

If you use Vanna's hosted services, the training data is stored on Vanna's servers. Some of that data is sent to the LLM for the purpose of generating SQL queries or related functionality.

Data Stored

DDL statements that were used to train the system (e.g., from vn.train(ddl=...))
Documentation strings that were used to train the system (e.g., from vn.train(documentation=...))
SQL statements that were used to train the system (e.g., from vn.train(sql=...))
Question-SQL pairs that were used to train the system (e.g., from vn.train(question=..., sql=...))
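Since training data is stored on Vanna's servers when you use the hosted services, it can help to review a training plan before running it. The sketch below is a hypothetical pre-flight audit, not part of the Vanna package: record_type and audit_training_plan are illustrative names. It assumes each planned call is written as a dict of the keyword arguments you intend to pass, using the combinations in the list above (ddl, documentation, sql, or question together with sql), and counts how many records of each stored type the plan would create.

```python
from collections import Counter

# Stored categories, keyed by which training keyword arguments are supplied.
CATEGORIES = {
    frozenset({"ddl"}): "DDL statement",
    frozenset({"documentation"}): "Documentation string",
    frozenset({"sql"}): "SQL statement",
    frozenset({"question", "sql"}): "Question-SQL pair",
}


def record_type(call: dict[str, str]) -> str:
    """Classify one planned training call by the non-empty arguments it sets."""
    keys = frozenset(k for k, v in call.items() if v)
    try:
        return CATEGORIES[keys]
    except KeyError:
        raise ValueError(f"unrecognized training arguments: {sorted(keys)}") from None


def audit_training_plan(calls: list[dict[str, str]]) -> Counter:
    """Count how many records of each type the plan would store remotely."""
    return Counter(record_type(c) for c in calls)
```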

Data Sent to LLM

On each call, a subset of the stored training data is sent to the LLM to generate SQL queries or related output. This data is sent securely over HTTPS.
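If you run your own LLM endpoint with the package, you may want the same transport guarantee on your side. Here is a minimal sketch using only the Python standard library: require_https is a hypothetical helper that refuses any endpoint not using HTTPS and returns a TLS context that verifies certificates and hostnames. It does not describe how Vanna's hosted services are configured internally.

```python
import ssl
from urllib.parse import urlparse


def require_https(endpoint: str) -> ssl.SSLContext:
    """Refuse non-HTTPS endpoints and return a certificate-verifying context.

    ssl.create_default_context() enables certificate verification
    (CERT_REQUIRED) and hostname checking by default.
    """
    parsed = urlparse(endpoint)
    if parsed.scheme != "https":
        raise ValueError(f"refusing to send data over {parsed.scheme or 'no scheme'}: {endpoint}")
    if not parsed.hostname:
        raise ValueError(f"endpoint has no hostname: {endpoint}")
    return ssl.create_default_context()
```

The returned context can be passed to urllib.request.urlopen(request, context=ctx) so that any request made with it fails if the server's certificate cannot be verified.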

Database contents are not sent to Vanna's servers or the LLM unless you explicitly opt in, either by setting allow_llm_to_see_data=True in the built-in Flask app or by calling functions such as vn.generate_summary that require the LLM to "see" the data in order to produce an answer. The allow_llm_to_see_data parameter defaults to False.
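The opt-in rule can be pictured as a gate on what goes into each request. The sketch below is hypothetical (build_llm_request is not a Vanna function, and the payload keys are illustrative); it only mirrors the documented default, in which query results are left out of the request unless allow_llm_to_see_data is explicitly set to True.

```python
from typing import Optional


def build_llm_request(
    question: str,
    context: list[str],
    rows: Optional[list[dict]] = None,
    allow_llm_to_see_data: bool = False,
) -> dict:
    """Hypothetical payload builder mirroring the opt-in rule.

    By default only the question and retrieved training context are included;
    query results are attached only when allow_llm_to_see_data is True.
    """
    payload: dict = {"question": question, "context": context}
    if rows is not None and allow_llm_to_see_data:
        payload["rows"] = rows
    return payload
```

Making the flag a keyword argument that defaults to False means a caller has to opt in deliberately; forgetting the argument can only result in less data being sent, never more.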

For functionality that requires the LLM to "see" the data, the data is sent only to the LLM and is not stored on Vanna's servers.

Database Credentials

Database credentials are only used in the context of the Python package and are not sent to Vanna's servers. They are used to connect to your database and run SQL queries locally wherever the Python package is running.
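A minimal sketch of that separation, using stand-ins. Credentials are read from environment variables into an object whose repr never prints the password, and SQL is executed on a connection opened in the local process. The environment variable names, LocalDbCredentials, load_credentials, and run_locally are hypothetical; an in-memory SQLite database stands in for your real database, and the package's own database connection tools are not reproduced here.

```python
import os
import sqlite3
from dataclasses import dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class LocalDbCredentials:
    """Credentials that stay in the local process; repr hides the password."""
    host: str
    user: str
    password: str = field(repr=False)


def load_credentials(env: Mapping[str, str] = os.environ) -> LocalDbCredentials:
    """Read credentials from the environment (hypothetical variable names)."""
    return LocalDbCredentials(
        host=env["DB_HOST"], user=env["DB_USER"], password=env["DB_PASSWORD"]
    )


def run_locally(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute generated SQL on a connection opened in this process."""
    return conn.execute(sql).fetchall()
```

Keeping the password out of repr also keeps it out of log lines and tracebacks that print the credentials object.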

Third-Party Services

Vanna AI uses the following third-party services for hosting, storage, and other functionality. These services are chosen for their security and reliability.

Microsoft Azure
Google Cloud Platform
Amazon Web Services

Employee Access

Vanna AI employees and contractors do not have direct access to training data in the normal course of business. Access to training data is restricted to the small number of employees who need it to maintain the system, and every employee with that access is required to sign a confidentiality agreement.

If resolving a support request requires Vanna AI employees to access your training data, you must authorize that access by e-mailing support@vanna.ai from the e-mail address associated with your account. This authorizes the support employee to view your training data for the purpose of providing support.