OpenSearch as Vector DB: Supercharge Your LLM

Dejanu Alex · Published in GoPenAI · Jul 27, 2023

Go beyond interactive log analytics and real-time application monitoring: you can now deploy ML models directly in OpenSearch.

Amazon OpenSearch Service allows you to deploy a secured OpenSearch cluster in minutes.


Setup:

In this particular case, the OpenSearch 2.7 cluster is backed by r6gd.4xlarge instances. Since we're not using ML nodes with NVIDIA® V100 Tensor Core GPUs, we need to change the ml_commons plugin configuration in order to run our model on the Graviton2-based data nodes.


Using Dev Tools, we can run queries directly in the console. The first step is to change the plugin setting plugins.ml_commons.only_run_on_ml_node to false.
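Before changing anything, it's worth inspecting the current cluster settings (the same check appears in the recap at the end):

# get settings, including defaults
GET /_cluster/settings?include_defaults=true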

# change the config
PUT _cluster/settings
{
  "persistent": {
    "plugins.ml_commons.only_run_on_ml_node": false
  }
}

After updating the plugin configuration, the next step is to upload a pre-trained model via the API (OpenSearch currently supports only the TorchScript and ONNX formats). Below are some of the supported pre-trained models:

- huggingface/sentence-transformers/all-MiniLM-L6-v2
- huggingface/sentence-transformers/all-MiniLM-L12-v2
- huggingface/sentence-transformers/all-mpnet-base-v2
- huggingface/sentence-transformers/msmarco-distilbert-base-tas-b

Steps:

⚠️ When sizing the OpenSearch cluster, make sure the nodes have enough memory for ML inference; otherwise you may run into a CircuitBreakerException.
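A quick way to keep an eye on memory pressure is the circuit breaker stats API (also listed in the recap below):

# get memory usage per node and per breaker
GET _nodes/stats/breaker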

Most deep learning models are larger than 100 MB, which makes them too large to fit into a single document, so OpenSearch splits the model file into smaller chunks stored in a model index. We upload the model using the API; in this case, I've chosen the pre-trained sentence-transformer model all-MiniLM-L12-v2.
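The upload request looks like this (the same call is repeated in the recap below):

# upload pre-trained model
POST /_plugins/_ml/models/_upload
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}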

After uploading the model, OpenSearch responds with a task_id, which we're going to use to get the model_id.
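We poll the tasks API with that task_id; once the upload task completes, the response includes the model_id:

# get the model_id using the task_id returned by the upload request
GET /_plugins/_ml/tasks/<task_id>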

After getting the model_id, we load the model from the index into memory:

POST /_plugins/_ml/models/<model_id>/_load

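The load call is asynchronous: it responds with a task_id of its own, and we can poll the tasks API to check when loading has finished:

# use the task_id to get the status of the model load
GET /_plugins/_ml/tasks/<task_id>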

After the model is loaded successfully, we can call the text_embedding algorithm, passing the model_id in the URL:

POST /_plugins/_ml/_predict/text_embedding/lu14l4kB_GAWF5uBi_Ol
{
  "text_docs": ["sentence to be embedded"],
  "return_number": true,
  "target_response": ["sentence_embedding"]
}
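The response contains the embedding vector. Abridged, and with illustrative placeholder values, it looks roughly like this (all-MiniLM-L12-v2 produces 384-dimensional sentence embeddings):

# example response (abridged; the data values are illustrative)
{
  "inference_results": [
    {
      "output": [
        {
          "name": "sentence_embedding",
          "data_type": "FLOAT32",
          "shape": [384],
          "data": [0.0224, -0.0707, ...]
        }
      ]
    }
  ]
}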

Basically, that's it… For an in-depth explanation of what embeddings are, check embedding algorithm and LLM and Vector Databases.

As a quick recap, below are the steps:

# get settings
GET /_cluster/settings?include_defaults=true

# get memory usage per node and per breaker
GET _nodes/stats/breaker

# if you don't use dedicated ML nodes, update the cluster setting to false
PUT _cluster/settings
{
  "persistent": {
    "plugins.ml_commons.only_run_on_ml_node": false
  }
}

# upload pre-trained model
POST /_plugins/_ml/models/_upload
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}

# get the model_id using the task_id returned by the previous request
GET /_plugins/_ml/tasks/<task_id>

# load model
POST /_plugins/_ml/models/<model_id>/_load

# use the returned task_id to get the status of the model load
GET /_plugins/_ml/tasks/<task_id>

# embed text (the ID in the URL is your model_id)
POST /_plugins/_ml/_predict/text_embedding/<model_id>
{
  "text_docs": ["text to embed here"],
  "return_number": true,
  "target_response": ["sentence_embedding"]
}
