How to use Milvus and Kubernetes for reverse image search

This guide explains how to use Milvus, an open-source vector database, and Managed Kubernetes on Virtuozzo Infrastructure to build a reverse image search engine.

Reverse image search (RIS) finds images that are similar or related to a given input image. It is a form of content-based image retrieval (CBIR), in which the search system is given a query image and ranks stored images by their visual similarity to it.
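Conceptually, a CBIR system maps each image to a numeric feature vector (an embedding) and then returns the stored images whose vectors are closest to the query vector. The sketch below illustrates this with hypothetical 3-dimensional embeddings and a brute-force scan; in a real deployment, a neural network produces the embeddings and Milvus performs the nearest-neighbor search at scale.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 means identical direction, 0.0 means orthogonal
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query, gallery, top_k=2):
    # Brute-force nearest-neighbor search; Milvus replaces this
    # with approximate nearest-neighbor (ANN) indexes
    scored = sorted(gallery.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]

# Hypothetical embeddings that an image model might produce
gallery = {
    "cat1.jpg": [0.9, 0.1, 0.0],
    "cat2.jpg": [0.8, 0.2, 0.1],
    "car1.jpg": [0.1, 0.9, 0.3],
}
print(search([0.85, 0.15, 0.05], gallery))  # the two cat images rank first
```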

The example described in this guide is based on the Milvus vector database, which includes everything you need to perform image search. To learn more about similarity search, refer to the Milvus RIS guide and Milvus Bootcamp.

Milvus architecture

[Image: Milvus architecture diagram]

For more details, refer to the Milvus documentation.

Prerequisites

1. Deploy a Virtuozzo Infrastructure cluster.

2. Create the compute cluster with the Kubernetes and load balancing services.

3. Configure a storage policy named standard for boot volumes on Kubernetes master nodes. Ensure that the selected policy is available for all projects where you are planning to deploy Kubernetes.

4. Create a Kubernetes cluster.

5. Create the default storage class with the storage policy standard. The storage policy must be available in your project.

# cat > storage-class.yaml <<\EOT
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: default
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: cinder.csi.openstack.org
parameters:
  type: standard
EOT

Apply the configuration file:

# kubectl create -f storage-class.yaml

6. Create a storage policy with ReadWriteMany (RWX) support to store Milvus logs. In this example, the easiest way to obtain such a storage policy is to deploy an NFS server and the NFS external provisioner.
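If you use the NFS subdir external provisioner, the resulting storage class might look like the fragment below. The class name nfs matches the value referenced later in the Helm configuration; the provisioner name is an assumption and must match the name you gave your provisioner deployment.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs
# Must match the provisioner name configured in your NFS provisioner deployment
provisioner: cluster.local/nfs-subdir-external-provisioner
parameters:
  archiveOnDelete: "false"
```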

7. Ensure that you have the credentials (the access key and secret key) to the object storage service. In this guide, we will use the S3 object storage and a bucket provided by Virtuozzo Infrastructure named milvus.

Important: We recommend running the required Jupyter notebooks in a separate Python virtual environment, as they install multiple Python modules with pinned versions that may break other applications you develop. You will need Python version 3.11.

To learn how to manage virtual environments for Visual Studio Code, refer to Python environments in VS Code.
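For example, a virtual environment can be created from the command line as follows (the environment name milvus-demo is arbitrary, and we assume python3 resolves to a 3.11 interpreter):

```shell
# Create an isolated environment for the Milvus notebooks
python3 -m venv milvus-demo
# Activate it; packages installed from the notebooks now stay inside it
. milvus-demo/bin/activate
python --version   # verify the interpreter version
```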

Deploying the Milvus cluster using Helm

1. Download the example configuration file for Milvus cluster deployment:

# wget https://virtuozzo-k8s-demo.s3.amazonaws.com/milvus-virtuozzo.yaml

2. Connect to your Kubernetes cluster:

# export KUBECONFIG=<your_k8s_kubeconfig_file>

3. Install and configure Helm.

4. Change the Helm configuration file according to your environment. You can edit the milvus-virtuozzo.yaml file obtained in step 1 or create your own by using the command helm show values milvus/milvus > milvus.yaml.

In the configuration file:

  • service.type: LoadBalancer as we recommend using load balancers instead of ingress controllers for this setup
  • service.loadBalancerSourceRanges: 0.0.0.0/24 restricts access to your Milvus cluster endpoint to the specified CIDR range
  • log.persistence.enabled: true enables persistent storage for logs
  • log.persistence.persistentVolumeClaim.storageClass: nfs specifies the nfs storage policy to store logs
  • attu.enabled: true installs Attu, a management tool for Milvus
  • attu.service.type: LoadBalancer as we recommend using load balancers instead of ingress controllers for this setup
  • minio.enabled: false as we are going to use a third-party S3 storage instead of MinIO
  • externalS3.enabled: true enables an external S3 connection
  • externalS3.host: "<s3_server_dns_name>" uses your S3 server DNS name as the S3 access point
  • externalS3.port: "443" uses TCP port 443 to access the S3 server
  • externalS3.accessKey: "access_key" uses your S3 access key to access the S3 data
  • externalS3.secretKey: "secret_key" uses your S3 secret key to access the S3 data
  • externalS3.useSSL: true enables SSL
  • externalS3.bucketName: "milvus" uses the bucket named milvus
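Put together, the relevant part of the Helm values file might look like the fragment below. Placeholders in angle brackets must be replaced with your own values, and the exact structure may vary between chart versions; compare with the output of helm show values milvus/milvus.

```yaml
service:
  type: LoadBalancer
  loadBalancerSourceRanges:
    - 0.0.0.0/24          # restrict to your trusted CIDR range
log:
  persistence:
    enabled: true
    persistentVolumeClaim:
      storageClass: nfs    # the RWX storage class created earlier
attu:
  enabled: true
  service:
    type: LoadBalancer
minio:
  enabled: false           # use external S3 instead of bundled MinIO
externalS3:
  enabled: true
  host: "<s3_server_dns_name>"
  port: "443"
  accessKey: "<access_key>"
  secretKey: "<secret_key>"
  useSSL: true
  bucketName: "milvus"
```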

5. Deploy the Milvus cluster by using the Helm chart:

# helm install -f milvus-virtuozzo.yaml milvus milvus/milvus

The deployment takes at least 5 minutes.

6. Once the deployment is complete, check the pods in your Kubernetes cluster. You should have the following pods up and running:

# kubectl get pod
NAME                                         READY   STATUS        RESTARTS        AGE
milvus-attu-5dd7bdcf75-2jw9p                 1/1     Running       0               12m
milvus-datacoord-bc75c755c-hsgjf             1/1     Running       0               12m
milvus-datanode-7c569bb566-hf2x5             1/1     Running       1 (7m55s ago)   12m
milvus-etcd-0                                1/1     Running       0               12m
milvus-etcd-1                                1/1     Running       0               12m
milvus-etcd-2                                1/1     Running       0               12m
milvus-indexcoord-5d4d9db856-wmhxl           1/1     Running       0               12m
milvus-indexnode-58f5f7fc8c-dgslk            1/1     Running       0               12m
milvus-proxy-66f78bd4dd-bnncx                1/1     Running       1 (7m54s ago)   12m
milvus-pulsar-bookie-0                       1/1     Running       0               12m
milvus-pulsar-bookie-1                       1/1     Running       0               12m
milvus-pulsar-bookie-2                       1/1     Running       0               12m
milvus-pulsar-bookie-init-lmml4              0/1     Completed     0               12m
milvus-pulsar-broker-0                       1/1     Running       0               12m
milvus-pulsar-proxy-0                        1/1     Running       0               12m
milvus-pulsar-pulsar-init-f2spk              0/1     Completed     0               12m
milvus-pulsar-recovery-0                     1/1     Running       0               12m
milvus-pulsar-zookeeper-0                    1/1     Running       0               12m
milvus-pulsar-zookeeper-1                    1/1     Running       0               11m
milvus-pulsar-zookeeper-2                    1/1     Running       0               10m
milvus-querycoord-7856c67b47-k65xv           1/1     Running       0               12m
milvus-querynode-84ccf646b5-5rwsf            1/1     Running       0               12m
milvus-rootcoord-d754b8f7c-8vd8z             1/1     Running       0               12m

7. Find out the endpoints for the Milvus services:

# kubectl get service
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                               AGE
kubernetes                ClusterIP      10.254.0.1       <none>           443/TCP                               172d
milvus                    LoadBalancer   10.254.71.213    <milvus_address> 19530:30116/TCP,9091:30672/TCP        5m49s
milvus-attu               LoadBalancer   10.254.61.197    <attu_address>   3000:30474/TCP                        5m49s
milvus-datacoord          ClusterIP      10.254.128.226   <none>           13333/TCP,9091/TCP                    5m49s
milvus-datanode           ClusterIP      None             <none>           9091/TCP                              5m49s
milvus-etcd               ClusterIP      10.254.89.143    <none>           2379/TCP,2380/TCP                     5m49s
milvus-etcd-headless      ClusterIP      None             <none>           2379/TCP,2380/TCP                     5m49s
milvus-indexcoord         ClusterIP      10.254.17.32     <none>           31000/TCP,9091/TCP                    5m49s
milvus-indexnode          ClusterIP      None             <none>           9091/TCP                              5m49s
milvus-pulsar-bookie      ClusterIP      None             <none>           3181/TCP,8000/TCP                     5m49s
milvus-pulsar-broker      ClusterIP      None             <none>           8080/TCP,6650/TCP                     5m49s
milvus-pulsar-proxy       ClusterIP      10.254.213.2     <none>           80/TCP,6650/TCP                       5m49s
milvus-pulsar-recovery    ClusterIP      None             <none>           8000/TCP                              5m49s
milvus-pulsar-zookeeper   ClusterIP      None             <none>           8000/TCP,2888/TCP,3888/TCP,2181/TCP   5m49s
milvus-querycoord         ClusterIP      10.254.141.63    <none>           19531/TCP,9091/TCP                    5m49s
milvus-querynode          ClusterIP      None             <none>           9091/TCP                              5m49s
milvus-rootcoord          ClusterIP      10.254.157.20    <none>           53100/TCP,9091/TCP                    5m49s

You need two endpoints:

  • The external IP address of the milvus service, <milvus_address>, is the endpoint of the Milvus cluster that you will use later for testing.
  • The external IP address of the milvus-attu service, <attu_address>, is the Attu IP address.

8. Access Attu at http://<attu_address>:3000/ under the default user root with the password Milvus and check that your Milvus cluster is up and running.

Testing Milvus

1. Download the Jupyter notebook files:

# wget https://virtuozzo-k8s-demo.s3.amazonaws.com/1_build_image_search_engine.ipynb
# wget https://virtuozzo-k8s-demo.s3.amazonaws.com/2_deep_dive_image_search_attu.ipynb

These files are adapted for the Virtuozzo deployment described above. You can also check the original notebooks; the main difference is that the Virtuozzo versions use updated releases of Milvus and Gradio, with the code required to run the demo website.

2. Open the Jupyter notebooks by using your favorite IDE, for example, Visual Studio Code or JupyterLab.

3. Change the IP address of the Milvus cluster in both notebooks. To do this, find the Configuration part and set HOST to the <milvus_address> obtained earlier.
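The Configuration cell you need to edit looks roughly like this (variable names are a sketch and may differ slightly between the two notebooks; the port 19530 is the Milvus service port shown in the service listing above):

```python
# Notebook configuration: point the client at your Milvus cluster.
# Replace <milvus_address> with the external IP of the "milvus" service.
HOST = '<milvus_address>'
PORT = '19530'
```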

Enjoy!