
Research (연구 관련)

What is VLM?

홍돌 2024. 5. 9. 07:00

What is VLM (Vision Language Model)?

A VLM is a multi-modal model that learns to associate information from the image and text modalities. The focus of this multi-modal learning is to pre-train a model on vision-and-language tasks and thereby improve performance on downstream tasks such as VQA (Visual Question Answering).

Why VLM? What are the Use Cases?

1. Image Search and Retrieval: because a VLM embeds images and text in a shared space, a text query can directly retrieve matching images.
2. Robotics: integrating VLMs can allow robots to understand visual instructions or describe their surroundings, enabling more complex object interactions and collaboration.
Read more at: https://viso.ai/deep-learning/vision-language-models/

What are the types of VLMs?

Depending on the fusion mechanism, there are three types: "Dual Encoder - CLIP", "Fusion Encoder - ViLBERT", and "Hybrid Encoder - ?". A fusion encoder directly combines the image and text embeddings. The dual encoder is computationally more efficient and performs similarly to the fusion encoder on cross-modal retrieval tasks (image-text retrieval), while the fusion encoder is more suitable for more complex tasks such as VQA.
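The dual encoder's efficiency comes from embedding each modality independently, so retrieval reduces to one matrix multiply over precomputed embeddings. Below is a minimal NumPy sketch of that idea; the random linear projections `W_img` and `W_txt` are stand-ins for real encoders (CLIP uses a vision backbone and a text Transformer), and all shapes here are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two encoders: random linear projections into a
# shared 64-d embedding space. A real dual encoder would use trained
# image and text networks here.
W_img = rng.normal(size=(2048, 64))  # image features -> shared space
W_txt = rng.normal(size=(768, 64))   # text features  -> shared space

def embed(x, W):
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)  # unit norm

# Three images and three captions, represented by dummy feature vectors.
img_feats = rng.normal(size=(3, 2048))
txt_feats = rng.normal(size=(3, 768))

img_emb = embed(img_feats, W_img)
txt_emb = embed(txt_feats, W_txt)

# Cosine similarity of every image against every text is a single
# matrix multiply -- no cross-modal attention -- which is why the dual
# encoder is cheap at retrieval time.
sim = img_emb @ txt_emb.T              # shape (3, 3)
best_text_per_image = sim.argmax(axis=1)
print(sim.shape, best_text_per_image.shape)
```

A fusion encoder, in contrast, must re-run joint attention for every image-text pair, which is why it scales poorly for retrieval but captures richer interactions for tasks like VQA.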

What is ViLBERT (Vision Language BERT)?

It has two streams, one for the vision input and one for the language input. They are connected by co-attention modules, through which each stream conditions on the other modality's states. There are two pre-training tasks for ViLBERT: one is recovering masked tokens, and the other is predicting whether an image-text input pair is aligned.
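The co-attention exchange can be sketched as standard scaled dot-product attention where each stream supplies the queries but takes its keys and values from the other stream. This is a bare NumPy sketch of that wiring only; the state dimensions are arbitrary, and it omits the multi-head projections, residual connections, and feed-forward layers of the actual ViLBERT blocks.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention with a numerically stable softmax.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

def co_attention(vision, language):
    # Each stream queries with its own states but attends over the
    # OTHER stream's keys/values, so vision states are conditioned on
    # language and vice versa.
    vision_out = attention(vision, language, language)
    language_out = attention(language, vision, vision)
    return vision_out, language_out

rng = np.random.default_rng(0)
v = rng.normal(size=(5, 32))  # 5 image-region states
t = rng.normal(size=(7, 32))  # 7 text-token states
v_out, t_out = co_attention(v, t)
print(v_out.shape, t_out.shape)
```

Note that each output keeps its own sequence length (5 regions, 7 tokens); only the content of the states is mixed across modalities.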

What is "simply discretizing the space of visual inputs via clustering"? (ViLBERT, page 3)
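One way to read that phrase: run k-means over continuous region features so that every region maps to the ID of its nearest centroid. The cluster IDs form a discrete "visual vocabulary", which would let masked-region prediction be posed as classification over a fixed token set, mirroring masked-token prediction for text. A minimal sketch of the discretization step (random vectors stand in for real region features, and `k=8` is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy region features; real ones would come from an image backbone.
feats = rng.normal(size=(100, 16))

def kmeans(x, k, iters=10, seed=0):
    # Minimal k-means; the assignment IDs act as discrete visual tokens.
    r = np.random.default_rng(seed)
    centers = x[r.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = x[assign == j].mean(axis=0)
    return centers, assign

centers, tokens = kmeans(feats, k=8)
print(tokens[:10])  # each region is now a discrete token in {0, ..., 7}
```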

Vision Language Models: Exploring Multimodal AI: https://viso.ai/deep-learning/vision-language-models/

ViLBERT: https://arxiv.org/pdf/1908.02265v1
