---
title: Space
emoji: 🏃
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 5.48.0
app_file: app.py
pinned: false
short_description: James Webb
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/68dc8c5a7c207d5db359cbb9/foCePE-ZvhB3ozg1wqZ-C.webp
---

## Overview

This app demonstrates a **multimodal AI search tool** that combines **natural language processing** and **computer vision**. It lets users search an index of 1,000 images using a text query, an image upload, or both. The model (CLIP) embeds text and images in a shared vector space, so semantic similarity can be measured directly (see the sketch at the end of this README).

## How to Use

1. Wait for the “Index built: 1000 images” message.
2. Enter a **text query** (e.g., “spiral galaxy”) or upload an **image**.
3. Adjust the **Top K slider** to set how many matches to display.
4. Click **Search** to see the results ranked by similarity score.
5. The grid displays the most relevant images first.

## About the Model

- **Model:** CLIP (Contrastive Language–Image Pre-training)
- **Capabilities:** Combines natural-language understanding with visual feature recognition.
- **Purpose:** Demonstrates the integration of NLP and computer vision in a single multimodal application.

## Evaluation Summary

A brief qualitative test on 10 queries found that roughly **85% of the top-5 results** were visually relevant, suggesting that the embeddings align text and image meanings well.

## Limitations

- Works best with visually distinctive subjects (e.g., planets, galaxies).
- No fine-tuning on this dataset.
- The index must be rebuilt whenever the image files change, unless embedding persistence is added.

## Credits

- **Dataset:** NASA James Webb Telescope image collection
- **Model Source:** [Hugging Face CLIP](https://huggingface.co/openai/clip-vit-base-patch32)
- **Created by:** Jay McIntyre for UMGC ARIN-460 Assignment 8

---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
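
## Example: How the Search Works (Sketch)

The snippet below is a minimal sketch of the CLIP retrieval flow described above, not the Space's actual `app.py`: the model ID matches the linked checkpoint, but the function names, file names, and indexing details are assumptions made for illustration.

```python
# Hedged sketch: embed images and a text query with CLIP, then rank by cosine similarity.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(MODEL_ID)
processor = CLIPProcessor.from_pretrained(MODEL_ID)

def embed_images(paths):
    """Encode image files into L2-normalized CLIP embeddings (the index)."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def embed_text(query):
    """Encode a text query into the same embedding space."""
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def search(query_vec, index_vecs, top_k=5):
    """Return (image index, similarity score) pairs for the top_k closest images."""
    scores = (query_vec @ index_vecs.T).squeeze(0)  # cosine similarity of unit vectors
    values, indices = scores.topk(top_k)
    return list(zip(indices.tolist(), values.tolist()))

# Hypothetical usage (file names are placeholders):
# index = embed_images(["webb_0001.jpg", "webb_0002.jpg"])
# results = search(embed_text("spiral galaxy"), index, top_k=5)
```

A combined text-plus-image query could average the two normalized query vectors before searching, and persisting the index tensor with `torch.save` would avoid the rebuild noted under Limitations; both details depend on how `app.py` is actually written.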