nan committed
Commit bfadc62 · verified · 1 Parent(s): a5838bd

docs-update-readme-0624 (#23)


- docs: cherry-pick README from pr/17 e3e8a244 (40aa64361d8a8d16ff9335f623d8fb1d33ee5aa6)
- feat: add .gitignore (a188cd19ef4c2612fc8882f18621e9c82ffebd7d)
- docs: update the transformers and API codes (69ac66d5b083672544fb01ebc97227740153f190)
- docs: update the tech report link (3061fd752721844bbfaf7bb7a566d36ce28e6c06)
- docs: fix the code snippets (5a1b238231f4de5b3d9bdb5c8d9e74eae6883d60)

Files changed (2)
  1. .gitignore +73 -0
  2. README.md +303 -58
.gitignore ADDED
@@ -0,0 +1,73 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual Environment
+ venv/
+ env/
+ ENV/
+ .env
+ .venv
+ env.bak/
+ venv.bak/
+
+ # IDE
+ .idea/
+ .vscode/
+ *.swp
+ *.swo
+ .project
+ .pydevproject
+ .settings/
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+ *.ipynb
+
+ # Distribution / packaging
+ .Python
+ *.manifest
+ *.spec
+
+ # Unit test / coverage reports
+ htmlcov/
+ .tox/
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ .hypothesis/
+
+ # Logs and databases
+ *.log
+ *.sqlite
+ *.db
+
+ # OS generated files
+ .DS_Store
+ .DS_Store?
+ ._*
+ .Spotlight-V100
+ .Trashes
+ ehthumbs.db
+ Thumbs.db
README.md CHANGED
@@ -1,92 +1,337 @@
- # Jina Embeddings V4
-
- ## Examples
-
- Encode functions:
-
- ```python
- import torch
- from transformers import AutoModel
- from PIL import Image
-
- device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
-
- # Load model
- model = AutoModel.from_pretrained('jinaai/jina-embeddings-v4', trust_remote_code=True)
- model = model.to(device)
-
- # Sample data
- texts = ["Here is some sample code", "This is a matching text"]
- image_paths = ['/<path_to_image>']
- images = [Image.open(path) for path in image_paths]
-
- # Example 1: Text matching task with single vector embeddings
- # Generate embeddings with dimension truncation (256), decrease max_pixels
- img_embeddings = model.encode_images(images=images, truncate_dim=256, max_pixels=602112, task='text-matching')
- text_embeddings = model.encode_texts(texts=texts, truncate_dim=256, max_length=512, task='text-matching')
-
- # Example 2: Retrieval task with multi-vector embeddings
- model.set_task(task='retrieval')
-
- # Generate multi-vector embeddings
- img_embeddings = model.encode_images(images=images, vector_type='multi_vector')
- text_embeddings = model.encode_texts(texts=texts, vector_type='multi_vector', prompt_name='passage')
-
- # Example 3: Code task with single vector embeddings
- code = ["def hello_world():\n print('Hello, World!')"]
- code_embeddings = model.encode_texts(texts=code, task='code')
- ```
-
- Using the model forward:
-
- ```python
- import torch
- from transformers import AutoModel, AutoProcessor
- from PIL import Image
-
- device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
-
- # Load model and processor
- model = AutoModel.from_pretrained('jinaai/jina-embeddings-v4', trust_remote_code=True)
- model = model.to(device)
- processor = AutoProcessor.from_pretrained('jinaai/jina-embeddings-v4', trust_remote_code=True)
-
- # Sample data
- texts = ["Here is some sample code", "This is a matching text"]
- image_paths = ['/<path_to_image>']
-
- # Process text and images
- text_batch = processor.process_texts(texts=texts, prefix="Query", max_length=512)
- images = [Image.open(path) for path in image_paths]
- image_batch = processor.process_images(images=images)
-
- # Forward pass
- model.eval()
- with torch.no_grad():
-     text_batch = {k: v.to(device) for k, v in text_batch.items()}
-     image_batch = {k: v.to(device) for k, v in image_batch.items()}
-
-     with torch.autocast(device_type='cuda' if torch.cuda.is_available() else 'cpu'):
-         # Get embeddings
-         text_embeddings = model.model(**text_batch, task_label='retrieval').single_vec_emb
-         img_embeddings = model.model(**image_batch, task_label='retrieval').single_vec_emb
- ```
-
- Inference via the `SentenceTransformer` library:
-
- ```python
- from sentence_transformers import SentenceTransformer
-
- model = SentenceTransformer(
-     'jinaai/jina-embeddings-v4', trust_remote_code=True
- )
-
- emb = model.encode(['Khinkali is the best'], task='retrieval', prompt_name='query')
- ```
+ <br><br>
+
+ <p align="center">
+ <img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px">
+ </p>
+
+ <p align="center">
+ <b>The embedding model trained by <a href="https://jina.ai/">Jina AI</a>.</b>
+ </p>
+
+ <p align="center">
+ <b>Jina Embeddings v4: Multilingual Multimodal Embeddings</b>
+ </p>
+
+
+ ## Quick Start
+
+ [Blog](https://alwaysjudgeabookbyitscover.com/) | [Technical Report](https://arxiv.org/abs/2506.18902) | [API](https://jina.ai/embeddings)
+
+
+ ## Intended Usage & Model Info
+
+ `jina-embeddings-v4` is a multilingual, multimodal embedding model designed for unified representation of text and images.
+ The model is specialized for complex document retrieval, including visually rich documents with charts, tables, and illustrations.
+ Embeddings produced by `jina-embeddings-v4` serve as the backbone for neural information retrieval and multimodal GenAI applications.
+
+ Built on [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct), `jina-embeddings-v4` has the following features:
+
+ - **Unified embeddings** for text, images, and visual documents, supporting both dense (single-vector) and late-interaction (multi-vector) retrieval.
+ - **Multilingual support** (20+ languages) and compatibility with a wide range of domains, including technical and visually complex documents.
+ - **Task-specific adapters** for retrieval, text matching, and code-related tasks, selectable at inference time.
+ - **Flexible embedding size**: dense embeddings are 2048 dimensions by default but can be truncated to as few as 128 with minimal performance loss (see the truncation sketch after the table below).
+
+ Summary of features:
+
+ | Feature | Jina Embeddings V4 |
+ |------------|------------|
+ | Base Model | Qwen2.5-VL-3B-Instruct |
+ | Supported Tasks | `retrieval`, `text-matching`, `code` |
+ | Model DType | BFloat16 |
+ | Max Sequence Length | 32768 |
+ | Single-Vector Dimension | 2048 |
+ | Multi-Vector Dimension | 128 |
+ | Matryoshka Dimensions | 128, 256, 512, 1024, 2048 |
+ | Pooling Strategy | Mean pooling |
+ | Attention Mechanism | FlashAttention2 |
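The Matryoshka row above means the 2048-dimension dense vectors are front-loaded: you can keep a leading slice and re-normalize. A minimal sketch of that truncation, assuming the encode functions return row vectors convertible to a NumPy array; the `truncate_dim` argument in the snippets below does this for you, and the helper name here is ours, not part of the model API:

```python
import numpy as np

def truncate_embeddings(emb: np.ndarray, dim: int = 256) -> np.ndarray:
    """Keep the first `dim` Matryoshka dimensions and re-normalize.

    Assumes `emb` has shape (n, 2048); `dim` should be one of the
    supported Matryoshka sizes: 128, 256, 512, 1024, 2048.
    """
    truncated = emb[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

# e.g. full = model.encode_text(texts=["hello"], task="retrieval")  # (1, 2048)
# small = truncate_embeddings(np.asarray(full), dim=256)            # (1, 256)
```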
+ ## Training, Data, Parameters
+
+ Please refer to our [technical report of jina-embeddings-v4](https://arxiv.org/abs/2506.18902) for the model and training details.
+
+
+ ## Usage
+
+ <details>
+ <summary>Requirements</summary>
+
+ The following Python packages are required:
+
+ - `transformers>=4.52.0`
+ - `torch>=2.6.0`
+ - `peft>=0.15.2`
+ - `torchvision`
+ - `pillow`
+
+ ### Optional / Recommended
+
+ - **flash-attention**: Installing [flash-attention](https://github.com/Dao-AILab/flash-attention) is recommended for improved inference speed and efficiency, but it is not mandatory.
+ - **sentence-transformers**: If you want to use the model via the `sentence-transformers` interface, install this package as well.
+
+ </details>
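A quick way to confirm the version floors above are met in your environment; a stdlib-only sanity-check sketch, nothing model-specific:

```python
from importlib.metadata import version

# compare installed versions against the floors listed above
for pkg, floor in [("transformers", "4.52.0"), ("torch", "2.6.0"), ("peft", "0.15.2")]:
    print(f"{pkg}: installed {version(pkg)}, required >= {floor}")
```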
+
+ <details>
+ <summary>via <a href="https://jina.ai/embeddings/">Jina AI Embeddings API</a></summary>
+
+ ```bash
+ curl https://api.jina.ai/v1/embeddings \
+   -H "Content-Type: application/json" \
+   -H "Authorization: Bearer $JINA_AI_API_TOKEN" \
+   -d @- <<EOFEOF
+ {
+   "model": "jina-embeddings-v4",
+   "task": "text-matching",
+   "input": [
+     {"text": "غروب جميل على الشاطئ"},
+     {"text": "海滩上美丽的日落"},
+     {"text": "A beautiful sunset over the beach"},
+     {"text": "Un beau coucher de soleil sur la plage"},
+     {"text": "Ein wunderschöner Sonnenuntergang am Strand"},
+     {"text": "Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία"},
+     {"text": "समुद्र तट पर एक खूबसूरत सूर्यास्त"},
+     {"text": "Un bellissimo tramonto sulla spiaggia"},
+     {"text": "浜辺に沈む美しい夕日"},
+     {"text": "해변 위로 아름다운 일몰"},
+     {"image": "https://i.ibb.co/nQNGqL0/beach1.jpg"},
+     {"image": "https://i.ibb.co/r5w8hG8/beach2.jpg"}
+   ]
+ }
+ EOFEOF
+ ```
+
+ </details>
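The same call can be made from Python. A sketch with the `requests` library mirroring the curl command above (endpoint, model, task, and payload fields are taken from it; the `data[*].embedding` response layout is our assumption based on the usual embeddings-API shape):

```python
import os
import requests

# mirror the curl request above with a representative subset of inputs
resp = requests.post(
    "https://api.jina.ai/v1/embeddings",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['JINA_AI_API_TOKEN']}",
    },
    json={
        "model": "jina-embeddings-v4",
        "task": "text-matching",
        "input": [
            {"text": "A beautiful sunset over the beach"},
            {"image": "https://i.ibb.co/nQNGqL0/beach1.jpg"},
        ],
    },
)
resp.raise_for_status()
# assumed response layout: {"data": [{"embedding": [...]}, ...]}
embeddings = [item["embedding"] for item in resp.json()["data"]]
```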
+
+ <details>
+ <summary>via <a href="https://huggingface.co/docs/transformers/en/index">transformers</a></summary>
+
+ ```python
+ # !pip install transformers>=4.52.0 torch>=2.6.0 peft>=0.15.2 torchvision pillow
+ from transformers import AutoModel
+ import torch
+
+ # Initialize the model
+ model = AutoModel.from_pretrained("jinaai/jina-embeddings-v4", trust_remote_code=True)
+ model.to("cuda")
+
+ # ========================
+ # 1. Retrieval Task
+ # ========================
+ # Configure truncate_dim, max_length (for texts), max_pixels (for images),
+ # vector_type, batch_size in the encode function if needed
+
+ # Encode query
+ query_embeddings = model.encode_text(
+     texts=["Overview of climate change impacts on coastal cities"],
+     task="retrieval",
+     prompt_name="query",
+ )
+
+ # Encode passage (text)
+ passage_embeddings = model.encode_text(
+     texts=[
+         "Climate change has led to rising sea levels, increased frequency of extreme weather events..."
+     ],
+     task="retrieval",
+     prompt_name="passage",
+ )
+
+ # Encode image/document
+ image_embeddings = model.encode_image(
+     images=["https://i.ibb.co/nQNGqL0/beach1.jpg"],
+     task="retrieval",
+ )
+
+ # ========================
+ # 2. Text Matching Task
+ # ========================
+ texts = [
+     "غروب جميل على الشاطئ",  # Arabic
+     "海滩上美丽的日落",  # Chinese
+     "Un beau coucher de soleil sur la plage",  # French
+     "Ein wunderschöner Sonnenuntergang am Strand",  # German
+     "Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία",  # Greek
+     "समुद्र तट पर एक खूबसूरत सूर्यास्त",  # Hindi
+     "Un bellissimo tramonto sulla spiaggia",  # Italian
+     "浜辺に沈む美しい夕日",  # Japanese
+     "해변 위로 아름다운 일몰",  # Korean
+ ]
+
+ text_embeddings = model.encode_text(texts=texts, task="text-matching")
+
+ # ========================
+ # 3. Code Understanding Task
+ # ========================
+
+ # Encode query
+ query_embedding = model.encode_text(
+     texts=["Find a function that prints a greeting message to the console"],
+     task="code",
+     prompt_name="query",
+ )
+
+ # Encode code
+ code_embeddings = model.encode_text(
+     texts=["def hello_world():\n print('Hello, World!')"],
+     task="code",
+     prompt_name="passage",
+ )
+
+ # ========================
+ # 4. Use multivectors
+ # ========================
+ # (for scoring multi-vector outputs, see the MaxSim sketch after this block)
+
+ multivector_embeddings = model.encode_text(
+     texts=texts,
+     task="retrieval",
+     prompt_name="query",
+     return_multivector=True,
+ )
+
+ images = ["https://i.ibb.co/nQNGqL0/beach1.jpg", "https://i.ibb.co/r5w8hG8/beach2.jpg"]
+ multivector_image_embeddings = model.encode_image(
+     images=images,
+     task="retrieval",
+     return_multivector=True,
+ )
+ ```
+ </details>
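Multi-vector embeddings are intended for late-interaction retrieval, where each text or image is represented as a matrix of 128-dimension token vectors (per the table above). A sketch of the standard MaxSim scoring commonly used with such embeddings; the helper is illustrative, not part of the model's API:

```python
import torch

def maxsim_score(query_vecs: torch.Tensor, doc_vecs: torch.Tensor) -> torch.Tensor:
    """Late-interaction (MaxSim) relevance score.

    query_vecs: (num_query_tokens, dim), doc_vecs: (num_doc_tokens, dim);
    assumes rows are L2-normalized so dot products are cosine similarities.
    For each query token, take its best-matching document token, then sum.
    """
    sim = query_vecs @ doc_vecs.T          # (q_tokens, d_tokens)
    return sim.max(dim=1).values.sum()

# e.g. score the first query against each image (variable names from the block above):
# q = multivector_embeddings[0]
# scores = [maxsim_score(q, d) for d in multivector_image_embeddings]
```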
 
+ <details>
+ <summary>via <a href="https://sbert.net/">sentence-transformers</a></summary>
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Initialize the model
+ model = SentenceTransformer("jinaai/jina-embeddings-v4", trust_remote_code=True)
+
+ # ========================
+ # 1. Retrieval Task
+ # ========================
+ # Encode query
+ query_embeddings = model.encode(
+     sentences=["Overview of climate change impacts on coastal cities"],
+     task="retrieval",
+     prompt_name="query",
+ )
+
+ print(f"query_embeddings.shape = {query_embeddings.shape}")
+
+ # Encode passage (text)
+ passage_embeddings = model.encode(
+     sentences=[
+         "Climate change has led to rising sea levels, increased frequency of extreme weather events..."
+     ],
+     task="retrieval",
+     prompt_name="passage",
+ )
+
+ print(f"passage_embeddings.shape = {passage_embeddings.shape}")
+
+ # Encode image/document
+ image_embeddings = model.encode(
+     sentences=["https://i.ibb.co/nQNGqL0/beach1.jpg"],
+     task="retrieval",
+ )
+
+ print(f"image_embeddings.shape = {image_embeddings.shape}")
+
+ # ========================
+ # 2. Text Matching Task
+ # ========================
+ texts = [
+     "غروب جميل على الشاطئ",  # Arabic
+     "海滩上美丽的日落",  # Chinese
+     "Un beau coucher de soleil sur la plage",  # French
+     "Ein wunderschöner Sonnenuntergang am Strand",  # German
+     "Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία",  # Greek
+     "समुद्र तट पर एक खूबसूरत सूर्यास्त",  # Hindi
+     "Un bellissimo tramonto sulla spiaggia",  # Italian
+     "浜辺に沈む美しい夕日",  # Japanese
+     "해변 위로 아름다운 일몰",  # Korean
+ ]
+
+ text_embeddings = model.encode(sentences=texts, task="text-matching")
+
+ # ========================
+ # 3. Code Understanding Task
+ # ========================
+
+ # Encode query
+ query_embeddings = model.encode(
+     sentences=["Find a function that prints a greeting message to the console"],
+     task="code",
+     prompt_name="query",
+ )
+
+ # Encode code
+ code_embeddings = model.encode(
+     sentences=["def hello_world():\n print('Hello, World!')"],
+     task="code",
+     prompt_name="passage",
+ )
+
+ # ========================
+ # 4. Use multivectors
+ # ========================
+
+ multivector_text_embeddings = model.encode(
+     sentences=texts,
+     task="retrieval",
+     prompt_name="query",
+     return_multivector=True,
+ )
+
+ images = ["https://i.ibb.co/nQNGqL0/beach1.jpg", "https://i.ibb.co/r5w8hG8/beach2.jpg"]
+
+ multivector_image_embeddings = model.encode(
+     sentences=images,
+     task="retrieval",
+     return_multivector=True,
+ )
+ ```
+ </details>
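With the dense (single-vector) embeddings, retrieval reduces to a cosine-similarity ranking. A small sketch reusing `query_embeddings` and `passage_embeddings` from the block above (recent `sentence-transformers` releases also expose a `model.similarity` helper for the same purpose):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity matrix between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# rank passages for the query encoded above; higher means more similar
scores = cosine(query_embeddings, passage_embeddings)
print(scores)  # shape (1, 1) for the single query/passage pair above
```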
+
+
+ ## License
+
+ This model is licensed for download and local use under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/deed.en). It is available for commercial use via the [Jina Embeddings API](https://jina.ai/embeddings/), [AWS](https://longdogechallenge.com/), [Azure](https://longdogechallenge.com/), and [GCP](https://longdogechallenge.com/). To download it for commercial use, please [contact us](https://jina.ai/contact-sales).
+
+
+ ## Contact
+
+ Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
+
+
+ ## Citation
+
+ If you find `jina-embeddings-v4` useful in your research, please cite the following paper: