Cannot use apply_chat_template because this processor does not have a chat template. #75
opened by zhusl-cpu
My transformers version is 4.55.2. There is a branch in `modeling_jina_embeddings_v4.py` that causes an error:
```python
def process_images(
    self,
    images: Union[List[Image.Image], List[List[Image.Image]]],
) -> BatchFeature:
    if isinstance(images[0], list):
        images = cast(List[List[Image.Image]], images)
        text_doc = []
        for i in range(len(images)):
            conversation = [
                {"role": "user", "content": [{"type": "image"}] * len(images[i])}
            ]
            template = self.apply_chat_template(
                conversation, add_generation_prompt=False
            )
            text_doc.append(template[self.assistant_prefix_len :])
    else:
        images = cast(List[Image.Image], images)
        text_doc = [
            "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>Describe the image.<|im_end|>\n"
        ] * len(images)

    # The following code is a hack to make sure the scatter in DDP is done
    # correctly when training on multiple GPUs
    batch_doc = self(text=text_doc, images=images, padding="longest", return_tensors="pt")  # type: ignore
    # Separate pixel_values for each image
    offsets = batch_doc["image_grid_thw"][:, 1] * batch_doc["image_grid_thw"][:, 2]
    # Pad pixel_values to the same length to be able to make it into a tensor
    pixel_values = torch.split(batch_doc["pixel_values"], offsets.tolist())
    max_length = max([len(pv) for pv in pixel_values])
    pixel_values = [
        torch.cat(
            [
                pv,
                torch.zeros(
                    (max_length - len(pv), pv.shape[1]),
                    dtype=pv.dtype,
                    device=pv.device,
                ),
            ]
        )
        for pv in pixel_values
    ]
    batch_doc["pixel_values"] = torch.stack(pixel_values)
    return batch_doc
```
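For context, the padding step at the end is independent of the bug: `image_grid_thw[:, 1] * image_grid_thw[:, 2]` gives each image's patch count (height x width of the patch grid), the flat `pixel_values` tensor is split into per-image chunks, and shorter chunks are zero-padded so they can be stacked. A toy sketch of that split-and-pad pattern, with hypothetical patch counts:

```python
import torch

# 6 patch rows total, feature dim 2, belonging to two images
pixel_values = torch.arange(12, dtype=torch.float32).reshape(6, 2)
offsets = [2, 4]  # hypothetical per-image patch counts (must sum to 6)

chunks = torch.split(pixel_values, offsets)
max_length = max(len(c) for c in chunks)
padded = [
    # Zero-pad each chunk up to the longest one so they stack into one tensor
    torch.cat([c, torch.zeros((max_length - len(c), c.shape[1]), dtype=c.dtype)])
    for c in chunks
]
stacked = torch.stack(padded)
print(stacked.shape)  # (num_images, max_patches, dim)
```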
The `if isinstance(images[0], list):` branch raises `ValueError: Cannot use apply_chat_template because this processor does not have a chat template.`

This branch is triggered when the input is a list of lists of image URLs, e.g.:

```python
image_embeddings = model.encode_image(
    images=[[image_url, image_url], [image_url, image_url]],
    task="retrieval",
)
```

Whether or not this kind of input is expected, i.e., whether multiple images per document are permitted, the code shouldn't fail like this: either the nested-list branch should work, or the input should be rejected with a clear error.
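As a possible stopgap, the nested-list branch could build the prompt string directly instead of calling `apply_chat_template`, mirroring the hard-coded single-image prompt in the `else` branch. This is only a sketch under the assumption that the same prompt format holds with the vision placeholder repeated once per image; `build_multi_image_prompt` is a hypothetical helper, not part of the model code:

```python
def build_multi_image_prompt(n_images: int) -> str:
    # Hypothetical helper: repeat the vision placeholder once per image,
    # mimicking the single-image prompt hard-coded in the else branch.
    pads = "<|vision_start|><|image_pad|><|vision_end|>" * n_images
    return f"<|im_start|>user\n{pads}Describe the image.<|im_end|>\n"

print(build_multi_image_prompt(2))
```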