@zucchini-nlp
Last active November 25, 2024 09:57
Update BLIP-2 model for new version
# Load your model and processor, then run the following to update your BLIP-2 model.
# It updates the files in your repo by adding new args to the config and resizing the embedding layer.
# After that you'll be able to run BLIP-2 without warnings/errors.
from transformers import AddedToken

# Tell the processor how many query tokens the model uses so it can expand
# the image placeholder during processing
processor.num_query_tokens = model.config.num_query_tokens

# Register "<image>" as a special token and grow the embedding matrix to match
image_token = AddedToken("<image>", normalized=False, special=True)
processor.tokenizer.add_tokens([image_token], special_tokens=True)
model.resize_token_embeddings(len(processor.tokenizer), pad_to_multiple_of=64)  # pad for efficient computation

# The newly added "<image>" token is the last entry in the vocabulary
model.config.image_token_index = len(processor.tokenizer) - 1

# Push the updated model and processor back to your repo on the Hub
model.push_to_hub("YOUR-REPO")
processor.push_to_hub("YOUR-REPO")
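
For reference, a minimal sketch of the loading step the snippet above assumes, using the public Salesforce/blip2-opt-2.7b checkpoint as an example (substitute your own model and repo):

from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Example checkpoint only; any BLIP-2 model/processor pair works the same way
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")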
@pspdada

pspdada commented Oct 12, 2024

I tried to use the model Salesforce/blip2-itm-vit-g, but encountered a warning. After adding the code mentioned here, I received a NotImplementedError when calling the function resize_token_embeddings. What should I do?

@zucchini-nlp
Author

hmm weird, lemme check quickly

@pspdada

pspdada commented Oct 17, 2024

I've encountered a new issue related to this, please take a look:
huggingface/transformers#34223

@pspdada

pspdada commented Oct 28, 2024

I found that after using this code, a warning also appears in resize_token_embeddings: "The new embeddings will be initialized from a multivariate normal distribution that has the old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use mean_resizing=False."
Do you know how to handle it?

@zucchini-nlp
Author

Yes, we added two different modes for resizing, and initializing from the mean is exactly what we want so we don't mess up the distribution of the embedding matrix and cause gibberish logits/generations. We don't need to handle it or set mean_resizing=False; I guess the warning is there simply to say that the resizing technique has changed compared to the previous release.
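
For illustration, the two modes are selected via the mean_resizing flag of resize_token_embeddings (the flag named in the warning above, default True in recent releases); a minimal sketch:

# Default: new embedding rows are sampled from a normal distribution fitted to
# the mean and covariance of the existing rows, which keeps logits well-behaved
model.resize_token_embeddings(len(processor.tokenizer), pad_to_multiple_of=64, mean_resizing=True)

# Old behavior: new rows use the model's standard random init
# model.resize_token_embeddings(len(processor.tokenizer), pad_to_multiple_of=64, mean_resizing=False)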

@SHYuanBest

SHYuanBest commented Nov 2, 2024

So what modification is needed to make this correct, for transformers==4.46.1?

@zucchini-nlp
Author

@SHYuanBest the gist should be correct with the latest transformers. Does it throw errors for you?

@SHYuanBest

Yes, a warning still pops up here.

"Expanding inputs for image tokens in BLIP-2 should be done in processing. Please follow instruction here (https://gist.github.com/zucchini-nlp/e9f20b054fa322f84ac9311d9ab67042) to update your BLIP-2 model. Using processors without these attributes in the config is deprecated and will throw an error in v4.47."

@zucchini-nlp
Author

@SHYuanBest I just tried the following script with the latest main branch and it gives no warnings anymore. Can you share your reproduction script and the version of transformers you are using?

from PIL import Image
import requests
from transformers import Blip2Processor, Blip2ForConditionalGeneration, AddedToken
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", load_in_8bit=True, device_map={"": 0}, torch_dtype=torch.float16
)


processor.num_query_tokens = model.config.num_query_tokens
image_token = AddedToken("<image>", normalized=False, special=True)
processor.tokenizer.add_tokens([image_token], special_tokens=True)

model.resize_token_embeddings(len(processor.tokenizer), pad_to_multiple_of=64) # pad for efficient computation
model.config.image_token_index = len(processor.tokenizer) - 1

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

prompt = "Question: how many cats are there? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device=device, dtype=torch.float16)

generated_ids = model.generate(**inputs, max_new_tokens=20)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(generated_text)
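
As a quick sanity check (my own addition, not part of the original script): after the update the processor expands the image placeholder itself, so the count of image-token ids in the inputs should equal the model's num_query_tokens:

num_image_tokens = (inputs.input_ids == model.config.image_token_index).sum().item()
print(num_image_tokens, model.config.num_query_tokens)  # the two numbers should match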

@SHYuanBest

SHYuanBest commented Nov 4, 2024

transformers == 4.46.1

import torch
from transformers import CLIPProcessor, CLIPModel, Blip2Processor, Blip2ForImageTextRetrieval, AddedToken

blip_model = Blip2ForImageTextRetrieval.from_pretrained("Salesforce/blip2-itm-vit-g", torch_dtype=torch.float16)
blip_processor = Blip2Processor.from_pretrained("Salesforce/blip2-itm-vit-g")

blip_processor.num_query_tokens = blip_model.config.num_query_tokens
image_token = AddedToken("<image>", normalized=False, special=True)
blip_processor.tokenizer.add_tokens([image_token], special_tokens=True)

blip_model.resize_token_embeddings(len(blip_processor.tokenizer), pad_to_multiple_of=64) # pad for efficient computation
blip_model.config.image_token_index = len(blip_processor.tokenizer) - 1
Traceback (most recent call last):
  File "get_clip_and_blip_score.py", line 56, in <module>
    blip_model.resize_token_embeddings(len(blip_processor.tokenizer), pad_to_multiple_of=64) # pad for efficient computation
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lib/python3.11/site-packages/transformers/modeling_utils.py", line 2117, in resize_token_embeddings
    model_embeds = self._resize_token_embeddings(new_num_tokens, pad_to_multiple_of, mean_resizing)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lib/python3.11/site-packages/transformers/modeling_utils.py", line 2141, in _resize_token_embeddings
    old_embeddings = self.get_input_embeddings()
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lib/python3.11/site-packages/transformers/modeling_utils.py", line 1873, in get_input_embeddings
    raise NotImplementedError
NotImplementedError

@SHYuanBest

SHYuanBest commented Nov 4, 2024

Oh, I installed the latest main-branch transformers with pip install -e . and the problem was solved.

@zucchini-nlp
Author

@SHYuanBest no, you should not. That warning is not related to these changes; I guess it appears only because the new resizing uses a different method than before.

@SHYuanBest

SHYuanBest commented Nov 4, 2024 via email
