Discussion on comparison with previous work and citation?

#1
by JeremiahZ - opened

Various experiment designs and results in this blog bear striking similarities to an earlier EMNLP 2024 Findings paper, Scaling Behavior for Large Language Models regarding Numeral Systems: An Example using Pythia, in which the authors replace the GPT-2 tokenizer with a synthetic base-100 tokenizer. That paper already shows that 1-digit tokenizers perform best, and also analyses failure patterns in length extrapolation. However, this blog neither cites the paper nor discusses its relationship to previous work. Could the team please address this?

Hugging Face org

Hi, thank you for bringing this to our attention.
This space is just a blog post and does not purport to be a full scientific paper, but we do cite the previous work that was a strong inspiration for our experimental setup, namely https://arxiv.org/pdf/2402.14903 (which, incidentally, is not cited in your own paper).
While your paper is very interesting work, its focus seems narrower than that of the blog post, which additionally compares R2L and L2R tokenization (from a quick read, this comparison appears to be absent from your paper) and examines how training a tokenizer with, for example, BPE arbitrarily assigns dedicated tokens to some numbers but not to others.
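To make that BPE point concrete, here is a minimal sketch (not part of the original blog post or this reply) that inspects how the publicly released GPT-2 BPE tokenizer splits a few three-digit numbers; it assumes the Hugging Face `transformers` library is installed, and the specific numbers are chosen purely for illustration.

```python
# Minimal sketch: probe which numbers the GPT-2 BPE vocabulary happens to
# cover with a single token. The example numbers are arbitrary.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

for number in ["200", "250", "327", "831", "999"]:
    pieces = tok.tokenize(number)
    print(f"{number:>4} -> {pieces} ({len(pieces)} token(s))")
```

Running this typically shows that some numbers come back as a single token while others are split into two or three pieces, which is the arbitrariness referred to above.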
Additionally, your paper was published on arXiv in late September, around the same time we were already communicating publicly about experiments that made it into the final version of the blog post (https://x.com/garrethleee/status/1853870656506798454), which itself was published less than a month later. It seems a stretch to call the (again, very interesting) paper you mention an "earlier" paper rather than concurrent work, but I am happy to add a citation where it might make sense if you feel strongly about this.
Best
