LGAI-Embedding-Preview

We trained the LGAI-Embedding-Preview model based on the Mistral-7B LLM.

The initial goal is to reproduce the baseline model and verify the workflow for uploading results:

  • Checkpoint
  • Technical report

MTEB

Inference is performed with in-context examples for MTEB evaluation.
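As a rough illustration, a minimal sketch of embedding extraction is shown below, assuming last-token pooling followed by L2 normalization, a common scheme for Mistral-7B-based embedding models. The pooling choice, the instruction/in-context prompt format, and the dummy inputs are assumptions for illustration, not details confirmed by this card; consult the technical report for the actual procedure.

```python
import torch
import torch.nn.functional as F


def last_token_pool(hidden_states: torch.Tensor,
                    attention_mask: torch.Tensor) -> torch.Tensor:
    """Select the hidden state of each sequence's last non-padding token."""
    seq_lens = attention_mask.sum(dim=1) - 1            # index of last real token
    batch_idx = torch.arange(hidden_states.size(0))
    return hidden_states[batch_idx, seq_lens]           # (batch, hidden_dim)


# Dummy demonstration using the card's stated embedding dimension (4096).
# In practice, `hidden` would come from the model's last_hidden_state.
hidden = torch.randn(2, 10, 4096)                       # (batch, seq_len, dim)
mask = torch.tensor([[1] * 10, [1] * 6 + [0] * 4])      # second sequence is padded
emb = F.normalize(last_token_pool(hidden, mask), p=2, dim=1)
print(emb.shape)  # torch.Size([2, 4096])
```

In real use, the hidden states would be produced by loading the checkpoint with `AutoModel.from_pretrained` (requires `transformers>=4.48.3` per the Requirements section) and prepending the task instruction and in-context examples to each query before tokenization.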

Model Information

  • Model Size: 7B
  • Embedding Dimension: 4096
  • Max Input Tokens: 32k

Requirements

transformers>=4.48.3

Citation

If you find this repository useful, please consider citing it.

@misc{choi2025lgaiembeddingpreviewtechnicalreport,
      title={LGAI-EMBEDDING-Preview Technical Report}, 
      author={Jooyoung Choi and Hyun Kim and Hansol Jang and Changwook Jun and Kyunghoon Bae and Hyewon Choi and Stanley Jungkyu Choi and Honglak Lee and Chulmin Yun},
      year={2025},
      eprint={2506.07438},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.07438}, 
}