NovoMolGen
NovoMolGen is a family of molecular foundation models trained on 1.5 billion molecules from ZINC-22, built on the Llama architecture with FlashAttention. It achieves state-of-the-art performance on both unconstrained and goal-directed molecule generation tasks.
Example usage:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from accelerate import Accelerator

# NovoMolGen ships custom modeling code, so trust_remote_code=True is required.
tokenizer = AutoTokenizer.from_pretrained("chandar-lab/NovoMolGen_300M_SMILES_AtomWise", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("chandar-lab/NovoMolGen_300M_SMILES_AtomWise", trust_remote_code=True)

# Run inference in bf16 mixed precision via Accelerate.
acc = Accelerator(mixed_precision='bf16')
model = acc.prepare(model)

# Sample a batch of molecules; the custom model class exposes a sample() helper.
outputs = model.sample(tokenizer=tokenizer, batch_size=4)
print(outputs['SMILES'])
```
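Generated strings are not guaranteed to parse into valid molecules. Below is a minimal sketch of a validity and uniqueness check using RDKit (an assumption here, not a stated NovoMolGen dependency); `outputs['SMILES']` is the list returned by `model.sample(...)` above.

```python
# Minimal sketch: score a sampled batch for validity and uniqueness with RDKit.
# Assumes RDKit is installed (pip install rdkit).
from rdkit import Chem

smiles = outputs['SMILES']
mols = [Chem.MolFromSmiles(s) for s in smiles]          # None for invalid strings
canonical = [Chem.MolToSmiles(m) for m in mols if m is not None]

print(f"validity:   {len(canonical) / len(smiles):.2%}")
print(f"uniqueness: {len(set(canonical)) / max(len(canonical), 1):.2%}")
```

Canonicalizing before deduplication ensures that syntactically different SMILES for the same molecule are counted once.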
If you use NovoMolGen in your research, please cite:

```bibtex
@article{chitsaz2025novomolgen,
  title   = {NovoMolGen: Rethinking Molecular Language Model Pretraining},
  author  = {Chitsaz, Kamran and Balaji, Roshan and Fournier, Quentin and
             Bhatt, Nirav Pravinbhai and Chandar, Sarath},
  journal = {arXiv preprint},
  year    = {2025},
}
```