eacortes committed (verified)
Commit a752b5e · Parent(s): d6a8b09

Update README to include transformers version requirement

Files changed (1)
  1. README.md +19 -13
README.md CHANGED
@@ -124,6 +124,12 @@ model-index:
ModChemBERT is a ModernBERT-based chemical language model (CLM), trained on SMILES strings for masked language modeling (MLM) and downstream molecular property prediction (classification & regression).

## Usage
+ Install the `transformers` library (v4.56.1 or later):
+
+ ```bash
+ pip install -U "transformers>=4.56.1"
+ ```
+
### Load Model
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
@@ -146,19 +152,6 @@ fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill("c1ccccc1[MASK]"))
```

- ## Intended Use
- * Primary: Research and development for molecular property prediction, experimentation with pooling strategies, and as a foundational model for downstream applications.
- * Appropriate for: Binary / multi-class classification (e.g., toxicity, activity) and single-task or multi-task regression (e.g., solubility, clearance) after fine-tuning.
- * Not intended for generating novel molecules.
-
- ## Limitations
- - Out-of-domain performance may degrade for very long (>128 token) SMILES, inorganic / organometallic compounds, polymers, and charged / enumerated tautomers, which are not well represented in training.
- - No guarantee of synthesizability, safety, or biological efficacy.
-
- ## Ethical Considerations & Responsible Use
- - Potential biases arise from training corpora skewed to drug-like space.
- - Do not deploy in clinical or regulatory settings without rigorous, domain-specific validation.
-
## Architecture
- Backbone: ModernBERT
- Hidden size: 768
@@ -289,6 +282,19 @@ Optimal parameters (per dataset) for the `MLM + DAPT + TAFT OPT` merged model:

</details>

+ ## Intended Use
+ * Primary: Research and development for molecular property prediction, experimentation with pooling strategies, and as a foundational model for downstream applications.
+ * Appropriate for: Binary / multi-class classification (e.g., toxicity, activity) and single-task or multi-task regression (e.g., solubility, clearance) after fine-tuning.
+ * Not intended for generating novel molecules.
+
+ ## Limitations
+ - Out-of-domain performance may degrade for very long (>128 token) SMILES, inorganic / organometallic compounds, polymers, and charged / enumerated tautomers, which are not well represented in training.
+ - No guarantee of synthesizability, safety, or biological efficacy.
+
+ ## Ethical Considerations & Responsible Use
+ - Potential biases arise from training corpora skewed to drug-like space.
+ - Do not deploy in clinical or regulatory settings without rigorous, domain-specific validation.
+
## Hardware
Training and experiments were performed on 2 NVIDIA RTX 3090 GPUs.
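For reference, the snippets touched by this commit assemble into a single end-to-end script. This is a minimal sketch, not the README verbatim: the checkpoint id below is a placeholder (the commit does not name the model repo id), and it assumes `transformers>=4.56.1` as required above.

```python
# Minimal end-to-end sketch of the usage the diff documents.
# NOTE: the checkpoint id is a placeholder, not confirmed by this commit.
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

model_id = "path/to/ModChemBERT-checkpoint"  # placeholder: substitute the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Fill-mask over a SMILES string, as in the README snippet:
fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill("c1ccccc1[MASK]"))
```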
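The Intended Use section anticipates fine-tuning for classification and regression. Below is a hedged sketch of attaching a task head with the standard `transformers` API; it assumes the checkpoint loads under `AutoModelForSequenceClassification` (not confirmed by this commit), and the repo id is again a placeholder.

```python
# Hedged sketch: fine-tuning setup for a binary property task (e.g., toxicity).
# Assumes the ModChemBERT checkpoint is compatible with
# AutoModelForSequenceClassification; the id below is a placeholder.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "path/to/ModChemBERT-checkpoint"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=2,  # binary classification; num_labels=1 would suit regression tasks like solubility
)

# Tokenize SMILES inputs; truncation matters given the >128-token caveat in Limitations.
batch = tokenizer(
    ["c1ccccc1O", "CCO"],
    padding=True,
    truncation=True,
    max_length=128,
    return_tensors="pt",
)
outputs = model(**batch)
print(outputs.logits.shape)  # (2, 2): one logit pair per input SMILES
```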