Model Card for nexa-mistral-7b-sci
Model Details
Model Description: nexa-mistral-7b-sci is a fine-tuned variant of the open-weight Mistral-7B-v0.1 model, optimized for scientific research generation tasks such as hypothesis generation, abstract writing, and methodology completion. Fine-tuning was performed with the PEFT (Parameter-Efficient Fine-Tuning) library, applying LoRA adapters to a 4-bit quantized base model via the bitsandbytes backend.
This model is part of the Nexa Scientific Intelligence (Sci) series, developed for scalable, automated scientific reasoning and domain-specific text generation.
Developed by: Allan (Independent Scientific Intelligence Architect)
Funded by: Self-funded
Shared by: Allan (https://huggingface.co/Allanatrix)
Model type: Decoder-only transformer (causal language model)
Language(s): English (scientific domain-specific vocabulary)
License: Apache 2.0 (inherits from base model)
Fine-tuned from: mistralai/Mistral-7B-v0.1
Repository: https://huggingface.co/Allanatrix/Nexa-Mistral-Sci7b
Demo: Coming soon via Hugging Face Spaces or a Lambda inference endpoint.
Uses
Direct Use
- Scientific hypothesis generation
- Abstract and method section synthesis
- Domain-specific research writing
- Semantic completion of structured research prompts
Downstream Use
- Fine-tuning or distillation into smaller expert models
- Foundation for test-time reasoning agents
- Seed model for bootstrapping larger synthetic scientific corpora
Out-of-Scope Use
- General conversation or chat use cases
- Non-English scientific domains
- Legal, financial, or clinical advice generation
Bias, Risks, and Limitations
While the model performs well on structured scientific input, it inherits biases from its base model (Mistral-7B) and fine-tuning dataset. Results should be evaluated by domain experts before use in high-stakes settings. It may hallucinate plausible but incorrect facts, especially in low-data areas.
Recommendations
Users should:
- Validate critical outputs against trusted scientific literature
- Avoid deploying in clinical or regulatory environments without further evaluation
- Consider additional domain fine-tuning for niche fields
How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model_name = "Allanatrix/Nexa-Mistral-Sci7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto")

# Generate a scientific hypothesis from a structured prompt
prompt = "Generate a novel hypothesis in quantum materials research:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=250)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Training Details
Training Data
- Size: 100 million tokens sampled from a 500M+ token corpus
- Source: Curated scientific literature, abstracts, methodologies, and domain-labeled corpora (Bio, Physics, QST, Astro)
- Labeling: Token-level labels auto-generated via the Nexa DataVault tokenizer infrastructure
Preprocessing
- Tokenization with sequence truncation to 1024 tokens
- Labeled and batched on the CPU; inference dispatched to the GPU asynchronously (a tokenization sketch follows this list)
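The preprocessing code itself is not included in this card. The following is a minimal sketch of the truncation step, assuming a Hugging Face datasets-style corpus with a hypothetical "text" column; it is an illustration, not the released pipeline.

```python
from transformers import AutoTokenizer

# Illustrative only: truncate scientific text to the 1024-token context
# used during fine-tuning. The "text" column name is a placeholder for
# whatever field the corpus actually uses.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

def tokenize_batch(examples):
    return tokenizer(examples["text"], truncation=True, max_length=1024)

# With a datasets.Dataset, this would typically run on the CPU via
# dataset.map(tokenize_batch, batched=True) before batches reach the GPU.
```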
Training Hyperparameters
- Base model: mistralai/Mistral-7B-v0.1
- Sequence length: 1024
- Batch size: 1 (with gradient accumulation)
- Gradient accumulation steps: 64
- Effective batch size: 64
- Learning rate: 2e-5
- Epochs: 2
- LoRA: Enabled (PEFT)
- Quantization: 4-bit via bitsandbytes
- Optimizer: 8-bit AdamW
- Framework: Transformers + PEFT + Accelerate (a configuration sketch follows this list)
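The training script is not published here. The sketch below assembles a configuration consistent with the reported setup (4-bit bitsandbytes quantization, LoRA via PEFT, batch size 1 with 64 gradient-accumulation steps, learning rate 2e-5, 2 epochs, 8-bit AdamW). The LoRA rank, alpha, dropout, and target module names are illustrative assumptions, since they are not reported in this card.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization via bitsandbytes, as reported above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # T4 GPUs do not support bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters on attention and FFN projections; rank, alpha, dropout,
# and target modules are assumptions, not values reported in the card
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Reported hyperparameters: batch size 1, 64 accumulation steps (effective 64),
# learning rate 2e-5, 2 epochs, 8-bit AdamW
training_args = TrainingArguments(
    output_dir="nexa-mistral-7b-sci",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=64,
    learning_rate=2e-5,
    num_train_epochs=2,
    optim="adamw_bnb_8bit",
    fp16=True,
)
```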
Evaluation
Testing Data
- Synthetic scientific prompts across domains (Physics, Biology, Materials Science)
Evaluation Factors
- Semantic coherence (BLEU)
- Hypothesis novelty (entropy score)
- Internal scientific consistency (domain-specific rubric)
Results
The model performs robustly on hypothesis generation and scientific prose tasks. Baseline coherence is high, while novelty depends strongly on prompt diversity. It is well suited as a teacher for distillation or as an inference agent for generating synthetic scientific corpora.
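The evaluation harness is not published with this card. As an illustration only, the sketch below shows one way the two automated factors above could be computed, using the sacrebleu package for BLEU and token-level Shannon entropy as a crude novelty proxy; both choices are assumptions, not the card's actual implementation. The rubric-based consistency score requires domain-expert judgment and is not shown.

```python
import math
from collections import Counter

import sacrebleu  # assumption: any standard BLEU implementation would do


def bleu_score(candidate: str, references: list[str]) -> float:
    """BLEU (0-100) for one generated passage against one or more references."""
    return sacrebleu.corpus_bleu([candidate], [[r] for r in references]).score


def entropy_novelty(text: str) -> float:
    """Shannon entropy (bits) over whitespace tokens, a rough proxy for lexical variety."""
    tokens = text.split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```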
Environmental Impact
| Component | Value |
|---|---|
| Hardware Type | 2× NVIDIA T4 GPUs |
| Hours used | ~7.5 |
| Cloud Provider | Kaggle (Google Cloud) |
| Compute Region | US |
| Carbon Emitted | Estimate pending (likely < 1 kg CO₂) |
Technical Specifications
Model Architecture
- Transformer decoder (Mistral-7B architecture)
- LoRA adapters applied to attention and FFN layers
- Quantized to 4-bit with bitsandbytes for memory efficiency
Compute Infrastructure
- CPU: Intel i5 8th Gen vPro (batch preprocessing)
- GPU: 2× NVIDIA T4 (CUDA 12.1)
Software Stack
- PEFT 0.12.0
- Transformers 4.41.1
- Accelerate
- TRL
- Torch 2.x
Citation
BibTeX:
@misc{nexa-mistral-7b-sci,
title = {Nexa Mistral 7B Sci},
author = {Allan Wandia},
year = {2025},
howpublished = {\url{https://huggingface.co/Allanatrix/Nexa-Mistral-Sci7b}},
note = {Fine-tuned model for scientific generation tasks}
}
Model Card Contact
For questions, contact Allan via Hugging Face or at 📫 allanw.mk@gmail.com
Model Card Authors
- Allan Wandia (Independent ML Engineer and Systems Architect)
Glossary
- LoRA: Low-Rank Adaptation
- PEFT: Parameter-Efficient Fine-Tuning
- BLEU: Bilingual Evaluation Understudy Score
- Entropy Score: Metric used to estimate novelty/variation
- Safetensors: Secure, fast file format for storing model weights
Links
GitHub repo and notebook: https://github.com/DarkStarStrix/Nexa_Auto
Evaluation results
Self-reported scores on the Nexa Scientific Tokens dataset:
| Metric | Score (self-reported) |
|---|---|
| BLEU | 10.000 |
| Entropy Novelty | 6.000 |
| Internal Consistency | 9.000 |