Update README.md
README.md (CHANGED)

@@ -13,17 +13,17 @@ tags:
- Methodology
---

# NexaSci Family of Models

## Welcome to the NexaSci Repository!

Get ready to supercharge your scientific research with the **NexaSci family of models**! This Hugging Face repository hosts a powerful suite of Mixture-of-Experts (MoE) models designed to generate hypotheses and methodologies across **physics**, **biology**, and **materials science**. Built with efficiency and scalability in mind, the NexaSci family includes the baseline **NexaSci**, the reasoning-enhanced **NexaSci-1-CoT**, and the long-context powerhouse **NEXA-1-Max**. Whether you're a researcher tackling complex STEM problems, a data scientist exploring scientific ML, or a student learning about domain-specific AI, this repository is your go-to resource for cutting-edge scientific computation.

## Model Overview

The NexaSci family spans 110 million to 2.2 billion parameters and uses a **Semantic Router** to direct queries to domain-specific expert modules (Physics, Biology, Materials Science). It is optimized for resource-constrained environments, leveraging advanced training strategies, hardware optimizations, and techniques such as reinforcement learning and sparse attention. Below are the current and planned models:
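The routing idea can be pictured with a short, illustrative sketch. This is not the NexaSci implementation (which is not published here); the class name, hidden size, and domain labels are assumptions chosen only to show top-1 gating over three expert domains.

```python
# Illustrative sketch only: module and expert names are placeholders,
# not the actual NexaSci routing code.
import torch
import torch.nn as nn


class SemanticRouter(nn.Module):
    """Toy top-1 router: scores a pooled query embedding against three domains."""

    def __init__(self, hidden_dim: int = 512, domains=("physics", "biology", "materials")):
        super().__init__()
        self.domains = domains
        self.gate = nn.Linear(hidden_dim, len(domains))  # one logit per expert domain

    def forward(self, query_embedding: torch.Tensor):
        logits = self.gate(query_embedding)          # (batch, num_domains)
        weights = torch.softmax(logits, dim=-1)      # routing probabilities
        top1 = weights.argmax(dim=-1)                # hard top-1 routing decision
        return top1, weights


router = SemanticRouter()
pooled = torch.randn(2, 512)                         # stand-in for pooled query embeddings
choice, probs = router(pooled)
print([router.domains[i] for i in choice.tolist()], probs.shape)
```

In a full MoE setup, the selected expert module would then process the query, and the routing weights could be refined by the feedback loop described below.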

### 1. NexaSci-1-Mini (still in progress; indefinite timeline)
- **Parameters**: ~110 million
- **Purpose**: Generates hypotheses and methodological scaffolding for scientific tasks in physics, biology, and materials science.
- **Architecture**:

@@ -32,8 +32,8 @@
  - **Inference & Validation Pipeline**: Aggregates expert outputs and ensures consistency.
  - **Knowledge Feedback Loop**: Refines routing using reinforcement learning.
- **Training**:
  - Pretrained on ~2B tokens from arXiv, PubMed, and other scientific corpora.
  - Fine-tuned with QLoRA on 500k instruction-style samples (see the sketch below).
  - Uses the AzureSky Optimizer (a Stochastic Approximation + Adam hybrid).
- **Use Cases**:
  - Generate plausible hypotheses (e.g., new material properties).
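The QLoRA step above can be sketched with the Hugging Face `peft` and `transformers` libraries. The base checkpoint, target modules, and hyperparameters below are placeholders (the README does not specify them); the point is only the shape of a 4-bit base model plus low-rank adapters.

```python
# Hedged sketch of QLoRA-style fine-tuning; requires a CUDA GPU with
# bitsandbytes installed. Nothing here is the actual NexaSci configuration.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the frozen base weights to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "gpt2",                                  # placeholder base model, not a NexaSci checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn"],               # GPT-2 attention projection; adjust per architecture
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)    # only the small LoRA adapters are trainable
model.print_trainable_parameters()
```

Only the LoRA adapter weights are updated during fine-tuning, which keeps memory usage close to inference-time requirements.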

@@ -49,7 +49,7 @@
  - Integrates with expert modules for structured, logical outputs.
- **Training**:
  - Trained in three stages: Easy (basic logic), Moderate (complex tasks), Hard (advanced reasoning); see the staged-training sketch below.
  - Uses ~2B tokens.
  - Employs the AzureSky Optimizer with reinforcement-learning fine-tuning.
- **Use Cases**:
  - Solve multi-step physics problems (e.g., astrophysics simulations).
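A minimal sketch of that three-stage curriculum is shown below. The stage names come from the list above, but the helper callables and epoch counts are hypothetical; the actual NexaSci training loop is not described in this README.

```python
# Minimal sketch of the Easy -> Moderate -> Hard curriculum described above.
# `load_stage_dataset` and `train_one_epoch` are hypothetical helpers, and the
# epoch counts are invented for illustration only.
from typing import Callable, Dict

CURRICULUM: Dict[str, int] = {
    "easy": 1,       # basic logic
    "moderate": 2,   # complex tasks
    "hard": 3,       # advanced, multi-step reasoning
}


def run_curriculum(load_stage_dataset: Callable, train_one_epoch: Callable, model):
    """Train on each stage in order, reusing the same model weights throughout."""
    for stage, epochs in CURRICULUM.items():
        dataset = load_stage_dataset(stage)          # stage-specific reasoning data
        for _ in range(epochs):
            model = train_one_epoch(model, dataset)  # ordinary fine-tuning pass per stage
    return model


# Tiny demo with stand-in callables:
trained = run_curriculum(lambda s: f"{s}-dataset", lambda m, d: m + 1, model=0)
print(trained)  # 6 epochs total across the three stages
```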

@@ -64,10 +64,10 @@
  - Includes a **Longform Context Manager** to chunk inputs while preserving semantic coherence (see the chunking sketch below).
  - Scales parameters using mixed-precision training and gradient checkpointing.
- **Training**:
  - Trained on ~2B tokens, including a Long-Context Corpus of full arXiv papers and NIH grants.
  - Uses the AzureSky Optimizer with mixed precision (FP16/BF16) and gradient checkpointing.
- **Use Cases**:
  - Summarize or analyze long scientific papers (e.g., 120K-token preprints).
  - Generate hypotheses from extended contexts (e.g., patent methods).
  - Support multi-query tasks requiring deep document understanding.
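As a rough illustration of what such a context manager might do, the sketch below splits a long document into overlapping chunks at paragraph boundaries. The function name, chunk size, and whitespace-based token count are assumptions, not the actual component.

```python
# Rough sketch of overlap-based chunking in the spirit of the Longform Context
# Manager; token counting here is a crude whitespace approximation.
from typing import List


def chunk_document(text: str, max_tokens: int = 2048, overlap: int = 128) -> List[str]:
    """Split a long document into overlapping chunks at paragraph boundaries."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    chunks, current, current_len = [], [], 0
    for para in paragraphs:
        para_len = len(para.split())                 # whitespace "tokens"
        if current and current_len + para_len > max_tokens:
            chunks.append("\n\n".join(current))
            # carry the tail of the previous chunk forward to preserve coherence
            carry = " ".join("\n\n".join(current).split()[-overlap:])
            current, current_len = [carry], len(carry.split())
        current.append(para)
        current_len += para_len
    if current:
        chunks.append("\n\n".join(current))
    return chunks


pieces = chunk_document("A long preprint...\n\nMethods section...\n\nResults...", max_tokens=50)
print(len(pieces))
```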

@@ -81,10 +81,8 @@
The NexaSci family is trained with a **tiered token strategy** to maximize efficiency and domain specificity, as outlined in the architecture document (a toy mixing sketch follows the list):

- **Warm Start Corpus** (100M tokens): General language understanding from FineWeb-Edu, OpenWebMath, Wikipedia, and Aristo Science Questions.
- **Scientific Pretraining Corpus** (1-2B tokens): Domain-specific data from arXiv (physics), PubMed/BioRxiv (biology), and Materials Project/ChemRxiv (materials science).
- **Instruction Fine-Tune Dataset** (500K tokens): 5k high-quality instruction-style samples for hypothesis and method generation.
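To make the tiering concrete, here is a toy configuration that turns the budgets listed above into relative sampling weights. The dictionary keys and the helper are invented for illustration; the 1.5B figure is simply the midpoint of the 1-2B range quoted above.

```python
# Toy representation of the tiered token budgets listed above; the helper is
# hypothetical and only the budget figures come from the text.
TOKEN_TIERS = {
    "warm_start":             100_000_000,    # FineWeb-Edu, OpenWebMath, Wikipedia, Aristo
    "scientific_pretraining": 1_500_000_000,  # midpoint of the 1-2B range above
    "instruction_finetune":   500_000,        # 5k instruction-style samples
}


def sampling_weights(tiers):
    """Convert absolute token budgets into relative sampling weights."""
    total = sum(tiers.values())
    return {name: budget / total for name, budget in tiers.items()}


print(sampling_weights(TOKEN_TIERS))
```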

**Token Efficiency Strategies**:
- Entropy scoring to remove low-information samples (see the filtering sketch below).
@@ -93,14 +91,10 @@
- Routing and filtering to activate only relevant expert paths.
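The entropy-scoring idea can be illustrated with a small filter. The README only names the technique, so the character-level entropy and the 3-bit threshold below are one plausible interpretation rather than the actual pipeline.

```python
# Hedged sketch of entropy-based filtering for low-information samples.
import math
from collections import Counter


def char_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of a text sample."""
    counts = Counter(text)
    total = len(text)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def filter_low_information(samples, min_bits: float = 3.0):
    """Drop samples whose character entropy falls below a threshold."""
    return [s for s in samples if char_entropy(s) >= min_bits]


samples = ["aaaaaaaaaaaaaaaa", "Band gaps in perovskite oxides vary with strain."]
print(filter_low_information(samples))  # the repetitive sample is removed
```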

**Total Token Budget**:
- Approximately 2B tokens across all models.

**Hardware**:
- Currently limited; we are still hunting for additional compute.

**Optimization Techniques**:
- Sparse attention, mixed-precision training, and gradient checkpointing (a short sketch of the latter two appears below).
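For readers unfamiliar with the last two techniques, here is a minimal PyTorch sketch combining BF16 autocast with activation checkpointing. The tiny MLP, shapes, and CPU device choice are placeholders; NexaSci's actual training stack is not shown in this README.

```python
# Minimal sketch of mixed precision (BF16 autocast) plus gradient checkpointing.
# The tiny MLP is a stand-in, not a NexaSci module.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(256, 1024), nn.GELU(), nn.Linear(1024, 256))
head = nn.Linear(256, 10)
opt = torch.optim.AdamW(list(block.parameters()) + list(head.parameters()), lr=1e-4)

x = torch.randn(8, 256)
target = torch.randint(0, 10, (8,))

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    # checkpoint() recomputes the block's activations during backward to save memory
    hidden = checkpoint(block, x, use_reentrant=False)
    loss = nn.functional.cross_entropy(head(hidden), target)

loss.backward()
opt.step()
```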