Add paper link to model card (#5)
- Add paper link to model card (797ecea8a3e604c7070e169fc1354b49fce51349)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md CHANGED

@@ -1,15 +1,15 @@
 ---
-license: apache-2.0
-tags:
-- finetuned
-- chat
 language:
 - en
 - ko
 - ja
 - zh
-pipeline_tag: text-generation
 library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
+tags:
+- finetuned
+- chat
 ---
 
 # Trillion-7B-preview
@@ -22,7 +22,7 @@ library_name: transformers
 
 ## Introduction
 
-We introduce Trillion-7B-preview, a preview of our latest large language model designed to push the boundaries of multilingual scalability and performance.
+We introduce Trillion-7B-preview, a preview of our latest large language model designed to push the boundaries of multilingual scalability and performance. This model is presented in the paper: [Trillion-7B-preview](https://huggingface.co/papers/2504.15431).
 
 
 When comparing performance to training FLOPs for Trillion-7B-preview with competitive models, our model pushes the Pareto frontier, achieving around 66.5% average performance while using significantly fewer compute (~9.3×10²² FLOPs). It outperforms models like Mistral-7B-Instruct-v0.3 and SOLAR-10.7B-Instruct-v1.0 while remaining competitive with models requiring 3-8× more compute such as Qwen2.5-7B-Instruct and EXAONE-3.5-7.8B-Instruct. For full benchmark results, see tables below.
@@ -240,4 +240,4 @@ This model repository is licensed under the Apache-2.0 License.
 }
 ```
 ## Contact
-For inquiries, please contact: info@trillionlabs.co
+For inquiries, please contact: info@trillionlabs.co
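For context, the reordered front matter above keeps `library_name: transformers` and `pipeline_tag: text-generation`, which is what drives how the Hub loads the model. Below is a minimal sketch of loading it that way; the repo id `trillionlabs/Trillion-7B-preview` is an assumption inferred from the model name, since the diff does not state it.

```python
# Minimal sketch, assuming the repo id "trillionlabs/Trillion-7B-preview";
# the card metadata only declares library_name and pipeline_tag, not the repo id.
from transformers import pipeline

generator = pipeline(
    "text-generation",                        # matches pipeline_tag in the front matter
    model="trillionlabs/Trillion-7B-preview", # assumed repo id
)

print(generator("Hello, who are you?", max_new_tokens=64)[0]["generated_text"])
```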