张绍磊 committed · cf6a785
Parent(s): 54f2fa1
update

README.md CHANGED
@@ -6,13 +6,15 @@ tags:
 ---
 # Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
 
-
+[Paper](https://arxiv.org/abs/2506.13642)
+[Code](https://github.com/ictnlp/Stream-Omni)
 [Model](https://huggingface.co/ICTNLP/stream-omni-8b)
 [Dataset](https://huggingface.co/datasets/ICTNLP/InstructOmni)
 [Code](https://github.com/ictnlp/Stream-Omni)
 
 > [**Shaolei Zhang**](https://zhangshaolei1998.github.io/), [**Shoutao Guo**](https://scholar.google.com.hk/citations?user=XwHtPyAAAAAJ), [**Qingkai Fang**](https://fangqingkai.github.io/), [**Yan Zhou**](https://zhouyan19.github.io/zhouyan/), [**Yang Feng**](https://people.ucas.edu.cn/~yangfeng?language=en)\*
 
+For an introduction to Stream-Omni and its usage, refer to [https://github.com/ictnlp/Stream-Omni](https://github.com/ictnlp/Stream-Omni).
 
 Stream-Omni is an end-to-end language-vision-speech chatbot that simultaneously supports interaction across various modality combinations, with the following features💡:
 - **Omni Interaction**: Supports any multimodal input, including text, vision, and speech, and generates both text and speech responses.
@@ -32,6 +34,3 @@ Stream-Omni is an end-to-end language-vision-speech chatbot that simultaneously
 > [!NOTE]
 >
 > **Stream-Omni can produce intermediate textual results (ASR transcription and text response) during speech interaction, offering users a seamless "see-while-hear" experience.**
-
-
-For an introduction to Stream-Omni and its usage, refer to [https://github.com/ictnlp/Stream-Omni](https://github.com/ictnlp/Stream-Omni).
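
To make the pointers in the updated README concrete, here is a minimal sketch of fetching the released checkpoint from the Hugging Face Hub. It assumes only that `huggingface_hub` is installed; the local directory name is a hypothetical choice, and actually running inference requires the Stream-Omni code from the GitHub repository linked above.

```python
# Minimal sketch: download the Stream-Omni checkpoint referenced by the model
# badge (ICTNLP/stream-omni-8b). Running the model afterwards requires the
# inference code from https://github.com/ictnlp/Stream-Omni.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="ICTNLP/stream-omni-8b",  # repo id from the model badge above
    local_dir="./stream-omni-8b",     # hypothetical local target directory
)
print(f"Checkpoint downloaded to: {local_path}")
```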