mrfakename committed on
Commit
ea9f157
·
verified ·
1 Parent(s): 2a7fe05

Update README.md

Files changed (1)
  1. README.md +3 -84
README.md CHANGED
@@ -1,86 +1,5 @@
  ---
  sdk: gradio
- ---
- # [SongBloom]: *Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement*
-
- We propose **SongBloom**, a novel framework for full-length song generation that leverages an interleaved paradigm of autoregressive sketching and diffusion-based refinement. SongBloom employs an autoregressive diffusion model that combines the high fidelity of diffusion models with the scalability of language models.
- Specifically, it gradually extends a musical sketch from short to long and refines the details from coarse to fine-grained. The interleaved generation paradigm effectively integrates prior semantic and acoustic context to guide the generation process.
- Experimental results demonstrate that SongBloom outperforms existing methods across both subjective and objective metrics and achieves performance comparable to the state-of-the-art commercial music generation platforms.
-
- ![img](docs/architecture.png)
-
- Demo page: [https://cypress-yang.github.io/SongBloom_demo](https://cypress-yang.github.io/SongBloom_demo)
-
- ArXiv: [https://arxiv.org/abs/2506.07634](https://arxiv.org/abs/2506.07634)
-
- ## Prepare Environments
-
- ```bash
- conda create -n SongBloom python==3.8.12
- conda activate SongBloom
-
- # yum install libsndfile
- # pip install torch==2.2.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu118 # For different CUDA version
- pip install -r requirements.txt
- ```
-
- ## Data Preparation
-
- A .jsonl file, where each line is a json object:
-
- ```json
- {
- "idx": "The index of each sample",
- "lyrics": "The lyrics to be generated",
- "prompt_wav": "The path of the style prompt audio",
- }
- ```
-
- One example can be refered to as: [example/test.jsonl](example/test.jsonl)
-
- The prompt wav should be a 10-second, 48kHz audio clip.
-
- The details about lyric format can be found in [docs/lyric_format.md](docs/lyric_format.md).
-
- ## Inference
-
- ```bash
- source set_env.sh
-
- python3 infer.py --input-jsonl example/test.jsonl
-
- # For GPUs with low VRAM like RTX4090, you should set the dtype as bfloat16
- python3 infer.py --input-jsonl example/test.jsonl --dtype bfloat16
-
- # SongBloom also supports flash-attn (optional). To enable it, please install flash-attn (v2.6.3 is used during training) manually and set os.environ['DISABLE_FLASH_ATTN'] = "0" in infer.py:8
- ```
-
- ## Models
-
- | Name | Size | Max Length | Prompt type | 🤗 |
- | -------------------- | ---- | ---------- | ----------- | -------------------------------------------- |
- | songbloom_full_150s | 2B | 2m30s | 10s wav | [link](https://huggingface.co/CypressYang/SongBloom) |
- | songbloom_mulan_150s | 2B | 2m30s | 10s wav / text description | coming soon |
- | ... | | | | |
-
-
-
- ## TODO List
-
- - [ ] Support Text Description
- - [ ] Full version
-
- ## Citation
-
- ```
- @article{yang2025songbloom,
- title={SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement},
- author={Yang, Chenyu and Wang, Shuai and Chen, Hangting and Tan, Wei and Yu, Jianwei and Li, Haizhou},
- journal={arXiv preprint arXiv:2506.07634},
- year={2025}
- }
- ```
-
- ## License
-
- SongBloom (codes and weights) is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
+ short_description: Online demo for Apple's DiffuCoder-7B-cpGRPO (Diffusion LLM)
+ sdk_version: 5.38.0
+ ---