{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# AudioEditingCode Colab Demo\n",
    "\n",
    "This notebook demonstrates how to use the `AudioEditingCode` repository in Google Colab.\n",
    "\n",
    "## 1. Clone the repository\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!git clone https://github.com/HilaManor/AudioEditingCode.git\n",
    "%cd AudioEditingCode\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Install dependencies\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -r requirements.txt\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Demo Usage\n",
    "\n",
    "The examples below run the repository's text-based and unsupervised editing scripts on a sample audio file, which is downloaded first.\n",
    "\n",
    "### Download example audio\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget https://www.soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3 -O input_audio.mp3\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Text-Based Editing Example\n",
    "\n",
    "This example uses `main_run.py` for text-based audio editing. A Hugging Face token is only required for gated models such as Stable Audio Open. If you use one, create a token in your [Hugging Face settings](https://huggingface.co/settings/tokens) and replace `<YOUR_HF_TOKEN>` below.\n"
   ]
  },
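  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Paste your Hugging Face token here if the model you use is gated.\n",
    "# The variable is set only once the placeholder has been replaced, so an\n",
    "# invalid placeholder token is never sent with the model download.\n",
    "hf_token = \"<YOUR_HF_TOKEN>\"\n",
    "if hf_token != \"<YOUR_HF_TOKEN>\":\n",
    "    os.environ[\"HF_TOKEN\"] = hf_token\n",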
    "\n",
    "!python code/main_run.py \\\n",
    "    --cfg_tar 1.5 \\\n",
    "    --cfg_src 0.5 \\\n",
    "    --init_aud input_audio.mp3 \\\n",
    "    --target_prompt \"a dog barking\" \\\n",
    "    --tstart 100 \\\n",
    "    --model_id cvssp/audioldm-s-full-v2 \\\n",
    "    --results_path results_text_based\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Unsupervised Editing Example\n",
    "\n",
    "Unsupervised editing runs in two steps. First, extract the principal components from the input audio:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!python code/main_pc_extract_inv.py \\\n",
    "    --init_aud input_audio.mp3 \\\n",
    "    --model_id cvssp/audioldm-s-full-v2 \\\n",
    "    --results_path results_unsupervised_extract \\\n",
    "    --drift_start 0 \\\n",
    "    --drift_end 200 \\\n",
    "    --n_evs 5\n"
   ]
  },
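  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, apply the extracted principal components. `--extraction_path` must point to the `.pt` file written by the extraction step; if the file name below does not match, check the contents of `results_unsupervised_extract`.\n"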
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!python code/main_pc_apply_drift.py \\\n",
    "    --extraction_path results_unsupervised_extract/input_audio_cvssp_audioldm-s-full-v2_inversion_data.pt \\\n",
    "    --drift_start 0 \\\n",
    "    --drift_end 200 \\\n",
    "    --amount 1.0 \\\n",
    "    --evs 0\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}