a-zamfir committed
Commit 924e633 · 1 Parent(s): e42f725

Added application files

.gitignore ADDED
@@ -0,0 +1,5 @@
+ .env
+ *.env
+
+ __pycache__
+ *.pyc
README.md CHANGED
@@ -1,13 +1,108 @@
1
  ---
2
  title: IRIS
3
- emoji: 🏃
4
- colorFrom: green
5
- colorTo: gray
6
  sdk: gradio
7
  sdk_version: 5.33.1
8
  app_file: app.py
9
  pinned: false
10
- short_description: 'IRIS is an agentic chatbot '
 
 
11
  ---
12
 
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
  title: IRIS
3
+ emoji: 💬
4
+ colorFrom: yellow
5
+ colorTo: purple
6
  sdk: gradio
7
  sdk_version: 5.33.1
8
  app_file: app.py
9
  pinned: false
10
+ short_description: IRIS - HuggingFace Hackathon
11
+ tags:
12
+ - agent-demo-track
13
  ---
14
 
15
+ # IRIS
16
+
17
+ ## Important
18
+
19
+ 1. **Watch** IRIS' video overview here: https://www.youtube.com/watch?v=dieWyZZez6o
20
+ 2. **IRIS does not work on Spaces! It requires a virtualization environment on AWS or Azure (or a local machine), because its MCP server targets Hyper-V virtual machines.**
21
+
22
+ ## Overview
23
+
24
+ IRIS is an agentic chatbot proof-of-concept built for the HuggingFace Hackathon. It demonstrates how a multimodal AI assistant can:
25
+
26
+ - **Listen** to voice commands (STT)
27
+ - **Speak** AI responses (TTS)
28
+ - **See** user screens and analyze them with a vision model
29
+ - **Act** on infrastructure via an MCP integration
30
+
31
+ The goal is to showcase how modern LLMs, audio models, vision models and operator toolchains can be combined into a seamless, voice-driven infrastructure management assistant.
32
+
33
+ ## Key Goals
34
+
35
+ 1. **Multimodal Interaction**
36
+ - Voice: real-time speech-to-text (STT) and text-to-speech (TTS)
37
+ - Vision: live screen capture + AI analysis
38
+ - Text: conversational UI backed by an LLM
39
+
40
+ 2. **Agentic Control**
41
+ - Automatically detect when to call management tools
42
+ - Execute Hyper-V VM operations through a RESTful MCP server
43
+
44
+ 3. **Proof-of-Concept (POC)**
45
+ - Focus on clarity and modularity
46
+ - Demonstrate core concepts rather than production-grade polish
47
+
48
+ ## Functionalities & Offerings
49
+
50
+ ### 1. Audio Service
51
+ - **STT**: Uses HuggingFace’s fal-ai inference provider (running an OpenAI Whisper model) to transcribe user speech; see the sketch below.
+ - **TTS**: Leverages a HuggingFace TTS model (default: `canopylabs/orpheus-3b-0.1-ft`) to speak back responses.
53
+
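+ A minimal sketch of the two calls behind this service, mirroring `services/audio_service.py` (the audio file names are placeholders):
+
+ ```python
+ import os
+ from huggingface_hub import InferenceClient
+
+ token = os.environ["HF_TOKEN"]
+
+ # Speech-to-text through the fal-ai provider (Whisper large v3)
+ asr = InferenceClient(provider="fal-ai", api_key=token)
+ result = asr.automatic_speech_recognition("question.wav", model="openai/whisper-large-v3")
+ print(result.text)
+
+ # Text-to-speech through the default HF provider
+ tts = InferenceClient(token=token)
+ audio_bytes = tts.text_to_speech("All virtual machines are running.", model="canopylabs/orpheus-3b-0.1-ft")
+ with open("reply.wav", "wb") as f:
+     f.write(audio_bytes)
+ ```
+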
54
+ ### 2. Text (LLM) Service
55
+ - Built on HuggingFace’s InferenceClient, with an OpenAI fallback (see the sketch below).
56
+ - Default model: `Qwen/Qwen2.5-7B-Instruct` (configurable).
57
+ - Handles chat prompt orchestration, reasoning-before-action, and tool-call formatting.
58
+
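+ A sketch of a single chat completion as issued by `services/llm_service.py` (the defaults below come from `config/settings.py`; the messages are illustrative):
+
+ ```python
+ import os
+ from huggingface_hub import InferenceClient
+
+ client = InferenceClient(token=os.environ["HF_TOKEN"])
+ response = client.chat_completion(
+     messages=[
+         {"role": "system", "content": "You are a Hyper-V management assistant."},
+         {"role": "user", "content": "List my virtual machines."},
+     ],
+     model="Qwen/Qwen2.5-7B-Instruct",
+     max_tokens=512,
+     temperature=0.01,
+ )
+ print(response.choices[0].message.content)
+ ```
+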
59
+ ### 3. Vision & Screen Service
60
+ - Captures your monitor at configurable FPS and resolution.
61
+ - Sends images to a Nebius vision model (`google/gemma-3-27b-it`) with a guided prompt.
62
+ - Parses vision output into “Issue Found / Description / Recommendation”.
63
+
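+ A sketch of one vision call, mirroring `services/screen_service.py` (the screenshot path is a placeholder; `NEBIUS_API_KEY` comes from your `.env`):
+
+ ```python
+ import base64
+ import os
+ from openai import OpenAI
+
+ client = OpenAI(base_url="https://api.studio.nebius.com/v1/", api_key=os.environ["NEBIUS_API_KEY"])
+
+ with open("screenshot.png", "rb") as f:
+     frame_b64 = base64.b64encode(f.read()).decode("utf-8")
+
+ resp = client.chat.completions.create(
+     model="google/gemma-3-27b-it",
+     messages=[{
+         "role": "user",
+         "content": [
+             {"type": "text", "text": "Describe any issue visible on the right side of the screen."},
+             {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{frame_b64}"}},
+         ],
+     }],
+ )
+ print(resp.choices[0].message.content)
+ ```
+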
64
+ ### 4. MCP Integration
65
+ - **Hyper-V MCP Server**: FastAPI service exposing tools to list, query, start, stop, and restart VMs.
66
+ - Agent parses LLM tool calls and invokes them via HTTP.
67
+ - Enables fully automated infrastructure actions in response to user voice commands.
68
+
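+ A sketch of how the agent invokes a tool over the server’s REST API (same request shape as `MCPRestClient` in `app.py`; assumes the server is running locally on port 8000, and the VM name is only an example):
+
+ ```python
+ import requests
+
+ base = "http://localhost:8000"
+
+ # Discover the available tools
+ tools = requests.get(f"{base}/tools", timeout=5).json()["tools"]
+ print([t["name"] for t in tools])
+
+ # Call one of them, e.g. start a VM
+ resp = requests.post(
+     f"{base}/tools/call",
+     json={"name": "start_vm", "arguments": {"vm_name": "Accounting"}},
+     timeout=30,
+ )
+ print(resp.json())
+ ```
+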
69
+ ## Providers & Configuration
70
+
71
+ | Service | Provider / Model |
72
+ |--------------------|--------------------------------------------------|
73
+ | LLM | HuggingFace Inference (fallback: OpenAI) |
74
+ | STT | HF fal-ai provider (with HF token) or OpenAI Whisper |
75
+ | TTS | HF TTS (`canopylabs/orpheus-3b-0.1-ft`) |
76
+ | Vision | Nebius (`google/gemma-3-27b-it`) |
77
+ | MCP (VM control) | Custom Hyper-V FastAPI server |
78
+ | UI Framework | Gradio |
79
+
80
+ All credentials and endpoints are managed via environment variables in `config/settings.py`.
81
+
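+ As a reference, a minimal sketch of loading those settings (the variable names are the ones read in `config/settings.py`):
+
+ ```python
+ from dotenv import load_dotenv
+ from config.settings import Settings
+
+ load_dotenv()  # reads HF_TOKEN, NEBIUS_API_KEY, NEBIUS_MODEL, STT_MODEL, TTS_MODEL, ...
+ settings = Settings()
+ print(settings.effective_llm_provider, settings.effective_model_name)
+ ```
+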
82
+ ## Quickstart
83
+
84
+ 1. **Configure** `.env` with your HF and Nebius tokens (and, optionally, an OpenAI token).
85
+ 2. **Run** the Hyper-V MCP server:
86
+
87
+ ```bash
88
+ python mcp_servers/hyperv_mcp.py
89
+ ```
90
+
91
+ 3. **Launch** the Gradio app:
92
+
93
+ ```bash
94
+ python app.py
95
+ ```
96
+
97
+ 4. **Interact** by typing or speaking.
98
+
99
+ Click “Start sharing screen” to begin vision analysis.
100
+
101
+ Ask IRIS to list VMs, check status, or start a VM by voice.
102
+
103
+ IRIS will confirm actions and execute them through the MCP.
104
+
105
+ ## Contact
106
+
107
+ <a.zamfir@hotmail.com>
108
+ LinkedIn: Andrei Zamfir <https://www.linkedin.com/in/andrei-d-zamfir/>
app.py ADDED
@@ -0,0 +1,405 @@
1
+ import gradio as gr
2
+ import asyncio
3
+ import logging
4
+ import tempfile
5
+ import json
6
+ import re
7
+ import requests
8
+ from typing import Optional, Dict, Any, List, Tuple
9
+
10
+ from services.audio_service import AudioService
11
+ from services.llm_service import LLMService
12
+ from services.screen_service import ScreenService
13
+ from config.settings import Settings
14
+ from config.prompts import get_generic_prompt, get_vision_prompt
15
+
16
+ # Configure root logger
17
+ logging.basicConfig(level=logging.INFO)
18
+ logger = logging.getLogger(__name__)
19
+
20
+ class MCPRestClient:
21
+ def __init__(self, base_url: str = "http://localhost:8000"):
22
+ self.base_url = base_url.rstrip('/')
23
+
24
+ async def initialize(self):
25
+ """Test connection to MCP server"""
26
+ try:
27
+ response = requests.get(f"{self.base_url}/", timeout=5)
28
+ if response.status_code == 200:
29
+ logger.info("Successfully connected to MCP server")
30
+ else:
31
+ raise ConnectionError(f"MCP server returned status {response.status_code}")
32
+ except Exception as e:
33
+ logger.error(f"Failed to connect to MCP server at {self.base_url}: {e}")
34
+ logger.info("IRIS did not detect any MCP server. If you're running this in a HuggingFace Space, please refer to the README.md documentation.")
35
+ raise
36
+
37
+ async def get_available_tools(self) -> Dict[str, Dict]:
38
+ """Get list of available tools from MCP server"""
39
+ try:
40
+ response = requests.get(f"{self.base_url}/tools", timeout=5)
41
+ if response.status_code == 200:
42
+ data = response.json()
43
+ tools = {}
44
+ for tool in data.get("tools", []):
45
+ tools[tool["name"]] = {
46
+ "description": tool.get("description", ""),
47
+ "inputSchema": tool.get("inputSchema", {})
48
+ }
49
+ return tools
50
+ else:
51
+ logger.error(f"Failed to get tools: HTTP {response.status_code}")
52
+ return {}
53
+ except Exception as e:
54
+ logger.error(f"Failed to get tools: {e}")
55
+ return {}
56
+
57
+ async def call_tool(self, tool_name: str, arguments: Dict[str, Any]) -> Any:
58
+ """Call a tool on the MCP server"""
59
+ try:
60
+ payload = {
61
+ "name": tool_name,
62
+ "arguments": arguments
63
+ }
64
+
65
+ response = requests.post(
66
+ f"{self.base_url}/tools/call",
67
+ json=payload,
68
+ timeout=30
69
+ )
70
+
71
+ if response.status_code == 200:
72
+ data = response.json()
73
+ if data.get("success"):
74
+ return data.get("result")
75
+ else:
76
+ return {"error": data.get("error", "Unknown error")}
77
+ else:
78
+ return {"error": f"HTTP {response.status_code}: {response.text}"}
79
+
80
+ except Exception as e:
81
+ return {"error": str(e)}
82
+
83
+ async def close(self):
84
+ """Nothing to close with requests"""
85
+ pass
86
+
87
+ class AgenticChatbot:
88
+ def __init__(self):
89
+ self.settings = Settings()
90
+
91
+ # AudioService
92
+ audio_api_key = (
93
+ self.settings.hf_token
94
+ if self.settings.effective_audio_provider == "huggingface"
95
+ else self.settings.openai_api_key
96
+ )
97
+ self.audio_service = AudioService(
98
+ api_key=audio_api_key,
99
+ stt_provider="fal-ai",
100
+ stt_model=self.settings.stt_model,
101
+ tts_model=self.settings.tts_model,
102
+ )
103
+
104
+ # LLMService
105
+ self.llm_service = LLMService(
106
+ api_key=self.settings.llm_api_key,
107
+ model_name=self.settings.effective_model_name,
108
+ )
109
+
110
+ # MCPService - Now using REST client
111
+ mcp_server_url = getattr(self.settings, 'mcp_server_url', 'http://localhost:8000')
112
+ self.mcp_service = MCPRestClient(mcp_server_url)
113
+
114
+ # ScreenService
115
+ self.screen_service = ScreenService(
116
+ prompt=get_vision_prompt(),
117
+ model=self.settings.NEBIUS_MODEL,
118
+ fps=0.05,
119
+ queue_size=2,
120
+ monitor=1,
121
+ compression_quality=self.settings.screen_compression_quality,
122
+ max_width=self.settings.max_width,
123
+ max_height=self.settings.max_height,
124
+ )
125
+ self.latest_screen_context: str = ""
126
+ self.conversation_history: List[Dict[str, Any]] = []
127
+
128
+ async def initialize(self):
129
+ try:
130
+ await self.mcp_service.initialize()
131
+ tools = await self.mcp_service.get_available_tools()
132
+ logger.info(f"Initialized with {len(tools)} MCP tools")
133
+ except Exception as e:
134
+ logger.error(f"MCP init failed: {e}")
135
+
136
+ # Screen callbacks
137
+ def _on_screen_result(self, resp: dict, latency: float, frame_b64: str):
138
+ try:
139
+ content = resp.choices[0].message.content
140
+ except Exception:
141
+ content = str(resp)
142
+ self.latest_screen_context = content
143
+ logger.info(f"[Screen] {latency*1000:.0f}ms → {content}")
144
+
145
+ def _get_conversation_history(self) -> List[Dict[str, str]]:
146
+ """Return the current conversation history for the screen service"""
147
+ return self.conversation_history.copy()
148
+
149
+ def start_screen_sharing(self) -> str:
150
+ self.latest_screen_context = ""
151
+ # Pass the history getter method to screen service
152
+ self.screen_service.start(
153
+ self._on_screen_result,
154
+ history_getter=self._get_conversation_history # Use the method reference
155
+ )
156
+ return "✅ Screen sharing started."
157
+
158
+ async def stop_screen_sharing(
159
+ self,
160
+ history: Optional[List[Dict[str, str]]]
161
+ ) -> Tuple[List[Dict[str, str]], str, Optional[str]]:
162
+ """Stop screen sharing and append an LLM-generated summary to the chat."""
163
+ # Stop capture
164
+ self.screen_service.stop()
165
+
166
+ # Get the latest vision context
167
+ vision_ctx = self.latest_screen_context
168
+
169
+ if vision_ctx and history is not None:
170
+ # Call process_message with the vision context as user input
171
+ updated_history, audio_path = await self.process_message(
172
+ text_input=f"VISION MODEL OUTPUT: {vision_ctx}",
173
+ audio_input=None,
174
+ history=history
175
+ )
176
+ return updated_history, "🛑 Screen sharing stopped.", audio_path
177
+
178
+ # If no vision context or history, just return
179
+ return history or [], "🛑 Screen sharing stopped.", None
180
+
181
+ async def execute_tool_calls(self, response_text: str) -> str:
182
+ """Parse and execute function calls from LLM response using robust regex parsing"""
183
+
184
+ # Clean the response text - remove code blocks and extra formatting
185
+ cleaned_text = re.sub(r'```[a-zA-Z]*\n?', '', response_text) # Remove code block markers
186
+ cleaned_text = re.sub(r'\n```', '', cleaned_text) # Remove closing code blocks
187
+
188
+ # Pattern for function calls: function_name(arg1="value1", arg2=value2, arg3=true)
189
+ function_pattern = r'(\w+)\s*\(\s*([^)]*)\s*\)'
190
+
191
+ results = []
192
+
193
+ # Find all function calls in the cleaned response
194
+ for match in re.finditer(function_pattern, cleaned_text):
195
+ tool_name = match.group(1)
196
+ args_str = match.group(2).strip()
197
+
198
+ # Skip if this isn't actually a tool (check against available tools)
199
+ available_tools = await self.mcp_service.get_available_tools()
200
+ if tool_name not in available_tools:
201
+ continue
202
+
203
+ try:
204
+ # Parse arguments using regex for key=value pairs
205
+ args = {}
206
+ if args_str:
207
+ # Pattern for key=value pairs, handling quoted strings, numbers, booleans
208
+ arg_pattern = r'(\w+)\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|(\w+))'
209
+
210
+ for arg_match in re.finditer(arg_pattern, args_str):
211
+ key = arg_match.group(1)
212
+ # Get the value from whichever group matched (quoted or unquoted)
213
+ value = (arg_match.group(2) or
214
+ arg_match.group(3) or
215
+ arg_match.group(4))
216
+
217
+ # Type conversion for common types
218
+ if value.lower() == 'true':
219
+ args[key] = True
220
+ elif value.lower() == 'false':
221
+ args[key] = False
222
+ elif value.isdigit():
223
+ args[key] = int(value)
224
+ elif value.replace('.', '').isdigit():
225
+ args[key] = float(value)
226
+ else:
227
+ args[key] = value
228
+
229
+ # Execute the tool
230
+ logger.info(f"Executing tool: {tool_name} with args: {args}")
231
+ result = await self.mcp_service.call_tool(tool_name, args)
232
+ results.append({
233
+ 'tool': tool_name,
234
+ 'args': args,
235
+ 'result': result
236
+ })
237
+
238
+ except Exception as e:
239
+ results.append({
240
+ 'tool': tool_name,
241
+ 'args': args if 'args' in locals() else {},
242
+ 'error': str(e)
243
+ })
244
+
245
+ # Format results for LLM
246
+ if not results:
247
+ return ""
248
+
249
+ formatted_results = []
250
+ for result in results:
251
+ if 'error' in result:
252
+ formatted_results.append(
253
+ f"Tool {result['tool']} failed: {result['error']}"
254
+ )
255
+ else:
256
+ formatted_results.append(
257
+ f"Tool {result['tool']} executed successfully:\n{json.dumps(result['result'], indent=2)}"
258
+ )
259
+
260
+ return "\n\n".join(formatted_results)
261
+
262
+ # Chat / tool integration
263
+ async def generate_response(
264
+ self,
265
+ user_input: str,
266
+ screen_context: str = "",
267
+ tool_result: str = ""
268
+ ) -> str:
269
+ # Retrieve available tools metadata
270
+ tools = await self.mcp_service.get_available_tools()
271
+ # Format tool list for prompt
272
+ tool_desc = "\n".join(f"- {name}: {info.get('description','')}" for name, info in tools.items())
273
+
274
+ # Build messages
275
+ messages: List[Dict[str, str]] = [
276
+ {"role": "system", "content": get_generic_prompt()},
277
+ ]
278
+ # Inform LLM about tools
279
+ if tool_desc:
280
+ messages.append({"role": "system", "content": f"Available tools:\n{tool_desc}"})
281
+ messages.append({"role": "user", "content": user_input})
282
+ if tool_result:
283
+ messages.append({"role": "assistant", "content": tool_result})
284
+
285
+ return await self.llm_service.get_chat_completion(messages)
286
+
287
+ async def process_message(
288
+ self,
289
+ text_input: str,
290
+ audio_input: Optional[str],
291
+ history: List[Dict[str, str]]
292
+ ) -> Tuple[List[Dict[str, str]], Optional[str]]:
293
+ # Debug: Log the incoming state
294
+ logger.info("=== PROCESS_MESSAGE START ===")
+ # Log the last few messages with their true indices (guard against short histories)
+ start_idx = max(0, len(history) - 3)
+ for i, msg in enumerate(history[-3:]):
+ logger.info(f" {start_idx + i}: {msg.get('role')} - {msg.get('content', '')[:100]}...")
297
+
298
+ # Update the internal conversation history to match the UI history
299
+ self.conversation_history = history.copy()
300
+
301
+ # STT
302
+ transcript = ""
303
+ if audio_input:
304
+ transcript = await self.audio_service.speech_to_text(audio_input)
305
+ user_input = (text_input + " " + transcript).strip()
306
+
307
+ # If no input, return unchanged
308
+ if not user_input:
309
+ return history, None
310
+
311
+ # Check if this is a vision model output being processed
312
+ is_vision_output = user_input.startswith("VISION MODEL OUTPUT:")
313
+
314
+ # Add user message to both histories (ALWAYS add the user input)
315
+ user_message = {"role": "user", "content": user_input}
316
+ history.append(user_message)
317
+ self.conversation_history.append(user_message)
318
+
319
+ # Handle screen context - only for regular user inputs, not vision outputs
320
+ screen_ctx = ""
321
+ if not is_vision_output and self.latest_screen_context:
322
+ screen_ctx = self.latest_screen_context
323
+ # Clear the screen context after using it to prevent reuse
324
+ self.latest_screen_context = ""
325
+
326
+ # Get initial LLM response (may include tool calls)
327
+ assistant_reply = await self.generate_response(user_input, screen_ctx)
328
+
329
+ # Check if response contains function calls and execute them
330
+ tool_results = await self.execute_tool_calls(assistant_reply)
331
+ if tool_results:
332
+ tool_message = {"role": "assistant", "content": tool_results}
333
+ history.append(tool_message)
334
+ self.conversation_history.append(tool_message)
335
+ # Get final response after tool execution
336
+ assistant_reply = await self.generate_response(user_input, screen_ctx, tool_results)
337
+
338
+ # ALWAYS add the final assistant response to both histories
339
+ assistant_message = {"role": "assistant", "content": assistant_reply}
340
+ history.append(assistant_message)
341
+ self.conversation_history.append(assistant_message)
342
+
343
+ # TTS - only speak the assistant reply for regular inputs
344
+ audio_path = None
345
+ audio_bytes = await self.audio_service.text_to_speech(assistant_reply)
346
+ if audio_bytes:
347
+ tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
348
+ tmp.write(audio_bytes)
349
+ tmp.close()
350
+ audio_path = tmp.name
351
+
352
+ logger.info("=== PROCESS_MESSAGE END ===")
353
+
354
+ return history, audio_path
355
+
356
+ async def cleanup(self):
357
+ """Cleanup resources"""
358
+ await self.mcp_service.close()
359
+
360
+ # ——————————————————————————————————
361
+ # Gradio interface setup
362
+ # ——————————————————————————————————
363
+
364
+ chatbot = AgenticChatbot()
365
+
366
+ async def setup_gradio_interface() -> gr.Blocks:
367
+ await chatbot.initialize()
368
+
369
+ with gr.Blocks(title="Agentic Chatbot", theme=gr.themes.Soft()) as demo:
370
+ chat = gr.Chatbot(type="messages", label="Conversation")
371
+ text_input = gr.Textbox(lines=2, placeholder="Type your message…", label="Text")
372
+ audio_input = gr.Audio(sources=["microphone"], type="filepath", label="Voice")
373
+
374
+ # Screen-sharing controls
375
+ screen_status = gr.Textbox(label="Screen Sharing Status", interactive=False)
376
+ start_btn = gr.Button("Start sharing screen")
377
+ stop_btn = gr.Button("Stop sharing screen")
378
+
379
+ # AI response audio player (including vision TTS)
380
+ audio_output = gr.Audio(label="AI Response", autoplay=True)
381
+
382
+ # Message send
383
+ send_btn = gr.Button("Send", variant="primary")
384
+
385
+ # Wire up buttons
387
+ start_btn.click(fn=chatbot.start_screen_sharing, inputs=None, outputs=screen_status)
388
+ stop_btn.click(fn=chatbot.stop_screen_sharing, inputs=[chat], outputs=[chat, screen_status, audio_output])
389
+
390
+ send_btn.click(
391
+ chatbot.process_message,
392
+ inputs=[text_input, audio_input, chat],
393
+ outputs=[chat, audio_output]
394
+ )
395
+ text_input.submit(
396
+ chatbot.process_message,
397
+ inputs=[text_input, audio_input, chat],
398
+ outputs=[chat, audio_output]
399
+ )
400
+
401
+ return demo
402
+
403
+ if __name__ == "__main__":
404
+ demo = asyncio.run(setup_gradio_interface())
405
+ demo.launch(server_name="0.0.0.0", server_port=7860)
config/prompts.py ADDED
@@ -0,0 +1,155 @@
1
+ '''
2
+ Prompts module for text and vision in the Agentic Chatbot.
3
+ '''
4
+
5
+ # Prompt used for all generic text-based interactions
6
+ GENERIC_PROMPT = '''
7
+ You are a Hyper‑V virtual machine management assistant: help users manage Hyper‑V VMs by providing clear guidance and executing management commands when explicitly requested.
8
+ You are able to receive automated image analysis through the usage of a Vision-compatible model. The user is able to share this data to you as text. Accept screen share requests.
9
+
10
+ When a user shares screen data, they will provide it as text prefixed with VISION MODEL OUTPUT:—you must treat that as their input and respond accordingly.
11
+
12
+ Provide conversational answers for general queries. When users request VM management actions, follow a strict reasoning‑before‑action structure and execute the appropriate tool functions directly. If you receive input beginning with VISION MODEL OUTPUT:, parse only its Recommendation section and return a concise remediation step based solely on that recommendation.
13
+
14
+ Tools
15
+ list_vms(): List all virtual machines and their current status
16
+
17
+ get_vm_status(vm_name="[VMName]"): Get detailed status for a specific VM
18
+
19
+ start_vm(vm_name="[VMName]"): Start a virtual machine
20
+
21
+ stop_vm(vm_name="[VMName]", force=[true|false]): Stop a VM (force=true for hard shutdown)
22
+
23
+ restart_vm(vm_name="[VMName]", force=[true|false]): Restart a VM
24
+
25
+ Steps
26
+ 1 Detect Vision Input
27
+
28
+ If user input starts with VISION MODEL OUTPUT:, skip normal steps and go to Vision Response.
29
+
30
+ 2 Understand the user’s request:
31
+
32
+ General guidance → respond conversationally.
33
+
34
+ VM management action → proceed to step 3.
35
+
36
+ 3 Plan: Identify which tool(s) to call.
37
+
38
+ 4 Action: State the action, then place the function call on its own line.
39
+
40
+ 5 Analysis: After output returns, interpret the results.
41
+
42
+ 6 Follow‑up: Suggest next steps or ask clarifying questions if needed.
43
+
44
+ Vision Response
45
+ Input: Text prefixed with VISION MODEL OUTPUT: containing a “Recommendation:” line.
46
+
47
+ Output: A single concise instruction telling the user what to do next, based only on that Recommendation.
48
+
49
+ Output Format
50
+
51
+ Description of the action, then the exact function call.
52
+
53
+ Interpretation of results.
54
+
55
+ Follow‑up: optional question or suggestion.
56
+
57
+ Vision Response: a short sentence or two reflecting only the vision Recommendation.
58
+
59
+ Examples
60
+ Example 1 – Listing VMs
61
+ User: “Show me all my virtual machines.”
62
+ Assistant:
63
+ list_vms()
64
+
65
+ Example 2 – Starting a VM
66
+ User: “Please start the Accounting VM.”
67
+ Assistant: start_vm(vm_name="Accounting")
68
+
69
+ Example 3 – Vision Input
70
+ User:
71
+ VISION MODEL OUTPUT:
72
+ Issue Found: Yes
73
+ Location: “Accounting” row
74
+ Recommendation: Select the “Accounting” virtual machine and initiate the “Start” action to bring it online.
75
+ Assistant (Vision Response):
76
+ Select the “Accounting” VM and run the Start action to power it on.
77
+
78
+ Example 4 – Screen Share Initiation and Vision Flow
79
+ User: “Hello. I have an issue with one of my virtual machines. I’ll share screen so you can see.”
80
+ Assistant:
81
+ Sure — please provide the screen data so I can analyze it and guide you.
82
+
83
+ Notes
84
+ Default to graceful shutdown (force=false) unless specified.
85
+
86
+ Only execute tool calls when explicitly requested.
87
+
88
+ Reasoning must always precede Action; conclusions must appear last.
89
+ '''
90
+
91
+ # Prompt used when analyzing visual or screen content
92
+ VISION_PROMPT = '''
93
+ Analyze screen-sharing images to identify and describe issues mentioned in conversation history, focusing on the right side of the screen.
94
+ You are an AI assistant with vision capabilities specialized in analyzing screen-sharing images. Your role is to examine images and identify issues or elements discussed in the conversation history, with particular attention to the right side of the screen where your target area is located.
95
+ Steps
96
+
97
+ Review Conversation History: Carefully read through the provided conversation history to understand:
98
+
99
+ What issue or problem the user is experiencing
100
+ What specific elements, errors, or concerns they've mentioned
101
+ Their goals and what they're trying to accomplish
102
+
103
+
104
+ Analyze the Image: Examine the provided screen-sharing image with focus on:
105
+
106
+ The right side of the screen (primary target area)
107
+ Visual elements that relate to the user's described issue
108
+ Any error messages, UI problems, or anomalies
109
+ Relevant text, buttons, or interface elements
110
+
111
+
112
+ Identify the Issue: Based on your analysis:
113
+
114
+ Locate the specific issue mentioned by the user
115
+ Note its exact position and visual characteristics
116
+ Gather relevant details about the problem
117
+
118
+
119
+ Report Findings: Provide clear information about:
120
+
121
+ Whether you found the issue
122
+ Exact location and description of the problem
123
+ Any relevant surrounding context or related elements
124
+
125
+ Output Format
126
+ Provide a structured response containing:
127
+
128
+ Issue Found: Yes/No
129
+ Description: Detailed explanation of what you observe
130
+ Recommendation: Brief suggestion if applicable
131
+
132
+ If the issue cannot be located, clearly state this and explain what you were able to observe instead.
133
+ Examples
134
+ Example 1:
135
+ Input: [Conversation history shows user reporting an unreachable virtual machine]
136
+ Output:
137
+ Issue Found: Yes
138
+ Description: The screen share shows a HyperV environment. The referenced VM seems to be powered off.
139
+ Recommendation: The user should click on the "Start" button on the lower side of the right column.
140
+
141
+
142
+ Always prioritize the right side of the screen as specified, but don't ignore relevant information elsewhere if it relates to the issue
143
+ Be specific about visual elements - colors, text, positioning, and states (enabled/disabled, selected/unselected)
144
+ If multiple potential issues are visible, focus on the one most relevant to the conversation history
145
+ Consider common UI issues: missing elements, misalignment, error states, loading problems, or unexpected behavior
146
+ '''
147
+
148
+ def get_generic_prompt() -> str:
149
+ """Return the generic text prompt."""
150
+ return GENERIC_PROMPT
151
+
152
+
153
+ def get_vision_prompt() -> str:
154
+ """Return the vision analysis prompt."""
155
+ return VISION_PROMPT
config/settings.py ADDED
@@ -0,0 +1,84 @@
1
+ import os
2
+ from dataclasses import dataclass
3
+ from typing import Optional
4
+ from pathlib import Path
5
+ from dotenv import load_dotenv
6
+ load_dotenv()
7
+
8
+ @dataclass
9
+ class Settings:
10
+ """Application-wide configuration settings."""
11
+
12
+ # LLM Provider settings
13
+ llm_provider: str = os.getenv("LLM_PROVIDER", "auto")
14
+
15
+ # Hugging Face settings
16
+ hf_token: str = os.getenv("HF_TOKEN", "")
17
+ hf_chat_model: str = os.getenv("HF_CHAT_MODEL", "Qwen/Qwen2.5-7B-Instruct")
18
+ hf_temperature: float = 0.001
19
+ hf_max_new_tokens: int = 512
20
+
21
+ # Model settings
22
+ model_name: str = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-7B-Instruct")
23
+
24
+ # Audio provider settings
25
+ audio_provider: str = os.getenv("AUDIO_PROVIDER", "auto")
26
+ tts_model: str = os.getenv("TTS_MODEL", "canopylabs/orpheus-3b-0.1-ft")
27
+ stt_model: str = os.getenv("STT_MODEL", "openai/whisper-large-v3")
+
+ # OpenAI fallback settings (used by llm_api_key / llm_endpoint below; env var names are assumptions)
+ openai_api_key: str = os.getenv("OPENAI_API_KEY", "")
+ openai_endpoint: str = os.getenv("OPENAI_ENDPOINT", "https://api.openai.com/v1")
+
30
+ # Screen sharing settings
31
+ screen_capture_interval: float = float(os.getenv("SCREEN_CAPTURE_INTERVAL", "1.0"))
32
+ screen_compression_quality: int = int(os.getenv("SCREEN_COMPRESSION_QUALITY", "50"))
33
+ max_width: int = int(os.getenv("SCREEN_MAX_WIDTH", "3440"))
34
+ max_height: int = int(os.getenv("SCREEN_MAX_HEIGHT", "1440"))
35
+ NEBIUS_MODEL: str = os.getenv("NEBIUS_MODEL", "google/gemma-3-27b-it")
36
+ NEBIUS_API_KEY: str = os.getenv("NEBIUS_API_KEY", "Not found")
37
+ NEBIUS_BASE_URL: str = os.getenv("NEBIUS_BASE_URL", "https://api.studio.nebius.com/v1/")
38
+
39
+ # Hyper-V settings
40
+ hyperv_enabled: bool = os.getenv("HYPERV_ENABLED", "false").lower() == "true"
41
+ hyperv_host: str = os.getenv("HYPERV_HOST", "localhost")
42
+ hyperv_username: Optional[str] = os.getenv("HYPERV_USERNAME")
43
+ hyperv_password: Optional[str] = os.getenv("HYPERV_PASSWORD")
44
+
45
+ # Application settings
46
+ max_conversation_history: int = int(os.getenv("MAX_CONVERSATION_HISTORY", "50"))
47
+ temp_dir: str = os.getenv("TEMP_DIR", "./temp")
48
+ log_level: str = os.getenv("LOG_LEVEL", "INFO")
49
+
50
+ def __post_init__(self):
51
+ # Ensure necessary directories exist
52
+ Path(self.temp_dir).mkdir(exist_ok=True, parents=True)
53
+ Path("./config").mkdir(exist_ok=True, parents=True)
54
+ Path("./logs").mkdir(exist_ok=True, parents=True)
55
+
56
+ def is_hf_token_valid(self) -> bool:
57
+ return bool(self.hf_token and len(self.hf_token) > 10)
58
+
59
+ @property
60
+ def effective_llm_provider(self) -> str:
61
+ if self.llm_provider == "auto":
62
+ return "huggingface" if self.is_hf_token_valid() else "openai"
63
+ return self.llm_provider
64
+
65
+ @property
66
+ def effective_audio_provider(self) -> str:
67
+ if self.audio_provider == "auto":
68
+ return "huggingface" if self.is_hf_token_valid() else "openai"
69
+ return self.audio_provider
70
+
71
+ @property
72
+ def llm_endpoint(self) -> str:
73
+ if self.effective_llm_provider == "huggingface":
74
+ return f"https://api-inference.huggingface.co/models/{self.hf_chat_model}"
75
+ return self.openai_endpoint
76
+
77
+ @property
78
+ def llm_api_key(self) -> str:
79
+ return self.hf_token if self.effective_llm_provider == "huggingface" else self.openai_api_key
80
+
81
+ @property
82
+ def effective_model_name(self) -> str:
83
+ return self.hf_chat_model if self.effective_llm_provider == "huggingface" else self.model_name
84
+
mcp_servers/hyperv_mcp.py ADDED
@@ -0,0 +1,319 @@
1
+ #!/usr/bin/env python
2
+ """
3
+ Standalone FastAPI MCP server for Hyper-V management.
4
+ Server will be available at http://localhost:8000
5
+ """
6
+ import asyncio
7
+ import json
8
+ import logging
9
+ import subprocess
10
+ import sys
11
+ import platform
12
+ from typing import Dict, Any, List, Optional
13
+ from dataclasses import dataclass, asdict
14
+ from fastapi import FastAPI, HTTPException
15
+ from pydantic import BaseModel
16
+ import uvicorn
17
+
18
+ # Ensure SelectorEventLoopPolicy on Windows
19
+ if platform.system() == "Windows":
20
+ try:
21
+ asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
22
+ except AttributeError:
23
+ pass
24
+
25
+ # Setup logging
26
+ logging.basicConfig(level=logging.INFO)
27
+ logger = logging.getLogger("hyperv_mcp_server")
28
+
29
+ # Pydantic models for API
30
+ class ToolCallRequest(BaseModel):
31
+ name: str
32
+ arguments: Dict[str, Any] = {}
33
+
34
+ class ToolInfo(BaseModel):
35
+ name: str
36
+ description: str
37
+ inputSchema: Dict[str, Any]
38
+
39
+ class ToolsListResponse(BaseModel):
40
+ tools: List[ToolInfo]
41
+
42
+ class ToolCallResponse(BaseModel):
43
+ success: bool
44
+ result: Any = None
45
+ error: Optional[str] = None
46
+
47
+ @dataclass
48
+ class VirtualMachine:
49
+ name: str
50
+ state: str
51
+ status: str
52
+
53
+ class HyperVManager:
54
+ def __init__(self, host: str = "localhost", username: Optional[str] = None, password: Optional[str] = None):
55
+ self.host = host
56
+ self.username = username
57
+ self.password = password
58
+
59
+ def _run_powershell(self, command: str) -> str:
60
+ """Execute PowerShell command and return output"""
61
+ try:
62
+ if self.host == "localhost":
63
+ proc = subprocess.run(
64
+ ["powershell", "-Command", command],
65
+ capture_output=True, text=True, shell=True, timeout=30
66
+ )
67
+ if proc.returncode != 0:
68
+ raise RuntimeError(f"PowerShell error: {proc.stderr}")
69
+ return proc.stdout.strip()
70
+ else:
71
+ raise NotImplementedError("Remote host not supported in this server")
72
+ except subprocess.TimeoutExpired:
73
+ raise RuntimeError("PowerShell command timed out")
74
+ except Exception as e:
75
+ logger.error(f"PowerShell execution error: {e}")
76
+ raise
77
+
78
+ async def list_vms(self) -> List[Dict[str, Any]]:
79
+ """List all virtual machines"""
80
+ try:
81
+ cmd = (
82
+ "Get-VM | Select-Object Name,State,Status | "
83
+ "ConvertTo-Json -Depth 2"
84
+ )
85
+ output = await asyncio.get_event_loop().run_in_executor(
86
+ None, self._run_powershell, cmd
87
+ )
88
+
89
+ if not output:
90
+ return []
91
+
92
+ data = json.loads(output)
93
+ if isinstance(data, dict):
94
+ data = [data]
95
+
96
+ vms = []
97
+ for item in data:
98
+ vm = VirtualMachine(
99
+ name=item.get('Name', ''),
100
+ state=item.get('State', ''),
101
+ status=item.get('Status', ''),
102
+ )
103
+ vms.append(asdict(vm))
104
+
105
+ return vms
106
+ except Exception as e:
107
+ logger.error(f"Failed to list VMs: {e}")
108
+ raise
109
+
110
+ async def get_vm_status(self, vm_name: str) -> Dict[str, Any]:
111
+ """Get status of a specific virtual machine"""
112
+ try:
113
+ cmd = (
114
+ f"$vm = Get-VM -Name '{vm_name}' -ErrorAction Stop; "
115
+ "$vm | Select-Object Name,State,Status | "
116
+ "ConvertTo-Json -Depth 2"
117
+ )
118
+ output = await asyncio.get_event_loop().run_in_executor(
119
+ None, self._run_powershell, cmd
120
+ )
121
+
122
+ if not output:
123
+ return {}
124
+
125
+ return json.loads(output)
126
+ except Exception as e:
127
+ logger.error(f"Failed to get VM status for {vm_name}: {e}")
128
+ raise
129
+
130
+ async def start_vm(self, vm_name: str) -> Dict[str, Any]:
131
+ """Start a virtual machine"""
132
+ try:
133
+ cmd = f"Start-VM -Name '{vm_name}' -ErrorAction Stop"
134
+ await asyncio.get_event_loop().run_in_executor(
135
+ None, self._run_powershell, cmd
136
+ )
137
+ return {"success": True, "message": f"VM '{vm_name}' started successfully"}
138
+ except Exception as e:
139
+ logger.error(f"Failed to start VM {vm_name}: {e}")
140
+ raise
141
+
142
+ async def stop_vm(self, vm_name: str, force: bool = False) -> Dict[str, Any]:
143
+ """Stop a virtual machine"""
144
+ try:
145
+ force_flag = "-Force" if force else ""
146
+ cmd = f"Stop-VM -Name '{vm_name}' {force_flag} -ErrorAction Stop"
147
+ await asyncio.get_event_loop().run_in_executor(
148
+ None, self._run_powershell, cmd
149
+ )
150
+ return {"success": True, "message": f"VM '{vm_name}' stopped successfully"}
151
+ except Exception as e:
152
+ logger.error(f"Failed to stop VM {vm_name}: {e}")
153
+ raise
154
+
155
+ async def restart_vm(self, vm_name: str, force: bool = False) -> Dict[str, Any]:
156
+ """Restart a virtual machine"""
157
+ try:
158
+ force_flag = "-Force" if force else ""
159
+ cmd = f"Restart-VM -Name '{vm_name}' {force_flag} -ErrorAction Stop"
160
+ await asyncio.get_event_loop().run_in_executor(
161
+ None, self._run_powershell, cmd
162
+ )
163
+ return {"success": True, "message": f"VM '{vm_name}' restarted successfully"}
164
+ except Exception as e:
165
+ logger.error(f"Failed to restart VM {vm_name}: {e}")
166
+ raise
167
+
168
+
169
+ # Initialize FastAPI app and Hyper-V manager
170
+ app = FastAPI(title="Hyper-V MCP Server", version="1.0.0")
171
+ hyperv_manager = HyperVManager()
172
+
173
+ # Tool definitions
174
+ TOOLS = {
175
+ "list_vms": {
176
+ "name": "list_vms",
177
+ "description": "List all virtual machines on the Hyper-V host",
178
+ "inputSchema": {
179
+ "type": "object",
180
+ "properties": {},
181
+ "required": []
182
+ }
183
+ },
184
+ "get_vm_status": {
185
+ "name": "get_vm_status",
186
+ "description": "Get detailed status information for a specific virtual machine",
187
+ "inputSchema": {
188
+ "type": "object",
189
+ "properties": {
190
+ "vm_name": {"type": "string", "description": "Name of the virtual machine"}
191
+ },
192
+ "required": ["vm_name"]
193
+ }
194
+ },
195
+ "start_vm": {
196
+ "name": "start_vm",
197
+ "description": "Start a virtual machine",
198
+ "inputSchema": {
199
+ "type": "object",
200
+ "properties": {
201
+ "vm_name": {"type": "string", "description": "Name of the virtual machine to start"}
202
+ },
203
+ "required": ["vm_name"]
204
+ }
205
+ },
206
+ "stop_vm": {
207
+ "name": "stop_vm",
208
+ "description": "Stop a virtual machine",
209
+ "inputSchema": {
210
+ "type": "object",
211
+ "properties": {
212
+ "vm_name": {"type": "string", "description": "Name of the virtual machine to stop"},
213
+ "force": {"type": "boolean", "description": "Force stop the VM", "default": False}
214
+ },
215
+ "required": ["vm_name"]
216
+ }
217
+ },
218
+ "restart_vm": {
219
+ "name": "restart_vm",
220
+ "description": "Restart a virtual machine",
221
+ "inputSchema": {
222
+ "type": "object",
223
+ "properties": {
224
+ "vm_name": {"type": "string", "description": "Name of the virtual machine to restart"},
225
+ "force": {"type": "boolean", "description": "Force restart the VM", "default": False}
226
+ },
227
+ "required": ["vm_name"]
228
+ }
229
+ },
230
+ }
231
+
232
+ # API Endpoints
233
+ @app.get("/")
234
+ async def root():
235
+ """Health check endpoint"""
236
+ return {"status": "Hyper-V MCP Server is running", "version": "1.0.0"}
237
+
238
+ @app.get("/tools", response_model=ToolsListResponse)
239
+ async def list_tools():
240
+ """List all available tools"""
241
+ tools = [ToolInfo(**tool_info) for tool_info in TOOLS.values()]
242
+ return ToolsListResponse(tools=tools)
243
+
244
+ @app.post("/tools/call", response_model=ToolCallResponse)
245
+ async def call_tool(request: ToolCallRequest):
246
+ """Execute a tool with given arguments"""
247
+ try:
248
+ tool_name = request.name
249
+ arguments = request.arguments
250
+
251
+ if tool_name not in TOOLS:
252
+ raise HTTPException(status_code=404, detail=f"Tool '{tool_name}' not found")
253
+
254
+ # Get the corresponding method from HyperVManager
255
+ if not hasattr(hyperv_manager, tool_name):
256
+ raise HTTPException(status_code=500, detail=f"Method '{tool_name}' not implemented")
257
+
258
+ method = getattr(hyperv_manager, tool_name)
259
+
260
+ # Call the method with arguments
261
+ if arguments:
262
+ result = await method(**arguments)
263
+ else:
264
+ result = await method()
265
+
266
+ return ToolCallResponse(success=True, result=result)
267
+
268
+ except Exception as e:
269
+ logger.error(f"Tool execution error: {e}")
270
+ return ToolCallResponse(success=False, error=str(e))
271
+
272
+ # Additional convenience endpoints
273
+ @app.get("/vms")
274
+ async def get_vms():
275
+ """Convenience endpoint to list VMs"""
276
+ try:
277
+ result = await hyperv_manager.list_vms()
278
+ return {"success": True, "vms": result}
279
+ except Exception as e:
280
+ raise HTTPException(status_code=500, detail=str(e))
281
+
282
+ @app.get("/vms/{vm_name}")
283
+ async def get_vm(vm_name: str):
284
+ """Convenience endpoint to get VM status"""
285
+ try:
286
+ result = await hyperv_manager.get_vm_status(vm_name)
287
+ return {"success": True, "vm": result}
288
+ except Exception as e:
289
+ raise HTTPException(status_code=500, detail=str(e))
290
+
291
+ @app.post("/vms/{vm_name}/start")
292
+ async def start_vm_endpoint(vm_name: str):
293
+ """Convenience endpoint to start a VM"""
294
+ try:
295
+ result = await hyperv_manager.start_vm(vm_name)
296
+ return result
297
+ except Exception as e:
298
+ raise HTTPException(status_code=500, detail=str(e))
299
+
300
+ @app.post("/vms/{vm_name}/stop")
301
+ async def stop_vm_endpoint(vm_name: str, force: bool = False):
302
+ """Convenience endpoint to stop a VM"""
303
+ try:
304
+ result = await hyperv_manager.stop_vm(vm_name, force)
305
+ return result
306
+ except Exception as e:
307
+ raise HTTPException(status_code=500, detail=str(e))
308
+
309
+ if __name__ == "__main__":
310
+ print("Starting Hyper-V MCP Server...")
311
+ print("Server will be available at: http://localhost:8000")
312
+ print("API documentation at: http://localhost:8000/docs")
313
+
314
+ uvicorn.run(
315
+ app,
316
+ host="0.0.0.0",
317
+ port=8000,
318
+ log_level="info"
319
+ )
requirements.txt ADDED
Binary file (3.51 kB).
 
services/audio_service.py ADDED
@@ -0,0 +1,113 @@
1
+ import io
2
+ import base64
3
+ import logging
4
+ import tempfile
5
+ import asyncio
6
+ from typing import Optional, Union
7
+ from pathlib import Path
8
+
9
+ from huggingface_hub import InferenceClient
10
+
11
+ from config.settings import Settings
12
+
13
+ # Configure logger for detailed debugging
14
+ logger = logging.getLogger(__name__)
15
+ logger.setLevel(logging.DEBUG)
16
+ ch = logging.StreamHandler()
17
+ ch.setLevel(logging.DEBUG)
18
+ formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
19
+ ch.setFormatter(formatter)
20
+ logger.addHandler(ch)
21
+
22
+ class AudioService:
23
+ def __init__(
24
+ self,
25
+ api_key: str,
26
+ stt_provider: str = "fal-ai",
27
+ stt_model: str = "openai/whisper-large-v3",
28
+ tts_model: str = "canopylabs/orpheus-3b-0.1-ft",
29
+ ):
30
+ """
31
+ AudioService with separate providers for ASR and TTS.
32
+
33
+ :param api_key: Hugging Face API token
34
+ :param stt_provider: Provider for speech-to-text (e.g., "fal-ai")
35
+ :param stt_model: ASR model ID
36
+ :param tts_model: TTS model ID
37
+ """
38
+ self.api_key = api_key
39
+ self.stt_model = stt_model
40
+ self.tts_model = tts_model
41
+
42
+ # Speech-to-Text client
43
+ logger.debug(f"Initializing ASR client with provider={stt_provider}")
44
+ self.asr_client = InferenceClient(
45
+ provider=stt_provider,
46
+ api_key=self.api_key,
47
+ )
48
+
49
+ # Text-to-Speech client (no provider needed, use token parameter)
50
+ logger.debug("Initializing TTS client with default provider")
51
+ self.tts_client = InferenceClient(token=self.api_key)
52
+
53
+ logger.info(f"AudioService configured: ASR model={self.stt_model} via {stt_provider}, TTS model={self.tts_model} via default provider.")
54
+
55
+ async def speech_to_text(self, audio_file: Union[str, bytes, io.BytesIO]) -> str:
56
+ """
57
+ Convert speech to text using the configured ASR provider.
58
+ """
59
+ # Prepare input path
60
+ if isinstance(audio_file, str):
61
+ input_path = audio_file
62
+ logger.debug(f"Using existing file for ASR: {input_path}")
63
+ else:
64
+ data = audio_file.getvalue() if isinstance(audio_file, io.BytesIO) else audio_file
65
+ tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
66
+ tmp.write(data)
67
+ tmp.close()
68
+ input_path = tmp.name
69
+ logger.debug(f"Wrote audio to temp file for ASR: {input_path}")
70
+
71
+ # Call ASR synchronously in executor
72
+ try:
73
+ logger.info(f"Calling ASR model={self.stt_model}")
74
+ result = await asyncio.get_event_loop().run_in_executor(
75
+ None,
76
+ lambda: self.asr_client.automatic_speech_recognition(
77
+ input_path,
78
+ model=self.stt_model,
79
+ )
80
+ )
81
+ # Parse result
82
+ transcript = result.get("text") if isinstance(result, dict) else getattr(result, "text", "")
83
+ logger.info(f"ASR success, transcript length={len(transcript)}")
84
+ logger.debug(f"Transcript preview: {transcript[:100]}")
85
+ return transcript or ""
86
+ except Exception as e:
87
+ logger.error(f"ASR error: {e}", exc_info=True)
88
+ return ""
89
+
90
+ async def text_to_speech(self, text: str) -> Optional[bytes]:
91
+ """
92
+ Convert text to speech using the configured TTS provider.
93
+ """
94
+ if not text.strip():
95
+ logger.debug("Empty text input for TTS. Skipping generation.")
96
+ return None
97
+
98
+ def _call_tts():
99
+ """Wrapper function to handle StopIteration properly."""
100
+ try:
101
+ return self.tts_client.text_to_speech(text, model=self.tts_model)
102
+ except StopIteration as e:
103
+ # Convert StopIteration to RuntimeError to prevent Future issues
104
+ raise RuntimeError(f"StopIteration in TTS call: {e}")
105
+
106
+ try:
107
+ logger.info(f"Calling TTS model={self.tts_model}, text length={len(text)}")
108
+ audio = await asyncio.get_event_loop().run_in_executor(None, _call_tts)
109
+ logger.info(f"TTS success, received {len(audio)} bytes")
110
+ return audio
111
+ except Exception as e:
112
+ logger.error(f"TTS error: {e}", exc_info=True)
113
+ return None
services/llm_service.py ADDED
@@ -0,0 +1,73 @@
1
+ import logging
2
+ from typing import Dict, List, Optional
3
+ from dataclasses import dataclass
4
+ from huggingface_hub import InferenceClient
5
+
6
+ from config.settings import Settings
7
+
8
+ # Configure logger for detailed debugging
9
+ logger = logging.getLogger(__name__)
10
+ logger.setLevel(logging.DEBUG)
11
+ ch = logging.StreamHandler()
12
+ ch.setLevel(logging.DEBUG)
13
+ formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
14
+ ch.setFormatter(formatter)
15
+ logger.addHandler(ch)
16
+
17
+ @dataclass
18
+ class LLMConfig:
19
+ api_key: str
20
+ model_name: str
21
+ temperature: float = 0.01
22
+ max_tokens: int = 512
23
+
24
+ class LLMService:
25
+ def __init__(
26
+ self,
27
+ api_key: Optional[str] = None,
28
+ model_name: Optional[str] = None,
29
+ ):
30
+ """
31
+ LLMService that uses HuggingFace InferenceClient for chat completions.
32
+ """
33
+ settings = Settings()
34
+
35
+ # Use provided values or fall back to settings
36
+ key = api_key or settings.hf_token
37
+ name = model_name or settings.effective_model_name
38
+
39
+ self.config = LLMConfig(
40
+ api_key=key,
41
+ model_name=name,
42
+ temperature=settings.hf_temperature,
43
+ max_tokens=settings.hf_max_new_tokens,
44
+ )
45
+
46
+ # Initialize the InferenceClient
47
+ self.client = InferenceClient(token=self.config.api_key)
48
+
49
+ async def get_chat_completion(self, messages: List[Dict[str, str]]) -> str:
50
+ """
51
+ Return the assistant response for a chat-style messages array.
52
+ """
53
+ logger.debug(f"Chat completion request with model: {self.config.model_name}")
54
+
55
+ try:
56
+ # Use chat_completion method
57
+ response = self.client.chat_completion(
58
+ messages=messages,
59
+ model=self.config.model_name,
60
+ max_tokens=self.config.max_tokens,
61
+ temperature=self.config.temperature
62
+ )
63
+
64
+ # Extract the content from the response
65
+ content = response.choices[0].message.content
66
+ logger.debug(f"Chat completion response: {content[:200]}")
67
+
68
+ return content
69
+
70
+ except Exception as e:
71
+ logger.error(f"Chat completion error: {str(e)}")
72
+ raise Exception(f"HF chat completion error: {str(e)}")
73
+
services/screen_service.py ADDED
@@ -0,0 +1,215 @@
1
+ import threading
2
+ import queue
3
+ import time
4
+ import base64
5
+ import io
6
+ import logging
7
+ from typing import Callable, Optional, List, Dict
8
+
9
+ import mss
10
+ import numpy as np
11
+ from PIL import Image
12
+
13
+ from openai import OpenAI
14
+ from config.settings import Settings
15
+
16
+ logger = logging.getLogger(__name__)
17
+
18
+ class ScreenService:
19
+ def __init__(
20
+ self,
21
+ prompt: str,
22
+ model: str,
23
+ fps: float = 0.5,
24
+ queue_size: int = 2,
25
+ monitor: int = 1,
26
+ max_width: int = 3440,
27
+ max_height: int = 1440,
28
+ compression_quality: int = 100,
29
+ image_format: str = "PNG",
30
+ ):
31
+ """
32
+ :param prompt: Vision model instruction
33
+ :param model: Nebius model name
34
+ :param fps: Capture frames per second
35
+ :param queue_size: Internal buffer size
36
+ :param monitor: MSS monitor index
37
+ :param max_width/max_height: Max resolution for resizing
38
+ :param compression_quality: JPEG quality (1-100)
39
+ :param image_format: "JPEG" or "PNG" (PNG is lossless)
40
+ """
41
+ self.prompt = prompt
42
+ self.model = model
43
+ self.fps = fps
44
+ self.queue: queue.Queue = queue.Queue(maxsize=queue_size)
45
+ self.monitor = monitor
46
+ self.max_width = max_width
47
+ self.max_height = max_height
48
+ self.compression_quality = compression_quality
49
+ self.image_format = image_format.upper()
50
+
51
+ self._stop_event = threading.Event()
52
+ self._producer: Optional[threading.Thread] = None
53
+ self._consumer: Optional[threading.Thread] = None
54
+
55
+ # Nebius client
56
+ self.client = OpenAI(
57
+ base_url=Settings.NEBIUS_BASE_URL,
58
+ api_key=Settings.NEBIUS_API_KEY
59
+ )
60
+
61
+ def _process_image(self, img: Image.Image) -> Image.Image:
62
+ # Convert to RGB if needed
63
+ if img.mode != "RGB":
64
+ img = img.convert("RGB")
65
+ w, h = img.size
66
+ ar = w / h
67
+ # Resize maintaining aspect ratio if above max
68
+ if w > self.max_width or h > self.max_height:
69
+ if ar > 1:
70
+ new_w = min(w, self.max_width)
71
+ new_h = int(new_w / ar)
72
+ else:
73
+ new_h = min(h, self.max_height)
74
+ new_w = int(new_h * ar)
75
+ img = img.resize((new_w, new_h), Image.Resampling.LANCZOS)
76
+ return img
77
+
78
+ def _image_to_base64(self, img: Image.Image) -> str:
79
+ buf = io.BytesIO()
80
+ if self.image_format == "PNG":
81
+ img.save(buf, format="PNG")
82
+ else:
83
+ img.save(
84
+ buf,
85
+ format="JPEG",
86
+ quality=self.compression_quality,
87
+ optimize=True
88
+ )
89
+ data = buf.getvalue()
90
+ return base64.b64encode(data).decode("utf-8")
91
+
92
+ def _capture_loop(self):
93
+ with mss.mss() as sct:
94
+ mon = sct.monitors[self.monitor]
95
+ interval = 1.0 / self.fps if self.fps > 0 else 0
96
+ while not self._stop_event.is_set():
97
+ t0 = time.time()
98
+ frame = np.array(sct.grab(mon))
99
+ pil = Image.fromarray(frame)
100
+ pil = self._process_image(pil)
101
+ b64 = self._image_to_base64(pil)
102
+ try:
103
+ self.queue.put_nowait((t0, b64))
104
+ except queue.Full:
105
+ self.queue.get_nowait()
106
+ self.queue.put_nowait((t0, b64))
107
+ if interval:
108
+ time.sleep(interval)
109
+
110
+ def _flatten_conversation_history(self, history: List[Dict[str, str]]) -> str:
111
+ """Flatten conversation history into a readable format for the vision model"""
112
+ if not history:
113
+ return "No previous conversation."
114
+
115
+ # Filter out system messages and vision outputs to avoid confusion
116
+ filtered_history = []
117
+ for msg in history:
118
+ role = msg.get('role', '')
119
+ content = msg.get('content', '')
120
+
121
+ # Skip system messages and previous vision outputs
122
+ if role == 'system':
123
+ continue
124
+ if content.startswith('VISION MODEL OUTPUT:'):
125
+ continue
126
+ if 'screen' in content.lower() and 'sharing' in content.lower():
127
+ continue
128
+
129
+ filtered_history.append(msg)
130
+
131
+ # Take only the last 10 exchanges to keep context manageable
132
+ if len(filtered_history) > 20: # 10 user + 10 assistant messages
133
+ filtered_history = filtered_history[-20:]
134
+
135
+ # Format the conversation
136
+ formatted_lines = []
137
+ for msg in filtered_history:
138
+ role = msg.get('role', 'unknown')
139
+ content = msg.get('content', '')
140
+
141
+ # Truncate very long messages
142
+ if len(content) > 200:
143
+ content = content[:200] + "..."
144
+
145
+ if role == 'user':
146
+ formatted_lines.append(f"User: {content}")
147
+ elif role == 'assistant':
148
+ formatted_lines.append(f"Assistant: {content}")
149
+
150
+ return "\n".join(formatted_lines) if formatted_lines else "No relevant conversation history."
151
+
152
+ def _inference_loop(
153
+ self,
154
+ callback: Callable[[Dict, float, str], None],
155
+ history_getter: Callable[[], List[Dict[str, str]]]
156
+ ):
157
+ while not self._stop_event.is_set():
158
+ try:
159
+ t0, frame_b64 = self.queue.get(timeout=1)
160
+ except queue.Empty:
161
+ continue
162
+
163
+ # Get and flatten the conversation history
164
+ history = history_getter()
165
+ flattened_history = self._flatten_conversation_history(history)
166
+
167
+ # Create the full prompt with system instructions and conversation context
168
+ full_prompt = f"{self.prompt}\n\nCONVERSATION CONTEXT:\n{flattened_history}"
169
+
170
+ # Log a short preview of each history message for debugging
+ for i, msg in enumerate(history):
+ content_preview = msg.get('content', '')[:100] + "..." if len(msg.get('content', '')) > 100 else msg.get('content', '')
+ logger.debug(f"History item {i}: {content_preview}")
+
173
+ user_message = {
174
+ "role": "user",
175
+ "content": [
176
+ {"type": "text", "text": full_prompt},
177
+ {"type": "image_url", "image_url": {"url": f"data:image/{self.image_format.lower()};base64,{frame_b64}"}}
178
+ ]
179
+ }
180
+
181
+ try:
182
+ resp = self.client.chat.completions.create(
183
+ model=self.model,
184
+ messages=[user_message]
185
+ )
186
+ latency = time.time() - t0
187
+ callback(resp, latency, frame_b64)
188
+ except Exception as e:
189
+ logger.error(f"Nebius inference error: {e}")
190
+
191
+ def start(
192
+ self,
193
+ callback: Callable[[Dict, float, str], None],
194
+ history_getter: Callable[[], List[Dict[str, str]]]
195
+ ) -> None:
196
+ if self._producer and self._producer.is_alive():
197
+ return
198
+ self._stop_event.clear()
199
+ self._producer = threading.Thread(target=self._capture_loop, daemon=True)
200
+ self._consumer = threading.Thread(
201
+ target=self._inference_loop,
202
+ args=(callback, history_getter),
203
+ daemon=True
204
+ )
205
+ self._producer.start()
206
+ self._consumer.start()
207
+ logger.info("ScreenService started.")
208
+
209
+ def stop(self) -> None:
210
+ self._stop_event.set()
211
+ if self._producer:
212
+ self._producer.join(timeout=1.0)
213
+ if self._consumer:
214
+ self._consumer.join(timeout=1.0)
215
+ logger.info("ScreenService stopped.")