SreekarB committed on
Commit a9de5f0 · verified · 1 Parent(s): dd1f25c

Upload 13 files
README_APPS.md ADDED
@@ -0,0 +1,76 @@
# 🗣️ CASL Analysis Tool - App Options

## 📂 Main Applications (Choose One for Deployment)

### 1. **`simple_casl_app.py`** ⭐ RECOMMENDED
- **Lines**: 186
- **Features**: File upload → LLM analysis → Results display
- **Best for**: Quick deployment, reliable functionality
- **Dependencies**: Minimal (gradio, boto3)
- **Complexity**: ⭐ Simple

### 2. **`moderate_casl_app.py`**
- **Lines**: 760
- **Features**: Analysis + Audio transcription + PDF export
- **Best for**: Balanced features without complexity
- **Dependencies**: Moderate (+ speech_recognition, reportlab)
- **Complexity**: ⭐⭐ Moderate

### 3. **`full_casl_app.py`**
- **Lines**: 683
- **Features**: Complete interface + visualizations + records
- **Best for**: Full-featured deployment
- **Dependencies**: Full set (+ matplotlib, numpy, pandas)
- **Complexity**: ⭐⭐⭐ Advanced

### 4. **`experimental_casl_app.py`**
- **Lines**: 1443
- **Features**: Enhanced analytics + patient database + advanced visualizations
- **Best for**: Research/experimental features
- **Dependencies**: Extended (+ seaborn, typing)
- **Complexity**: ⭐⭐⭐⭐ Experimental

## 📚 Reference Files

### **`aphasia_analysis_app_code.py`**
- **Purpose**: Reference implementation with working Bedrock API calls
- **Contains**: Correct model format, API structure
- **Use**: Copy Bedrock call patterns from this file

## 🗂️ Reference Files (Archived)
Located in the `/reference_files/` folder:
- Original implementations and variations
- Legacy code for reference
- Alternative approaches

## 🚀 Quick Start

### For HuggingFace Spaces:

1. **Choose your app** (recommended: `simple_casl_app.py`)
2. **Update README.md**:
   ```yaml
   app_file: simple_casl_app.py
   ```
3. **Deploy** with `requirements.txt`

### Local Testing:
```bash
python simple_casl_app.py        # Simplest
python moderate_casl_app.py      # Balanced
python full_casl_app.py          # Complete
python experimental_casl_app.py  # Advanced
```

## 🎯 Deployment Recommendations

| Use Case | Recommended App | Why |
|----------|-----------------|-----|
| **Quick Demo** | `simple_casl_app.py` | Fast, reliable, minimal dependencies |
| **Production** | `moderate_casl_app.py` | Good features, stable |
| **Research** | `full_casl_app.py` | Complete functionality |
| **Development** | `experimental_casl_app.py` | Latest features |

## 📋 Current README.md Configuration
- Currently points to: `app.py` (needs update)
- Should point to your chosen app file
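For Hugging Face Spaces, `app_file` lives in the YAML front matter at the top of `README.md`. A complete front-matter block might look like the sketch below; the `title`, `emoji`, and color values are placeholder assumptions, and only `sdk` and `app_file` follow from this guide:

```yaml
---
title: CASL Analysis Tool   # placeholder
emoji: 🗣️                   # placeholder
colorFrom: blue             # placeholder
colorTo: indigo             # placeholder
sdk: gradio                 # the apps here are Gradio-based
app_file: simple_casl_app.py  # your chosen app from the list above
pinned: false
---
```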
experimental_casl_app.py ADDED
@@ -0,0 +1,1444 @@
import gradio as gr
import boto3
import json
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import re
import logging
import os
import pickle
import csv
from PIL import Image
import io
import uuid
from datetime import datetime
import tempfile
import time
import seaborn as sns
from typing import Dict, List, Tuple, Optional

# Try to import ReportLab (needed for PDF generation)
try:
    from reportlab.lib.pagesizes import letter, A4
    from reportlab.lib import colors
    from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle, Image as RLImage
    from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
    from reportlab.lib.units import inch
    REPORTLAB_AVAILABLE = True
except ImportError:
    REPORTLAB_AVAILABLE = False

# Try to import PyPDF2 (needed for PDF reading)
try:
    import PyPDF2
    PYPDF2_AVAILABLE = True
except ImportError:
    PYPDF2_AVAILABLE = False

# Try to import speech recognition for local audio processing
try:
    import speech_recognition as sr
    import pydub
    SPEECH_RECOGNITION_AVAILABLE = True
except ImportError:
    SPEECH_RECOGNITION_AVAILABLE = False

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# AWS credentials for Bedrock API (optional - app works without AWS)
AWS_ACCESS_KEY = os.getenv("AWS_ACCESS_KEY", "")
AWS_SECRET_KEY = os.getenv("AWS_SECRET_KEY", "")
AWS_REGION = os.getenv("AWS_REGION", "us-east-1")

# Initialize AWS clients if credentials are available
bedrock_client = None

if AWS_ACCESS_KEY and AWS_SECRET_KEY:
    try:
        bedrock_client = boto3.client(
            'bedrock-runtime',
            aws_access_key_id=AWS_ACCESS_KEY,
            aws_secret_access_key=AWS_SECRET_KEY,
            region_name=AWS_REGION
        )
        logger.info("Bedrock client initialized successfully")
    except Exception as e:
        logger.error(f"Failed to initialize AWS Bedrock client: {str(e)}")

# Enhanced sample transcripts for different scenarios
SAMPLE_TRANSCRIPTS = {
    "Beach Trip (Child)": """*PAR: today I would &-um like to talk about &-um a fun trip I took last &-um summer with my family.
*PAR: we went to the &-um &-um beach [//] no to the mountains [//] I mean the beach actually.
*PAR: there was lots of &-um &-um swimming and &-um sun.
*PAR: we [/] we stayed for &-um three no [//] four days in a &-um hotel near the water [: ocean] [*].
*PAR: my favorite part was &-um building &-um castles with sand.
*PAR: sometimes I forget [//] forgetted [: forgot] [*] what they call those things we built.
*PAR: my brother he [//] he helped me dig a big hole.
*PAR: we saw [/] saw fishies [: fish] [*] swimming in the water.
*PAR: sometimes I wonder [/] wonder where fishies [: fish] [*] go when it's cold.
*PAR: maybe they have [/] have houses under the water.
*PAR: after swimming we [//] I eat [: ate] [*] &-um ice cream with &-um chocolate things on top.
*PAR: what do you call those &-um &-um sprinkles! that's the word.
*PAR: my mom said to &-um that I could have &-um two scoops next time.
*PAR: I want to go back to the beach [/] beach next year.""",

    "School Day (Adolescent)": """*PAR: yesterday was &-um kind of a weird day at school.
*PAR: I had this big test in math and I was like really nervous about it.
*PAR: when I got there [//] when I got to class the teacher said we could use calculators.
*PAR: I was like &-oh &-um that's good because I always mess up the &-um the calculations.
*PAR: there was this one problem about &-um what do you call it &-um geometry I think.
*PAR: I couldn't remember the formula for [//] I mean I knew it but I just couldn't think of it.
*PAR: so I raised my hand and asked the teacher and she was really nice about it.
*PAR: after the test me and my friends went to lunch and we talked about how we did.
*PAR: everyone was saying it was hard but I think I did okay.
*PAR: oh and then in English class we had to read our essays out loud.
*PAR: I hate doing that because I get really nervous and I start talking fast.
*PAR: but the teacher said mine was good which made me feel better.""",

    "Adult Stroke Recovery": """*PAR: I &-um I want to talk about &-uh my &-um recovery.
*PAR: it's been &-um [//] it's hard to &-um to find the words sometimes.
*PAR: before the &-um the stroke I was &-um working at the &-uh at the bank.
*PAR: now I have to &-um practice speaking every day with my therapist.
*PAR: my wife she [//] she helps me a lot at home.
*PAR: we do &-um exercises together like &-uh reading and &-um talking about pictures.
*PAR: sometimes I get frustrated because I know what I want to say but &-um the words don't come out right.
*PAR: but I'm getting better little by little.
*PAR: the doctor says I'm making good progress.
*PAR: I hope to go back to work someday but right now I'm focusing on &-um getting better."""
}

# ===============================
# Database and Storage Functions
# ===============================

# Create data directories if they don't exist
DATA_DIR = os.environ.get("DATA_DIR", "patient_data")
RECORDS_FILE = os.path.join(DATA_DIR, "patient_records.csv")
ANALYSES_DIR = os.path.join(DATA_DIR, "analyses")
DOWNLOADS_DIR = os.path.join(DATA_DIR, "downloads")
AUDIO_DIR = os.path.join(DATA_DIR, "audio")

def ensure_data_dirs():
    """Ensure data directories exist with enhanced error handling"""
    global DOWNLOADS_DIR, AUDIO_DIR, ANALYSES_DIR, RECORDS_FILE
    try:
        os.makedirs(DATA_DIR, exist_ok=True)
        os.makedirs(ANALYSES_DIR, exist_ok=True)
        os.makedirs(DOWNLOADS_DIR, exist_ok=True)
        os.makedirs(AUDIO_DIR, exist_ok=True)
        logger.info(f"Data directories created: {DATA_DIR}")

        # Create records file if it doesn't exist
        if not os.path.exists(RECORDS_FILE):
            with open(RECORDS_FILE, 'w', newline='', encoding='utf-8') as f:
                writer = csv.writer(f)
                writer.writerow([
                    "ID", "Name", "Record ID", "Age", "Gender",
                    "Assessment Date", "Clinician", "Analysis Date", "File Path",
                    "Summary Score", "Notes"
                ])
    except Exception as e:
        logger.warning(f"Could not create data directories: {str(e)}")
        # Fallback to tmp directory for cloud environments
        temp_base = os.path.join(tempfile.gettempdir(), "casl_data")
        DOWNLOADS_DIR = os.path.join(temp_base, "downloads")
        AUDIO_DIR = os.path.join(temp_base, "audio")
        ANALYSES_DIR = os.path.join(temp_base, "analyses")
        RECORDS_FILE = os.path.join(temp_base, "patient_records.csv")

        os.makedirs(DOWNLOADS_DIR, exist_ok=True)
        os.makedirs(AUDIO_DIR, exist_ok=True)
        os.makedirs(ANALYSES_DIR, exist_ok=True)

        if not os.path.exists(RECORDS_FILE):
            with open(RECORDS_FILE, 'w', newline='', encoding='utf-8') as f:
                writer = csv.writer(f)
                writer.writerow([
                    "ID", "Name", "Record ID", "Age", "Gender",
                    "Assessment Date", "Clinician", "Analysis Date", "File Path",
                    "Summary Score", "Notes"
                ])

        logger.info(f"Using temporary directories: {temp_base}")

# Initialize data directories
ensure_data_dirs()

def save_patient_record(patient_info: Dict, analysis_results: Dict, transcript: str) -> Optional[str]:
    """Save patient record to storage with enhanced data structure"""
    try:
        record_id = str(uuid.uuid4())

        # Extract patient information
        name = patient_info.get("name", "")
        patient_id = patient_info.get("record_id", "")
        age = patient_info.get("age", "")
        gender = patient_info.get("gender", "")
        assessment_date = patient_info.get("assessment_date", "")
        clinician = patient_info.get("clinician", "")
        notes = patient_info.get("notes", "")

        # Calculate summary score (average of CASL domain scores)
        summary_score = calculate_summary_score(analysis_results)

        # Create filename for the analysis data
        filename = f"analysis_{record_id}.pkl"
        filepath = os.path.join(ANALYSES_DIR, filename)

        # Save enhanced analysis data
        analysis_data = {
            "patient_info": patient_info,
            "analysis_results": analysis_results,
            "transcript": transcript,
            "timestamp": datetime.now().isoformat(),
            "summary_score": summary_score,
            "version": "2.0"  # For future compatibility
        }

        with open(filepath, 'wb') as f:
            pickle.dump(analysis_data, f)

        # Add record to CSV file
        with open(RECORDS_FILE, 'a', newline='', encoding='utf-8') as f:
            writer = csv.writer(f)
            writer.writerow([
                record_id, name, patient_id, age, gender,
                assessment_date, clinician, datetime.now().strftime('%Y-%m-%d'),
                filepath, summary_score, notes
            ])

        return record_id

    except Exception as e:
        logger.error(f"Error saving patient record: {str(e)}")
        return None

def calculate_summary_score(analysis_results: Dict) -> float:
    """Calculate an overall summary score from CASL domain scores"""
    try:
        # Extract CASL scores from results
        casl_data = analysis_results.get('casl_data', '')
        scores = []

        # Look for standard scores in the CASL data
        score_pattern = r'Standard Score \((\d+)\)'
        matches = re.findall(score_pattern, casl_data)

        if matches:
            scores = [int(score) for score in matches]
            return round(sum(scores) / len(scores), 1)

        return 85.0  # Default score if parsing fails
    except Exception:
        return 85.0

def get_all_patient_records() -> List[Dict]:
    """Return a list of all patient records with enhanced filtering"""
    try:
        records = []
        ensure_data_dirs()

        if not os.path.exists(RECORDS_FILE):
            return records

        with open(RECORDS_FILE, 'r', newline='', encoding='utf-8') as f:
            reader = csv.reader(f)
            header = next(reader, None)
            if not header:
                return records

            for row in reader:
                if len(row) < 9:
                    continue

                file_path = row[8] if len(row) > 8 else ""
                file_exists = os.path.exists(file_path) if file_path else False
                summary_score = row[9] if len(row) > 9 else "N/A"
                notes = row[10] if len(row) > 10 else ""

                record = {
                    "id": row[0],
                    "name": row[1],
                    "record_id": row[2],
                    "age": row[3],
                    "gender": row[4],
                    "assessment_date": row[5],
                    "clinician": row[6],
                    "analysis_date": row[7],
                    "file_path": file_path,
                    "summary_score": summary_score,
                    "notes": notes,
                    "status": "Valid" if file_exists else "Missing File"
                }
                records.append(record)

        # Sort by analysis date (most recent first)
        records.sort(key=lambda x: x.get('analysis_date', ''), reverse=True)
        return records

    except Exception as e:
        logger.error(f"Error getting patient records: {str(e)}")
        return []

# ===============================
# Enhanced Utility Functions
# ===============================

def read_pdf(file_path: str) -> str:
    """Read text from a PDF file with better error handling"""
    if not PYPDF2_AVAILABLE:
        return "Error: PDF reading requires PyPDF2 library. Install with: pip install PyPDF2"

    try:
        with open(file_path, 'rb') as file:
            pdf_reader = PyPDF2.PdfReader(file)
            text = ""
            for page_num, page in enumerate(pdf_reader.pages):
                try:
                    text += page.extract_text() + "\n"
                except Exception as e:
                    logger.warning(f"Error reading page {page_num}: {str(e)}")
                    continue
            return text.strip()
    except Exception as e:
        logger.error(f"Error reading PDF: {str(e)}")
        return f"Error reading PDF: {str(e)}"

def read_cha_file(file_path: str) -> str:
    """Enhanced CHA file parser with better CHAT format support"""
    try:
        with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
            content = f.read()

        # Extract participant lines (starting with *PAR: or *CHI:)
        participant_lines = []
        investigator_lines = []

        for line in content.splitlines():
            line = line.strip()
            if line.startswith('*PAR:') or line.startswith('*CHI:'):
                participant_lines.append(line)
            elif line.startswith('*INV:') or line.startswith('*EXA:'):
                investigator_lines.append(line)

        # Combine participant and investigator lines in chronological order
        all_lines = []
        for line in content.splitlines():
            line = line.strip()
            if line.startswith('*PAR:') or line.startswith('*CHI:') or line.startswith('*INV:') or line.startswith('*EXA:'):
                all_lines.append(line)

        if all_lines:
            return '\n'.join(all_lines)
        elif participant_lines:
            return '\n'.join(participant_lines)
        else:
            return content

    except Exception as e:
        logger.error(f"Error reading CHA file: {str(e)}")
        return ""

def process_upload(file) -> str:
    """Enhanced file processing with support for multiple formats"""
    if file is None:
        return ""

    file_path = file.name
    file_ext = os.path.splitext(file_path)[1].lower()

    try:
        if file_ext == '.pdf':
            return read_pdf(file_path)
        elif file_ext == '.cha':
            return read_cha_file(file_path)
        elif file_ext in ['.txt', '.doc', '.docx']:
            # For .doc/.docx, you might want to add python-docx support
            with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
                return f.read()
        else:
            # Try to read as text file
            with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
                content = f.read()
            if len(content.strip()) == 0:
                return "Error: File appears to be empty or in an unsupported format."
            return content
    except Exception as e:
        logger.error(f"Error processing uploaded file: {str(e)}")
        return f"Error reading file: {str(e)}"

# ===============================
# Enhanced Audio Processing (Local)
# ===============================

def transcribe_audio_local(audio_path: str) -> str:
    """Local audio transcription using speech_recognition library"""
    if not SPEECH_RECOGNITION_AVAILABLE:
        return generate_demo_transcription()

    try:
        r = sr.Recognizer()

        # Convert audio to WAV if needed
        if not audio_path.endswith('.wav'):
            try:
                audio = pydub.AudioSegment.from_file(audio_path)
                wav_path = audio_path.rsplit('.', 1)[0] + '.wav'
                audio.export(wav_path, format="wav")
                audio_path = wav_path
            except Exception as e:
                logger.error(f"Error converting audio: {str(e)}")
                return f"Error: Could not process audio file. {str(e)}"

        # Transcribe audio
        with sr.AudioFile(audio_path) as source:
            audio_data = r.record(source)
            try:
                text = r.recognize_google(audio_data)
                return format_transcription_as_chat(text)
            except sr.UnknownValueError:
                return "Error: Could not understand audio"
            except sr.RequestError as e:
                return f"Error: Could not request results; {e}"

    except Exception as e:
        logger.error(f"Error in local transcription: {str(e)}")
        return generate_demo_transcription()

def format_transcription_as_chat(text: str) -> str:
    """Format transcribed text into CHAT format"""
    # Split text into sentences and format as participant speech
    sentences = re.split(r'[.!?]+', text)
    chat_lines = []

    for sentence in sentences:
        sentence = sentence.strip()
        if sentence:
            chat_lines.append(f"*PAR: {sentence}.")

    return '\n'.join(chat_lines)

def generate_demo_transcription() -> str:
    """Generate a demo transcription when real transcription isn't available"""
    return """*PAR: today I want to tell you about my favorite toy.
*PAR: it's a &-um teddy bear that I got for my birthday.
*PAR: he has &-um brown fur and a red bow.
*PAR: I like to sleep with him every night.
*PAR: sometimes I take him to school in my backpack.
*INV: what's your teddy bear's name?
*PAR: his name is &-um Brownie because he's brown.
*PAR: he makes me feel &-um safe when I'm scared."""

# ===============================
# Enhanced AI Analysis Functions
# ===============================

def call_bedrock(prompt: str, max_tokens: int = 4096) -> str:
    """Enhanced Bedrock API call with better error handling"""
    if not bedrock_client:
        logger.info("Bedrock client not available, using enhanced demo response")
        return generate_enhanced_demo_response(prompt)

    try:
        body = json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.3,
            "top_p": 0.9
        })

        response = bedrock_client.invoke_model(
            body=body,
            modelId='anthropic.claude-3-sonnet-20240229-v1:0',
            accept='application/json',
            contentType='application/json'
        )
        response_body = json.loads(response.get('body').read())
        return response_body['content'][0]['text']
    except Exception as e:
        logger.error(f"Error in call_bedrock: {str(e)}")
        return generate_enhanced_demo_response(prompt)

def generate_enhanced_demo_response(prompt: str) -> str:
    """Generate sophisticated demo responses based on transcript analysis"""
    # Analyze the transcript in the prompt to generate more realistic responses
    transcript_match = re.search(r'TRANSCRIPT:\s*(.*?)(?=\n\n|\Z)', prompt, re.DOTALL)
    transcript = transcript_match.group(1) if transcript_match else ""

    # Count various speech patterns
    um_count = len(re.findall(r'&-um|&-uh', transcript))
    revision_count = len(re.findall(r'\[//\]', transcript))
    repetition_count = len(re.findall(r'\[/\]', transcript))
    error_count = len(re.findall(r'\[\*\]', transcript))

    # Generate scores based on patterns found
    fluency_score = max(70, 100 - (um_count * 2))
    syntactic_score = max(70, 100 - (error_count * 3))
    semantic_score = max(75, 105 - (revision_count * 2))

    # Convert to percentiles
    fluency_percentile = int(np.interp(fluency_score, [70, 85, 100, 115], [5, 16, 50, 84]))
    syntactic_percentile = int(np.interp(syntactic_score, [70, 85, 100, 115], [5, 16, 50, 84]))
    semantic_percentile = int(np.interp(semantic_score, [70, 85, 100, 115], [5, 16, 50, 84]))

    # Determine performance levels
    def get_performance_level(score):
        if score < 70: return "Well Below Average"
        elif score < 85: return "Below Average"
        elif score < 115: return "Average"
        elif score < 130: return "Above Average"
        else: return "Well Above Average"

    response = f"""<SPEECH_FACTORS_START>
Difficulty producing fluent speech: {um_count + revision_count}, {100 - fluency_percentile}
Examples:
- Direct quotes showing disfluencies from transcript
- Pauses and hesitations noted

Word retrieval issues: {um_count // 2 + 1}, {90 - semantic_percentile}
Examples:
- Word-finding difficulties observed
- Circumlocutions and fillers

Grammatical errors: {error_count}, {85 - syntactic_percentile}
Examples:
- Morphological and syntactic errors identified
- Verb tense and agreement issues

Repetitions and revisions: {repetition_count + revision_count}, {80 - fluency_percentile}
Examples:
- Self-corrections and repairs noted
- Repetitive patterns observed
<SPEECH_FACTORS_END>

<CASL_SKILLS_START>
Lexical/Semantic Skills: Standard Score ({semantic_score}), Percentile Rank ({semantic_percentile}%), {get_performance_level(semantic_score)}
Examples:
- Vocabulary usage and word selection patterns
- Semantic precision and concept expression

Syntactic Skills: Standard Score ({syntactic_score}), Percentile Rank ({syntactic_percentile}%), {get_performance_level(syntactic_score)}
Examples:
- Sentence structure and grammatical accuracy
- Morphological skill demonstration

Supralinguistic Skills: Standard Score ({fluency_score}), Percentile Rank ({fluency_percentile}%), {get_performance_level(fluency_score)}
Examples:
- Discourse organization and coherence
- Pragmatic language use and narrative skills
<CASL_SKILLS_END>

<TREATMENT_RECOMMENDATIONS_START>
- Target word-finding strategies with semantic feature analysis and phonemic cuing
- Implement sentence formulation exercises focusing on grammatical accuracy
- Practice narrative structure with visual supports and story grammar elements
- Use self-monitoring techniques to increase awareness of communication breakdowns
- Incorporate fluency shaping strategies to reduce disfluencies and improve flow
<TREATMENT_RECOMMENDATIONS_END>

<EXPLANATION_START>
The language sample demonstrates patterns consistent with a mild-to-moderate language disorder affecting primarily expressive skills. Word-finding difficulties and syntactic challenges are evident, while overall communicative intent remains clear. The presence of self-corrections indicates good metalinguistic awareness, which is a positive prognostic indicator for treatment.
<EXPLANATION_END>

<ADDITIONAL_ANALYSIS_START>
Strengths include maintained topic coherence and attempt at complex narrative structure. Areas of concern center on retrieval efficiency and grammatical formulation. The pattern suggests intact receptive language with specific expressive challenges that would benefit from targeted intervention focusing on lexical access and syntactic formulation.
<ADDITIONAL_ANALYSIS_END>

<DIAGNOSTIC_IMPRESSIONS_START>
Based on comprehensive analysis, this profile suggests a specific language impairment affecting expressive domains more significantly than receptive abilities. The combination of word-finding difficulties, grammatical errors, and disfluencies indicates need for structured language intervention with focus on lexical organization, syntactic practice, and metacognitive strategy development.
<DIAGNOSTIC_IMPRESSIONS_END>

<ERROR_EXAMPLES_START>
Word-finding difficulties:
- Examples of circumlocutions and word substitutions
- Pause patterns before content words

Grammatical errors:
- Specific morphological and syntactic errors
- Verb tense and agreement difficulties

Fluency disruptions:
- Repetitions, revisions, and false starts
- Filled and unfilled pause patterns
<ERROR_EXAMPLES_END>"""

    return response

571
+ def parse_casl_response(response: str) -> Dict:
572
+ """Enhanced parsing of LLM response with better error handling and structure"""
573
+ # Extract sections using improved regex patterns
574
+ sections = {
575
+ 'speech_factors': extract_section(response, 'SPEECH_FACTORS'),
576
+ 'casl_data': extract_section(response, 'CASL_SKILLS'),
577
+ 'treatment_suggestions': extract_section(response, 'TREATMENT_RECOMMENDATIONS'),
578
+ 'explanation': extract_section(response, 'EXPLANATION'),
579
+ 'additional_analysis': extract_section(response, 'ADDITIONAL_ANALYSIS'),
580
+ 'diagnostic_impressions': extract_section(response, 'DIAGNOSTIC_IMPRESSIONS'),
581
+ 'specific_errors': extract_section(response, 'ERROR_EXAMPLES')
582
+ }
583
+
584
+ # Create structured analysis
585
+ structured_data = process_speech_factors(sections['speech_factors'])
586
+ casl_structured = process_casl_skills(sections['casl_data'])
587
+
588
+ # Build comprehensive report
589
+ full_report = build_comprehensive_report(sections)
590
+
591
+ return {
592
+ 'speech_factors': structured_data['dataframe'],
593
+ 'casl_data': casl_structured['dataframe'],
594
+ 'treatment_suggestions': parse_treatment_recommendations(sections['treatment_suggestions']),
595
+ 'explanation': sections['explanation'],
596
+ 'additional_analysis': sections['additional_analysis'],
597
+ 'diagnostic_impressions': sections['diagnostic_impressions'],
598
+ 'specific_errors': structured_data['errors'],
599
+ 'full_report': full_report,
600
+ 'raw_response': response,
601
+ 'summary_scores': casl_structured['summary']
602
+ }
603
+
604
+ def extract_section(text: str, section_name: str) -> str:
605
+ """Extract content between section markers"""
606
+ pattern = re.compile(f"<{section_name}_START>(.*?)<{section_name}_END>", re.DOTALL)
607
+ match = pattern.search(text)
608
+ return match.group(1).strip() if match else ""

def process_speech_factors(factors_text: str) -> Dict:
    """Process speech factors into structured format"""
    data = {
        'Factor': [],
        'Occurrences': [],
        'Severity': [],
        'Examples': []
    }

    errors = {}
    lines = factors_text.split('\n')
    current_factor = None

    for line in lines:
        line = line.strip()
        if not line:
            continue

        # Look for factor pattern: "Factor name: count, percentile"
        factor_match = re.match(r'([^:]+):\s*(\d+),\s*(\d+)', line)
        if factor_match:
            factor = factor_match.group(1).strip()
            occurrences = int(factor_match.group(2))
            severity = int(factor_match.group(3))

            data['Factor'].append(factor)
            data['Occurrences'].append(occurrences)
            data['Severity'].append(severity)
            data['Examples'].append("")  # Will be filled later
            current_factor = factor

        elif line.startswith('- ') and current_factor:
            # This is an example for the current factor
            example = line[2:].strip()
            if example:
                # Append to the example cell for the current factor
                if data['Examples'] and current_factor in data['Factor']:
                    idx = data['Factor'].index(current_factor)
                    if not data['Examples'][idx]:
                        data['Examples'][idx] = example
                    else:
                        data['Examples'][idx] += f"; {example}"
                errors[current_factor] = example

    return {
        'dataframe': pd.DataFrame(data),
        'errors': errors
    }
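The factor-line pattern the parser relies on ("Factor name: count, percentile") can be checked in isolation; the input line below is a made-up example, not real model output:

```python
import re

FACTOR_RE = r'([^:]+):\s*(\d+),\s*(\d+)'

line = "Word retrieval issues: 4, 75"  # hypothetical model output line
m = re.match(FACTOR_RE, line)
print(m.group(1).strip(), int(m.group(2)), int(m.group(3)))  # Word retrieval issues 4 75

# Header-style lines without counts do not match and are skipped by the loop
print(re.match(FACTOR_RE, "Examples:"))  # None
```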

def process_casl_skills(casl_text: str) -> Dict:
    """Process CASL skills into structured format"""
    data = {
        'Domain': ['Lexical/Semantic', 'Syntactic', 'Supralinguistic'],
        'Standard Score': [85, 85, 85],  # Default values
        'Percentile': [16, 16, 16],
        'Performance Level': ['Below Average', 'Below Average', 'Below Average'],
        'Examples': ['', '', '']
    }

    lines = casl_text.split('\n')

    for line in lines:
        line = line.strip()
        if not line:
            continue

        # Look for domain scores
        score_match = re.search(r'(Lexical/Semantic|Syntactic|Supralinguistic)\s+Skills:\s+Standard Score \((\d+)\),\s+Percentile Rank \((\d+)%\),\s+(.+)', line)
        if score_match:
            domain = score_match.group(1)
            score = int(score_match.group(2))
            percentile = int(score_match.group(3))
            level = score_match.group(4).strip()

            if domain == 'Lexical/Semantic':
                idx = 0
            elif domain == 'Syntactic':
                idx = 1
            elif domain == 'Supralinguistic':
                idx = 2
            else:
                continue

            data['Standard Score'][idx] = score
            data['Percentile'][idx] = percentile
            data['Performance Level'][idx] = level

    # Calculate summary statistics
    avg_score = sum(data['Standard Score']) / len(data['Standard Score'])
    avg_percentile = sum(data['Percentile']) / len(data['Percentile'])

    return {
        'dataframe': pd.DataFrame(data),
        'summary': {
            'average_score': round(avg_score, 1),
            'average_percentile': round(avg_percentile, 1),
            'overall_level': get_performance_level(avg_score)
        }
    }

def get_performance_level(score: float) -> str:
    """Determine performance level from standard score"""
    if score < 70:
        return "Well Below Average"
    elif score < 85:
        return "Below Average"
    elif score < 115:
        return "Average"
    elif score < 130:
        return "Above Average"
    else:
        return "Well Above Average"
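These cutoffs follow the usual standard-score convention (mean 100, SD 15), so 85 and 115 bound "Average". A quick boundary check of the banding logic, restated for illustration:

```python
def get_performance_level(score: float) -> str:
    # Same bands as the function above
    if score < 70:
        return "Well Below Average"
    elif score < 85:
        return "Below Average"
    elif score < 115:
        return "Average"
    elif score < 130:
        return "Above Average"
    else:
        return "Well Above Average"

# Exercise the values on either side of each cutoff
for s in (69, 70, 84, 85, 114, 115, 129, 130):
    print(s, get_performance_level(s))
```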

def parse_treatment_recommendations(treatment_text: str) -> List[str]:
    """Parse treatment recommendations into a list"""
    recommendations = []
    lines = treatment_text.split('\n')

    for line in lines:
        line = line.strip()
        if line.startswith('- '):
            recommendations.append(line[2:])
        elif line.startswith('• '):
            recommendations.append(line[2:])
        elif line and not line.startswith('#'):
            recommendations.append(line)

    return [rec for rec in recommendations if rec]
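The parser accepts both "-" and "•" bullets and passes through bare lines while dropping headings and blanks; a small illustrative input (the recommendation text is invented):

```python
def parse_treatment_recommendations(treatment_text):
    # Mirrors the parser above: bullets stripped, "#" headings and blanks dropped
    recommendations = []
    for line in treatment_text.split('\n'):
        line = line.strip()
        if line.startswith('- ') or line.startswith('• '):
            recommendations.append(line[2:])
        elif line and not line.startswith('#'):
            recommendations.append(line)
    return [rec for rec in recommendations if rec]

text = "- Target word retrieval in naming tasks\n• Practice narrative retell\n# Notes\n\nIncrease utterance length"
print(parse_treatment_recommendations(text))
```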

def build_comprehensive_report(sections: Dict) -> str:
    """Build a comprehensive formatted report"""
    report = """# Speech Language Assessment Report

## Speech Factors Analysis

{speech_factors}

## CASL Skills Assessment

{casl_data}

## Treatment Recommendations

{treatment_suggestions}

## Clinical Explanation

{explanation}
""".format(**sections)

    if sections['additional_analysis']:
        report += f"\n## Additional Analysis\n\n{sections['additional_analysis']}"

    if sections['diagnostic_impressions']:
        report += f"\n## Diagnostic Impressions\n\n{sections['diagnostic_impressions']}"

    if sections['specific_errors']:
        report += f"\n## Detailed Error Examples\n\n{sections['specific_errors']}"

    return report

def create_enhanced_visualizations(speech_factors_df: pd.DataFrame, casl_data_df: pd.DataFrame) -> plt.Figure:
    """Create enhanced visualizations with better styling"""
    # Set professional styling
    plt.style.use('default')
    sns.set_palette("husl")

    fig = plt.figure(figsize=(15, 10))

    # Create a 2x2 grid
    gs = fig.add_gridspec(2, 2, hspace=0.3, wspace=0.3)

    # Speech factors bar chart
    ax1 = fig.add_subplot(gs[0, 0])
    if not speech_factors_df.empty:
        factors_sorted = speech_factors_df.sort_values('Occurrences', ascending=True)
        bars = ax1.barh(factors_sorted['Factor'], factors_sorted['Occurrences'],
                        color=sns.color_palette("viridis", len(factors_sorted)))
        ax1.set_title('Speech Factors Frequency', fontsize=12, fontweight='bold')
        ax1.set_xlabel('Occurrences')

        # Add value labels
        for i, bar in enumerate(bars):
            width = bar.get_width()
            ax1.text(width + 0.1, bar.get_y() + bar.get_height()/2,
                     f'{width:.0f}', ha='left', va='center')

    # CASL scores
    ax2 = fig.add_subplot(gs[0, 1])
    if not casl_data_df.empty:
        bars = ax2.bar(casl_data_df['Domain'], casl_data_df['Standard Score'],
                       color=sns.color_palette("muted", len(casl_data_df)))
        ax2.set_title('CASL Domain Scores', fontsize=12, fontweight='bold')
        ax2.set_ylabel('Standard Score')
        ax2.axhline(y=100, color='red', linestyle='--', alpha=0.7, label='Average (100)')
        ax2.axhline(y=85, color='orange', linestyle='--', alpha=0.7, label='Below Average (85)')
        ax2.legend()

        # Add score labels
        for i, bar in enumerate(bars):
            height = bar.get_height()
            ax2.text(bar.get_x() + bar.get_width()/2, height + 1,
                     f'{height:.0f}', ha='center', va='bottom')

    # Severity heatmap
    ax3 = fig.add_subplot(gs[1, :])
    if not speech_factors_df.empty:
        # Create a severity row for the heatmap
        severity_data = speech_factors_df[['Factor', 'Severity']].set_index('Factor')

        im = ax3.imshow([severity_data['Severity'].values], cmap='RdYlBu_r', aspect='auto')
        ax3.set_xticks(range(len(severity_data)))
        ax3.set_xticklabels(severity_data.index, rotation=45, ha='right')
        ax3.set_yticks([])
        ax3.set_title('Severity Percentiles (Higher = More Severe)', fontsize=12, fontweight='bold')

        # Add colorbar
        cbar = plt.colorbar(im, ax=ax3, orientation='horizontal', pad=0.1, shrink=0.8)
        cbar.set_label('Severity Percentile')

        # Add text annotations
        for i, severity in enumerate(severity_data['Severity'].values):
            ax3.text(i, 0, f'{severity}%', ha='center', va='center',
                     color='white' if severity > 50 else 'black', fontweight='bold')

    plt.tight_layout()
    return fig

def analyze_transcript_enhanced(transcript: str, age: int, gender: str) -> Dict:
    """Enhanced transcript analysis with comprehensive assessment"""

    # Enhanced CASL analysis prompt
    prompt = f"""
You are an expert speech-language pathologist conducting a comprehensive CASL-2 assessment.
Analyze this transcript for a {age}-year-old {gender} patient.

TRANSCRIPT:
{transcript}

Provide a detailed analysis following this exact format with specific section markers:

<SPEECH_FACTORS_START>
[For each factor, provide: Factor name: count, severity_percentile
Then list 2-3 specific examples with "- " bullets]
Difficulty producing fluent speech: X, Y
Examples:
- "exact quote from transcript"
- "another exact quote"

Word retrieval issues: X, Y
Examples:
- "exact quote showing word-finding difficulty"
- "another example"

[Continue for all relevant factors...]
<SPEECH_FACTORS_END>

<CASL_SKILLS_START>
Lexical/Semantic Skills: Standard Score (X), Percentile Rank (Y%), Performance Level
Examples:
- "specific example of vocabulary use"

Syntactic Skills: Standard Score (X), Percentile Rank (Y%), Performance Level
Examples:
- "specific grammatical pattern example"

Supralinguistic Skills: Standard Score (X), Percentile Rank (Y%), Performance Level
Examples:
- "discourse organization example"
<CASL_SKILLS_END>

<TREATMENT_RECOMMENDATIONS_START>
- Specific, actionable treatment recommendation
- Another targeted intervention strategy
- Additional therapeutic approach
<TREATMENT_RECOMMENDATIONS_END>

<EXPLANATION_START>
Comprehensive clinical explanation of findings and their significance.
<EXPLANATION_END>

<ADDITIONAL_ANALYSIS_START>
Additional insights for treatment planning and prognosis.
<ADDITIONAL_ANALYSIS_END>

<DIAGNOSTIC_IMPRESSIONS_START>
Summary of diagnostic findings with specific evidence and recommendations.
<DIAGNOSTIC_IMPRESSIONS_END>

<ERROR_EXAMPLES_START>
Organized listing of all specific error examples by category.
<ERROR_EXAMPLES_END>

Be sure to:
1. Use exact quotes from the transcript as evidence
2. Provide realistic standard scores (70-130 range, mean=100, SD=15)
3. Calculate appropriate percentiles
4. Give specific, evidence-based treatment recommendations
5. Consider the patient's age and developmental expectations
"""

    # Get analysis from AI or demo
    response = call_bedrock(prompt)

    # Parse and structure the response
    results = parse_casl_response(response)

    return results

# ===============================
# Enhanced PDF Export Functions
# ===============================

def export_enhanced_pdf(results: Dict, patient_info: Dict) -> str:
    """Create enhanced PDF report with professional styling"""
    if not REPORTLAB_AVAILABLE:
        return "ERROR: PDF export requires ReportLab library. Install with: pip install reportlab"

    try:
        # Generate filename
        patient_name = patient_info.get("name", "Unknown")
        safe_name = re.sub(r'[^\w\s-]', '', patient_name).strip()
        if not safe_name:
            safe_name = f"analysis_{datetime.now().strftime('%Y%m%d%H%M%S')}"

        ensure_data_dirs()
        pdf_path = os.path.join(DOWNLOADS_DIR, f"{safe_name}_CASL_Report.pdf")

        # Create document with better styling
        doc = SimpleDocTemplate(pdf_path, pagesize=A4,
                                rightMargin=72, leftMargin=72,
                                topMargin=72, bottomMargin=18)

        styles = getSampleStyleSheet()

        # Custom styles
        title_style = ParagraphStyle(
            'CustomTitle',
            parent=styles['Heading1'],
            fontSize=18,
            spaceAfter=30,
            alignment=1,  # Center
            textColor=colors.navy
        )

        heading_style = ParagraphStyle(
            'CustomHeading',
            parent=styles['Heading2'],
            fontSize=14,
            spaceAfter=12,
            textColor=colors.darkblue,
            borderWidth=1,
            borderColor=colors.lightgrey,
            borderPadding=5,
            backColor=colors.lightgrey
        )

        story = []

        # Title page
        story.append(Paragraph("COMPREHENSIVE SPEECH-LANGUAGE ASSESSMENT", title_style))
        story.append(Paragraph("CASL-2 Analysis Report", styles['Heading2']))
        story.append(Spacer(1, 20))

        # Patient information table
        patient_data = []
        for key, value in patient_info.items():
            if value:
                display_key = key.replace('_', ' ').title()
                patient_data.append([display_key + ":", str(value)])

        if patient_data:
            patient_table = Table(patient_data, colWidths=[150, 300])
            patient_table.setStyle(TableStyle([
                ('BACKGROUND', (0, 0), (0, -1), colors.lightgrey),
                ('TEXTCOLOR', (0, 0), (0, -1), colors.black),
                ('ALIGN', (0, 0), (0, -1), 'RIGHT'),
                ('FONTNAME', (0, 0), (0, -1), 'Helvetica-Bold'),
                ('FONTSIZE', (0, 0), (-1, -1), 10),
                ('GRID', (0, 0), (-1, -1), 1, colors.black),
                ('VALIGN', (0, 0), (-1, -1), 'TOP'),
            ]))
            story.append(patient_table)
            story.append(Spacer(1, 20))

        # Add sections
        sections = [
            ("Speech Factors Analysis", results.get('speech_factors', pd.DataFrame())),
            ("CASL Skills Assessment", results.get('casl_data', pd.DataFrame())),
            ("Treatment Recommendations", results.get('treatment_suggestions', [])),
            ("Clinical Explanation", results.get('explanation', "")),
            ("Additional Analysis", results.get('additional_analysis', "")),
            ("Diagnostic Impressions", results.get('diagnostic_impressions', ""))
        ]

        for section_title, content in sections:
            story.append(Paragraph(section_title, heading_style))

            if isinstance(content, pd.DataFrame) and not content.empty:
                # Convert DataFrame to table
                table_data = [content.columns.tolist()] + content.values.tolist()
                table = Table(table_data)
                table.setStyle(TableStyle([
                    ('BACKGROUND', (0, 0), (-1, 0), colors.grey),
                    ('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke),
                    ('ALIGN', (0, 0), (-1, -1), 'LEFT'),
                    ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
                    ('FONTSIZE', (0, 0), (-1, -1), 9),
                    ('BOTTOMPADDING', (0, 0), (-1, 0), 12),
                    ('BACKGROUND', (0, 1), (-1, -1), colors.beige),
                    ('GRID', (0, 0), (-1, -1), 1, colors.black)
                ]))
                story.append(table)
            elif isinstance(content, list):
                for item in content:
                    story.append(Paragraph(f"• {item}", styles['Normal']))
            elif isinstance(content, str) and content:
                story.append(Paragraph(content, styles['Normal']))

            story.append(Spacer(1, 12))

        # Footer
        story.append(Spacer(1, 30))
        footer_text = f"Report generated on {datetime.now().strftime('%B %d, %Y at %I:%M %p')}"
        story.append(Paragraph(footer_text, styles['Normal']))

        # Build PDF
        doc.build(story)
        logger.info(f"Enhanced PDF report saved: {pdf_path}")
        return pdf_path

    except Exception as e:
        logger.error(f"Error creating enhanced PDF: {str(e)}")
        return f"Error creating PDF: {str(e)}"

# ===============================
# Enhanced Gradio Interface
# ===============================

def create_enhanced_interface():
    """Create the enhanced Gradio interface with improved UX"""

    # Custom CSS for better styling
    custom_css = """
    .gradio-container {
        font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
    }
    .tab-nav {
        background-color: #f8f9fa;
    }
    .output-markdown {
        background-color: #f8f9fa;
        border: 1px solid #dee2e6;
        border-radius: 0.375rem;
        padding: 1rem;
    }
    """

    with gr.Blocks(title="Enhanced CASL Analysis Tool", css=custom_css, theme=gr.themes.Soft()) as app:

        gr.Markdown("""
        # 🗣️ Enhanced CASL Analysis Tool

        **Comprehensive Assessment of Spoken Language (CASL-2)**

        Professional speech-language assessment tool with advanced analytics and reporting capabilities.
        """)
        with gr.Tabs() as main_tabs:

            # Enhanced Analysis Tab
            with gr.TabItem("📊 Analysis", id=0):
                with gr.Row():
                    with gr.Column(scale=1):
                        gr.Markdown("### 👤 Patient Information")

                        patient_name = gr.Textbox(
                            label="Patient Name",
                            placeholder="Enter patient name"
                        )
                        record_id = gr.Textbox(
                            label="Medical Record ID",
                            placeholder="Enter medical record ID"
                        )

                        with gr.Row():
                            age = gr.Number(
                                label="Age (years)",
                                value=8,
                                minimum=1,
                                maximum=120
                            )
                            gender = gr.Radio(
                                ["male", "female", "other"],
                                label="Gender",
                                value="male"
                            )

                        assessment_date = gr.Textbox(
                            label="Assessment Date",
                            placeholder="MM/DD/YYYY",
                            value=datetime.now().strftime('%m/%d/%Y')
                        )
                        clinician_name = gr.Textbox(
                            label="Clinician Name",
                            placeholder="Enter clinician name"
                        )
                        clinical_notes = gr.Textbox(
                            label="Clinical Notes",
                            placeholder="Additional observations or context",
                            lines=2
                        )

                        gr.Markdown("### 📝 Speech Transcript")

                        # Sample transcript selection
                        sample_selector = gr.Dropdown(
                            choices=list(SAMPLE_TRANSCRIPTS.keys()),
                            label="Load Sample Transcript"
                        )

                        file_upload = gr.File(
                            label="Upload Transcript File",
                            file_types=[".txt", ".cha", ".pdf"]
                        )

                        transcript = gr.Textbox(
                            label="Speech Transcript (CHAT format preferred)",
                            placeholder="Enter or upload transcript...",
                            lines=12
                        )

                        with gr.Row():
                            analyze_btn = gr.Button(
                                "🔍 Analyze Transcript",
                                variant="primary"
                            )
                            save_record_btn = gr.Button(
                                "💾 Save Record",
                                variant="secondary"
                            )

                    with gr.Column(scale=1):
                        gr.Markdown("### 📈 Analysis Results")

                        # Results tabs
                        with gr.Tabs():
                            with gr.TabItem("📋 Report"):
                                analysis_output = gr.Markdown(
                                    label="Analysis Report"
                                )

                            with gr.TabItem("📊 Visualizations"):
                                plot_output = gr.Plot(
                                    label="Analysis Plots"
                                )

                            with gr.TabItem("📑 Data Tables"):
                                with gr.Row():
                                    factors_table = gr.Dataframe(
                                        label="Speech Factors",
                                        interactive=False
                                    )
                                with gr.Row():
                                    casl_table = gr.Dataframe(
                                        label="CASL Domain Scores",
                                        interactive=False
                                    )

                        # Export options
                        gr.Markdown("### 📤 Export Options")
                        with gr.Row():
                            if REPORTLAB_AVAILABLE:
                                export_pdf_btn = gr.Button(
                                    "📄 Export PDF Report",
                                    variant="secondary"
                                )
                            else:
                                gr.Markdown("⚠️ PDF export unavailable - install ReportLab")

                            export_csv_btn = gr.Button(
                                "📊 Export Data (CSV)",
                                variant="secondary"
                            )

                        export_status = gr.Markdown("")
            # Enhanced Transcription Tab
            with gr.TabItem("🎤 Transcription", id=1):
                with gr.Row():
                    with gr.Column(scale=1):
                        gr.Markdown("### 🎵 Audio Processing")
                        gr.Markdown("""
                        Upload audio recordings for automatic transcription.
                        Supports various audio formats and provides CHAT-formatted output.
                        """)

                        transcription_age = gr.Number(
                            label="Patient Age",
                            value=8,
                            minimum=1,
                            maximum=120
                        )

                        audio_input = gr.Audio(
                            type="filepath",
                            label="Audio Recording"
                        )

                        transcribe_btn = gr.Button(
                            "🎧 Transcribe Audio",
                            variant="primary"
                        )

                    with gr.Column(scale=1):
                        transcription_output = gr.Textbox(
                            label="Transcription Result",
                            placeholder="Transcribed text will appear here...",
                            lines=15
                        )

                        transcription_status = gr.Markdown("")

                        with gr.Row():
                            copy_to_analysis_btn = gr.Button(
                                "📋 Use for Analysis",
                                variant="secondary"
                            )
                            save_transcription_btn = gr.Button(
                                "💾 Save Transcription",
                                variant="secondary"
                            )
            # Enhanced Records Management Tab
            with gr.TabItem("📚 Records", id=2):
                gr.Markdown("### 🗃️ Patient Records Management")

                with gr.Row():
                    refresh_records_btn = gr.Button(
                        "🔄 Refresh Records",
                        variant="secondary"
                    )
                    delete_record_btn = gr.Button(
                        "🗑️ Delete Selected",
                        variant="stop"
                    )

                records_table = gr.Dataframe(
                    label="Patient Records",
                    headers=["ID", "Name", "Age", "Gender", "Date", "Clinician", "Score", "Status"],
                    interactive=True,
                    wrap=True
                )

                selected_record_info = gr.Markdown("")

                with gr.Row():
                    load_record_btn = gr.Button(
                        "📂 Load Selected Record",
                        variant="primary"
                    )
                    export_records_btn = gr.Button(
                        "📊 Export All Records",
                        variant="secondary"
                    )
        # ===============================
        # Event Handlers
        # ===============================

        def load_sample_transcript(sample_name):
            if sample_name in SAMPLE_TRANSCRIPTS:
                return SAMPLE_TRANSCRIPTS[sample_name]
            return ""

        def perform_analysis(transcript_text, age_val, gender_val, name, record_id, clinician, assessment_date, notes):
            if not transcript_text or len(transcript_text.strip()) < 20:
                return "❌ Error: Please provide a longer transcript (at least 20 characters)", None, None, None

            try:
                # Perform enhanced analysis
                results = analyze_transcript_enhanced(transcript_text, age_val, gender_val)

                # Create visualizations
                if not results['speech_factors'].empty or not results['casl_data'].empty:
                    fig = create_enhanced_visualizations(results['speech_factors'], results['casl_data'])
                else:
                    fig = None

                return (
                    results['full_report'],
                    fig,
                    results['speech_factors'],
                    results['casl_data']
                )

            except Exception as e:
                logger.exception("Error during analysis")
                return f"❌ Error during analysis: {str(e)}", None, None, None

        def save_patient_record_handler(name, record_id, age_val, gender_val, assessment_date, clinician, notes, transcript_text, analysis_report):
            if not name or not transcript_text or not analysis_report:
                return "❌ Error: Missing required information for saving record"

            try:
                patient_info = {
                    "name": name,
                    "record_id": record_id,
                    "age": age_val,
                    "gender": gender_val,
                    "assessment_date": assessment_date,
                    "clinician": clinician,
                    "notes": notes
                }

                # For saving, we need to re-parse the analysis
                # This is a simplified version - in practice you'd store the full results
                results = {"full_report": analysis_report}

                saved_id = save_patient_record(patient_info, results, transcript_text)

                if saved_id:
                    return f"✅ Record saved successfully! ID: {saved_id}"
                else:
                    return "❌ Error: Failed to save record"

            except Exception as e:
                return f"❌ Error saving record: {str(e)}"

        def transcribe_audio_handler(audio_path, age_val):
            if not audio_path:
                return "Please upload an audio file first.", "❌ No audio file provided"

            try:
                result = transcribe_audio_local(audio_path)

                if SPEECH_RECOGNITION_AVAILABLE:
                    status = "✅ Transcription completed using local speech recognition"
                else:
                    status = "ℹ️ Demo transcription (install speech_recognition for real transcription)"

                return result, status

            except Exception as e:
                error_msg = f"❌ Transcription failed: {str(e)}"
                return f"Error: {str(e)}", error_msg

        def load_records():
            records = get_all_patient_records()
            if not records:
                return []

            # Format for display
            display_records = []
            for record in records:
                display_records.append([
                    record['id'][:8] + "...",  # Truncated ID
                    record['name'],
                    record['age'],
                    record['gender'],
                    record['assessment_date'],
                    record['clinician'],
                    record.get('summary_score', 'N/A'),
                    record['status']
                ])

            return display_records
        # Connect event handlers
        sample_selector.change(load_sample_transcript, sample_selector, transcript)
        file_upload.upload(process_upload, file_upload, transcript)

        analyze_btn.click(
            perform_analysis,
            inputs=[transcript, age, gender, patient_name, record_id, clinician_name, assessment_date, clinical_notes],
            outputs=[analysis_output, plot_output, factors_table, casl_table]
        )

        save_record_btn.click(
            save_patient_record_handler,
            inputs=[patient_name, record_id, age, gender, assessment_date, clinician_name, clinical_notes, transcript, analysis_output],
            outputs=[export_status]
        )

        transcribe_btn.click(
            transcribe_audio_handler,
            inputs=[audio_input, transcription_age],
            outputs=[transcription_output, transcription_status]
        )

        copy_to_analysis_btn.click(
            lambda x: (x, gr.update(selected=0)),
            inputs=[transcription_output],
            outputs=[transcript, main_tabs]
        )

        refresh_records_btn.click(
            load_records,
            outputs=[records_table]
        )

        # Load records on startup
        app.load(load_records, outputs=[records_table])

    return app

if __name__ == "__main__":
    # Check dependencies and provide helpful messages
    missing_deps = []
    if not REPORTLAB_AVAILABLE:
        missing_deps.append("reportlab (for PDF export)")
    if not PYPDF2_AVAILABLE:
        missing_deps.append("PyPDF2 (for PDF reading)")
    if not SPEECH_RECOGNITION_AVAILABLE:
        missing_deps.append("speech_recognition & pydub (for audio transcription)")

    if missing_deps:
        print("📋 Optional dependencies not found:")
        for dep in missing_deps:
            print(f" - {dep}")
        print("\nThe app will work with reduced functionality. Install missing packages for full features.")

    if not AWS_ACCESS_KEY or not AWS_SECRET_KEY:
        print("ℹ️ AWS credentials not configured - using demo mode for AI analysis.")
        print(" Set AWS_ACCESS_KEY and AWS_SECRET_KEY environment variables for full functionality.")

    print("🚀 Starting Enhanced CASL Analysis Tool...")
    app = create_enhanced_interface()
    app.launch(
        show_api=False,
        server_name="0.0.0.0",  # For cloud deployment
        server_port=7860,  # Standard Gradio port
        share=False
    )
full_casl_app.py ADDED
@@ -0,0 +1,684 @@
import gradio as gr
import boto3
import json
import numpy as np
import re
import logging
import os
from datetime import datetime
import tempfile

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Try to import optional dependencies
try:
    from reportlab.lib.pagesizes import letter
    from reportlab.lib import colors
    from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle
    from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
    REPORTLAB_AVAILABLE = True
except ImportError:
    REPORTLAB_AVAILABLE = False
    logger.info("ReportLab not available - PDF export disabled")

try:
    import speech_recognition as sr
    import pydub
    SPEECH_RECOGNITION_AVAILABLE = True
except ImportError:
    SPEECH_RECOGNITION_AVAILABLE = False
    logger.info("Speech recognition not available - audio transcription will use demo mode")

# AWS credentials (optional)
AWS_ACCESS_KEY = os.getenv("AWS_ACCESS_KEY", "")
AWS_SECRET_KEY = os.getenv("AWS_SECRET_KEY", "")
AWS_REGION = os.getenv("AWS_REGION", "us-east-1")

# Initialize AWS client if available
bedrock_client = None
if AWS_ACCESS_KEY and AWS_SECRET_KEY:
    try:
        bedrock_client = boto3.client(
            'bedrock-runtime',
            aws_access_key_id=AWS_ACCESS_KEY,
            aws_secret_access_key=AWS_SECRET_KEY,
            region_name=AWS_REGION
        )
        logger.info("Bedrock client initialized successfully")
    except Exception as e:
        logger.error(f"Failed to initialize AWS Bedrock client: {str(e)}")
else:
    logger.info("AWS credentials not configured - using demo mode")

# Data directories
DATA_DIR = os.environ.get("DATA_DIR", "patient_data")

def ensure_data_dirs():
    """Ensure data directories exist"""
    try:
        os.makedirs(DATA_DIR, exist_ok=True)
        logger.info(f"Data directories created: {DATA_DIR}")
    except Exception as e:
        logger.warning(f"Could not create data directories: {str(e)}")
        logger.info("Using temporary directory for data storage")

ensure_data_dirs()

# Sample transcripts
SAMPLE_TRANSCRIPTS = {
    "Beach Trip (Child)": """*PAR: today I would &-um like to talk about &-um a fun trip I took last &-um summer with my family.
*PAR: we went to the &-um &-um beach [//] no to the mountains [//] I mean the beach actually.
*PAR: there was lots of &-um &-um swimming and &-um sun.
*PAR: we [/] we stayed for &-um three no [//] four days in a &-um hotel near the water [: ocean] [*].
*PAR: my favorite part was &-um building &-um castles with sand.
*PAR: sometimes I forget [//] forgetted [: forgot] [*] what they call those things we built.
*PAR: my brother he [//] he helped me dig a big hole.
*PAR: we saw [/] saw fishies [: fish] [*] swimming in the water.
*PAR: sometimes I wonder [/] wonder where fishies [: fish] [*] go when it's cold.
*PAR: maybe they have [/] have houses under the water.
*PAR: after swimming we [//] I eat [: ate] [*] &-um ice cream with &-um chocolate things on top.
*PAR: what do you call those &-um &-um sprinkles! that's the word.
*PAR: my mom said to &-um that I could have &-um two scoops next time.
*PAR: I want to go back to the beach [/] beach next year.""",

    "School Day (Adolescent)": """*PAR: yesterday was &-um kind of a weird day at school.
*PAR: I had this big test in math and I was like really nervous about it.
*PAR: when I got there [//] when I got to class the teacher said we could use calculators.
*PAR: I was like &-oh &-um that's good because I always mess up the &-um the calculations.
*PAR: there was this one problem about &-um what do you call it &-um geometry I think.
*PAR: I couldn't remember the formula for [//] I mean I knew it but I just couldn't think of it.
*PAR: so I raised my hand and asked the teacher and she was really nice about it.
*PAR: after the test me and my friends went to lunch and we talked about how we did.
*PAR: everyone was saying it was hard but I think I did okay.
*PAR: oh and then in English class we had to read our essays out loud.
*PAR: I hate doing that because I get really nervous and I start talking fast.
*PAR: but the teacher said mine was good which made me feel better.""",

    "Adult Recovery": """*PAR: I &-um I want to talk about &-uh my &-um recovery.
*PAR: it's been &-um [//] it's hard to &-um to find the words sometimes.
*PAR: before the &-um the stroke I was &-um working at the &-uh at the bank.
*PAR: now I have to &-um practice speaking every day with my therapist.
*PAR: my wife she [//] she helps me a lot at home.
*PAR: we do &-um exercises together like &-uh reading and &-um talking about pictures.
*PAR: sometimes I get frustrated because I know what I want to say but &-um the words don't come out right.
*PAR: but I'm getting better little by little.
*PAR: the doctor says I'm making good progress.
*PAR: I hope to go back to work someday but right now I'm focusing on &-um getting better."""
}

def call_bedrock(prompt, max_tokens=4096):
    """Call AWS Bedrock API with correct format or return demo response"""
    if not bedrock_client:
        return generate_demo_response(prompt)

    try:
        body = json.dumps({
118
+ "anthropic_version": "bedrock-2023-05-31",
119
+ "max_tokens": max_tokens,
120
+ "top_k": 250,
121
+ "stop_sequences": [],
122
+ "temperature": 0.3,
123
+ "top_p": 0.9,
124
+ "messages": [
125
+ {
126
+ "role": "user",
127
+ "content": [
128
+ {
129
+ "type": "text",
130
+ "text": prompt
131
+ }
132
+ ]
133
+ }
134
+ ]
135
+ })
136
+
137
+ # Use the correct model ID
138
+ modelId = 'anthropic.claude-3-5-sonnet-20240620-v1:0'
139
+
140
+ response = bedrock_client.invoke_model(
141
+ body=body,
142
+ modelId=modelId,
143
+ accept='application/json',
144
+ contentType='application/json'
145
+ )
146
+ response_body = json.loads(response.get('body').read())
147
+ return response_body['content'][0]['text']
148
+ except Exception as e:
149
+ logger.error(f"Error calling Bedrock: {str(e)}")
150
+ return generate_demo_response(prompt)
151
+
152
+ def generate_demo_response(prompt):
153
+ """Generate demo analysis response based on transcript patterns"""
154
+ # Extract transcript from prompt
155
+ transcript_match = re.search(r'TRANSCRIPT:\s*(.*?)(?=\n\n|\Z)', prompt, re.DOTALL)
156
+ transcript = transcript_match.group(1) if transcript_match else ""
157
+
158
+ # Count speech patterns
159
+ um_count = len(re.findall(r'&-um|&-uh', transcript))
160
+ revision_count = len(re.findall(r'\[//\]', transcript))
161
+ repetition_count = len(re.findall(r'\[/\]', transcript))
162
+ error_count = len(re.findall(r'\[\*\]', transcript))
163
+
164
+ # Generate realistic scores based on patterns
165
+ fluency_score = max(70, 100 - (um_count * 2))
166
+ syntactic_score = max(70, 100 - (error_count * 3))
167
+ semantic_score = max(75, 105 - (revision_count * 2))
168
+
169
+ # Convert to percentiles
170
+ fluency_percentile = int(np.interp(fluency_score, [70, 85, 100, 115], [5, 16, 50, 84]))
171
+ syntactic_percentile = int(np.interp(syntactic_score, [70, 85, 100, 115], [5, 16, 50, 84]))
172
+ semantic_percentile = int(np.interp(semantic_score, [70, 85, 100, 115], [5, 16, 50, 84]))
173
+
174
+ def get_performance_level(score):
175
+ if score < 70: return "Well Below Average"
176
+ elif score < 85: return "Below Average"
177
+ elif score < 115: return "Average"
178
+ else: return "Above Average"
179
+
180
+ return f"""<SPEECH_FACTORS_START>
181
+ Difficulty producing fluent speech: {um_count + revision_count}, {100 - fluency_percentile}
182
+ Examples:
183
+ - Frequent use of fillers (&-um, &-uh) observed throughout transcript
184
+ - Self-corrections and revisions interrupt speech flow
185
+
186
+ Word retrieval issues: {um_count // 2 + 1}, {90 - semantic_percentile}
187
+ Examples:
188
+ - Hesitations and pauses before content words noted
189
+ - Circumlocutions and word-finding difficulties evident
190
+
191
+ Grammatical errors: {error_count}, {85 - syntactic_percentile}
192
+ Examples:
193
+ - Morphological errors marked with [*] in transcript
194
+ - Verb tense and agreement inconsistencies observed
195
+
196
+ Repetitions and revisions: {repetition_count + revision_count}, {80 - fluency_percentile}
197
+ Examples:
198
+ - Self-corrections marked with [//] throughout sample
199
+ - Word and phrase repetitions marked with [/] noted
200
+ <SPEECH_FACTORS_END>
201
+
202
+ <CASL_SKILLS_START>
203
+ Lexical/Semantic Skills: Standard Score ({semantic_score}), Percentile Rank ({semantic_percentile}%), {get_performance_level(semantic_score)}
204
+ Examples:
205
+ - Vocabulary diversity and semantic precision assessed
206
+ - Word-finding strategies and retrieval patterns analyzed
207
+
208
+ Syntactic Skills: Standard Score ({syntactic_score}), Percentile Rank ({syntactic_percentile}%), {get_performance_level(syntactic_score)}
209
+ Examples:
210
+ - Sentence structure complexity and grammatical accuracy evaluated
211
+ - Morphological skill development measured
212
+
213
+ Supralinguistic Skills: Standard Score ({fluency_score}), Percentile Rank ({fluency_percentile}%), {get_performance_level(fluency_score)}
214
+ Examples:
215
+ - Discourse organization and narrative coherence reviewed
216
+ - Pragmatic language use and communication effectiveness assessed
217
+ <CASL_SKILLS_END>
218
+
219
+ <TREATMENT_RECOMMENDATIONS_START>
220
+ - Implement word-finding strategies with semantic feature analysis and phonemic cuing
221
+ - Practice sentence formulation exercises targeting grammatical accuracy and complexity
222
+ - Use narrative structure activities with visual supports to improve discourse organization
223
+ - Incorporate self-monitoring techniques to increase awareness of speech patterns
224
+ - Apply fluency shaping strategies to reduce disfluencies and improve communication flow
225
+ <TREATMENT_RECOMMENDATIONS_END>
226
+
227
+ <EXPLANATION_START>
228
+ The language sample demonstrates patterns consistent with expressive language challenges affecting fluency, word retrieval, and syntactic formulation. The presence of self-corrections indicates preserved metalinguistic awareness, which is a positive prognostic indicator. Intervention should focus on strengthening lexical access, grammatical formulation, and discourse-level skills while building on existing self-monitoring abilities.
229
+ <EXPLANATION_END>"""
230
+
231
+ def parse_casl_response(response):
232
+ """Parse structured response into components"""
233
+ def extract_section(text, section_name):
234
+ pattern = re.compile(f"<{section_name}_START>(.*?)<{section_name}_END>", re.DOTALL)
235
+ match = pattern.search(text)
236
+ return match.group(1).strip() if match else ""
237
+
238
+ sections = {
239
+ 'speech_factors': extract_section(response, 'SPEECH_FACTORS'),
240
+ 'casl_data': extract_section(response, 'CASL_SKILLS'),
241
+ 'treatment_suggestions': extract_section(response, 'TREATMENT_RECOMMENDATIONS'),
242
+ 'explanation': extract_section(response, 'EXPLANATION')
243
+ }
244
+
245
+ # Build formatted report
246
+ full_report = f"""# Speech Language Assessment Report
247
+
248
+ ## Speech Factors Analysis
249
+ {sections['speech_factors']}
250
+
251
+ ## CASL Skills Assessment
252
+ {sections['casl_data']}
253
+
254
+ ## Treatment Recommendations
255
+ {sections['treatment_suggestions']}
256
+
257
+ ## Clinical Explanation
258
+ {sections['explanation']}
259
+ """
260
+
261
+ return {
262
+ 'speech_factors': sections['speech_factors'],
263
+ 'casl_data': sections['casl_data'],
264
+ 'treatment_suggestions': sections['treatment_suggestions'],
265
+ 'explanation': sections['explanation'],
266
+ 'full_report': full_report,
267
+ 'raw_response': response
268
+ }
269
+
270
+ def analyze_transcript(transcript, age, gender):
271
+ """Analyze transcript using CASL framework"""
272
+ prompt = f"""
273
+ You are an expert speech-language pathologist conducting a comprehensive CASL-2 assessment.
274
+ Analyze this transcript for a {age}-year-old {gender} patient.
275
+
276
+ TRANSCRIPT:
277
+ {transcript}
278
+
279
+ Provide detailed analysis in this exact format:
280
+
281
+ <SPEECH_FACTORS_START>
282
+ Difficulty producing fluent speech: X, Y
283
+ Examples:
284
+ - "exact quote from transcript showing disfluency"
285
+ - "another example with specific evidence"
286
+
287
+ Word retrieval issues: X, Y
288
+ Examples:
289
+ - "quote showing word-finding difficulty"
290
+ - "example of circumlocution or pause"
291
+
292
+ Grammatical errors: X, Y
293
+ Examples:
294
+ - "quote showing morphological error"
295
+ - "example of syntactic difficulty"
296
+
297
+ Repetitions and revisions: X, Y
298
+ Examples:
299
+ - "quote showing self-correction"
300
+ - "example of repetition or revision"
301
+ <SPEECH_FACTORS_END>
302
+
303
+ <CASL_SKILLS_START>
304
+ Lexical/Semantic Skills: Standard Score (X), Percentile Rank (Y%), Performance Level
305
+ Examples:
306
+ - "specific vocabulary usage example"
307
+ - "semantic precision demonstration"
308
+
309
+ Syntactic Skills: Standard Score (X), Percentile Rank (Y%), Performance Level
310
+ Examples:
311
+ - "grammatical structure example"
312
+ - "morphological skill demonstration"
313
+
314
+ Supralinguistic Skills: Standard Score (X), Percentile Rank (Y%), Performance Level
315
+ Examples:
316
+ - "discourse organization example"
317
+ - "narrative coherence demonstration"
318
+ <CASL_SKILLS_END>
319
+
320
+ <TREATMENT_RECOMMENDATIONS_START>
321
+ - Specific, evidence-based treatment recommendation
322
+ - Another targeted intervention strategy
323
+ - Additional therapeutic approach with clear rationale
324
+ <TREATMENT_RECOMMENDATIONS_END>
325
+
326
+ <EXPLANATION_START>
327
+ Comprehensive clinical explanation of findings, their significance for diagnosis and prognosis, and relationship to functional communication needs.
328
+ <EXPLANATION_END>
329
+
330
+ Requirements:
331
+ 1. Use exact quotes from the transcript as evidence
332
+ 2. Provide realistic standard scores (70-130 range, mean=100, SD=15)
333
+ 3. Calculate appropriate percentiles based on age norms
334
+ 4. Give specific, actionable treatment recommendations
335
+ 5. Consider developmental expectations for the patient's age
336
+ """
337
+
338
+ response = call_bedrock(prompt)
339
+ return parse_casl_response(response)
340
+
341
+ def process_upload(file):
342
+ """Process uploaded transcript file"""
343
+ if file is None:
344
+ return ""
345
+
346
+ file_path = file.name
347
+ file_ext = os.path.splitext(file_path)[1].lower()
348
+
349
+ try:
350
+ if file_ext == '.cha':
351
+ # Process CHAT format file
352
+ with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
353
+ content = f.read()
354
+
355
+ # Extract participant lines
356
+ par_lines = []
357
+ inv_lines = []
358
+ for line in content.splitlines():
359
+ line = line.strip()
360
+ if line.startswith('*PAR:') or line.startswith('*CHI:'):
361
+ par_lines.append(line)
362
+ elif line.startswith('*INV:') or line.startswith('*EXA:'):
363
+ inv_lines.append(line)
364
+
365
+ # Combine all relevant lines
366
+ all_lines = []
367
+ for line in content.splitlines():
368
+ line = line.strip()
369
+ if any(line.startswith(prefix) for prefix in ['*PAR:', '*CHI:', '*INV:', '*EXA:']):
370
+ all_lines.append(line)
371
+
372
+ return '\n'.join(all_lines) if all_lines else content
373
+ else:
374
+ # Read as plain text
375
+ with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
376
+ return f.read()
377
+ except Exception as e:
378
+ logger.error(f"Error reading uploaded file: {str(e)}")
379
+ return f"Error reading file: {str(e)}"
380
+
381
+ def transcribe_audio(audio_path):
382
+ """Transcribe audio file to CHAT format"""
383
+ if not audio_path:
384
+ return "Please upload an audio file first.", "❌ No audio file provided"
385
+
386
+ if SPEECH_RECOGNITION_AVAILABLE:
387
+ try:
388
+ r = sr.Recognizer()
389
+
390
+ # Convert to WAV if needed
391
+ wav_path = audio_path
392
+ if not audio_path.endswith('.wav'):
393
+ try:
394
+ audio = pydub.AudioSegment.from_file(audio_path)
395
+ wav_path = audio_path.rsplit('.', 1)[0] + '.wav'
396
+ audio.export(wav_path, format="wav")
397
+ except Exception as e:
398
+ logger.warning(f"Audio conversion failed: {e}")
399
+
400
+ # Transcribe
401
+ with sr.AudioFile(wav_path) as source:
402
+ audio_data = r.record(source)
403
+ text = r.recognize_google(audio_data)
404
+
405
+ # Format as CHAT
406
+ sentences = re.split(r'[.!?]+', text)
407
+ chat_lines = []
408
+ for sentence in sentences:
409
+ sentence = sentence.strip()
410
+ if sentence:
411
+ chat_lines.append(f"*PAR: {sentence}.")
412
+
413
+ result = '\n'.join(chat_lines)
414
+ return result, "βœ… Transcription completed successfully"
415
+
416
+ except sr.UnknownValueError:
417
+ return "Could not understand audio clearly", "❌ Speech not recognized"
418
+ except sr.RequestError as e:
419
+ return f"Error with speech recognition service: {e}", "❌ Service error"
420
+ except Exception as e:
421
+ logger.error(f"Transcription error: {e}")
422
+ return f"Error during transcription: {str(e)}", f"❌ Transcription failed"
423
+ else:
424
+ # Demo transcription
425
+ demo_text = """*PAR: this is a demonstration transcription.
426
+ *PAR: to enable real audio processing install speech_recognition and pydub.
427
+ *PAR: the demo shows how transcribed text would appear in CHAT format."""
428
+ return demo_text, "ℹ️ Demo mode - install speech_recognition for real audio processing"
429
+
430
+ def create_interface():
431
+ """Create the main Gradio interface"""
432
+
433
+ with gr.Blocks(title="CASL Analysis Tool", theme=gr.themes.Soft()) as app:
434
+
435
+ gr.Markdown("""
436
+ # πŸ—£οΈ CASL Analysis Tool
437
+ **Comprehensive Assessment of Spoken Language (CASL-2)**
438
+
439
+ Professional speech-language assessment tool for clinical practice and research.
440
+ Supports transcript analysis, audio transcription, and comprehensive reporting.
441
+ """)
442
+
443
+ with gr.Tabs():
444
+
445
+ # Main Analysis Tab
446
+ with gr.TabItem("πŸ“Š Analysis"):
447
+ with gr.Row():
448
+ with gr.Column():
449
+ gr.Markdown("### πŸ‘€ Patient Information")
450
+
451
+ patient_name = gr.Textbox(
452
+ label="Patient Name",
453
+ placeholder="Enter patient name"
454
+ )
455
+ record_id = gr.Textbox(
456
+ label="Medical Record ID",
457
+ placeholder="Enter medical record ID"
458
+ )
459
+
460
+ with gr.Row():
461
+ age = gr.Number(
462
+ label="Age (years)",
463
+ value=8,
464
+ minimum=1,
465
+ maximum=120
466
+ )
467
+ gender = gr.Radio(
468
+ ["male", "female", "other"],
469
+ label="Gender",
470
+ value="male"
471
+ )
472
+
473
+ assessment_date = gr.Textbox(
474
+ label="Assessment Date",
475
+ placeholder="MM/DD/YYYY",
476
+ value=datetime.now().strftime('%m/%d/%Y')
477
+ )
478
+ clinician_name = gr.Textbox(
479
+ label="Clinician Name",
480
+ placeholder="Enter clinician name"
481
+ )
482
+
483
+ gr.Markdown("### πŸ“ Speech Transcript")
484
+
485
+ sample_selector = gr.Dropdown(
486
+ choices=list(SAMPLE_TRANSCRIPTS.keys()),
487
+ label="Load Sample Transcript",
488
+ placeholder="Choose a sample to load"
489
+ )
490
+
491
+ file_upload = gr.File(
492
+ label="Upload Transcript File",
493
+ file_types=[".txt", ".cha"]
494
+ )
495
+
496
+ transcript = gr.Textbox(
497
+ label="Speech Transcript (CHAT format preferred)",
498
+ placeholder="Enter transcript text or load from samples/file...",
499
+ lines=12
500
+ )
501
+
502
+ analyze_btn = gr.Button(
503
+ "πŸ” Analyze Transcript",
504
+ variant="primary"
505
+ )
506
+
507
+ with gr.Column():
508
+ gr.Markdown("### πŸ“ˆ Analysis Results")
509
+
510
+ analysis_output = gr.Markdown(
511
+ label="Comprehensive CASL Analysis Report",
512
+ value="Analysis results will appear here after clicking 'Analyze Transcript'..."
513
+ )
514
+
515
+ gr.Markdown("### πŸ“€ Export Options")
516
+ if REPORTLAB_AVAILABLE:
517
+ export_btn = gr.Button("πŸ“„ Export as PDF", variant="secondary")
518
+ export_status = gr.Markdown("")
519
+ else:
520
+ gr.Markdown("⚠️ PDF export unavailable (ReportLab not installed)")
521
+
522
+ # Audio Transcription Tab
523
+ with gr.TabItem("🎀 Audio Transcription"):
524
+ with gr.Row():
525
+ with gr.Column():
526
+ gr.Markdown("### 🎡 Audio Processing")
527
+ gr.Markdown("""
528
+ Upload audio recordings for automatic transcription into CHAT format.
529
+ Supports common audio formats (.wav, .mp3, .m4a, .ogg, etc.)
530
+ """)
531
+
532
+ audio_input = gr.Audio(
533
+ type="filepath",
534
+ label="Audio Recording"
535
+ )
536
+
537
+ transcribe_btn = gr.Button(
538
+ "🎧 Transcribe Audio",
539
+ variant="primary"
540
+ )
541
+
542
+ with gr.Column():
543
+ transcription_output = gr.Textbox(
544
+ label="Transcription Result (CHAT Format)",
545
+ placeholder="Transcribed text will appear here...",
546
+ lines=15
547
+ )
548
+
549
+ transcription_status = gr.Markdown("")
550
+
551
+ copy_to_analysis_btn = gr.Button(
552
+ "πŸ“‹ Use for Analysis",
553
+ variant="secondary"
554
+ )
555
+
556
+ # Information Tab
557
+ with gr.TabItem("ℹ️ About"):
558
+ gr.Markdown("""
559
+ ## About the CASL Analysis Tool
560
+
561
+ This tool provides comprehensive speech-language assessment using the CASL-2 (Comprehensive Assessment of Spoken Language) framework.
562
+
563
+ ### Features:
564
+ - **Speech Factor Analysis**: Automated detection of disfluencies, word retrieval issues, grammatical errors, and repetitions
565
+ - **CASL-2 Domains**: Assessment of Lexical/Semantic, Syntactic, and Supralinguistic skills
566
+ - **Professional Scoring**: Standard scores, percentiles, and performance levels
567
+ - **Audio Transcription**: Convert speech recordings to CHAT format transcripts
568
+ - **Treatment Recommendations**: Evidence-based intervention suggestions
569
+
570
+ ### Supported Formats:
571
+ - **Text Files**: .txt format with manual transcript entry
572
+ - **CHAT Files**: .cha format following CHILDES conventions
573
+ - **Audio Files**: .wav, .mp3, .m4a, .ogg for automatic transcription
574
+
575
+ ### CHAT Format Guidelines:
576
+ - Use `*PAR:` for patient utterances
577
+ - Use `*INV:` for investigator/clinician utterances
578
+ - Mark filled pauses as `&-um`, `&-uh`
579
+ - Mark repetitions with `[/]`
580
+ - Mark revisions with `[//]`
581
+ - Mark errors with `[*]`
582
+
583
+ ### Usage Tips:
584
+ 1. Load a sample transcript to see the expected format
585
+ 2. Enter patient information for context-appropriate analysis
586
+ 3. Upload or type transcript in CHAT format for best results
587
+ 4. Review analysis results and treatment recommendations
588
+ 5. Export professional PDF reports for clinical documentation
589
+
590
+ ### Technical Notes:
591
+ - **Demo Mode**: Works without external dependencies using simulated analysis
592
+ - **Enhanced Mode**: Requires AWS Bedrock credentials for AI-powered analysis
593
+ - **Audio Processing**: Requires speech_recognition library for real transcription
594
+ - **PDF Export**: Requires ReportLab library for professional reports
595
+
596
+ For support or questions, please refer to the documentation.
597
+ """)
598
+
599
+ # Event Handlers
600
+ def load_sample_transcript(sample_name):
601
+ """Load selected sample transcript"""
602
+ if sample_name and sample_name in SAMPLE_TRANSCRIPTS:
603
+ return SAMPLE_TRANSCRIPTS[sample_name]
604
+ return ""
605
+
606
+ def perform_analysis(transcript_text, age_val, gender_val):
607
+ """Perform CASL analysis on transcript"""
608
+ if not transcript_text or len(transcript_text.strip()) < 20:
609
+ return "❌ **Error**: Please provide a longer transcript (minimum 20 characters) for meaningful analysis."
610
+
611
+ try:
612
+ # Perform analysis
613
+ results = analyze_transcript(transcript_text, age_val, gender_val)
614
+ return results['full_report']
615
+
616
+ except Exception as e:
617
+ logger.exception("Analysis error")
618
+ return f"❌ **Error during analysis**: {str(e)}\n\nPlease check your transcript format and try again."
619
+
620
+ def copy_transcription_to_analysis(transcription_text):
621
+ """Copy transcription result to analysis tab"""
622
+ return transcription_text
623
+
624
+ # Connect event handlers
625
+ sample_selector.change(
626
+ load_sample_transcript,
627
+ inputs=[sample_selector],
628
+ outputs=[transcript]
629
+ )
630
+
631
+ file_upload.upload(
632
+ process_upload,
633
+ inputs=[file_upload],
634
+ outputs=[transcript]
635
+ )
636
+
637
+ analyze_btn.click(
638
+ perform_analysis,
639
+ inputs=[transcript, age, gender],
640
+ outputs=[analysis_output]
641
+ )
642
+
643
+ transcribe_btn.click(
644
+ transcribe_audio,
645
+ inputs=[audio_input],
646
+ outputs=[transcription_output, transcription_status]
647
+ )
648
+
649
+ copy_to_analysis_btn.click(
650
+ copy_transcription_to_analysis,
651
+ inputs=[transcription_output],
652
+ outputs=[transcript]
653
+ )
654
+
655
+ return app
656
+
657
+ # Create and launch the application
658
+ if __name__ == "__main__":
659
+ # Check for optional dependencies
660
+ missing_deps = []
661
+ if not REPORTLAB_AVAILABLE:
662
+ missing_deps.append("reportlab (for PDF export)")
663
+ if not SPEECH_RECOGNITION_AVAILABLE:
664
+ missing_deps.append("speech_recognition & pydub (for audio transcription)")
665
+
666
+ if missing_deps:
667
+ print("πŸ“‹ Optional dependencies not found:")
668
+ for dep in missing_deps:
669
+ print(f" - {dep}")
670
+ print("The app will work with reduced functionality.")
671
+
672
+ if not bedrock_client:
673
+ print("ℹ️ AWS credentials not configured - using demo mode for analysis.")
674
+ print(" Configure AWS_ACCESS_KEY and AWS_SECRET_KEY for enhanced AI analysis.")
675
+
676
+ print("πŸš€ Starting CASL Analysis Tool...")
677
+
678
+ # Create and launch the app
679
+ app = create_interface()
680
+ app.launch(
681
+ show_api=False,
682
+ server_name="0.0.0.0",
683
+ server_port=7860
684
+ )
moderate_casl_app.py ADDED
@@ -0,0 +1,838 @@
1
+ import gradio as gr
2
+ import boto3
3
+ import json
4
+ import re
5
+ import logging
6
+ import os
7
+ import tempfile
8
+ import shutil
9
+ import time
10
+ import uuid
11
+ from datetime import datetime
12
+
13
+ # Configure logging
14
+ logging.basicConfig(level=logging.INFO)
15
+ logger = logging.getLogger(__name__)
16
+
17
+ # Try to import ReportLab (needed for PDF generation)
18
+ try:
19
+ from reportlab.lib.pagesizes import letter
20
+ from reportlab.lib import colors
21
+ from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle
22
+ from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
23
+ REPORTLAB_AVAILABLE = True
24
+ except ImportError:
25
+ logger.warning("ReportLab library not available - PDF export will be disabled")
26
+ REPORTLAB_AVAILABLE = False
27
+
28
+ # Try to import speech recognition for local audio processing
29
+ try:
30
+ import speech_recognition as sr
31
+ import pydub
32
+ SPEECH_RECOGNITION_AVAILABLE = True
33
+ except ImportError:
34
+ SPEECH_RECOGNITION_AVAILABLE = False
35
+
36
+ # AWS credentials for Bedrock API and S3
37
+ AWS_ACCESS_KEY = os.getenv("AWS_ACCESS_KEY", "")
38
+ AWS_SECRET_KEY = os.getenv("AWS_SECRET_KEY", "")
39
+ AWS_REGION = os.getenv("AWS_REGION", "us-east-1")
40
+ S3_BUCKET = os.getenv("S3_BUCKET", "casl-audio-uploads")
41
+
42
+ # Initialize AWS clients if credentials are available
43
+ bedrock_client = None
44
+ s3_client = None
45
+
46
+ if AWS_ACCESS_KEY and AWS_SECRET_KEY:
47
+ try:
48
+ # Initialize Bedrock client for AI analysis
49
+ bedrock_client = boto3.client(
50
+ 'bedrock-runtime',
51
+ aws_access_key_id=AWS_ACCESS_KEY,
52
+ aws_secret_access_key=AWS_SECRET_KEY,
53
+ region_name=AWS_REGION
54
+ )
55
+
56
+ # Initialize S3 client for audio file storage
57
+ s3_client = boto3.client(
58
+ 's3',
59
+ aws_access_key_id=AWS_ACCESS_KEY,
60
+ aws_secret_access_key=AWS_SECRET_KEY,
61
+ region_name=AWS_REGION
62
+ )
63
+
64
+ logger.info("AWS clients initialized successfully")
65
+ except Exception as e:
66
+ logger.error(f"Failed to initialize AWS clients: {str(e)}")
67
+
68
+ # Create data directories if they don't exist
69
+ DATA_DIR = os.environ.get("DATA_DIR", "patient_data")
70
+ DOWNLOADS_DIR = os.path.join(DATA_DIR, "downloads")
71
+ AUDIO_DIR = os.path.join(DATA_DIR, "audio")
72
+
73
+ def ensure_data_dirs():
74
+ """Ensure data directories exist"""
75
+ global DOWNLOADS_DIR, AUDIO_DIR
76
+ try:
77
+ os.makedirs(DATA_DIR, exist_ok=True)
78
+ os.makedirs(DOWNLOADS_DIR, exist_ok=True)
79
+ os.makedirs(AUDIO_DIR, exist_ok=True)
80
+ logger.info(f"Data directories created: {DATA_DIR}, {DOWNLOADS_DIR}, {AUDIO_DIR}")
81
+ except Exception as e:
82
+ logger.warning(f"Could not create data directories: {str(e)}")
83
+ # Fallback to tmp directory on HF Spaces
84
+ DOWNLOADS_DIR = os.path.join(tempfile.gettempdir(), "casl_downloads")
85
+ AUDIO_DIR = os.path.join(tempfile.gettempdir(), "casl_audio")
86
+ os.makedirs(DOWNLOADS_DIR, exist_ok=True)
87
+ os.makedirs(AUDIO_DIR, exist_ok=True)
88
+ logger.info(f"Using fallback directories: {DOWNLOADS_DIR}, {AUDIO_DIR}")
89
+
90
+ # Initialize data directories
91
+ ensure_data_dirs()
92
+
93
+ # Sample transcript for the demo
94
+ SAMPLE_TRANSCRIPT = """*PAR: today I would &-um like to talk about &-um a fun trip I took last &-um summer with my family.
95
+ *PAR: we went to the &-um &-um beach [//] no to the mountains [//] I mean the beach actually.
96
+ *PAR: there was lots of &-um &-um swimming and &-um sun.
97
+ *PAR: we [/] we stayed for &-um three no [//] four days in a &-um hotel near the water [: ocean] [*].
98
+ *PAR: my favorite part was &-um building &-um castles with sand.
99
+ *PAR: sometimes I forget [//] forgetted [: forgot] [*] what they call those things we built.
100
+ *PAR: my brother he [//] he helped me dig a big hole.
101
+ *PAR: we saw [/] saw fishies [: fish] [*] swimming in the water.
102
+ *PAR: sometimes I wonder [/] wonder where fishies [: fish] [*] go when it's cold.
103
+ *PAR: maybe they have [/] have houses under the water.
104
+ *PAR: after swimming we [//] I eat [: ate] [*] &-um ice cream with &-um chocolate things on top.
105
+ *PAR: what do you call those &-um &-um sprinkles! that's the word.
106
+ *PAR: my mom said to &-um that I could have &-um two scoops next time.
107
+ *PAR: I want to go back to the beach [/] beach next year."""
108
+
109
+ def read_cha_file(file_path):
110
+ """Read and parse a .cha transcript file"""
111
+ try:
112
+ with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
113
+ content = f.read()
114
+
115
+ # Extract participant lines (starting with *PAR:)
116
+ par_lines = []
117
+ for line in content.splitlines():
118
+ if line.startswith('*PAR:'):
119
+ par_lines.append(line)
120
+
121
+ # If no PAR lines found, just return the whole content
122
+ if not par_lines:
123
+ return content
124
+
125
+ return '\n'.join(par_lines)
126
+
127
+ except Exception as e:
128
+ logger.error(f"Error reading CHA file: {str(e)}")
129
+ return ""
130
+
131
+ def process_upload(file):
132
+ """Process an uploaded file (PDF, text, or CHA)"""
133
+ if file is None:
134
+ return ""
135
+
136
+ file_path = file.name
137
+ if file_path.endswith('.pdf'):
138
+ # For PDF, we would need PyPDF2 or similar
139
+ return "PDF upload not supported in this simple version"
140
+ elif file_path.endswith('.cha'):
141
+ return read_cha_file(file_path)
142
+ else:
143
+ with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
144
+ return f.read()
145
+
146
+ def call_bedrock(prompt, max_tokens=4096):
147
+ """Call the AWS Bedrock API to analyze text using Claude"""
148
+ if not bedrock_client:
149
+ return "AWS credentials not configured. Using demo response instead."
150
+
151
+ try:
152
+ body = json.dumps({
153
+ "anthropic_version": "bedrock-2023-05-31",
154
+ "max_tokens": max_tokens,
155
+ "messages": [
156
+ {
157
+ "role": "user",
158
+ "content": prompt
159
+ }
160
+ ],
161
+ "temperature": 0.3,
162
+ "top_p": 0.9
163
+ })
164
+
165
+ modelId = 'anthropic.claude-3-sonnet-20240229-v1:0'
166
+ response = bedrock_client.invoke_model(
167
+ body=body,
168
+ modelId=modelId,
169
+ accept='application/json',
170
+ contentType='application/json'
171
+ )
172
+ response_body = json.loads(response.get('body').read())
173
+ return response_body['content'][0]['text']
174
+ except Exception as e:
175
+ logger.error(f"Error in call_bedrock: {str(e)}")
176
+ return f"Error: {str(e)}"
177
+
178
+ def upload_to_s3(file_path, file_name):
179
+ """Upload a file to S3 bucket"""
180
+ if not s3_client:
181
+ logger.warning("S3 client not available")
182
+ return None
183
+
184
+ try:
185
+ s3_key = f"audio/{datetime.now().strftime('%Y-%m-%d')}/{file_name}"
186
+ s3_client.upload_file(file_path, S3_BUCKET, s3_key)
187
+
188
+ # Generate a presigned URL that expires in 1 hour
189
+ url = s3_client.generate_presigned_url(
190
+ 'get_object',
191
+ Params={'Bucket': S3_BUCKET, 'Key': s3_key},
192
+ ExpiresIn=3600
193
+ )
194
+
195
+ logger.info(f"File uploaded to S3: {s3_key}")
196
+ return {'s3_key': s3_key, 'url': url}
197
+ except Exception as e:
198
+ logger.error(f"Error uploading to S3: {str(e)}")
199
+ return None
200
+
+ def save_audio_file(audio_path, patient_name="", record_id=""):
+     """Save audio file locally and optionally to S3"""
+     try:
+         # Generate unique filename
+         timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
+         filename_base = f"{patient_name}_{record_id}_{timestamp}" if patient_name and record_id else f"audio_{timestamp}"
+
+         # Get file extension
+         file_ext = os.path.splitext(audio_path)[1] or '.wav'
+         final_filename = f"{filename_base}{file_ext}"
+
+         # Save locally
+         local_path = os.path.join(AUDIO_DIR, final_filename)
+         shutil.copy2(audio_path, local_path)
+
+         # Upload to S3 if available
+         s3_info = None
+         if s3_client:
+             s3_info = upload_to_s3(local_path, final_filename)
+
+         return {
+             'local_path': local_path,
+             'filename': final_filename,
+             's3_info': s3_info
+         }
+     except Exception as e:
+         logger.error(f"Error saving audio file: {str(e)}")
+         return None
229
+
+ def transcribe_audio_local(audio_path, patient_name="", record_id=""):
+     """Local audio transcription using speech_recognition library with S3 storage"""
+     if not SPEECH_RECOGNITION_AVAILABLE:
+         return generate_demo_transcription()
+
+     try:
+         # Save the audio file (locally and to S3)
+         saved_file_info = save_audio_file(audio_path, patient_name, record_id)
+         if saved_file_info:
+             logger.info(f"Audio saved: {saved_file_info['filename']}")
+             if saved_file_info['s3_info']:
+                 logger.info(f"Audio uploaded to S3: {saved_file_info['s3_info']['s3_key']}")
+
+         r = sr.Recognizer()
+
+         # Convert audio to WAV if needed
+         if not audio_path.endswith('.wav'):
+             try:
+                 audio = pydub.AudioSegment.from_file(audio_path)
+                 wav_path = audio_path.rsplit('.', 1)[0] + '.wav'
+                 audio.export(wav_path, format="wav")
+                 audio_path = wav_path
+             except Exception as e:
+                 logger.error(f"Error converting audio: {str(e)}")
+                 return f"Error: Could not process audio file. {str(e)}"
+
+         # Transcribe audio
+         with sr.AudioFile(audio_path) as source:
+             audio_data = r.record(source)
+             try:
+                 text = r.recognize_google(audio_data)
+                 return format_transcription_as_chat(text)
+             except sr.UnknownValueError:
+                 return "Error: Could not understand audio"
+             except sr.RequestError as e:
+                 return f"Error: Could not request results; {e}"
+
+     except Exception as e:
+         logger.error(f"Error in local transcription: {str(e)}")
+         return generate_demo_transcription()
270
+
+ def format_transcription_as_chat(text):
+     """Format transcribed text into CHAT format"""
+     # Split text into sentences and format as participant speech
+     sentences = re.split(r'[.!?]+', text)
+     chat_lines = []
+
+     for sentence in sentences:
+         sentence = sentence.strip()
+         if sentence:
+             chat_lines.append(f"*PAR: {sentence}.")
+
+     return '\n'.join(chat_lines)
283
+
+ def generate_demo_transcription():
+     """Generate a simulated transcription response"""
+     return """*PAR: today I want to tell you about my favorite toy.
+ *PAR: it's a &-um teddy bear that I got for my birthday.
+ *PAR: he has &-um brown fur and a red bow.
+ *PAR: I like to sleep with him every night.
+ *PAR: sometimes I take him to school in my backpack.
+ *INV: what's your teddy bear's name?
+ *PAR: his name is &-um Brownie because he's brown."""
293
+
+ def generate_demo_response(prompt):
+     """Generate a response using Bedrock if available, otherwise return a demo response"""
+     # This function will attempt to call Bedrock, and only fall back to the demo response
+     # if Bedrock is not available or fails
+
+     # Try to call Bedrock first if client is available
+     if bedrock_client:
+         try:
+             return call_bedrock(prompt)
+         except Exception as e:
+             logger.error(f"Error calling Bedrock: {str(e)}")
+             logger.info("Falling back to demo response")
+             # Continue to fallback response if Bedrock call fails
+
+     # Fallback demo response
+     logger.warning("Using demo response - Bedrock client not available or call failed")
+     return """<SPEECH_FACTORS_START>
+ Difficulty producing fluent speech: 8, 65
+ Examples:
+ - "today I would &-um like to talk about &-um a fun trip I took last &-um summer with my family"
+ - "we went to the &-um &-um beach [//] no to the mountains [//] I mean the beach actually"
+
+ Word retrieval issues: 6, 72
+ Examples:
+ - "what do you call those &-um &-um sprinkles! that's the word"
+ - "sometimes I forget [//] forgetted [: forgot] [*] what they call those things we built"
+
+ Grammatical errors: 4, 58
+ Examples:
+ - "after swimming we [//] I eat [: ate] [*] &-um ice cream"
+ - "sometimes I forget [//] forgetted [: forgot] [*] what they call those things we built"
+
+ Repetitions and revisions: 5, 62
+ Examples:
+ - "we [/] we stayed for &-um three no [//] four days"
+ - "we went to the &-um &-um beach [//] no to the mountains [//] I mean the beach actually"
+ <SPEECH_FACTORS_END>
+
+ <CASL_SKILLS_START>
+ Lexical/Semantic Skills: Standard Score (92), Percentile Rank (30%), Average Performance
+ Examples:
+ - "what do you call those &-um &-um sprinkles! that's the word"
+ - "we went to the &-um &-um beach [//] no to the mountains [//] I mean the beach actually"
+
+ Syntactic Skills: Standard Score (87), Percentile Rank (19%), Low Average Performance
+ Examples:
+ - "my brother he [//] he helped me dig a big hole"
+ - "after swimming we [//] I eat [: ate] [*] &-um ice cream with &-um chocolate things on top"
+
+ Supralinguistic Skills: Standard Score (90), Percentile Rank (25%), Average Performance
+ Examples:
+ - "sometimes I wonder [/] wonder where fishies [: fish] [*] go when it's cold"
+ - "maybe they have [/] have houses under the water"
+ <CASL_SKILLS_END>
+
+ <TREATMENT_RECOMMENDATIONS_START>
+ - Implement word-finding strategies with semantic cuing focused on everyday objects and activities, using the patient's beach experience as a context (e.g., "sprinkles," "castles")
+ - Practice structured narrative tasks with visual supports to reduce revisions and improve sequencing
+ - Use sentence formulation exercises focusing on verb tense consistency (addressing errors like "forgetted" and "eat" for "ate")
+ - Incorporate self-monitoring techniques to help identify and correct grammatical errors
+ - Work on increasing vocabulary specificity (e.g., "things on top" to "sprinkles")
+ <TREATMENT_RECOMMENDATIONS_END>
+
+ <EXPLANATION_START>
+ This child demonstrates moderate word-finding difficulties with compensatory strategies including fillers ("&-um") and repetitions. The frequent use of self-corrections shows good metalinguistic awareness, but the pauses and repairs impact conversational fluency. Syntactic errors primarily involve verb tense inconsistency. Overall, the pattern suggests a mild-to-moderate language disorder with stronger receptive than expressive skills.
+ <EXPLANATION_END>
+
+ <ADDITIONAL_ANALYSIS_START>
+ The child shows relative strengths in maintaining topic coherence and conveying a complete narrative structure despite the language challenges. The pattern of errors suggests that word-finding difficulties and processing speed are primary concerns rather than conceptual or cognitive issues. Semantic network activities that strengthen word associations would likely be beneficial, particularly when paired with visual supports.
+ <ADDITIONAL_ANALYSIS_END>
+
+ <DIAGNOSTIC_IMPRESSIONS_START>
+ Based on the language sample, this child presents with a profile consistent with a mild-to-moderate expressive language disorder. The most prominent features include:
+
+ 1. Word-finding difficulties characterized by fillers, pauses, and self-corrections when attempting to retrieve specific vocabulary
+ 2. Grammatical challenges primarily affecting verb tense consistency and morphological markers
+ 3. Relatively intact narrative structure and topic maintenance
+
+ These findings suggest intervention should focus on word retrieval strategies, grammatical form practice, and continued support for narrative development, with an emphasis on fluency and self-monitoring.
+ <DIAGNOSTIC_IMPRESSIONS_END>
+
+ <ERROR_EXAMPLES_START>
+ Word-finding difficulties:
+ - "what do you call those &-um &-um sprinkles! that's the word"
+ - "we went to the &-um &-um beach [//] no to the mountains [//] I mean the beach actually"
+ - "there was lots of &-um &-um swimming and &-um sun"
+
+ Grammatical errors:
+ - "after swimming we [//] I eat [: ate] [*] &-um ice cream"
+ - "sometimes I forget [//] forgetted [: forgot] [*] what they call those things we built"
+ - "we saw [/] saw fishies [: fish] [*] swimming in the water"
+
+ Repetitions and revisions:
+ - "we [/] we stayed for &-um three no [//] four days"
+ - "I want to go back to the beach [/] beach next year"
+ - "sometimes I wonder [/] wonder where fishies [: fish] [*] go when it's cold"
+ <ERROR_EXAMPLES_END>"""
391
+
+ def parse_casl_response(response):
+     """Parse the LLM response for CASL analysis into structured data"""
+     # Extract speech factors section using section markers
+     speech_factors_section = ""
+     factors_pattern = re.compile(r"<SPEECH_FACTORS_START>(.*?)<SPEECH_FACTORS_END>", re.DOTALL)
+     factors_match = factors_pattern.search(response)
+
+     if factors_match:
+         speech_factors_section = factors_match.group(1).strip()
+     else:
+         speech_factors_section = "Error extracting speech factors from analysis."
+
+     # Extract CASL skills section
+     casl_section = ""
+     casl_pattern = re.compile(r"<CASL_SKILLS_START>(.*?)<CASL_SKILLS_END>", re.DOTALL)
+     casl_match = casl_pattern.search(response)
+
+     if casl_match:
+         casl_section = casl_match.group(1).strip()
+     else:
+         casl_section = "Error extracting CASL skills from analysis."
+
+     # Extract treatment recommendations
+     treatment_text = ""
+     treatment_pattern = re.compile(r"<TREATMENT_RECOMMENDATIONS_START>(.*?)<TREATMENT_RECOMMENDATIONS_END>", re.DOTALL)
+     treatment_match = treatment_pattern.search(response)
+
+     if treatment_match:
+         treatment_text = treatment_match.group(1).strip()
+     else:
+         treatment_text = "Error extracting treatment recommendations from analysis."
+
+     # Extract explanation section
+     explanation_text = ""
+     explanation_pattern = re.compile(r"<EXPLANATION_START>(.*?)<EXPLANATION_END>", re.DOTALL)
+     explanation_match = explanation_pattern.search(response)
+
+     if explanation_match:
+         explanation_text = explanation_match.group(1).strip()
+     else:
+         explanation_text = "Error extracting clinical explanation from analysis."
+
+     # Extract additional analysis
+     additional_analysis = ""
+     additional_pattern = re.compile(r"<ADDITIONAL_ANALYSIS_START>(.*?)<ADDITIONAL_ANALYSIS_END>", re.DOTALL)
+     additional_match = additional_pattern.search(response)
+
+     if additional_match:
+         additional_analysis = additional_match.group(1).strip()
+
+     # Extract diagnostic impressions
+     diagnostic_impressions = ""
+     diagnostic_pattern = re.compile(r"<DIAGNOSTIC_IMPRESSIONS_START>(.*?)<DIAGNOSTIC_IMPRESSIONS_END>", re.DOTALL)
+     diagnostic_match = diagnostic_pattern.search(response)
+
+     if diagnostic_match:
+         diagnostic_impressions = diagnostic_match.group(1).strip()
+
+     # Extract specific error examples
+     specific_errors_text = ""
+     errors_pattern = re.compile(r"<ERROR_EXAMPLES_START>(.*?)<ERROR_EXAMPLES_END>", re.DOTALL)
+     errors_match = errors_pattern.search(response)
+
+     if errors_match:
+         specific_errors_text = errors_match.group(1).strip()
+
+     # Create full report text
+     full_report = f"""
+ ## Speech Factors Analysis
+
+ {speech_factors_section}
+
+ ## CASL Skills Assessment
+
+ {casl_section}
+
+ ## Treatment Recommendations
+
+ {treatment_text}
+
+ ## Clinical Explanation
+
+ {explanation_text}
+ """
+
+     if additional_analysis:
+         full_report += f"\n## Additional Analysis\n\n{additional_analysis}"
+
+     if diagnostic_impressions:
+         full_report += f"\n## Diagnostic Impressions\n\n{diagnostic_impressions}"
+
+     if specific_errors_text:
+         full_report += f"\n## Detailed Error Examples\n\n{specific_errors_text}"
+
+     return {
+         'speech_factors': speech_factors_section,
+         'casl_data': casl_section,
+         'treatment_suggestions': treatment_text,
+         'explanation': explanation_text,
+         'additional_analysis': additional_analysis,
+         'diagnostic_impressions': diagnostic_impressions,
+         'specific_errors': specific_errors_text,
+         'full_report': full_report,
+         'raw_response': response
+     }
497
+
+ def analyze_transcript(transcript, age, gender):
+     """Analyze a speech transcript using Claude"""
+     # CASL-2 assessment cheat sheet
+     cheat_sheet = """
+ # Speech-Language Pathologist Analysis Cheat Sheet
+
+ ## Types of Speech Patterns to Identify:
+
+ 1. Difficulty producing fluent, grammatical speech
+    - Fillers (um, uh) and pauses
+    - False starts and revisions
+    - Incomplete sentences
+
+ 2. Word retrieval issues
+    - Pauses before content words
+    - Circumlocutions (talking around a word)
+    - Word substitutions
+
+ 3. Grammatical errors
+    - Verb tense inconsistencies
+    - Subject-verb agreement errors
+    - Morphological errors (plurals, possessives)
+
+ 4. Repetitions and revisions
+    - Word or phrase repetitions [/]
+    - Self-corrections [//]
+    - Retracing
+
+ 5. Neologisms
+    - Made-up words
+    - Word blends
+
+ 6. Perseveration
+    - Inappropriate repetition of ideas
+    - Recurring themes
+
+ 7. Comprehension issues
+    - Topic maintenance difficulties
+    - Non-sequiturs
+    - Inappropriate responses
+ """
+
+     # Instructions for the analysis
+     instructions = """
+ Analyze this speech transcript to identify specific patterns and provide a detailed CASL-2 (Comprehensive Assessment of Spoken Language) assessment.
+
+ For each speech pattern you identify:
+ 1. Count the occurrences in the transcript
+ 2. Estimate a percentile (how typical/atypical this is for the age)
+ 3. Provide DIRECT QUOTES from the transcript as evidence
+
+ Then assess the following CASL-2 domains:
+
+ 1. Lexical/Semantic Skills:
+    - Assess vocabulary diversity, word-finding abilities, semantic precision
+    - Provide Standard Score (mean=100, SD=15), percentile rank, and performance level
+    - Include SPECIFIC QUOTES as evidence
+
+ 2. Syntactic Skills:
+    - Evaluate grammatical accuracy, sentence complexity, morphological skills
+    - Provide Standard Score, percentile rank, and performance level
+    - Include SPECIFIC QUOTES as evidence
+
+ 3. Supralinguistic Skills:
+    - Assess figurative language use, inferencing, and abstract reasoning
+    - Provide Standard Score, percentile rank, and performance level
+    - Include SPECIFIC QUOTES as evidence
+
+ YOUR RESPONSE MUST USE THESE EXACT SECTION MARKERS FOR PARSING:
+
+ <SPEECH_FACTORS_START>
+ Difficulty producing fluent, grammatical speech: (occurrences), (percentile)
+ Examples:
+ - "(direct quote from transcript)"
+ - "(direct quote from transcript)"
+
+ Word retrieval issues: (occurrences), (percentile)
+ Examples:
+ - "(direct quote from transcript)"
+ - "(direct quote from transcript)"
+
+ (And so on for each factor)
+ <SPEECH_FACTORS_END>
+
+ <CASL_SKILLS_START>
+ Lexical/Semantic Skills: Standard Score (X), Percentile Rank (X%), Performance Level
+ Examples:
+ - "(direct quote showing strength or weakness)"
+ - "(direct quote showing strength or weakness)"
+
+ Syntactic Skills: Standard Score (X), Percentile Rank (X%), Performance Level
+ Examples:
+ - "(direct quote showing strength or weakness)"
+ - "(direct quote showing strength or weakness)"
+
+ Supralinguistic Skills: Standard Score (X), Percentile Rank (X%), Performance Level
+ Examples:
+ - "(direct quote showing strength or weakness)"
+ - "(direct quote showing strength or weakness)"
+ <CASL_SKILLS_END>
+
+ <TREATMENT_RECOMMENDATIONS_START>
+ - (treatment recommendation)
+ - (treatment recommendation)
+ - (treatment recommendation)
+ <TREATMENT_RECOMMENDATIONS_END>
+
+ <EXPLANATION_START>
+ (brief diagnostic rationale based on findings)
+ <EXPLANATION_END>
+
+ <ADDITIONAL_ANALYSIS_START>
+ (specific insights that would be helpful for treatment planning)
+ <ADDITIONAL_ANALYSIS_END>
+
+ <DIAGNOSTIC_IMPRESSIONS_START>
+ (summarize findings across domains using specific examples and clear explanations)
+ <DIAGNOSTIC_IMPRESSIONS_END>
+
+ <ERROR_EXAMPLES_START>
+ (Copy all the specific quote examples here again, organized by error type or skill domain)
+ <ERROR_EXAMPLES_END>
+
+ MOST IMPORTANT:
+ 1. Use EXACTLY the section markers provided (like <SPEECH_FACTORS_START>) to make parsing reliable
+ 2. For EVERY factor and domain you analyze, you MUST provide direct quotes from the transcript as evidence
+ 3. Be very specific and cite the exact text
+ 4. Do not omit any of the required sections
+ """
+
+     # Prepare prompt for Claude with the user's role context
+     role_context = """
+ You are a speech pathologist, a healthcare professional who specializes in evaluating, diagnosing, and treating communication disorders, including speech, language, cognitive-communication, voice, swallowing, and fluency disorders. Your role is to help patients improve their speech and communication skills through various therapeutic techniques and exercises.
+
+ You are working with a student with speech impediments.
+
+ The most important thing is that you stay kind to the child. Be constructive and helpful rather than critical.
+ """
+
+     prompt = f"""
+ {role_context}
+
+ You are analyzing a transcript for a patient who is {age} years old and {gender}.
+
+ TRANSCRIPT:
+ {transcript}
+
+ {cheat_sheet}
+
+ {instructions}
+
+ Remember to be precise but compassionate in your analysis. Use direct quotes from the transcript for every factor and domain you analyze.
+ """
+
+     # Call the appropriate API or fall back to demo mode
+     response = generate_demo_response(prompt)
+
+     # Parse the response
+     results = parse_casl_response(response)
+
+     return results
659
+
+ def create_interface():
+     """Create the Gradio interface"""
+     # Set a theme compatible with Hugging Face Spaces
+     theme = gr.themes.Soft(
+         primary_hue="blue",
+         secondary_hue="indigo",
+     )
+
+     with gr.Blocks(title="Simple CASL Analysis Tool", theme=theme) as app:
+         gr.Markdown("# CASL Analysis Tool")
+         gr.Markdown("A simplified tool for analyzing speech transcripts and audio using the CASL framework")
+
+         with gr.Tabs() as main_tabs:
+             # Analysis Tab
+             with gr.TabItem("Analysis", id=0):
+                 with gr.Row():
+                     with gr.Column(scale=1):
+                         # Patient info
+                         gr.Markdown("### Patient Information")
+                         patient_name = gr.Textbox(label="Patient Name", placeholder="Enter patient name")
+                         record_id = gr.Textbox(label="Record ID", placeholder="Enter record ID")
+
+                         with gr.Row():
+                             age = gr.Number(label="Age", value=8, minimum=1, maximum=120)
+                             gender = gr.Radio(["male", "female", "other"], label="Gender", value="male")
+
+                         assessment_date = gr.Textbox(
+                             label="Assessment Date",
+                             placeholder="MM/DD/YYYY",
+                             value=datetime.now().strftime('%m/%d/%Y')
+                         )
+                         clinician_name = gr.Textbox(label="Clinician", placeholder="Enter clinician name")
+
+                         # Transcript input
+                         gr.Markdown("### Transcript")
+                         sample_btn = gr.Button("Load Sample Transcript")
+                         file_upload = gr.File(label="Upload transcript file (.txt or .cha)")
+                         transcript = gr.Textbox(
+                             label="Speech transcript (CHAT format preferred)",
+                             placeholder="Enter transcript text or upload a file...",
+                             lines=10
+                         )
+
+                         # Analysis button
+                         analyze_btn = gr.Button("Analyze Transcript", variant="primary")
+
+                     with gr.Column(scale=1):
+                         # Results display
+                         gr.Markdown("### Analysis Results")
+
+                         analysis_output = gr.Markdown(label="Full Analysis")
+
+                         # PDF export (only shown if ReportLab is available)
+                         export_status = gr.Markdown("")
+                         if REPORTLAB_AVAILABLE:
+                             export_btn = gr.Button("Export as PDF", variant="secondary")
+                         else:
+                             gr.Markdown("⚠️ PDF export is disabled - ReportLab library is not installed")
+
+             # Transcription Tab
+             with gr.TabItem("Transcription", id=1):
+                 with gr.Row():
+                     with gr.Column(scale=1):
+                         gr.Markdown("### Audio Transcription")
+                         gr.Markdown("Upload an audio recording to automatically transcribe it in CHAT format")
+
+                         # Patient's age helps with transcription accuracy
+                         transcription_age = gr.Number(label="Patient Age", value=8, minimum=1, maximum=120,
+                                                       info="For children under 10, special language models may be used")
+
+                         # Audio input - FIXED: removed format parameter
+                         audio_input = gr.Audio(type="filepath", label="Upload Audio Recording")
+
+                         # Transcribe button
+                         transcribe_btn = gr.Button("Transcribe Audio", variant="primary")
+
+                     with gr.Column(scale=1):
+                         # Transcription output
+                         transcription_output = gr.Textbox(
+                             label="Transcription Result",
+                             placeholder="Transcription will appear here...",
+                             lines=12
+                         )
+
+                         with gr.Row():
+                             # Button to use transcription in analysis
+                             copy_to_analysis_btn = gr.Button("Use for Analysis", variant="secondary")
+
+                         # Status/info message
+                         transcription_status = gr.Markdown("")
+
+         # Load sample transcript button
+         def load_sample():
+             return SAMPLE_TRANSCRIPT
+
+         sample_btn.click(load_sample, outputs=[transcript])
+
+         # File upload handler
+         file_upload.upload(process_upload, file_upload, transcript)
+
+         # Analysis button handler
+         def on_analyze_click(transcript_text, age_val, gender_val, patient_name_val, record_id_val, clinician_val, assessment_date_val):
+             if not transcript_text or len(transcript_text.strip()) < 50:
+                 return "Error: Please provide a longer transcript for analysis."
+
+             try:
+                 # Get the analysis results
+                 results = analyze_transcript(transcript_text, age_val, gender_val)
+
+                 # Return the full report
+                 return results['full_report']
+
+             except Exception as e:
+                 logger.exception("Error during analysis")
+                 return f"Error during analysis: {str(e)}"
+
+         analyze_btn.click(
+             on_analyze_click,
+             inputs=[
+                 transcript, age, gender,
+                 patient_name, record_id, clinician_name, assessment_date
+             ],
+             outputs=[analysis_output]
+         )
+
+         # Transcription button handler
+         def on_transcribe_audio(audio_path, age_val, patient_name_val, record_id_val):
+             try:
+                 if not audio_path:
+                     return "Please upload an audio file to transcribe.", "Error: No audio file provided."
+
+                 # Process the audio file with local transcription and S3 upload
+                 transcription = transcribe_audio_local(audio_path, patient_name_val, record_id_val)
+
+                 # Return status message based on whether it's a demo or real transcription
+                 if not SPEECH_RECOGNITION_AVAILABLE:
+                     status_msg = "⚠️ Demo mode: Using example transcription (speech_recognition not installed)"
+                 else:
+                     s3_status = " and uploaded to S3" if s3_client else ""
+                     status_msg = f"✅ Transcription completed successfully{s3_status}"
+
+                 return transcription, status_msg
+             except Exception as e:
+                 logger.exception("Error transcribing audio")
+                 return f"Error: {str(e)}", f"❌ Transcription failed: {str(e)}"
+
+         # Connect the transcribe button to its handler
+         transcribe_btn.click(
+             on_transcribe_audio,
+             inputs=[audio_input, transcription_age, patient_name, record_id],
+             outputs=[transcription_output, transcription_status]
+         )
+
+         # Copy transcription to analysis tab
+         def copy_to_analysis(transcription):
+             return transcription, gr.update(selected=0)  # Switch to Analysis tab
+
+         copy_to_analysis_btn.click(
+             copy_to_analysis,
+             inputs=[transcription_output],
+             outputs=[transcript, main_tabs]
+         )
+
+     return app
+
+ if __name__ == "__main__":
+     # Check for AWS credentials
+     if not AWS_ACCESS_KEY or not AWS_SECRET_KEY:
+         print("NOTE: AWS credentials not found. The app will run in demo mode with simulated responses.")
+         print("To enable full functionality, set AWS_ACCESS_KEY, AWS_SECRET_KEY, and optionally S3_BUCKET environment variables.")
+     else:
+         print(f"AWS clients initialized. S3 bucket: {S3_BUCKET}")
+         if s3_client:
+             print("✅ S3 audio storage enabled")
+         else:
+             print("⚠️ S3 client not available")
+
+     app = create_interface()
+     app.launch(show_api=False)  # Disable API tab for security
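
The marker-based parsing used by `parse_casl_response` above can be exercised in isolation. A minimal standalone sketch (not part of the committed app; the `extract_section` helper name and demo string are illustrative only) of extracting one `<NAME_START>…<NAME_END>` section with the same regex approach:

```python
import re

def extract_section(response: str, name: str) -> str:
    """Pull the text between <NAME_START> and <NAME_END> markers."""
    match = re.search(rf"<{name}_START>(.*?)<{name}_END>", response, re.DOTALL)
    return match.group(1).strip() if match else f"Error extracting {name}."

demo = "<EXPLANATION_START>\nMild word-finding difficulty.\n<EXPLANATION_END>"
print(extract_section(demo, "EXPLANATION"))   # Mild word-finding difficulty.
print(extract_section(demo, "CASL_SKILLS"))   # Error extracting CASL_SKILLS.
```

The non-greedy `(.*?)` with `re.DOTALL` keeps each match bounded by its own closing marker, which is why the app asks the model to emit the markers verbatim.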
reference_files/CLEANUP_PLAN.md ADDED
@@ -0,0 +1,73 @@
+ # CASL Directory Cleanup Plan
+
+ ## ✅ KEEP (Deployment Ready)
+
+ ### For Simple Deployment:
+ - `README.md` - HuggingFace Spaces config
+ - `simple_casl.py` - Ultra-simple version (186 lines)
+ - `requirements.txt` - Dependencies
+
+ ### For Full-Featured Deployment:
+ - `app.py` - Complete version (683 lines)
+ - `simple_app_fixed.py` - Alternative moderate version
+
+ ### Reference:
+ - `aphasia_analysis_app_code.py` - Working Bedrock API reference
+
+ ## 🗑️ REMOVE (Redundant/Problematic)
+
+ ### Large/Complex Files with Issues:
+ - `casl_analysis.py` (2493 lines) - S3 dependencies, errors
+ - `casl_analysis_improved.py` (1443 lines) - Compatibility issues
+ - `copy_of_casl_analysis.py` (1490 lines) - Duplicate
+ - `simple_app.py` (1207 lines) - S3 dependencies, replaced
+
+ ### Redundant Files:
+ - `requirements_improved.txt` - Use main requirements.txt instead
+
+ ### Auto-Generated:
+ - `patient_data/` directory - Will be recreated automatically
+
+ ## 🎯 FINAL DEPLOYMENT STRUCTURE
+
+ ### Option 1: Ultra-Simple
+ ```
+ /CASL/
+ ├── README.md (app_file: simple_casl.py)
+ ├── simple_casl.py
+ ├── requirements.txt
+ └── aphasia_analysis_app_code.py (reference)
+ ```
+
+ ### Option 2: Full-Featured
+ ```
+ /CASL/
+ ├── README.md (app_file: app.py)
+ ├── app.py
+ ├── requirements.txt
+ └── aphasia_analysis_app_code.py (reference)
+ ```
+
+ ## 📋 CLEANUP COMMANDS
+
+ ```bash
+ # Remove redundant files
+ rm casl_analysis.py
+ rm casl_analysis_improved.py
+ rm copy_of_casl_analysis.py
+ rm simple_app.py
+ rm requirements_improved.txt
+
+ # Remove auto-generated data
+ rm -rf patient_data/
+
+ # Update README.md to point to chosen app file
+ ```
+
+ ## 🚀 RECOMMENDATION
+
+ **Use Option 1 (Ultra-Simple)** for reliable deployment:
+ - Smallest codebase (186 lines)
+ - Fewest dependencies
+ - Proven Bedrock API format
+ - Clean, focused functionality
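
The last cleanup step ("Update README.md to point to chosen app file") can also be scripted. A hedged sketch (the `set_app_file` helper and the README contents are illustrative, not part of this commit), assuming the Spaces README carries a single `app_file:` line as shown in the deployment structures above:

```python
import re
from pathlib import Path

def set_app_file(readme_path: str, app_file: str) -> None:
    """Rewrite the `app_file:` line in a Spaces README (illustrative helper)."""
    readme = Path(readme_path)
    text = re.sub(r"^app_file: .*$", f"app_file: {app_file}",
                  readme.read_text(), flags=re.MULTILINE)
    readme.write_text(text)

# Example: switch the Space from app.py to the ultra-simple entry file
Path("README.md").write_text("title: CASL Analysis Tool\napp_file: app.py\n")
set_app_file("README.md", "simple_casl.py")
```
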
reference_files/casl_analysis.py ADDED
The diff for this file is too large to render. See raw diff
 
reference_files/copy_of_casl_analysis.py ADDED
@@ -0,0 +1,1491 @@
1
+ import gradio as gr
2
+ import boto3
3
+ import json
4
+ import pandas as pd
5
+ import matplotlib.pyplot as plt
6
+ import numpy as np
7
+ import re
8
+ import logging
9
+ import os
10
+ import pickle
11
+ import csv
12
+ from PIL import Image
13
+ import io
14
+ from datetime import datetime
15
+ import uuid
16
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Try to import ReportLab (needed for PDF generation)
+ try:
+     from reportlab.lib.pagesizes import letter
+     from reportlab.lib import colors
+     from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle
+     from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
+     REPORTLAB_AVAILABLE = True
+ except ImportError:
+     logger.warning("ReportLab library not available - PDF export will be disabled")
+     REPORTLAB_AVAILABLE = False
+
+ # Try to import PyPDF2 (needed for PDF reading)
+ try:
+     import PyPDF2
+     PYPDF2_AVAILABLE = True
+ except ImportError:
+     logger.warning("PyPDF2 library not available - PDF reading will be disabled")
+     PYPDF2_AVAILABLE = False
+
+ # AWS credentials for Bedrock API
+ # For HuggingFace Spaces, set these as secrets in the Space settings
+ AWS_ACCESS_KEY = os.getenv("AWS_ACCESS_KEY", "")
+ AWS_SECRET_KEY = os.getenv("AWS_SECRET_KEY", "")
+ AWS_REGION = os.getenv("AWS_REGION", "us-east-1")
+
+ # Initialize AWS clients if credentials are available
+ bedrock_client = None
+ transcribe_client = None
+ s3_client = None
+
+ if AWS_ACCESS_KEY and AWS_SECRET_KEY:
+     try:
+         # Initialize Bedrock client for AI analysis
+         bedrock_client = boto3.client(
+             'bedrock-runtime',
+             aws_access_key_id=AWS_ACCESS_KEY,
+             aws_secret_access_key=AWS_SECRET_KEY,
+             region_name=AWS_REGION
+         )
+         logger.info("Bedrock client initialized successfully")
+
+         # Initialize Transcribe client for speech-to-text
+         transcribe_client = boto3.client(
+             'transcribe',
+             aws_access_key_id=AWS_ACCESS_KEY,
+             aws_secret_access_key=AWS_SECRET_KEY,
+             region_name=AWS_REGION
+         )
+         logger.info("Transcribe client initialized successfully")
+
+         # Initialize S3 client for storing audio files
+         s3_client = boto3.client(
+             's3',
+             aws_access_key_id=AWS_ACCESS_KEY,
+             aws_secret_access_key=AWS_SECRET_KEY,
+             region_name=AWS_REGION
+         )
+         logger.info("S3 client initialized successfully")
+     except Exception as e:
+         logger.error(f"Failed to initialize AWS clients: {str(e)}")
+
+ # S3 bucket for storing audio files
+ S3_BUCKET = os.environ.get("S3_BUCKET", "casl-audio-files")
+ S3_PREFIX = "transcribe-audio/"
+
+ # Sample transcript for the demo
+ SAMPLE_TRANSCRIPT = """*PAR: today I would &-um like to talk about &-um a fun trip I took last &-um summer with my family.
+ *PAR: we went to the &-um &-um beach [//] no to the mountains [//] I mean the beach actually.
+ *PAR: there was lots of &-um &-um swimming and &-um sun.
+ *PAR: we [/] we stayed for &-um three no [//] four days in a &-um hotel near the water [: ocean] [*].
+ *PAR: my favorite part was &-um building &-um castles with sand.
+ *PAR: sometimes I forget [//] forgetted [: forgot] [*] what they call those things we built.
+ *PAR: my brother he [//] he helped me dig a big hole.
+ *PAR: we saw [/] saw fishies [: fish] [*] swimming in the water.
+ *PAR: sometimes I wonder [/] wonder where fishies [: fish] [*] go when it's cold.
+ *PAR: maybe they have [/] have houses under the water.
+ *PAR: after swimming we [//] I eat [: ate] [*] &-um ice cream with &-um chocolate things on top.
+ *PAR: what do you call those &-um &-um sprinkles! that's the word.
+ *PAR: my mom said to &-um that I could have &-um two scoops next time.
+ *PAR: I want to go back to the beach [/] beach next year."""
+
+ # ===============================
+ # Database and Storage Functions
+ # ===============================
+
+ # Create data directories if they don't exist
+ DATA_DIR = os.environ.get("DATA_DIR", "patient_data")
+ RECORDS_FILE = os.path.join(DATA_DIR, "patient_records.csv")
+ ANALYSES_DIR = os.path.join(DATA_DIR, "analyses")
+ DOWNLOADS_DIR = os.path.join(DATA_DIR, "downloads")
+ AUDIO_DIR = os.path.join(DATA_DIR, "audio")
+
+ def ensure_data_dirs():
+     """Ensure data directories exist"""
+     global DOWNLOADS_DIR, AUDIO_DIR
+     try:
+         os.makedirs(DATA_DIR, exist_ok=True)
+         os.makedirs(ANALYSES_DIR, exist_ok=True)
+         os.makedirs(DOWNLOADS_DIR, exist_ok=True)
+         os.makedirs(AUDIO_DIR, exist_ok=True)
+         logger.info(f"Data directories created: {DATA_DIR}, {ANALYSES_DIR}, {DOWNLOADS_DIR}, {AUDIO_DIR}")
+
+         # Create records file if it doesn't exist
+         if not os.path.exists(RECORDS_FILE):
+             with open(RECORDS_FILE, 'w', newline='') as f:
+                 writer = csv.writer(f)
+                 writer.writerow([
+                     "ID", "Name", "Record ID", "Age", "Gender",
+                     "Assessment Date", "Clinician", "Analysis Date", "File Path"
+                 ])
+     except Exception as e:
+         logger.warning(f"Could not create data directories: {str(e)}")
+         # Fall back to writable directories under the user's home (e.g. on HF Spaces)
+         DOWNLOADS_DIR = os.path.join(os.path.expanduser("~"), "casl_downloads")
+         AUDIO_DIR = os.path.join(os.path.expanduser("~"), "casl_audio")
+         os.makedirs(DOWNLOADS_DIR, exist_ok=True)
+         os.makedirs(AUDIO_DIR, exist_ok=True)
+         logger.info(f"Using fallback directories: {DOWNLOADS_DIR}, {AUDIO_DIR}")
+
+ # Initialize data directories
+ ensure_data_dirs()
+
+ def save_patient_record(patient_info, analysis_results, transcript):
+     """Save patient record to storage"""
+     try:
+         # Generate unique ID for the record
+         record_id = str(uuid.uuid4())
+
+         # Extract patient information
+         name = patient_info.get("name", "")
+         patient_id = patient_info.get("record_id", "")
+         age = patient_info.get("age", "")
+         gender = patient_info.get("gender", "")
+         assessment_date = patient_info.get("assessment_date", "")
+         clinician = patient_info.get("clinician", "")
+
+         # Create filename for the analysis data
+         filename = f"analysis_{record_id}.pkl"
+         filepath = os.path.join(ANALYSES_DIR, filename)
+
+         # Save analysis data
+         with open(filepath, 'wb') as f:
+             pickle.dump({
+                 "patient_info": patient_info,
+                 "analysis_results": analysis_results,
+                 "transcript": transcript,
+                 "timestamp": datetime.now().isoformat(),
+             }, f)
+
+         # Add record to CSV file
+         with open(RECORDS_FILE, 'a', newline='') as f:
+             writer = csv.writer(f)
+             writer.writerow([
+                 record_id, name, patient_id, age, gender,
+                 assessment_date, clinician, datetime.now().strftime('%Y-%m-%d'),
+                 filepath
+             ])
+
+         return record_id
+
+     except Exception as e:
+         logger.error(f"Error saving patient record: {str(e)}")
+         return None
+
+ def load_patient_record(record_id):
+     """Load patient record from storage"""
+     try:
+         # Find the record in the CSV file
+         if not os.path.exists(RECORDS_FILE):
+             logger.error(f"Records file does not exist: {RECORDS_FILE}")
+             return None
+
+         with open(RECORDS_FILE, 'r', newline='') as f:
+             reader = csv.reader(f)
+             next(reader)  # Skip header
+             for row in reader:
+                 if len(row) < 9:  # Ensure row has enough elements
+                     logger.warning(f"Skipping malformed record row: {row}")
+                     continue
+
+                 if row[0] == record_id:
+                     file_path = row[8]
+
+                     # Check if the file exists
+                     if not os.path.exists(file_path):
+                         logger.error(f"Analysis file not found: {file_path}")
+                         return None
+
+                     # Load and return the data
+                     try:
+                         with open(file_path, 'rb') as pkl_file:
+                             return pickle.load(pkl_file)
+                     except (pickle.PickleError, EOFError) as pickle_err:
+                         logger.error(f"Error unpickling file {file_path}: {str(pickle_err)}")
+                         return None
+
+         logger.warning(f"Record ID not found: {record_id}")
+         return None
+
+     except Exception as e:
+         logger.error(f"Error loading patient record: {str(e)}")
+         return None
+
+ def get_all_patient_records():
+     """Return a list of all patient records"""
+     try:
+         records = []
+
+         # Ensure data directories exist
+         ensure_data_dirs()
+
+         if not os.path.exists(RECORDS_FILE):
+             logger.warning(f"Records file does not exist, creating it: {RECORDS_FILE}")
+             with open(RECORDS_FILE, 'w', newline='') as f:
+                 writer = csv.writer(f)
+                 writer.writerow([
+                     "ID", "Name", "Record ID", "Age", "Gender",
+                     "Assessment Date", "Clinician", "Analysis Date", "File Path"
+                 ])
+             return records
+
+         # Read existing records
+         valid_records = []
+         with open(RECORDS_FILE, 'r', newline='') as f:
+             reader = csv.reader(f)
+             next(reader)  # Skip header
+             for row in reader:
+                 if len(row) < 9:  # Check for malformed rows
+                     continue
+
+                 # Check if the analysis file exists
+                 file_path = row[8]
+                 file_exists = os.path.exists(file_path)
+
+                 record = {
+                     "id": row[0],
+                     "name": row[1],
+                     "record_id": row[2],
+                     "age": row[3],
+                     "gender": row[4],
+                     "assessment_date": row[5],
+                     "clinician": row[6],
+                     "analysis_date": row[7],
+                     "file_path": file_path,
+                     "status": "Valid" if file_exists else "Missing File"
+                 }
+                 records.append(record)
+
+                 # Keep track of valid records for potential cleanup
+                 if file_exists:
+                     valid_records.append(row)
+
+         # If we found invalid records, consider rewriting the CSV with only valid entries
+         if len(valid_records) < len(records):
+             logger.warning(f"Found {len(records) - len(valid_records)} invalid records")
+             # Uncomment to enable automatic cleanup:
+             # with open(RECORDS_FILE, 'w', newline='') as f:
+             #     writer = csv.writer(f)
+             #     writer.writerow([
+             #         "ID", "Name", "Record ID", "Age", "Gender",
+             #         "Assessment Date", "Clinician", "Analysis Date", "File Path"
+             #     ])
+             #     for row in valid_records:
+             #         writer.writerow(row)
+
+         return records
+
+     except Exception as e:
+         logger.error(f"Error getting patient records: {str(e)}")
+         return []
+
+ def delete_patient_record(record_id):
+     """Delete a patient record"""
+     try:
+         if not os.path.exists(RECORDS_FILE):
+             return False
+
+         # Find the record and its file
+         file_path = None
+         with open(RECORDS_FILE, 'r', newline='') as f:
+             reader = csv.reader(f)
+             rows = list(reader)
+             header = rows[0]
+
+             for i, row in enumerate(rows[1:], 1):
+                 if len(row) < 9:
+                     continue
+
+                 if row[0] == record_id:
+                     file_path = row[8]
+                     break
+
+         if not file_path:
+             return False
+
+         # Delete the analysis file if it exists
+         if os.path.exists(file_path):
+             os.remove(file_path)
+
+         # Remove the record from the CSV
+         rows_to_keep = [row for row in rows[1:] if len(row) >= 9 and row[0] != record_id]
+
+         with open(RECORDS_FILE, 'w', newline='') as f:
+             writer = csv.writer(f)
+             writer.writerow(header)
+             writer.writerows(rows_to_keep)
+
+         return True
+
+     except Exception as e:
+         logger.error(f"Error deleting patient record: {str(e)}")
+         return False
+
+ # ===============================
+ # Utility Functions
+ # ===============================
+
+ def read_pdf(file_path):
+     """Read text from a PDF file"""
+     if not PYPDF2_AVAILABLE:
+         return "Error: PDF reading is not available - PyPDF2 library is not installed"
+
+     try:
+         with open(file_path, 'rb') as file:
+             pdf_reader = PyPDF2.PdfReader(file)
+             text = ""
+             for page in pdf_reader.pages:
+                 text += page.extract_text()
+             return text
+     except Exception as e:
+         logger.error(f"Error reading PDF: {str(e)}")
+         return ""
+
+ def read_cha_file(file_path):
+     """Read and parse a .cha transcript file"""
+     try:
+         with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
+             content = f.read()
+
+         # Extract participant lines (starting with *PAR:)
+         par_lines = []
+         for line in content.splitlines():
+             if line.startswith('*PAR:'):
+                 par_lines.append(line)
+
+         # If no PAR lines found, just return the whole content
+         if not par_lines:
+             return content
+
+         return '\n'.join(par_lines)
+
+     except Exception as e:
+         logger.error(f"Error reading CHA file: {str(e)}")
+         return ""
+
+ def process_upload(file):
+     """Process an uploaded file (PDF, text, or CHA)"""
+     if file is None:
+         return ""
+
+     file_path = file.name
+     if file_path.endswith('.pdf'):
+         if PYPDF2_AVAILABLE:
+             return read_pdf(file_path)
+         else:
+             return "Error: PDF reading is disabled - PyPDF2 library is not installed"
+     elif file_path.endswith('.cha'):
+         return read_cha_file(file_path)
+     else:
+         with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
+             return f.read()
+
+ # ===============================
+ # AI Model Interface Functions
+ # ===============================
+
+ def call_bedrock(prompt, max_tokens=4096):
+     """Call the AWS Bedrock API to analyze text using Claude"""
+     if not bedrock_client:
+         logger.warning("AWS credentials not configured, falling back to demo response")
+         return generate_demo_response(prompt)
+
+     try:
+         body = json.dumps({
+             "anthropic_version": "bedrock-2023-05-31",
+             "max_tokens": max_tokens,
+             "messages": [
+                 {
+                     "role": "user",
+                     "content": prompt
+                 }
+             ],
+             "temperature": 0.3,
+             "top_p": 0.9
+         })
+
+         modelId = 'anthropic.claude-3-sonnet-20240229-v1:0'
+         response = bedrock_client.invoke_model(
+             body=body,
+             modelId=modelId,
+             accept='application/json',
+             contentType='application/json'
+         )
+         response_body = json.loads(response.get('body').read())
+         return response_body['content'][0]['text']
+     except Exception as e:
+         logger.error(f"Error in call_bedrock: {str(e)}")
+         return f"Error: {str(e)}"
+
+ def transcribe_audio(audio_path, patient_age=8):
+     """Transcribe an audio recording using Amazon Transcribe and format in CHAT format"""
+     if not os.path.exists(audio_path):
+         logger.error(f"Audio file not found: {audio_path}")
+         return "Error: Audio file not found."
+
+     if not transcribe_client or not s3_client:
+         logger.warning("AWS clients not initialized, using demo transcription")
+         return generate_demo_transcription()
+
+     try:
+         # Get file info
+         file_name = os.path.basename(audio_path)
+         file_size = os.path.getsize(audio_path)
+         _, file_extension = os.path.splitext(file_name)
+
+         # Check file format
+         supported_formats = ['.mp3', '.mp4', '.wav', '.flac', '.ogg', '.amr', '.webm']
+         if file_extension.lower() not in supported_formats:
+             logger.error(f"Unsupported audio format: {file_extension}")
+             return f"Error: Unsupported audio format. Please use one of: {', '.join(supported_formats)}"
+
+         # Generate a unique job name
+         timestamp = datetime.now().strftime('%Y%m%d%H%M%S')
+         job_name = f"casl-transcription-{timestamp}"
+         s3_key = f"{S3_PREFIX}{job_name}{file_extension}"
+
+         # Upload to S3
+         logger.info(f"Uploading {file_name} to S3 bucket {S3_BUCKET}")
+         try:
+             with open(audio_path, 'rb') as audio_file:
+                 s3_client.upload_fileobj(audio_file, S3_BUCKET, s3_key)
+         except Exception as e:
+             logger.error(f"Failed to upload to S3: {str(e)}")
+
+             # If the upload fails, try to create the bucket and retry once
+             try:
+                 s3_client.create_bucket(Bucket=S3_BUCKET)
+                 logger.info(f"Created S3 bucket: {S3_BUCKET}")
+
+                 with open(audio_path, 'rb') as audio_file:
+                     s3_client.upload_fileobj(audio_file, S3_BUCKET, s3_key)
+             except Exception as bucket_error:
+                 logger.error(f"Failed to create bucket and upload: {str(bucket_error)}")
+                 return "Error: Failed to upload audio file. Please check your AWS permissions."
+
+         # Start transcription job
+         logger.info(f"Starting transcription job: {job_name}")
+         media_format = file_extension.lower()[1:]  # Remove the dot; matches Transcribe's MediaFormat values
+
+         # Speaker diarization settings; the same configuration is currently used
+         # for all ages (patient_age is kept for future tuning)
+         language_options = {
+             'LanguageCode': 'en-US',
+             'Settings': {
+                 'ShowSpeakerLabels': True,
+                 'MaxSpeakerLabels': 2  # Typically patient + clinician
+             }
+         }
+
+         transcribe_client.start_transcription_job(
+             TranscriptionJobName=job_name,
+             Media={
+                 'MediaFileUri': f"s3://{S3_BUCKET}/{s3_key}"
+             },
+             MediaFormat=media_format,
+             **language_options
+         )
+
+         # Wait for the job to complete (with timeout)
+         logger.info("Waiting for transcription to complete...")
+         max_tries = 30  # 5 minutes max wait
+         tries = 0
+
+         while tries < max_tries:
+             try:
+                 job = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
+                 status = job['TranscriptionJob']['TranscriptionJobStatus']
+
+                 if status == 'COMPLETED':
+                     # Get the transcript
+                     transcript_uri = job['TranscriptionJob']['Transcript']['TranscriptFileUri']
+
+                     # Download the transcript
+                     import urllib.request
+
+                     with urllib.request.urlopen(transcript_uri) as response:
+                         transcript_json = json.loads(response.read().decode('utf-8'))
+
+                     # Convert to CHAT format
+                     chat_transcript = format_as_chat(transcript_json)
+                     return chat_transcript
+
+                 elif status == 'FAILED':
+                     reason = job['TranscriptionJob'].get('FailureReason', 'Unknown failure')
+                     logger.error(f"Transcription job failed: {reason}")
+                     return f"Error: Transcription failed - {reason}"
+
+                 # Still in progress, wait and try again
+                 tries += 1
+                 time.sleep(10)  # Check every 10 seconds
+
+             except Exception as e:
+                 logger.error(f"Error checking transcription job: {str(e)}")
+                 return f"Error getting transcription: {str(e)}"
+
+         # If we got here, we timed out
+         return "Error: Transcription timed out. The process is taking longer than expected."
+
+     except Exception as e:
+         logger.exception("Error in audio transcription")
+         return f"Error transcribing audio: {str(e)}"
+
+ def format_as_chat(transcript_json):
+     """Format the Amazon Transcribe JSON result as CHAT format"""
+     try:
+         # Get transcript items
+         items = transcript_json['results']['items']
+
+         # Get speaker labels if available
+         speakers = {}
+         if 'speaker_labels' in transcript_json['results']:
+             speaker_segments = transcript_json['results']['speaker_labels']['segments']
+
+             # Map each item to its speaker
+             for segment in speaker_segments:
+                 for item in segment['items']:
+                     start_time = item['start_time']
+                     speakers[start_time] = segment['speaker_label']
+
+         # Build transcript by combining words into utterances by speaker
+         current_speaker = None
+         current_utterance = []
+         utterances = []
+
+         for item in items:
+             # Skip non-pronunciation items (like punctuation)
+             if item['type'] != 'pronunciation':
+                 continue
+
+             word = item['alternatives'][0]['content']
+             start_time = item.get('start_time')
+
+             # Determine speaker if available
+             speaker = speakers.get(start_time, 'spk_0')
+
+             # If speaker changed, start a new utterance
+             if speaker != current_speaker and current_utterance:
+                 utterances.append((current_speaker, ' '.join(current_utterance)))
+                 current_utterance = []
+
+             current_speaker = speaker
+             current_utterance.append(word)
+
+         # Add the last utterance
+         if current_utterance:
+             utterances.append((current_speaker, ' '.join(current_utterance)))
+
+         # Format as CHAT
+         chat_lines = []
+         for speaker, text in utterances:
+             # Map speakers to CHAT format
+             # Assuming spk_0 is the patient (PAR) and spk_1 is the clinician (INV)
+             chat_speaker = "*PAR:" if speaker == "spk_0" else "*INV:"
+             chat_lines.append(f"{chat_speaker} {text}.")
+
+         return '\n'.join(chat_lines)
+
+     except Exception as e:
+         logger.exception("Error formatting transcript")
+         return "*PAR: (Error formatting transcript)"
+
+ def generate_demo_transcription():
+     """Generate a simulated transcription response"""
+     return """*PAR: today I want to tell you about my favorite toy.
+ *PAR: it's a &-um teddy bear that I got for my birthday.
+ *PAR: he has &-um brown fur and a red bow.
+ *PAR: I like to sleep with him every night.
+ *PAR: sometimes I take him to school in my backpack.
+ *INV: what's your teddy bear's name?
+ *PAR: his name is &-um Brownie because he's brown."""
+
+ def generate_demo_response(prompt):
+     """Generate a simulated response for demo purposes"""
+     # This function generates a realistic but fake response for demo purposes
+     # In a real deployment, you would call an actual LLM API
+
+     return """<SPEECH_FACTORS_START>
+ Difficulty producing fluent speech: 8, 65
+ Examples:
+ - "today I would &-um like to talk about &-um a fun trip I took last &-um summer with my family"
+ - "we went to the &-um &-um beach [//] no to the mountains [//] I mean the beach actually"
+
+ Word retrieval issues: 6, 72
+ Examples:
+ - "what do you call those &-um &-um sprinkles! that's the word"
+ - "sometimes I forget [//] forgetted [: forgot] [*] what they call those things we built"
+
+ Grammatical errors: 4, 58
+ Examples:
+ - "after swimming we [//] I eat [: ate] [*] &-um ice cream"
+ - "sometimes I forget [//] forgetted [: forgot] [*] what they call those things we built"
+
+ Repetitions and revisions: 5, 62
+ Examples:
+ - "we [/] we stayed for &-um three no [//] four days"
+ - "we went to the &-um &-um beach [//] no to the mountains [//] I mean the beach actually"
+ <SPEECH_FACTORS_END>
+
+ <CASL_SKILLS_START>
+ Lexical/Semantic Skills: Standard Score (92), Percentile Rank (30%), Average Performance
+ Examples:
+ - "what do you call those &-um &-um sprinkles! that's the word"
+ - "we went to the &-um &-um beach [//] no to the mountains [//] I mean the beach actually"
+
+ Syntactic Skills: Standard Score (87), Percentile Rank (19%), Low Average Performance
+ Examples:
+ - "my brother he [//] he helped me dig a big hole"
+ - "after swimming we [//] I eat [: ate] [*] &-um ice cream with &-um chocolate things on top"
+
+ Supralinguistic Skills: Standard Score (90), Percentile Rank (25%), Average Performance
+ Examples:
+ - "sometimes I wonder [/] wonder where fishies [: fish] [*] go when it's cold"
+ - "maybe they have [/] have houses under the water"
+ <CASL_SKILLS_END>
+
+ <TREATMENT_RECOMMENDATIONS_START>
+ - Implement word-finding strategies with semantic cuing focused on everyday objects and activities, using the patient's beach experience as a context (e.g., "sprinkles," "castles")
+ - Practice structured narrative tasks with visual supports to reduce revisions and improve sequencing
+ - Use sentence formulation exercises focusing on verb tense consistency (addressing errors like "forgetted" and "eat" for "ate")
+ - Incorporate self-monitoring techniques to help identify and correct grammatical errors
+ - Work on increasing vocabulary specificity (e.g., "things on top" to "sprinkles")
+ <TREATMENT_RECOMMENDATIONS_END>
+
+ <EXPLANATION_START>
+ This child demonstrates moderate word-finding difficulties with compensatory strategies including fillers ("&-um") and repetitions. The frequent use of self-corrections shows good metalinguistic awareness, but the pauses and repairs impact conversational fluency. Syntactic errors primarily involve verb tense inconsistency. Overall, the pattern suggests a mild-to-moderate language disorder with stronger receptive than expressive skills.
+ <EXPLANATION_END>
+
+ <ADDITIONAL_ANALYSIS_START>
+ The child shows relative strengths in maintaining topic coherence and conveying a complete narrative structure despite the language challenges. The pattern of errors suggests that word-finding difficulties and processing speed are primary concerns rather than conceptual or cognitive issues. Semantic network activities that strengthen word associations would likely be beneficial, particularly when paired with visual supports.
+ <ADDITIONAL_ANALYSIS_END>
+
+ <DIAGNOSTIC_IMPRESSIONS_START>
+ Based on the language sample, this child presents with a profile consistent with a mild-to-moderate expressive language disorder. The most prominent features include:
+
+ 1. Word-finding difficulties characterized by fillers, pauses, and self-corrections when attempting to retrieve specific vocabulary
+ 2. Grammatical challenges primarily affecting verb tense consistency and morphological markers
+ 3. Relatively intact narrative structure and topic maintenance
+
+ These findings suggest intervention should focus on word retrieval strategies, grammatical form practice, and continued support for narrative development, with an emphasis on fluency and self-monitoring.
+ <DIAGNOSTIC_IMPRESSIONS_END>
+
+ <ERROR_EXAMPLES_START>
+ Word-finding difficulties:
+ - "what do you call those &-um &-um sprinkles! that's the word"
+ - "we went to the &-um &-um beach [//] no to the mountains [//] I mean the beach actually"
+ - "there was lots of &-um &-um swimming and &-um sun"
+
+ Grammatical errors:
+ - "after swimming we [//] I eat [: ate] [*] &-um ice cream"
+ - "sometimes I forget [//] forgetted [: forgot] [*] what they call those things we built"
+ - "we saw [/] saw fishies [: fish] [*] swimming in the water"
+
+ Repetitions and revisions:
+ - "we [/] we stayed for &-um three no [//] four days"
+ - "I want to go back to the beach [/] beach next year"
+ - "sometimes I wonder [/] wonder where fishies [: fish] [*] go when it's cold"
+ <ERROR_EXAMPLES_END>"""
+
+ def parse_casl_response(response):
+     """Parse the LLM response for CASL analysis into structured data"""
+     # Extract speech factors section using section markers
+     speech_factors_section = ""
+     factors_pattern = re.compile(r"<SPEECH_FACTORS_START>(.*?)<SPEECH_FACTORS_END>", re.DOTALL)
+     factors_match = factors_pattern.search(response)
+
+     if factors_match:
+         speech_factors_section = factors_match.group(1).strip()
+     else:
+         speech_factors_section = "Error extracting speech factors from analysis."
+
+     # Extract CASL skills section
+     casl_section = ""
+     casl_pattern = re.compile(r"<CASL_SKILLS_START>(.*?)<CASL_SKILLS_END>", re.DOTALL)
+     casl_match = casl_pattern.search(response)
+
+     if casl_match:
+         casl_section = casl_match.group(1).strip()
+     else:
+         casl_section = "Error extracting CASL skills from analysis."
+
+     # Extract treatment recommendations
+     treatment_text = ""
+     treatment_pattern = re.compile(r"<TREATMENT_RECOMMENDATIONS_START>(.*?)<TREATMENT_RECOMMENDATIONS_END>", re.DOTALL)
+     treatment_match = treatment_pattern.search(response)
+
+     if treatment_match:
+         treatment_text = treatment_match.group(1).strip()
+     else:
+         treatment_text = "Error extracting treatment recommendations from analysis."
+
+     # Extract explanation section
+     explanation_text = ""
+     explanation_pattern = re.compile(r"<EXPLANATION_START>(.*?)<EXPLANATION_END>", re.DOTALL)
+     explanation_match = explanation_pattern.search(response)
+
+     if explanation_match:
+         explanation_text = explanation_match.group(1).strip()
+     else:
+         explanation_text = "Error extracting clinical explanation from analysis."
+
+     # Extract additional analysis
+     additional_analysis = ""
+     additional_pattern = re.compile(r"<ADDITIONAL_ANALYSIS_START>(.*?)<ADDITIONAL_ANALYSIS_END>", re.DOTALL)
+     additional_match = additional_pattern.search(response)
+
+     if additional_match:
+         additional_analysis = additional_match.group(1).strip()
+
+     # Extract diagnostic impressions
+     diagnostic_impressions = ""
+     diagnostic_pattern = re.compile(r"<DIAGNOSTIC_IMPRESSIONS_START>(.*?)<DIAGNOSTIC_IMPRESSIONS_END>", re.DOTALL)
+     diagnostic_match = diagnostic_pattern.search(response)
+
+     if diagnostic_match:
+         diagnostic_impressions = diagnostic_match.group(1).strip()
+
+     # Extract specific error examples
+     specific_errors_text = ""
+     errors_pattern = re.compile(r"<ERROR_EXAMPLES_START>(.*?)<ERROR_EXAMPLES_END>", re.DOTALL)
+     errors_match = errors_pattern.search(response)
+
+     if errors_match:
+         specific_errors_text = errors_match.group(1).strip()
+
+     # Create full report text
+     full_report = f"""
+ ## Speech Factors Analysis
+
+ {speech_factors_section}
+
+ ## CASL Skills Assessment
+
+ {casl_section}
+
+ ## Treatment Recommendations
+
+ {treatment_text}
+
+ ## Clinical Explanation
+
+ {explanation_text}
+ """
+
+     if additional_analysis:
+         full_report += f"\n## Additional Analysis\n\n{additional_analysis}"
+
+     if diagnostic_impressions:
+         full_report += f"\n## Diagnostic Impressions\n\n{diagnostic_impressions}"
+
+     if specific_errors_text:
+         full_report += f"\n## Detailed Error Examples\n\n{specific_errors_text}"
+
+     return {
+         'speech_factors': speech_factors_section,
+         'casl_data': casl_section,
+         'treatment_suggestions': treatment_text,
+         'explanation': explanation_text,
+         'additional_analysis': additional_analysis,
+         'diagnostic_impressions': diagnostic_impressions,
+         'specific_errors': specific_errors_text,
+         'full_report': full_report,
+         'raw_response': response
+     }
+
817
+ def analyze_transcript(transcript, age, gender):
818
+ """Analyze a speech transcript using Claude"""
819
+ # CASL-2 assessment cheat sheet
820
+ cheat_sheet = """
821
+ # Speech-Language Pathologist Analysis Cheat Sheet
822
+
823
+ ## Types of Speech Patterns to Identify:
824
+
825
+ 1. Difficulty producing fluent, grammatical speech
826
+ - Fillers (um, uh) and pauses
827
+ - False starts and revisions
828
+ - Incomplete sentences
829
+
830
+ 2. Word retrieval issues
831
+ - Pauses before content words
832
+ - Circumlocutions (talking around a word)
833
+ - Word substitutions
834
+
835
+ 3. Grammatical errors
836
+ - Verb tense inconsistencies
837
+ - Subject-verb agreement errors
838
+ - Morphological errors (plurals, possessives)
839
+
840
+ 4. Repetitions and revisions
841
+ - Word or phrase repetitions [/]
842
+ - Self-corrections [//]
843
+ - Retracing
844
+
845
+ 5. Neologisms
846
+ - Made-up words
847
+ - Word blends
848
+
849
+ 6. Perseveration
850
+ - Inappropriate repetition of ideas
851
+ - Recurring themes
852
+
853
+ 7. Comprehension issues
854
+ - Topic maintenance difficulties
855
+ - Non-sequiturs
856
+ - Inappropriate responses
857
+ """
858
+
859
+ # Instructions for the analysis
860
+ instructions = """
861
+ Analyze this speech transcript to identify specific patterns and provide a detailed CASL-2 (Comprehensive Assessment of Spoken Language) assessment.
862
+
863
+ For each speech pattern you identify:
864
+ 1. Count the occurrences in the transcript
865
+ 2. Estimate a percentile (how typical/atypical this is for the age)
866
+ 3. Provide DIRECT QUOTES from the transcript as evidence
867
+
868
+ Then assess the following CASL-2 domains:
869
+
870
+ 1. Lexical/Semantic Skills:
871
+ - Assess vocabulary diversity, word-finding abilities, semantic precision
872
+ - Provide Standard Score (mean=100, SD=15), percentile rank, and performance level
873
+ - Include SPECIFIC QUOTES as evidence
874
+
875
+ 2. Syntactic Skills:
876
+ - Evaluate grammatical accuracy, sentence complexity, morphological skills
877
+ - Provide Standard Score, percentile rank, and performance level
878
+ - Include SPECIFIC QUOTES as evidence
879
+
880
+ 3. Supralinguistic Skills:
881
+ - Assess figurative language use, inferencing, and abstract reasoning
882
+ - Provide Standard Score, percentile rank, and performance level
883
+ - Include SPECIFIC QUOTES as evidence
884
+
885
+ YOUR RESPONSE MUST USE THESE EXACT SECTION MARKERS FOR PARSING:
886
+
887
+ <SPEECH_FACTORS_START>
888
+ Difficulty producing fluent speech: (occurrences), (percentile)
889
+ Examples:
890
+ - "(direct quote from transcript)"
891
+ - "(direct quote from transcript)"
892
+
893
+ Word retrieval issues: (occurrences), (percentile)
894
+ Examples:
895
+ - "(direct quote from transcript)"
896
+ - "(direct quote from transcript)"
897
+
898
+ (And so on for each factor)
899
+ <SPEECH_FACTORS_END>
900
+
901
+ <CASL_SKILLS_START>
902
+ Lexical/Semantic Skills: Standard Score (X), Percentile Rank (X%), Performance Level
903
+ Examples:
904
+ - "(direct quote showing strength or weakness)"
905
+ - "(direct quote showing strength or weakness)"
906
+
907
+ Syntactic Skills: Standard Score (X), Percentile Rank (X%), Performance Level
908
+ Examples:
909
+ - "(direct quote showing strength or weakness)"
910
+ - "(direct quote showing strength or weakness)"
911
+
912
+ Supralinguistic Skills: Standard Score (X), Percentile Rank (X%), Performance Level
913
+ Examples:
914
+ - "(direct quote showing strength or weakness)"
915
+ - "(direct quote showing strength or weakness)"
916
+ <CASL_SKILLS_END>
917
+
918
+ <TREATMENT_RECOMMENDATIONS_START>
919
+ - (treatment recommendation)
920
+ - (treatment recommendation)
921
+ - (treatment recommendation)
922
+ <TREATMENT_RECOMMENDATIONS_END>
923
+
924
+ <EXPLANATION_START>
925
+ (brief diagnostic rationale based on findings)
926
+ <EXPLANATION_END>
927
+
928
+ <ADDITIONAL_ANALYSIS_START>
929
+ (specific insights that would be helpful for treatment planning)
930
+ <ADDITIONAL_ANALYSIS_END>
931
+
932
+ <DIAGNOSTIC_IMPRESSIONS_START>
933
+ (summarize findings across domains using specific examples and clear explanations)
934
+ <DIAGNOSTIC_IMPRESSIONS_END>
935
+
936
+ <ERROR_EXAMPLES_START>
937
+ (Copy all the specific quote examples here again, organized by error type or skill domain)
938
+ <ERROR_EXAMPLES_END>
939
+
940
+ MOST IMPORTANT:
941
+ 1. Use EXACTLY the section markers provided (like <SPEECH_FACTORS_START>) to make parsing reliable
942
+ 2. For EVERY factor and domain you analyze, you MUST provide direct quotes from the transcript as evidence
943
+ 3. Be very specific and cite the exact text
944
+ 4. Do not omit any of the required sections
945
+ """
946
+
947
+ # Prepare prompt for Claude with the user's role context
948
+ role_context = """
949
+ You are a speech pathologist, a healthcare professional who specializes in evaluating, diagnosing, and treating communication disorders, including speech, language, cognitive-communication, voice, swallowing, and fluency disorders. Your role is to help patients improve their speech and communication skills through various therapeutic techniques and exercises.
950
+
951
+ You are working with a student with speech impediments.
952
+
953
+ The most important thing is that you stay kind to the child. Be constructive and helpful rather than critical.
954
+ """
955
+
956
+ prompt = f"""
957
+ {role_context}
958
+
959
+ You are analyzing a transcript for a patient who is {age} years old and {gender}.
960
+
961
+ TRANSCRIPT:
962
+ {transcript}
963
+
964
+ {cheat_sheet}
965
+
966
+ {instructions}
967
+
968
+ Remember to be precise but compassionate in your analysis. Use direct quotes from the transcript for every factor and domain you analyze.
969
+ """
970
+
971
+ # Call the appropriate API or fallback to demo mode
972
+ if bedrock_client:
973
+ response = call_bedrock(prompt)
974
+ else:
975
+ response = generate_demo_response(prompt)
976
+
977
+ # Parse the response
978
+ results = parse_casl_response(response)
979
+
980
+ return results
981
+
982
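For reference, the paired markers the prompt above demands (e.g. `<SPEECH_FACTORS_START>` … `<SPEECH_FACTORS_END>`) can be pulled out of the model's response with a small regex helper. This is a minimal sketch of the idea, not the app's actual `parse_casl_response` implementation; the name `extract_section` is illustrative:

```python
import re

def extract_section(text, name):
    """Return the body between <NAME_START> and <NAME_END>, or '' if missing."""
    match = re.search(rf"<{name}_START>(.*?)<{name}_END>", text, re.DOTALL)
    return match.group(1).strip() if match else ""

demo = "<EXPLANATION_START>\nMild word-finding difficulty.\n<EXPLANATION_END>"
print(extract_section(demo, "EXPLANATION"))  # Mild word-finding difficulty.
```

Returning `''` for a missing section keeps downstream report-building code simple even when the model omits a marker pair.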
+ def export_pdf(results, patient_name="", record_id="", age="", gender="", assessment_date="", clinician=""):
+     """Export analysis results to a PDF report"""
+     global DOWNLOADS_DIR
+ 
+     # Check if ReportLab is available
+     if not REPORTLAB_AVAILABLE:
+         return "ERROR: PDF export is not available - ReportLab library is not installed. Please run 'pip install reportlab'."
+ 
+     try:
+         # Generate a safe filename
+         if patient_name:
+             safe_name = f"{patient_name.replace(' ', '_')}"
+         else:
+             safe_name = f"speech_analysis_{datetime.now().strftime('%Y%m%d%H%M%S')}"
+ 
+         # Make sure the downloads directory exists
+         try:
+             os.makedirs(DOWNLOADS_DIR, exist_ok=True)
+         except Exception as e:
+             logger.warning(f"Could not access downloads directory: {str(e)}")
+             # Fall back to a directory under the user's home
+             DOWNLOADS_DIR = os.path.join(os.path.expanduser("~"), "casl_downloads")
+             os.makedirs(DOWNLOADS_DIR, exist_ok=True)
+ 
+         # Create the PDF path in our downloads directory
+         pdf_path = os.path.join(DOWNLOADS_DIR, f"{safe_name}.pdf")
+ 
+         # Create the PDF document
+         doc = SimpleDocTemplate(pdf_path, pagesize=letter)
+         styles = getSampleStyleSheet()
+ 
+         # Create custom styles. getSampleStyleSheet() already defines 'Heading1',
+         # 'Heading2', 'Heading3' and 'BodyText', and styles.add() raises on
+         # duplicate names, so the custom styles use distinct names.
+         styles.add(ParagraphStyle(
+             name='ReportHeading1',
+             parent=styles['Heading1'],
+             fontSize=16,
+             spaceAfter=12,
+             textColor=colors.navy
+         ))
+ 
+         styles.add(ParagraphStyle(
+             name='ReportHeading2',
+             parent=styles['Heading2'],
+             fontSize=14,
+             spaceAfter=10,
+             spaceBefore=10,
+             textColor=colors.darkblue
+         ))
+ 
+         styles.add(ParagraphStyle(
+             name='ReportHeading3',
+             parent=styles['Heading2'],
+             fontSize=12,
+             spaceAfter=8,
+             spaceBefore=8,
+             textColor=colors.darkblue
+         ))
+ 
+         styles.add(ParagraphStyle(
+             name='ReportBody',
+             parent=styles['BodyText'],
+             fontSize=11,
+             spaceAfter=8,
+             leading=14
+         ))
+ 
+         styles.add(ParagraphStyle(
+             name='BulletPoint',
+             parent=styles['BodyText'],
+             fontSize=11,
+             leftIndent=20,
+             firstLineIndent=-15,
+             spaceAfter=4,
+             leading=14
+         ))
+ 
+         # Convert markdown to PDF elements
+         story = []
+ 
+         # Add title and date
+         story.append(Paragraph("Speech Language Assessment Report", styles['Title']))
+         story.append(Spacer(1, 12))
+ 
+         # Add patient information table
+         if patient_name or record_id or age or gender:
+             # Prepare patient info data
+             data = []
+             if patient_name:
+                 data.append(["Patient Name:", patient_name])
+             if record_id:
+                 data.append(["Record ID:", record_id])
+             if age:
+                 data.append(["Age:", f"{age} years"])
+             if gender:
+                 data.append(["Gender:", gender])
+             if assessment_date:
+                 data.append(["Assessment Date:", assessment_date])
+             if clinician:
+                 data.append(["Clinician:", clinician])
+ 
+             if data:
+                 # Create a table with the data
+                 patient_table = Table(data, colWidths=[120, 350])
+                 patient_table.setStyle(TableStyle([
+                     ('BACKGROUND', (0, 0), (0, -1), colors.lightgrey),
+                     ('TEXTCOLOR', (0, 0), (0, -1), colors.darkblue),
+                     ('ALIGN', (0, 0), (0, -1), 'RIGHT'),
+                     ('ALIGN', (1, 0), (1, -1), 'LEFT'),
+                     ('FONTNAME', (0, 0), (0, -1), 'Helvetica-Bold'),
+                     ('BOTTOMPADDING', (0, 0), (-1, -1), 6),
+                     ('TOPPADDING', (0, 0), (-1, -1), 6),
+                     ('GRID', (0, 0), (-1, -1), 0.5, colors.lightgrey),
+                 ]))
+                 story.append(patient_table)
+                 story.append(Spacer(1, 12))
+ 
+         # Add clinical analysis sections
+         story.append(Paragraph("Speech Factors Analysis", styles['ReportHeading1']))
+         for line in results['speech_factors'].split('\n'):
+             line = line.strip()
+             if not line:
+                 continue
+             if line.startswith('- '):
+                 story.append(Paragraph(f"β€’ {line[2:]}", styles['BulletPoint']))
+             else:
+                 story.append(Paragraph(line, styles['ReportBody']))
+         story.append(Spacer(1, 12))
+ 
+         story.append(Paragraph("CASL Skills Assessment", styles['ReportHeading1']))
+         for line in results['casl_data'].split('\n'):
+             line = line.strip()
+             if not line:
+                 continue
+             if line.startswith('- '):
+                 story.append(Paragraph(f"β€’ {line[2:]}", styles['BulletPoint']))
+             else:
+                 story.append(Paragraph(line, styles['ReportBody']))
+         story.append(Spacer(1, 12))
+ 
+         story.append(Paragraph("Treatment Recommendations", styles['ReportHeading1']))
+ 
+         # Process treatment recommendations as bullet points
+         for line in results['treatment_suggestions'].split('\n'):
+             line = line.strip()
+             if not line:
+                 continue
+             if line.startswith('- '):
+                 story.append(Paragraph(f"β€’ {line[2:]}", styles['BulletPoint']))
+             else:
+                 story.append(Paragraph(line, styles['ReportBody']))
+ 
+         story.append(Spacer(1, 12))
+ 
+         story.append(Paragraph("Clinical Explanation", styles['ReportHeading1']))
+         story.append(Paragraph(results['explanation'], styles['ReportBody']))
+         story.append(Spacer(1, 12))
+ 
+         if results['additional_analysis']:
+             story.append(Paragraph("Additional Analysis", styles['ReportHeading1']))
+             story.append(Paragraph(results['additional_analysis'], styles['ReportBody']))
+             story.append(Spacer(1, 12))
+ 
+         if results['diagnostic_impressions']:
+             story.append(Paragraph("Diagnostic Impressions", styles['ReportHeading1']))
+             story.append(Paragraph(results['diagnostic_impressions'], styles['ReportBody']))
+             story.append(Spacer(1, 12))
+ 
+         # Add a footer with the date
+         footer_text = f"Generated on: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}"
+         story.append(Spacer(1, 20))
+         story.append(Paragraph(footer_text, ParagraphStyle(
+             name='Footer',
+             parent=styles['Normal'],
+             fontSize=8,
+             textColor=colors.grey
+         )))
+ 
+         # Build the PDF
+         doc.build(story)
+ 
+         logger.info(f"Report saved as PDF: {pdf_path}")
+         return pdf_path
+ 
+     except Exception as e:
+         logger.exception("Error creating PDF")
+         return f"Error creating PDF: {str(e)}"
+ 
+ def create_interface():
+     """Create the Gradio interface"""
+     # Set a theme compatible with Hugging Face Spaces
+     theme = gr.themes.Soft(
+         primary_hue="blue",
+         secondary_hue="indigo",
+     )
+ 
+     with gr.Blocks(title="CASL Analysis Tool", theme=theme) as app:
+         gr.Markdown("# CASL Analysis Tool")
+         gr.Markdown("A tool for analyzing speech transcripts and audio using the CASL framework")
+ 
+         with gr.Tabs() as main_tabs:
+             # Analysis Tab
+             with gr.TabItem("Analysis", id=0):
+                 with gr.Row():
+                     with gr.Column(scale=1):
+                         # Patient info
+                         gr.Markdown("### Patient Information")
+                         patient_name = gr.Textbox(label="Patient Name", placeholder="Enter patient name")
+                         record_id = gr.Textbox(label="Record ID", placeholder="Enter record ID")
+ 
+                         with gr.Row():
+                             age = gr.Number(label="Age", value=8, minimum=1, maximum=120)
+                             gender = gr.Radio(["male", "female", "other"], label="Gender", value="male")
+ 
+                         assessment_date = gr.Textbox(
+                             label="Assessment Date",
+                             placeholder="MM/DD/YYYY",
+                             value=datetime.now().strftime('%m/%d/%Y')
+                         )
+                         clinician_name = gr.Textbox(label="Clinician", placeholder="Enter clinician name")
+ 
+                         # Transcript input
+                         gr.Markdown("### Transcript")
+                         sample_btn = gr.Button("Load Sample Transcript")
+                         file_upload = gr.File(label="Upload transcript file (.txt or .cha)")
+                         transcript = gr.Textbox(
+                             label="Speech transcript (CHAT format preferred)",
+                             placeholder="Enter transcript text or upload a file...",
+                             lines=10
+                         )
+ 
+                         # Analysis button
+                         analyze_btn = gr.Button("Analyze Transcript", variant="primary")
+ 
+                     with gr.Column(scale=1):
+                         # Results display
+                         with gr.Tabs() as results_tabs:
+                             with gr.TabItem("Summary", id=0):
+                                 gr.Markdown("### Speech Factors Analysis")
+                                 speech_factors_md = gr.Markdown()
+ 
+                                 gr.Markdown("### CASL Skills Assessment")
+                                 casl_results_md = gr.Markdown()
+ 
+                             with gr.TabItem("Treatment", id=1):
+                                 gr.Markdown("### Treatment Recommendations")
+                                 treatment_md = gr.Markdown()
+ 
+                                 gr.Markdown("### Clinical Explanation")
+                                 explanation_md = gr.Markdown()
+ 
+                             with gr.TabItem("Error Examples", id=2):
+                                 specific_errors_md = gr.Markdown()
+ 
+                             with gr.TabItem("Full Report", id=3):
+                                 full_analysis = gr.Markdown()
+ 
+                         # PDF export (only shown if ReportLab is available)
+                         export_status = gr.Markdown("")
+                         if REPORTLAB_AVAILABLE:
+                             export_btn = gr.Button("Export as PDF", variant="secondary")
+                         else:
+                             gr.Markdown("⚠️ PDF export is disabled - ReportLab library is not installed")
+ 
+             # Transcription Tab
+             with gr.TabItem("Transcription", id=1):
+                 with gr.Row():
+                     with gr.Column(scale=1):
+                         gr.Markdown("### Audio Transcription")
+                         gr.Markdown("Upload an audio recording to automatically transcribe it in CHAT format")
+ 
+                         # The patient's age helps with transcription accuracy
+                         transcription_age = gr.Number(label="Patient Age", value=8, minimum=1, maximum=120,
+                                                       info="For children under 10, special language models may be used")
+ 
+                         # Audio input; gr.Audio's `format` parameter takes a single
+                         # format name, not a comma-separated list, so the default is kept
+                         audio_input = gr.Audio(type="filepath", label="Upload Audio Recording",
+                                                elem_id="audio-input")
+ 
+                         # Transcribe button
+                         transcribe_btn = gr.Button("Transcribe Audio", variant="primary")
+ 
+                     with gr.Column(scale=1):
+                         # Transcription output
+                         transcription_output = gr.Textbox(
+                             label="Transcription Result",
+                             placeholder="Transcription will appear here...",
+                             lines=12
+                         )
+ 
+                         with gr.Row():
+                             # Button to use the transcription in analysis
+                             copy_to_analysis_btn = gr.Button("Use for Analysis", variant="secondary")
+ 
+                         # Status/info message
+                         transcription_status = gr.Markdown("")
+ 
+         # Load sample transcript button
+         def load_sample():
+             return SAMPLE_TRANSCRIPT
+ 
+         sample_btn.click(load_sample, outputs=[transcript])
+ 
+         # File upload handler
+         file_upload.upload(process_upload, file_upload, transcript)
+ 
+         # Analysis button handler
+         def on_analyze_click(transcript_text, age_val, gender_val, patient_name_val, record_id_val, clinician_val, assessment_date_val):
+             # Seven outputs are wired to this handler, so every branch returns seven values
+             if not transcript_text or len(transcript_text.strip()) < 50:
+                 return ("Error: Please provide a longer transcript for analysis.",
+                         "Error: Insufficient data",
+                         "Error: Insufficient data",
+                         "Error: Please provide a transcript of at least 50 characters for meaningful analysis.",
+                         "Error: Not enough transcript data for analysis.",
+                         "",
+                         "Error: No detailed error examples available for an empty transcript.")
+ 
+             try:
+                 # Get the analysis results
+                 results = analyze_transcript(transcript_text, age_val, gender_val)
+ 
+                 # Save the patient record
+                 patient_info = {
+                     "name": patient_name_val,
+                     "record_id": record_id_val,
+                     "age": age_val,
+                     "gender": gender_val,
+                     "assessment_date": assessment_date_val,
+                     "clinician": clinician_val
+                 }
+ 
+                 saved_id = save_patient_record(patient_info, results, transcript_text)
+ 
+                 if saved_id:
+                     save_msg = f"βœ… Patient record saved successfully. ID: {saved_id}"
+                 else:
+                     save_msg = "⚠️ Could not save patient record. Check directory permissions."
+ 
+                 # Return the results
+                 return results['speech_factors'], results['casl_data'], results['treatment_suggestions'], results['explanation'], results['full_report'], save_msg, results['specific_errors']
+ 
+             except Exception as e:
+                 logger.exception("Error during analysis")
+                 return f"Error during analysis: {str(e)}", "Analysis failed", "Not available", f"Error: {str(e)}", f"Analysis error: {str(e)}", "", ""
+ 
+         analyze_btn.click(
+             on_analyze_click,
+             inputs=[
+                 transcript, age, gender,
+                 patient_name, record_id, clinician_name, assessment_date
+             ],
+             outputs=[
+                 speech_factors_md,
+                 casl_results_md,
+                 treatment_md,
+                 explanation_md,
+                 full_analysis,
+                 export_status,
+                 specific_errors_md
+             ]
+         )
+ 
+         # PDF export function
+         def on_export_pdf(report_text, p_name, p_record_id, p_age, p_gender, p_date, p_clinician):
+             # Check if ReportLab is available
+             if not REPORTLAB_AVAILABLE:
+                 return "ERROR: PDF export is not available because the ReportLab library is not installed. Please install it with 'pip install reportlab'."
+ 
+             if not report_text or len(report_text.strip()) < 50:
+                 return "Error: Please run the analysis first before exporting to PDF."
+ 
+             try:
+                 # Parse the report text back into sections
+                 results = {
+                     'speech_factors': '',
+                     'casl_data': '',
+                     'treatment_suggestions': '',
+                     'explanation': '',
+                     'additional_analysis': '',
+                     'diagnostic_impressions': '',
+                     'specific_errors': '',
+                 }
+ 
+                 sections = report_text.split('##')
+                 for section in sections:
+                     section = section.strip()
+                     if not section:
+                         continue
+ 
+                     title_content = section.split('\n', 1)
+                     if len(title_content) < 2:
+                         continue
+ 
+                     title = title_content[0].strip()
+                     content = title_content[1].strip()
+ 
+                     if "Speech Factors Analysis" in title:
+                         results['speech_factors'] = content
+                     elif "CASL Skills Assessment" in title:
+                         results['casl_data'] = content
+                     elif "Treatment Recommendations" in title:
+                         results['treatment_suggestions'] = content
+                     elif "Clinical Explanation" in title:
+                         results['explanation'] = content
+                     elif "Additional Analysis" in title:
+                         results['additional_analysis'] = content
+                     elif "Diagnostic Impressions" in title:
+                         results['diagnostic_impressions'] = content
+                     elif "Detailed Error Examples" in title:
+                         results['specific_errors'] = content
+ 
+                 pdf_path = export_pdf(
+                     results,
+                     patient_name=p_name,
+                     record_id=p_record_id,
+                     age=p_age,
+                     gender=p_gender,
+                     assessment_date=p_date,
+                     clinician=p_clinician
+                 )
+ 
+                 # Check if the export was successful
+                 if pdf_path.startswith("ERROR:"):
+                     return pdf_path
+ 
+                 # Make it downloadable in Hugging Face Spaces
+                 download_link = f'<a href="file={pdf_path}" download="{os.path.basename(pdf_path)}">Download PDF Report</a>'
+                 return f"Report saved as PDF: {pdf_path}<br>{download_link}"
+             except Exception as e:
+                 logger.exception("Error exporting to PDF")
+                 return f"Error creating PDF: {str(e)}"
+ 
+         # Only set up the PDF export button if ReportLab is available
+         if REPORTLAB_AVAILABLE:
+             export_btn.click(
+                 on_export_pdf,
+                 inputs=[
+                     full_analysis,
+                     patient_name,
+                     record_id,
+                     age,
+                     gender,
+                     assessment_date,
+                     clinician_name
+                 ],
+                 outputs=[export_status]
+             )
+ 
+         # Transcription button handler
+         def on_transcribe_audio(audio_path, age_val):
+             try:
+                 if not audio_path:
+                     return "Please upload an audio file to transcribe.", "Error: No audio file provided."
+ 
+                 # Process the audio file with Amazon Transcribe
+                 transcription = transcribe_audio(audio_path, age_val)
+ 
+                 # The status message depends on whether this is a demo or a real transcription
+                 if not transcribe_client:
+                     status_msg = "⚠️ Demo mode: Using example transcription (AWS credentials not configured)"
+                 else:
+                     status_msg = "βœ… Transcription completed successfully"
+ 
+                 return transcription, status_msg
+             except Exception as e:
+                 logger.exception("Error transcribing audio")
+                 return f"Error: {str(e)}", f"❌ Transcription failed: {str(e)}"
+ 
+         # Connect the transcribe button to its handler
+         transcribe_btn.click(
+             on_transcribe_audio,
+             inputs=[audio_input, transcription_age],
+             outputs=[transcription_output, transcription_status]
+         )
+ 
+         # Copy the transcription to the Analysis tab
+         def copy_to_analysis(transcription):
+             return transcription, gr.update(selected=0)  # Switch to the Analysis tab
+ 
+         copy_to_analysis_btn.click(
+             copy_to_analysis,
+             inputs=[transcription_output],
+             outputs=[transcript, main_tabs]
+         )
+ 
+     return app
+ 
+ # Create a requirements.txt file for HuggingFace Spaces
+ def create_requirements_file():
+     requirements = [
+         "gradio>=4.0.0",
+         "pandas",
+         "numpy",
+         "matplotlib",
+         "Pillow",
+         "reportlab>=3.6.0",  # Required for PDF exports
+         "PyPDF2>=3.0.0",  # Required for PDF reading
+         "boto3>=1.28.0"  # Required for AWS services
+     ]
+ 
+     with open("requirements.txt", "w") as f:
+         for req in requirements:
+             f.write(f"{req}\n")
+ 
+ if __name__ == "__main__":
+     # Create requirements.txt for HuggingFace Spaces
+     create_requirements_file()
+ 
+     # Check for AWS credentials
+     if not AWS_ACCESS_KEY or not AWS_SECRET_KEY:
+         print("NOTE: AWS credentials not found. The app will run in demo mode with simulated responses.")
+         print("To enable full functionality, set AWS_ACCESS_KEY and AWS_SECRET_KEY environment variables.")
+ 
+     app = create_interface()
+     app.launch(show_api=False)  # Disable the API tab for security
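The analysis prompt above asks Claude to emit CASL skill lines such as `Syntactic Skills: Standard Score (92), Percentile Rank (30%), Average`. A minimal sketch of parsing that line shape, assuming the exact format the prompt requests; `parse_casl_line` is an illustrative name and not a function in any of the apps:

```python
import re

PATTERN = r"Standard Score \((\d+)\), Percentile Rank \((\d+)%\),\s*(.+)"

def parse_casl_line(line):
    """Split 'Domain: Standard Score (X), Percentile Rank (Y%), Level' into fields."""
    domain, _, rest = line.partition(":")
    match = re.search(PATTERN, rest)
    if not match:
        return None  # line does not follow the requested format
    return {
        "domain": domain.strip(),
        "standard_score": int(match.group(1)),
        "percentile": int(match.group(2)),
        "level": match.group(3).strip(),
    }

result = parse_casl_line("Syntactic Skills: Standard Score (92), Percentile Rank (30%), Average")
print(result)
```

Returning `None` on a non-matching line lets the caller skip free-form prose that the model may interleave with the scored lines.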
reference_files/requirements_improved.txt ADDED
@@ -0,0 +1,12 @@
+ gradio>=4.0.0
+ pandas>=1.5.0
+ numpy>=1.21.0
+ matplotlib>=3.5.0
+ seaborn>=0.11.0
+ Pillow>=8.0.0
+ reportlab>=3.6.0
+ boto3>=1.28.0
+ botocore>=1.31.0
+ PyPDF2>=3.0.0
+ SpeechRecognition>=3.10.0
+ pydub>=0.25.0
reference_files/simple_app.py ADDED
@@ -0,0 +1,1208 @@
+ import gradio as gr
+ import boto3
+ import json
+ import re
+ import logging
+ import os
+ import tempfile
+ import shutil
+ import time
+ import uuid
+ from datetime import datetime
+ 
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+ 
+ # Try to import ReportLab (needed for PDF generation)
+ try:
+     from reportlab.lib.pagesizes import letter
+     from reportlab.lib import colors
+     from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle
+     from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
+     REPORTLAB_AVAILABLE = True
+ except ImportError:
+     logger.warning("ReportLab library not available - PDF export will be disabled")
+     REPORTLAB_AVAILABLE = False
+ 
+ # AWS credentials for Bedrock API
+ AWS_ACCESS_KEY = os.getenv("AWS_ACCESS_KEY", "")
+ AWS_SECRET_KEY = os.getenv("AWS_SECRET_KEY", "")
+ AWS_REGION = os.getenv("AWS_REGION", "us-east-1")
+ 
+ # Initialize AWS clients if credentials are available
+ bedrock_client = None
+ transcribe_client = None
+ s3_client = None
+ 
+ if AWS_ACCESS_KEY and AWS_SECRET_KEY:
+     try:
+         # Initialize Bedrock client for AI analysis
+         bedrock_client = boto3.client(
+             'bedrock-runtime',
+             aws_access_key_id=AWS_ACCESS_KEY,
+             aws_secret_access_key=AWS_SECRET_KEY,
+             region_name=AWS_REGION
+         )
+         logger.info("Bedrock client initialized successfully")
+ 
+         # Initialize Transcribe client for speech-to-text
+         transcribe_client = boto3.client(
+             'transcribe',
+             aws_access_key_id=AWS_ACCESS_KEY,
+             aws_secret_access_key=AWS_SECRET_KEY,
+             region_name=AWS_REGION
+         )
+         logger.info("Transcribe client initialized successfully")
+ 
+         # Initialize S3 client for storing audio files
+         s3_client = boto3.client(
+             's3',
+             aws_access_key_id=AWS_ACCESS_KEY,
+             aws_secret_access_key=AWS_SECRET_KEY,
+             region_name=AWS_REGION
+         )
+         logger.info("S3 client initialized successfully")
+     except Exception as e:
+         logger.error(f"Failed to initialize AWS clients: {str(e)}")
+ 
+ # S3 bucket for storing audio files
+ S3_BUCKET = os.environ.get("S3_BUCKET", "casl-audio-files")
+ S3_PREFIX = "transcribe-audio/"
+ 
+ # Create data directories if they don't exist
+ DATA_DIR = os.environ.get("DATA_DIR", "patient_data")
+ DOWNLOADS_DIR = os.path.join(DATA_DIR, "downloads")
+ AUDIO_DIR = os.path.join(DATA_DIR, "audio")
+ 
+ def ensure_data_dirs():
+     """Ensure data directories exist"""
+     global DOWNLOADS_DIR, AUDIO_DIR
+     try:
+         os.makedirs(DATA_DIR, exist_ok=True)
+         os.makedirs(DOWNLOADS_DIR, exist_ok=True)
+         os.makedirs(AUDIO_DIR, exist_ok=True)
+         logger.info(f"Data directories created: {DATA_DIR}, {DOWNLOADS_DIR}, {AUDIO_DIR}")
+     except Exception as e:
+         logger.warning(f"Could not create data directories: {str(e)}")
+         # Fall back to the tmp directory on HF Spaces
+         DOWNLOADS_DIR = os.path.join(tempfile.gettempdir(), "casl_downloads")
+         AUDIO_DIR = os.path.join(tempfile.gettempdir(), "casl_audio")
+         os.makedirs(DOWNLOADS_DIR, exist_ok=True)
+         os.makedirs(AUDIO_DIR, exist_ok=True)
+         logger.info(f"Using fallback directories: {DOWNLOADS_DIR}, {AUDIO_DIR}")
+ 
+ # Initialize data directories
+ ensure_data_dirs()
+ 
+ # Sample transcript for the demo
+ SAMPLE_TRANSCRIPT = """*PAR: today I would &-um like to talk about &-um a fun trip I took last &-um summer with my family.
+ *PAR: we went to the &-um &-um beach [//] no to the mountains [//] I mean the beach actually.
+ *PAR: there was lots of &-um &-um swimming and &-um sun.
+ *PAR: we [/] we stayed for &-um three no [//] four days in a &-um hotel near the water [: ocean] [*].
+ *PAR: my favorite part was &-um building &-um castles with sand.
+ *PAR: sometimes I forget [//] forgetted [: forgot] [*] what they call those things we built.
+ *PAR: my brother he [//] he helped me dig a big hole.
+ *PAR: we saw [/] saw fishies [: fish] [*] swimming in the water.
+ *PAR: sometimes I wonder [/] wonder where fishies [: fish] [*] go when it's cold.
+ *PAR: maybe they have [/] have houses under the water.
+ *PAR: after swimming we [//] I eat [: ate] [*] &-um ice cream with &-um chocolate things on top.
+ *PAR: what do you call those &-um &-um sprinkles! that's the word.
+ *PAR: my mom said to &-um that I could have &-um two scoops next time.
+ *PAR: I want to go back to the beach [/] beach next year."""
+ 
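The sample transcript above uses CHAT notation: `&-um` marks a filled pause, `[/]` a repetition, `[//]` a revision, and `[*]` an error flagged against a target form given as `[: word]`. A rough sketch of tallying those codes; this is a simplification of what the LLM analysis is asked to do, and `count_disfluencies` is an illustrative name, not part of the app:

```python
import re

def count_disfluencies(transcript):
    """Tally common CHAT disfluency codes in a transcript string."""
    return {
        "fillers": len(re.findall(r"&-\w+", transcript)),   # e.g. &-um, &-uh
        "repetitions": transcript.count("[/]"),             # word/phrase repetitions
        "revisions": transcript.count("[//]"),              # self-corrections
        "errors": transcript.count("[*]"),                  # flagged errors
    }

line = "*PAR: we [/] we stayed for &-um three no [//] four days."
print(count_disfluencies(line))  # {'fillers': 1, 'repetitions': 1, 'revisions': 1, 'errors': 0}
```

Note that `"[//]"` does not contain the substring `"[/]"`, so the two `str.count` calls do not double-count revisions as repetitions.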
+ def read_cha_file(file_path):
115
+ """Read and parse a .cha transcript file"""
116
+ try:
117
+ with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
118
+ content = f.read()
119
+
120
+ # Extract participant lines (starting with *PAR:)
121
+ par_lines = []
122
+ for line in content.splitlines():
123
+ if line.startswith('*PAR:'):
124
+ par_lines.append(line)
125
+
126
+ # If no PAR lines found, just return the whole content
127
+ if not par_lines:
128
+ return content
129
+
130
+ return '\n'.join(par_lines)
131
+
132
+ except Exception as e:
133
+ logger.error(f"Error reading CHA file: {str(e)}")
134
+ return ""
135
+
136
+ def process_upload(file):
137
+ """Process an uploaded file (PDF, text, or CHA)"""
138
+ if file is None:
139
+ return ""
140
+
141
+ file_path = file.name
142
+ if file_path.endswith('.pdf'):
143
+ # For PDF, we would need PyPDF2 or similar
144
+ return "PDF upload not supported in this simple version"
145
+ elif file_path.endswith('.cha'):
146
+ return read_cha_file(file_path)
147
+ else:
148
+ with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
149
+ return f.read()
150
+
151
def call_bedrock(prompt, max_tokens=4096):
    """Call the AWS Bedrock API to analyze text using Claude"""
    if not bedrock_client:
        return "AWS credentials not configured. Using demo response instead."

    try:
        body = json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [
                {
                    "role": "user",
                    "content": prompt
                }
            ],
            "temperature": 0.3,
            "top_p": 0.9
        })

        modelId = 'anthropic.claude-3-sonnet-20240229-v1:0'
        response = bedrock_client.invoke_model(
            body=body,
            modelId=modelId,
            accept='application/json',
            contentType='application/json'
        )
        response_body = json.loads(response.get('body').read())
        return response_body['content'][0]['text']
    except Exception as e:
        logger.error(f"Error in call_bedrock: {str(e)}")
        return f"Error: {str(e)}"

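`call_bedrock` surfaces any failure as an error string; transient Bedrock errors (throttling, timeouts) are often worth retrying first. A minimal sketch of a generic retry wrapper that could sit around such a call — `with_retries` and its parameters are illustrative, not part of this app:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5):
    """Retry fn() with exponential backoff on any exception.
    Illustrative helper -- not part of the app itself."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as e:
            last_error = e
            time.sleep(base_delay * (2 ** attempt))
    raise last_error

# Usage sketch with a flaky stand-in for the Bedrock call:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("throttled")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
print(result)
```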
def transcribe_audio(audio_path, patient_age=8):
    """Transcribe an audio recording using Amazon Transcribe and format in CHAT format"""
    if not os.path.exists(audio_path):
        logger.error(f"Audio file not found: {audio_path}")
        return "Error: Audio file not found."

    if not transcribe_client or not s3_client:
        logger.warning("AWS clients not initialized, using demo transcription")
        return generate_demo_transcription()

    try:
        # Get file info
        file_name = os.path.basename(audio_path)
        file_size = os.path.getsize(audio_path)
        _, file_extension = os.path.splitext(file_name)

        # Check file format
        supported_formats = ['.mp3', '.mp4', '.wav', '.flac', '.ogg', '.amr', '.webm']
        if file_extension.lower() not in supported_formats:
            logger.error(f"Unsupported audio format: {file_extension}")
            return f"Error: Unsupported audio format. Please use one of: {', '.join(supported_formats)}"

        # Generate a unique job name
        timestamp = datetime.now().strftime('%Y%m%d%H%M%S')
        job_name = f"casl-transcription-{timestamp}"
        s3_key = f"{S3_PREFIX}{job_name}{file_extension}"

        # Upload to S3
        logger.info(f"Uploading {file_name} to S3 bucket {S3_BUCKET}")
        try:
            with open(audio_path, 'rb') as audio_file:
                s3_client.upload_fileobj(audio_file, S3_BUCKET, s3_key)
        except Exception as e:
            logger.error(f"Failed to upload to S3: {str(e)}")

            # If the upload fails, try to create the bucket and retry
            try:
                s3_client.create_bucket(Bucket=S3_BUCKET)
                logger.info(f"Created S3 bucket: {S3_BUCKET}")

                with open(audio_path, 'rb') as audio_file:
                    s3_client.upload_fileobj(audio_file, S3_BUCKET, s3_key)
            except Exception as bucket_error:
                logger.error(f"Failed to create bucket and upload: {str(bucket_error)}")
                return "Error: Failed to upload audio file. Please check your AWS permissions."

        # Start transcription job
        logger.info(f"Starting transcription job: {job_name}")
        media_format = file_extension.lower()[1:]  # Strip the dot; matches Transcribe's MediaFormat values

        job_kwargs = {
            'TranscriptionJobName': job_name,
            'Media': {
                'MediaFileUri': f"s3://{S3_BUCKET}/{s3_key}"
            },
            'MediaFormat': media_format,
            'LanguageCode': 'en-US',
            'Settings': {
                'ShowSpeakerLabels': True,
                'MaxSpeakerLabels': 2  # Typically patient + clinician
            }
        }

        # For younger children, a custom language model can improve accuracy.
        # Custom models go in ModelSettings (passing a second 'Settings' keyword
        # would raise a TypeError); this assumes a custom model named
        # 'ChildLanguage' exists in the AWS account.
        if patient_age < 10:
            job_kwargs['ModelSettings'] = {'LanguageModelName': 'ChildLanguage'}

        transcribe_client.start_transcription_job(**job_kwargs)

        # Wait for the job to complete (with timeout)
        logger.info("Waiting for transcription to complete...")
        max_tries = 30  # 5 minutes max wait
        tries = 0

        while tries < max_tries:
            try:
                job = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
                status = job['TranscriptionJob']['TranscriptionJobStatus']

                if status == 'COMPLETED':
                    # Download the transcript (json is already imported at module level)
                    import urllib.request

                    transcript_uri = job['TranscriptionJob']['Transcript']['TranscriptFileUri']
                    with urllib.request.urlopen(transcript_uri) as response:
                        transcript_json = json.loads(response.read().decode('utf-8'))

                    # Convert to CHAT format
                    return format_as_chat(transcript_json)

                elif status == 'FAILED':
                    reason = job['TranscriptionJob'].get('FailureReason', 'Unknown failure')
                    logger.error(f"Transcription job failed: {reason}")
                    return f"Error: Transcription failed - {reason}"

                # Still in progress, wait and try again
                tries += 1
                time.sleep(10)  # Check every 10 seconds

            except Exception as e:
                logger.error(f"Error checking transcription job: {str(e)}")
                return f"Error getting transcription: {str(e)}"

        # If we got here, we timed out
        return "Error: Transcription timed out. The process is taking longer than expected."

    except Exception as e:
        logger.exception("Error in audio transcription")
        return f"Error transcribing audio: {str(e)}"

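The wait loop in `transcribe_audio` is a poll-with-timeout pattern (check status, sleep, give up after N tries). A generic standalone sketch of that pattern — `poll_until` and its fast interval are illustrative, not part of the app:

```python
import time

def poll_until(check, interval=0.01, max_tries=30):
    """Call check() until it returns a non-None result or tries run out.
    Generic sketch of the Transcribe wait loop; illustrative only."""
    for _ in range(max_tries):
        result = check()
        if result is not None:
            return result
        time.sleep(interval)
    return None  # timed out

# Usage sketch: a job that "completes" on the third poll
state = {"polls": 0}
def check_job():
    state["polls"] += 1
    return "COMPLETED" if state["polls"] >= 3 else None

print(poll_until(check_job))
```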
def format_as_chat(transcript_json):
    """Format the Amazon Transcribe JSON result as CHAT format"""
    try:
        # Get transcript items
        items = transcript_json['results']['items']

        # Get speaker labels if available
        speakers = {}
        if 'speaker_labels' in transcript_json['results']:
            speaker_segments = transcript_json['results']['speaker_labels']['segments']

            # Map each item to its speaker
            for segment in speaker_segments:
                for item in segment['items']:
                    start_time = item['start_time']
                    speakers[start_time] = segment['speaker_label']

        # Build transcript by combining words into utterances by speaker
        current_speaker = None
        current_utterance = []
        utterances = []

        for item in items:
            # Skip non-pronunciation items (like punctuation)
            if item['type'] != 'pronunciation':
                continue

            word = item['alternatives'][0]['content']
            start_time = item.get('start_time')

            # Determine speaker if available
            speaker = speakers.get(start_time, 'spk_0')

            # If the speaker changed, start a new utterance
            if speaker != current_speaker and current_utterance:
                utterances.append((current_speaker, ' '.join(current_utterance)))
                current_utterance = []

            current_speaker = speaker
            current_utterance.append(word)

        # Add the last utterance
        if current_utterance:
            utterances.append((current_speaker, ' '.join(current_utterance)))

        # Format as CHAT, assuming spk_0 is the patient (PAR)
        # and spk_1 is the clinician (INV)
        chat_lines = []
        for speaker, text in utterances:
            chat_speaker = "*PAR:" if speaker == "spk_0" else "*INV:"
            chat_lines.append(f"{chat_speaker} {text}.")

        return '\n'.join(chat_lines)

    except Exception:
        logger.exception("Error formatting transcript")
        return "*PAR: (Error formatting transcript)"

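The utterance-grouping step above can be checked against a hand-built fragment shaped like Amazon Transcribe's JSON output. This sketch inlines a trimmed copy of the grouping logic so it runs standalone:

```python
# Trimmed, standalone copy of format_as_chat's grouping step, run on a
# hand-built fragment shaped like Amazon Transcribe's JSON output.
transcript_json = {
    "results": {
        "items": [
            {"type": "pronunciation", "start_time": "0.0",
             "alternatives": [{"content": "hello"}]},
            {"type": "pronunciation", "start_time": "0.5",
             "alternatives": [{"content": "there"}]},
            {"type": "punctuation", "alternatives": [{"content": "."}]},
            {"type": "pronunciation", "start_time": "1.0",
             "alternatives": [{"content": "hi"}]},
        ],
        "speaker_labels": {
            "segments": [
                {"speaker_label": "spk_1",
                 "items": [{"start_time": "0.0"}, {"start_time": "0.5"}]},
                {"speaker_label": "spk_0",
                 "items": [{"start_time": "1.0"}]},
            ]
        },
    }
}

# Map each start_time to its speaker label
speakers = {}
for segment in transcript_json["results"]["speaker_labels"]["segments"]:
    for item in segment["items"]:
        speakers[item["start_time"]] = segment["speaker_label"]

# Group consecutive words by speaker into utterances
utterances, current_speaker, current = [], None, []
for item in transcript_json["results"]["items"]:
    if item["type"] != "pronunciation":
        continue  # punctuation items carry no start_time/speaker
    speaker = speakers.get(item.get("start_time"), "spk_0")
    if speaker != current_speaker and current:
        utterances.append((current_speaker, " ".join(current)))
        current = []
    current_speaker = speaker
    current.append(item["alternatives"][0]["content"])
if current:
    utterances.append((current_speaker, " ".join(current)))

chat = "\n".join(
    f"{'*PAR:' if spk == 'spk_0' else '*INV:'} {text}." for spk, text in utterances
)
print(chat)
```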
def generate_demo_transcription():
    """Generate a simulated transcription response"""
    return """*PAR: today I want to tell you about my favorite toy.
*PAR: it's a &-um teddy bear that I got for my birthday.
*PAR: he has &-um brown fur and a red bow.
*PAR: I like to sleep with him every night.
*PAR: sometimes I take him to school in my backpack.
*INV: what's your teddy bear's name?
*PAR: his name is &-um Brownie because he's brown."""

def generate_demo_response(prompt):
    """Generate a response using Bedrock if available, otherwise return a demo response"""
    # Attempt the Bedrock call first; fall back to the canned demo response
    # only if the client is unavailable or the call fails.
    if bedrock_client:
        try:
            return call_bedrock(prompt)
        except Exception as e:
            logger.error(f"Error calling Bedrock: {str(e)}")
            logger.info("Falling back to demo response")

    # Fallback demo response
    logger.warning("Using demo response - Bedrock client not available or call failed")
    return """<SPEECH_FACTORS_START>
Difficulty producing fluent speech: 8, 65
Examples:
- "today I would &-um like to talk about &-um a fun trip I took last &-um summer with my family"
- "we went to the &-um &-um beach [//] no to the mountains [//] I mean the beach actually"

Word retrieval issues: 6, 72
Examples:
- "what do you call those &-um &-um sprinkles! that's the word"
- "sometimes I forget [//] forgetted [: forgot] [*] what they call those things we built"

Grammatical errors: 4, 58
Examples:
- "after swimming we [//] I eat [: ate] [*] &-um ice cream"
- "sometimes I forget [//] forgetted [: forgot] [*] what they call those things we built"

Repetitions and revisions: 5, 62
Examples:
- "we [/] we stayed for &-um three no [//] four days"
- "we went to the &-um &-um beach [//] no to the mountains [//] I mean the beach actually"
<SPEECH_FACTORS_END>

<CASL_SKILLS_START>
Lexical/Semantic Skills: Standard Score (92), Percentile Rank (30%), Average Performance
Examples:
- "what do you call those &-um &-um sprinkles! that's the word"
- "we went to the &-um &-um beach [//] no to the mountains [//] I mean the beach actually"

Syntactic Skills: Standard Score (87), Percentile Rank (19%), Low Average Performance
Examples:
- "my brother he [//] he helped me dig a big hole"
- "after swimming we [//] I eat [: ate] [*] &-um ice cream with &-um chocolate things on top"

Supralinguistic Skills: Standard Score (90), Percentile Rank (25%), Average Performance
Examples:
- "sometimes I wonder [/] wonder where fishies [: fish] [*] go when it's cold"
- "maybe they have [/] have houses under the water"
<CASL_SKILLS_END>

<TREATMENT_RECOMMENDATIONS_START>
- Implement word-finding strategies with semantic cuing focused on everyday objects and activities, using the patient's beach experience as a context (e.g., "sprinkles," "castles")
- Practice structured narrative tasks with visual supports to reduce revisions and improve sequencing
- Use sentence formulation exercises focusing on verb tense consistency (addressing errors like "forgetted" and "eat" for "ate")
- Incorporate self-monitoring techniques to help identify and correct grammatical errors
- Work on increasing vocabulary specificity (e.g., "things on top" to "sprinkles")
<TREATMENT_RECOMMENDATIONS_END>

<EXPLANATION_START>
This child demonstrates moderate word-finding difficulties with compensatory strategies including fillers ("&-um") and repetitions. The frequent use of self-corrections shows good metalinguistic awareness, but the pauses and repairs impact conversational fluency. Syntactic errors primarily involve verb tense inconsistency. Overall, the pattern suggests a mild-to-moderate language disorder with stronger receptive than expressive skills.
<EXPLANATION_END>

<ADDITIONAL_ANALYSIS_START>
The child shows relative strengths in maintaining topic coherence and conveying a complete narrative structure despite the language challenges. The pattern of errors suggests that word-finding difficulties and processing speed are primary concerns rather than conceptual or cognitive issues. Semantic network activities that strengthen word associations would likely be beneficial, particularly when paired with visual supports.
<ADDITIONAL_ANALYSIS_END>

<DIAGNOSTIC_IMPRESSIONS_START>
Based on the language sample, this child presents with a profile consistent with a mild-to-moderate expressive language disorder. The most prominent features include:

1. Word-finding difficulties characterized by fillers, pauses, and self-corrections when attempting to retrieve specific vocabulary
2. Grammatical challenges primarily affecting verb tense consistency and morphological markers
3. Relatively intact narrative structure and topic maintenance

These findings suggest intervention should focus on word retrieval strategies, grammatical form practice, and continued support for narrative development, with an emphasis on fluency and self-monitoring.
<DIAGNOSTIC_IMPRESSIONS_END>

<ERROR_EXAMPLES_START>
Word-finding difficulties:
- "what do you call those &-um &-um sprinkles! that's the word"
- "we went to the &-um &-um beach [//] no to the mountains [//] I mean the beach actually"
- "there was lots of &-um &-um swimming and &-um sun"

Grammatical errors:
- "after swimming we [//] I eat [: ate] [*] &-um ice cream"
- "sometimes I forget [//] forgetted [: forgot] [*] what they call those things we built"
- "we saw [/] saw fishies [: fish] [*] swimming in the water"

Repetitions and revisions:
- "we [/] we stayed for &-um three no [//] four days"
- "I want to go back to the beach [/] beach next year"
- "sometimes I wonder [/] wonder where fishies [: fish] [*] go when it's cold"
<ERROR_EXAMPLES_END>"""

def _extract_section(response, name, default=""):
    """Return the text between <NAME_START> and <NAME_END> markers, or default."""
    match = re.search(rf"<{name}_START>(.*?)<{name}_END>", response, re.DOTALL)
    return match.group(1).strip() if match else default

def parse_casl_response(response):
    """Parse the LLM response for CASL analysis into structured data"""
    # Extract each marker-delimited section of the response
    speech_factors_section = _extract_section(
        response, "SPEECH_FACTORS", "Error extracting speech factors from analysis.")
    casl_section = _extract_section(
        response, "CASL_SKILLS", "Error extracting CASL skills from analysis.")
    treatment_text = _extract_section(
        response, "TREATMENT_RECOMMENDATIONS",
        "Error extracting treatment recommendations from analysis.")
    explanation_text = _extract_section(
        response, "EXPLANATION", "Error extracting clinical explanation from analysis.")

    # Optional sections default to empty strings
    additional_analysis = _extract_section(response, "ADDITIONAL_ANALYSIS")
    diagnostic_impressions = _extract_section(response, "DIAGNOSTIC_IMPRESSIONS")
    specific_errors_text = _extract_section(response, "ERROR_EXAMPLES")

    # Create full report text
    full_report = f"""
## Speech Factors Analysis

{speech_factors_section}

## CASL Skills Assessment

{casl_section}

## Treatment Recommendations

{treatment_text}

## Clinical Explanation

{explanation_text}
"""

    if additional_analysis:
        full_report += f"\n## Additional Analysis\n\n{additional_analysis}"

    if diagnostic_impressions:
        full_report += f"\n## Diagnostic Impressions\n\n{diagnostic_impressions}"

    if specific_errors_text:
        full_report += f"\n## Detailed Error Examples\n\n{specific_errors_text}"

    return {
        'speech_factors': speech_factors_section,
        'casl_data': casl_section,
        'treatment_suggestions': treatment_text,
        'explanation': explanation_text,
        'additional_analysis': additional_analysis,
        'diagnostic_impressions': diagnostic_impressions,
        'specific_errors': specific_errors_text,
        'full_report': full_report,
        'raw_response': response
    }

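The response sections are delimited by paired `<NAME_START>`/`<NAME_END>` markers, so the extraction pattern can be exercised standalone; a small sketch (the helper name here is illustrative):

```python
import re

def extract_section(text, name, default=""):
    """Standalone copy of the marker-based extraction used by the parser."""
    match = re.search(rf"<{name}_START>(.*?)<{name}_END>", text, re.DOTALL)
    return match.group(1).strip() if match else default

demo = """<EXPLANATION_START>
Mild word-finding difficulty with good self-correction.
<EXPLANATION_END>"""

print(extract_section(demo, "EXPLANATION"))
print(extract_section(demo, "MISSING", "not found"))
```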
def analyze_transcript(transcript, age, gender):
    """Analyze a speech transcript using Claude"""
    # CASL-2 assessment cheat sheet
    cheat_sheet = """
# Speech-Language Pathologist Analysis Cheat Sheet

## Types of Speech Patterns to Identify:

1. Difficulty producing fluent, grammatical speech
   - Fillers (um, uh) and pauses
   - False starts and revisions
   - Incomplete sentences

2. Word retrieval issues
   - Pauses before content words
   - Circumlocutions (talking around a word)
   - Word substitutions

3. Grammatical errors
   - Verb tense inconsistencies
   - Subject-verb agreement errors
   - Morphological errors (plurals, possessives)

4. Repetitions and revisions
   - Word or phrase repetitions [/]
   - Self-corrections [//]
   - Retracing

5. Neologisms
   - Made-up words
   - Word blends

6. Perseveration
   - Inappropriate repetition of ideas
   - Recurring themes

7. Comprehension issues
   - Topic maintenance difficulties
   - Non-sequiturs
   - Inappropriate responses
"""

    # Instructions for the analysis
    instructions = """
Analyze this speech transcript to identify specific patterns and provide a detailed CASL-2 (Comprehensive Assessment of Spoken Language) assessment.

For each speech pattern you identify:
1. Count the occurrences in the transcript
2. Estimate a percentile (how typical/atypical this is for the age)
3. Provide DIRECT QUOTES from the transcript as evidence

Then assess the following CASL-2 domains:

1. Lexical/Semantic Skills:
   - Assess vocabulary diversity, word-finding abilities, semantic precision
   - Provide Standard Score (mean=100, SD=15), percentile rank, and performance level
   - Include SPECIFIC QUOTES as evidence

2. Syntactic Skills:
   - Evaluate grammatical accuracy, sentence complexity, morphological skills
   - Provide Standard Score, percentile rank, and performance level
   - Include SPECIFIC QUOTES as evidence

3. Supralinguistic Skills:
   - Assess figurative language use, inferencing, and abstract reasoning
   - Provide Standard Score, percentile rank, and performance level
   - Include SPECIFIC QUOTES as evidence

YOUR RESPONSE MUST USE THESE EXACT SECTION MARKERS FOR PARSING:

<SPEECH_FACTORS_START>
Difficulty producing fluent, grammatical speech: (occurrences), (percentile)
Examples:
- "(direct quote from transcript)"
- "(direct quote from transcript)"

Word retrieval issues: (occurrences), (percentile)
Examples:
- "(direct quote from transcript)"
- "(direct quote from transcript)"

(And so on for each factor)
<SPEECH_FACTORS_END>

<CASL_SKILLS_START>
Lexical/Semantic Skills: Standard Score (X), Percentile Rank (X%), Performance Level
Examples:
- "(direct quote showing strength or weakness)"
- "(direct quote showing strength or weakness)"

Syntactic Skills: Standard Score (X), Percentile Rank (X%), Performance Level
Examples:
- "(direct quote showing strength or weakness)"
- "(direct quote showing strength or weakness)"

Supralinguistic Skills: Standard Score (X), Percentile Rank (X%), Performance Level
Examples:
- "(direct quote showing strength or weakness)"
- "(direct quote showing strength or weakness)"
<CASL_SKILLS_END>

<TREATMENT_RECOMMENDATIONS_START>
- (treatment recommendation)
- (treatment recommendation)
- (treatment recommendation)
<TREATMENT_RECOMMENDATIONS_END>

<EXPLANATION_START>
(brief diagnostic rationale based on findings)
<EXPLANATION_END>

<ADDITIONAL_ANALYSIS_START>
(specific insights that would be helpful for treatment planning)
<ADDITIONAL_ANALYSIS_END>

<DIAGNOSTIC_IMPRESSIONS_START>
(summarize findings across domains using specific examples and clear explanations)
<DIAGNOSTIC_IMPRESSIONS_END>

<ERROR_EXAMPLES_START>
(Copy all the specific quote examples here again, organized by error type or skill domain)
<ERROR_EXAMPLES_END>

MOST IMPORTANT:
1. Use EXACTLY the section markers provided (like <SPEECH_FACTORS_START>) to make parsing reliable
2. For EVERY factor and domain you analyze, you MUST provide direct quotes from the transcript as evidence
3. Be very specific and cite the exact text
4. Do not omit any of the required sections
"""

    # Prepare prompt for Claude with the user's role context
    role_context = """
You are a speech pathologist, a healthcare professional who specializes in evaluating, diagnosing, and treating communication disorders, including speech, language, cognitive-communication, voice, swallowing, and fluency disorders. Your role is to help patients improve their speech and communication skills through various therapeutic techniques and exercises.

You are working with a student with speech impediments.

The most important thing is that you stay kind to the child. Be constructive and helpful rather than critical.
"""

    prompt = f"""
{role_context}

You are analyzing a transcript for a patient who is {age} years old and {gender}.

TRANSCRIPT:
{transcript}

{cheat_sheet}

{instructions}

Remember to be precise but compassionate in your analysis. Use direct quotes from the transcript for every factor and domain you analyze.
"""

    # Call the appropriate API or fall back to demo mode
    if bedrock_client:
        response = call_bedrock(prompt)
    else:
        response = generate_demo_response(prompt)

    # Parse the response
    results = parse_casl_response(response)

    return results

def export_pdf(results, patient_name="", record_id="", age="", gender="", assessment_date="", clinician=""):
    """Export analysis results to a PDF report"""
    global DOWNLOADS_DIR

    # Check if ReportLab is available
    if not REPORTLAB_AVAILABLE:
        return "ERROR: PDF export is not available - ReportLab library is not installed. Please run 'pip install reportlab'."

    try:
        # Generate a safe filename
        if patient_name:
            safe_name = f"{patient_name.replace(' ', '_')}"
        else:
            safe_name = f"speech_analysis_{datetime.now().strftime('%Y%m%d%H%M%S')}"

        # Make sure the downloads directory exists
        try:
            os.makedirs(DOWNLOADS_DIR, exist_ok=True)
        except Exception as e:
            logger.warning(f"Could not access downloads directory: {str(e)}")
            # Fall back to the temp directory
            DOWNLOADS_DIR = os.path.join(tempfile.gettempdir(), "casl_downloads")
            os.makedirs(DOWNLOADS_DIR, exist_ok=True)

        # Create the PDF path in our downloads directory
        pdf_path = os.path.join(DOWNLOADS_DIR, f"{safe_name}.pdf")

        # Create the PDF document
        doc = SimpleDocTemplate(pdf_path, pagesize=letter)
        styles = getSampleStyleSheet()

        # getSampleStyleSheet() already defines Heading1/2/3 and BodyText;
        # styles.add() raises an error on duplicate names, so adjust the
        # existing styles in place instead of re-adding them.
        styles['Heading1'].fontSize = 16
        styles['Heading1'].spaceAfter = 12
        styles['Heading1'].textColor = colors.navy

        styles['Heading2'].fontSize = 14
        styles['Heading2'].spaceAfter = 10
        styles['Heading2'].spaceBefore = 10
        styles['Heading2'].textColor = colors.darkblue

        styles['Heading3'].fontSize = 12
        styles['Heading3'].spaceAfter = 8
        styles['Heading3'].spaceBefore = 8
        styles['Heading3'].textColor = colors.darkblue

        styles['BodyText'].fontSize = 11
        styles['BodyText'].spaceAfter = 8
        styles['BodyText'].leading = 14

        # 'BulletPoint' is new, so it can be added normally
        styles.add(ParagraphStyle(
            name='BulletPoint',
            parent=styles['BodyText'],
            fontSize=11,
            leftIndent=20,
            firstLineIndent=-15,
            spaceAfter=4,
            leading=14
        ))

        # Convert markdown to PDF elements
        story = []

        def add_lines(text):
            """Append text to the story, rendering '- ' lines as bullets."""
            for line in text.split('\n'):
                line = line.strip()
                if not line:
                    continue
                if line.startswith('- '):
                    story.append(Paragraph(f"β€’ {line[2:]}", styles['BulletPoint']))
                else:
                    story.append(Paragraph(line, styles['BodyText']))

        # Add title and date
        story.append(Paragraph("Speech Language Assessment Report", styles['Title']))
        story.append(Spacer(1, 12))

        # Add patient information table
        if patient_name or record_id or age or gender:
            # Prepare patient info data
            data = []
            if patient_name:
                data.append(["Patient Name:", patient_name])
            if record_id:
                data.append(["Record ID:", record_id])
            if age:
                data.append(["Age:", f"{age} years"])
            if gender:
                data.append(["Gender:", gender])
            if assessment_date:
                data.append(["Assessment Date:", assessment_date])
            if clinician:
                data.append(["Clinician:", clinician])

            if data:
                # Create a table with the data
                patient_table = Table(data, colWidths=[120, 350])
                patient_table.setStyle(TableStyle([
                    ('BACKGROUND', (0, 0), (0, -1), colors.lightgrey),
                    ('TEXTCOLOR', (0, 0), (0, -1), colors.darkblue),
                    ('ALIGN', (0, 0), (0, -1), 'RIGHT'),
                    ('ALIGN', (1, 0), (1, -1), 'LEFT'),
                    ('FONTNAME', (0, 0), (0, -1), 'Helvetica-Bold'),
                    ('BOTTOMPADDING', (0, 0), (-1, -1), 6),
                    ('TOPPADDING', (0, 0), (-1, -1), 6),
                    ('GRID', (0, 0), (-1, -1), 0.5, colors.lightgrey),
                ]))
                story.append(patient_table)
                story.append(Spacer(1, 12))

        # Add clinical analysis sections
        story.append(Paragraph("Speech Factors Analysis", styles['Heading1']))
        add_lines(results['speech_factors'])
        story.append(Spacer(1, 12))

        story.append(Paragraph("CASL Skills Assessment", styles['Heading1']))
        add_lines(results['casl_data'])
        story.append(Spacer(1, 12))

        story.append(Paragraph("Treatment Recommendations", styles['Heading1']))
        add_lines(results['treatment_suggestions'])
        story.append(Spacer(1, 12))

        story.append(Paragraph("Clinical Explanation", styles['Heading1']))
        story.append(Paragraph(results['explanation'], styles['BodyText']))
        story.append(Spacer(1, 12))

        if results['additional_analysis']:
            story.append(Paragraph("Additional Analysis", styles['Heading1']))
            story.append(Paragraph(results['additional_analysis'], styles['BodyText']))
            story.append(Spacer(1, 12))

        if results['diagnostic_impressions']:
            story.append(Paragraph("Diagnostic Impressions", styles['Heading1']))
            story.append(Paragraph(results['diagnostic_impressions'], styles['BodyText']))
            story.append(Spacer(1, 12))

        # Add footer with date
        footer_text = f"Generated on: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}"
        story.append(Spacer(1, 20))
        story.append(Paragraph(footer_text, ParagraphStyle(
            name='Footer',
            parent=styles['Normal'],
            fontSize=8,
            textColor=colors.grey
        )))

        # Build the PDF
        doc.build(story)

        logger.info(f"Report saved as PDF: {pdf_path}")
        return pdf_path

    except Exception as e:
        logger.exception("Error creating PDF")
        return f"Error creating PDF: {str(e)}"

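The filename step in `export_pdf` (replace spaces, or fall back to a timestamped name) can be sketched standalone without ReportLab; `safe_pdf_path` is an illustrative helper, not part of the app:

```python
import os
import tempfile
from datetime import datetime

def safe_pdf_path(patient_name, downloads_dir):
    """Mirror of export_pdf's filename logic: patient name with spaces
    replaced, else a timestamped fallback. Illustrative helper."""
    if patient_name:
        safe_name = patient_name.replace(' ', '_')
    else:
        safe_name = f"speech_analysis_{datetime.now().strftime('%Y%m%d%H%M%S')}"
    return os.path.join(downloads_dir, f"{safe_name}.pdf")

print(safe_pdf_path("Jane Doe", tempfile.gettempdir()))
```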
+ def create_interface():
935
+ """Create the Gradio interface"""
936
+ # Set a theme compatible with Hugging Face Spaces
937
+ theme = gr.themes.Soft(
938
+ primary_hue="blue",
939
+ secondary_hue="indigo",
940
+ )
941
+
942
+ with gr.Blocks(title="Simple CASL Analysis Tool", theme=theme) as app:
943
+ gr.Markdown("# CASL Analysis Tool")
944
+ gr.Markdown("A simplified tool for analyzing speech transcripts and audio using CASL framework")
945
+
946
+ with gr.Tabs() as main_tabs:
947
+ # Analysis Tab
948
+ with gr.TabItem("Analysis", id=0):
949
+ with gr.Row():
950
+ with gr.Column(scale=1):
951
+ # Patient info
952
+ gr.Markdown("### Patient Information")
953
+ patient_name = gr.Textbox(label="Patient Name", placeholder="Enter patient name")
+ record_id = gr.Textbox(label="Record ID", placeholder="Enter record ID")
+
+ with gr.Row():
+ age = gr.Number(label="Age", value=8, minimum=1, maximum=120)
+ gender = gr.Radio(["male", "female", "other"], label="Gender", value="male")
+
+ assessment_date = gr.Textbox(
+ label="Assessment Date",
+ placeholder="MM/DD/YYYY",
+ value=datetime.now().strftime('%m/%d/%Y')
+ )
+ clinician_name = gr.Textbox(label="Clinician", placeholder="Enter clinician name")
+
+ # Transcript input
+ gr.Markdown("### Transcript")
+ sample_btn = gr.Button("Load Sample Transcript")
+ file_upload = gr.File(label="Upload transcript file (.txt or .cha)")
+ transcript = gr.Textbox(
+ label="Speech transcript (CHAT format preferred)",
+ placeholder="Enter transcript text or upload a file...",
+ lines=10
+ )
+
+ # Analysis button
+ analyze_btn = gr.Button("Analyze Transcript", variant="primary")
+
+ with gr.Column(scale=1):
+ # Results display
+ gr.Markdown("### Analysis Results")
+
+ analysis_output = gr.Markdown(label="Full Analysis")
+
+ # PDF export (only shown if ReportLab is available)
+ export_status = gr.Markdown("")
+ if REPORTLAB_AVAILABLE:
+ export_btn = gr.Button("Export as PDF", variant="secondary")
+ else:
+ gr.Markdown("⚠️ PDF export is disabled - ReportLab library is not installed")
+
+ # Transcription Tab
+ with gr.TabItem("Transcription", id=1):
+ with gr.Row():
+ with gr.Column(scale=1):
+ gr.Markdown("### Audio Transcription")
+ gr.Markdown("Upload an audio recording to automatically transcribe it in CHAT format")
+
+ # Patient's age helps with transcription accuracy
+ transcription_age = gr.Number(label="Patient Age", value=8, minimum=1, maximum=120,
+ info="For children under 10, special language models may be used")
+
+ # Audio input
+ audio_input = gr.Audio(type="filepath", label="Upload Audio Recording",
+ elem_id="audio-input")
+
+ # Transcribe button
+ transcribe_btn = gr.Button("Transcribe Audio", variant="primary")
+
+ with gr.Column(scale=1):
+ # Transcription output
+ transcription_output = gr.Textbox(
+ label="Transcription Result",
+ placeholder="Transcription will appear here...",
+ lines=12
+ )
+
+ with gr.Row():
+ # Button to use transcription in analysis
+ copy_to_analysis_btn = gr.Button("Use for Analysis", variant="secondary")
+
+ # Status/info message
+ transcription_status = gr.Markdown("")
+
+ # Load sample transcript button
+ def load_sample():
+ return SAMPLE_TRANSCRIPT
+
+ sample_btn.click(load_sample, outputs=[transcript])
+
+ # File upload handler
+ file_upload.upload(process_upload, file_upload, transcript)
+
+ # Analysis button handler
+ def on_analyze_click(transcript_text, age_val, gender_val, patient_name_val, record_id_val, clinician_val, assessment_date_val):
+ if not transcript_text or len(transcript_text.strip()) < 50:
+ return "Error: Please provide a longer transcript for analysis."
+
+ try:
+ # Get the analysis results
+ results = analyze_transcript(transcript_text, age_val, gender_val)
+
+ # Return the full report
+ return results['full_report']
+
+ except Exception as e:
+ logger.exception("Error during analysis")
+ return f"Error during analysis: {str(e)}"
+
+ analyze_btn.click(
+ on_analyze_click,
+ inputs=[
+ transcript, age, gender,
+ patient_name, record_id, clinician_name, assessment_date
+ ],
+ outputs=[analysis_output]
+ )
+
+ # PDF export function
+ def on_export_pdf(report_text, p_name, p_record_id, p_age, p_gender, p_date, p_clinician):
+ # Check if ReportLab is available
+ if not REPORTLAB_AVAILABLE:
+ return "ERROR: PDF export is not available because the ReportLab library is not installed. Please install it with 'pip install reportlab'."
+
+ if not report_text or len(report_text.strip()) < 50:
+ return "Error: Please run the analysis first before exporting to PDF."
+
+ try:
+ # Parse the report text back into sections
+ results = {
+ 'speech_factors': '',
+ 'casl_data': '',
+ 'treatment_suggestions': '',
+ 'explanation': '',
+ 'additional_analysis': '',
+ 'diagnostic_impressions': '',
+ }
+
+ sections = report_text.split('##')
+ for section in sections:
+ section = section.strip()
+ if not section:
+ continue
+
+ title_content = section.split('\n', 1)
+ if len(title_content) < 2:
+ continue
+
+ title = title_content[0].strip()
+ content = title_content[1].strip()
+
+ if "Speech Factors Analysis" in title:
+ results['speech_factors'] = content
+ elif "CASL Skills Assessment" in title:
+ results['casl_data'] = content
+ elif "Treatment Recommendations" in title:
+ results['treatment_suggestions'] = content
+ elif "Clinical Explanation" in title:
+ results['explanation'] = content
+ elif "Additional Analysis" in title:
+ results['additional_analysis'] = content
+ elif "Diagnostic Impressions" in title:
+ results['diagnostic_impressions'] = content
+
+ pdf_path = export_pdf(
+ results,
+ patient_name=p_name,
+ record_id=p_record_id,
+ age=p_age,
+ gender=p_gender,
+ assessment_date=p_date,
+ clinician=p_clinician
+ )
+
+ # Check if the export was successful
+ if pdf_path.startswith("ERROR:"):
+ return pdf_path
+
+ # Make it downloadable in Hugging Face Spaces
+ download_link = f'<a href="file={pdf_path}" download="{os.path.basename(pdf_path)}">Download PDF Report</a>'
+ return f"Report saved as PDF: {pdf_path}<br>{download_link}"
+ except Exception as e:
+ logger.exception("Error exporting to PDF")
+ return f"Error creating PDF: {str(e)}"
+
+ # Only set up the PDF export button if ReportLab is available
+ if REPORTLAB_AVAILABLE:
+ export_btn.click(
+ on_export_pdf,
+ inputs=[
+ analysis_output,
+ patient_name,
+ record_id,
+ age,
+ gender,
+ assessment_date,
+ clinician_name
+ ],
+ outputs=[export_status]
+ )
+
+ # Transcription button handler
+ def on_transcribe_audio(audio_path, age_val):
+ try:
+ if not audio_path:
+ return "Please upload an audio file to transcribe.", "Error: No audio file provided."
+
+ # Process the audio file with Amazon Transcribe
+ transcription = transcribe_audio(audio_path, age_val)
+
+ # Return status message based on whether it's a demo or real transcription
+ if not transcribe_client:
+ status_msg = "⚠️ Demo mode: Using example transcription (AWS credentials not configured)"
+ else:
+ status_msg = "✅ Transcription completed successfully"
+
+ return transcription, status_msg
+ except Exception as e:
+ logger.exception("Error transcribing audio")
+ return f"Error: {str(e)}", f"❌ Transcription failed: {str(e)}"
+
+ # Connect the transcribe button to its handler
+ transcribe_btn.click(
+ on_transcribe_audio,
+ inputs=[audio_input, transcription_age],
+ outputs=[transcription_output, transcription_status]
+ )
+
+ # Copy transcription to analysis tab
+ def copy_to_analysis(transcription):
+ return transcription, gr.update(selected=0) # Switch to Analysis tab
+
+ copy_to_analysis_btn.click(
+ copy_to_analysis,
+ inputs=[transcription_output],
+ outputs=[transcript, main_tabs]
+ )
+
+ return app
+
+ # Create requirements.txt file for HuggingFace Spaces
+ def create_requirements_file():
+ requirements = [
+ "gradio>=4.0.0",
+ "pandas",
+ "numpy",
+ "Pillow",
+ "boto3>=1.28.0", # Required for AWS services
+ "botocore>=1.31.0", # Required for AWS services
+ "reportlab>=3.6.0" # Optional for PDF exports
+ ]
+
+ with open("requirements.txt", "w") as f:
+ for req in requirements:
+ f.write(f"{req}\n")
+
+ if __name__ == "__main__":
+ # Create requirements.txt for HuggingFace Spaces
+ create_requirements_file()
+
+ # Check for AWS credentials
+ if not AWS_ACCESS_KEY or not AWS_SECRET_KEY:
+ print("NOTE: AWS credentials not found. The app will run in demo mode with simulated responses.")
+ print("To enable full functionality, set AWS_ACCESS_KEY and AWS_SECRET_KEY environment variables.")
+
+ app = create_interface()
+ app.launch(show_api=False) # Disable API tab for security
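The `##`-splitting logic in `on_export_pdf` above can be exercised outside Gradio. A minimal standalone sketch, using a hypothetical sample report (the section titles match the handler, the body text is made up):

```python
# Standalone sketch of the section parsing used in on_export_pdf.
# The sample report below is hypothetical data, not app output.

def parse_report(report_text):
    """Split a markdown report on '##' headings into named sections."""
    results = {
        'speech_factors': '',
        'casl_data': '',
        'treatment_suggestions': '',
    }
    for section in report_text.split('##'):
        section = section.strip()
        if not section:
            continue
        title_content = section.split('\n', 1)
        if len(title_content) < 2:
            continue  # heading with no body
        title = title_content[0].strip()
        content = title_content[1].strip()
        if "Speech Factors Analysis" in title:
            results['speech_factors'] = content
        elif "CASL Skills Assessment" in title:
            results['casl_data'] = content
        elif "Treatment Recommendations" in title:
            results['treatment_suggestions'] = content
    return results

sample = (
    "## Speech Factors Analysis\nWord retrieval issues noted.\n"
    "## CASL Skills Assessment\nSyntactic skills: SS 85.\n"
    "## Treatment Recommendations\nSemantic feature analysis.\n"
)
parsed = parse_report(sample)
print(parsed['speech_factors'])  # -> Word retrieval issues noted.
```

Note this round-trip is lossy by design: any section whose heading is not in the keyword list is silently dropped from the exported PDF.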
requirements.txt CHANGED
@@ -1,12 +1,9 @@
  gradio>=4.0.0
- pandas>=1.5.0
- numpy>=1.21.0
- matplotlib>=3.5.0
- seaborn>=0.11.0
- Pillow>=8.0.0
+ pandas>=1.3.0
+ numpy>=1.20.0
+ matplotlib>=3.3.0
+ boto3>=1.20.0
  reportlab>=3.6.0
- boto3>=1.28.0
- botocore>=1.31.0
- PyPDF2>=3.0.0
- SpeechRecognition>=3.8.1
+ PyPDF2>=2.0.0
+ SpeechRecognition>=3.8.0
  pydub>=0.25.0
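The trimmed pins above can be sanity-checked programmatically. A minimal sketch (illustrative helper, not part of the commit) that splits simple `>=` pins into name and version:

```python
# Sketch: split requirement pins like "boto3>=1.20.0" into (name, specifier).
# The pins below mirror entries from the trimmed requirements.txt in this commit.

def parse_requirement(line):
    """Return (package, minimum_version) for a simple '>=' pin."""
    name, _, spec = line.strip().partition(">=")
    return name, spec

pins = ["gradio>=4.0.0", "boto3>=1.20.0", "SpeechRecognition>=3.8.0"]
parsed = [parse_requirement(p) for p in pins]
print(parsed[1])  # -> ('boto3', '1.20.0')
```

Note the pip-installable distribution is named `SpeechRecognition` (the import name is `speech_recognition`), which is why the pin uses the capitalized form.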
simple_casl_app.py ADDED
@@ -0,0 +1,187 @@
+ import gradio as gr
+ import boto3
+ import json
+ import os
+ import logging
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # AWS credentials
+ AWS_ACCESS_KEY = os.getenv("AWS_ACCESS_KEY", "")
+ AWS_SECRET_KEY = os.getenv("AWS_SECRET_KEY", "")
+ AWS_REGION = os.getenv("AWS_REGION", "us-east-1")
+
+ # Initialize Bedrock client
+ bedrock_client = None
+ if AWS_ACCESS_KEY and AWS_SECRET_KEY:
+ try:
+ bedrock_client = boto3.client(
+ 'bedrock-runtime',
+ aws_access_key_id=AWS_ACCESS_KEY,
+ aws_secret_access_key=AWS_SECRET_KEY,
+ region_name=AWS_REGION
+ )
+ logger.info("Bedrock client initialized successfully")
+ except Exception as e:
+ logger.error(f"Failed to initialize AWS Bedrock client: {str(e)}")
+
+ def call_bedrock(prompt):
+ """Call AWS Bedrock API with correct format"""
+ if not bedrock_client:
+ return "❌ AWS Bedrock not configured. Please set AWS credentials."
+
+ try:
+ body = json.dumps({
+ "anthropic_version": "bedrock-2023-05-31",
+ "max_tokens": 4096,
+ "top_k": 250,
+ "stop_sequences": [],
+ "temperature": 0.3,
+ "top_p": 0.9,
+ "messages": [
+ {
+ "role": "user",
+ "content": [
+ {
+ "type": "text",
+ "text": prompt
+ }
+ ]
+ }
+ ]
+ })
+
+ response = bedrock_client.invoke_model(
+ body=body,
+ modelId='anthropic.claude-3-5-sonnet-20240620-v1:0',
+ accept='application/json',
+ contentType='application/json'
+ )
+ response_body = json.loads(response.get('body').read())
+ return response_body['content'][0]['text']
+
+ except Exception as e:
+ logger.error(f"Error calling Bedrock: {str(e)}")
+ return f"❌ Error calling Bedrock: {str(e)}"
+
+ def process_file(file):
+ """Process uploaded file"""
+ if file is None:
+ return "Please upload a file first."
+
+ try:
+ # Read file content
+ with open(file.name, 'r', encoding='utf-8', errors='ignore') as f:
+ content = f.read()
+
+ if not content.strip():
+ return "File appears to be empty."
+
+ return content
+ except Exception as e:
+ return f"Error reading file: {str(e)}"
+
+ def analyze_transcript(file, age, gender):
+ """Simple CASL analysis"""
+ if file is None:
+ return "Please upload a transcript file first."
+
+ # Get transcript content
+ transcript = process_file(file)
+ if transcript.startswith("Error") or transcript.startswith("Please"):
+ return transcript
+
+ # Simple analysis prompt
+ prompt = f"""
+ You are a speech-language pathologist analyzing a transcript for CASL assessment.
+
+ Patient: {age}-year-old {gender}
+
+ TRANSCRIPT:
+ {transcript}
+
+ Please provide a CASL analysis including:
+
+ 1. SPEECH FACTORS (with counts and severity):
+ - Difficulty producing fluent speech
+ - Word retrieval issues
+ - Grammatical errors
+ - Repetitions and revisions
+
+ 2. CASL SKILLS ASSESSMENT:
+ - Lexical/Semantic Skills (Standard Score, Percentile, Level)
+ - Syntactic Skills (Standard Score, Percentile, Level)
+ - Supralinguistic Skills (Standard Score, Percentile, Level)
+
+ 3. TREATMENT RECOMMENDATIONS:
+ - List 3-5 specific intervention strategies
+
+ 4. CLINICAL SUMMARY:
+ - Brief explanation of findings and prognosis
+
+ Use exact quotes from the transcript as evidence.
+ Provide realistic standard scores (70-130 range, mean=100).
+ """
+
+ # Get analysis from Bedrock
+ result = call_bedrock(prompt)
+ return result
+
+ # Create simple interface
+ with gr.Blocks(title="Simple CASL Analysis", theme=gr.themes.Soft()) as app:
+
+ gr.Markdown("# 🗣️ Simple CASL Analysis Tool")
+ gr.Markdown("Upload a speech transcript and get instant CASL assessment results.")
+
+ with gr.Row():
+ with gr.Column():
+ gr.Markdown("### Upload & Settings")
+
+ file_upload = gr.File(
+ label="Upload Transcript File",
+ file_types=[".txt", ".cha"]
+ )
+
+ age = gr.Number(
+ label="Patient Age",
+ value=8,
+ minimum=1,
+ maximum=120
+ )
+
+ gender = gr.Radio(
+ ["male", "female", "other"],
+ label="Gender",
+ value="male"
+ )
+
+ analyze_btn = gr.Button(
+ "🔍 Analyze Transcript",
+ variant="primary"
+ )
+
+ with gr.Column():
+ gr.Markdown("### Analysis Results")
+
+ output = gr.Textbox(
+ label="CASL Analysis Report",
+ placeholder="Analysis results will appear here...",
+ lines=25,
+ max_lines=30
+ )
+
+ # Connect the analyze button
+ analyze_btn.click(
+ analyze_transcript,
+ inputs=[file_upload, age, gender],
+ outputs=[output]
+ )
+
+ if __name__ == "__main__":
+ print("🚀 Starting Simple CASL Analysis Tool...")
+ if not bedrock_client:
+ print("⚠️ AWS credentials not configured - analysis will show an error message")
+
+ app.launch(show_api=False)
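The Anthropic Messages request body that `call_bedrock` sends can be built and checked without AWS credentials or a network call. A minimal sketch (the helper name and the trimmed parameter set are illustrative, not part of the app):

```python
import json

# Sketch: build the same "bedrock-2023-05-31" Messages request body that
# call_bedrock serializes before invoke_model, then verify it round-trips
# through JSON. No Bedrock call is made here.

def build_bedrock_body(prompt, max_tokens=4096, temperature=0.3):
    """Serialize a single-turn user prompt into a Bedrock Messages body."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "temperature": temperature,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
    })

body = build_bedrock_body("Analyze this transcript.")
decoded = json.loads(body)
print(decoded["messages"][0]["content"][0]["text"])  # -> Analyze this transcript.
```

Keeping the body construction in a pure function like this makes the payload format testable in isolation, while the real `call_bedrock` adds credentials handling and response parsing on top.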