# SILMA TTS API Server Setup

This guide explains how to configure a remote SILMA TTS API server for faster voice synthesis.

## Why Use API Server Mode?

Based on performance testing:
- **Local Mode**: ~75 seconds per synthesis (CPU-based processing)
- **API Server Mode**: ~5 seconds per synthesis (GPU-accelerated)
- **Performance Improvement**: 15x faster

## Prerequisites

1. A SILMA TTS API server that is deployed and accessible
2. WebSocket support on the API server
3. Network connectivity between your application server and the API server

## Setup Steps

### 1. Deploy SILMA TTS API Server

You have two options:

#### Option A: Use SILMA AI's Public API (Coming Soon)
- Contact SILMA AI for API access
- Get API endpoint URL
- Obtain authentication credentials

#### Option B: Self-Hosted API Server
- Deploy SILMA TTS with GPU acceleration
- Configure WebSocket endpoint
- Set up authentication

### 2. Configure API Endpoint

#### Via Settings UI:
1. Navigate to Settings → SILMA TTS
2. Under "Server Configuration", click the "API Server (Fast)" button
3. Enter your API endpoint URL
4. Click "Save"

#### Via Database:
```sql
UPDATE silma_credentials 
SET api_endpoint = 'https://your-silma-server.com/api',
    is_local = false
WHERE user_id = 'your-user-id';
```

### 3. Configure Geographic Region

For optimal performance, select the region closest to your services:

1. Open the SILMA TTS settings page
2. Select "API Server Region"
3. Choose one of:
   - **Auto-detect** (Recommended)
   - US East (Virginia)
   - US West (Oregon)
   - EU Central (Frankfurt)
   - EU West (London)
   - Asia Pacific (Singapore)
   - Middle East (Bahrain)
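
How "Auto-detect" chooses a region is not documented here. One plausible approach is to time a TCP connection to each regional endpoint and pick the fastest. The sketch below illustrates that idea only; the function names and the per-region hostnames are hypothetical, not part of SILMA TTS.

```python
import socket
import time
from typing import Dict, Optional


def tcp_connect_latency(host: str, port: int = 443, timeout: float = 2.0) -> Optional[float]:
    """Return the TCP connect time to host:port in seconds, or None if unreachable."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return None


def pick_fastest_region(latencies: Dict[str, Optional[float]]) -> Optional[str]:
    """Return the region with the lowest latency, ignoring unreachable regions."""
    reachable = {region: rtt for region, rtt in latencies.items() if rtt is not None}
    if not reachable:
        return None
    return min(reachable, key=reachable.get)


# Example with hypothetical hostnames (assumptions, replace with your own):
# hosts = {"us-east": "us-east.tts.example.com", "eu-central": "eu-central.tts.example.com"}
# best = pick_fastest_region({r: tcp_connect_latency(h) for r, h in hosts.items()})
```

TCP connect time is only a rough proxy for synthesis latency, so treat the result as a starting point and confirm it with real synthesis requests.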

### 4. Verify Connection

The system will automatically:
- Test API server connectivity
- Warm up connection pool (3 initial connections)
- Monitor connection health
- Display connection pool statistics
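
If you want to check basic reachability yourself before relying on the automatic test, a TCP probe against the endpoint's host and port is a reasonable first step. This is a minimal sketch using only the standard library; the helper names are hypothetical, and a successful TCP connection does not prove that authentication or the WebSocket handshake will succeed.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit

# Default ports for the URL schemes an endpoint is likely to use.
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}


def endpoint_host_port(endpoint_url: str) -> Tuple[str, int]:
    """Extract (host, port) from an endpoint URL, applying scheme defaults."""
    parts = urlsplit(endpoint_url)
    if not parts.hostname or parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported endpoint URL: {endpoint_url!r}")
    return parts.hostname, parts.port or DEFAULT_PORTS[parts.scheme]


def is_reachable(endpoint_url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = endpoint_host_port(endpoint_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_reachable("https://your-silma-server.com/api")` probes port 443 on that host.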

## Connection Pool Architecture

The system maintains a pool of pre-connected WebSocket connections:

- **Pool Size**: 5 connections
- **Warmup**: 3 connections on startup
- **Maintenance**: Periodic cleanup and refresh
- **Fallback**: HTTP API if WebSocket unavailable
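
The pool behaviour listed above can be sketched as follows. This is an illustration under stated assumptions, not the application's actual implementation: the `ConnectionPool` class is hypothetical, and `connect` stands in for whatever coroutine opens a WebSocket to your API server. Periodic cleanup and the HTTP fallback are left out for brevity.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


class ConnectionPool:
    """Holds at most max_size connections and opens `warmup` of them up front."""

    def __init__(self, connect: Callable[[], Awaitable[Any]],
                 max_size: int = 5, warmup: int = 3) -> None:
        self._connect = connect
        self._max_size = max_size
        self._warmup = min(warmup, max_size)
        self._idle: List[Any] = []
        self._in_use = 0

    async def start(self) -> None:
        """Open the warm-up connections concurrently."""
        conns = await asyncio.gather(*(self._connect() for _ in range(self._warmup)))
        self._idle.extend(conns)

    async def acquire(self) -> Any:
        """Return an idle connection, or open a new one if under the limit."""
        if self._idle:
            self._in_use += 1
            return self._idle.pop()
        if self._in_use >= self._max_size:
            raise RuntimeError("connection pool exhausted")
        self._in_use += 1  # reserve the slot before awaiting
        try:
            return await self._connect()
        except Exception:
            self._in_use -= 1  # give the slot back if connecting failed
            raise

    def release(self, conn: Any) -> None:
        """Hand a connection back to the pool for reuse."""
        self._in_use -= 1
        self._idle.append(conn)

    def stats(self) -> Dict[str, int]:
        """Report ready, active, and total connection counts."""
        return {"ready": len(self._idle), "active": self._in_use,
                "total": len(self._idle) + self._in_use}
```

Pre-opening connections moves the connection handshake cost out of the synthesis request path, which is the main reason to keep a warm pool.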

### Connection Pool Statistics

View in SILMA TTS settings:
- Total connections in pool
- Ready/active connections
- Connection age and health

## Performance Optimization Tips

### Geographic Co-Location
Deploy your SILMA TTS API server in the same region as:
- Your telephony provider (Twilio/Plivo)
- Your LLM provider (Groq/OpenAI)
- Your primary user base

**Expected improvement**: 50% latency reduction

### Network Optimization
- Use low-latency network routes
- Enable TCP keepalive
- Configure appropriate timeouts
- Monitor network metrics
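
As an illustration of the keepalive and timeout items, here is a minimal sketch using Python's standard `socket` module. The helper name and the specific values (60-second idle time, 10-second probe interval, 3 probes) are assumptions chosen for the example, not recommendations from SILMA; the fine-grained options are platform-specific, so they are applied only where available.

```python
import socket


def configure_keepalive(sock: socket.socket, idle: int = 60,
                        interval: int = 10, probes: int = 3) -> None:
    """Enable TCP keepalive and, where supported, tune its timing."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These options exist on Linux and some other platforms only.
    tuning = (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", interval), ("TCP_KEEPCNT", probes))
    for option_name, value in tuning:
        option = getattr(socket, option_name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


def open_tuned_connection(host: str, port: int, timeout: float = 10.0) -> socket.socket:
    """Open a TCP connection with a connect/read timeout and keepalive enabled."""
    sock = socket.create_connection((host, port), timeout=timeout)
    configure_keepalive(sock)
    return sock
```

Many WebSocket client libraries manage their own sockets and offer ping/pong settings instead, so check your library's options before applying socket-level tuning.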

## WebSocket Protocol

The SILMA TTS API server should support the following WebSocket message formats:

### Request Format:
```json
{
  "type": "synthesize",
  "audio": "base64_encoded_reference_audio",
  "text": "text to synthesize",
  "reference_text": "transcription of reference audio",
  "speed": 1.0,
  "seed": 42
}
```

### Response Format:
```json
{
  "type": "synthesis_complete",
  "audio": "base64_encoded_audio_data"
}
```

### Error Format:
```json
{
  "type": "error",
  "error": "error message"
}
```
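
To make the message formats concrete, here is a small sketch of how a client might build a request and interpret the server's reply. The function names are hypothetical, and the defaults simply mirror the example values above; only the JSON field names come from the protocol description.

```python
import base64
import json


def build_synthesis_request(reference_audio: bytes, text: str,
                            reference_text: str = "", speed: float = 1.0,
                            seed: int = 42) -> str:
    """Serialize a "synthesize" request as a JSON string."""
    return json.dumps({
        "type": "synthesize",
        "audio": base64.b64encode(reference_audio).decode("ascii"),
        "text": text,
        "reference_text": reference_text,
        "speed": speed,
        "seed": seed,
    })


def parse_server_message(raw: str) -> bytes:
    """Return the decoded audio bytes, or raise if the server reported an error."""
    message = json.loads(raw)
    message_type = message.get("type")
    if message_type == "error":
        raise RuntimeError(message.get("error", "unknown error"))
    if message_type != "synthesis_complete":
        raise ValueError(f"unexpected message type: {message_type!r}")
    return base64.b64decode(message["audio"])
```

The string returned by `build_synthesis_request` is what you would send over the WebSocket, and each text frame received back would go through `parse_server_message`.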

## Troubleshooting

### Connection Pool Empty
- Check that the API server is running
- Verify network connectivity
- Review firewall rules
- Check authentication credentials

### High Latency
- Verify geographic region settings
- Check network path to API server
- Monitor connection pool utilization
- Review API server performance

### Fallback to HTTP
This means WebSocket connections are failing:
- Check API server WebSocket support
- Verify TLS/SSL certificates
- Review proxy configurations
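
When diagnosing this, it helps to see why the WebSocket path failed before the HTTP path took over. The sketch below shows one way a client could implement such a fallback while logging the cause. It is an assumption about the general pattern, not the application's actual code; `via_websocket` and `via_http` are hypothetical coroutines supplied by the caller.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("silma_tts.client")

Synthesizer = Callable[[str], Awaitable[bytes]]


async def synthesize_with_fallback(text: str, via_websocket: Synthesizer,
                                   via_http: Synthesizer) -> bytes:
    """Try the WebSocket path first and fall back to HTTP on connection problems."""
    try:
        return await via_websocket(text)
    except (OSError, asyncio.TimeoutError) as exc:
        # OSError covers refused/reset connections and most TLS failures.
        logger.warning("WebSocket synthesis failed (%r); falling back to HTTP", exc)
        return await via_http(text)
```

The logged exception usually points to the culprit directly, for example a certificate error or a proxy that rejects the upgrade request.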

## Monitoring

### Key Metrics to Monitor:
1. **Connection Pool Utilization**: More than 80% of pooled connections should be ready
2. **Synthesis Latency**: Target < 5 seconds
3. **Connection Age**: Connections are recreated every 5 minutes
4. **Error Rate**: Should be < 1%
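
These targets can be turned into a simple automated check. The following sketch assumes you can collect the metrics yourself; the `PoolMetrics` structure and its field names are hypothetical, and only the thresholds come from the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PoolMetrics:
    ready_connections: int
    total_connections: int
    synthesis_latency_s: float
    oldest_connection_age_s: float
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def find_violations(m: PoolMetrics) -> List[str]:
    """Return a description of each metric that misses its target."""
    problems = []
    ready_ratio = m.ready_connections / m.total_connections if m.total_connections else 0.0
    if ready_ratio <= 0.80:
        problems.append(f"only {ready_ratio:.0%} of pooled connections are ready")
    if m.synthesis_latency_s >= 5.0:
        problems.append(f"synthesis latency is {m.synthesis_latency_s:.1f}s")
    if m.oldest_connection_age_s > 5 * 60:
        problems.append("a connection is older than 5 minutes")
    if m.error_rate >= 0.01:
        problems.append(f"error rate is {m.error_rate:.1%}")
    return problems
```

An empty list means every target is met; anything else is a candidate for an alert.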

### Health Check
The system performs automatic health checks:
- Every 60 seconds
- On-demand via settings page
- Logs health status
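
A periodic check of this kind can be sketched with `asyncio`. This is a minimal illustration, not the system's own health checker: the function name is hypothetical, `check` is a coroutine you supply (for example, one that pings the API server), and `stop` is an event used to end the loop cleanly.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("silma_tts.health")


async def run_health_checks(check: Callable[[], Awaitable[bool]],
                            stop: asyncio.Event,
                            interval_s: float = 60.0) -> int:
    """Run `check` every interval_s seconds until `stop` is set; return the run count."""
    runs = 0
    while not stop.is_set():
        try:
            healthy = bool(await check())
        except Exception:
            logger.exception("health check raised an exception")
            healthy = False
        runs += 1
        logger.info("SILMA TTS API server healthy=%s", healthy)
        try:
            # Sleep for the interval, but wake up early if asked to stop.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass
    return runs
```

Waiting on the stop event instead of calling `asyncio.sleep` lets the loop shut down immediately rather than after the remainder of the interval.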

## Cost Considerations

API server mode may incur:
- API call costs (if using commercial API)
- Network transfer costs
- Server hosting costs (if self-hosted)

Balance these against:
- 15x faster synthesis
- Improved user experience
- Reduced server CPU usage

## Resources

- [SILMA TTS Documentation](https://github.com/SILMA-AI/silma-tts)
- [Voice Agent Performance Guide](https://www.ntik.me/posts/voice-agent)
- [SILMA TTS Settings](https://halavoice.store/app/settings)

## Example API Server Deployment

For reference, here's a basic API server structure. It exposes an HTTP endpoint rather than the WebSocket interface described above:

```python
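# SILMA TTS API Server Example (HTTP endpoint)
import base64
import io
import os
import tempfile

import soundfile as sf  # assumed extra dependency, used to encode WAV output
from flask import Flask, jsonify, request
from silma_tts.api import SilmaTTS

app = Flask(__name__)
silma_tts = SilmaTTS()  # load the model once at startup


@app.route('/synthesize', methods=['POST'])
def synthesize():
    data = request.get_json(silent=True) or {}
    if 'audio' not in data or 'text' not in data:
        return jsonify({'type': 'error', 'error': "'audio' and 'text' are required"}), 400

    # Decode the reference audio and write it to a temporary file,
    # assuming ref_file expects a file path rather than raw bytes
    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as tmp:
        tmp.write(base64.b64decode(data['audio']))
        ref_path = tmp.name

    try:
        # Perform synthesis
        wav, sr, spec = silma_tts.infer(
            ref_file=ref_path,
            ref_text=data.get('reference_text', ''),
            gen_text=data['text'],
            speed=data.get('speed', 1.0),
        )
    finally:
        os.remove(ref_path)

    # Encode the waveform (assumed to be a NumPy array) as WAV bytes
    buffer = io.BytesIO()
    sf.write(buffer, wav, sr, format='WAV')

    # Return audio in the documented response format
    return jsonify({
        'type': 'synthesis_complete',
        'audio': base64.b64encode(buffer.getvalue()).decode('ascii'),
    })


if __name__ == '__main__':
    app.run()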
```

This provides a foundation for your API server implementation.
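
To exercise an endpoint like the one above, a client only needs to POST the JSON payload and decode the returned audio. The sketch below uses the standard library; the helper names are hypothetical, and the `/synthesize` path and field names are taken from the example server.

```python
import base64
import json
import urllib.request


def build_synthesize_request(server_url: str, reference_audio: bytes, text: str,
                             reference_text: str = "",
                             speed: float = 1.0) -> urllib.request.Request:
    """Build a POST request for the example server's /synthesize endpoint."""
    payload = {
        "audio": base64.b64encode(reference_audio).decode("ascii"),
        "text": text,
        "reference_text": reference_text,
        "speed": speed,
    }
    return urllib.request.Request(
        f"{server_url.rstrip('/')}/synthesize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(server_url: str, reference_audio: bytes, text: str,
               reference_text: str = "", timeout: float = 30.0) -> bytes:
    """Send the request and return the decoded audio bytes from the response."""
    req = build_synthesize_request(server_url, reference_audio, text, reference_text)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return base64.b64decode(body["audio"])
```

For example, `synthesize("http://localhost:5000", open("reference.wav", "rb").read(), "Hello")` would call a server running locally on Flask's default port; the file name here is a placeholder.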
