# RAG Latency Optimization System - Deployment Guide

## 🚀 Quick Deployment

### Option 1: Docker (Recommended)
```bash
# Build and run with Docker
docker-compose up --build -d

# Check logs
docker-compose logs -f

# Stop
docker-compose down
```

### Option 2: Direct Python
```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/Mac
# OR, on Windows, run this instead:
# .venv\Scripts\activate

# Install dependencies
pip install -r requirements-production.txt

# Initialize system
python scripts/download_sample_data.py
python scripts/initialize_rag.py

# Start server
python -m app.main
```

## 📊 Performance Tuning

### Configuration (config.py)
Key parameters to tune:

- `CHUNK_SIZE`: 300-500 tokens (smaller = more precise, larger = more context)
- `TOP_K_DYNAMIC`: Adjust based on query complexity
- `ENABLE_EMBEDDING_CACHE`: Set to `True` for production
- `ENABLE_PRE_FILTER`: Set to `True` for large datasets
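
As a rough illustration of how these four settings might sit together, here is a minimal sketch. Only the four parameter names and the 300-500 token range come from this guide. The `RAGConfig` class, the default values, the shape of `TOP_K_DYNAMIC` (assumed here to map a query-complexity label to a retrieval count), and the `warnings()` helper are all hypothetical and may not match the real `config.py`.

```python
from dataclasses import dataclass, field

# Recommended chunk size range from this guide, in tokens.
CHUNK_SIZE_RANGE = (300, 500)


@dataclass
class RAGConfig:
    """Hypothetical settings container; the real config.py may differ."""

    CHUNK_SIZE: int = 400  # assumed default inside the 300-500 range
    # Assumed shape: number of chunks to retrieve per query-complexity label.
    TOP_K_DYNAMIC: dict = field(
        default_factory=lambda: {"simple": 3, "complex": 8}
    )
    ENABLE_EMBEDDING_CACHE: bool = True  # recommended for production
    ENABLE_PRE_FILTER: bool = True       # recommended for large datasets

    def warnings(self) -> list:
        """Return messages for settings that fall outside the guidance."""
        issues = []
        low, high = CHUNK_SIZE_RANGE
        if not low <= self.CHUNK_SIZE <= high:
            issues.append(
                f"CHUNK_SIZE={self.CHUNK_SIZE} is outside {low}-{high} tokens"
            )
        if not self.ENABLE_EMBEDDING_CACHE:
            issues.append("ENABLE_EMBEDDING_CACHE is off; enable it for production")
        return issues
```

Calling `RAGConfig(CHUNK_SIZE=1000).warnings()` would flag the chunk size, which can be a cheap sanity check to run at startup.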

### Scaling Recommendations
- < 1K documents: Single instance, 2GB RAM
- 1K-10K documents: Add Redis cache, 4GB RAM
- 10K-100K documents: Sharded FAISS indexes, 8GB+ RAM
- > 100K documents: Distributed system, multiple nodes
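
The tiers above can be expressed as a small lookup, which may be handy in a capacity-planning script. This is a sketch only: the function name `recommend_deployment` is hypothetical, and because the guide's ranges overlap at exactly 10K and 100K documents, the sketch assumes those boundary values fall into the smaller tier.

```python
def recommend_deployment(num_documents: int) -> str:
    """Map a corpus size to the deployment tier suggested in this guide."""
    if num_documents < 0:
        raise ValueError("num_documents must be non-negative")
    if num_documents < 1_000:
        return "Single instance, 2GB RAM"
    if num_documents <= 10_000:   # assumption: exactly 10K stays in this tier
        return "Add Redis cache, 4GB RAM"
    if num_documents <= 100_000:  # assumption: exactly 100K stays in this tier
        return "Sharded FAISS indexes, 8GB+ RAM"
    return "Distributed system, multiple nodes"
```

For example, `recommend_deployment(50_000)` returns the sharded FAISS tier.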

## 🔧 Monitoring & Maintenance

### Health Checks
- Endpoint: `GET /`
- Response: `{"status": "running"}`
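
A health probe against this endpoint could look like the sketch below, using only the Python standard library. The function names `is_healthy` and `check_health` are hypothetical, the base URL is taken from the Swagger address given later in this guide, and the sketch assumes a healthy server answers with HTTP 200.

```python
import json
import urllib.error
import urllib.request


def is_healthy(body: bytes) -> bool:
    """Return True if the body is a JSON object with status == "running"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "running"


def check_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Call GET / and report whether the server looks healthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/", timeout=timeout) as response:
            return response.status == 200 and is_healthy(response.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

`check_health()` returns `False` rather than raising when the server is unreachable, so it can be dropped into a cron job or container health check without extra error handling.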

### Metrics
- Real-time: `GET /metrics`
- Historical: `data/metrics.csv`
- Export: `data/metrics.json`
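
For a quick look at the historical file, a sketch like the following averages one numeric column. It assumes `data/metrics.csv` has a header row; the guide does not list the column names, so the column is passed in by the caller, and the helper name `column_mean` is hypothetical.

```python
import csv
from statistics import mean


def column_mean(csv_path: str, column: str) -> float:
    """Average a numeric column of a metrics CSV, skipping blank cells."""
    with open(csv_path, newline="") as handle:
        values = [
            float(row[column])
            for row in csv.DictReader(handle)
            if row.get(column)
        ]
    if not values:
        raise ValueError(f"no numeric values found in column {column!r}")
    return mean(values)
```

Open the CSV first to find the actual column headers before calling `column_mean("data/metrics.csv", ...)`.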

### Cache Management
- Embedding cache: `data/embedding_cache.db`
- Clear cache: Delete the file and restart the server
- Cache stats: Available via the API
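
Clearing the cache can be scripted. The sketch below only deletes the file at the path given above; the helper name `clear_embedding_cache` is hypothetical, and restarting the server afterwards is still a manual step.

```python
from pathlib import Path


def clear_embedding_cache(cache_path: str = "data/embedding_cache.db") -> bool:
    """Delete the embedding cache file; return True if a file was removed."""
    path = Path(cache_path)
    if not path.is_file():
        return False  # nothing to clear
    path.unlink()
    return True
```

Stopping the server before deleting the file is the safer order, since a running process may still hold the database open.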

## 🚨 Troubleshooting

### Common Issues:
- High memory usage: Reduce TOP_K values, enable compression
- Slow responses: Check embedding cache hits, optimize FAISS index
- No results: Rebuild FAISS index with `python scripts/initialize_rag.py`

### Performance Commands:
```bash
# Test performance
python test_rag.py

# Run scalability test
python scale_test.py

# Generate demo report
python showcase_demo.py
```

## 📈 Production Checklist

- [ ] Configure proper logging
- [ ] Set up monitoring (Prometheus/Grafana)
- [ ] Implement rate limiting
- [ ] Add authentication if needed
- [ ] Set up backup for data directory
- [ ] Configure SSL/TLS
- [ ] Set up CI/CD pipeline

## 🔗 API Documentation
Access Swagger UI: http://localhost:8000/docs

### Key Endpoints:
- `POST /query` - Process RAG query
- `GET /metrics` - Get performance metrics
- `POST /reset_metrics` - Reset metrics
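
A client call to `POST /query` might look like the sketch below. The request schema is not documented in this guide, so the `{"question": ...}` body is an assumption; check the Swagger UI for the real field names. The helper names `build_query_request` and `run_query` are hypothetical.

```python
import json
import urllib.request


def build_query_request(
    question: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST /query request; the JSON body shape is an assumed schema."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(question: str, timeout: float = 30.0) -> dict:
    """Send the query to a running server and decode the JSON response."""
    request = build_query_request(question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit test without a running server.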

## 📞 Support
For issues or questions:

1. Check logs in the `data/` directory
2. Review metrics for performance issues
3. Consult this deployment guide
