Decorator Pattern

The Decorator pattern wraps objects to add new functionality without changing their core implementation. In LLM applications, decorators excel at adding cross-cutting concerns like caching, retry logic, monitoring, and cost tracking.

Why the Decorator Pattern for LLMs?

LLM applications often need:

  • Response caching: Store expensive API calls to reduce costs and latency

  • Retry mechanisms: Handle API failures gracefully with exponential backoff

  • Cost tracking: Monitor spending across different providers and models

  • Performance monitoring: Track response times and success rates

  • Rate limiting: Control API call frequency to avoid quota issues

Key LLM Use Cases

1. Response Caching Strategy

Cache LLM responses to improve performance and reduce costs:

import functools
import hashlib
import json
import time
from typing import Dict, Any, Optional

class LLMCache:
    def __init__(self, max_size: int = 1000, ttl: int = 3600):
        self.cache: Dict[str, Dict[str, Any]] = {}
        self.max_size = max_size
        self.ttl = ttl  # Time to live in seconds
    
    def _generate_key(self, *args, **kwargs) -> str:
        """Generate cache key from function arguments"""
        key_data = {'args': str(args), 'kwargs': kwargs}
        return hashlib.md5(json.dumps(key_data, sort_keys=True).encode()).hexdigest()
    
    def get(self, key: str) -> Optional[Any]:
        """Get cached value if exists and not expired"""
        if key in self.cache:
            entry = self.cache[key]
            if time.time() - entry['timestamp'] < self.ttl:
                return entry['value']
            else:
                del self.cache[key]  # Remove expired entry
        return None
    
    def set(self, key: str, value: Any):
        """Set cache value with timestamp"""
        if len(self.cache) >= self.max_size:
            # Remove oldest entry
            oldest_key = min(self.cache.keys(), key=lambda k: self.cache[k]['timestamp'])
            del self.cache[oldest_key]
        
        self.cache[key] = {
            'value': value,
            'timestamp': time.time()
        }

def cache_llm_response(cache_instance: Optional[LLMCache] = None):
    """Decorator to cache LLM responses"""
    if cache_instance is None:
        cache_instance = LLMCache()
    
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Generate cache key
            cache_key = cache_instance._generate_key(*args, **kwargs)
            
            # Check cache first
            cached_result = cache_instance.get(cache_key)
            if cached_result is not None:
                print(f"🎯 Cache hit! Saved API call")
                return cached_result
            
            # Call original function
            print(f"🌐 Making API call...")
            result = func(*args, **kwargs)
            
            # Store in cache
            cache_instance.set(cache_key, result)
            print(f"πŸ’Ύ Response cached")
            
            return result
        
        wrapper.cache = cache_instance  # Expose cache for debugging
        return wrapper
    return decorator

# Usage example
@cache_llm_response()
def call_llm(prompt: str, model: str = "gpt-4") -> str:
    # Simulate expensive LLM API call
    time.sleep(2)  # Simulate network delay
    return f"Response to: {prompt[:50]}..."

Benefits:

  • Significant cost reduction for repeated queries

  • Improved response times for cached content

  • Reduced API quota usage

  • Better user experience with faster responses

2. Retry and Resilience Decorator

Handle API failures with intelligent retry logic:
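
A minimal sketch of such a retry decorator, assuming transient failures surface as exceptions; the parameter names (max_retries, base_delay, retryable_exceptions) are illustrative choices rather than a fixed API:

import functools
import random
import time

def retry_llm_call(max_retries: int = 3, base_delay: float = 1.0,
                   retryable_exceptions: tuple = (Exception,)):
    """Decorator that retries failed LLM calls with exponential backoff."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except retryable_exceptions as exc:
                    if attempt == max_retries:
                        raise  # Out of retries: surface the last error
                    # Exponential backoff with jitter to avoid hammering the API
                    delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
                    print(f"⚠️ Attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
                    time.sleep(delay)
        return wrapper
    return decorator

# Usage example
@retry_llm_call(max_retries=3, base_delay=1.0)
def call_llm_with_retry(prompt: str, model: str = "gpt-4") -> str:
    # Replace with a real provider call; transient errors will be retried
    return f"Response to: {prompt[:50]}..."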

Benefits:

  • Automatic recovery from transient failures

  • Exponential backoff prevents overwhelming the API

  • Configurable retry policies per use case

  • Maintains service reliability despite API issues

3. Cost Tracking Decorator

Monitor and control LLM spending:
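
A minimal sketch of a cost-tracking decorator. The per-1K-token prices and the word-count token estimate below are placeholder heuristics for illustration only; real pricing differs by provider and changes over time, and production code would use actual token counts from the API response:

import functools

# Illustrative USD prices per 1K tokens -- placeholders, not current provider pricing
PRICE_PER_1K_TOKENS = {"gpt-4": 0.03, "gpt-3.5-turbo": 0.002}

class CostTracker:
    def __init__(self, budget: float = 10.0):
        self.total_cost = 0.0
        self.budget = budget

    def record(self, model: str, prompt: str, response: str) -> float:
        # Rough estimate: ~0.75 words per token is a common rule of thumb
        tokens = (len(prompt.split()) + len(response.split())) / 0.75
        cost = (tokens / 1000) * PRICE_PER_1K_TOKENS.get(model, 0.03)
        self.total_cost += cost
        if self.total_cost > self.budget:
            print(f"🚨 Budget exceeded: ${self.total_cost:.4f} of ${self.budget:.2f}")
        return cost

def track_cost(tracker: CostTracker):
    """Decorator that estimates and accumulates the cost of each LLM call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(prompt: str, model: str = "gpt-4", **kwargs):
            response = func(prompt, model=model, **kwargs)
            call_cost = tracker.record(model, prompt, response)
            print(f"💰 Estimated cost: ${call_cost:.4f} (running total: ${tracker.total_cost:.4f})")
            return response
        return wrapper
    return decorator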

Benefits:

  • Real-time cost monitoring

  • Budget control and alerts

  • Cost optimization insights

  • Spending transparency across models

4. Performance Monitoring Decorator

Track response times and success rates:
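
A minimal monitoring sketch that records call counts, errors, and latency per wrapped function; the metrics structure and report format are illustrative:

import functools
import time
from collections import defaultdict

class LLMMetrics:
    def __init__(self):
        self.stats = defaultdict(lambda: {"calls": 0, "errors": 0, "total_time": 0.0})

    def report(self):
        for name, s in self.stats.items():
            avg = s["total_time"] / s["calls"] if s["calls"] else 0.0
            success = (s["calls"] - s["errors"]) / s["calls"] * 100 if s["calls"] else 0.0
            print(f"📊 {name}: {s['calls']} calls, {avg:.2f}s avg latency, {success:.1f}% success")

def monitor_performance(metrics: LLMMetrics):
    """Decorator that records latency and success rate for each wrapped call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            stats = metrics.stats[func.__name__]
            stats["calls"] += 1
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                stats["errors"] += 1
                raise
            finally:
                stats["total_time"] += time.perf_counter() - start
        return wrapper
    return decorator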

Benefits:

  • Performance bottleneck identification

  • SLA compliance monitoring

  • Model comparison insights

  • System health tracking

Implementation Advantages

1. Separation of Concerns

  • Core LLM logic remains clean and focused

  • Cross-cutting concerns handled separately

  • Easy to enable/disable features per environment

  • Independent testing of each concern

2. Composability

  • Stack multiple decorators for comprehensive functionality (see the sketch after this list)

  • Order-dependent behavior (cache before retry, monitor everything)

  • Mix and match based on requirements

  • Reusable across different LLM functions
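
As a sketch of how stacking might look, combining the caching decorator defined earlier with the retry and monitoring sketches above (the ordering is illustrative: the outermost decorator runs first, so cache hits short-circuit everything below them):

metrics = LLMMetrics()

@monitor_performance(metrics)    # Outermost: times every call, cached or not
@cache_llm_response()            # Cache hits return before any retry or API work
@retry_llm_call(max_retries=3)   # Innermost: retries only real API failures
def robust_llm_call(prompt: str, model: str = "gpt-4") -> str:
    # Replace with a real provider call
    return f"Response to: {prompt[:50]}..."

print(robust_llm_call("Summarize the decorator pattern"))
metrics.report()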

3. Transparency

  • Original function interface unchanged

  • Callers unaware of decorator enhancements

  • Easy migration from basic to enhanced functions

  • Backward compatibility maintained

4. Configuration Flexibility

  • Runtime decorator configuration

  • Environment-specific behavior

  • A/B testing different decorator combinations

  • Dynamic feature toggling (see the sketch after this list)
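
One way such toggling might look, using a hypothetical environment variable to decide whether the caching decorator is applied at all (the variable name LLM_ENABLE_CACHE is made up for illustration; any config source would do):

import os

def maybe_cached(func):
    """Apply the caching decorator only when enabled for the current environment."""
    # LLM_ENABLE_CACHE is a hypothetical toggle, not a standard setting
    if os.getenv("LLM_ENABLE_CACHE", "1") == "1":
        return cache_llm_response()(func)
    return func

@maybe_cached
def call_llm_configurable(prompt: str, model: str = "gpt-4") -> str:
    return f"Response to: {prompt[:50]}..."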

Real-World Impact

The Decorator pattern in LLM applications provides:

  • Cost Savings: Caching reduces API calls by 30-70% in typical applications

  • Reliability: Retry mechanisms improve success rates from 95% to 99.9%

  • Performance: Cached responses serve in milliseconds vs seconds

  • Observability: Complete visibility into LLM usage patterns and costs


🔗 Interactive Implementation

📓 Decorator Pattern Notebook (Open In Colab) - LLM response caching, performance monitoring, and cost tracking with real OpenAI API integration.

This pattern is essential for production LLM systems where reliability, performance, and cost control are critical requirements.
