LLM Wrapper

The LLM Wrapper is a unified API proxy that provides a consistent interface for interacting with multiple LLM providers (OpenAI, Anthropic, Ollama, etc.).

Overview

This service acts as a middleware layer that:

  • Provides a unified API endpoint for all LLM interactions
  • Handles provider-specific authentication and API differences
  • Supports key rotation and fallback strategies
  • Enables consistent logging and monitoring across all LLM calls

Quick Start

Prerequisites

  • Node.js >= 20.0
  • API keys for desired providers (OpenAI, Anthropic, etc.)

Installation

git clone https://github.com/FinAI-Temp/LLM-Wrapper.git
cd LLM-Wrapper
npm install

Configuration

Create a .env file in the root directory:

# Provider API Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Server Configuration
PORT=3000
LOG_LEVEL=info

# Provider Selection (comma-separated, in priority order)
ACTIVE_PROVIDERS=openai,anthropic
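
As an illustration of how a priority-ordered, comma-separated value such as ACTIVE_PROVIDERS could be read, here is a minimal sketch. The helper name parseActiveProviders is hypothetical and is not taken from the wrapper's source; the trimming, lower-casing, and de-duplication are assumptions about sensible behavior, not documented guarantees.

```typescript
// Hypothetical helper (not from the LLM Wrapper codebase): turn a
// comma-separated provider list into an ordered array, highest priority first.
function parseActiveProviders(raw: string | undefined): string[] {
  if (!raw) return [];
  const seen = new Set<string>();
  const providers: string[] = [];
  for (const part of raw.split(",")) {
    const name = part.trim().toLowerCase();
    // Skip empty entries (e.g. a trailing comma) and duplicates.
    if (name && !seen.has(name)) {
      seen.add(name);
      providers.push(name);
    }
  }
  return providers;
}
```

In a Node.js process this would typically be called as parseActiveProviders(process.env.ACTIVE_PROVIDERS), with the first element treated as the preferred provider.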

Running the Service

# Development
npm run dev

# Production
npm start

API Reference

Unified Endpoint

POST /api/chat/completions

This endpoint accepts chat completion requests and routes them to the configured LLM providers.

Request Body

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the current market sentiment?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000
}

Response

{
  "id": "chatcmpl-xxx",
  "model": "gpt-4o",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Based on recent market data..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 150,
    "total_tokens": 200
  }
}
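
For TypeScript clients, the example response can be described with a small type. The sketch below covers only the fields visible in the example; the interface name ChatCompletionResponse and the helper firstMessageContent are hypothetical, and the real response may carry additional fields.

```typescript
// Types mirror only the fields shown in the example response; treat them as
// an assumption about the payload, not a complete schema.
interface ChatCompletionResponse {
  id: string;
  model: string;
  choices: Array<{
    message: { role: string; content: string };
    finish_reason: string;
  }>;
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
}

// Hypothetical helper: return the first choice's text, or null if the
// response contains no choices.
function firstMessageContent(response: ChatCompletionResponse): string | null {
  const first = response.choices[0];
  return first ? first.message.content : null;
}
```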

Supported Models

Provider  | Models
----------|--------------------------------------------------
OpenAI    | gpt-4o, gpt-4-turbo, gpt-3.5-turbo
Anthropic | claude-3-5-sonnet, claude-3-opus, claude-3-haiku
Ollama    | llama3, mistral, codellama

Adding Providers

To add a new provider:

  1. Create a new adapter in src/adapters/ following the existing adapter pattern
  2. Implement the LLMAdapter interface with complete() and stream() methods
  3. Register the provider in src/providers/registry.ts
  4. Add configuration options in src/config/
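
The steps above might look roughly like the following. This is a sketch under assumptions: only the interface name LLMAdapter and its complete() and stream() methods come from this page. The request and response types, the EchoAdapter class, and the Map-based registry are hypothetical stand-ins; the actual definitions in src/adapters/ and src/providers/registry.ts may differ.

```typescript
// Assumed request/response shapes, loosely based on the unified endpoint.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}
interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  temperature?: number;
  max_tokens?: number;
}
interface ChatResponse {
  id: string;
  model: string;
  choices: Array<{ message: ChatMessage; finish_reason: string }>;
}

// Assumed signature of the adapter contract.
interface LLMAdapter {
  complete(request: ChatRequest): Promise<ChatResponse>;
  stream(request: ChatRequest): AsyncIterable<string>;
}

// Toy adapter that echoes the last message; a real adapter would call the
// provider's API here instead.
class EchoAdapter implements LLMAdapter {
  async complete(request: ChatRequest): Promise<ChatResponse> {
    const last = request.messages[request.messages.length - 1];
    const content = last ? last.content : "";
    return {
      id: "echo-1",
      model: request.model,
      choices: [{ message: { role: "assistant", content }, finish_reason: "stop" }],
    };
  }

  async *stream(request: ChatRequest): AsyncIterable<string> {
    const response = await this.complete(request);
    const first = response.choices[0];
    // Emit the completed text word by word to imitate streamed chunks.
    for (const word of (first ? first.message.content : "").split(" ")) {
      yield word;
    }
  }
}

// Hypothetical registry keyed by provider name.
const adapterRegistry = new Map<string, LLMAdapter>();
adapterRegistry.set("echo", new EchoAdapter());
```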

Key Management

Setting API Keys

API keys are loaded from environment variables at startup. Never commit API keys to version control.

Key Rotation

The LLM Wrapper supports automatic key rotation:

  1. Configure multiple API keys for a provider (comma-separated in env)
  2. The wrapper automatically rotates keys on rate limit errors (429)
  3. Configure rotation strategy via KEY_ROTATION_STRATEGY env var:
    • round-robin (default): Sequential rotation
    • random: Random key selection
    • failover: Use next key only on failure
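
To make the three strategies concrete, here is a minimal sketch of a key selector. The class KeyRotator and its methods are hypothetical and are not the wrapper's implementation. In particular, the sketch assumes round-robin advances on every request while failover advances only when a failure (such as a 429) is reported; the page does not spell out these details.

```typescript
type RotationStrategy = "round-robin" | "random" | "failover";

// Hypothetical key selector illustrating the documented strategies.
class KeyRotator {
  private index = 0;
  private readonly keys: string[];
  private readonly strategy: RotationStrategy;
  private readonly rng: () => number;

  constructor(
    keys: string[],
    strategy: RotationStrategy = "round-robin",
    rng: () => number = Math.random,
  ) {
    if (keys.length === 0) throw new Error("at least one API key is required");
    this.keys = keys;
    this.strategy = strategy;
    this.rng = rng;
  }

  // Return the key to use for the next request.
  next(): string {
    if (this.strategy === "random") {
      this.index = Math.floor(this.rng() * this.keys.length);
      return this.keys[this.index];
    }
    const key = this.keys[this.index];
    // round-robin moves on after every request; failover stays on the
    // current key until reportFailure() is called.
    if (this.strategy === "round-robin") {
      this.index = (this.index + 1) % this.keys.length;
    }
    return key;
  }

  // Call after a failed request so that failover switches to the next key.
  reportFailure(): void {
    if (this.strategy === "failover") {
      this.index = (this.index + 1) % this.keys.length;
    }
  }
}
```

A comma-separated environment value could be passed in as new KeyRotator(value.split(",")), with the strategy taken from KEY_ROTATION_STRATEGY.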

Monitoring

Monitor key usage and rotation via the /api/health endpoint, which reports:

  • Current active key index per provider
  • Request counts per key
  • Error rates per key

Integration with News Analyzer

The News Analyzer service integrates with the LLM Wrapper using the unified endpoint.

Integration Pattern

const response = await fetch('http://llm-wrapper:3000/api/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: 'You are a financial news analyst. Analyze the following article...'
      },
      {
        role: 'user',
        content: articleContent
      }
    ],
    temperature: 0.3,
    max_tokens: 500
  })
});

Conventions

  • Model Selection: News Analyzer uses gpt-4o for standard analysis and claude-3-5-sonnet for complex reasoning
  • Temperature: Typically 0.1-0.3 for analytical tasks (more deterministic)
  • Max Tokens: 500-1000 for typical news summaries; adjust based on expected output length
  • Timeout: 30-second timeout for all LLM calls; implement retry logic with exponential backoff
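
The timeout and backoff convention could be implemented along these lines. This is a sketch, not code from the News Analyzer: the function names backoffDelayMs and withRetry, the 500 ms base delay, and the 8 s cap are illustrative assumptions. Only the 30-second timeout and the use of exponential backoff come from the conventions above.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Run an async operation with a per-attempt timeout and backoff between
// retries. The operation receives an AbortSignal it can pass to fetch().
async function withRetry<T>(
  operation: (signal: AbortSignal) => Promise<T>,
  maxRetries = 3,
  timeoutMs = 30000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const controller = new AbortController();
    // Abort the attempt if it runs longer than the timeout.
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      return await operation(controller.signal);
    } catch (error) {
      lastError = error;
    } finally {
      clearTimeout(timer);
    }
    // Wait 500 ms, 1 s, 2 s, ... before the next attempt.
    if (attempt < maxRetries) await sleep(backoffDelayMs(attempt));
  }
  throw lastError;
}
```

A call to the unified endpoint would be wrapped as withRetry((signal) => fetch(url, { ...options, signal })). Note that fetch only rejects on network errors and aborts, so the operation has to throw on non-2xx statuses itself if those should trigger a retry.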

Error Handling

When the LLM Wrapper returns an error:

  1. Rate Limit (429): Wait and retry with exponential backoff (max 3 retries)
  2. Auth Error (401/403): Log and alert; do not retry with same key
  3. Server Error (5xx): Retry on a different provider if one is available
  4. Timeout: Retry once, then fail gracefully with cached response if available
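
The four rules above can be expressed as a small decision function. The names RecoveryAction and recoveryActionFor are hypothetical. Treating every other status as a plain failure is an assumption of this sketch, since the list does not cover those cases.

```typescript
// Possible client reactions, named after the rules in the list above.
type RecoveryAction =
  | "retry-with-backoff"     // 429: exponential backoff, max 3 retries
  | "alert-no-retry"         // 401/403: log and alert, do not reuse the key
  | "retry-other-provider"   // 5xx: try a different provider if available
  | "retry-once-then-cache"  // timeout: one retry, then cached response
  | "fail";                  // anything else (assumption of this sketch)

// Map an HTTP status (or a timeout) to the documented reaction.
function recoveryActionFor(outcome: number | "timeout"): RecoveryAction {
  if (outcome === "timeout") return "retry-once-then-cache";
  if (outcome === 429) return "retry-with-backoff";
  if (outcome === 401 || outcome === 403) return "alert-no-retry";
  if (outcome >= 500) return "retry-other-provider";
  return "fail";
}
```

Keeping the decision separate from the retry mechanics makes the policy easy to unit test and to adjust if the wrapper's error codes change.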

Troubleshooting

Common Issues

Connection Refused

  • Ensure the LLM Wrapper service is running
  • Check that the port matches the configured PORT value

Invalid API Key

  • Verify the API key is correct and has not expired
  • Check that the key has necessary permissions for the requested model

Rate Limiting

  • Reduce request frequency
  • Consider adding more API keys for rotation
  • Implement request queuing for high-volume scenarios
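
One simple form of request queuing is a client-side concurrency limit. The sketch below is an illustration only: the class RequestQueue is hypothetical, and the right limit depends on the provider's rate limits, which this page does not specify.

```typescript
// Hypothetical queue that lets at most `maxConcurrent` tasks run at once;
// further tasks wait in FIFO order until a slot is handed to them.
class RequestQueue {
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly maxConcurrent: number;

  constructor(maxConcurrent: number) {
    if (maxConcurrent < 1) throw new Error("maxConcurrent must be at least 1");
    this.maxConcurrent = maxConcurrent;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // All slots busy: wait until a finishing task hands over its slot.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // pass the slot directly to the next waiter
      } else {
        this.active--; // nobody waiting: free the slot
      }
    }
  }
}
```

Each LLM call would then be submitted as queue.run(() => fetch(...)), so bursts of articles are spread out instead of hitting the wrapper all at once.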

Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Client    │────▶│ LLM Wrapper │────▶│   OpenAI    │
└─────────────┘     │             │     └─────────────┘
                    │  Router &   │     ┌─────────────┐
                    │  Fallback   │────▶│  Anthropic  │
                    │             │     └─────────────┘
                    │  Key Mgmt   │     ┌─────────────┐
                    │             │────▶│   Ollama    │
                    └─────────────┘     └─────────────┘