
News Analyzer

News Analyzer is a Rust-based microservice that aggregates financial news, clusters articles into Event Bundles, and provides LLM-powered sentiment analysis for traders.

Overview

News Analyzer operates as part of the FinAI trading system pipeline:

Trader BE → News Analyzer → LLM Wrapper → Trader BE → Trader App

Key Responsibilities:

  • Fetch news articles from NewsAPI based on asset symbols
  • Group articles into Event Bundles for multi-source analysis
  • Send Event Bundles to the LLM Wrapper for sentiment analysis
  • Map LLM responses to structured trader API format

Architecture

API Endpoints

GET /health

Health check endpoint.

Response:

{
  "status": "ok"
}

POST /api/v1/news-analysis

Main analysis endpoint. Accepts asset symbols and returns structured sentiment analysis.

Request:

{
  "request_id": "optional-client-request-id",
  "assets": [
    { "symbol": "AAPL" }
  ],
  "time_range_hours": 24,
  "max_articles": 10,
  "language": "en"
}

Response:

{
  "asset": "AAPL",
  "analysis_timestamp": "2026-05-30T12:00:00Z",
  "overall_sentiment_score": 0.75,
  "confidence_level": 0.82,
  "variance": {
    "status": "Low",
    "type": "Interpretive",
    "note": null
  },
  "key_drivers": [
    {
      "event": "iPhone demand outlook",
      "impact": "High",
      "sentiment": "Bullish",
      "description": "Supplier indicates strong demand"
    }
  ],
  "summary": "Mildly Bullish — Weak Signal",
  "sources": [
    {
      "source": "reuters.com",
      "relevance": 0.9,
      "sentiment": 0.8,
      "url": "https://reuters.com/article/123"
    }
  ],
  "headline_verdict": "Mildly Bullish — Weak Signal. Recent news is slightly positive.",
  "time_horizon": "Medium-term theme",
  "likely_impact": "Medium",
  "risks_and_counterpoints": "No earnings data analysed.",
  "watch_next": ["Apple earnings", "iPhone sales data"]
}

POST /analyze/batch

Batch analysis returning SentimentAnalysisResponse (raw LLM format).

Request:

{
  "asset_name": "BTC",
  "event_bundles": [
    {
      "id": "1",
      "name": "Fed Rate Decision",
      "sources": [
        {
          "label": "Source A",
          "url": "https://reuters.com/btc",
          "title": "Fed holds rates",
          "content": "The Fed kept rates unchanged..."
        }
      ]
    }
  ]
}

POST /analyze/batch/trader

Same as POST /analyze/batch, but returns the TraderApiResponse format.

Data Models

Request DTOs

| Model | Fields | Description |
|---|---|---|
| AssetRef | symbol: String | Asset identifier |
| NewsAnalysisRequest | request_id, assets, time_range_hours, max_articles, language | Main request |
| BatchAnalysisRequest | asset_name, event_bundles | Internal batch for LLM |
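
As a rough illustration of the request DTOs, the sketch below models AssetRef and NewsAnalysisRequest as plain Rust structs. The field types are assumptions inferred from the JSON example, and the `validate` helper is hypothetical; the real dto.rs presumably also derives serialization traits, which are omitted here to keep the snippet dependency-free.

```rust
/// Reference to a tradable asset, e.g. "AAPL".
#[derive(Debug, Clone, PartialEq)]
pub struct AssetRef {
    pub symbol: String,
}

/// Body of POST /api/v1/news-analysis.
/// Field types are assumptions inferred from the JSON example.
#[derive(Debug, Clone, PartialEq)]
pub struct NewsAnalysisRequest {
    /// Optional client-supplied request ID.
    pub request_id: Option<String>,
    pub assets: Vec<AssetRef>,
    pub time_range_hours: u32,
    pub max_articles: u32,
    pub language: String,
}

impl NewsAnalysisRequest {
    /// Hypothetical validation helper (not taken from the service):
    /// rejects requests that could never produce a useful analysis.
    pub fn validate(&self) -> Result<(), String> {
        if self.assets.is_empty() {
            return Err("assets must not be empty".to_string());
        }
        if self.assets.iter().any(|a| a.symbol.trim().is_empty()) {
            return Err("asset symbol must not be blank".to_string());
        }
        if self.time_range_hours == 0 || self.max_articles == 0 {
            return Err("time_range_hours and max_articles must be positive".to_string());
        }
        Ok(())
    }
}
```

Validating before any upstream call keeps malformed requests from consuming NewsAPI or LLM quota.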

Response DTOs

| Model | Fields | Description |
|---|---|---|
| TraderApiResponse | Full trader response | Final response for Trader BE |
| SentimentAnalysisResponse | Raw LLM response | Direct from LLM Wrapper |
| VarianceInfo | status, type, note | Source variance analysis |
| KeyDriver | event, impact, sentiment, description | Key market drivers |
| SourceInfo | source, relevance, sentiment, url | Source metadata |

LLM Integration

Model Priority

  1. gemini-3.5-flash (primary)
  2. gemini-3-flash-preview
  3. gemini-2.5-flash
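
One plausible reading of this priority list is a fallback chain: try the primary model and move to the next one when a call fails. The sketch below assumes that behavior; `call_with_fallback` and its `call` closure are hypothetical stand-ins for the actual LLM Wrapper request.

```rust
/// Model identifiers in priority order, as listed above.
pub const MODEL_PRIORITY: [&str; 3] = [
    "gemini-3.5-flash",
    "gemini-3-flash-preview",
    "gemini-2.5-flash",
];

/// Tries each model in priority order and returns the first success
/// together with the model that produced it. If every model fails,
/// returns all errors paired with the model that raised them.
///
/// `call` is a hypothetical stand-in for the LLM Wrapper request.
pub fn call_with_fallback<T, E, F>(
    mut call: F,
) -> Result<(&'static str, T), Vec<(&'static str, E)>>
where
    F: FnMut(&str) -> Result<T, E>,
{
    let mut errors = Vec::new();
    for model in MODEL_PRIORITY.iter().copied() {
        match call(model) {
            Ok(value) => return Ok((model, value)),
            Err(err) => errors.push((model, err)),
        }
    }
    Err(errors)
}
```

Returning the winning model name alongside the result makes it easy to log which tier actually served a request.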

Response Schema

The LLM is configured with JSON schema validation for structured output:

{
  "asset_name": "string",
  "overall_sentiment": "Bullish" | "Bearish" | "Neutral" | "Mixed",
  "confidence_score": 0.0-1.0,
  "event_analyses": [...],
  "bias_flags": [...],
  "headline_verdict": "string",
  "time_horizon": "Short-term noise" | "Medium-term theme" | "Long-term business signal",
  "likely_impact": "Low" | "Medium" | "High",
  "risks_and_counterpoints": "string",
  "watch_next": ["string"]
}

Retry Logic

On JSON parse failure:

  1. First attempt with full system prompt
  2. Retry with simplified prompt and temperature: 0.1
  3. Fail if retry also fails
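
The retry steps above can be sketched as a small generic helper. This is a minimal illustration, not the service's actual code: `PromptMode`, `analyze_with_retry`, and the two closures are hypothetical names, with `call_llm` standing in for the LLM Wrapper call and `parse` for JSON deserialization.

```rust
/// Prompt variants used across the two attempts.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PromptMode {
    /// Full system prompt (first attempt).
    Full,
    /// Simplified prompt sent with a low temperature (retry).
    Simplified { temperature: f32 },
}

/// Calls the LLM with the full prompt; if the output fails to parse,
/// retries once with the simplified prompt at temperature 0.1.
/// The error from the second parse is returned if the retry also fails.
pub fn analyze_with_retry<T, E, C, P>(mut call_llm: C, parse: P) -> Result<T, E>
where
    C: FnMut(PromptMode) -> String,
    P: Fn(&str) -> Result<T, E>,
{
    let first = call_llm(PromptMode::Full);
    if let Ok(parsed) = parse(&first) {
        return Ok(parsed);
    }
    let second = call_llm(PromptMode::Simplified { temperature: 0.1 });
    parse(&second)
}
```

Keeping the parser as a parameter means the same helper works for any response type, and the retry policy can be unit-tested without a live LLM.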

Source Authority Tiers

| Tier | Weight | Examples |
|---|---|---|
| Tier 1 | 1.0 | NYSE, NASDAQ, CME, Bloomberg, Reuters |
| Tier 2 | 0.7 | WSJ, FT, Barron's |
| Tier 3 | 0.4 | Secondary outlets |
| Tier 4 | 0.2 | Social media, forums |
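
The tier table maps naturally onto an enum with a weight accessor. In the sketch below the weights come from the table, while `tier_for_domain` and its small domain list are purely illustrative assumptions; the document does not say how a source is assigned to a tier or what the default is.

```rust
/// Source authority tiers with the weights from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthorityTier {
    Tier1,
    Tier2,
    Tier3,
    Tier4,
}

impl AuthorityTier {
    /// Weight applied to a source of this tier.
    pub fn weight(self) -> f64 {
        match self {
            AuthorityTier::Tier1 => 1.0,
            AuthorityTier::Tier2 => 0.7,
            AuthorityTier::Tier3 => 0.4,
            AuthorityTier::Tier4 => 0.2,
        }
    }
}

/// Hypothetical lookup from a source domain to its tier.
/// The domain list is illustrative only; unknown domains are assumed
/// to be "secondary outlets" (Tier 3).
pub fn tier_for_domain(domain: &str) -> AuthorityTier {
    match domain.trim().to_ascii_lowercase().as_str() {
        "reuters.com" | "bloomberg.com" => AuthorityTier::Tier1,
        "wsj.com" | "ft.com" | "barrons.com" => AuthorityTier::Tier2,
        "reddit.com" | "x.com" => AuthorityTier::Tier4,
        _ => AuthorityTier::Tier3,
    }
}
```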

Sentiment Scoring

| Sentiment | Score |
|---|---|
| Bullish | 1.0 |
| Bearish | -1.0 |
| Neutral | 0.0 |
| Mixed | 0.0 |
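
The score mapping can be expressed as an enum method. How individual scores are combined into an overall score is not specified here, so the weighted mean below is an assumption: it takes `(sentiment, weight)` pairs, where the weight could be a source authority weight, and `weighted_sentiment` is a hypothetical helper name.

```rust
/// Sentiment labels with the numeric scores from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Sentiment {
    Bullish,
    Bearish,
    Neutral,
    Mixed,
}

impl Sentiment {
    /// Numeric score for this label; Neutral and Mixed both map to 0.0.
    pub fn score(self) -> f64 {
        match self {
            Sentiment::Bullish => 1.0,
            Sentiment::Bearish => -1.0,
            Sentiment::Neutral | Sentiment::Mixed => 0.0,
        }
    }
}

/// Assumed aggregation: weighted mean of sentiment scores.
/// Returns None for an empty slice or a non-positive total weight.
pub fn weighted_sentiment(items: &[(Sentiment, f64)]) -> Option<f64> {
    let total_weight: f64 = items.iter().map(|(_, w)| *w).sum();
    if total_weight <= 0.0 {
        return None;
    }
    let weighted_sum: f64 = items.iter().map(|(s, w)| s.score() * *w).sum();
    Some(weighted_sum / total_weight)
}
```

Because Neutral and Mixed share a score of 0.0, an aggregate near zero cannot distinguish "no signal" from "conflicting signals"; that distinction has to come from the variance information.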

Variance Detection

Interpretive Variance (Medium severity)

  • Different interpretations of the same facts
  • Conflicting analyst opinions
  • Action: Note, reduce confidence by 15%

Factual Conflict (High severity)

  • Direct contradictions in reported facts
  • Inconsistent timelines
  • Action: Flag, reduce confidence by 40-70%
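
The two confidence adjustments can be illustrated with a small function. This sketch assumes the reductions are relative (multiplicative) rather than absolute, and that the 40-70% range for factual conflicts is chosen per case by the caller; `Variance` and `adjust_confidence` are hypothetical names.

```rust
/// Kinds of source variance described above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Variance {
    /// Sources agree; confidence is left unchanged.
    None,
    /// Different interpretations of the same facts: reduce by 15%.
    Interpretive,
    /// Contradictory facts: reduce by `penalty`, clamped to 40-70%.
    FactualConflict { penalty: f64 },
}

/// Applies the variance penalty to a confidence value in [0.0, 1.0].
/// Assumption: the reduction is relative, so a 15% penalty turns
/// 0.80 into 0.80 * 0.85 = 0.68.
pub fn adjust_confidence(confidence: f64, variance: Variance) -> f64 {
    let reduction = match variance {
        Variance::None => 0.0,
        Variance::Interpretive => 0.15,
        Variance::FactualConflict { penalty } => penalty.clamp(0.40, 0.70),
    };
    (confidence * (1.0 - reduction)).clamp(0.0, 1.0)
}
```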

Configuration

| Variable | Default | Description |
|---|---|---|
| LLM_ENDPOINT_URL | http://llm-wrapper:11435 | LLM Wrapper endpoint |
| NEWS_API_KEY | (required) | NewsAPI.org API key |
| RUST_LOG | info | Logging level |
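
A minimal sketch of loading these variables with their defaults follows. The `AppConfig` name appears in the module structure, but its fields and the `from_lookup`/`from_env` constructors are assumptions made for illustration.

```rust
/// Runtime configuration; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct AppConfig {
    pub llm_endpoint_url: String,
    pub news_api_key: String,
    pub rust_log: String,
}

impl AppConfig {
    /// Builds the config from any variable lookup, which keeps the
    /// defaulting logic testable without touching the process environment.
    pub fn from_lookup<F>(lookup: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        // NEWS_API_KEY has no default: missing or empty is an error.
        let news_api_key = lookup("NEWS_API_KEY")
            .filter(|v| !v.is_empty())
            .ok_or_else(|| "NEWS_API_KEY is required".to_string())?;
        Ok(Self {
            llm_endpoint_url: lookup("LLM_ENDPOINT_URL")
                .unwrap_or_else(|| "http://llm-wrapper:11435".to_string()),
            news_api_key,
            rust_log: lookup("RUST_LOG").unwrap_or_else(|| "info".to_string()),
        })
    }

    /// Reads the configuration from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|name| std::env::var(name).ok())
    }
}
```

Failing fast on a missing NEWS_API_KEY at startup surfaces misconfiguration before the first request arrives.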

Module Structure

src/
├── main.rs                      # Entry point, Axum router
├── config/                      # AppConfig
├── domain/                      # Domain models
├── shared/                      # Error handling, logging
├── middleware/                  # Request ID injection
├── clients/                     # External service clients
│   ├── llm_wrapper_client.rs
│   └── news_api_client.rs
└── modules/
    ├── health/                  # Health check
    └── news_analysis/           # Main module
        ├── controller.rs        # Route handlers
        ├── service.rs           # Business logic
        ├── dto.rs               # Data transfer objects
        ├── prompt.rs            # System prompt template
        └── gemini_schema.rs     # JSON schemas

Adding New News Sources

Current State

News Analyzer currently uses NewsAPI.org as its primary news source. Access is configured through the NEWS_API_KEY environment variable.

Planned Architecture

Per the project plan (US-3.2 through US-3.7), future versions will support:

  1. Source Registry - Central source profiles with:

    • Domain, enabled flag, discovery methods
    • Market/asset type mappings
    • Crawl rate limits, JS rendering flags
  2. URL Discovery - Multiple discovery methods:

    • RSS feeds (primary)
    • Sitemaps (secondary)
    • Category pages
    • Search API fallback
  3. Keyword Expansion - Asset symbol expansion:

    • Primary keywords: AAPL, Apple
    • Secondary keywords: Apple stock, Apple earnings
    • Negative keywords: filter noise
  4. Article Processing:

    • HTML cleaning (remove nav, ads, boilerplate)
    • Short article filter (less than 200 chars)
    • Relevance filtering (keyword overlap)
    • MD5 exact deduplication
  5. Event Bundles - Semantic grouping:

    • Embedding generation (all-MiniLM-L6-v2)
    • Cosine similarity clustering
    • 3-5 authoritative sources per bundle
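
To make the planned Event Bundle step more concrete, here is a sketch of cosine similarity clustering over precomputed embeddings. The greedy single-pass strategy, the `threshold` parameter, and both function names are assumptions; the plan only names the similarity measure, and embedding generation is out of scope for this snippet.

```rust
/// Cosine similarity of two vectors; returns 0.0 for mismatched
/// lengths, empty input, or a zero-length vector.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Greedy single-pass clustering (an assumed strategy): each embedding
/// joins the first bundle whose seed, i.e. its first member, is at least
/// `threshold` similar; otherwise it starts a new bundle.
/// Returns bundles as lists of indices into `embeddings`.
pub fn cluster_by_similarity(embeddings: &[Vec<f32>], threshold: f32) -> Vec<Vec<usize>> {
    let mut bundles: Vec<Vec<usize>> = Vec::new();
    for (idx, embedding) in embeddings.iter().enumerate() {
        let home = bundles
            .iter()
            .position(|b| cosine_similarity(&embeddings[b[0]], embedding) >= threshold);
        match home {
            Some(pos) => bundles[pos].push(idx),
            None => bundles.push(vec![idx]),
        }
    }
    bundles
}
```

A greedy pass is order-dependent, so a production version might prefer centroid-based or hierarchical clustering; the sketch favors brevity. Capping each bundle at 3-5 authoritative sources would be a separate selection step after clustering.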

Implementation Path

To add new sources today:

  1. NewsAPI-based: No code changes needed - just use different search queries
  2. New source type: Implement new client in src/clients/
  3. Full crawling pipeline: Follow US-3.2 → US-3.7 implementation order
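
For the "new source type" path, one possible shape for a client in src/clients/ is a shared trait that every source implements. Everything in this sketch is hypothetical: the `NewsSource` trait, the `Article` struct, and the in-memory `StaticSource` are not taken from the codebase, and the real clients are likely async rather than synchronous.

```rust
/// Minimal article representation (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct Article {
    pub url: String,
    pub title: String,
    pub content: String,
}

/// Hypothetical common interface for news source clients.
pub trait NewsSource {
    fn name(&self) -> &str;
    fn fetch(&self, symbol: &str, max_articles: usize) -> Result<Vec<Article>, String>;
}

/// In-memory source, useful as a test double for a real client.
pub struct StaticSource {
    pub name: String,
    pub articles: Vec<Article>,
}

impl NewsSource for StaticSource {
    fn name(&self) -> &str {
        &self.name
    }

    /// Returns up to `max_articles` articles mentioning `symbol`.
    fn fetch(&self, symbol: &str, max_articles: usize) -> Result<Vec<Article>, String> {
        Ok(self
            .articles
            .iter()
            .filter(|a| a.title.contains(symbol) || a.content.contains(symbol))
            .take(max_articles)
            .cloned()
            .collect())
    }
}

/// Collects articles from every source; sources that return an error
/// are skipped so one failing client does not block the others.
pub fn fetch_from_all(
    sources: &[&dyn NewsSource],
    symbol: &str,
    max_articles: usize,
) -> Vec<Article> {
    let mut out = Vec::new();
    for source in sources {
        if let Ok(mut articles) = source.fetch(symbol, max_articles) {
            out.append(&mut articles);
        }
    }
    out
}
```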