banatie-service/apps/api-service/NETWORK_ERROR_DETECTION.md

Network Error Detection

Overview

This implementation adds network error detection to the Banatie API service. Connectivity is checked only after a request fails, so successful requests incur no extra overhead, and failures are reported to users with clear, actionable messages.

Features

Lazy Validation

  • Network checks only trigger on failures
  • Zero performance overhead on successful requests
  • Defers diagnostic work until it is needed (lazy evaluation) rather than checking connectivity up front

Error Classification

Automatically detects and classifies:

  • DNS Resolution Failures (ENOTFOUND, EAI_AGAIN)
  • Connection Timeouts (ETIMEDOUT)
  • Connection Refused (ECONNREFUSED)
  • Connection Resets and Unreachable Networks (ECONNRESET, ENETUNREACH)
  • Generic Network Errors (fetch failed, etc.)
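
The classification above can be sketched roughly as a lookup from error code to error type. This is a minimal illustration, not the actual contents of NetworkErrorDetector.ts: the names `NetworkErrorType`, `extractCode`, and `classifyByCode` are hypothetical, and the fallback regex is an assumption about what "fetch failed, etc." might cover.

```typescript
// Hypothetical names for illustration; not the real exports of NetworkErrorDetector.ts.
type NetworkErrorType = "dns" | "timeout" | "refused" | "reset" | "unknown";

// Error codes from the list above, mapped to their classification.
const CODE_TO_TYPE = new Map<string, NetworkErrorType>([
  ["ENOTFOUND", "dns"],
  ["EAI_AGAIN", "dns"],
  ["ETIMEDOUT", "timeout"],
  ["ECONNREFUSED", "refused"],
  ["ECONNRESET", "reset"],
  ["ENETUNREACH", "reset"],
]);

// Node's fetch typically throws "TypeError: fetch failed" and attaches the
// underlying socket error as `cause`, so both levels are inspected.
function extractCode(error: unknown): string | undefined {
  const e = error as { code?: unknown; cause?: { code?: unknown } } | null;
  const code = e?.code ?? e?.cause?.code;
  return typeof code === "string" ? code : undefined;
}

// Returns a network error type, or null when the error does not look network-related.
function classifyByCode(error: unknown): NetworkErrorType | null {
  const code = extractCode(error);
  const byCode = code ? CODE_TO_TYPE.get(code) : undefined;
  if (byCode) return byCode;
  const message = error instanceof Error ? error.message : String(error);
  // Assumed keyword fallback for generic network errors.
  return /fetch failed|network/i.test(message) ? "unknown" : null;
}
```

Returning `null` for unrecognized errors is one way to avoid false positives: callers can fall back to their existing error handling when no network pattern matches.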

Clear User Messages

Before:

Gemini AI generation failed: exception TypeError: fetch failed sending request

After:

Network error: Unable to connect to Gemini API. Please check your internet connection and firewall settings.

Detailed Logging

Logs contain both user-friendly messages and technical details:

[NETWORK ERROR - DNS] DNS resolution failed: Unable to resolve Gemini API hostname | Technical: Error code: ENOTFOUND

Implementation

Core Utility

Location: src/utils/NetworkErrorDetector.ts

Key Methods:

  • classifyError(error, serviceName) - Analyzes an error and determines if it's network-related
  • formatErrorForLogging(result) - Formats errors for logging with technical details
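
As a rough sketch of what the logging formatter might do, the function below reproduces the log line shown under Detailed Logging. The `NetworkErrorResult` shape and the `formatErrorForLoggingSketch` name are assumptions made for illustration; the real result type in NetworkErrorDetector.ts may differ.

```typescript
// Assumed result shape; the real type returned by classifyError() may differ.
interface NetworkErrorResult {
  isNetworkError: boolean;
  errorType: string;        // e.g. "dns", "timeout", "unknown"
  userMessage: string;      // message safe to show to users
  technicalDetails: string; // raw error code or message for developers
}

// Builds a single log line: tag, user-facing message, then technical details.
function formatErrorForLoggingSketch(result: NetworkErrorResult): string {
  const tag = result.isNetworkError
    ? `[NETWORK ERROR - ${result.errorType.toUpperCase()}] `
    : "";
  return `${tag}${result.userMessage} | Technical: ${result.technicalDetails}`;
}

const line = formatErrorForLoggingSketch({
  isNetworkError: true,
  errorType: "dns",
  userMessage: "DNS resolution failed: Unable to resolve Gemini API hostname",
  technicalDetails: "Error code: ENOTFOUND",
});
console.log(line);
```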

Integration Points

  1. ImageGenService (src/services/ImageGenService.ts)

    • Enhanced error handling in generateImageWithAI() method
    • Provides clear network diagnostics when image generation fails
  2. PromptEnhancementService (src/services/promptEnhancement/PromptEnhancementService.ts)

    • Enhanced error handling in enhancePrompt() method
    • Helps users diagnose connectivity issues during prompt enhancement
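
The integration in both services could look something like the wrapper below. It is a sketch under stated assumptions: `withNetworkDiagnostics` is a hypothetical helper, the `classifyError` here is a simplified stand-in for the real utility (assumed to be async), and the actual services may handle errors inline in their catch blocks instead.

```typescript
// Simplified stand-in for NetworkErrorDetector.classifyError(); assumed async.
interface Classification {
  isNetworkError: boolean;
  userMessage: string;
}

async function classifyError(error: unknown, serviceName: string): Promise<Classification> {
  const message = error instanceof Error ? error.message : String(error);
  const isNetworkError = /fetch failed|ENOTFOUND|ETIMEDOUT|ECONNREFUSED|ECONNRESET/i.test(message);
  return {
    isNetworkError,
    userMessage: isNetworkError
      ? `Network error: Unable to connect to ${serviceName}. Please check your internet connection and firewall settings.`
      : message,
  };
}

// Hypothetical wrapper: classification happens only in the catch path,
// so the success path does no extra work.
async function withNetworkDiagnostics<T>(
  serviceName: string,
  logLabel: string,
  call: () => Promise<T>,
): Promise<T> {
  try {
    return await call();
  } catch (error) {
    const result = await classifyError(error, serviceName);
    if (result.isNetworkError) {
      console.error(`[${logLabel}] ${result.userMessage}`);
      throw new Error(result.userMessage);
    }
    throw error; // Non-network errors pass through unchanged.
  }
}
```

A service method such as generateImageWithAI() could then wrap its API call, for example `withNetworkDiagnostics("Gemini API", "ImageGenService", () => callApi())`, where `callApi` is a placeholder for the real request.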

How It Works

Normal Operation (No Overhead)

User Request → Service → Gemini API → Success ✓

No network checks performed - zero overhead.

Error Scenario (Smart Detection)

User Request → Service → Gemini API → Error ✗
              ↓
    NetworkErrorDetector.classifyError()
              ↓
    1. Check error code/message for network patterns
    2. If network-related: Quick DNS check (2s timeout)
    3. Return classification + user-friendly message
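
Step 2, the quick DNS check with a 2-second timeout, might be implemented along these lines. This is only a sketch: `canResolveHost` and the injectable `lookup` parameter are hypothetical, and the real detector may use a different DNS call or timeout mechanism.

```typescript
import { promises as dns } from "node:dns";

// Injectable lookup so the check can be exercised without real network access.
type Lookup = (hostname: string) => Promise<unknown>;

// Resolves to true if the hostname resolves within timeoutMs, false otherwise.
// The 2000 ms default mirrors the "2s timeout" in the flow above.
async function canResolveHost(
  hostname: string,
  timeoutMs = 2000,
  lookup: Lookup = (h) => dns.lookup(h),
): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  // Lookup failures (e.g. ENOTFOUND) count as "cannot resolve" rather than throwing.
  const resolved = lookup(hostname).then(() => true, () => false);
  try {
    return await Promise.race([resolved, timedOut]);
  } finally {
    clearTimeout(timer); // Avoid keeping the process alive after an early result.
  }
}
```

Racing the lookup against a timer keeps the diagnostic bounded, so a hanging resolver cannot delay the error response by more than the timeout.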

Error Types

| Error Type | Trigger | User Message |
| --- | --- | --- |
| dns | ENOTFOUND, EAI_AGAIN | DNS resolution failed: Unable to resolve Gemini API hostname |
| timeout | ETIMEDOUT | Connection timeout: Gemini API did not respond in time |
| refused | ECONNREFUSED | Connection refused: Service may be down or blocked by firewall |
| reset | ECONNRESET, ENETUNREACH | Connection lost: Network connection was interrupted |
| connection | General connectivity failure | Network connection failed: Unable to reach Gemini API |
| unknown | Network keywords detected | Network error: Unable to connect to Gemini API |
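
One way to keep these messages in a single place is a typed lookup keyed by error type. The sketch below is illustrative only: `ErrorType` and `USER_MESSAGES` are hypothetical names, and interpolating the service name is an assumption based on the `serviceName` parameter of classifyError().

```typescript
// Hypothetical type and table; the message strings are taken from the table above.
type ErrorType = "dns" | "timeout" | "refused" | "reset" | "connection" | "unknown";

// Record<ErrorType, ...> makes the compiler flag any error type without a message.
const USER_MESSAGES: Record<ErrorType, (serviceName: string) => string> = {
  dns: (s) => `DNS resolution failed: Unable to resolve ${s} hostname`,
  timeout: (s) => `Connection timeout: ${s} did not respond in time`,
  refused: () => "Connection refused: Service may be down or blocked by firewall",
  reset: () => "Connection lost: Network connection was interrupted",
  connection: (s) => `Network connection failed: Unable to reach ${s}`,
  unknown: (s) => `Network error: Unable to connect to ${s}`,
};

console.log(USER_MESSAGES.timeout("Gemini API"));
```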

Testing

Run the manual test to see error detection in action:

npx tsx test-network-error-detector.ts

This demonstrates:

  • DNS errors
  • Timeout errors
  • Fetch failures (the "fetch failed" error shown under Example Error Logs)
  • Non-network errors (no false positives)

Example Error Logs

Before Implementation

[2025-10-09T16:56:29.228Z] [fmfnz0zp7] Text-to-image generation completed: {
  success: false,
  error: 'Gemini AI generation failed: exception TypeError: fetch failed sending request'
}

After Implementation

[ImageGenService] [NETWORK ERROR - UNKNOWN] Network error: Unable to connect to Gemini API. Please check your internet connection and firewall settings. | Technical: exception TypeError: fetch failed sending request

Benefits

  1. Better UX: Users get actionable error messages
  2. Faster Debugging: Developers immediately know if it's a network issue
  3. Zero Overhead: No performance impact on successful requests
  4. Production-Ready: Follows industry best practices (AWS SDK, Stripe, Google Cloud)
  5. Comprehensive: Covers DNS, timeout, refused, reset, and generic connectivity failures

Future Enhancements

Potential improvements:

  • Retry logic with exponential backoff for transient network errors
  • Circuit breaker pattern for repeated failures
  • Metrics/alerting for network error rates
  • Health check endpoint with connectivity status
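
As an illustration of the first item, retry with exponential backoff could be sketched as follows. None of this exists in the current implementation: `retryWithBackoff`, `isTransient`, the choice of transient codes, and the default attempt count and delay are all assumptions.

```typescript
// Assumed set of codes worth retrying; a real policy may differ.
const TRANSIENT_CODES = new Set(["ETIMEDOUT", "ECONNRESET", "EAI_AGAIN"]);

// Checks the error and its `cause` for a transient network error code.
function isTransient(error: unknown): boolean {
  const e = error as { code?: unknown; cause?: { code?: unknown } } | null;
  const code = e?.code ?? e?.cause?.code;
  return typeof code === "string" && TRANSIENT_CODES.has(code);
}

// Retries `call` up to maxAttempts times, doubling the delay after each failure
// (baseDelayMs, 2x, 4x, ...). Non-transient errors are rethrown immediately.
async function retryWithBackoff<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (error) {
      if (attempt >= maxAttempts || !isTransient(error)) throw error;
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Errors such as ENOTFOUND or ECONNREFUSED are left out of the retry set here on the assumption that they rarely clear up within a few hundred milliseconds.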