Joseph M OBrien joe

message-extractor (0.1.0)

Published 2026-05-31 14:29:37 -05:00 by joe

Set up this registry in the Cargo configuration file (for example ~/.cargo/config.toml):

[registry]
default = "gitea"

[registries.gitea]
index = "sparse+" # Sparse index
# index = "" # Git

[net]
git-fetch-with-cli = true

To install the package using Cargo, run the following command:

cargo add message-extractor@0.1.0

For more information on the Cargo registry, see the documentation.

Message Extractor

A Rust-based system for extracting, monitoring, and visualizing conversations from AI coding assistants in real-time.

Overview

Message Extractor provides a complete solution for working with AI assistant conversations:

Core Library (message-extractor): Extract messages from 7 AI assistants
Watcher Service (message-watcher): Monitor files and stream updates via SSE
Web UI (frontend): Real-time visualization with search and filtering

Why this exists: If you use multiple AI coding assistants, you might want to aggregate their conversations, search across them, or analyze usage patterns. This project makes that easy.

┌──────────┐    ┌─────────────────┐    ┌──────────────┐
│ Session  │───▶│ message-watcher │───▶│   Browser    │
│  Files   │    │   (Rust + SSE)  │    │ (Yew + WASM) │
└──────────┘    └─────────────────┘    └──────────────┘
  .jsonl/.json     File watching         Real-time UI

Quick Start (Full System)

Get the complete system running in 2 minutes:

# 1. Clone and build
git clone https://github.com/yourusername/message-extractor.git
cd message-extractor
cargo build --release

# 2. Start backend (Terminal 1)
mise run watcher:run

# 3. Start frontend (Terminal 2)
cd frontend && mise exec -- trunk serve

# 4. Open browser
open http://localhost:8080

You'll see messages from all your AI assistants streaming in real-time! 🎉

Features

Core Library

Trait-based design for extensible provider support
Async I/O for efficient file processing
Type-safe error handling with thiserror
7 provider implementations:
- Claude
- Codex
- Copilot
- Gemini
- OpenCode
- Droid
- Cursor

Watcher Service

Real-time monitoring with notify file watcher
Incremental reading - only new messages, not entire files
SSE streaming to multiple clients
Bounded history prevents memory leaks
Concurrency control prevents resource exhaustion
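
The incremental reading listed above can be illustrated with a byte-offset approach for JSONL files. This is a minimal sketch using only the standard library, not the watcher's actual code: `read_new_lines` is a hypothetical helper, and it assumes session files are append-only with one message per line.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical helper: returns the complete lines appended since `offset`,
/// plus the byte offset to resume from on the next call.
pub fn read_new_lines(path: &Path, offset: u64) -> io::Result<(Vec<String>, u64)> {
    let mut file = File::open(path)?;
    // If the file shrank (truncated or rotated), start again from the top.
    let len = file.metadata()?.len();
    let start = if offset > len { 0 } else { offset };
    file.seek(SeekFrom::Start(start))?;

    let mut reader = BufReader::new(file);
    let mut lines = Vec::new();
    let mut pos = start;
    let mut buf = String::new();
    loop {
        buf.clear();
        let n = reader.read_line(&mut buf)?;
        // Stop at EOF, or at a partial line that is still being written.
        if n == 0 || !buf.ends_with('\n') {
            break;
        }
        pos += n as u64;
        // Drop the line terminator and any trailing whitespace.
        lines.push(buf.trim_end().to_string());
    }
    Ok((lines, pos))
}
```

The caller keeps the returned offset per file and passes it back on the next change notification, so a partially written final line is picked up once it is complete.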

Web Frontend

Live message feed with auto-scroll
Filter by provider (Claude, Cursor, etc.)
Search messages with highlighting
Dark/Light mode with localStorage persistence
Export to JSON for filtered results
Session browser to explore all past conversations

Use Cases

Cross-assistant search: Find that snippet across all your AI tools
Usage analytics: Track which assistants you use most
Conversation backup: Keep all your AI interactions
Research: Analyze conversation patterns
Debugging: Monitor what messages are being sent

System Components

📚 message-extractor (Library)

The core library that knows how to parse each AI assistant's session format.

Location: Root directory
Documentation: See ARCHITECTURE.md

Quick example:

let registry = ExtractorRegistry::new();
let messages = registry
    .get(Provider::Claude)?
    .extract_messages(Path::new("session.jsonl"))
    .await?;

👁️ message-watcher (Service)

Real-time file watcher that monitors your AI assistant directories and broadcasts updates.

Location: message-watcher/
Documentation: message-watcher/README.md

Features:

Type-state pattern for lifecycle safety
Bounded message history (10k limit)
Semaphore-based concurrency control
Path sanitization for PII protection
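
For readers unfamiliar with the type-state pattern named above, here is a minimal sketch of the idea. All names (`Watcher`, `Configured`, `Running`, `watch_dir`) are hypothetical and chosen for illustration; the real service may model its lifecycle differently.

```rust
use std::marker::PhantomData;
use std::path::PathBuf;

// Marker types for lifecycle states (hypothetical names).
pub struct Configured;
pub struct Running;

pub struct Watcher<State> {
    dirs: Vec<PathBuf>,
    _state: PhantomData<State>,
}

impl Watcher<Configured> {
    pub fn new() -> Self {
        Watcher { dirs: Vec::new(), _state: PhantomData }
    }

    /// Directories can only be added before the watcher starts.
    pub fn watch_dir(mut self, dir: impl Into<PathBuf>) -> Self {
        self.dirs.push(dir.into());
        self
    }

    /// Consumes the configured watcher and returns a running one.
    pub fn start(self) -> Watcher<Running> {
        Watcher { dirs: self.dirs, _state: PhantomData }
    }
}

impl Watcher<Running> {
    pub fn watched_dirs(&self) -> &[PathBuf] {
        &self.dirs
    }

    /// Stopping consumes the running watcher, so it cannot be stopped twice.
    pub fn stop(self) {}
}
```

Because `watch_dir` exists only on `Watcher<Configured>` and `stop` only on `Watcher<Running>`, calling them in the wrong order is a compile error rather than a runtime check.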

🖥️ frontend (Web UI)

Yew-based WebAssembly frontend for visualizing message streams.

Location: frontend/
Documentation: frontend/README.md

Features:

Real-time SSE connection
Provider and text filtering
Dark mode support
Message export
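
The SSE connection carries plain-text frames. As a rough illustration of what the watcher would put on the wire, the sketch below formats a single frame; `sse_frame` and the `message` event name used in the note are assumptions, while the `event:` and `data:` fields and the blank-line terminator come from the Server-Sent Events format itself.

```rust
/// Formats one Server-Sent Events frame. `event` is an optional event name.
pub fn sse_frame(event: Option<&str>, data: &str) -> String {
    let mut frame = String::new();
    if let Some(name) = event {
        frame.push_str("event: ");
        frame.push_str(name);
        frame.push('\n');
    }
    // Each line of the payload needs its own `data:` field;
    // the receiver joins them back together with newlines.
    for line in data.split('\n') {
        frame.push_str("data: ");
        frame.push_str(line);
        frame.push('\n');
    }
    // A blank line terminates the event.
    frame.push('\n');
    frame
}
```

A message serialized as compact JSON fits on one `data:` line, for example `sse_frame(Some("message"), &json)`.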

Installation

Using the Library

Add this to your Cargo.toml:

[dependencies]
message-extractor = "0.1.0"

Usage

use message_extractor::{ExtractorRegistry, Provider};
use std::path::Path;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create registry with all providers
    let registry = ExtractorRegistry::new();

    // Get extractor for a specific provider
    let extractor = registry
        .get(Provider::Claude)
        .expect("Claude extractor should be registered");

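    // Extract messages from a session file.
    // Rust does not expand `~`, so build the path from $HOME instead.
    let home = std::env::var("HOME")?;
    let session = Path::new(&home).join(".claude/projects/session-123.jsonl");
    let messages = extractor
        .extract_messages(&session)
        .await?;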

    // Process messages
    for msg in messages {
        println!("{:?}: {}", msg.role, msg.content);
        if let Some(ts) = msg.timestamp {
            println!("  at {}", ts);
        }
    }

    Ok(())
}

Architecture

Core Types

SimpleMessage: Contains role, content, and optional timestamp
MessageRole: Enum for User, Assistant, and System roles
Provider: Enum for all supported providers

Trait Design

The MessageExtractor trait defines the interface all providers implement:

#[async_trait]
pub trait MessageExtractor: Send + Sync {
    async fn extract_messages(&self, session_path: &Path) -> Result<Vec<SimpleMessage>>;
    fn provider(&self) -> Provider;
}

Registry Pattern

The ExtractorRegistry provides a centralized way to access all provider extractors:

let registry = ExtractorRegistry::new();
let extractor = registry.get(Provider::Claude)?;
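
To show how such a registry might be wired internally, here is a simplified sketch. It is not the crate's implementation: the `Extractor` trait is a synchronous stand-in for the async `MessageExtractor`, `Registry` and `register` are hypothetical names, only two providers are listed, and `get` is assumed to return an `Option`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Provider {
    Claude,
    Cursor,
}

/// Simplified, synchronous stand-in for the crate's async `MessageExtractor`.
pub trait Extractor: Send + Sync {
    fn provider(&self) -> Provider;
}

pub struct ClaudeExtractor;

impl Extractor for ClaudeExtractor {
    fn provider(&self) -> Provider {
        Provider::Claude
    }
}

pub struct Registry {
    extractors: HashMap<Provider, Box<dyn Extractor>>,
}

impl Registry {
    /// Builds a registry with the built-in extractors already registered.
    pub fn new() -> Self {
        let mut registry = Registry { extractors: HashMap::new() };
        registry.register(Box::new(ClaudeExtractor));
        registry
    }

    /// Keys each extractor by the provider it reports.
    pub fn register(&mut self, extractor: Box<dyn Extractor>) {
        self.extractors.insert(extractor.provider(), extractor);
    }

    pub fn get(&self, provider: Provider) -> Option<&dyn Extractor> {
        self.extractors.get(&provider).map(|e| e.as_ref())
    }
}
```

Keying the map by `provider()` means a new extractor only has to be registered once, and callers never name concrete extractor types.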

Provider Details

Claude

Format: JSONL (line-delimited JSON)
Location: ~/.claude/projects/*.jsonl
Content: Supports both string and array content blocks
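
One way to handle content that is either a plain string or an array of blocks is to model both shapes and flatten them to text. The sketch below works on an already-decoded value; the `Content` and `Block` types, the `Other` catch-all, and joining text blocks with newlines are all assumptions made for illustration.

```rust
/// Hypothetical in-memory shape of a message's content after JSON decoding.
pub enum Content {
    Text(String),
    Blocks(Vec<Block>),
}

pub enum Block {
    Text(String),
    /// Any block type this sketch does not render.
    Other,
}

/// Collapses either content shape into a single string.
pub fn flatten(content: &Content) -> String {
    match content {
        Content::Text(s) => s.clone(),
        Content::Blocks(blocks) => blocks
            .iter()
            // Keep only text blocks and skip everything else.
            .filter_map(|b| match b {
                Block::Text(t) => Some(t.as_str()),
                Block::Other => None,
            })
            .collect::<Vec<_>>()
            .join("\n"),
    }
}
```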

Codex

Format: JSONL
Fields: role, content, timestamp

Copilot

Format: JSON
Structure: { "history": [...] }
Timestamp: Unix timestamp (seconds)

Gemini

Format: JSON
Structure: { "contents": [...] }
Content: Parts array with text blocks

OpenCode

Format: JSONL
Fields: role, content, created_at

Droid

Format: JSON
Structure: { "messages": [...] }

Cursor

Format: JSON
Structure: { "conversation": [...] }
Timestamp: Unix timestamp (milliseconds)
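
Copilot stores timestamps in seconds and Cursor in milliseconds, so the two have to be normalized before messages can be compared or sorted. A minimal sketch using only the standard library follows; `RawTimestamp` and `to_system_time` are hypothetical names, and the crate itself may well use chrono types instead.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical tag recording which unit a provider uses.
pub enum RawTimestamp {
    Seconds(u64),
    Millis(u64),
}

/// Converts a provider timestamp to a `SystemTime`.
/// Returns `None` if the value is too large to represent.
pub fn to_system_time(raw: RawTimestamp) -> Option<SystemTime> {
    let since_epoch = match raw {
        RawTimestamp::Seconds(s) => Duration::from_secs(s),
        RawTimestamp::Millis(ms) => Duration::from_millis(ms),
    };
    UNIX_EPOCH.checked_add(since_epoch)
}
```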

Helper Functions

Filter Conversation

Remove system messages from a message list:

use message_extractor::filter_conversation;

let filtered = filter_conversation(messages);
// Only user and assistant messages remain
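
For reference, a filter with this behavior could look like the sketch below. The type definitions are trimmed-down stand-ins for the crate's `MessageRole` and `SimpleMessage` (the timestamp field is omitted), and the function body is an assumption about how the helper works, not a copy of it.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MessageRole {
    User,
    Assistant,
    System,
}

#[derive(Debug, Clone, PartialEq)]
pub struct SimpleMessage {
    pub role: MessageRole,
    pub content: String,
}

/// Drops system messages and keeps the remaining ones in their original order.
pub fn filter_conversation(messages: Vec<SimpleMessage>) -> Vec<SimpleMessage> {
    messages
        .into_iter()
        .filter(|m| m.role != MessageRole::System)
        .collect()
}
```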

Running the Example

cargo run --example extract tests/fixtures/claude_session.jsonl

Running Tests

# Run all tests
cargo test

# Run with verbose output
cargo test -- --nocapture

# Run specific test
cargo test test_claude_extractor

Development

Building

cargo build

Linting

cargo clippy

Formatting

cargo fmt

Extending

To add a new provider:

Create src/providers/new_provider.rs
Implement the MessageExtractor trait
Add to src/providers/mod.rs
Register in ExtractorRegistry::new()
Add test fixture in tests/fixtures/
Add integration test

Example:

use std::path::Path;

use async_trait::async_trait;
use crate::extractor::MessageExtractor;
use crate::types::{SimpleMessage, Provider};
use crate::error::Result;

pub struct NewProviderExtractor;

#[async_trait]
impl MessageExtractor for NewProviderExtractor {
    async fn extract_messages(&self, session_path: &Path) -> Result<Vec<SimpleMessage>> {
        // Read and parse `session_path` here, then map each entry to a SimpleMessage.
        todo!("parse {}", session_path.display())
    }

    fn provider(&self) -> Provider {
        // Requires a `NewProvider` variant on the Provider enum.
        Provider::NewProvider
    }
}

Troubleshooting

No Messages Appearing

Problem: Frontend shows "Waiting for messages..." but files exist.

Solutions:

Check if watcher is running:

curl http://localhost:3030/health
# Should return: OK

Check if your AI assistant files are in the expected locations:

ls ~/.claude/projects/*.jsonl
ls ~/.cursor/sessions/*.json
# etc.

Check watcher logs for errors:

# Look for "Failed to extract messages" or "Unknown provider"

Verify file extensions:
- The watcher only processes .json and .jsonl files
- Other extensions are silently skipped
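
If you want to check ahead of time whether a file would be picked up, the extension rule above can be expressed as a small predicate. This is a sketch, not the watcher's code: `is_session_file` is a hypothetical name, and the case-sensitive match is an assumption.

```rust
use std::path::Path;

/// Returns true for paths ending in `.json` or `.jsonl` (case-sensitive).
pub fn is_session_file(path: &Path) -> bool {
    matches!(
        path.extension().and_then(|ext| ext.to_str()),
        Some("json") | Some("jsonl")
    )
}
```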

Connection Failed

Problem: Frontend shows "Connection error".

Solutions:

Verify backend is running on port 3030
Check for port conflicts:
```
lsof -i :3030
```
Try restarting both backend and frontend

High Memory Usage

Problem: Watcher using too much memory.

Solutions:

The message history is bounded to 10,000 messages by default

Reduce it in message-watcher/src/config.rs:

max_history_size: 5_000  // Lower limit

Check for excessive file changes triggering many extractions
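
To see why a bounded history keeps memory flat, consider the following sketch of a capped buffer that evicts its oldest entry on overflow. `BoundedHistory` is a hypothetical type built on the standard library's `VecDeque`; the watcher's actual structure may differ.

```rust
use std::collections::vec_deque::Iter;
use std::collections::VecDeque;

pub struct BoundedHistory<T> {
    items: VecDeque<T>,
    max_size: usize,
}

impl<T> BoundedHistory<T> {
    pub fn new(max_size: usize) -> Self {
        BoundedHistory { items: VecDeque::new(), max_size }
    }

    /// Appends an item, evicting the oldest one once the cap is reached.
    pub fn push(&mut self, item: T) {
        if self.max_size == 0 {
            return; // A zero-sized history stores nothing.
        }
        if self.items.len() == self.max_size {
            self.items.pop_front();
        }
        self.items.push_back(item);
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    /// Iterates from oldest to newest.
    pub fn iter(&self) -> Iter<'_, T> {
        self.items.iter()
    }
}
```

With a cap of 5,000, memory use is bounded by roughly 5,000 times the average message size, regardless of how long the watcher runs.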

Missing Providers

Problem: Some providers don't show up.

Solutions:

Verify the provider's directory exists
Check that you have session files in that directory
Ensure session files are valid JSON/JSONL

Build Failures

Problem: cargo build fails.

Solutions:

Update Rust toolchain:
```
rustup update stable
```
Clean and rebuild:
```
cargo clean
cargo build
```
Check that all dependencies are available

Frontend Not Loading

Problem: Browser shows blank page.

Solutions:

Check browser console for errors
Verify WebAssembly is enabled
Clear browser cache
Rebuild frontend:
```
cd frontend
trunk clean
trunk serve
```

FAQ

Q: Does this work with all AI assistants?
A: Currently supports 7 assistants. See Contributing Guide to add more.

Q: Is my data sent anywhere?
A: No. Everything runs locally. The server binds to localhost only.

Q: Can I filter messages before they're stored?
A: Not yet, but this is a planned feature. Currently, filtering happens in the UI.

Q: What's the performance impact?
A: Minimal. The watcher uses incremental reading and bounded concurrency.

Q: Can I run this on a server?
A: Yes, but you must add authentication first. See Security Documentation.

Q: How do I export all my conversations?
A: Use the "EXPORT" button in the UI, or use the library directly:

let messages = registry.get(provider)?.extract_messages(path).await?;
let json = serde_json::to_string_pretty(&messages)?;
std::fs::write("export.json", json)?;

License

MIT

Contributing

Contributions are welcome! Please read CONTRIBUTING.md for guidelines.

Dependencies

ID	Version
async-trait	^0.1
chrono	^0.4
serde	^1.0
serde_json	^1.0
thiserror	^1.0
tokio	^1.0
criterion	^0.5
tempfile	^3.8
tokio-test	^0.4

Details

Cargo

2026-05-31 14:29:37 -05:00

84 KiB

Assets (1)

message-extractor-0.1.0.crate 84 KiB
