
Agentic Node

Your data is your power

A privacy-first personal data platform that aggregates your entire digital life into a single, locally-hosted AI agent you can talk to. Documents, emails, finances, photos, contacts, and calendars all live in one place, on your machine, under your control. Built on RAG, local LLMs, and a tiered privacy architecture.

100% · On-device data storage
3 · Docker containers orchestrated
7 · Data connector types designed
384-dim · Local vector embeddings (FAISS)

The Vision

Every person generates a massive digital footprint scattered across email providers, cloud drives, bank portals, photo libraries, and local files. Accessing your own data requires logging into dozens of services, each with its own interface, each holding a fragment of your life.

Agentic Node is a personal data platform that pulls all of this into one place, indexed and searchable through a conversational AI agent. The core principle:

“Everything comes TO you. Nothing leaves unless YOU say so.”

Unlike cloud-first AI assistants that upload your data to external servers, Agentic Node runs locally. Your documents are chunked and embedded on your machine. Your LLM runs on your hardware. Your financial data never touches an external API unless you explicitly allow it. The result is an AI that knows everything about your life while maintaining complete privacy.

System Architecture

The platform uses a hybrid storage architecture with specialized data stores, a dual LLM pipeline, and a connector framework that pulls data from any source into the user's local Node.

===========================================================================================
  DEVICES (Access Points)
===========================================================================================

    Desktop            Laptop             Mobile             Remote Workstation
      |                  |                   |                      |
      +------------------+-------------------+----------------------+
                                     |
                              [ Browser / PWA ]
                              localhost:3000 or LAN
                                     |
===========================================================================================
  FRONTEND  (React + TypeScript + Tailwind + Vite)
===========================================================================================
                                     |
                    +----------------+----------------+
                    |                                 |
          [ Agent Chat UI ]            [ Connections Dashboard ]
          Named personal agent        Data source management
          Source citations             Folder browser
          Privacy indicators           RAG progress bars
          LLM tier selector            Sync controls
                    |                                 |
                    +----------------+----------------+
                                     |
                            [ FastAPI Backend ]
                              Port 8000 (Docker)
                                     |
===========================================================================================
  AGENT LAYER  (The Conductor)
===========================================================================================
                                     |
              +----------+-----------+-----------+----------+
              |          |           |           |          |
        Query Router   Privacy    RAG Service  Structured  Connector
        (local/cloud)  Service    (retrieval)  Query Svc   Framework
              |          |           |           |          |
              |     [ spaCy NER ]    |           |          |
              |     PII detection    |           |          |
              |     Tokenization     |           |          |
              |          |           |           |          |
===========================================================================================
  DATA STORES  (Hybrid Architecture)
===========================================================================================
              |          |           |           |          |
              |          |     [ FAISS Index ]  [ SQLite DB ]   [ File System ]
              |          |     384-dim vectors  Transactions  Raw media
              |          |     Semantic search  Contacts      Photos/Video
              |          |     Document chunks  Calendar      Audio files
              |          |     Email content    File metadata PDFs/Docs
              |          |           |           |          |
===========================================================================================
  RAG PIPELINE  (Retrieval Augmented Generation)
===========================================================================================
              |          |           |
              |          |     User Query
              |          |        |
              |          |     Embed query (all-MiniLM-L6-v2, 384-dim)
              |          |        |
              |          |     FAISS similarity search (cosine, top-k)
              |          |        |
              |          |     Retrieve top chunks + similarity scores
              |          |        |
              |          |     Build context prompt with sources
              |          |        |
              |          |        v
===========================================================================================
  LLM PIPELINE  (Dual-Mode Processing)
===========================================================================================
              |
     +--------+--------+
     |                  |
[ LOCAL LLM ]       [ CLOUD LLM ]
Mistral 7B          Claude API
via Ollama          (Anthropic)
Port 11434          |
(Docker)            Receives ONLY:
     |              - Anonymized context
100% private        - PII tokens replaced
No data leaves      - User controls access
     |                  |
     +--------+---------+
              |
        [ Response ]
        De-anonymize tokens
        Attach source citations
        Privacy report
              |
              v
        [ User sees answer ]
        with sources + privacy indicator

===========================================================================================
  DATA CONNECTORS  (Everything Comes TO You)
===========================================================================================

[Local Folders]  [Cloud Storage]  [Email/Cal]  [Banking]  [Contacts]  [Photos]
C:, D:, USB      OneDrive         IMAP       Plaid API   Phone       Libraries
External SSD     Google Drive     CalDAV     Investment  Social      Albums
NAS drives       Dropbox          OAuth      Accounts    LinkedIn    EXIF data
                 iCloud
     |               |               |           |          |           |
     +---------------+---------------+-----------+----------+-----------+
                                     |
                            Pull data DOWN to local
                            Route to correct store:
                            - Text docs  -->  FAISS (chunk + embed)
                            - Structured -->  SQLite (tables)
                            - Media      -->  File system (metadata indexed)
  • FAISS Vector Store - Unstructured text (documents, emails, notes). Queried via semantic similarity for natural language questions like "what's my mom's birthday?"
  • SQLite Database - Structured data (financial transactions, contacts, calendar events). Queried via SQL for aggregation, graphing, and filtering. Chosen over Postgres because it is zero-config, serverless, and stores everything in a single portable file.
  • Local File System - Raw media files (photos, videos, audio). Metadata indexed in SQLite, text content indexed in FAISS.
  • Local LLM (Mistral via Ollama) - Acts as a privacy gatekeeper. Handles queries locally and strips PII before anything reaches a cloud LLM.
  • Cloud LLM (Claude API) - Optional tier for complex reasoning. Receives only anonymized, tokenized context. User controls when this is used.
  • RAG Pipeline - Documents are chunked, embedded with all-MiniLM-L6-v2 (384 dimensions), and stored in a FAISS index. At query time, the user's question is embedded and matched against stored chunks using cosine similarity. Top-k results are assembled into a context prompt for the LLM.
  • spaCy NER - Named entity recognition identifies PII (names, addresses, phone numbers) in outbound context. Sensitive tokens are replaced before cloud processing and restored in the response.
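The retrieval step above can be sketched with plain NumPy. A real deployment would embed with sentence-transformers and search a faiss.IndexFlatIP over normalized vectors, but the math is the same cosine top-k; the 4-dim vectors and chunk texts below are purely illustrative stand-ins for 384-dim all-MiniLM-L6-v2 embeddings.

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """L2-normalize so that a dot product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def top_k(query_vec: np.ndarray, index: np.ndarray, k: int = 3):
    """Return (scores, ids) of the k most similar chunks, like a FAISS search."""
    scores = normalize(index) @ normalize(query_vec)
    ids = np.argsort(-scores)[:k]
    return scores[ids], ids

# Toy 4-dim "embeddings" standing in for the real 384-dim vectors
chunks = ["mom's birthday is June 3", "grocery receipt", "meeting notes"]
index = np.array([[0.9, 0.1, 0.0, 0.1],
                  [0.0, 0.8, 0.5, 0.1],
                  [0.1, 0.2, 0.9, 0.3]])
query = np.array([0.85, 0.15, 0.05, 0.1])   # "when is mom's birthday?"

scores, ids = top_k(query, index, k=2)
context = "\n".join(f"[{chunks[i]}] (score {scores[n]:.2f})"
                    for n, i in enumerate(ids))
```

The retrieved chunks and their similarity scores become the sourced context prompt handed to the LLM, which is also where the UI's "similarity match percentage" citations come from.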

Access From Any Device

Agentic Node runs as a local service accessible from any device on the user's network. The Node itself lives on one machine (or a dedicated appliance), but the web-based interface works from any browser.

🖥 Desktop · Primary host running Docker containers with full local processing power
💻 Laptop · Access via browser on the same network, or run the Node locally while traveling
📱 Mobile · Progressive web app for on-the-go queries to your personal agent
🌐 Remote · Secure tunnel access from any location via VPN or Tailscale

Privacy-First Data Hierarchy

Not all data sources are equal. The platform implements a three-tier privacy hierarchy that determines how data is ingested and stored:

1
Truly Local
HIGHEST TRUST
Files on disk, local drives, USB drives. Local email clients. All indexed and stored on YOUR machine. Data never moves.
2
Cloud Pull
MEDIUM TRUST
OneDrive, Google Drive, Dropbox, iCloud. Email via IMAP. Calendar via CalDAV. Data travels TO you, then stays local.
3
API Connections
LOWER TRUST · HIGHEST VALUE
Bank APIs via Plaid, investment accounts. Data fetched and cached locally. Incremental sync: pull history once, then only new transactions.
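The tier-3 incremental sync boils down to a high-water-mark query against the local SQLite cache: find the newest cached transaction, then insert only rows newer than it. A minimal sketch, where the table layout and row shapes are illustrative rather than the platform's actual schema:

```python
import sqlite3

def sync_transactions(conn: sqlite3.Connection, fetched: list[tuple]) -> int:
    """Insert only transactions newer than the latest cached date.

    `fetched` rows are (id, date, amount) as a bank API might return them.
    A real sync would also dedupe by id to catch same-day rows.
    """
    cur = conn.execute("SELECT COALESCE(MAX(date), '') FROM transactions")
    high_water = cur.fetchone()[0]
    new_rows = [r for r in fetched if r[1] > high_water]
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", new_rows)
    conn.commit()
    return len(new_rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id TEXT, date TEXT, amount REAL)")
conn.execute("INSERT INTO transactions VALUES ('t1', '2024-05-01', -42.10)")

# A later pull returns one already-cached row and one new one;
# only the new row lands in the local store.
added = sync_transactions(conn, [("t1", "2024-05-01", -42.10),
                                 ("t2", "2024-05-09", -18.75)])
```

This is what keeps API connections "lower trust, highest value": the full history is pulled once, and every subsequent sync touches the external API only for the delta.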

Application Screenshots

Connections Dashboard showing 2 connected local folders with RAG progress bars
Connections Dashboard Data source management with real-time sync status, RAG indexing progress bars (34/55 searchable), storage metrics, and the "Everything comes TO you" privacy banner.
Agent Javis chat interface with RAG-powered response
Agent Chat Interface Named personal agent ("Agent Javis") with source citations, privacy indicators, and LLM tier selection (Local/Auto/Cloud).
Add Data Source modal with 7 connector types
Add Data Source Seven connector types: Local Folder, Cloud Storage, Email, Calendar, Banking, Contacts, and Photos.

Connections Dashboard

The platform includes a visual dashboard for managing all data source connections. Users can browse their local file system, connect folders, and monitor sync status, including a real-time progress bar showing how many files have been chunked and embedded into the RAG pipeline.

Supported connector types:

  • Local Folders - Browse and select any folder on any mounted drive. Files are cataloged in SQLite and text documents are automatically chunked and embedded into FAISS for search.
  • Cloud Storage - OneDrive, Google Drive, Dropbox, iCloud. Pulls files down to local storage.
  • Email - IMAP/OAuth integration. Pulls emails down for local indexing.
  • Calendar - CalDAV/OAuth. Syncs events to structured database.
  • Banking - Plaid API. Fetches transaction history for financial queries and visualization.
  • Contacts & Photos - Phone and social contacts (including LinkedIn), plus photo libraries and albums with EXIF metadata extraction.

Each connection card shows: connection status, item count, storage size, last sync time, and a RAG indexing progress bar indicating how many files are searchable through the AI agent.
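The "N/M searchable" progress bar reduces to counting how many files have completed the chunk-and-embed step. A minimal sketch of the chunking side, with chunk size and overlap values that are illustrative rather than the platform's actual settings:

```python
def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping character windows for embedding."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# Toy stand-ins for files cataloged by a local-folder connection
files = {"notes.txt": "a" * 100, "todo.md": "b" * 25}
indexed = {name: chunk_text(body) for name, body in files.items()}
searchable = sum(1 for chunks in indexed.values() if chunks)
progress = f"{searchable}/{len(files)} searchable"
```

The overlap ensures a sentence split across a chunk boundary still appears intact in at least one chunk, which keeps semantic search from missing answers that straddle a boundary.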

The Personal Agent

The front-end experience is a conversational AI agent that users can name and interact with naturally. The agent understands context from all connected data sources and provides sourced answers with citation links back to the original documents.

Key capabilities:

  • Natural language queries across all personal data: "When is my mom's birthday?" or "What did I spend on groceries last month?"
  • Source citations with similarity match percentages
  • Privacy indicator showing whether data stayed local or was sent to the cloud
  • Three LLM modes: Local (100% private, Mistral), Cloud (faster, Claude API with anonymization), Auto (intelligent routing)
  • User-configurable agent name persisted across sessions
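The anonymize-then-restore round trip behind the Cloud mode can be sketched as a token substitution pass. Here the detected entities are hard-coded for illustration, where the real pipeline would get them from spaCy NER:

```python
def tokenize_pii(text: str, entities: list[str]):
    """Replace each detected entity with an opaque token; return the mapping."""
    mapping = {}
    for i, ent in enumerate(entities):
        token = f"<PII_{i}>"
        text = text.replace(ent, token)
        mapping[token] = ent
    return text, mapping

def restore_pii(text: str, mapping: dict) -> str:
    """Swap tokens back after the cloud LLM responds."""
    for token, ent in mapping.items():
        text = text.replace(token, ent)
    return text

context = "Email Jane Doe at 555-0199 about the invoice."
safe, mapping = tokenize_pii(context, ["Jane Doe", "555-0199"])
# `safe` is all the cloud LLM ever sees:
#   "Email <PII_0> at <PII_1> about the invoice."
cloud_reply = "Drafted a reminder to <PII_0> at <PII_1>."
final = restore_pii(cloud_reply, mapping)
```

The mapping never leaves the machine, so even when Cloud mode is selected, real names and numbers exist only in the local round-trip ends of the pipeline.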

Engineering Challenges & Solutions

Building a platform that orchestrates Docker containers, local LLMs, vector databases, and streaming APIs surfaced several non-trivial engineering challenges. Here are the most instructive ones:

🔎 The Double-Text Streaming Bug
SYMPTOM
Every token from the LLM was duplicated: "Based Based on on the the provided provided context context..."
DEBUGGING PROCESS
Hypothesis 1: React 18 StrictMode double-invokes state updates in dev mode. Removed StrictMode. Did not fix it.
Hypothesis 2: Frontend state mutation. The streaming callback was mutating state directly instead of creating new objects. Fixed immutability pattern. Did not fix it.
Hypothesis 3: Backend SSE streaming. The httpx library's aiter_lines() was yielding duplicate lines from Ollama's NDJSON response stream.
FIX
Switched from streaming to non-streaming Ollama API call (stream: false). The complete response returns at once, eliminating any doubling. Confirmed the issue was isolated to the streaming pipeline layer.
LESSON
When debugging, isolate the problem layer by layer. The doubling could have originated in frontend rendering, frontend SSE parsing, backend SSE emitting, backend Ollama streaming, or Ollama itself. By eliminating the entire streaming pipeline, we confirmed where the bug lived without needing to inspect every layer.
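The non-streaming fix amounts to a single POST to Ollama's /api/generate endpoint with stream set to false, so the full response arrives as one JSON body and no line-by-line NDJSON parsing remains to double tokens. A sketch of the request construction, with the network call itself left commented out:

```python
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # Docker-mapped Ollama port

def build_generate_request(model: str, prompt: str) -> dict:
    """Non-streaming Ollama request: the complete response comes back at once,
    bypassing the SSE/NDJSON streaming layer where the doubling occurred."""
    return {"model": model, "prompt": prompt, "stream": False}

payload = build_generate_request("mistral", "Summarize my grocery spending.")
body = json.dumps(payload)
# Sending it would be e.g.:  httpx.post(OLLAMA_URL, json=payload, timeout=120)
```

Trading streaming for correctness was a deliberate interim choice; progressive token streaming is back on the roadmap once the duplicate-line source in the pipeline is fixed.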
🔌 Docker Container Isolation: Ollama Model Missing
SYMPTOM
Backend connected to Ollama but returned errors. Docker Ollama showed "total blobs: 0" despite Mistral being installed on the host.
FIX
Pulled the model inside the Docker container: docker exec agentic-node-ollama-1 ollama pull mistral
LESSON
Docker containers are isolated environments. The host Ollama and Docker Ollama are completely separate instances with separate model storage. This separation is a common stumbling block when transitioning from local development to containerized deployment.
⚡ Port Conflict: Ghost Container
SYMPTOM
Backend container failed to start: "Bind for 0.0.0.0:8000 failed: port is already allocated"
FIX
Ran netstat -ano | findstr :8000 to identify the process and found a leftover container from a previous project still running. Stopped it with docker stop and removed it.
LESSON
Always check for port conflicts when containers fail to start. Docker Desktop's GUI or docker ps reveals all running containers. Auto-restart policies on old containers can cause persistent conflicts.
📁 Host Filesystem Access from Docker
SYMPTOM
Local folder connections showed "Folder not found" because the backend runs inside a Docker container that cannot see the host filesystem.
FIX
Mounted host drives (C:, D:, OneDrive) as read-only Docker volumes. Built a path translation layer that converts Windows host paths to container mount paths, and a visual folder browser API so users navigate their drives through the UI.
LESSON
Docker's isolation is a feature, not a bug. For a local-first platform that needs host filesystem access, volume mounts with explicit path translation provide security (read-only) while enabling functionality.
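The path translation layer can be sketched as a prefix rewrite from Windows host paths to container mount points. The mount names below are illustrative, assuming host drives are mounted read-only in docker-compose (e.g. C:\ at /host/c):

```python
from pathlib import PurePosixPath, PureWindowsPath

# Illustrative read-only volume mounts: Windows drive -> container mount point
MOUNTS = {"C:": "/host/c", "D:": "/host/d"}

def to_container_path(win_path: str) -> str:
    """Convert a Windows host path to its read-only container mount path."""
    p = PureWindowsPath(win_path)
    drive = p.drive.upper()          # e.g. "C:"
    if drive not in MOUNTS:
        raise ValueError(f"Drive {drive!r} is not mounted in the container")
    rest = p.as_posix()[len(p.drive):].lstrip("/")
    return str(PurePosixPath(MOUNTS[drive]) / rest)

path = to_container_path(r"C:\Users\me\Documents\notes.txt")
```

The visual folder browser works the same translation in reverse, so users always see familiar Windows paths while the backend only ever touches the mounted, read-only views.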

Technology Stack

Python · FastAPI · React · TypeScript · Tailwind CSS · Vite · FAISS · Sentence-Transformers · SQLite · Ollama · Mistral 7B · Claude API · Docker Compose · spaCy NER · RAG Architecture · SSE Streaming · PII Tokenization · REST APIs · Vector Embeddings · Cosine Similarity

Roadmap

  • Cloud storage connectors (OneDrive, Google Drive, Dropbox, iCloud)
  • Email and calendar integration via IMAP/CalDAV
  • Banking integration via Plaid API for financial queries and visualization
  • Progressive token streaming for real-time response display
  • Encrypted-at-rest with user-held keys for optional cloud backup
  • Hardware vision: always-on router appliance with SSD storage
  • Developer marketplace for third-party connectors

Interested in Privacy-First AI?

Let's discuss how local-first architectures and personal AI agents can transform how people interact with their data.

Get in Touch