Inside Foliome: How the Agent Actually Works

This is the technical companion to Foliome: Your Money, Your Machine, Your Agent. That post covers the why. This one covers the how.

Two Layers of Data

Foliome stores financial data in two layers, and the separation is deliberate.

Layer 1 is raw JSON. Every sync writes the institution's output — transactions, balances, holdings — as a JSON file to data/sync-output/. These files are schema-agnostic and human-reviewable. If something goes wrong with import, you can go back to the source. If a bank changes its export format, the raw data is preserved exactly as it was downloaded.

Layer 2 is a normalized SQLite database. An import step reads Layer 1 JSON and writes it into canonical tables: balances, transactions, investment_transactions, holdings, statement_balances, sync_status. This is what the agent queries. Deduplication uses a composite key of institution|account_id|txn_date|amount|description_hash with upsert logic, so re-importing the same data is safe.

The two-layer design means you never lose raw data, and you always have a queryable database. If I want to change the schema — add a column, split a table — I can re-run the import from Layer 1 without re-syncing every institution.

How the Agent Builds Integrations

Foliome doesn't ship "support for 10 banks." It ships primitives and a skill called /learn-institution that uses those primitives to build an integration with any institution.

You point the agent at a bank URL. It launches a Playwright browser, visually explores the login page, and starts building a deterministic config step by step. It figures out the login selectors, the MFA flow, how to navigate to transaction downloads, and what the download pattern looks like.

The agent handles the full taxonomy of bank website patterns:

  • Iframes and shadow DOM components
  • Custom web elements and backdrop overlays
  • Landing pages that hide the login form behind a button
  • Multi-step login flows with method selection
  • Calendar modals for date range selection
  • Asynchronous report generation (request, wait, download)

Every bank has its own combination of these problems. Each institution I added forced the primitives to grow to handle the next pattern.

The result is a deterministic Playwright config file. First run is guided exploration with the LLM in the loop, reasoning through screenshots. Every run after that replays the config — no screenshots, no reasoning, no tokens. Milliseconds, not minutes.
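To make "deterministic config" concrete, here is a hypothetical shape for a learned config and the replay loop that consumes it. The step names and selectors are invented for illustration; in real use the page object would be a Playwright page, mocked out here:

```javascript
// Illustrative shape of a learned config: an ordered list of steps the
// agent discovered once during /learn-institution, replayed verbatim
// on every later run.
const config = {
  loginUrl: "https://bank.example.com/login",
  steps: [
    { action: "fill", selector: "#username", valueEnv: "BANK_USER" },
    { action: "fill", selector: "#password", valueEnv: "BANK_PASS" },
    { action: "click", selector: "button[type=submit]" },
    { action: "click", selector: "a[href*='download']" },
  ],
};

// Replay against anything exposing fill/click methods. No LLM,
// no screenshots — just the recorded steps, in order.
async function replay(page, config, env) {
  for (const step of config.steps) {
    const value = step.valueEnv ? env[step.valueEnv] : undefined;
    await page[step.action](step.selector, value);
  }
}
```

The expensive reasoning happens exactly once, at learning time; everything after is a data-driven loop.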

Six Download Patterns + API Connectors

I didn't set out to build a taxonomy of bank download patterns, but six distinct browser-based ones emerged as I added institutions:

  1. Central export — One download dialog exports all accounts. The ideal.
  2. Per-account navigation — Navigate to each account individually, download one at a time.
  3. Calendar date range — Select dates from a modal before exporting.
  4. Single export button — One click, everything downloads. Bless these banks.
  5. Async report — Request a report, wait for generation, then download the result.
  6. PDF statement — No CSV export available. LiteParse extracts text from the PDF, and the agent parses out the transactions.

Separately, API connectors handle institutions that offer REST APIs with OAuth 2.0 or bearer token authentication (Schwab, Mercury). These run in their own phase during sync — no browser needed, finishing in seconds.

Nine anonymized templates in readers/institutions/templates/ cover every pattern, with a GUIDE.md that maps common bank behaviors to the right template. If you're adding your own bank, the guide gives the agent a head start on recognizing which pattern it's dealing with.

Sync Orchestration

Running /sync kicks off all institutions in parallel. The orchestrator runs in two phases:

Phase 1: API connectors. Schwab (OAuth 2.0) and Mercury (bearer token) finish in seconds. No browser needed.

Phase 2: Browser-based institutions. All remaining banks launch simultaneously, each in its own Playwright context with a persistent Chrome profile.

When MFA triggers, the system sends a notification to Telegram with the institution name and MFA type. If multiple banks trigger MFA at the same time, you can respond with all codes in a single message — the orchestrator routes each code to the correct bank session via a file-based MFA bridge (data/mfa-pending/). Each institution writes a pending file, the bridge matches incoming codes to pending requests, and the sync process reads them back.

MFA types handled: SMS, email, push notifications, device codes, TOTP (authenticator apps). For email-based MFA, there's a Gmail API integration that reads the code directly from your inbox.

Persistent Chrome profiles mean MFA triggers less often over time. Most banks remember the authenticated session after the first login. Some of mine haven't asked for MFA in weeks.

Graduated Error Recovery

When something breaks during sync, the system doesn't just retry and hope. It escalates through four levels:

  1. Retry — Try the same action again. Handles transient network issues and slow-loading elements.
  2. Self-recover — Retry using recovery selectors defined in the config. If the bank shows an interstitial, an overlay, or a "we've updated our terms" modal, the config knows how to dismiss it and continue.
  3. Adaptive bridge — The agent takes a screenshot of the current page, reasons about what's different from what it expected, and attempts to navigate past the problem. If it succeeds, it saves what it learned back to the config so the next run handles it at level 2 instead.
  4. Skip + notify — If all else fails, skip the institution and send a Telegram notification with the screenshot and error context. The other banks still sync successfully.

This is what "the agent maintains its own integrations" means in practice. When a bank redesigns its site, the first sync might hit level 3. The adaptive bridge figures out the new layout, updates the config, and every subsequent sync replays clean with no escalation at all. No support tickets. No waiting.
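The escalation above can be sketched as a single fallthrough, with each level as a pluggable async function (names and signatures are illustrative, not Foliome's actual API):

```javascript
// Four-level graduated recovery: retry, self-recover, adaptive bridge,
// then skip + notify. Each level either returns a result or throws.
async function runWithRecovery(action, levels, notify) {
  // Level 1: plain retry for transient failures and slow elements.
  for (let attempt = 0; attempt < 2; attempt++) {
    try { return await action(); } catch {}
  }
  // Level 2: config-defined recovery selectors (dismiss modals, etc.).
  try { return await levels.selfRecover(); } catch {}
  // Level 3: adaptive bridge — screenshot, reason, and on success
  // write what was learned back into the config.
  try { return await levels.adaptiveBridge(); } catch {}
  // Level 4: skip this institution and notify; others keep syncing.
  await notify("skipped");
  return null;
}
```

The key property is containment: a failure at any level is caught and handed to the next, so one broken bank can never abort the whole sync.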

Transaction Classification Pipeline

Transactions get categorized through a four-tier pipeline that costs nothing to run:

Tier 1: Account-type implied. Transfers from a mortgage account are "Mortgage." Investment account transactions are "Investment." Loan payments from loan accounts are "Loan Payment." Six categories are inferred from account type alone, before any model runs.

Tier 2: Merchant rules. A local config file maps known merchant strings to categories. "TRADER JOE'S" → Groceries. "NETFLIX" → Subscriptions. Instant lookup, zero inference.

Tier 3: Fine-tuned DistilBERT v2. For unknown merchants, a locally running ONNX model classifies the transaction. I fine-tuned this specifically on US bank transaction descriptions. Off-the-shelf classifiers fail on real bank data because the descriptions are messy: truncated merchant names, reference numbers mixed in, inconsistent formatting across institutions. The model and training dataset are both open source.

Tier 4: Bank category fallback. If the bank's export includes its own categorization field, use it as a last resort.
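The four tiers compose into one fallthrough function. This sketch stubs the model call and uses toy rule tables; the real system's tables and ONNX inference are more involved:

```javascript
// Tier tables (toy examples; the real rules live in a local config file).
const ACCOUNT_TYPE_CATEGORIES = {
  mortgage: "Mortgage",
  investment: "Investment",
  loan: "Loan Payment",
};
const MERCHANT_RULES = {
  "TRADER JOE'S": "Groceries",
  "NETFLIX": "Subscriptions",
};

function classify(txn, model) {
  // Tier 1: implied by account type — no inference at all.
  const implied = ACCOUNT_TYPE_CATEGORIES[txn.accountType];
  if (implied) return implied;
  // Tier 2: instant merchant-rule lookup, zero inference.
  for (const [merchant, category] of Object.entries(MERCHANT_RULES)) {
    if (txn.description.toUpperCase().includes(merchant)) return category;
  }
  // Tier 3: local fine-tuned classifier (ONNX DistilBERT in the real system).
  const predicted = model(txn.description);
  if (predicted) return predicted;
  // Tier 4: whatever category the bank's own export supplied, if any.
  return txn.bankCategory ?? "Uncategorized";
}
```

Ordering matters: the cheap, deterministic tiers run first, so the model only ever sees the residue the rules couldn't resolve.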

The /category-override skill lets you permanently reclassify a merchant. The correction persists for all future transactions from that merchant. I've overridden about 20 merchants total out of thousands of transactions — mostly small local businesses the model hadn't seen.

Statement Balance Extraction

Transaction downloads give you spending data, but they don't always give you accurate balances. A bank's displayed balance reflects pending transactions, holds, and timing differences that never appear in CSV exports.

Foliome extracts statement balances through five patterns, discovered automatically during /learn-institution setup:

Pattern  Source                       Method
S-A      PDF statements               LiteParse text extraction
S-B      HTML statement list pages    DOM scraping
S-C      Dashboard page               Text extraction from balance display
S-D      Not available                Some institutions don't expose it
S-E      CSV exports                  Balance column in the downloaded file

This gives the morning brief and net worth calculations accurate numbers rather than approximations derived from summing transactions.

Security Model

Credentials are isolated from the agent by design:

  • Bank configs reference environment variable names, not credential values
  • At runtime, a Node script reads credentials from your local store and Playwright types them into the bank's login form
  • The LLM is never in the loop during login — credential injection is pure automation
  • Two credential storage options: .env file with dotenvx encryption, or Bitwarden vault integration
  • A security gate verifies domain and HTTPS before any credentials are entered — the automation won't type into a page it doesn't recognize
  • Parameterized SQL for all database writes — no injection vectors
  • Prompt injection defense in sanitize-text.js strips potentially malicious content from bank-sourced text before it enters the agent context
  • All data stays local: JSON files and SQLite, never transmitted externally
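The security gate in the list above amounts to a simple predicate checked before any keystrokes are sent. A sketch, assuming the config records one expected domain per institution (subdomains of that domain are accepted):

```javascript
// Credentials are typed only if the live page URL uses HTTPS and its
// hostname matches the domain recorded in the institution's config,
// either exactly or as a subdomain.
function credentialGate(pageUrl, expectedDomain) {
  const url = new URL(pageUrl);
  const httpsOk = url.protocol === "https:";
  const domainOk =
    url.hostname === expectedDomain ||
    url.hostname.endsWith("." + expectedDomain);
  return httpsOk && domainOk;
}
```

Note the suffix check includes the leading dot, so a lookalike hostname like notbank.example.com doesn't slip through a bare endsWith comparison.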

The Dashboard: Telegram Mini App

The dashboard is a React + TypeScript SPA served by a local Node.js server, accessed as a Telegram Mini App. Seven tabs:

Tab            What It Shows
Brief          Personalized daily financial narrative powered by agent memory and live data
Overview       Account balances and net worth across all institutions
Transactions   Searchable, filterable transaction history
Budget         Monthly spending by category against configurable limits
Portfolio      Investment holdings across brokerage, retirement, and 529 accounts
Subscriptions  Recurring charge detection and tracking
Wiki           Agent knowledge base — goals, preferences, patterns, reflections

Authentication uses Telegram's native HMAC-SHA256 initData verification. The server validates that requests genuinely come from Telegram, issues session tokens with expiry, and serves the SPA. The dashboard queries the same SQLite database that powers the agent's skills — same data, different interface.

Agent Memory: Wiki Agent Memory

The agent's persistent memory follows the Wiki Agent Memory pattern described by Andrej Karpathy — interlinked markdown files the agent reads and writes as part of normal conversation. No external dependencies, no vector database, no MCP server. Just files at data/wiki/, organized by type: goals, preferences, concerns, context, patterns, and monthly reflections.

When the morning brief says "the 529 contribution deadline is approaching and you mentioned wanting to max it out this year," that context came from a conversation weeks ago. The agent captures financial intent mid-conversation by spawning a background subagent to write the wiki page while the main conversation continues. It recalls by reading the wiki index to find relevant pages. The /reflect skill periodically consolidates duplicates, updates goals with real data from the database, discovers spending patterns, and writes monthly reflections.

The agent doesn't start from scratch every session. It maintains a mental model of your financial life and builds on it over time.

Adding Your Own Banks

Foliome ships nine anonymized institution templates covering every download pattern. The process:

  1. /getting-started — Guided first-bank setup that walks you through credentials, bank exploration, first sync, MFA, and import.
  2. Add credentials — To your .env file or Bitwarden vault.
  3. /learn-institution — Point the agent at the bank URL. It walks through login, MFA, navigation, and download step by step, building a deterministic config.
  4. /sync — The new institution syncs alongside everything else.

The GUIDE.md in readers/institutions/templates/ maps common bank patterns to templates, so the agent recognizes what it's looking at. If your bank uses a central export dialog, the agent pulls the central export template. If it's per-account navigation with a calendar date picker, it combines those templates. The primitives compose.

The Full Stack

Putting it all together:

/learn-institution  →  Deterministic Playwright config
        ↓
     /sync          →  Phase 1: API connectors (seconds)
        ↓               Phase 2: Browser automation (parallel, with MFA routing)
        ↓               Error recovery: retry → self-recover → adaptive → skip
        ↓
   Layer 1 JSON     →  Raw sync output, schema-agnostic, human-reviewable
        ↓
   Layer 2 SQLite   →  Normalized tables, deduped, queryable
        ↓
   classify.js      →  4-tier pipeline: implied → rules → DistilBERT → fallback
        ↓
   Agent skills     →  Morning brief, spending queries, alerts, reminders
        ↓
   Dashboard        →  Telegram Mini App with 7 tabs
        ↓
   Wiki memory      →  Persistent memory across sessions (Karpathy's Wiki Agent Memory)

Ten skills. Ten institutions. Five years of transaction history. A dashboard on my phone. An agent that builds and maintains its own integrations, classifies transactions locally, and remembers what matters.

Everything local. Everything open source.

Foliome: github.com/wnstnb/foliome