Brainspark Notes

Where this came from

I’ve used Roam Research for 5+ years and various text-based notes apps for decades.

Most notetaking apps require you to be the curator. You have to decide what to link, remember what you’ve already written, build and maintain structure, and review and tend the “garden.”

I partially solved the input friction with a private Slack workspace: channels for various contexts, an #inbox for dumping. Low-friction capture. But the connections still don’t form themselves.

The core bet

People don’t fail to capture ideas. They fail to re-encounter them at the moment they would be useful.

Brainspark’s bet is that modern language models can understand what saved items are actually about (conceptually, not just lexically), place them in a shared semantic space, and resurface them later based on contextual relevance rather than recall or search.

If this works even occasionally, it creates value that no traditional notes app or bookmarking system can provide.

Design principles

Low-friction input. Saving something should be easier than deciding whether it’s worth saving.

Zero required curation. Users should never need to organize, tag, link, or maintain their archive.

Context over recall. The system should proactively surface information based on what the user is currently thinking about, not what they remember to search for.

Scarcity over completeness. Surfacing nothing is better than surfacing something weak or obvious.

Explanation builds trust. When something is surfaced, the system should explain why.

The litmus test for any feature: if it requires manual tagging, folder organization, or regular maintenance, it’s a regression. The user’s job is to throw things in and ask questions. Everything else is the system’s job.

Explicit non-goals

To protect the core value proposition, these are out of scope:

  • Manual tagging or folders
  • User-defined taxonomies or graphs
  • Required review or “inbox zero” workflows
  • Explicit item-to-item linking
  • Tasks, reminders, or to-do semantics
  • Optimizing for completeness, coverage, or recall

If a feature improves organization but increases user effort, it’s a regression unless it clearly improves surfaced insight quality.

Core concepts

Items

An item is anything you save: a URL, a text snippet, an image, a voice memo transcript, a daily note.

Every item always has an associated date. For notes, the date is explicit. For captured items, the date reflects when they were created or saved. The date anchors items in time and enables chronological browsing.
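
For concreteness, a sketch of the item shape as a Pydantic model. Field names are illustrative, not a final schema; the derived fields are filled in by the pipeline described under “Processing model.”

  from datetime import datetime
  from enum import Enum
  from pydantic import BaseModel

  class ItemKind(str, Enum):
      URL = "url"
      TEXT = "text"
      IMAGE = "image"
      VOICE = "voice"
      DAILY_NOTE = "daily_note"

  class Item(BaseModel):
      # Raw capture: the only part the user provides.
      kind: ItemKind
      content: str                      # URL, text, transcript, or image reference
      anchored_at: datetime             # explicit for notes, capture time otherwise
      # Derived fields: regenerable, never user-maintained.
      summary: str | None = None
      concepts: list[str] = []          # model-extracted conceptual handles
      embedding: list[float] | None = None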

Concepts

Concepts are model-derived explanatory handles used internally to support retrieval, grouping, and explanation. They are non-canonical, not user-maintained, and subject to regeneration as models improve.

Concepts help the system think. They are not a taxonomy the user is responsible for managing.

Far associations

A far association is a surfaced connection between items that have little lexical overlap, originate in different surface domains, and share underlying conceptual structure (incentives, feedback loops, decentralization, etc.).

Near-duplicates or obvious topical similarity do not qualify. This is the good stuff.
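
One plausible way to operationalize this (a sketch, not settled logic): require high embedding similarity but low lexical overlap, so only conceptually-related-but-superficially-different pairs survive. Both thresholds are placeholders that need tuning, and the different-surface-domain check is omitted here.

  import numpy as np

  def is_far_association(a, b, sim_min=0.80, lexical_max=0.20):
      """a, b: items with .embedding (unit vectors) and .content text."""
      # High conceptual similarity: OpenAI embeddings are unit-length,
      # so the dot product is cosine similarity.
      similarity = float(np.dot(a.embedding, b.embedding))
      # Low surface similarity: Jaccard overlap of word sets as a cheap proxy.
      wa, wb = set(a.content.lower().split()), set(b.content.lower().split())
      overlap = len(wa & wb) / max(len(wa | wb), 1)
      return similarity >= sim_min and overlap <= lexical_max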

Daily notes

Each day has an optional, freeform note. What you’re working on, meeting notes, questions, half-formed ideas. Daily notes are processed like any other item (summarized, concepts extracted, embedded) and serve as contextual signals for resurfacing.

If strong associations exist, the system surfaces a small number of related past items. If none exist, it says nothing. Silence is a valid outcome.

Digests

A periodic summary (weekly by default) that includes what you saved and, optionally, unexpected connections. The unexpected connections section only appears when high-confidence far associations exist. The system never invents connections to fill space.
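
The rule is mechanical: the connections section is gated on supply, never padded. Roughly, as a sketch (the confidence attribute and threshold are assumptions):

  def build_weekly_digest(saved_items, far_associations, min_confidence=0.8):
      digest = {"saved": [item.summary for item in saved_items]}
      # Include the section only when high-confidence far associations exist;
      # never invent connections to fill space.
      strong = [a for a in far_associations if a.confidence >= min_confidence]
      if strong:
          digest["unexpected_connections"] = strong[:3]
      return digest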

Primary experiences

Capture

Save something with minimal friction. Paste a link, type or paste text, upload an image, write notes during the day. The system acknowledges receipt and processes asynchronously.

The user’s job ends at capture.
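
Capture can be a single endpoint that acknowledges immediately (202) and defers processing, e.g. via FastAPI background tasks. A sketch, where save_raw_item and process_item are placeholders for persistence and the pipeline described below:

  from fastapi import BackgroundTasks, FastAPI

  app = FastAPI()

  @app.post("/items", status_code=202)
  async def capture(payload: dict, background_tasks: BackgroundTasks):
      # Persist the raw capture first, then process asynchronously;
      # the user's job ends here.
      item_id = save_raw_item(payload)
      background_tasks.add_task(process_item, item_id)
      return {"id": item_id, "status": "processing"}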

Daily context

A date-anchored editor with chronological navigation. Surfaced items appear inline, with explanations, when confidence is high.

Passive resurfacing

The system proactively surfaces items when confidence is high: in the daily notes view and in periodic digests. Silence is a valid and desirable outcome.

Chronological archive

A calendar or timeline view. Click any date to see all items from that day. Supports reflection and temporal pattern recognition.

Retrieval

Users can ask questions or search their archive. This is a supporting feature, not the core value driver.

Processing model

All items go through the same pipeline:

  1. Content extraction. Fetch URL, OCR image, or use raw text.
  2. Summary generation. Claude generates a concise summary.
  3. Concept extraction. Claude extracts 3-7 high-level conceptual handles (not surface-level tags, but things like “distributed decision-making” or “information asymmetry”).
  4. Embedding. OpenAI generates a vector embedding.

All derived fields are cacheable and regenerable. Model outputs are treated as disposable.
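
A condensed sketch of steps 2-4, assuming the official anthropic and openai Python clients; the prompt, JSON parsing, and model name are illustrative:

  import json

  import anthropic
  import openai

  claude = anthropic.Anthropic()
  oai = openai.OpenAI()

  PROMPT = (
      "Return JSON with keys 'summary' (2-3 sentences) and 'concepts' "
      "(3-7 high-level conceptual handles, not surface-level tags).\n\nText:\n"
  )

  def process(text: str) -> dict:
      # Steps 2-3: one Claude call yields the summary and concept handles.
      msg = claude.messages.create(
          model="claude-sonnet-4-5",
          max_tokens=500,
          messages=[{"role": "user", "content": PROMPT + text}],
      )
      derived = json.loads(msg.content[0].text)  # production code would validate this
      # Step 4: embed the raw text; pgvector stores the result.
      emb = oai.embeddings.create(model="text-embedding-3-small", input=text)
      derived["embedding"] = emb.data[0].embedding
      return derived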

Resurfacing logic

Conservative by design. Return only items above a high similarity threshold, filter out near-duplicates and obvious matches, keep only genuine far associations, and generate an explanation for why each item was surfaced. Max 3 items at a time.

If nothing clears the bar, return nothing. Silence is valid.
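
With pgvector, most of this gate can live in the query itself. A sketch, assuming an items table with an embedding column and the pgvector adapter registered on the connection; both thresholds are open tuning questions:

  RESURFACE_SQL = """
      SELECT id, summary, 1 - (embedding <=> %(q)s) AS similarity
      FROM items
      WHERE 1 - (embedding <=> %(q)s) >= %(floor)s    -- high bar: silence beats noise
        AND 1 - (embedding <=> %(q)s) <= %(ceiling)s  -- drop near-duplicates
        AND id != %(source_id)s
      ORDER BY embedding <=> %(q)s
      LIMIT 3                                         -- max 3 items at a time
  """

  def resurface(conn, source_item, floor=0.80, ceiling=0.97):
      # <=> is pgvector's cosine distance operator, so 1 - distance = similarity.
      with conn.cursor() as cur:
          cur.execute(RESURFACE_SQL, {"q": source_item.embedding,
                                      "floor": floor,
                                      "ceiling": ceiling,
                                      "source_id": source_item.id})
          return cur.fetchall()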

Data import

Before the system is useful, it needs historical data imported from existing sources: a one-time migration per source, run via CLI scripts.

Slack export. Import from a Slack workspace export, filtering by channel.

Roam Research export. Import pages or daily notes from a Roam JSON export.

Plain text / Markdown. Import from a folder of text files.

Import principles:

  • Preserve original timestamps for proper chronological ordering
  • Batch process to manage API costs
  • Deduplicate before creating new items
  • Tag imports with source and batch ID
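
A minimal sketch of the shared import loop, assuming each source adapter yields (timestamp, text) pairs; load_existing_content_hashes and process_batch are placeholders:

  import hashlib
  import time
  import uuid

  def run_import(records, source: str, batch_size: int = 20):
      """records: iterable of (created_at, text) pairs from a source adapter."""
      batch_id = str(uuid.uuid4())
      seen = load_existing_content_hashes()    # dedupe before creating new items
      batch = []
      for created_at, text in records:
          digest = hashlib.sha256(text.encode()).hexdigest()
          if digest in seen:
              continue
          seen.add(digest)
          # Preserve the original timestamp so chronology survives the import.
          batch.append({"created_at": created_at, "content": text,
                        "source": source, "batch_id": batch_id})
          if len(batch) >= batch_size:
              process_batch(batch)             # one API round per batch
              batch.clear()
              time.sleep(1)                    # crude rate limiting between batches
      if batch:
          process_batch(batch)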

Tech stack

  • Frontend: Next.js (PWA support, good DX)
  • Hosting: Vercel
  • Backend: Python (FastAPI)
  • Database: Supabase (Postgres + pgvector for vector search, auth built-in)
  • Embeddings: OpenAI text-embedding-3-small
  • LLM: Claude for reasoning and synthesis
  • Email: Resend or Postmark for digests
  • Scheduled jobs: Vercel Cron for weekly digest

MVP scope

Focus on proving the core loop: capture something, have it resurface later at a useful moment.

Includes:

  • Item creation (URL, text, image)
  • Processing pipeline (extract, summarize, embed)
  • Basic item list and detail views
  • Daily notes editor with auto-save
  • History/calendar view
  • Related items suggestions (conservative)
  • Weekly digest via email
  • RAG-based chat for retrieval
  • Data import scripts (Slack, Roam, plain text)

Excludes (future phases):

  • Telegram bot integration
  • Email input channel
  • Browser extension
  • iOS share sheet
  • Proactive push notifications
  • Concept clustering visualization

Success criteria:

  • The user experiences at least one genuinely surprising resurfaced item within 30 days of active use

Failure signal:

  • Regular usage without meaningful resurfacing
  • Reliance on manual search for value

Future possibilities

Input enhancements. Voice memos (transcribe and process), Kindle highlights import, Twitter/X bookmarks sync, podcast episode notes.

Advanced associations. Concept graph visualization, serendipity mode (surface random old items), time-based patterns (“You think about X every Q4”).

Output enhancements. Custom digest schedules, “brief me” feature (generate a briefing on any topic from your archive).

Guiding question

When evaluating any feature or decision:

“Is this helping past ideas meet the present moment, without asking the user to do extra work?”

If the answer is unclear, the feature should not be built yet.

Open questions

Technical:

  • Optimal similarity threshold for resurfacing? Too low and you get noise, too high and you miss connections.
  • How to handle very large archives? Embedding search should scale, but processing costs could add up.
  • Best approach for filtering “obvious” matches?

Product:

  • How aggressively should the system surface items? The PRD says conservative, but where exactly is the line?
  • What makes a good explanation for why something was surfaced?
  • Should there be any way to give feedback on surfaced items (helpful/not helpful)?

Data:

  • How long to retain detailed processing logs?
  • What’s the right batch size for imports to balance speed vs API costs?