

Melchior: Technical Documentation

Usage

TODO Write me.

Backend Architecture

This thing is pretty simple right now: it's a backend that takes objects and shoves them into PostgreSQL. The backend is written in Python with Flask as a minimum viable prototype, and is soon to be rewritten in Clojure.

digraph {
      client -> backend
      backend -> postgres
      ingester -> backend
}


Postgres was chosen as the data store because it combines interesting data types, reliability, and full-text search APIs.

The Data Model is … sort of silly. There's a raw_entities table with a minimal set of denormalized columns and a jsonb column holding the raw data. I'm trying to be careful about putting indices on this table; instead, interfaces should be built on materialized views that depend on the type of data being pulled out.

Individual rows in the raw_entities table are referred to as Facts in documentation, code, and interfaces. Users are expected to publish the truth; verifying it is left as an exercise to the reader.
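A minimal sketch of what this could look like, assuming a raw_entities layout with id, fact_type, created_at, and data columns (these column names are guesses, not the actual schema). The hypothetical helper materialized_view_ddl generates a per-type PostgreSQL materialized view that projects selected jsonb keys into ordinary columns, which is where indices would go instead of on raw_entities itself.

```python
# Hypothetical sketch: these column names are assumptions, not the real schema.
RAW_ENTITIES_DDL = """
CREATE TABLE IF NOT EXISTS raw_entities (
    id         bigserial PRIMARY KEY,
    fact_type  text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    data       jsonb NOT NULL
);
""".strip()


def materialized_view_ddl(view_name: str, fact_type: str, fields: list[str]) -> str:
    """Build a CREATE MATERIALIZED VIEW statement that projects selected
    jsonb keys of one fact type into plain text columns."""
    if not fields:
        raise ValueError("at least one field is required")
    for name in (view_name, fact_type, *fields):
        # Names are interpolated straight into SQL, so only allow simple
        # identifiers (letters, digits, underscores).
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unsafe identifier: {name!r}")
    # data->>'key' extracts a jsonb value as text.
    columns = ", ".join(f"data->>'{f}' AS {f}" for f in fields)
    return (
        f"CREATE MATERIALIZED VIEW {view_name} AS "
        f"SELECT id, created_at, {columns} "
        f"FROM raw_entities WHERE fact_type = '{fact_type}';"
    )
```

For example, materialized_view_ddl("photos", "photo", ["path", "taken_at"]) yields a view over photo facts only. Such a view would need a REFRESH MATERIALIZED VIEW photos; after new facts arrive.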

Fact Types

Facts can be roughly defined as:

  • time, date and location in which an Event takes place
  • location of a digital artifact along with metadata to identify the artifact
  • location of a physical artifact along with metadata surrounding the artifact
  • object is a bit obtuse, but is a physical "thing"
  • event is a time period, two datetimes with metadata attached to them.
  • location is probably too abstract; it feels like an attribute attached to other facts, but there might be value in presenting it directly.
  • statement is a sentence or paragraph of either written text, recorded voice, or thoughts; linked statements form a conversation or train of thought depending on the source.
  • page is either an OCR'd document or the text of an HTML page.
  • photo is, well, a photo: a JPEG or PNG with metadata attached.
  • file is any other file, with whatever metadata can be extracted from it.
  • person is self-explanatory: basically a vCard, with arbitrary key-value pairs for anything else.
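One way to pin these types down in the Python backend, as a sketch only: the enum members follow the list above, but FactType, Fact, and make_event are hypothetical names, and the only rule enforced is the one stated for events (two datetimes, with the end not before the start). to_row() produces a (type, JSON text) pair that could be inserted into raw_entities.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class FactType(Enum):
    # Member names follow the list above; the string values are assumptions.
    OBJECT = "object"
    EVENT = "event"
    LOCATION = "location"
    STATEMENT = "statement"
    PAGE = "page"
    PHOTO = "photo"
    FILE = "file"
    PERSON = "person"


@dataclass
class Fact:
    fact_type: FactType
    data: dict = field(default_factory=dict)

    def to_row(self) -> tuple[str, str]:
        """Return (fact_type, json_text) for an INSERT into raw_entities."""
        return self.fact_type.value, json.dumps(self.data, sort_keys=True)


def make_event(start: datetime, end: datetime, **metadata) -> Fact:
    """An event is a time period: two datetimes plus arbitrary metadata."""
    if end < start:
        raise ValueError("event ends before it starts")
    # start/end come last so metadata cannot overwrite them.
    return Fact(FactType.EVENT, {**metadata, "start": start.isoformat(), "end": end.isoformat()})
```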

Frontends

Ingesters

Ingesters could come in many forms. The simplest is probably memex.ingesters.simple, which can serve as an example for future designs.

Current ingesters:

  • Simple URL importer
  • Qutebrowser web history
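The interface of memex.ingesters.simple isn't described here, so the following is only a hypothetical shape for a simple URL importer: a generator (ingest_urls, an illustrative name) that reads one URL per line and yields page-fact payloads for the backend. Fetching and extracting the page text is deliberately left out.

```python
from typing import Iterable, Iterator
from urllib.parse import urlsplit


def ingest_urls(lines: Iterable[str]) -> Iterator[dict]:
    """Hypothetical URL importer: turn one-URL-per-line input into
    'page' fact payloads. Page fetching is not included."""
    for line in lines:
        url = line.strip()
        if not url or url.startswith("#"):
            continue  # skip blank lines and comments
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # ignore anything that isn't an absolute web URL
        yield {"fact_type": "page", "data": {"url": url, "host": parts.hostname}}
```

Keeping ingesters as plain generators means the same code can feed the backend over HTTP, write to a file, or be exercised in tests without a database.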

Others I want to build:

  • RSS feeds
  • Custom CSV/Line importers with a simple DSL:

    • Process GPS logs from phone's Tasker module or custom application
  • Walk the filesystem and pull metadata from the files

    • MP3 tags
    • EXIF tags
    • Index Maildir files
  • Chat Logs
  • Events

    • Org-mode
    • `.ics` files
  • Contacts
  • Tweets
  • Other web browser histories
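The line-importer DSL hasn't been designed yet. As a rough sketch, a spec could be a comma-separated list of column:type pairs such as time:iso,lat:float,lon:float. The spec syntax, the type names, the helper names, and the assumption that a GPS log is plain CSV are all hypothetical.

```python
import csv
import io
from datetime import datetime
from typing import Callable, Iterator

# Hypothetical mini-DSL: one "column:type" pair per CSV column.
CONVERTERS: dict[str, Callable[[str], object]] = {
    "str": str,
    "int": int,
    "float": float,
    # Normalize ISO 8601 timestamps but keep them as strings for JSON.
    "iso": lambda s: datetime.fromisoformat(s).isoformat(),
}


def parse_spec(spec: str) -> list[tuple[str, Callable[[str], object]]]:
    columns = []
    for item in spec.split(","):
        name, _, type_name = item.strip().partition(":")
        if not name or type_name not in CONVERTERS:
            raise ValueError(f"bad column spec {item!r}")
        columns.append((name, CONVERTERS[type_name]))
    return columns


def import_csv(text: str, spec: str, fact_type: str) -> Iterator[dict]:
    columns = parse_spec(spec)
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(row)}: {row}")
        data = {name: convert(value) for (name, convert), value in zip(columns, row)}
        yield {"fact_type": fact_type, "data": data}
```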

Ingesters should be idempotent, so that they can be run on a repeating schedule by something like cron or systemd timers.
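One common way to get idempotence, sketched here under the assumption that each fact can be keyed by a hash of its content: re-running an ingester over the same source produces the same keys, so duplicates are skipped. fact_key and InMemoryStore are hypothetical; the store stands in for raw_entities, where the same effect could come from a unique key column and PostgreSQL's INSERT ... ON CONFLICT DO NOTHING.

```python
import hashlib
import json


def fact_key(fact_type: str, data: dict) -> str:
    """Derive a stable ID from a fact's content (key order doesn't matter)."""
    canonical = json.dumps({"type": fact_type, "data": data}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryStore:
    """Stand-in for raw_entities, keyed by content hash."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def insert_if_absent(self, fact_type: str, data: dict) -> bool:
        """Insert the fact if unseen; return True only when a row was added."""
        key = fact_key(fact_type, data)
        if key in self.rows:
            return False
        self.rows[key] = {"fact_type": fact_type, "data": data}
        return True
```

Content hashing collapses genuinely repeated facts (visiting the same URL twice, say), so an ingester for browser history would need to include the visit timestamp in the fact data.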

Of course, users will also be able to use a read-write query interface to input facts themselves, turning this into a general-purpose information storage, search, and retrieval system.

Query Frontends

For now, I'm only going to build/support three frontends:

  • Command line search tool
  • Simple web frontend that could be wrapped in Apache Cordova
  • Read-only publishing frontend

The read-only publishing frontend exposes Facts that the user has marked with certain configurable tags through a read-only interface. It is designed to publish subsets of Photos, GPS logs (run/bike times, "check-ins"), Documents (blog posts, short-form updates), and entire strings of Facts (for example, a long-form research post along with snapshots of the resources collected while writing it) as HTML, RSS, etc.
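A sketch of the tag-filtering and RSS side of this frontend, assuming facts carry a tags list and that titles and URLs live in the jsonb data. PUBLISH_TAGS, publishable, and render_rss are hypothetical names; the feed is a minimal RSS 2.0 document built with the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical: which tags mark a Fact as publishable is user configuration.
PUBLISH_TAGS = {"public", "blog"}


def publishable(facts: list[dict], tags: set[str] = PUBLISH_TAGS) -> list[dict]:
    """Keep only facts carrying at least one of the configured tags."""
    return [f for f in facts if tags & set(f.get("tags", []))]


def render_rss(facts: list[dict], title: str, link: str, tags: set[str] = PUBLISH_TAGS) -> str:
    """Render the publishable facts as a minimal RSS 2.0 feed."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link, and description are the required channel elements.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for fact in publishable(facts, tags):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = fact["data"].get("title", "(untitled)")
        if "url" in fact["data"]:
            ET.SubElement(item, "link").text = fact["data"]["url"]
    return ET.tostring(rss, encoding="unicode")
```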

Long-term, I have this crazy stupid vision for a Matrix bot which will supersede these, turning them into dumb interfaces that use Matrix as an RPC channel to a central botserver. The bot would support end-to-end encryption, meaning that new devices couldn't see the query history of old devices, and it would require a verification step for new devices. Plain-English queries would be deconstructed into a search dialect, and the results could be sent inline, with the raw document attached as invisible JSON metadata, allowing rich clients to be built.

Even crazier would be allowing other Memex instances to query yours, building a decentralized network of Memex instances. Obviously a security model would have to be put into place for this to happen, but it'd be pretty nifty.

Streaming Frontends

Streaming frontends work by "tailing" either the raw_entities table or one of the materialized views to "forward" those facts to other systems. Use cases include syndicating short-form update facts to Twitter, Facebook, etc., uploading photos to Flickr, and doing other "post-processing" on them. This is conceptual; the interface has yet to be defined.
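Since this interface is still undefined, the following is only one possible approach: poll for rows whose id is above a stored high-water mark. fetch_since and tail are hypothetical names; fetch_since is assumed to return rows with id greater than last_id, in ascending id order, at most limit at a time.

```python
from typing import Callable, Iterator

# Hypothetical: could be backed by something like
#   SELECT id, fact_type, data FROM raw_entities
#   WHERE id > %s ORDER BY id LIMIT %s
FetchFn = Callable[[int, int], list[dict]]


def tail(fetch_since: FetchFn, last_id: int = 0, batch_size: int = 100) -> Iterator[dict]:
    """Yield every row newer than last_id, batch by batch, until caught up.

    A long-running forwarder would persist the id of the last row it
    handled and call this again on a timer or a LISTEN/NOTIFY wakeup."""
    while True:
        rows = fetch_since(last_id, batch_size)
        if not rows:
            return
        for row in rows:
            yield row
            last_id = row["id"]
```

A caveat with this design: ids drawn from a sequence can become visible out of order when several transactions insert concurrently, so a real forwarder would need to tolerate or re-check gaps.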

Hacking

There's a test suite; run `make test` with PostgreSQL running. If you're adding code or a module, please include tests.

License

Memex is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Memex is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with Memex. If not, see <http://www.gnu.org/licenses/>.