Melchior: Technical Documentation
Usage
TODO Write me.
Backend Architecture
This thing is pretty simple right now: it's a backend that takes objects and shoves them into PostgreSQL. The backend is written in Python with Flask as a minimum viable prototype, soon to be rewritten in Clojure.
digraph {
client -> backend
backend -> postgres
ingester -> backend
}
Postgres was chosen as the data store for its combination of interesting data types, reliability, and full-text search APIs.
The Data Model is … sort of silly. There's a raw_entities table with minimal denormalized columns, and a jsonb column with the raw data in it. I'm trying to be careful about putting indices on this table; instead, interfaces should be built on materialized views that depend on the type of data being pulled out.
Individual instances in the raw_entities table are referred to as Facts within documentation, code, and interfaces. Users are expected to publish the truth, and verifying this is left as an exercise to the reader.
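To make the shape of the data model a little more concrete, here is a minimal in-memory sketch in Python. It is not the actual schema: the column names (id, fact_type, created_at, data) and the photo_view helper are assumptions for illustration, a JSON string stands in for the jsonb column, and a plain function stands in for a per-type materialized view.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RawEntity:
    """One row of raw_entities; all column names here are hypothetical."""
    id: int
    fact_type: str        # denormalized so rows can be filtered cheaply
    created_at: datetime  # denormalized so rows can be ordered cheaply
    data: str             # raw JSON text, standing in for the jsonb column


def photo_view(rows):
    """Project photo facts into a flat shape, like a per-type materialized view."""
    view = []
    for row in rows:
        if row.fact_type != "photo":
            continue
        doc = json.loads(row.data)
        view.append({"id": row.id, "path": doc.get("path"), "camera": doc.get("camera")})
    return view


now = datetime(2020, 1, 1, tzinfo=timezone.utc)
rows = [
    RawEntity(1, "photo", now, json.dumps({"path": "a.jpg", "camera": "X100"})),
    RawEntity(2, "statement", now, json.dumps({"text": "hello"})),
]
photos = photo_view(rows)
```

The point of the split is that the raw table stays generic, while anything that needs typed columns or indices gets them on the derived view instead.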
Fact Types
Facts can be roughly defined as:
- time, date and location in which an Event takes place
- location of a digital artifact along with metadata to identify the artifact
- location of a physical artifact along with metadata surrounding the artifact
object is a bit obtuse, but is a physical "thing".
event is a time period, two datetimes with metadata attached to them.
location is probably too abstract; it feels like an attribute attached to other facts, but there might be value in presenting it directly.
statement is a sentence or paragraph of either written text, recorded voice, or thoughts; linked statements form a conversation or train of thought depending on the source.
page is either an OCR'd document, or the text of an HTML page.
photo is, well, a photo. A jpeg or png with metadata attached.
file is any other file, with whatever metadata can be extracted from it.
person is self-explanatory. Basically looking at a vCard and arbitrary key-values otherwise.
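As a rough illustration of the fact types listed above, the sketch below names them in a Python enum and models an event as two datetimes with metadata. The FactType and Event classes and the to_fact output shape are assumptions made for this example, not the project's actual code.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class FactType(Enum):
    """The fact types named in this document."""
    OBJECT = "object"
    EVENT = "event"
    LOCATION = "location"
    STATEMENT = "statement"
    PAGE = "page"
    PHOTO = "photo"
    FILE = "file"
    PERSON = "person"


@dataclass
class Event:
    """A time period: two datetimes with metadata attached (hypothetical shape)."""
    start: datetime
    end: datetime
    metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject periods that end before they start.
        if self.end < self.start:
            raise ValueError("event ends before it starts")

    def to_fact(self):
        """Serialize to a dict suitable for storing as raw JSON."""
        return {
            "type": FactType.EVENT.value,
            "data": {
                "start": self.start.isoformat(),
                "end": self.end.isoformat(),
                **self.metadata,
            },
        }


run = Event(datetime(2020, 1, 1, 9, 0), datetime(2020, 1, 1, 9, 45), {"title": "morning run"})
fact = run.to_fact()
```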
Frontends
Ingesters
Ingesters could come in many forms. The simplest is probably memex.ingesters.simple, which we can use as an example for future designs.
Current ingesters:
- Simple URL importer
- Qutebrowser web history
Others I want to build:
- RSS feeds
- Custom CSV/Line importers with a simple DSL:
  - Process GPS logs from phone's Tasker module or custom application
- Walk filesystem, pull metadata from the files
  - MP3 tags
  - EXIF tags
- Index Maildir files
- Chat Logs
- Events
  - Org-mode
  - `.ics` files
- Contacts
- Tweets
- Other web browser histories
Ingesters should be designed to be idempotent, and should be able to run on a repeating schedule from something like cron or systemd timers.
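One way to get idempotency is to derive a stable key from each fact's content and skip anything already stored. The sketch below assumes that approach; fact_key, ingest, and the dict used as a store are all hypothetical names, not the behaviour of memex.ingesters.simple.

```python
import hashlib
import json


def fact_key(fact):
    """Stable key: SHA-256 of the fact's canonical (sorted-key) JSON form."""
    canonical = json.dumps(fact, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ingest(store, facts):
    """Insert unseen facts into store (a dict); return how many were new."""
    added = 0
    for fact in facts:
        key = fact_key(fact)
        if key not in store:
            store[key] = fact
            added += 1
    return added


history = [
    {"type": "page", "url": "https://example.org/", "title": "Example"},
    {"type": "page", "url": "https://example.org/about", "title": "About"},
]
store = {}
first_run = ingest(store, history)   # both facts are new
second_run = ingest(store, history)  # same input again, nothing is added
```

Against PostgreSQL the same effect could probably be had with a unique constraint on the key column and INSERT ... ON CONFLICT DO NOTHING, so that a timer firing twice over the same input leaves the table unchanged.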
Of course, users will also be able to use a read-write query interface to input facts themselves, turning this into a general-purpose information storage, search and retrieval system.
Query Frontends
For now, I'm only going to build/support three frontends:
- Command line search tool
- Simple web-frontend that could be wrapped in Apache Cordova
- Read-only publishing frontend
The read-only publishing frontend makes Facts which the user has marked with certain configurable tags available as a read-only interface, designed to publish subsets of Photos, GPS logs (run/bike times, "check-ins"), Documents (blog posts, short-form updates), and entire strings of Facts (users could publish a long-form research post, along with snapshots of the resources they collected in writing it), as HTML, RSS, etc.
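The tag-based selection could be as small as a set intersection. This is a sketch only: it assumes each Fact carries a "tags" list, and the publishable helper and the tag names are made up for the example.

```python
def publishable(facts, public_tags):
    """Return the facts that carry at least one of the configured public tags."""
    public = set(public_tags)
    return [fact for fact in facts if public & set(fact.get("tags", []))]


facts = [
    {"id": 1, "type": "photo", "tags": ["public", "holiday"]},
    {"id": 2, "type": "statement", "tags": ["private"]},
    {"id": 3, "type": "page", "tags": ["blog"]},
    {"id": 4, "type": "file"},  # untagged facts are never published
]
visible = publishable(facts, ["public", "blog"])
visible_ids = [fact["id"] for fact in visible]
```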
Long-term I have this crazy stupid vision for a Matrix bot which will supersede these, and make them dumb interfaces using Matrix as an RPC to a central botserver. The bot would support e2e encryption, meaning that new devices couldn't see the query history of old devices, and it would require a verification stage for new devices. Plain-English queries would be deconstructed into a search dialect and the results could be sent in-line, with the raw document attached as invisible JSON metadata, allowing rich clients to be built.
Even crazier would be allowing other Memex instances to query yours, building a decentralized network of Memex instances. Obviously a security model will have to be put into place for this to happen, but it'd be pretty nifty.
Streaming Frontends
Streaming frontends work by "tailing" either the raw_entities table or one of the materialized views to "forward" those facts to other systems. Use cases include syndicating short-form update facts to Twitter, Facebook, etc., uploading photos to Flickr, and doing other "post-processing" on them. This is conceptual, and the interface for this has yet to be defined.
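Since the interface is undefined, the following is only one possible shape for "tailing": remember the highest id already forwarded and hand anything newer to a sink. The tail and forward functions, the integer id cursor, and the list standing in for the table are all assumptions.

```python
def tail(rows, cursor):
    """Return rows with an id above cursor, in id order, plus the new cursor."""
    fresh = sorted((row for row in rows if row["id"] > cursor), key=lambda row: row["id"])
    new_cursor = fresh[-1]["id"] if fresh else cursor
    return fresh, new_cursor


def forward(rows, cursor, sink):
    """Send every unseen row to sink (any callable) and return the new cursor."""
    fresh, cursor = tail(rows, cursor)
    for row in fresh:
        sink(row)
    return cursor


table = [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]
sent = []

cursor = forward(table, 0, sent.append)       # forwards ids 1 and 2
table.append({"id": 3, "text": "third"})
cursor = forward(table, cursor, sent.append)  # forwards only id 3
cursor = forward(table, cursor, sent.append)  # nothing new, nothing sent
```

Persisting the cursor between runs would let a streaming frontend be restarted without re-sending facts it has already forwarded.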
Hacking
There's a test suite; run make test with a PostgreSQL instance running. If you're adding code or a module, please include tests.
License
Memex is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Memex is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with Memex. If not, see <http://www.gnu.org/licenses/>.