Universal Aggregator

The Universal Aggregator is a collection of tools designed to take a feed of data items and store them in a Maildir folder. It can be used to create a human-legible archive of messages, Twitter posts, RSS feeds, and other scraped data. It is a suite of Golang operator programs, plus some scrapers written in JavaScript which I do not use.

Usage

This is deprecated; in theory the [[id:arroyo/feed-cache][Arroyo Feed Cache Generator]] still works, but I don't use any of this any more.

Universal Aggregator is composed of a number of components, starting with the "grey goo spawner" ggs, a Golang program that runs commands at intervals and handles concurrency and work-sharing reasonably. In a really remarkable set of choices, ggs embeds a ggsrc file inside of a CONFIG_WRAPPER and then runs it like it's a shell script.

Since the ggsrc file is, for all intents and purposes, a shell script, I can use this to my advantage to provide multiple paths for bringing data in to the system, for example by running the script with an alternate rss command defined. I want to use this fact to also provide multiple paths for bringing data out of the system. By defining functions differently depending on whether they are run within ggs or within Emacs, a system for verifying feeds and inspecting their state can be built within org-mode: a small piece of Hypermedia which presents the state of the feed alongside the feed itself.
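Since ggsrc is effectively shell, that per-environment redefinition could look something like the sketch below. Everything here is illustrative, not the real ggs interface: UA_CONTEXT and fetch-rss-to-maildir are invented names standing in for whatever mechanism actually selects the environment.

#+begin_src shell
#!/usr/bin/env sh
# Hypothetical sketch: a ggsrc-style fragment whose `rss` helper behaves
# differently depending on who is driving it. UA_CONTEXT and
# fetch-rss-to-maildir are invented names for illustration.

rss() {
  if [ "${UA_CONTEXT:-ggs}" = "emacs" ]; then
    # Emacs side: report the feed's state so org-mode can render it.
    printf 'feed: %s -> %s [%s]\n' "$1" "$2" "$3"
  else
    # ggs side: actually pull the feed into the Maildir.
    # (Shown as an echo so the sketch runs without the real fetcher.)
    echo "would run: fetch-rss-to-maildir --name '$1' --url '$2' --folder '$3'"
  fi
}

UA_CONTEXT=emacs
result=$(rss "Kotaku" "https://kotaku.com/rss" "Media")
echo "$result"
#+end_src

The same feed definitions then serve both consumers: ggs fetches them on a schedule, while Emacs evaluates the identical lines to build an inspection view.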

The files are generated and written to my homeserver with the Arroyo Feed Cache.

#+begin_src shell
# nix-shell -p ansible --run 'ansible -i inventory -m assemble -a "remote_src=no src=ggs dest=/home/rrix/Maildir/ggsrc" fontkeming.fail'
nix-shell -p ansible --run 'ansible -i inventory --become -m systemd -a "name=ua state=restarted" fontkeming.fail'
#+end_src

The files come from places like:

Over time, I think I'll slowly factor those snippets in to pages dedicated to their sources; many of them exist already, the work just needs to be done. I need to spend more time thinking about how to make this more manageable. I want to build some prototypes for querying the org-roam database for things like the CCE loader table; maybe this could be shared with the CCE to make its loaders better.

And some overflow:

ggs/70-overflow.ggs

#+begin_example
NWS Seattle Area Forecast Discussion https://afd.fontkeming.fail/AFDSEW.xml News
King County Metro Alerts https://kcmetro-rss.buttslol.net/D/40 News
Lectronice's Tokipona Blog https://tokipona.lectronice.com/atom/d Media
Polygon - All https://www.polygon.com/rss/index.xml Media
VICE US - undefined US https://waypoint.vice.com/en_us/rss Media
Kotaku https://kotaku.com/rss Media
Privacy Enhancing Tech Symposium Papers https://content.sciendo.com/journalnewarticlerss/journals/popets/popets-overview.xml Tech
Privacy Enhancing Tech Symposium https://www.youtube.com/feeds/videos.xml?channel_id=UC-m6oi7a-8LffTk64J3tq-w Videos
My SongKick feeds http://acousti.co/feeds/upcoming/songkick-670 Art
#+end_example
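Each overflow line appears to be a free-text feed name, then a URL, then a destination folder. Assuming that layout holds, a throwaway parser might look like this; parse_feed_line is my own illustrative helper, not part of ggs or ua.

#+begin_src shell
#!/usr/bin/env sh
# Illustrative helper, assuming each line is: name, URL, category.
parse_feed_line() {
  line=$1
  # The URL is the first whitespace-free token starting with http(s).
  url=$(printf '%s\n' "$line" | grep -o 'https\?://[^ ]*')
  # The category is the last whitespace-separated field.
  category=${line##* }
  # The name is everything before the URL.
  name=${line%% http*}
  printf '%s|%s|%s\n' "$name" "$url" "$category"
}

parsed=$(parse_feed_line "Kotaku https://kotaku.com/rss Media")
echo "$parsed"
#+end_src

Splitting on the URL rather than on whitespace matters because the names themselves contain spaces.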