X-Git-Url: https://zdv2.bktei.com/gitweb/EVA-2020-02.git/blobdiff_plain/3df184eb452e1c7034697db5435d49c812f8b40b..791bffac5ada00fc38302eb71f07a40269801b34:/exec/bkgpslog-plan.org?ds=sidebyside

diff --git a/exec/bkgpslog-plan.org b/exec/bkgpslog-plan.org
index da23704..cc5c839 100644
--- a/exec/bkgpslog-plan.org
+++ b/exec/bkgpslog-plan.org
@@ -1,3 +1,6 @@
+2020-07-12T21:16Z; bktei> Note: This file is now retired since ~bklog~
+has replaced ~bkgpslog~.
+
 * bkgpslog task list
 ** DONE Add job control for short buffer length
    CLOSED: [2020-07-02 Thu 16:04]
@@ -27,11 +30,241 @@ when a new session is started.
 is checked and its status as a valid tar file is validated. This was
 done using a new function ~checkMakeTar~.
 ** DONE Add VERSION if output tar deleted between writes
+   CLOSED: [2020-07-02 Thu 20:22]
 2020-07-02T20:21Z; bktei> Added bkgpslog-specified function
 magicWriteVersion() to be called whenever a new time-stamped ~VERSION~
 file needs to be generated and appended to the output tar file
 ~PATHOUT_TAR~.
+** DONE Rewrite buffer loop to reduce lag between gpspipe runs
+
+   CLOSED: [2020-07-03 Fri 20:57]
+2020-07-03T17:10Z; bktei> As is, there is still a 5-6 second lag
+between when ~gpspipe~ times out at the end of a buffer round and when
+~gpspipe~ is called by the subsequent buffer round. I believe this can
+be reduced by moving variable manipulations inside the
+asynchronously-executed magicWriteBuffer() function. Ideally, the
+while loop should look like:
+
+#+BEGIN_EXAMPLE
+while (( SECONDS < SCRIPT_TTL )); do
+  gpspipe -r > "$DIR_TMP"/buffer.nmea
+  writeBuffer &
+done
+#+END_EXAMPLE
+2020-07-03T20:56Z; bktei> I simplified it further to something like
+this:
+#+BEGIN_EXAMPLE
+while (( SECONDS < SCRIPT_TTL )); do
+  writeBuffer &
+  sleep "$BUFFER_TTL"  # one buffer length (e.g. 10 s), not the script TTL
+done
+#+END_EXAMPLE
+
+Raspberry Pi Zero W shows approximately 71ms of drift per buffer round
+with 10s buffer.
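As a rough illustration of the simplified loop above, here is a minimal runnable Bash sketch. It is not the actual ~bkgpslog~ code: the ~BUFFER_TTL~ name, the demo values, and the stub body of ~writeBuffer~ are all assumptions made so the loop can be run on its own. The point it shows is that the main loop only launches the background job and sleeps, so any slow work inside ~writeBuffer~ does not delay the start of the next round.

```shell
#!/usr/bin/env bash
# Sketch only: BUFFER_TTL, the demo values, and the stub writeBuffer body
# are assumptions, not taken from the bkgpslog source.
SCRIPT_TTL=2     # total script lifetime in seconds (demo value)
BUFFER_TTL=1     # length of one buffer round in seconds (demo value)
rounds=0

writeBuffer() {
    # Stand-in for the real work: capture, compress, encrypt, append to tar.
    sleep 0.2
}

# SECONDS is a Bash built-in counting whole seconds since the shell started.
while (( SECONDS < SCRIPT_TTL )); do
    writeBuffer &            # run this round's work in the background
    sleep "$BUFFER_TTL"      # pace the loop by buffer length, not script TTL
    rounds=$(( rounds + 1 ))
done
wait                         # let the final background round finish
echo "rounds=$rounds"
```

Because ~SECONDS~ has whole-second resolution, the number of rounds completed in a short demo run can vary by one. Per-round drift of the kind measured above comes from the loop's own overhead on top of each ~sleep~.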
+** DONE Feature: Recipient watch folder
+   CLOSED: [2020-07-12 Sun 21:08]
+2020-07-03T21:28Z; bktei> This feature would scan the contents
+of a specified directory at the start of every buffer round in order
+to determine encryption (age) recipients. This would allow a device to
+dynamically encrypt location data in response to automated changes
+made by other tools. For example, if such a directory were
+synchronized via Syncthing and changes to such a directory were
+managed by a trusted remote server, then that server could respond to
+human requests to secure location data.
+
+Two specific privacy subfeatures come to mind:
+
+1. Parallel encryption: Given a set of ~n~ public keys, encrypt data
+   with a single ~age~ command with options causing all ~n~ pubkeys to
+   be recipients. In order to decrypt the data, any individual private
+   key could be used. No coordination between key owners would be
+   required to decrypt.
+
+2. Sequential encryption: Given a set of ~n~ public keys, encrypt data
+   with ~n~ sequential ~age~ commands all piped in series with each
+   ~age~ command utilizing only one of the ~n~ public keys. Decrypting
+   the data would require all ~n~ private keys, applied in reverse
+   order. Since coordination is required, it is less
+   convenient than parallel encryption.
+
+In either case, a directory would be useful for holding configuration
+files specifying which feature, or which combination of the two, to
+apply at the start of every buffer round.
+
+I don't yet know how to program the rules, although I think it'd be
+easier to simply add an option providing ~bkgpslog~ with a directory
+to watch. When examining the directory, check for a file with the
+appropriate file extension (ex: .pubkey) and then read the first line
+into the script's pubKey array.
+
+2020-07-12T21:08Z; bktei> ~-R~ watch directory option added in ~bkgpslog~ ver
+~0.4.0~.
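The directory scan and the two encryption modes described above could be sketched in Bash roughly as follows. This is a hedged illustration, not the ~-R~ implementation in ~bkgpslog~: the directory, the file names, the variable names (~dirRecipients~, ~pubKeys~, ~parallelCmd~, ~sequentialCmd~), and the placeholder key strings are all hypothetical. The only ~age~ behavior assumed is that ~-r~ names a recipient and may be repeated. The sketch only builds the command lines and does not invoke ~age~, so it runs without ~age~ installed.

```shell
#!/usr/bin/env bash
# Sketch only: paths, names, and key strings below are hypothetical
# placeholders (the keys are not valid age public keys).
dirRecipients="$(mktemp -d)"
printf 'age1examplealice\n' > "$dirRecipients/alice.pubkey"
printf 'age1examplebob\nignored second line\n' > "$dirRecipients/bob.pubkey"

# Scan the watch directory: first line of every *.pubkey file is a recipient.
pubKeys=()
for f in "$dirRecipients"/*.pubkey; do
    [ -f "$f" ] || continue          # the glob matched nothing
    firstLine=""
    IFS= read -r firstLine < "$f"
    if [ -n "$firstLine" ]; then
        pubKeys+=("$firstLine")
    fi
done

# Parallel encryption: a single age call with one -r flag per recipient.
parallelCmd=(age)
for k in "${pubKeys[@]}"; do
    parallelCmd+=(-r "$k")
done

# Sequential encryption: one age call per recipient, chained with pipes.
sequentialCmd=""
for k in "${pubKeys[@]}"; do
    sequentialCmd+="${sequentialCmd:+ | }age -r $k"
done

echo "${parallelCmd[*]}"
echo "$sequentialCmd"
rm -rf "$dirRecipients"
```

Rebuilding the array from scratch each round means a key file removed from the directory stops being a recipient on the next round, which is the behavior the watch folder is meant to provide.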
+
+** DONE Feature: Simplify option to reduce output size
+   CLOSED: [2020-07-12 Sun 21:15]
+
+~gpsbabel~ [[https://www.gpsbabel.org/htmldoc-development/filter_simplify.html][features]] a ~simplify~ option to trim data points from GPS
+data. There are several methods for prioritizing which points to keep
+and which to trim, although the following seems useful given some
+sample data I've recorded in a test run of ninfacyzga-01:
+
+#+BEGIN_EXAMPLE
+gpsbabel -i nmea -f all.nmea -x simplify,error=10,relative -o gpx \
+-F all-simp-rel-10.gpx
+#+END_EXAMPLE
+
+An error level of "10" with the "relative" option seems to retain all
+desirable features for GPS data while reducing the number of points
+along straightaways. File size is reduced by a factor of
+about 11. Noise from local stay-in-place drift isn't removed; a
+relative error of about 1000 is required to remove stay-in-place drift
+noise but this also trims all but 100m-size features of the recorded
+path. A relative error of 1000 reduces file size by a factor of
+about 450.
+
+#+BEGIN_EXAMPLE
+ 67M relerror-0.001.kml
+ 66M relerror-0.01.kml
+ 58M relerror-0.1.kml
+ 21M relerror-1.kml
+5.8M relerror-10.kml
+797K relerror-100.kml
+152K relerror-1000.kml
+#+END_EXAMPLE
+
+2020-07-12T21:13Z; bktei> Instead of programming data simplification
+in ~bkgpslog~, the data simplification step should be performed via
+~bklog~'s ~-p~ option which specifies a processing command string to
+be ~eval~'d before data is compressed, encrypted, and written to
+disk. In other words, handling the simplification of data beyond
+allowing for a general command string specified by ~-p~ is outside the
+scope of ~bkgpslog~ or its successor ~bklog~.
+
+** DONE Feature: Generalize bkgpslog to bklog function
+   CLOSED: [2020-07-12 Sun 21:11]
+2020-07-05T02:42Z; bktei> Transform ~bkgpslog~ into a modular
+component called ~bklog~ such that it processes a stdout stream of any
+external command, not just ~gpspipe -r~.
+This would permit reuse of
+the ~bkgpslog~ code for logging not just GPS data but things like
+pressure, temperature, system statistics, etc.
+2020-07-05T16:35Z; bktei>
+: bklog -r age1asdf -o log.tar # encrypt/compress stdin to log.tar
+: bklog -x -f log.tar -i age.key -O /tmp # extract and decrypt
+
+Making ~bklog~ follow the [[https://en.wikipedia.org/wiki/Unix_philosophy][Unix philosophy]] means that it shouldn't care
+what kind of text is fed to it.
+
+*** ~bklog~ Design vs. Unix Philosophy
+**** Pubkey dir watching
+The feature of periodically checking a directory for changes in the
+pubkeys it contains should be justified by its usefulness; if the
+complexity cannot be justified then the feature should be removed.
+**** Defaults vs options
+Many options can cause the tool to become complex in unjustifiable
+ways. Currently I am adding options because I want the ability to
+modify the script's behavior without having to modify the source code
+on the machine in which the code is running. I should consider
+removing features at some point and having the program force defaults
+on the user. For example, allowing the specification of a temporary
+directory, while useful for me, is probably not useful for most people
+who don't know or care about the difference between ~/tmp~ and
+~/dev/shm~.
+**** Script time to live (TTL)
+I initially implemented a script time-to-live feature because I was
+unsure of my ability to write a script that could run for long periods
+of time without causing a runaway usage of memory. I still think it's
+a good idea to offer a script TTL option to the user but I think the
+default should be to simply run forever.
+
+2020-07-12T21:11Z; bktei> ~bklog~ script created and tested as of
+commit ~aedd19f~.
+
+** DONE TODO: Evaluate ~rsyslog~ as stand-in for this work
+   CLOSED: [2020-07-12 Sun 21:09]
+2020-07-05T02:57Z; bktei> I searched for "debian iot logging" ("iot"
+as in "Internet of Things", the current buzzword for small low-power
+computers being used to provide microservices for owners in their own
+home) and came across several search results mentioning ~syslog~ and
+~rsyslog~.
+
+https://www.thissmarthouse.net/consolidating-iot-logs-into-mysql-using-rsyslog/
+https://rsyslog.readthedocs.io/en/latest/tutorials/tls.html
+https://serverfault.com/questions/20840/how-would-you-send-syslog-securely-over-the-public-internet
+https://www.rsyslog.com/
+
+My impression is that ~rsyslog~ is a complex software package designed
+to offer many features, some of which might satisfy my
+needs.
+
+However, as stated in the repository README, the objective of the
+~ninfacyzga-01~ project is "Observing facts of the new". This means
+that the goal is not only to record location data but any data that
+can be captured by a sensor. This puts the capture of the following
+environmental phenomena within the scope of this device:
+
+*** Sounds (microphone)
+*** Light (camera)
+*** Temperature (thermocouple)
+*** Air Pressure (barometer)
+*** Acceleration Vector (accelerometer / gyroscope)
+*** Magnetic Field Vector (magnetometer)
+
+This brings up the issue of respecting the privacy of others in shared
+spaces through which ~ninfacyzga-01~ may pass. ~ninfacyzga-01~
+should encrypt data it records according to rules set by its
+owner.
+
+One permissive rule could be that if ~ninfacyzga-01~ detects that a
+person (let's call her Alice) enters a room, it should add Alice's
+encryption public key to the list of recipients against which it
+encrypts data without Alice having to know how ~ninfacyzga-01~ is
+programmed (she might have a ~calkuptcana~ agent on her person that
+broadcasts her privacy preferences).
+Meanwhile, ~ninfacyzga-01~ may
+publish its observations to a repository that Alice and other members
+of the shared communal space have access to (ex: a read-only shared
+directory on a local network WiFi). Alice could download all the files
+in the shared repository but she would only be able to decrypt files
+generated when she was physically near enough to ~ninfacyzga-01~ for
+it to detect that her presence was within some spatial boundary.
+
+A more restrictive rule could resemble the permissive rule in that
+~ninfacyzga-01~ uses Alice's encryption public key only when she is
+physically nearby, except that it encrypts logged files against
+public keys in a sequential manner. This would mean that all people
+who were near ~ninfacyzga-01~ would have to pass around each log file
+to each other so that they could decrypt the content.
+
+That said, according to [[https://www.rsyslog.com/doc/master/tutorials/database.html][this ~rsyslog~ page]], ~rsyslog~ is more a data
+wrangling system for collecting data from disparate sources of
+different types and outputting data to text files on disk than a
+system committed to the server-client model of database storage. So, I
+think converting ~bkgpslog~ into a ~bklog~ script that appends
+encrypted and compressed data to a tar file for later extraction
+(possibly the same script with future features) would be best.
+
+2020-07-12T21:10Z; bktei> rsyslog is outside the scope of what
+~bkgpslog~ does (record location observations). A different tool
+should be used to retrieve and synchronize data. The dumb storage
+method of "tar files in a syncthing folder" works for now.
+** TODO: Place persistent recip. updates in asynchronous coproc
+2020-07-06T19:37Z; bktei> To keep the recipient list current, the
+magicParseRecipientDir() function needs to be run each buffer period
+to scan for changes in the recipient list.
+However, such a
+scan takes time; if the magicGatherWriteBuffer() function must pause
+until magicParseRecipientDir() completes, then a significant pause
+between buffer sessions may occur, causing detectable gaps in location
+data between buffer rounds.
+
+I looked for ways in which I might start magicParseRecipientDir()
+asynchronously immediately before running the data collection command
+and then collect its output at the start of the next buffer round. One
+way using the ~coproc~ Bash built-in is described [[https://stackoverflow.com/a/20018504/10850071][here]]. I'd have to
+make the asynchronous function output the recipient list to stdout
+which would then be ~read~ into the ~recPubKeysValid~ array in the
+main loop. For now, however, I'm leaving magicParseRecipientDir()
+as-is in the main loop and accepting the delay.
 * bkgpslog narrative
 ** Initialize environment
 *** Init variables