| 1 | 2020-07-12T21:16Z; bktei> Note: This file is now retired since ~bklog~ |
| 2 | has replaced ~bkgpslog~. |
| 3 | |
| 4 | * bkgpslog task list |
| 5 | ** DONE Add job control for short buffer length |
| 6 | CLOSED: [2020-07-02 Thu 16:04] |
| 7 | 2020-07-02T14:56Z; bktei> File write operations were bundled into a |
| 8 | magicWriteBuffer function that is called then detached from the script |
| 9 | shell (job control), but the detached job is not tracked by the main |
| 10 | script. A problem may arise if two instances of magicWriteBuffer |
| 11 | attempt to write to the same tar simultaneously. Two instances of |
| 12 | magicWriteBuffer may exist if the buffer length is low (ex: 1 second); |
| 13 | the default buffer length of 60 seconds should reduce the probability |
| 14 | of a collision, but it should be possible for the main script to record |
| 15 | the process ID of a magicWriteBuffer() job as soon as it detaches (via ~$!~, |
| 16 | as described [[https://bashitout.com/2013/05/18/Ampersands-on-the-command-line.html][here]]) and then check that the process is still alive. |
| 17 | 2020-07-02T15:23Z; bktei> I found that the Bash ~wait~ built-in can be |
| 18 | used to delay processing until a specified job completes. When called |
| 19 | without arguments, ~wait~ pauses script execution until all backgrounded |
| 20 | processes complete. |
| 21 | 2020-07-02T16:03Z; bktei> Added ~wait~. |
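The job-control idea above can be sketched as follows. This is a minimal illustration, not the code in ~bkgpslog~: ~writeBufferStub~, ~outFile~, and ~pidWriter~ are hypothetical names, and the stub only appends a line to a temporary file instead of writing to a tar.

```bash
#!/usr/bin/env bash
outFile="$(mktemp)"

# Hypothetical stand-in for magicWriteBuffer(): appends one line after a delay.
writeBufferStub() {
    sleep 0.2
    printf 'chunk %s\n' "$1" >> "$outFile"
}

pidWriter=""
for round in 1 2 3; do
    # If the previous writer is still alive, wait for that specific job.
    if [[ -n "$pidWriter" ]] && kill -0 "$pidWriter" 2>/dev/null; then
        wait "$pidWriter"
    fi
    writeBufferStub "$round" &   # detach the writer from the main loop
    pidWriter=$!                 # $! holds the PID of the most recent background job
done

wait   # no arguments: block until every remaining background job completes
writtenLines="$(cat "$outFile")"
```

Because each round waits on the previous writer's PID before starting a new one, two writers never touch the output at the same time, even when the buffer length is shorter than the write time.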
| 22 | ** DONE Rewrite tar initialization function |
| 23 | CLOSED: [2020-07-02 Thu 17:23] |
| 24 | 2020-07-02T17:23Z; bktei> Simplify tar initialization function so |
| 25 | VERSION file is used to test appendability of tar as well as to mark |
| 26 | when a new session is started. |
| 27 | ** DONE Consolidate tar checking/creation into function |
| 28 | CLOSED: [2020-07-02 Thu 18:33] |
| 29 | 2020-07-02T18:33Z; bktei> Simplify how the output tar file's existence |
| 30 | is checked and its status as a valid tar file is validated. This was |
| 31 | done using a new function ~checkMakeTar~. |
| 32 | ** DONE Add VERSION if output tar deleted between writes |
| 33 | |
| 34 | CLOSED: [2020-07-02 Thu 20:22] |
| 35 | 2020-07-02T20:21Z; bktei> Added bkgpslog-specific function |
| 36 | magicWriteVersion() to be called whenever a new time-stamped ~VERSION~ |
| 37 | file needs to be generated and appended to the output tar file |
| 38 | ~PATHOUT_TAR~. |
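A rough sketch of how a VERSION append can double as the appendability check and recover from a deleted tar is shown below. It assumes GNU ~tar~; ~writeVersionStub~, the version string, and the temporary paths are hypothetical and not taken from the actual script.

```bash
#!/usr/bin/env bash
DIR_TMP="$(mktemp -d)"
DIR_OUT="$(mktemp -d)"
PATHOUT_TAR="$DIR_OUT/sketch.tar"

# Hypothetical stand-in for magicWriteVersion().
writeVersionStub() {
    printf 'sketch 0.0.0; session started %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        > "$DIR_TMP/VERSION"
    # The append doubles as the appendability test. If it fails (for
    # example, the tar is not a valid archive), start a fresh archive
    # that contains only the new VERSION file.
    if ! tar --append --directory="$DIR_TMP" --file="$PATHOUT_TAR" VERSION 2>/dev/null; then
        tar --create --directory="$DIR_TMP" --file="$PATHOUT_TAR" VERSION
    fi
}

writeVersionStub                      # first write of the session
rm -f "$PATHOUT_TAR"                  # simulate deletion between writes
writeVersionStub                      # next write restores VERSION
versionCount="$(tar --list --file="$PATHOUT_TAR" | grep -c '^VERSION$')"
```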
| 39 | ** DONE Rewrite buffer loop to reduce lag between gpspipe runs |
| 40 | |
| 41 | CLOSED: [2020-07-03 Fri 20:57] |
| 42 | 2020-07-03T17:10Z; bktei> As is, there is still a 5-6 second lag |
| 43 | between when ~gpspipe~ times out at the end of a buffer round and when |
| 44 | ~gpspipe~ is called by the subsequent buffer round. I believe this can |
| 45 | be reduced by moving variable manipulations inside the |
| 46 | asynchronously-executed magicWriteBuffer() function. Ideally, the |
| 47 | while loop should look like: |
| 48 | |
| 49 | #+BEGIN_EXAMPLE |
| 50 | while (( SECONDS < SCRIPT_TTL )); do |
| 51 |     gpspipe -r > "$DIR_TMP"/buffer.nmea |
| 52 |     writeBuffer & |
| 53 | done |
| 54 | #+END_EXAMPLE |
| 55 | 2020-07-03T20:56Z; bktei> I simplified it further to something like |
| 56 | this: |
| 57 | #+BEGIN_EXAMPLE |
| 58 | while (( SECONDS < SCRIPT_TTL )); do |
| 59 |     writeBuffer & |
| 60 |     sleep "$BUFFER_TTL"  # buffer length in seconds; variable name assumed |
| 61 | done |
| 62 | #+END_EXAMPLE |
| 63 | |
| 64 | On a Raspberry Pi Zero W, this loop shows approximately 71 ms of drift |
| 65 | per buffer round with a 10 s buffer. |
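A runnable sketch of this loop shape, under stated assumptions: ~gatherWriteStub~ is a hypothetical placeholder for the asynchronous gather-and-write function, ~BUFFER_TTL~ is an assumed name for the buffer length, and the small TTL values are chosen only so the demo finishes in a few seconds.

```bash
#!/usr/bin/env bash
SCRIPT_TTL=3   # seconds the whole script may run (small value for the demo)
BUFFER_TTL=1   # seconds per buffer round (assumed variable name)
roundLog="$(mktemp)"

# Hypothetical stand-in for the asynchronous gather-and-write function.
gatherWriteStub() {
    printf 'round started at %ss\n' "$SECONDS" >> "$roundLog"
}

SECONDS=0      # Bash counts elapsed seconds in SECONDS from this assignment
while (( SECONDS < SCRIPT_TTL )); do
    gatherWriteStub &      # runs in the background; the loop does not block on it
    sleep "$BUFFER_TTL"    # the only delay in the loop is the buffer length
done
wait
roundCount="$(grep -c '' "$roundLog")"
```

Since ~SECONDS~ has one-second resolution, the number of rounds in a short run can vary by one depending on when the script starts within the second.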
| 66 | ** DONE Feature: Recipient watch folder |
| 67 | CLOSED: [2020-07-12 Sun 21:08] |
| 68 | 2020-07-03T21:28Z; bktei> This feature would be to scan the contents |
| 69 | of a specified directory at the start of every buffer round in order |
| 70 | to determine encryption (age) recipients. This would allow a device to |
| 71 | dynamically encrypt location data in response to automated changes |
| 72 | made by other tools. For example, if such a directory were |
| 73 | synchronized via Syncthing and changes to such a directory were |
| 74 | managed by a trusted remote server, then that server could respond to |
| 75 | human requests to secure location data. |
| 76 | |
| 77 | Two specific privacy subfeatures come to mind: |
| 78 | |
| 79 | 1. Parallel encryption: Given a set of ~n~ public keys, encrypt data |
| 80 | with a single ~age~ command with options causing all ~n~ pubkeys to |
| 81 | be recipients. In order to decrypt the data, any individual private |
| 82 | key could be used. No coordination between key owners would be |
| 83 | required to decrypt. |
| 84 | |
| 85 | 2. Sequential encryption: Given a set of ~n~ public keys, encrypt data |
| 86 | with ~n~ sequential ~age~ commands all piped in series with each |
| 87 | ~age~ command utilizing only one of the ~n~ public keys. In order |
| 88 | to decrypt the data, all ~n~ private keys would be required to |
| 89 | decrypt the data. Since coordination is required, it is less |
| 90 | convenient than parallel encryption. |
| 91 | |
| 92 | In either case, a directory would be useful for holding configuration |
| 93 | files that specify which of these features, or which combination of |
| 94 | them, to apply at the start of every buffer round. |
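To make the difference between the two subfeatures concrete, the sketch below only builds the ~age~ command lines and prints them; it does not run ~age~. The key strings are placeholders rather than valid recipients, and the variable names are hypothetical. It relies only on the ~-r~ recipient flag of ~age~.

```bash
#!/usr/bin/env bash
# Placeholder recipient strings; real values would be age public keys.
pubKeys=("age1examplekeyaaa" "age1examplekeybbb")

# Parallel: a single age command with one -r option per recipient.
cmdParallel=(age)
for key in "${pubKeys[@]}"; do
    cmdParallel+=(-r "$key")
done

# Sequential: one age command per recipient, piped in series.
cmdSequential=""
for key in "${pubKeys[@]}"; do
    cmdSequential+="${cmdSequential:+ | }age -r $key"
done

printf '%s\n' "${cmdParallel[*]}" "$cmdSequential"
```

With sequential encryption, decryption has to peel the layers in reverse order, so the holder of the last key in the pipeline decrypts first.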
| 95 | |
| 96 | I don't yet know how to program the rules, although I think it'd be |
| 97 | easier to simply add an option providing ~bkgpslog~ with a directory |
| 98 | to watch. When examining the directory, check for a file with the |
| 99 | appropriate file extension (ex: .pubkey) and then read the first line |
| 100 | into the script's pubKey array. |
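A minimal sketch of that directory scan, assuming a ~.pubkey~ extension and one key on the first line of each file. ~parseRecipientDirStub~ and ~recPubKeys~ are hypothetical names, and no validation of the key format is attempted here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill the global array recPubKeys from a watch directory.
parseRecipientDirStub() {
    local dirWatch="$1" file line
    recPubKeys=()
    for file in "$dirWatch"/*.pubkey; do
        [[ -f "$file" ]] || continue          # also skips an unmatched glob
        line=""
        IFS= read -r line < "$file"           # first line only
        if [[ -n "$line" ]]; then
            recPubKeys+=("$line")
        fi
    done
}

dirDemo="$(mktemp -d)"
printf 'age1examplekeyaaa\n' > "$dirDemo/alice.pubkey"
printf 'age1examplekeybbb\nignored second line\n' > "$dirDemo/bob.pubkey"
printf 'not a key\n' > "$dirDemo/notes.txt"
parseRecipientDirStub "$dirDemo"
printf '%s\n' "${recPubKeys[@]}"
```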
| 101 | |
| 102 | 2020-07-12T21:08Z; bktei> ~-R~ watch directory option added in ~bkgpslog~ ver |
| 103 | ~0.4.0~. |
| 104 | |
| 105 | ** DONE Feature: Simplify option to reduce output size |
| 106 | CLOSED: [2020-07-12 Sun 21:15] |
| 107 | |
| 108 | ~gpsbabel~ [[https://www.gpsbabel.org/htmldoc-development/filter_simplify.html][features]] a ~simplify~ option to trim data points from GPS |
| 109 | data. There are several methods for prioritizing which points to keep |
| 110 | and which to trim, although the following seems useful given some |
| 111 | sample data I've recorded in a test run of ninfacyzga-01: |
| 112 | |
| 113 | #+BEGIN_EXAMPLE |
| 114 | gpsbabel -i nmea -f all.nmea -x simplify,error=10,relative -o gpx \ |
| 115 | -F all-simp-rel-10.gpx |
| 116 | #+END_EXAMPLE |
| 117 | |
| 118 | An error level of "10" with the "relative" option seems to retain all |
| 119 | desirable features for GPS data while reducing the number of points |
| 120 | along straightaways. File size is reduced by a factor of |
| 121 | about 11. Noise from local stay-in-place drift isn't removed; a |
| 122 | relative error of about 1000 is required to remove stay-in-place drift |
| 123 | noise but this also trims all but 100m-size features of the recorded |
| 124 | path. A relative error of 1000 reduces file size by a factor of |
| 125 | about 450. |
| 126 | |
| 127 | #+BEGIN_EXAMPLE |
| 128 | 67M relerror-0.001.kml |
| 129 | 66M relerror-0.01.kml |
| 130 | 58M relerror-0.1.kml |
| 131 | 21M relerror-1.kml |
| 132 | 5.8M relerror-10.kml |
| 133 | 797K relerror-100.kml |
| 134 | 152K relerror-1000.kml |
| 135 | #+END_EXAMPLE |
| 136 | |
| 137 | 2020-07-12T21:13Z; bktei> Instead of programming data simplification |
| 138 | in ~bkgpslog~, the data simplification step should be performed via |
| 139 | ~bklog~'s ~-p~ option which specifies a processing command string to |
| 140 | be ~eval~'d before data is compressed, encrypted, and written to |
| 141 | disk. In other words, handling the simplification of data beyond |
| 142 | allowing for a general command string specified by ~-p~ is outside the |
| 143 | scope of ~bkgpslog~ or its successor ~bklog~. |
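The general idea of a user-supplied processing string can be sketched as below. This assumes the string acts as a stdin-to-stdout filter, which may not match how ~bklog~ actually applies ~-p~; ~cmdProcess~ and the sample data are hypothetical, and the ~awk~ filter is only a crude stand-in for a real ~gpsbabel~ simplification.

```bash
#!/usr/bin/env bash
# Hypothetical processing string, as a user might pass it via -p.
# Here it keeps only the odd-numbered lines of its input.
cmdProcess="awk 'NR % 2 == 1'"

bufferRaw="$(printf 'point %s\n' 1 2 3 4 5)"

# Run the buffer through the user-supplied filter before any
# compression or encryption step would see it.
bufferProcessed="$(printf '%s\n' "$bufferRaw" | eval "$cmdProcess")"
printf '%s\n' "$bufferProcessed"
```

Since ~eval~ executes whatever the string contains, the string should only ever come from a trusted operator.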
| 144 | |
| 145 | ** DONE Feature: Generalize bkgpslog to bklog function |
| 146 | CLOSED: [2020-07-12 Sun 21:11] |
| 147 | 2020-07-05T02:42Z; bktei> Transform ~bkgpslog~ into a modular |
| 148 | component called ~bklog~ such that it processes a stdout stream of any |
| 149 | external command, not just ~gpspipe -r~. This would permit reuse of |
| 150 | the ~bkgpslog~ code for logging not just GPS data but things like |
| 151 | pressure, temperature, system statistics, etc. |
| 152 | 2020-07-05T16:35Z; bktei> |
| 153 | : bklog -r age1asdf -o log.tar # encrypt/compress stdin to log.tar |
| 154 | : bklog -x -f log.tar -i age.key -O /tmp # extract and decrypt |
| 155 | |
| 156 | Making ~bklog~ follow the [[https://en.wikipedia.org/wiki/Unix_philosophy][Unix philosophy]] means that it shouldn't care |
| 157 | what kind of text is fed to it. |
| 158 | |
| 159 | *** ~bklog~ Design vs. Unix Philosophy |
| 160 | **** Pubkey dir watching |
| 161 | The feature of periodically checking a directory for changes in the |
| 162 | pubkeys it contains should be justified by its usefulness; if the |
| 163 | complexity cannot be justified then the feature should be removed. |
| 164 | **** Defaults vs options |
| 165 | Many options can cause the tool to become complex in unjustifiable |
| 166 | ways. Currently I am adding options because I want the ability to |
| 167 | modify the script's behavior without having to modify the source code |
| 168 | on the machine in which the code is running. I should consider |
| 169 | removing features at some point and having the program force defaults |
| 170 | on the user. For example, allowing the specification of a temporary |
| 171 | directory, while useful for me, is probably not useful for most people |
| 172 | who don't know or care about the difference between ~/tmp~ and |
| 173 | ~/dev/shm~. |
| 174 | **** Script time to live (TTL) |
| 175 | I initially implemented a script time-to-live feature because I was |
| 176 | unsure of my ability to program a script that could run for long periods |
| 177 | of time without causing runaway memory usage. I still think it's |
| 178 | a good idea to offer a script TTL option to the user but I think the |
| 179 | default should be to simply run forever. |
| 180 | |
| 181 | 2020-07-12T21:11Z; bktei> ~bklog~ script created and tested as of |
| 182 | commit ~aedd19f~. |
| 183 | |
| 184 | ** DONE TODO: Evaluate ~rsyslog~ as stand-in for this work |
| 185 | CLOSED: [2020-07-12 Sun 21:09] |
| 186 | 2020-07-05T02:57Z; bktei> I searched for "debian iot logging" ("iot" |
| 187 | as in "Internet of Things", the current buzzword for small low-power |
| 188 | computers being used to provide microservices for owners in their own |
| 189 | home) and came across several search results mentioning ~syslog~ and |
| 190 | ~rsyslog~. |
| 191 | |
| 192 | https://www.thissmarthouse.net/consolidating-iot-logs-into-mysql-using-rsyslog/ |
| 193 | https://rsyslog.readthedocs.io/en/latest/tutorials/tls.html |
| 194 | https://serverfault.com/questions/20840/how-would-you-send-syslog-securely-over-the-public-internet |
| 195 | https://www.rsyslog.com/ |
| 196 | |
| 197 | My impression is that ~rsyslog~ is a complex software package designed |
| 198 | to offer many features, some of which might satisfy my |
| 199 | needs. |
| 200 | |
| 201 | However, as stated in the repository README, the objective of the |
| 202 | ~ninfacyzga-01~ project is "Observing facts of the new". This means |
| 203 | that the goal is to record not only location data but any data that |
| 204 | can be captured by a sensor. This means the capture of the following |
| 205 | environmental phenomena is within the scope of this device: |
| 206 | |
| 207 | *** Sounds (microphone) |
| 208 | *** Light (camera) |
| 209 | *** Temperature (thermocouple) |
| 210 | *** Air Pressure (barometer) |
| 211 | *** Acceleration Vector (accelerometer / gyroscope) |
| 212 | *** Magnetic Field Vector (magnetometer) |
| 213 | |
| 214 | This brings up the issue of respecting privacy of others in shared |
| 215 | spaces through which ~ninfacyzga-01~ may pass. ~ninfacyzga-01~ |
| 216 | should encrypt data it records according to rules set by its |
| 217 | owner. |
| 218 | |
| 219 | One permissive rule could be that if ~ninfacyzga-01~ detects that a |
| 220 | person (let's call her Alice) enters a room, it should add Alice's |
| 221 | encryption public key to the list of recipients against which it |
| 222 | encrypts data without Alice having to know how ~ninfacyzga-01~ is |
| 223 | programmed (she might have a ~calkuptcana~ agent on her person that |
| 224 | broadcasts her privacy preferences). Meanwhile, ~ninfacyzga-01~ may |
| 225 | publish its observations to a repository that Alice and other members |
| 226 | of the shared communal space have access to (ex: a read-only shared |
| 227 | directory on a local network WiFi). Alice could download all the files |
| 228 | in the shared repository but she would only be able to decrypt files |
| 229 | generated when she was physically near enough to ~ninfacyzga-01~ for |
| 230 | it to detect that her presence was within some spatial boundary. |
| 231 | |
| 232 | A more restrictive rule could resemble the permissive rule in that |
| 233 | ~ninfacyzga-01~ uses Alice's encryption public key only when she is |
| 234 | physically near by, except that it encrypts logged files against |
| 235 | public keys in a sequential manner. This would mean that all people |
| 236 | who were near ~ninfacyzga-01~ would have to pass around each log file |
| 237 | to each other so that they could decrypt the content. |
| 238 | |
| 239 | That said, according to [[https://www.rsyslog.com/doc/master/tutorials/database.html][this ~rsyslog~ page]], ~rsyslog~ is more a data |
| 240 | wrangling system for collecting data from disparate sources of |
| 241 | different types and outputting data to text files on disk than a |
| 242 | system committed to the server-client model of database storage. So, I |
| 243 | think converting ~bkgpslog~ into a ~bklog~ script that appends |
| 244 | encrypted and compressed data to a tar file for later extraction |
| 245 | (possibly the same script with future features) would be best. |
| 246 | |
| 247 | 2020-07-12T21:10Z; bktei> rsyslog is outside the scope of what |
| 248 | ~bkgpslog~ does (record location observations). A different tool |
| 249 | should be used to retrieve and synchronize data. The dumb storage |
| 250 | method of "tar files in a syncthing folder" works for now. |
| 251 | ** TODO Place persistent recipient updates in asynchronous coproc |
| 252 | 2020-07-06T19:37Z; bktei> In order to update the recipient list, the |
| 253 | magicParseRecipientDir() function needs to be run each buffer period |
| 254 | in order to scan for changes in the recipient list. However, such a |
| 255 | scan takes time; if the magicGatherWriteBuffer() function must pause |
| 256 | until magicParseRecipientDir() completes, then a significant pause |
| 257 | between buffer sessions may occur, causing detectable gaps in location |
| 258 | data between buffer rounds. |
| 259 | |
| 260 | I looked for ways in which I might start magicParseRecipientDir() |
| 261 | asynchronously immediately before running the data collection command |
| 262 | and then collect its output at the start of the next buffer round. One |
| 263 | way using the ~coproc~ Bash built-in is described [[https://stackoverflow.com/a/20018504/10850071][here]]. I'd have to |
| 264 | make the asynchronous function output the recipient list to stdout |
| 265 | which would then be ~read~ into the ~recPubKeysValid~ array in the |
| 266 | main loop. However, I'm putting the magicParseRecipientDir() call |
| 267 | as-is in the main loop and accepting the delay for now. |
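One possible shape for the ~coproc~ approach is sketched below, assuming Bash 4.1 or later. ~scanRecipientsStub~ and ~fdScan~ are hypothetical names and the keys are placeholders. The read end of the coprocess is duplicated immediately, on the assumption that Bash may close the coprocess's own descriptors once it exits, which would otherwise lose output from a scan that finishes before the next round.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for magicParseRecipientDir(): slow scan, keys on stdout.
scanRecipientsStub() {
    sleep 0.2
    printf '%s\n' "age1examplekeyaaa" "age1examplekeybbb"
}

coproc SCAN { scanRecipientsStub; }   # start the scan asynchronously
exec {fdScan}<&"${SCAN[0]}"           # duplicate the read end right away

sleep 0.5   # stand-in for the data collection command of this round

# Next round: collect whatever the scan printed.
mapfile -t recPubKeysValid <&"$fdScan"
exec {fdScan}<&-                      # close the duplicated descriptor
printf '%s\n' "${recPubKeysValid[@]}"
```

This only pays off when the scan is slow relative to the acceptable gap between buffer rounds; otherwise the synchronous call in the main loop is simpler.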
| 268 | * bkgpslog narrative |
| 269 | ** Initialize environment |
| 270 | *** Init variables |
| 271 | **** Save timeStart (YYYYmmddTHHMMSS±zz) |
| 272 | *** Define Functions |
| 273 | **** Define Debugging functions |
| 274 | **** Define Argument Processing function |
| 275 | **** Define Main function |
| 276 | ** Run Main Function |
| 277 | *** Process Arguments |
| 278 | *** Set output encryption and compression option strings |
| 279 | *** Check that critical apps and dirs are available, display missing ones. |
| 280 | *** Set lifespans of script and buffer |
| 281 | *** Init temp working dir ~DIR_TMP~ |
| 282 | Make temporary dir in tmpfs dir: ~/dev/shm/$(nonce)..bkgpslog/~ (~DIR_TMP~) |
| 283 | *** Initialize ~tar~ archive |
| 284 | **** Write ~bkgpslog~ version to ~$DIR_TMP/VERSION~ |
| 285 | **** Create empty ~tar~ archive in ~DIR_OUT~ at ~PATHOUT_TAR~ |
| 286 | |
| 287 | Set output file name to: |
| 288 | : PATHOUT_TAR="$DIR_OUT/YYYYmmdd..hostname_location.gz.age.tar" |
| 289 | Usage: ~iso8601Period $timeStart $timeEnd~ |
| 290 | |
| 291 | **** Append ~VERSION~ file to ~PATHOUT_TAR~ |
| 292 | |
| 293 | Append ~$DIR_TMP/VERSION~ to ~PATHOUT_TAR~ via ~tar --append~ |
| 294 | |
| 295 | *** Read/Write Loop (Record gps data until script lifespan ends) |
| 296 | **** Determine output file paths |
| 297 | **** Define GPS conversion commands |
| 298 | **** Fill Bash variable buffer from ~gpspipe~ |
| 299 | **** Process bufferBash, save secured chunk set to ~DIR_TMP~ |
| 300 | **** Append each secured chunk to ~PATHOUT_TAR~ |
| 301 | : tar --append --directory=DIR_TMP --file=PATHOUT_TAR $(basename PATHOUT_{NMEA,GPX,KML} ) |
| 302 | **** Remove secured chunk from ~DIR_TMP~ |
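
The archive steps of this narrative (create an empty tar, append chunks by base name, remove them from the temporary directory) can be sketched as follows. It assumes GNU ~tar~; the chunk names and paths are hypothetical, and the placeholder files stand in for the compressed and encrypted chunks.

```bash
#!/usr/bin/env bash
DIR_TMP="$(mktemp -d)"
DIR_OUT="$(mktemp -d)"
PATHOUT_TAR="$DIR_OUT/sketch_location.tar"

# Create an empty archive (GNU tar accepts an empty file list).
tar --create --file="$PATHOUT_TAR" --files-from=/dev/null

# Placeholder chunks; the real ones would be compressed and encrypted.
chunkNames=(chunk.nmea chunk.gpx chunk.kml)
for name in "${chunkNames[@]}"; do
    printf 'placeholder for %s\n' "$name" > "$DIR_TMP/$name"
done

# Append by base name so the archive stores no temporary directory paths.
tar --append --directory="$DIR_TMP" --file="$PATHOUT_TAR" "${chunkNames[@]}"

# Remove the chunks from the temporary directory once they are archived.
for name in "${chunkNames[@]}"; do
    rm "$DIR_TMP/$name"
done
tarListing="$(tar --list --file="$PATHOUT_TAR")"
```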