fix(exec/bklog):Add kill switch if `while read` returns empty buffer
[EVA-2020-02.git] / exec / bkgpslog-plan.org
Commit Line Data
e879cdc3
SBS
12020-07-12T21:16Z; bktei> Note: This file is now retired since ~bklog~
2has replaced ~bkgpslog~.
3
872c737e
SBS
4* bkgpslog task list
5** DONE Add job control for short buffer length
6 CLOSED: [2020-07-02 Thu 16:04]
72020-07-02T14:56Z; bktei> File write operations were bundled into a
8magicWriteBuffer function that is called then detached from the script
9shell (job control), but the detached job is not tracked by the main
10script. A problem may arise if two instances of magicWriteBuffer
11attempt to write to the same tar simultaneously. Two instances of
12magicWriteBuffer may exist if the buffer length is low (ex: 1 second);
13the default buffer length of 60 seconds should reduce the probability
14of a collision but it should be possible for the main script to track
15the process ID of a magicWriteBuffer() as soon as it detaches and then
16check (via ~$!~ as described [[https://bashitout.com/2013/05/18/Ampersands-on-the-command-line.html][here]]) that the process is still alive.
172020-07-02T15:23Z; bktei> I found that the Bash ~wait~ built-in can be
18used to delay processing until a specified job completes. Called
19without arguments, ~wait~ pauses script execution until all
20backgrounded processes complete.
212020-07-02T16:03Z; bktei> Added ~wait~.
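The job-control idea above could be sketched roughly as follows. This is only an illustration, not the script's actual code: ~writeBufferSketch~ and ~runOneRoundSketch~ are hypothetical names, and the ~sleep~ merely simulates a slow write.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for magicWriteBuffer(): simulates a slow write.
writeBufferSketch() {
    sleep 0.2
    echo "chunk written" >> "$1"
}

# Detach one write job, check that it is alive, then wait for it.
runOneRoundSketch() {
    local logFile="$1" pid
    writeBufferSketch "$logFile" &   # detach the write job
    pid=$!                           # $! holds the PID of the most recent background job
    # kill -0 sends no signal; it only tests whether the process still exists.
    if kill -0 "$pid" 2>/dev/null; then
        echo "write job $pid is still running"
    fi
    wait "$pid"                      # block until that specific job has finished
    echo "write job finished with status $?"
}
```

Waiting on the saved PID (rather than a bare ~wait~) only blocks on the one write job, which matters if other background jobs are running.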
f6fb18bd
SBS
22** DONE Rewrite tar initialization function
23 CLOSED: [2020-07-02 Thu 17:23]
242020-07-02T17:23Z; bktei> Simplify tar initialization function so
25VERSION file is used to test appendability of tar as well as to mark
26when a new session is started.
27** DONE Consolidate tar checking/creation into function
28 CLOSED: [2020-07-02 Thu 18:33]
292020-07-02T18:33Z; bktei> Simplify how the output tar file's existence
30is checked and its status as a valid tar file is validated. This was
31done using a new function ~checkMakeTar~.
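A helper of this kind might look roughly like the sketch below. It is not the real ~checkMakeTar~: the name ~checkMakeTarSketch~, the ~.invalid~ suffix, and the printed status words are assumptions made for illustration, and GNU ~tar~ is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a checkMakeTar-style helper (assumes GNU tar).
# Ensures that a readable tar archive exists at the given path.
checkMakeTarSketch() {
    local pathTar="$1"
    # An existing file counts as valid if tar can list its contents.
    if [ -f "$pathTar" ] && tar --list --file="$pathTar" >/dev/null 2>&1; then
        echo "valid"
        return 0
    fi
    # Set an unreadable file aside instead of overwriting it.
    if [ -e "$pathTar" ]; then
        mv "$pathTar" "$pathTar.invalid"
    fi
    # Create an empty archive by taking the file list from /dev/null.
    tar --create --file="$pathTar" --files-from=/dev/null
    echo "created"
}
```

A caller could use the "created" result as the cue to append a fresh ~VERSION~ file.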
3df184eb 32** DONE Add VERSION if output tar deleted between writes
f75428fe 33
3df184eb
SBS
34 CLOSED: [2020-07-02 Thu 20:22]
352020-07-02T20:21Z; bktei> Added bkgpslog-specific function
36magicWriteVersion() to be called whenever a new time-stamped ~VERSION~
37file needs to be generated and appended to the output tar file
38~PATHOUT_TAR~.
3592a7e9 39** DONE Rewrite buffer loop to reduce lag between gpspipe runs
9ae33467 40
3592a7e9 41 CLOSED: [2020-07-03 Fri 20:57]
f75428fe
SBS
422020-07-03T17:10Z; bktei> As is, there is still a 5-6 second lag
43between when ~gpspipe~ times out at the end of a buffer round and when
44~gpspipe~ is called by the subsequent buffer round. I believe this can
45be reduced by moving variable manipulations inside the
46asynchronously-executed magicWriteBuffer() function. Ideally, the
47while loop should look like:
48
49#+BEGIN_EXAMPLE
50while (( SECONDS < SCRIPT_TTL )); do
51 gpspipe -r > "$DIR_TMP"/buffer.nmea
52 writeBuffer &
53done
54#+END_EXAMPLE
3592a7e9
SBS
552020-07-03T20:56Z; bktei> I simplified it further to something like
56this:
57#+BEGIN_EXAMPLE
58while (( SECONDS < SCRIPT_TTL )); do
59 writeBuffer &
60 sleep "$BUFFER_TTL" # one buffer period (variable name assumed)
61done
62#+END_EXAMPLE
9ae33467 63
3592a7e9
SBS
64A Raspberry Pi Zero W shows approximately 71 ms of drift per buffer round
65with a 10 s buffer.
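A runnable version of that loop shape might look like the sketch below. The function names and the log file argument are hypothetical, and the stand-in job only records when each round started; the point is that the background job is never waited on inside the loop, so the ~sleep~ alone sets the round period.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the asynchronous gather/write function.
writeBufferSketch() {
    echo "round started at ${SECONDS}s" >> "$1"
}

# Run buffer rounds until the script lifespan (in seconds) has elapsed.
bufferLoopSketch() {
    local scriptTtl="$1" bufferTtl="$2" logFile="$3"
    SECONDS=0                           # Bash counts up from this value on its own
    while (( SECONDS < scriptTtl )); do
        writeBufferSketch "$logFile" &  # do not wait for the write to finish
        sleep "$bufferTtl"              # the sleep alone sets the round period
    done
    wait                                # let the last background job finish
}
```

For example, ~bufferLoopSketch 60 10 /tmp/rounds.log~ would start a new round about every 10 seconds for one minute.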
e879cdc3
SBS
66** DONE Feature: Recipient watch folder
67 CLOSED: [2020-07-12 Sun 21:08]
9ae33467
SBS
682020-07-03T21:28Z; bktei> This feature would be to scan the contents
69of a specified directory at the start of every buffer round in order
70to determine encryption (age) recipients. This would allow a device to
71dynamically encrypt location data in response to automated changes
72made by other tools. For example, if such a directory were
73synchronized via Syncthing and changes to such a directory were
74managed by a trusted remote server, then that server could respond to
75human requests to secure location data.
76
77Two specific privacy subfeatures come to mind:
78
791. Parallel encryption: Given a set of ~n~ public keys, encrypt data
80 with a single ~age~ command with options causing all ~n~ pubkeys to
81 be recipients. In order to decrypt the data, any individual private
82 key could be used. No coordination between key owners would be
83 required to decrypt.
84
852. Sequential encryption: Given a set of ~n~ public keys, encrypt data
86 with ~n~ sequential ~age~ commands all piped in series with each
87 ~age~ command utilizing only one of the ~n~ public keys. Decrypting
88 the data would then require all ~n~ private keys. Since
89 coordination between key owners is required, it is less
90 convenient than parallel encryption.
91
92In either case, a directory would be useful for holding configuration
93files specifying which feature, or combination of features, to apply
94at the start of every buffer round.
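The difference between the two subfeatures can be sketched by building the ~age~ invocations from a key list. The function names are hypothetical; the only ~age~ feature relied on is its ~-r~ (recipient) flag, which may be repeated. The keys are not quoted in the pipeline string on the assumption that ~age~ public keys contain no shell-special characters.

```shell
#!/usr/bin/env bash
# Parallel: arguments for ONE age call that names every key: -r KEY1 -r KEY2 ...
# Prints one argument per line so the caller can load them into an array.
parallelAgeArgsSketch() {
    local key
    for key in "$@"; do
        printf '%s\n' "-r" "$key"
    done
}

# Sequential: a pipeline string with one age call per key:
# age -r KEY1 | age -r KEY2 | ...
sequentialAgePipelineSketch() {
    local key pipeline=""
    for key in "$@"; do
        pipeline+="${pipeline:+ | }age -r $key"
    done
    printf '%s\n' "$pipeline"
}
```

With two keys, the first function yields the arguments ~-r KEY1 -r KEY2~ for a single command, while the second yields ~age -r KEY1 | age -r KEY2~, whose output must be decrypted in reverse order with both private keys.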
95
96I don't yet know how to program the rules, although I think it'd be
97easier to simply add an option providing ~bkgpslog~ with a directory
98to watch. When examining the directory, check for a file with the
99appropriate file extension (ex: .pubkey) and then read the first line
100into the script's pubKey array.
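A minimal sketch of that directory scan, assuming the ~.pubkey~ extension mentioned above; the function name ~readPubKeyDirSketch~ and the array name ~pubKeys~ are hypothetical, and no validation of the key text is attempted.

```shell
#!/usr/bin/env bash
# Read the first line of every *.pubkey file in a directory into the
# global pubKeys array. Files with other extensions are ignored.
readPubKeyDirSketch() {
    local dir="$1" file line
    pubKeys=()
    for file in "$dir"/*.pubkey; do
        [ -f "$file" ] || continue       # also skips the pattern itself if nothing matched
        line=""
        IFS= read -r line < "$file"      # first line only; works without a trailing newline
        if [ -n "$line" ]; then
            pubKeys+=("$line")
        fi
    done
}
```

After calling ~readPubKeyDirSketch /path/to/dir~, the keys are available as ~"${pubKeys[@]}"~.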
101
e879cdc3
SBS
1022020-07-12T21:08Z; bktei> ~-R~ watch directory option added in ~bkgpslog~ ver
103~0.4.0~.
104
105** DONE Feature: Simplify option to reduce output size
106 CLOSED: [2020-07-12 Sun 21:15]
9ae33467
SBS
107
108~gpsbabel~ [[https://www.gpsbabel.org/htmldoc-development/filter_simplify.html][features]] a ~simplify~ option to trim data points from GPS
109data. There are several methods for prioritizing which points to keep
110and which to trim, although the following seems useful given some
111sample data I've recorded in a test run of ninfacyzga-01:
112
113#+BEGIN_EXAMPLE
114gpsbabel -i nmea -f all.nmea -x simplify,error=10,relative -o gpx \
115-F all-simp-rel-10.gpx
116#+END_EXAMPLE
117
118An error level of "10" with the "relative" option seems to retain all
119desirable features for GPS data while reducing the number of points
120along straightaways. File size is reduced by a factor of
121about 11. Noise from local stay-in-place drift isn't removed; a
122relative error of about 1000 is required to remove stay-in-place drift
123noise but this also trims all but 100m-size features of the recorded
124path. A relative error of 1000 reduces file size by a factor of
125about 450.
126
127#+BEGIN_EXAMPLE
128 67M relerror-0.001.kml
129 66M relerror-0.01.kml
130 58M relerror-0.1.kml
131 21M relerror-1.kml
1325.8M relerror-10.kml
133797K relerror-100.kml
134152K relerror-1000.kml
135#+END_EXAMPLE
136
e879cdc3
SBS
1372020-07-12T21:13Z; bktei> Instead of programming data simplification
138in ~bkgpslog~, the data simplification step should be performed via
139~bklog~'s ~-p~ option which specifies a processing command string to
140be ~eval~'d before data is compressed, encrypted, and written to
141disk. In other words, handling the simplification of data beyond
142allowing for a general command string specified by ~-p~ is outside the
143scope of ~bkgpslog~ or its successor ~bklog~.
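The general mechanism of an ~eval~'d processing string might look like the sketch below. This is an assumption about the shape of the step, not ~bklog~'s actual code; the function name and the pass-through default of ~cat~ are hypothetical.

```shell
#!/usr/bin/env bash
# Pass a buffer through a user-supplied processing command string before
# the (omitted) compression and encryption steps.
processBufferSketch() {
    local buffer="$1" cmdProcess="${2:-cat}"   # default: pass data through unchanged
    # eval runs the string as shell code, so it must come from a trusted source.
    printf '%s\n' "$buffer" | eval "$cmdProcess"
}
```

For example, ~processBufferSketch "$data" "grep -v '^#'"~ would drop comment lines, and a ~gpsbabel~ simplify command reading stdin could be supplied the same way.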
144
145** DONE Feature: Generalize bkgpslog to bklog function
146 CLOSED: [2020-07-12 Sun 21:11]
320ac29c
SBS
1472020-07-05T02:42Z; bktei> Transform ~bkgpslog~ into a modular
148component called ~bklog~ such that it processes a stdout stream of any
149external command, not just ~gpspipe -r~. This would permit reuse of
150the ~bkgpslog~ code for logging not just GPS data but things like
151pressure, temperature, system statistics, etc.
1522020-07-05T16:35Z; bktei>
153: bklog -r age1asdf -o log.tar # encrypt/compress stdin to log.tar
154: bklog -x -f log.tar -i age.key -O /tmp # extract and decrypt
155
156Making ~bklog~ follow the [[https://en.wikipedia.org/wiki/Unix_philosophy][Unix philosophy]] means that it shouldn't care
157what kind of text is fed to it.
158
159*** ~bklog~ Design vs. Unix Philosophy
160**** Pubkey dir watching
161The feature of periodically checking a directory for changes in the
162pubkeys it contains should be justified by its usefulness; if the
163complexity cannot be justified then the feature should be removed.
164**** Defaults vs options
165Many options can cause the tool to become complex in unjustifiable
166ways. Currently I am adding options because I want the ability to
167modify the script's behavior without having to modify the source code
168on the machine in which the code is running. I should consider
169removing features at some point and having the program force defaults
170on the user. For example, allowing the specification of a temporary
171directory, while useful for me, is probably not useful for most people
172who don't know or care about the difference between ~/tmp~ and
173~/dev/shm~.
174**** Script time to live (TTL)
175I initially implemented a script time-to-live feature because I was
176unsure of my ability to program a script that could run for long periods
177of time without causing a runaway usage of memory. I still think it's
178a good idea to offer a script TTL option to the user but I think the
179default should be to simply run forever.
e879cdc3
SBS
180
1812020-07-12T21:11Z; bktei> ~bklog~ script created and tested as of
182commit ~aedd19f~.
183
184** DONE TODO: Evaluate ~rsyslog~ as stand-in for this work
185 CLOSED: [2020-07-12 Sun 21:09]
320ac29c
SBS
1862020-07-05T02:57Z; bktei> I searched for "debian iot logging" ("iot"
187as in "Internet of Things", the current buzzword for small low-power
188computers being used to provide microservices for owners in their own
189home) and came across several search results mentioning ~syslog~ and
190~rsyslog~.
191
192https://www.thissmarthouse.net/consolidating-iot-logs-into-mysql-using-rsyslog/
193https://rsyslog.readthedocs.io/en/latest/tutorials/tls.html
194https://serverfault.com/questions/20840/how-would-you-send-syslog-securely-over-the-public-internet
195https://www.rsyslog.com/
196
197My impression is that ~rsyslog~ is a complex software package designed
198to offer many features, some of which possibly might satisfy my
199needs.
200
201However, as stated in the repository README, the objective of the
202~ninfacyzga-01~ project is "Observing facts of the new". This means
203that the goal is not only to record location data but any data that
204can be captured by a sensor. This means the capture of the following
205environmental phenomena are within the scope of this device:
206
207*** Sounds (microphone)
208*** Light (camera)
209*** Temperature (thermocouple)
210*** Air Pressure (barometer)
211*** Acceleration Vector (accelerometer / gyroscope)
212*** Magnetic Field Vector (magnetometer)
213
214This brings up the issue of respecting privacy of others in shared
215spaces through which ~ninfacyzga-01~ may pass through. ~ninfacyzga-01~
216should encrypt data it records according to rules set by its
217owner.
218
219One permissive rule could be that if ~ninfacyzga-01~ detects that a
220person (let's call her Alice) enters a room, it should add Alice's
221encryption public key to the list of recipients against which it
222encrypts data without Alice having to know how ~ninfacyzga-01~ is
223programmed (she might have a ~calkuptcana~ agent on her person that
224broadcasts her privacy preferences). Meanwhile, ~ninfacyzga-01~ may
225publish its observations to a repository that Alice and other members
226of the shared communal space have access to (ex: a read-only shared
227directory on a local network WiFi). Alice could download all the files
228in the shared repository but she would only be able to decrypt files
229generated when she was physically near enough to ~ninfacyzga-01~ for
230it to detect that her presence was within some spatial boundary.
231
232A more restrictive rule could resemble the permissive rule in that
233~ninfacyzga-01~ uses Alice's encryption public key only when she is
234physically near by, except that it encrypts logged files against
235public keys in a sequential manner. This would mean that all people
236who were near ~ninfacyzga-01~ would have to pass around each log file
237to each other so that they could decrypt the content.
238
239That said, according to [[https://www.rsyslog.com/doc/master/tutorials/database.html][this ~rsyslog~ page]], ~rsyslog~ is more a data
240wrangling system for collecting data from disparate sources of
241different types and outputting data to text files on disk than a
242system committed to the server-client model of database storage. So, I
243think converting ~bkgpslog~ into a ~bklog~ script that appends
244encrypted and compressed data to a tar file for later extraction
245(possibly the same script with future features) would be best.
246
e879cdc3
SBS
2472020-07-12T21:10Z; bktei> rsyslog is outside the scope of what
248~bkgpslog~ does (record location observations). A different tool
249should be used to retrieve and synchronize data. The dumb storage
250method of "tar files in a syncthing folder" works for now.
1a1738c4
SBS
251** TODO: Place persistent recip. updates in asynchronous coproc
2522020-07-06T19:37Z; bktei> In order to update the recipient list, the
253magicParseRecipientDir() function needs to be run each buffer period
254in order to scan for changes in the recipient list. However, such a
255scan takes time; if the magicGatherWriteBuffer() function must pause
256until magicParseRecipientDir() completes, then a significant pause
257between buffer sessions may occur, causing detectable gaps in location
258data between buffer rounds.
259
260I looked for ways in which I might start magicParseRecipientDir()
261asynchronously immediately before running the data collection command
262and then collect its output at the start of the next buffer round. One
263way using the ~coproc~ Bash built-in is described [[https://stackoverflow.com/a/20018504/10850071][here]]. I'd have to
264make the asynchronous function output the recipient list to stdout
265which would then be ~read~ into the ~recPubKeysValid~ array in the
266main loop. However, for now, I'm leaving the magicParseRecipientDir()
267call as-is in the main loop and accepting the delay.
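As a simpler stand-in for the ~coproc~ approach, the same overlap can be sketched with an ordinary background job that writes to a temporary file. This swaps in a different technique than the one linked above; all function names are hypothetical, the ~sleep~ is a placeholder for the data collection command, and ~mapfile~ requires Bash 4 or newer.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for magicParseRecipientDir(): print one key per line.
scanRecipientsSketch() {
    local file line
    for file in "$1"/*.pubkey; do
        [ -f "$file" ] || continue
        line=""
        IFS= read -r line < "$file"
        if [ -n "$line" ]; then
            printf '%s\n' "$line"
        fi
    done
}

# One buffer round: scan in the background while data is collected, then
# load the scan result into recPubKeysValid for use in the next round.
bufferRoundSketch() {
    local dirRecip="$1" tmpList scanPid
    tmpList="$(mktemp)"
    scanRecipientsSketch "$dirRecip" > "$tmpList" &   # start the scan asynchronously
    scanPid=$!
    sleep 0.2                                         # placeholder for data collection
    wait "$scanPid"                                   # usually already finished by now
    mapfile -t recPubKeysValid < "$tmpList"
    rm -f "$tmpList"
}
```

The scan then only delays a round if it takes longer than the data collection itself.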
6c30388f
SBS
268* bkgpslog narrative
269** Initialize environment
270*** Init variables
271**** Save timeStart (YYYYmmddTHHMMSS±zz)
272*** Define Functions
273**** Define Debugging functions
274**** Define Argument Processing function
275**** Define Main function
276** Run Main Function
277*** Process Arguments
278*** Set output encryption and compression option strings
279*** Check that critical apps and dirs are available, display missing ones.
280*** Set lifespans of script and buffer
281*** Init temp working dir ~DIR_TMP~
282Make temporary dir in tmpfs dir: ~/dev/shm/$(nonce)..bkgpslog/~ (~DIR_TMP~)
283*** Initialize ~tar~ archive
284**** Write ~bkgpslog~ version to ~$DIR_TMP/VERSION~
285**** Create empty ~tar~ archive in ~DIR_OUT~ at ~PATHOUT_TAR~
286
287Set output file name to:
288: PATHOUT_TAR="$DIR_OUT/YYYYmmdd..hostname_location.gz.age.tar"
289Usage: ~iso8601Period $timeStart $timeEnd~
290
291**** Append ~VERSION~ file to ~PATHOUT_TAR~
292
293Append ~$DIR_TMP/VERSION~ to ~PATHOUT_TAR~ via ~tar --append~
294
295*** Read/Write Loop (Record gps data until script lifespan ends)
296**** Determine output file paths
297**** Define GPS conversion commands
298**** Fill Bash variable buffer from ~gpspipe~
299**** Process bufferBash, save secured chunk set to ~DIR_TMP~
300**** Append each secured chunk to ~PATHOUT_TAR~
301: tar --append --directory=DIR_TMP --file=PATHOUT_TAR $(basename PATHOUT_{NMEA,GPX,KML} )
302**** Remove secured chunk from ~DIR_TMP~
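The archive steps in this narrative could be sketched as below, assuming GNU ~tar~. The function names, the ~VERSION~ contents, and the chunk file name are hypothetical, and the chunk here is plain text where the real script would store compressed and encrypted data.

```shell
#!/usr/bin/env bash
# Initialize: write VERSION in the temp dir, create an empty archive,
# then append VERSION to it.
initTarSketch() {
    local dirTmp="$1" pathoutTar="$2"
    echo "sketch-version $(date -u +%Y%m%dT%H%M%SZ)" > "$dirTmp/VERSION"
    tar --create --file="$pathoutTar" --files-from=/dev/null
    tar --append --directory="$dirTmp" --file="$pathoutTar" VERSION
}

# One loop round: append a finished chunk from the temp dir, then remove it.
appendChunkSketch() {
    local dirTmp="$1" pathoutTar="$2" chunkName="$3"
    tar --append --directory="$dirTmp" --file="$pathoutTar" "$chunkName"
    rm "$dirTmp/$chunkName"
}
```

Using ~--directory~ keeps the temporary path out of the member names stored in the archive.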