doc(exec):Create coding plan for bklog, retire bkgpslog plan
2020-07-12T21:16Z; bktei> Note: This file is now retired since ~bklog~
has replaced ~bkgpslog~.

* bkgpslog task list
** DONE Add job control for short buffer length
CLOSED: [2020-07-02 Thu 16:04]
2020-07-02T14:56Z; bktei> File write operations were bundled into a
magicWriteBuffer() function that is called and then detached from the
script shell (job control), but the detached job is not tracked by the
main script. A problem may arise if two instances of magicWriteBuffer()
attempt to write to the same tar simultaneously. Two instances may
exist if the buffer length is low (ex: 1 second); the default buffer
length of 60 seconds should reduce the probability of a collision, but
it should be possible for the main script to record the process ID of
a magicWriteBuffer() job as soon as it detaches (via ~$!~ as described
[[https://bashitout.com/2013/05/18/Ampersands-on-the-command-line.html][here]]) and then check that the process is still alive.
2020-07-02T15:23Z; bktei> I found that the Bash ~wait~ built-in can be
used to delay processing until a specified job completes. Called
without arguments, ~wait~ pauses script execution until all
backgrounded processes complete.
2020-07-02T16:03Z; bktei> Added ~wait~.
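
A minimal sketch of the PID tracking idea in this entry, not the code
that was committed. ~writeBuffer~, ~LOG~ and ~pidWrite~ are
hypothetical names; ~writeBuffer~ only simulates a slow write with a
short ~sleep~, and a temporary file stands in for the tar archive. The
loop records ~$!~ after detaching each writer and waits on that PID
before starting the next one, so two writers never overlap.

```bash
#!/usr/bin/env bash
# Sketch only: writeBuffer is a hypothetical stand-in for magicWriteBuffer().
writeBuffer() {
    sleep 0.2                       # simulate a slow compress/encrypt/append step
    echo "round $1" >> "$LOG"
}

LOG="$(mktemp)"                     # stands in for the output tar
pidWrite=""                         # PID of the most recent detached writer

for round in 1 2 3; do
    # If the previous writer is still alive, block until it exits.
    if [[ -n "$pidWrite" ]] && kill -0 "$pidWrite" 2>/dev/null; then
        wait "$pidWrite"
    fi
    writeBuffer "$round" &          # detach the writer
    pidWrite=$!                     # $! holds the PID of the last background job
done
wait                                # no arguments: wait for all remaining jobs
```

~kill -0~ sends no signal; it only reports whether the process still
exists, which is why it is used here as the liveness check.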
** DONE Rewrite tar initialization function
CLOSED: [2020-07-02 Thu 17:23]
2020-07-02T17:23Z; bktei> Simplify tar initialization function so that
the ~VERSION~ file is used to test appendability of the tar as well as
to mark when a new session is started.
** DONE Consolidate tar checking/creation into function
CLOSED: [2020-07-02 Thu 18:33]
2020-07-02T18:33Z; bktei> Simplify how the output tar file's existence
is checked and how its status as a valid tar file is validated. This
was done using a new function ~checkMakeTar~.
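
One way such a check could look, as a sketch under assumptions rather
than the actual ~checkMakeTar~ source. The function name
~checkMakeTarSketch~ and its return codes are made up for
illustration, and GNU ~tar~ is assumed. The archive is considered
valid if ~tar --list~ can read it; otherwise an empty archive is
created in its place.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; the real checkMakeTar in bkgpslog may differ.
# Ensure that the file at path $1 exists and is a readable tar archive.
# Returns 0 if a valid archive already existed, 1 if a new one was created.
checkMakeTarSketch() {
    local pathTar="$1"
    if [[ -f "$pathTar" ]] && tar --list --file="$pathTar" >/dev/null 2>&1; then
        return 0
    fi
    # Missing or unreadable: start over with an empty archive.
    tar --create --file="$pathTar" --files-from=/dev/null
    return 1
}
```

A return value of 1 would be the natural trigger for writing a fresh
~VERSION~ file, since it signals that a new archive was started.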
** DONE Add VERSION if output tar deleted between writes

CLOSED: [2020-07-02 Thu 20:22]
2020-07-02T20:21Z; bktei> Added bkgpslog-specific function
magicWriteVersion() to be called whenever a new time-stamped ~VERSION~
file needs to be generated and appended to the output tar file
~PATHOUT_TAR~.
** DONE Rewrite buffer loop to reduce lag between gpspipe runs

CLOSED: [2020-07-03 Fri 20:57]
2020-07-03T17:10Z; bktei> As is, there is still a 5-6 second lag
between when ~gpspipe~ times out at the end of a buffer round and when
~gpspipe~ is called by the subsequent buffer round. I believe this can
be reduced by moving variable manipulations inside the
asynchronously-executed magicWriteBuffer() function. Ideally, the
while loop should look like:

#+BEGIN_EXAMPLE
while (( SECONDS < SCRIPT_TTL )); do
    gpspipe -r > "$DIR_TMP"/buffer.nmea
    writeBuffer &
done
#+END_EXAMPLE
2020-07-03T20:56Z; bktei> I simplified it further to something like
this:
#+BEGIN_EXAMPLE
while (( SECONDS < SCRIPT_TTL )); do
    writeBuffer &
    sleep "$BUFFER_TTL"   # length of one buffer round, in seconds
done
#+END_EXAMPLE

Raspberry Pi Zero W shows approximately 71ms of drift per buffer round
with 10s buffer.
** DONE Feature: Recipient watch folder
CLOSED: [2020-07-12 Sun 21:08]
2020-07-03T21:28Z; bktei> This feature would scan the contents
of a specified directory at the start of every buffer round in order
to determine encryption (age) recipients. This would allow a device to
dynamically encrypt location data in response to automated changes
made by other tools. For example, if such a directory were
synchronized via Syncthing and changes to such a directory were
managed by a trusted remote server, then that server could respond to
human requests to secure location data.

Two specific privacy subfeatures come to mind:

1. Parallel encryption: Given a set of ~n~ public keys, encrypt data
   with a single ~age~ command with options causing all ~n~ pubkeys to
   be recipients. In order to decrypt the data, any individual private
   key could be used. No coordination between key owners would be
   required to decrypt.

2. Sequential encryption: Given a set of ~n~ public keys, encrypt data
   with ~n~ sequential ~age~ commands all piped in series with each
   ~age~ command utilizing only one of the ~n~ public keys. To decrypt
   the data, all ~n~ private keys would be required, applied in the
   reverse order of encryption. Since coordination is required, it is
   less convenient than parallel encryption.
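
The two modes above differ only in how the ~age~ command line is
assembled, which the following sketch tries to make concrete. The
helper names ~buildParallelCmd~ and ~buildSequentialCmd~ are
hypothetical, and the key strings are placeholders rather than valid
~age~ recipients. The helpers only print the command string; they do
not run ~age~.

```bash
#!/usr/bin/env bash
# Illustrative only; the keys used below are placeholders.

# Parallel: one age invocation with one -r flag per recipient.
# Any single matching private key can decrypt the result.
buildParallelCmd() {
    local cmd="age" key
    for key in "$@"; do
        cmd+=" -r $key"
    done
    printf '%s\n' "$cmd"
}

# Sequential: one age invocation per recipient, chained with pipes.
# Every private key is needed, applied in reverse order, to decrypt.
buildSequentialCmd() {
    local cmd="" key
    for key in "$@"; do
        cmd+="${cmd:+ | }age -r $key"
    done
    printf '%s\n' "$cmd"
}

buildParallelCmd   age1alice age1bob   # prints: age -r age1alice -r age1bob
buildSequentialCmd age1alice age1bob   # prints: age -r age1alice | age -r age1bob
```

Decryption of the sequential form would mirror it with one
~age -d -i KEYFILE~ stage per layer, outermost layer first.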

In either case, a directory would be useful for holding configuration
files specifying which of these features, or which combination of
them, to execute at the start of every buffer round.

I don't yet know how to program the rules, although I think it'd be
easier to simply add an option providing ~bkgpslog~ with a directory
to watch. When examining the directory, check for files with the
appropriate file extension (ex: .pubkey) and then read the first line
of each into the script's pubKey array.
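
A sketch of the directory scan described in the previous paragraph,
assuming one key per file on the first line. The function name
~readPubKeys~ and the array name ~pubKeys~ are placeholders, and no
validation of the key format is attempted here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of scanning a watch directory for recipient keys.
# Reads the first line of every *.pubkey file in directory $1 into the
# global array pubKeys (both names are assumptions for illustration).
readPubKeys() {
    local dirWatch="$1" file line
    pubKeys=()
    for file in "$dirWatch"/*.pubkey; do
        [[ -f "$file" ]] || continue            # glob matched nothing
        IFS= read -r line < "$file" || true     # tolerate a missing trailing newline
        if [[ -n "$line" ]]; then
            pubKeys+=("$line")                  # skip empty files and blank first lines
        fi
    done
    return 0
}
```

Calling ~readPubKeys "$dirWatch"~ at the start of each buffer round
would rebuild the array from scratch, so keys removed from the
directory drop out of the recipient list on the next round.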

2020-07-12T21:08Z; bktei> ~-R~ watch directory option added in
~bkgpslog~ version ~0.4.0~.


** DONE Feature: Simplify option to reduce output size
CLOSED: [2020-07-12 Sun 21:15]

~gpsbabel~ [[https://www.gpsbabel.org/htmldoc-development/filter_simplify.html][features]] a ~simplify~ filter to trim data points from GPS
data. There are several methods for prioritizing which points to keep
and which to trim, although the following seems useful given some
sample data I've recorded in a test run of ninfacyzga-01:

#+BEGIN_EXAMPLE
gpsbabel -i nmea -f all.nmea -x simplify,error=10,relative -o gpx \
    -F all-simp-rel-10.gpx
#+END_EXAMPLE

An error level of "10" with the "relative" option seems to retain all
desirable features of the GPS data while reducing the number of points
along straightaways. File size is reduced by a factor of
about 11. Noise from local stay-in-place drift isn't removed; a
relative error of about 1000 is required to remove stay-in-place drift
noise but this also trims all but 100m-size features of the recorded
path. A relative error of 1000 reduces file size by a factor of
about 450.

#+BEGIN_EXAMPLE
 67M relerror-0.001.kml
 66M relerror-0.01.kml
 58M relerror-0.1.kml
 21M relerror-1.kml
5.8M relerror-10.kml
797K relerror-100.kml
152K relerror-1000.kml
#+END_EXAMPLE

2020-07-12T21:13Z; bktei> Instead of programming data simplification
into ~bkgpslog~, the data simplification step should be performed via
~bklog~'s ~-p~ option, which specifies a processing command string to
be ~eval~'d before data is compressed, encrypted, and written to
disk. In other words, handling the simplification of data beyond
allowing for a general command string specified by ~-p~ is outside the
scope of ~bkgpslog~ or its successor ~bklog~.
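
A rough sketch of how an ~eval~'d processing string can sit in the
pipeline, not the actual ~bklog~ code. ~processBuffer~ and
~cmdProcess~ are hypothetical names. The command string reads the
buffer on stdin and writes the processed data to stdout, so the
compression and encryption stages after it do not need to know what it
did. Because the string is executed as shell code, it should only ever
come from the person running the script.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a -p style processing hook; not the actual
# bklog implementation. cmdProcess holds a user-supplied command string.
processBuffer() {
    local cmdProcess="$1"
    if [[ -n "$cmdProcess" ]]; then
        eval "$cmdProcess"      # stdin and stdout pass through the command
    else
        cat                     # no processing requested
    fi
}

# Example: keep only lines containing "GGA" before later stages.
printf '%s\n' 'aGGAa' 'bRMCb' 'cGGAc' | processBuffer "grep GGA"
```

For the simplification case, the string would presumably be a
~gpsbabel~ call that reads stdin and writes stdout (~gpsbabel~
documents ~-~ as a file name for this purpose), followed in the
pipeline by the usual compress and encrypt commands.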

** DONE Feature: Generalize bkgpslog to bklog function
CLOSED: [2020-07-12 Sun 21:11]
2020-07-05T02:42Z; bktei> Transform ~bkgpslog~ into a modular
component called ~bklog~ such that it processes the stdout stream of any
external command, not just ~gpspipe -r~. This would permit reuse of
the ~bkgpslog~ code for logging not just GPS data but things like
pressure, temperature, system statistics, etc.
2020-07-05T16:35Z; bktei>
: bklog -r age1asdf -o log.tar # encrypt/compress stdin to log.tar
: bklog -x -f log.tar -i age.key -O /tmp # extract and decrypt

Making ~bklog~ follow the [[https://en.wikipedia.org/wiki/Unix_philosophy][Unix philosophy]] means that it shouldn't care
what kind of text is fed to it.

*** ~bklog~ Design vs. Unix Philosophy
**** Pubkey dir watching
The feature of periodically checking a directory for changes in the
pubkeys it contains should be justified by its usefulness; if the
complexity cannot be justified then the feature should be removed.
**** Defaults vs options
Many options can cause the tool to become complex in unjustifiable
ways. Currently I am adding options because I want the ability to
modify the script's behavior without having to modify the source code
on the machine on which the code is running. I should consider
removing features at some point and having the program force defaults
on the user. For example, allowing the specification of a temporary
directory, while useful for me, is probably not useful for most people
who don't know or care about the difference between ~/tmp~ and
~/dev/shm~.
**** Script time to live (TTL)
I initially implemented a script time-to-live feature because I was
unsure of my ability to write a script that could run for long periods
of time without causing runaway memory usage. I still think it's
a good idea to offer a script TTL option to the user but I think the
default should be to simply run forever.

2020-07-12T21:11Z; bktei> ~bklog~ script created and tested as of
commit ~aedd19f~.

** DONE TODO: Evaluate ~rsyslog~ as stand-in for this work
CLOSED: [2020-07-12 Sun 21:09]
2020-07-05T02:57Z; bktei> I searched for "debian iot logging" ("iot"
as in "Internet of Things", the current buzzword for small low-power
computers being used to provide microservices for owners in their own
homes) and came across several search results mentioning ~syslog~ and
~rsyslog~.

https://www.thissmarthouse.net/consolidating-iot-logs-into-mysql-using-rsyslog/
https://rsyslog.readthedocs.io/en/latest/tutorials/tls.html
https://serverfault.com/questions/20840/how-would-you-send-syslog-securely-over-the-public-internet
https://www.rsyslog.com/

My impression is that ~rsyslog~ is a complex software package designed
to offer many features, some of which might satisfy my
needs.

However, as stated in the repository README, the objective of the
~ninfacyzga-01~ project is "Observing facts of the new". This means
that the goal is to record not only location data but any data that
can be captured by a sensor. The capture of the following
environmental phenomena is therefore within the scope of this device:

*** Sounds (microphone)
*** Light (camera)
*** Temperature (thermocouple)
*** Air Pressure (barometer)
*** Acceleration Vector (accelerometer / gyroscope)
*** Magnetic Field Vector (magnetometer)

This brings up the issue of respecting the privacy of others in shared
spaces through which ~ninfacyzga-01~ may pass. ~ninfacyzga-01~
should encrypt data it records according to rules set by its
owner.

One permissive rule could be that if ~ninfacyzga-01~ detects that a
person (let's call her Alice) enters a room, it should add Alice's
encryption public key to the list of recipients against which it
encrypts data without Alice having to know how ~ninfacyzga-01~ is
programmed (she might have a ~calkuptcana~ agent on her person that
broadcasts her privacy preferences). Meanwhile, ~ninfacyzga-01~ may
publish its observations to a repository that Alice and other members
of the shared communal space have access to (ex: a read-only shared
directory on a local WiFi network). Alice could download all the files
in the shared repository but she would only be able to decrypt files
generated when she was physically near enough to ~ninfacyzga-01~ for
it to detect that her presence was within some spatial boundary.

A more restrictive rule could resemble the permissive rule in that
~ninfacyzga-01~ uses Alice's encryption public key only when she is
physically nearby, except that it encrypts logged files against
public keys in a sequential manner. This would mean that all people
who were near ~ninfacyzga-01~ would have to pass each log file around
to each other so that they could decrypt the content.

That said, according to [[https://www.rsyslog.com/doc/master/tutorials/database.html][this ~rsyslog~ page]], ~rsyslog~ is more a data
wrangling system for collecting data from disparate sources of
different types and outputting data to text files on disk than a
system committed to the server-client model of database storage. So, I
think converting ~bkgpslog~ into a ~bklog~ script that appends
encrypted and compressed data to a tar file for later extraction
(possibly by the same script with future features) would be best.

2020-07-12T21:10Z; bktei> rsyslog is outside the scope of what
~bkgpslog~ does (record location observations). A different tool
should be used to retrieve and synchronize data. The dumb storage
method of "tar files in a syncthing folder" works for now.
** TODO: Place persistent recip. updates in asynchronous coproc
2020-07-06T19:37Z; bktei> In order to update the recipient list, the
magicParseRecipientDir() function needs to be run each buffer period
to scan for changes in the recipient list. However, such a
scan takes time; if the magicGatherWriteBuffer() function must pause
until magicParseRecipientDir() completes, then a significant pause
between buffer sessions may occur, causing detectable gaps in location
data between buffer rounds.

I looked for ways in which I might start magicParseRecipientDir()
asynchronously immediately before running the data collection command
and then collect its output at the start of the next buffer round. One
way using the ~coproc~ Bash keyword is described [[https://stackoverflow.com/a/20018504/10850071][here]]. I'd have to
make the asynchronous function output the recipient list to stdout
which would then be ~read~ into the ~recPubKeysValid~ array in the
main loop. However, for now, I'm putting magicParseRecipientDir()
as-is in the main loop and accepting the delay.
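
A possible shape for the ~coproc~ approach, offered as an untested-in-
production sketch and not as what the linked answer or ~bkgpslog~
does. ~parseRecipientDir~ is a hypothetical stand-in that prints
placeholder keys, the ~sleep~ calls stand in for the scan and the data
collection command, and Bash 4.1 or later is assumed. The coprocess
output descriptor is duplicated right away because Bash closes the
original descriptor once the coprocess exits, which could otherwise
lose the output of a scan that finishes before the round does.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; parseRecipientDir stands in for
# magicParseRecipientDir() and simply prints one key per line.
parseRecipientDir() {
    sleep 0.1                            # simulate a slow directory scan
    printf '%s\n' age1alice age1bob      # placeholder keys
}

recPubKeysValid=()
for round in 1 2; do
    # Start the scan in the background before collecting data.
    coproc SCAN { parseRecipientDir; }
    pidScan=$SCAN_PID                    # save the PID; SCAN_PID is unset on exit
    exec {fdScan}<&"${SCAN[0]}"          # duplicate the read end so it outlives SCAN
    sleep 0.3                            # stands in for the data collection command
    # Collect the scan output once the round's data has been gathered.
    mapfile -t recPubKeysValid <&"$fdScan"
    exec {fdScan}<&-                     # close the duplicated descriptor
    wait "$pidScan" 2>/dev/null || true  # reap the coprocess before the next round
    echo "round $round: ${#recPubKeysValid[@]} recipients"
done
```

A scan that exits almost instantly could still finish before the
descriptor is duplicated, so a real implementation would need to
handle that case, for example by falling back to the previous
recipient list.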
* bkgpslog narrative
** Initialize environment
*** Init variables
**** Save timeStart (YYYYmmddTHHMMSS±zz)
*** Define Functions
**** Define Debugging functions
**** Define Argument Processing function
**** Define Main function
** Run Main Function
*** Process Arguments
*** Set output encryption and compression option strings
*** Check that critical apps and dirs are available, display missing ones.
*** Set lifespans of script and buffer
*** Init temp working dir ~DIR_TMP~
Make temporary dir in tmpfs dir: ~/dev/shm/$(nonce)..bkgpslog/~ (~DIR_TMP~)
*** Initialize ~tar~ archive
**** Write ~bkgpslog~ version to ~$DIR_TMP/VERSION~
**** Create empty ~tar~ archive in ~DIR_OUT~ at ~PATHOUT_TAR~

Set output file name to:
: PATHOUT_TAR="$DIR_OUT/YYYYmmdd..hostname_location.gz.age.tar"
Usage: ~iso8601Period $timeStart $timeEnd~

**** Append ~VERSION~ file to ~PATHOUT_TAR~

Append ~$DIR_TMP/VERSION~ to ~PATHOUT_TAR~ via ~tar --append~

*** Read/Write Loop (Record GPS data until script lifespan ends)
**** Determine output file paths
**** Define GPS conversion commands
**** Fill Bash variable buffer from ~gpspipe~
**** Process bufferBash, save secured chunk set to ~DIR_TMP~
**** Append each secured chunk to ~PATHOUT_TAR~
: tar --append --directory="$DIR_TMP" --file="$PATHOUT_TAR" $(basename -a "$PATHOUT_NMEA" "$PATHOUT_GPX" "$PATHOUT_KML")
**** Remove secured chunk from ~DIR_TMP~
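
The last two steps of the loop can be sketched as follows, assuming
GNU ~tar~. The chunk name ~chunk.nmea.gz.age~, the version string and
the variable ~pathChunk~ are placeholders, and the chunk content is
dummy text rather than real ciphertext. The chunk is appended by base
name relative to ~DIR_TMP~ so that the archive stores no temporary
directory path, and it is deleted only if the append succeeded.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of one write step; file names are placeholders.
DIR_TMP="$(mktemp -d)"
DIR_OUT="$(mktemp -d)"
PATHOUT_TAR="$DIR_OUT/log.tar"

# Start with an archive holding only a VERSION file.
echo "sketch 0.0.0" > "$DIR_TMP/VERSION"
tar --create --directory="$DIR_TMP" --file="$PATHOUT_TAR" VERSION

# Pretend a buffer round produced one secured chunk.
pathChunk="$DIR_TMP/chunk.nmea.gz.age"
echo "placeholder ciphertext" > "$pathChunk"

# Append the chunk by base name, then delete it only on success.
if tar --append --directory="$DIR_TMP" --file="$PATHOUT_TAR" \
       "$(basename "$pathChunk")"; then
    rm "$pathChunk"
fi

tar --list --file="$PATHOUT_TAR"    # lists VERSION and chunk.nmea.gz.age
```

Keeping the ~rm~ behind the success check means a failed append leaves
the chunk in ~DIR_TMP~ where a later round could retry it.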