* bkgpslog task list
** DONE Add job control for short buffer length
CLOSED: [2020-07-02 Thu 16:04]
2020-07-02T14:56Z; bktei> File write operations were bundled into a
magicWriteBuffer function that is called and then detached from the
script shell (job control), but the detached job is not tracked by the
main script. A problem may arise if two instances of magicWriteBuffer
attempt to write to the same tar simultaneously. Two instances of
magicWriteBuffer may exist if the buffer length is low (ex: 1 second);
the default buffer length of 60 seconds should reduce the probability
of a collision, but it should be possible for the main script to
record the process ID of a magicWriteBuffer() as soon as it detaches
(via ~$!~ as described [[https://bashitout.com/2013/05/18/Ampersands-on-the-command-line.html][here]]) and then check that the process is still
alive.
2020-07-02T15:23Z; bktei> I found that the Bash ~wait~ built-in can be
used to delay processing until a specified job completes. Called
without arguments, ~wait~ pauses script execution until all
backgrounded processes complete.
2020-07-02T16:03Z; bktei> Added ~wait~.
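
A minimal sketch of the collision guard described above, not the code
in ~bkgpslog~ itself. It assumes a hypothetical ~writeBuffer~ stand-in
for magicWriteBuffer() and a hypothetical ~writerPid~ variable: the
loop records ~$!~ right after detaching the writer and waits on that
PID before detaching the next one.

```shell
#!/usr/bin/env bash
# Sketch only: writeBuffer is a hypothetical stand-in for magicWriteBuffer().
writeBuffer() {
    sleep 0.2                    # pretend to compress, encrypt and append
    echo "chunk $1" >> "$LOG"    # shared file that two writers must not race on
}

LOG="$(mktemp)"
writerPid=""
for round in 1 2 3; do
    # Block until the previous writer (if any) has finished.
    if [ -n "$writerPid" ]; then
        wait "$writerPid"
    fi
    writeBuffer "$round" &
    writerPid=$!                 # PID of the job just sent to the background
done
wait                             # no arguments: wait for all background jobs
```

Waiting on one PID keeps at most one writer alive at a time, at the
cost of delaying the next round if a write runs long.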
** DONE Rewrite tar initialization function
CLOSED: [2020-07-02 Thu 17:23]
2020-07-02T17:23Z; bktei> Simplified the tar initialization function so
that the VERSION file is used both to test the appendability of the tar
and to mark when a new session is started.
** DONE Consolidate tar checking/creation into function
CLOSED: [2020-07-02 Thu 18:33]
2020-07-02T18:33Z; bktei> Simplified how the output tar file's
existence is checked and how it is validated as a tar file. This was
done using a new function ~checkMakeTar~.
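
A rough sketch of what a check-or-create function of this kind could
look like. The body is an assumption (the real ~checkMakeTar~ may
differ) and relies on GNU ~tar~ behavior: listing fails on a file that
is not a tar archive, and an empty file list produces an empty archive.

```shell
#!/usr/bin/env bash
# Sketch only; assumes GNU tar. Not the actual checkMakeTar from bkgpslog.
checkMakeTar() {
    local pathTar="$1"
    # A readable archive lists without error; keep it as-is.
    if [ -f "$pathTar" ] && tar --list --file="$pathTar" > /dev/null 2>&1; then
        return 0
    fi
    # Missing or invalid: write a fresh, empty archive (overwrites the file).
    tar --create --file="$pathTar" --files-from=/dev/null
}
```

Overwriting an invalid file is the simplest choice for a sketch; moving
it aside first would be safer if the file might hold recoverable data.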
** DONE Add VERSION if output tar deleted between writes
CLOSED: [2020-07-02 Thu 20:22]
2020-07-02T20:21Z; bktei> Added bkgpslog-specific function
magicWriteVersion() to be called whenever a new time-stamped ~VERSION~
file needs to be generated and appended to the output tar file
~PATHOUT_TAR~.
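
The idea can be sketched as follows, assuming GNU ~tar~. The function
name ~writeVersion~, the ~SCRIPT_VERSION~ variable and the layout of
the ~VERSION~ file are placeholders for illustration, not taken from
magicWriteVersion().

```shell
#!/usr/bin/env bash
# Sketch only; assumes GNU tar. SCRIPT_VERSION and the VERSION file
# layout are hypothetical.
SCRIPT_VERSION="0.0.0"

writeVersion() {
    local dirTmp="$1" pathTar="$2"
    # Time-stamp the VERSION file so each session start is distinguishable.
    printf 'version=%s\ndate=%s\n' \
        "$SCRIPT_VERSION" "$(date -u +%Y%m%dT%H%M%SZ)" > "$dirTmp/VERSION"
    # Recreate the archive if it was deleted between writes.
    [ -f "$pathTar" ] || tar --create --file="$pathTar" --files-from=/dev/null
    # A failed append also signals that the archive is not appendable.
    tar --append --directory="$dirTmp" --file="$pathTar" VERSION
}
```

Each call adds another ~VERSION~ member, so the archive listing doubles
as a record of session starts.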
** DONE Rewrite buffer loop to reduce lag between gpspipe runs
CLOSED: [2020-07-03 Fri 20:57]
2020-07-03T17:10Z; bktei> As is, there is still a 5-6 second lag
between when ~gpspipe~ times out at the end of a buffer round and when
~gpspipe~ is called by the subsequent buffer round. I believe this can
be reduced by moving variable manipulations inside the
asynchronously-executed magicWriteBuffer() function. Ideally, the
while loop should look like:

#+BEGIN_EXAMPLE
while (( SECONDS < SCRIPT_TTL )); do
    gpspipe -r > "$DIR_TMP"/buffer.nmea
    writeBuffer &
done
#+END_EXAMPLE
2020-07-03T20:56Z; bktei> I simplified it further to something like
this:
#+BEGIN_EXAMPLE
while (( SECONDS < SCRIPT_TTL )); do
    writeBuffer &
    sleep "$BUFFER_TTL"  # one buffer period; BUFFER_TTL is an assumed name
done
#+END_EXAMPLE

On a Raspberry Pi Zero W, this loop shows approximately 71 ms of drift
per buffer round with a 10 s buffer.
** TODO Feature: Recipient watch folder
2020-07-03T21:28Z; bktei> This feature would scan the contents of a
specified directory at the start of every buffer round in order to
determine encryption (age) recipients. This would allow a device to
dynamically encrypt location data in response to automated changes
made by other tools. For example, if such a directory were
synchronized via Syncthing and changes to such a directory were
managed by a trusted remote server, then that server could respond to
human requests to secure location data.

Two specific privacy subfeatures come to mind:

1. Parallel encryption: Given a set of ~n~ public keys, encrypt data
   with a single ~age~ command with options causing all ~n~ pubkeys to
   be recipients. In order to decrypt the data, any individual private
   key could be used. No coordination between key owners would be
   required to decrypt.

2. Sequential encryption: Given a set of ~n~ public keys, encrypt data
   with ~n~ sequential ~age~ commands all piped in series, with each
   ~age~ command utilizing only one of the ~n~ public keys. Decrypting
   the data would require all ~n~ private keys. Since coordination is
   required, it is less convenient than parallel encryption.
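
The two modes above can be sketched as a pair of Bash helpers that read
plaintext on stdin and write ciphertext to stdout. The function names
are hypothetical; the only ~age~ feature relied on is the ~-r~
recipient option, which may be repeated.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; `age -r` is the real CLI.

# Parallel: one age call, every key is a recipient, so any single
# private key can decrypt.
encryptParallel() {
    local args=() key
    for key in "$@"; do
        args+=(-r "$key")
    done
    age "${args[@]}"
}

# Sequential: one age call per key, chained. Every private key is
# needed, applied in the reverse of the encryption order.
encryptSequential() {
    if (( $# == 0 )); then
        cat
        return
    fi
    local key="$1"
    shift
    age -r "$key" | encryptSequential "$@"
}
```

For example, ~gzip < chunk.nmea | encryptParallel "${pubKeys[@]}"~ would
produce one layer readable by any recipient, while ~encryptSequential~
with the same arguments would produce ~n~ nested layers.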

In either case, a directory would be useful for holding configuration
files specifying which features, or which combination of them, to
execute at the start of every buffer round.

I don't yet know how to program the rules, although I think it'd be
easier to simply add an option providing ~bkgpslog~ with a directory
to watch. When examining the directory, check for files with the
appropriate file extension (ex: .pubkey) and then read the first line
of each into the script's pubKey array.
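
A minimal sketch of that directory scan, assuming one key per
~.pubkey~ file. The function name ~readPubKeys~ and the array name
~pubKeys~ are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: readPubKeys and the pubKeys array name are assumptions.
readPubKeys() {
    local dir="$1" file line
    pubKeys=()
    for file in "$dir"/*.pubkey; do
        [[ -f "$file" ]] || continue    # glob matched nothing, or not a file
        line=""
        # First line only; the || keeps a final line that lacks a newline.
        IFS= read -r line < "$file" || [[ -n "$line" ]] || continue
        if [[ -n "$line" ]]; then
            pubKeys+=("$line")
        fi
    done
}
```

Calling ~readPubKeys "$dirWatch"~ at the start of a buffer round would
rebuild the array from scratch, so removing a file from the directory
also removes that recipient.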

** TODO Feature: Simplify option to reduce output size

~gpsbabel~ [[https://www.gpsbabel.org/htmldoc-development/filter_simplify.html][features]] a ~simplify~ option to trim data points from GPS
data. There are several methods for prioritizing which points to keep
and which to trim, although the following seems useful given some
sample data I've recorded in a test run of ninfacyzga-01:

#+BEGIN_EXAMPLE
gpsbabel -i nmea -f all.nmea -x simplify,error=10,relative -o gpx \
    -F all-simp-rel-10.gpx
#+END_EXAMPLE

An error level of "10" with the "relative" option seems to retain all
desirable features of the GPS data while reducing the number of points
along straightaways. File size is reduced by a factor of about 11.
Noise from local stay-in-place drift isn't removed; a relative error
of about 1000 is required to remove stay-in-place drift noise, but this
also trims all but 100m-size features of the recorded path. A relative
error of 1000 reduces file size by a factor of about 450.

#+BEGIN_EXAMPLE
 67M relerror-0.001.kml
 66M relerror-0.01.kml
 58M relerror-0.1.kml
 21M relerror-1.kml
5.8M relerror-10.kml
797K relerror-100.kml
152K relerror-1000.kml
#+END_EXAMPLE
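
A sweep like the one listed above could be scripted roughly as
follows. The helper name ~simplifySweep~ and the output naming are
assumptions; the ~gpsbabel~ options mirror the command shown earlier,
with KML output swapped in to match the listing.

```shell
#!/usr/bin/env bash
# Sketch only: simplifySweep and its output naming are hypothetical.
# Usage: simplifySweep INPUT.nmea LEVEL [LEVEL...]
simplifySweep() {
    local input="$1" level
    shift
    for level in "$@"; do
        gpsbabel -i nmea -f "$input" \
            -x "simplify,error=${level},relative" \
            -o kml -F "relerror-${level}.kml"
    done
}
# Example: simplifySweep all.nmea 0.001 0.01 0.1 1 10 100 1000
```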

** TODO Feature: Generalize bkgpslog to bklog function
2020-07-05T02:42Z; bktei> Transform ~bkgpslog~ into a modular
component called ~bklog~ such that it processes a stdout stream of any
external command, not just ~gpspipe -r~. This would permit reuse of
the ~bkgpslog~ code for logging not just GPS data but things like
pressure, temperature, system statistics, etc.
2020-07-05T16:35Z; bktei>
: bklog -r age1asdf -o log.tar # encrypt/compress stdin to log.tar
: bklog -x -f log.tar -i age.key -O /tmp # extract and decrypt

Making ~bklog~ follow the [[https://en.wikipedia.org/wiki/Unix_philosophy][Unix philosophy]] means that it shouldn't care
what kind of text is fed to it.

*** ~bklog~ Design vs. Unix Philosophy
**** Pubkey dir watching
The feature of periodically checking a directory for changes in the
pubkeys it contains should be justified by its usefulness; if the
complexity cannot be justified then the feature should be removed.
**** Defaults vs options
Many options can cause the tool to become complex in unjustifiable
ways. Currently I am adding options because I want the ability to
modify the script's behavior without having to modify the source code
on the machine on which the code is running. I should consider
removing features at some point and having the program force defaults
on the user. For example, allowing the specification of a temporary
directory, while useful for me, is probably not useful for most people
who don't know or care about the difference between ~/tmp~ and
~/dev/shm~.
**** Script time to live (TTL)
I initially implemented a script time-to-live feature because I was
unsure of my ability to program a script that could run for long
periods of time without causing runaway memory usage. I still think
it's a good idea to offer a script TTL option to the user, but I think
the default should be to simply run forever.
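
One way such a default could be expressed, as a sketch: treat a TTL of
zero as "no limit". That convention and the helper name
~withinLifespan~ are assumptions, not existing ~bkgpslog~ behavior.

```shell
#!/usr/bin/env bash
# Sketch only: SCRIPT_TTL=0 meaning "run forever" is an assumed convention.
SCRIPT_TTL="${SCRIPT_TTL:-0}"

withinLifespan() {
    # True when no limit is set, or when the limit has not been reached.
    (( SCRIPT_TTL == 0 || SECONDS < SCRIPT_TTL ))
}
# The main loop would then read: while withinLifespan; do ...; done
```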
** TODO Evaluate ~rsyslog~ as stand-in for this work
2020-07-05T02:57Z; bktei> I searched for "debian iot logging" ("iot"
as in "Internet of Things", the current buzzword for small low-power
computers being used to provide microservices for owners in their own
home) and came across several search results mentioning ~syslog~ and
~rsyslog~.

https://www.thissmarthouse.net/consolidating-iot-logs-into-mysql-using-rsyslog/
https://rsyslog.readthedocs.io/en/latest/tutorials/tls.html
https://serverfault.com/questions/20840/how-would-you-send-syslog-securely-over-the-public-internet
https://www.rsyslog.com/

My impression is that ~rsyslog~ is a complex software package designed
to offer many features, some of which might satisfy my needs.

However, as stated in the repository README, the objective of the
~ninfacyzga-01~ project is "Observing facts of the new". This means
that the goal is to record not only location data but any data that
can be captured by a sensor. The capture of the following
environmental phenomena is therefore within the scope of this device:

- Sounds (microphone)
- Light (camera)
- Temperature (thermocouple)
- Air Pressure (barometer)
- Acceleration Vector (accelerometer / gyroscope)
- Magnetic Field Vector (magnetometer)

This brings up the issue of respecting the privacy of others in shared
spaces through which ~ninfacyzga-01~ may pass. ~ninfacyzga-01~ should
encrypt data it records according to rules set by its owner.

One permissive rule could be that if ~ninfacyzga-01~ detects that a
person (let's call her Alice) enters a room, it should add Alice's
encryption public key to the list of recipients against which it
encrypts data without Alice having to know how ~ninfacyzga-01~ is
programmed (she might have a ~calkuptcana~ agent on her person that
broadcasts her privacy preferences). Meanwhile, ~ninfacyzga-01~ may
publish its observations to a repository that Alice and other members
of the shared communal space have access to (ex: a read-only shared
directory on a local WiFi network). Alice could download all the files
in the shared repository but she would only be able to decrypt files
generated when she was physically near enough to ~ninfacyzga-01~ for
it to detect that her presence was within some spatial boundary.

A more restrictive rule could resemble the permissive rule in that
~ninfacyzga-01~ uses Alice's encryption public key only when she is
physically nearby, except that it encrypts logged files against
public keys in a sequential manner. This would mean that all people
who were near ~ninfacyzga-01~ would have to pass each log file around
to each other so that they could decrypt the content.

That said, according to [[https://www.rsyslog.com/doc/master/tutorials/database.html][this ~rsyslog~ page]], ~rsyslog~ is more a data
wrangling system for collecting data from disparate sources of
different types and outputting data to text files on disk than a
system committed to the server-client model of database storage. So, I
think converting ~bkgpslog~ into a ~bklog~ script that appends
encrypted and compressed data to a tar file for later extraction
(possibly by the same script with future features) would be best.
** TODO Place persistent recip. updates in asynchronous coproc
2020-07-06T19:37Z; bktei> In order to update the recipient list, the
magicParseRecipientDir() function needs to be run each buffer period
in order to scan for changes in the recipient list. However, such a
scan takes time; if the magicGatherWriteBuffer() function must pause
until magicParseRecipientDir() completes, then a significant pause
between buffer sessions may occur, causing detectable gaps in location
data between buffer rounds.

I looked for ways in which I might start magicParseRecipientDir()
asynchronously immediately before running the data collection command
and then collect its output at the start of the next buffer round. One
way using the ~coproc~ Bash built-in is described [[https://stackoverflow.com/a/20018504/10850071][here]]. I'd have to
make the asynchronous function output the recipient list to stdout
which would then be ~read~ into the ~recPubKeysValid~ array in the
main loop. However, for now, I'm putting magicParseRecipientDir()
as-is in the main loop and accepting the delay.
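
As an illustration of the start-now, collect-later pattern, here is a
sketch that uses process substitution with a dynamically allocated
file descriptor rather than ~coproc~ itself. Everything except the
~recPubKeysValid~ array name is hypothetical, including the
~scanRecipients~ stand-in for magicParseRecipientDir().

```shell
#!/usr/bin/env bash
# Sketch only: scanRecipients is a hypothetical stand-in for
# magicParseRecipientDir(); startScan, collectScan and scanFd are
# assumed names. Uses process substitution, not coproc.
scanRecipients() {
    local file
    for file in "$1"/*.pubkey; do
        if [[ -f "$file" ]]; then
            head -n 1 "$file"
        fi
    done
}

# Launch the scan in the background; its stdout stays readable on scanFd.
startScan() {
    exec {scanFd}< <(scanRecipients "$1")
}

# At the start of the next buffer round: read the results (this blocks
# only if the scan is still running), then close the descriptor.
collectScan() {
    mapfile -t recPubKeysValid <&"$scanFd"
    exec {scanFd}<&-
}
```

The main loop would call ~startScan~ just before gathering a buffer and
~collectScan~ just after, so the scan overlaps with data collection.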
* bkgpslog narrative
** Initialize environment
*** Init variables
**** Save timeStart (YYYYmmddTHHMMSS±zz)
*** Define Functions
**** Define Debugging functions
**** Define Argument Processing function
**** Define Main function
** Run Main Function
*** Process Arguments
*** Set output encryption and compression option strings
*** Check that critical apps and dirs are available, display missing ones.
*** Set lifespans of script and buffer
*** Init temp working dir ~DIR_TMP~
Make temporary dir in tmpfs dir: ~/dev/shm/$(nonce)..bkgpslog/~ (~DIR_TMP~)
*** Initialize ~tar~ archive
**** Write ~bkgpslog~ version to ~$DIR_TMP/VERSION~
**** Create empty ~tar~ archive in ~DIR_OUT~ at ~PATHOUT_TAR~

Set output file name to:
: PATHOUT_TAR="$DIR_OUT/YYYYmmdd..hostname_location.gz.age.tar"
Usage: ~iso8601Period $timeStart $timeEnd~

**** Append ~VERSION~ file to ~PATHOUT_TAR~

Append ~$DIR_TMP/VERSION~ to ~PATHOUT_TAR~ via ~tar --append~

*** Read/Write Loop (Record gps data until script lifespan ends)
**** Determine output file paths
**** Define GPS conversion commands
**** Fill Bash variable buffer from ~gpspipe~
**** Process bufferBash, save secured chunk set to ~DIR_TMP~
**** Append each secured chunk to ~PATHOUT_TAR~
: tar --append --directory="$DIR_TMP" --file="$PATHOUT_TAR" $(basename -a "$PATHOUT_NMEA" "$PATHOUT_GPX" "$PATHOUT_KML")
**** Remove secured chunk from ~DIR_TMP~
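
Taken together, the save, append and remove steps of the loop might
look like the sketch below. It assumes GNU ~tar~ and ~gzip~; the
helper name ~appendChunk~ is hypothetical, and the ~age~ encryption
step is left out to keep the sketch free of extra dependencies.

```shell
#!/usr/bin/env bash
# Sketch only; assumes GNU tar and gzip. appendChunk is a hypothetical
# helper and the encryption step (age) is intentionally omitted.
appendChunk() {
    local dirTmp="$1" pathTar="$2" name="$3"
    # Compress the buffered text from stdin into a chunk in the temp dir.
    gzip > "$dirTmp/$name.gz" || return 1
    # --directory makes tar store only the base name of the chunk.
    tar --append --directory="$dirTmp" --file="$pathTar" "$name.gz" || return 1
    # The archive now holds the chunk, so the temp copy can go.
    rm "$dirTmp/$name.gz"
}
```

A call such as ~printf '%s\n' "$bufferBash" | appendChunk "$DIR_TMP" "$PATHOUT_TAR" chunk.nmea~
would cover one output format for one buffer round.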