* bkgpslog task list
** DONE Add job control for short buffer length
   CLOSED: [2020-07-02 Thu 16:04]
2020-07-02T14:56Z; bktei> File write operations were bundled into a
magicWriteBuffer() function that is called and then detached from the
script shell (job control), but the detached job is not tracked by the
main script. A problem may arise if two instances of
magicWriteBuffer() attempt to write to the same tar simultaneously.
Two instances of magicWriteBuffer() may exist if the buffer length is
low (ex: 1 second); the default buffer length of 60 seconds should
reduce the probability of a collision, but it should be possible for
the main script to record the process ID of a magicWriteBuffer() job
as soon as it detaches (via ~$!~ as described [[https://bashitout.com/2013/05/18/Ampersands-on-the-command-line.html][here]]) and then check
that the process is still alive.
2020-07-02T15:23Z; bktei> I found that the Bash ~wait~ built-in can be
used to delay processing until a specified job completes. Called
without arguments, ~wait~ pauses script execution until all
backgrounded processes complete.
2020-07-02T16:03Z; bktei> Added ~wait~.
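A minimal sketch of the PID-tracking idea, assuming a stand-in writer
(~writeBufferDemo~ is hypothetical and only imitates
magicWriteBuffer(); the timings are arbitrary). Each round waits for
the previous writer before starting a new one, so two writers never
overlap:

#+BEGIN_SRC bash
#!/usr/bin/env bash
# Hypothetical stand-in for magicWriteBuffer(): pretends to write for a moment.
writeBufferDemo() {
    sleep 0.2
    echo "chunk written" >> "$1"
}

logFile="$(mktemp)"
pidWrite=""   # PID of the most recent background writer, if any

for round in 1 2 3; do
    # Before starting a new writer, block until the previous one has exited.
    if [[ -n "$pidWrite" ]] && kill -0 "$pidWrite" 2>/dev/null; then
        wait "$pidWrite"
    fi
    writeBufferDemo "$logFile" &
    pidWrite=$!   # $! holds the PID of the job just sent to the background
done

wait   # with no arguments, wait blocks until all background jobs finish
linesWritten="$(wc -l < "$logFile")"
echo "lines written: $linesWritten"
rm -f "$logFile"
#+END_SRC

~kill -0~ sends no signal; it only reports whether the process still
exists, which is one way to do the "still alive" check.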
** DONE Rewrite tar initialization function
   CLOSED: [2020-07-02 Thu 17:23]
2020-07-02T17:23Z; bktei> Simplify tar initialization function so
VERSION file is used to test appendability of tar as well as to mark
when a new session is started.
** DONE Consolidate tar checking/creation into function
   CLOSED: [2020-07-02 Thu 18:33]
2020-07-02T18:33Z; bktei> Simplify how the output tar file's existence
is checked and its status as a valid tar file is validated. This was
done using a new function ~checkMakeTar~.
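A rough sketch of what such a check-or-create helper could look like,
assuming GNU ~tar~. The name ~checkMakeTarDemo~ and its return codes
are made up for illustration and are not taken from the script's
actual ~checkMakeTar~:

#+BEGIN_SRC bash
#!/usr/bin/env bash
# Hypothetical sketch of a checkMakeTar-style helper (assumes GNU tar).
# Returns 0 if the tar already existed and was readable,
# 1 if it was missing or invalid and had to be (re)created.
checkMakeTarDemo() {
    local pathTar="$1"
    if [[ -f "$pathTar" ]] && tar --list --file="$pathTar" >/dev/null 2>&1; then
        return 0
    fi
    # An empty file list yields a valid, empty archive.
    tar --create --file="$pathTar" --files-from=/dev/null
    return 1
}

dirDemo="$(mktemp -d)"
statusFirst=0;  checkMakeTarDemo "$dirDemo/out.tar" || statusFirst=$?
statusSecond=0; checkMakeTarDemo "$dirDemo/out.tar" || statusSecond=$?
echo "first call: $statusFirst, second call: $statusSecond"
rm -rf "$dirDemo"
#+END_SRC

Listing the archive is used here as the validity test because it
makes ~tar~ parse the file without extracting anything.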
** DONE Add VERSION if output tar deleted between writes
   CLOSED: [2020-07-02 Thu 20:22]
2020-07-02T20:21Z; bktei> Added bkgpslog-specific function
magicWriteVersion() to be called whenever a new time-stamped ~VERSION~
file needs to be generated and appended to the output tar file
~PATHOUT_TAR~.
** DONE Rewrite buffer loop to reduce lag between gpspipe runs
   CLOSED: [2020-07-03 Fri 20:57]
2020-07-03T17:10Z; bktei> As is, there is still a 5-6 second lag
between when ~gpspipe~ times out at the end of a buffer round and when
~gpspipe~ is called by the subsequent buffer round. I believe this can
be reduced by moving variable manipulations inside the
asynchronously-executed magicWriteBuffer() function. Ideally, the
while loop should look like:

#+BEGIN_EXAMPLE
while (( SECONDS < SCRIPT_TTL )); do
  gpspipe -r > "$DIR_TMP"/buffer.nmea
  writeBuffer &
done
#+END_EXAMPLE
2020-07-03T20:56Z; bktei> I simplified it further to something like
this:
#+BEGIN_EXAMPLE
while (( SECONDS < SCRIPT_TTL )); do
  writeBuffer &
  sleep "$BUFFER_TTL"  # buffer length in seconds
done
#+END_EXAMPLE

With this loop, a Raspberry Pi Zero W shows approximately 71 ms of
drift per buffer round with a 10 s buffer.
** TODO Feature: Recipient watch folder
2020-07-03T21:28Z; bktei> This feature would be to scan the contents
of a specified directory at the start of every buffer round in order
to determine encryption (age) recipients. This would allow a device to
dynamically encrypt location data in response to automated changes
made by other tools. For example, if such a directory were
synchronized via Syncthing and changes to such a directory were
managed by a trusted remote server, then that server could respond to
human requests to secure location data.

Two specific privacy subfeatures come to mind:

1. Parallel encryption: Given a set of ~n~ public keys, encrypt data
   with a single ~age~ command with options causing all ~n~ pubkeys to
   be recipients. In order to decrypt the data, any individual private
   key could be used. No coordination between key owners would be
   required to decrypt.

2. Sequential encryption: Given a set of ~n~ public keys, encrypt data
   with ~n~ sequential ~age~ commands all piped in series, with each
   ~age~ command utilizing only one of the ~n~ public keys. In order
   to decrypt the data, all ~n~ private keys would be required. Since
   coordination is required, it is less convenient than parallel
   encryption.

In either case, a directory would be useful for holding configuration
files specifying which feature, or which combination of the two, to
apply at the start of every buffer round.
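The two schemes can be sketched by assembling the corresponding ~age~
command lines. This only prints the commands rather than running
them, and the recipient strings are placeholders, not real ~age~
public keys:

#+BEGIN_SRC bash
#!/usr/bin/env bash
# Placeholder recipients for illustration only (not valid age keys).
pubKeys=("age1alice" "age1bob" "age1carol")

# Parallel: a single age command with one -r option per recipient.
parallelCmd=(age)
for key in "${pubKeys[@]}"; do
    parallelCmd+=(-r "$key")
done

# Sequential: one age command per recipient, chained with pipes.
sequentialCmd=""
for key in "${pubKeys[@]}"; do
    sequentialCmd+="${sequentialCmd:+ | }age -r $key"
done

echo "parallel:   ${parallelCmd[*]}"
echo "sequential: $sequentialCmd"
#+END_SRC

With the sequential form, decryption has to peel the layers in reverse
order, starting with the private key that matches the last ~age~
command in the pipe.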

I don't yet know how to program the rules, although I think it'd be
easier to simply add an option providing ~bkgpslog~ with a directory
to watch. When examining the directory, check for files with the
appropriate file extension (ex: ~.pubkey~) and read the first line of
each into the script's pubKey array.
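A minimal sketch of such a directory scan. The function name
~readPubKeysDemo~, the array name ~pubKeys~, and the sample files are
assumptions for illustration:

#+BEGIN_SRC bash
#!/usr/bin/env bash
# Hypothetical sketch: collect the first line of every *.pubkey file
# in a directory into the global array pubKeys.
readPubKeysDemo() {
    local dirWatch="$1" file firstLine
    pubKeys=()   # refilled on every call
    for file in "$dirWatch"/*.pubkey; do
        [[ -f "$file" ]] || continue   # also skips an unmatched glob pattern
        firstLine=""
        IFS= read -r firstLine < "$file" || true
        if [[ -n "$firstLine" ]]; then
            pubKeys+=("$firstLine")
        fi
    done
    return 0
}

dirWatch="$(mktemp -d)"
printf 'age1alice\nsecond line is ignored\n' > "$dirWatch/alice.pubkey"
printf 'age1bob\n' > "$dirWatch/bob.pubkey"
printf 'not a key file\n' > "$dirWatch/notes.txt"
readPubKeysDemo "$dirWatch"
printf 'recipient: %s\n' "${pubKeys[@]}"
rm -rf "$dirWatch"
#+END_SRC

A real version would probably also want to validate each line as a
public key before trusting it.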

** TODO Feature: Simplify option to reduce output size

~gpsbabel~ [[https://www.gpsbabel.org/htmldoc-development/filter_simplify.html][features]] a ~simplify~ option to trim data points from GPS
data. There are several methods for prioritizing which points to keep
and which to trim, although the following seems useful given some
sample data I've recorded in a test run of ninfacyzga-01:

#+BEGIN_EXAMPLE
gpsbabel -i nmea -f all.nmea -x simplify,error=10,relative -o gpx \
-F all-simp-rel-10.gpx
#+END_EXAMPLE

An error level of "10" with the "relative" option seems to retain all
desirable features for GPS data while reducing the number of points
along straightaways. File size is reduced by a factor of about 11.
Noise from local stay-in-place drift isn't removed; a relative error
of about 1000 is required to remove stay-in-place drift noise, but
this also trims all but 100m-size features of the recorded path. A
relative error of 1000 reduces file size by a factor of about 450.

#+BEGIN_EXAMPLE
 67M relerror-0.001.kml
 66M relerror-0.01.kml
 58M relerror-0.1.kml
 21M relerror-1.kml
5.8M relerror-10.kml
797K relerror-100.kml
152K relerror-1000.kml
#+END_EXAMPLE

** TODO Feature: Generalize bkgpslog to bklog function
2020-07-05T02:42Z; bktei> Transform ~bkgpslog~ into a modular
component called ~bklog~ such that it processes a stdout stream of any
external command, not just ~gpspipe -r~. This would permit reuse of
the ~bkgpslog~ code for logging not just GPS data but things like
pressure, temperature, system statistics, etc.
2020-07-05T16:35Z; bktei>
: bklog -r age1asdf -o log.tar # encrypt/compress stdin to log.tar
: bklog -x -f log.tar -i age.key -O /tmp # extract and decrypt

Making ~bklog~ follow the [[https://en.wikipedia.org/wiki/Unix_philosophy][Unix philosophy]] means that it shouldn't care
what kind of text is fed to it.

*** ~bklog~ Design vs. Unix Philosophy
**** Pubkey dir watching
The feature of periodically checking a directory for changes in the
pubkeys it contains should be justified by its usefulness; if the
complexity cannot be justified then the feature should be removed.
**** Defaults vs options
Many options can cause the tool to become complex in unjustifiable
ways. Currently I am adding options because I want the ability to
modify the script's behavior without having to modify the source code
on the machine on which the code is running. I should consider
removing features at some point and having the program force defaults
on the user. For example, allowing the specification of a temporary
directory, while useful for me, is probably not useful for most people
who don't know or care about the difference between ~/tmp~ and
~/dev/shm~.
**** Script time to live (TTL)
I initially implemented a script time-to-live feature because I was
unsure of my ability to write a script that could run for long periods
of time without causing a runaway usage of memory. I still think it's
a good idea to offer a script TTL option to the user but I think the
default should be to simply run forever.
** TODO Evaluate ~rsyslog~ as stand-in for this work
2020-07-05T02:57Z; bktei> I searched for "debian iot logging" ("iot"
as in "Internet of Things", the current buzzword for small low-power
computers being used to provide microservices for owners in their own
home) and came across several search results mentioning ~syslog~ and
~rsyslog~.

https://www.thissmarthouse.net/consolidating-iot-logs-into-mysql-using-rsyslog/
https://rsyslog.readthedocs.io/en/latest/tutorials/tls.html
https://serverfault.com/questions/20840/how-would-you-send-syslog-securely-over-the-public-internet
https://www.rsyslog.com/

My impression is that ~rsyslog~ is a complex software package designed
to offer many features, some of which might satisfy my needs.

However, as stated in the repository README, the objective of the
~ninfacyzga-01~ project is "Observing facts of the new". This means
that the goal is to record not only location data but any data that
can be captured by a sensor. The capture of the following
environmental phenomena is therefore within the scope of this device:

*** Sounds (microphone)
*** Light (camera)
*** Temperature (thermocouple)
*** Air Pressure (barometer)
*** Acceleration Vector (accelerometer / gyroscope)
*** Magnetic Field Vector (magnetometer)

This brings up the issue of respecting the privacy of others in shared
spaces through which ~ninfacyzga-01~ may pass. ~ninfacyzga-01~ should
encrypt data it records according to rules set by its owner.

One permissive rule could be that if ~ninfacyzga-01~ detects that a
person (let's call her Alice) enters a room, it should add Alice's
encryption public key to the list of recipients against which it
encrypts data without Alice having to know how ~ninfacyzga-01~ is
programmed (she might have a ~calkuptcana~ agent on her person that
broadcasts her privacy preferences). Meanwhile, ~ninfacyzga-01~ may
publish its observations to a repository that Alice and other members
of the shared communal space have access to (ex: a read-only shared
directory on a local WiFi network). Alice could download all the files
in the shared repository but she would only be able to decrypt files
generated when she was physically near enough to ~ninfacyzga-01~ for
it to detect that her presence was within some spatial boundary.

A more restrictive rule could resemble the permissive rule in that
~ninfacyzga-01~ uses Alice's encryption public key only when she is
physically nearby, except that it encrypts logged files against public
keys in a sequential manner. This would mean that all people who were
near ~ninfacyzga-01~ would have to pass each log file around to each
other so that they could decrypt the content.

That said, according to [[https://www.rsyslog.com/doc/master/tutorials/database.html][this ~rsyslog~ page]], ~rsyslog~ is more a data
wrangling system for collecting data from disparate sources of
different types and outputting data to text files on disk than a
system committed to the server-client model of database storage. So, I
think converting ~bkgpslog~ into a ~bklog~ script that appends
encrypted and compressed data to a tar file for later extraction
(possibly by the same script with future features) would be best.

** TODO Place persistent recip. updates in asynchronous coproc
2020-07-06T19:37Z; bktei> In order to update the recipient list, the
magicParseRecipientDir() function needs to be run each buffer period
in order to scan for changes in the recipient list. However, such a
scan takes time; if the magicGatherWriteBuffer() function must pause
until magicParseRecipientDir() completes, then a significant pause
between buffer sessions may occur, causing detectable gaps in location
data between buffer rounds.

I looked for ways in which I might start magicParseRecipientDir()
asynchronously immediately before running the data collection command
and then collect its output at the start of the next buffer round. One
way using the ~coproc~ Bash built-in is described [[https://stackoverflow.com/a/20018504/10850071][here]]. I'd have to
make the asynchronous function output the recipient list to stdout
which would then be ~read~ into the ~recPubKeysValid~ array in the
main loop. However, for now, I'm putting magicParseRecipientDir()
as-is in the main loop and accepting the delay.
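A sketch of the "start the scan now, collect it next round" pattern.
Note that this uses process substitution with a saved file descriptor
rather than ~coproc~, since that keeps the pipe readable after the
background job has exited; it needs Bash 4.1 or later.
~parseRecipientDirDemo~ is a hypothetical stand-in for
magicParseRecipientDir(), and both sleeps are arbitrary placeholders:

#+BEGIN_SRC bash
#!/usr/bin/env bash
# Hypothetical stand-in for magicParseRecipientDir(): a slow scan
# that prints one recipient per line on stdout.
parseRecipientDirDemo() {
    sleep 0.2
    printf '%s\n' "age1alice" "age1bob"
}

recPubKeysValid=()
fdScan=""

for round in 1 2 3; do
    # Collect the result of the scan started during the previous round.
    if [[ -n "$fdScan" ]]; then
        mapfile -t recPubKeysValid <&"$fdScan"
        exec {fdScan}<&-   # close the descriptor once it has been drained
    fi
    # Start the next scan in the background; its output waits in the pipe.
    exec {fdScan}< <(parseRecipientDirDemo)
    sleep 0.3   # placeholder for the data collection command
    echo "round $round: ${#recPubKeysValid[@]} recipient(s)"
done
exec {fdScan}<&-
#+END_SRC

The first round runs with an empty recipient list, so a real
implementation would likely do one synchronous scan before entering
the loop.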
* bkgpslog narrative
** Initialize environment
*** Init variables
**** Save timeStart (YYYYmmddTHHMMSS±zz)
*** Define Functions
**** Define Debugging functions
**** Define Argument Processing function
**** Define Main function
** Run Main Function
*** Process Arguments
*** Set output encryption and compression option strings
*** Check that critical apps and dirs are available, display missing ones.
*** Set lifespans of script and buffer
*** Init temp working dir ~DIR_TMP~
Make temporary dir in tmpfs dir: ~/dev/shm/$(nonce)..bkgpslog/~ (~DIR_TMP~)
*** Initialize ~tar~ archive
**** Write ~bkgpslog~ version to ~$DIR_TMP/VERSION~
**** Create empty ~tar~ archive in ~DIR_OUT~ at ~PATHOUT_TAR~

Set output file name to:
: PATHOUT_TAR="$DIR_OUT/YYYYmmdd..hostname_location.gz.age.tar"
Usage: ~iso8601Period $timeStart $timeEnd~

**** Append ~VERSION~ file to ~PATHOUT_TAR~

Append ~$DIR_TMP/VERSION~ to ~PATHOUT_TAR~ via ~tar --append~
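
A minimal sketch of these initialization steps, assuming GNU ~tar~.
The directories, archive name, and version string below are
placeholders, not the values ~bkgpslog~ actually uses:

#+BEGIN_SRC bash
#!/usr/bin/env bash
# Placeholder locations standing in for DIR_TMP, DIR_OUT and PATHOUT_TAR.
dirTmp="$(mktemp -d)"
dirOut="$(mktemp -d)"
pathoutTar="$dirOut/demo.tar"

# Create an empty archive (an empty file list gives a valid empty tar).
tar --create --file="$pathoutTar" --files-from=/dev/null

# Write a VERSION file and append it, stored without its directory prefix.
echo "version=0.0.0-demo" > "$dirTmp/VERSION"
tar --append --directory="$dirTmp" --file="$pathoutTar" VERSION

members="$(tar --list --file="$pathoutTar")"
echo "$members"
rm -rf "$dirTmp" "$dirOut"
#+END_SRC

Using ~--directory~ keeps the temporary path out of the archive, so
the member is stored simply as ~VERSION~.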

*** Read/Write Loop (Record gps data until script lifespan ends)
**** Determine output file paths
**** Define GPS conversion commands
**** Fill Bash variable buffer from ~gpspipe~
**** Process bufferBash, save secured chunk set to ~DIR_TMP~
**** Append each secured chunk to ~PATHOUT_TAR~
: tar --append --directory=DIR_TMP --file=PATHOUT_TAR $(basename PATHOUT_{NMEA,GPX,KML} )
**** Remove secured chunk from ~DIR_TMP~