Add AsyncDbWorker: a persistent background thread with a dedup queue that
executes DB2 writes asynchronously, keeping mon's 20 ms cycle free of
blocking I/O.
Changes:
- async_db_worker.h/.cc: singleton worker, submit() with rule_id dedup,
  drain_and_stop() for clean shutdown
- eqp_stat.h/.cc: new update_static(ruleid, shear_times, running_time)
  overload that skips redundant DB reads for known values (reduces
  5 SELECTs to 3 per persist cycle)
- exp_times.cc: extract persist_exp_times() as a standalone function;
  update_history_times() snapshots values and submits to the worker
  (returns immediately); reset_dev_data() uses a direct SHM update
- eqpalg_icei.cpp: destructor calls alg_mgr_.reset() and then
  drain_and_stop(), so all algorithm threads are stopped before the
  worker is drained
Risk: re-run cmake .. to pick up the new async_db_worker.cc file.
Three fixes in update_history_times():
1. Wrap DB operations in try-catch, so an exception no longer skips the
   snapshot restore, preventing permanent loss of accumulated counts
2. Treat a -1 return from get_history_times() (DB failure) as skip, not
   as "record exists", so there is no more silent UPDATE on non-existent
   rows
3. Only call update_static and advance last_load_time_ on success,
   so a failed persist retries on the next cycle instead of waiting
   another rw_time_ minutes
Mon's update_map_rule() called update_cold() which blindly copied
RuleStatLocal's stat_values (always empty in mon) and fetch_mark
(always false in mon) into SHM, destroying accumulated data and
breaking the mon-cron handshake.
stat_values and fetch_mark are managed exclusively by the
add_stat_value/get_stat_value handshake. The cold sync path only
needs to transport running_time and shear_times.
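
A cold sync restricted to those two fields could look like this sketch. The
struct layouts and field types are assumptions (the real SHM record uses
shared-memory containers, modelled here with std::vector), and RuleStatShm
is a hypothetical name for the shared-memory side.

```cpp
#include <vector>

// Process-local view of a rule's statistics (layout assumed).
struct RuleStatLocal {
    double running_time = 0.0;
    long shear_times = 0;
    std::vector<double> stat_values;  // always empty in mon
    bool fetch_mark = false;          // always false in mon
};

// Shared-memory view (hypothetical name, layout assumed).
struct RuleStatShm {
    double running_time = 0.0;
    long shear_times = 0;
    std::vector<double> stat_values;  // owned by add_stat_value/get_stat_value
    bool fetch_mark = false;          // owned by the same handshake
};

// Copies only the cold counters. stat_values and fetch_mark are deliberately
// left alone so the mon-cron handshake state in SHM survives the sync.
void update_cold(RuleStatShm& shm, const RuleStatLocal& local) {
    shm.running_time = local.running_time;
    shm.shear_times = local.shear_times;
}
```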
Dynamic shared-memory vectors no longer cause segfaults from
unbounded growth, so the brute-force file deletion on every
start is unnecessary. This is consistent with e21b2af, which
removed the same pattern for TaskData_boost.mmap.
Commit e21b2af changed TaskShm map value from DataRecord (flat array)
to TaskRecord (struct wrapping shm_vector_f), but three call sites in
exp_base.cpp didn't drill into the .data_record member — they called
size()/operator[]/push_back() on TaskRecord itself, which has none.
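
The shape of the corrected call sites can be illustrated as below. This is
a sketch under assumptions: shm_vector_f is aliased to std::vector<float>
so the example compiles on its own, and the three helper functions are
hypothetical, not the actual code in exp_base.cpp.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the shared-memory float vector used by the real code.
using shm_vector_f = std::vector<float>;

// TaskRecord only wraps the vector; it has no size(), operator[] or
// push_back() of its own.
struct TaskRecord {
    shm_vector_f data_record;
};

// Each access goes through the .data_record member.
void append_sample(TaskRecord& rec, float value) {
    rec.data_record.push_back(value);
}

std::size_t sample_count(const TaskRecord& rec) {
    return rec.data_record.size();
}

float sample_at(const TaskRecord& rec, std::size_t index) {
    return rec.data_record[index];
}
```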
DataRecord used a fixed float[129600000] consuming 5 GB of disk even when
collecting only a few hundred data points. Replaced with shm_vector_f,
which grows on demand via push_back. This removes the need for rm -rf on
process exit: the vector destructor frees memory back to the segment.
Also drops the now-unnecessary task_data_size member.
HandlerExec in task mode now sets is_running_=false when rule_pointers_
and once_exec_queue_ are both empty. Manager cleanup uses a two-phase
lock (shared_lock scan + unique_lock destroy/erase), synchronized with
exec_task via handles_mutex. exec_task checks is_running_ before submit
and destroys dead handlers to prevent task loss. Also fix the logReset
self-assignment no-op.
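
The two-phase cleanup and the is_running_ check in exec_task can be
sketched as follows. HandlerSketch, ManagerSketch, mark_idle() and the
string task key are hypothetical; only the shared_lock scan, the
unique_lock erase and the shared handles_mutex follow the description
above. The sketch assumes C++17 for std::shared_mutex.

```cpp
#include <atomic>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Stand-in for HandlerExec; only the running flag is modelled.
struct HandlerSketch {
    std::atomic<bool> is_running{true};
};

class ManagerSketch {
public:
    // Returns true when a fresh handler had to be created for the task.
    bool exec_task(const std::string& key) {
        std::unique_lock<std::shared_mutex> lock(handles_mutex_);
        auto it = handles_.find(key);
        if (it != handles_.end() && it->second->is_running) {
            return false;  // live handler: the task would be submitted to it
        }
        // Missing or dead: replace it instead of submitting to a stopped one.
        handles_[key] = std::make_unique<HandlerSketch>();
        return true;
    }

    // Models a handler noticing that it has no work left.
    void mark_idle(const std::string& key) {
        std::shared_lock<std::shared_mutex> lock(handles_mutex_);
        auto it = handles_.find(key);
        if (it != handles_.end()) it->second->is_running = false;
    }

    // Phase 1 scans under a shared lock; phase 2 erases under a unique lock.
    std::size_t cleanup() {
        std::vector<std::string> dead;
        {
            std::shared_lock<std::shared_mutex> lock(handles_mutex_);
            for (const auto& entry : handles_) {
                if (!entry.second->is_running) dead.push_back(entry.first);
            }
        }
        if (dead.empty()) return 0;

        std::unique_lock<std::shared_mutex> lock(handles_mutex_);
        std::size_t erased = 0;
        for (const auto& key : dead) {
            auto it = handles_.find(key);
            // Re-check: exec_task may have replaced the handler in between.
            if (it != handles_.end() && !it->second->is_running) {
                handles_.erase(it);
                ++erased;
            }
        }
        return erased;
    }

    std::size_t size() const {
        std::shared_lock<std::shared_mutex> lock(handles_mutex_);
        return handles_.size();
    }

private:
    mutable std::shared_mutex handles_mutex_;
    std::map<std::string, std::unique_ptr<HandlerSketch>> handles_;
};
```

The re-check in phase 2 matters because the lock is released between the
two phases, so a handler seen as dead during the scan may already have been
replaced by exec_task.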
The file-deletion workaround was needed because bipc::string items in
shared memory would segfault on restart when tag names exceeded SSO length.
Now that display data (items, etc.) lives in the local-memory DisplayCache
and only cold doubles remain in shared memory, the dangling-allocator bug
no longer exists. Deleting the file also broke mon-cron IPC across restarts.