path: root/src/ingestor.cpp
Commit message (Author, Age)
* Add ThreadSafeT helper (Dan Goodliffe, 2026-05-17)
| | | | | | | Wraps a templated value and a templated mutex (defaults to shared_mutex) and provides safe access, locked with either a shared_lock (const value) or lock_guard (non-const value). Applies this to existingEntities.
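A minimal sketch of the idea described above, not the project's actual ThreadSafeT. The class name `ThreadSafeSketch`, the `read`/`write` callback interface and the demo function are assumptions made for illustration; only the locking scheme (shared_lock for const access, lock_guard for non-const access, shared_mutex by default) comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <shared_mutex>
#include <utility>

// Hypothetical wrapper: a value plus the mutex that guards it.
template<typename T, typename Mutex = std::shared_mutex>
class ThreadSafeSketch {
public:
    // Const access: many readers may hold the shared lock at once.
    template<typename Fn>
    auto
    read(Fn && fn) const
    {
        std::shared_lock lock {mutex};
        return std::forward<Fn>(fn)(value); // fn sees const T &
    }

    // Non-const access: exclusive lock for the duration of the callback.
    template<typename Fn>
    auto
    write(Fn && fn)
    {
        std::lock_guard lock {mutex};
        return std::forward<Fn>(fn)(value); // fn sees T &
    }

private:
    mutable Mutex mutex; // mutable so that read() can lock it
    T value {};
};

// Usage: a set of known ids standing in for existingEntities.
inline std::size_t
threadSafeSketchDemo()
{
    ThreadSafeSketch<std::set<int>> existing;
    existing.write([](std::set<int> & ids) {
        ids.insert(42);
    });
    const auto count = existing.read([](const std::set<int> & ids) {
        return ids.size();
    });
    assert(count == 1);
    return count;
}
```

Passing a callback keeps the lock lifetime tied to a single scope, so a reference to the guarded value cannot outlive the lock unless the callback leaks it deliberately.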
* Check terminated flag in jobPurgeOldLogs [webstat-0.5.1] (Dan Goodliffe, 2026-05-13)
| | | | | | It's not critical to run to completion during shutdown, we can pick up where we left off on the next run. This will allow us to bail out instead of holding up the shutdown process.
* Fix checking of background job completeness (Dan Goodliffe, 2026-05-13)
| | | | | | | | | | | | | OK, this is on me. future::valid does not tell you if the thread has completed and the result is ready. It tells you if there is some state you can get() maybe later. Here we replace those checks with a 0s wait and a test to see if the status is ready. The exit steps are updated to reflect this. Calling reset will invoke the future's destructor, which blocks until the thread is joined as needed. This should hopefully address the issue where the main thread would still block if it attempted to runJobAsNeeded for a job that was still running.
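The difference between `valid()` and readiness can be sketched as follows. The helper name `isReady` and the demo are hypothetical, not taken from ingestor.cpp; the standard library calls (`std::future::valid`, `std::future::wait_for`, `std::future_status::ready`) are as documented for C++11 and later.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical helper: true only when the job has finished and get() would
// return without blocking. valid() alone only says there is shared state.
template<typename T>
bool
isReady(const std::future<T> & job)
{
    using namespace std::chrono_literals;
    return job.valid() && job.wait_for(0s) == std::future_status::ready;
}

inline bool
futureReadyDemo()
{
    std::promise<void> release;
    // The job blocks on a gate so that it is certainly still running below.
    auto job = std::async(std::launch::async, [gate = release.get_future()]() {
        gate.wait();
        return 7;
    });
    const bool validWhileRunning = job.valid(); // state exists already
    const bool readyWhileRunning = isReady(job); // but no result yet
    release.set_value(); // let the job finish
    const int result = job.get(); // blocks until done, then releases the state
    assert(!job.valid());
    return validWhileRunning && !readyWhileRunning && result == 7;
}
```

The sketch launches with `std::launch::async` because a deferred future reports `std::future_status::deferred` from `wait_for` and would never become ready by polling.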
* Remove maxBatches (Dan Goodliffe, 2026-05-13)
| | | | It's not required; DB insertion occurs only in a background thread now.
* Force use of std::launch::async [webstat-0.5] (Dan Goodliffe, 2026-05-10)
| | | | | Deferred policy is not good enough to avoid blocking the main thread... it would just mean it blocked later and that's not the point here.
* Handle completed curl operations in a job (Dan Goodliffe, 2026-05-07)
| | | | | | Removes the need to block the main thread from reading stdin while performing post curl operation actions, such as updating user agent details.
* Add support for conditional job execution (Dan Goodliffe, 2026-05-07)
| | | | | Performs a check before launching a job thread, rather than just having it exit immediately.
* Only call entity insert handler if detail is null (Dan Goodliffe, 2026-05-05)
| | | | | | | Improves handling of entity inserts where the entity already exists and already has detail; does not call the onInsert handler. This avoids repeatedly fetching UA detail every time the UA is first seen by a process.
* Add override of insert helper for tuples (Dan Goodliffe, 2026-05-05)
| | | | | When Fields... is more than a single type, returns a tuple of fields, instead of a single value.
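One way to get "single value for one field, tuple for several" is sketched below. `ResultOf` and `readFields` are hypothetical names, and reading from a `std::tuple` stands in for whatever the real insert helper reads from; the actual helper may well be written differently (for example as a separate overload).

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>

// Hypothetical alias: T for a single field, std::tuple<Fields...> otherwise.
// Requires at least one field.
template<typename... Fields>
using ResultOf = std::conditional_t<sizeof...(Fields) == 1,
        std::tuple_element_t<0, std::tuple<Fields...>>, std::tuple<Fields...>>;

// Hypothetical reader: unwraps the tuple only when it holds exactly one field.
template<typename... Fields>
ResultOf<Fields...>
readFields(const std::tuple<Fields...> & row)
{
    if constexpr (sizeof...(Fields) == 1) {
        return std::get<0>(row);
    }
    else {
        return row;
    }
}

static_assert(std::is_same_v<ResultOf<int>, int>);
static_assert(std::is_same_v<ResultOf<int, std::string>, std::tuple<int, std::string>>);
```

Callers asking for one field then get a plain value, while callers asking for several can use structured bindings on the returned tuple.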
* Start curl operations from any thread (Dan Goodliffe, 2026-05-02)
| | | | | Ingest is now background only, so don't limit where they're started from. Adds some unfortunate locking around the curl maps.
* Ingest log lines in a background thread (Dan Goodliffe, 2026-05-02)
| | | | This prevents halting reading input during data insertion.
* Rename jobIngestParkedLines to jobReadParkedLines (Dan Goodliffe, 2026-05-01)
| | | | | Actual ingest is performed by the main process, jobReadParkedLines just reads the park file and adds it to the queue.
* Append unparked lines to queue in finalise function (Dan Goodliffe, 2026-05-01)
| | | | Fixes issue where queuedLines would be accessed from background thread.
* Return a callable from jobs (Dan Goodliffe, 2026-05-01)
| | | | Allows safely running finalisation code in the main thread if required.
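A rough sketch of the pattern: the background job does the slow work without touching shared state and hands back a callable that the main thread runs afterwards. `IngestorSketch`, `Finaliser`, `jobReadParked` and the hard-coded lines are all illustrative assumptions, not the real ingestor code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

using Finaliser = std::function<void()>; // hypothetical job result type

struct IngestorSketch {
    std::vector<std::string> queuedLines; // only ever touched by the main thread

    // Runs on a background thread. It gathers its results locally and returns
    // a callable that applies them; it does not modify queuedLines itself.
    Finaliser
    jobReadParked()
    {
        std::vector<std::string> lines {"line 1", "line 2"}; // stand-in for reading a park file
        return [this, lines = std::move(lines)]() mutable {
            queuedLines.insert(queuedLines.end(), std::make_move_iterator(lines.begin()),
                    std::make_move_iterator(lines.end()));
        };
    }
};

inline std::size_t
jobFinaliserDemo()
{
    IngestorSketch ingestor;
    auto job = std::async(std::launch::async, &IngestorSketch::jobReadParked, &ingestor);
    if (auto finalise = job.get()) { // back on the main thread
        finalise(); // safe: no other thread is using queuedLines
    }
    return ingestor.queuedLines.size();
}
```

Because the queue is only modified inside the finaliser, it needs no lock of its own in this arrangement.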
* Explicitly wait for and finalize any running jobs on exit (Dan Goodliffe, 2026-05-01)
|
* Limit the number of lines stored at once (Dan Goodliffe, 2026-05-01)
| | | | | | | Limits the number of lines inserted per transaction, and the number of transactions before returning to reading input. Prevents long-running transactions when the queue of lines has grown in size.
* Add BRIN index to access_log.request_time and improve purge (Dan Goodliffe, 2026-04-23)
| | | | Purge is now fully request_time based and not hacked around id ranges.
* Revert "Save point only if there are new entities" (Dan Goodliffe, 2026-04-20)
| | | | This reverts commit a6d31ff1d8703eae9375b7ec1cd01b323d7e8e6e.
* Save point only if there are new entities (Dan Goodliffe, 2026-04-15)
| | | | Line insert is only a single operation with no new entities.
* 4 fields is more than enough for Entity to be a fully-fledged type (Dan Goodliffe, 2026-04-15)
|
* Replace use of crc32 for entity id (Dan Goodliffe, 2026-04-15)
| | | | | Entity value is MD5 hashed same as DB unique key, but the id itself is now taken from the DB primary key which is sequence generated.
* Introduce MD5 from libmd, use it for hashing queuedLines for park path (Dan Goodliffe, 2026-04-11)
|
* Return path of parked lines log file from parkQueuedLogLines (Dan Goodliffe, 2026-04-10)
| | | | Or the last errno on failure.
* Revise stats and add signal handlers to log them and reset them [webstat-0.3] (Dan Goodliffe, 2026-03-25)
| | | | Also logs them on main loop exit.
* Employ temporary/short files to handle errors reading/writing park logs (Dan Goodliffe, 2026-03-22)
|
* Add missing -Wshadow (Dan Goodliffe, 2026-03-22)
|
* Add logging :-o (Dan Goodliffe, 2026-03-20)
| | | | | | | Adds virtual log function, real implementation writes to syslog. Test implementation writes to BOOST_TEST_MESSAGE, perf implementation discards. Replaces existing prints to stderr and adds logs to all key points.
* Insert log entries in batches (Dan Goodliffe, 2026-03-20)
| | | | | | | | | | Store log lines in memory until a threshold is reached or idle occurs, then insert all the lines in a single transaction. Save points handle the case of insertion errors. On success the queue is cleared. Parked lines are also saved in bulk, which is only necessary if queued lines could not be inserted on shutdown; otherwise the queue simply grows until the ability to insert is restored. Importing parked lines just adds them to the queue and the normal process then follows.
* Gracefully handle SIGTERM (Dan Goodliffe, 2026-03-19)
| | | | | Apache sends SIGTERM to the logger process to shut it down. Honestly I thought it would just close stdin and I should have checked.
* Count and return the number of parked lines ingested (Dan Goodliffe, 2026-03-18)
|
* Use std::future over std::thread for background jobs (Dan Goodliffe, 2026-03-17)
| | | | | Easier checking if a job has completed [successfully] and resetting state for the next time.
* Don't start new curl operations outside the main thread (Dan Goodliffe, 2026-03-17)
| | | | | Neither the curl handle nor the operation map is thread safe. This isn't ideal, but it does solve the problem in a safe manner.
* Execute jobs even when processing incoming logs (Dan Goodliffe, 2026-03-17)
| | | | | Jobs run on background threads now, so we can happily run them even when we're busy.
* Run jobs on a background thread (Dan Goodliffe, 2026-03-17)
|
* Process new field, content-type, in input stream (Dan Goodliffe, 2026-01-18)
|
* Attempt to save uninsertable log lines to the entities table (Dan Goodliffe, 2026-01-17)
| | | | | | If that fails, we still park them as before, such as when the DB is unavailable. Those which are saved as entities require investigation into why they couldn't be inserted, much like UnparsableLines.
* Add job for purging old access log entries from the database (Dan Goodliffe, 2025-12-20)
|
* Add a few no lint comments (Dan Goodliffe, 2025-12-20)
|
* Replace that awful magic number heavy mapping function (Dan Goodliffe, 2025-10-16)
| | | | | Now a tuple of mapping functors and we pass each value through its corresponding converter.
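The general technique of pairing each value with its own converter can be sketched like this. `convertEach`, the three lambdas and the sample values are illustrative assumptions; the real converters and field types in ingestor.cpp are not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <tuple>
#include <utility>

// Applies converter I to value I for every index and collects the results.
template<typename Converters, typename Values, std::size_t... I>
auto
convertEachImpl(const Converters & converters, const Values & values, std::index_sequence<I...>)
{
    return std::make_tuple(std::get<I>(converters)(std::get<I>(values))...);
}

template<typename... Converters, typename... Values>
auto
convertEach(const std::tuple<Converters...> & converters, const std::tuple<Values...> & values)
{
    static_assert(sizeof...(Converters) == sizeof...(Values), "one converter per value");
    return convertEachImpl(converters, values, std::index_sequence_for<Values...> {});
}

// Usage with made-up fields: a numeric status, a verb, and a "-" placeholder flag.
inline std::tuple<int, std::string, bool>
convertEachDemo()
{
    const std::tuple converters {
            [](std::string_view s) { return std::stoi(std::string {s}); },
            [](std::string_view s) { return std::string {s}; },
            [](std::string_view s) { return s == "-"; },
    };
    const std::tuple<std::string_view, std::string_view, std::string_view> values {"200", "GET", "-"};
    return convertEach(converters, values);
}
```

The position in the tuple replaces any hard-coded field index, and the `static_assert` catches a converter list that falls out of step with the field list at compile time.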
* Refactor handling of new entity insert (Dan Goodliffe, 2025-10-15)
| | | | | Replaces weird select with one thing with a function pointer stored in the type definition array.
* Update comments on custom_log format (Dan Goodliffe, 2025-10-15)
|
* Always handle curl things if there are any [webstat-0.2.2] (Dan Goodliffe, 2025-10-10)
|
* Fix premature remembering of saved entity ids [webstat-0.2.1] (Dan Goodliffe, 2025-10-09)
| | | | | | | Don't persist entity ids saved to the DB until the transaction is committed. Prevents the issue where a later DB operation fails, the transaction is rolled back, but we still think the entity has been saved.
* Add parked line import job [webstat-0.2] (Dan Goodliffe, 2025-10-06)
| | | | Periodically, on idle, scan for and import previously parked lines.
* Add point to execute scheduled jobs when idle (Dan Goodliffe, 2025-10-02)
|
* Write log lines to files on error (Dan Goodliffe, 2025-09-30)
| | | | | We call this parking, later we can reattempt ingestion after whatever caused the failure has been fixed.
* Create settings structure (Dan Goodliffe, 2025-09-24)
| | | | | | | Holds all the settings and their defaults for use in program_options and tests. Disables missing-field-initializers in tests because it's oversensitive to structures with defaults where you only provide some values specifically.
* Write unparsable lines to the entity table (Dan Goodliffe, 2025-09-23)
| | | | Diagnostics and the ability to ingest later.
* Make DB pool protected for access from unit tests (Dan Goodliffe, 2025-09-23)
|
* Create and perform UA lookup curl op when new user agent is encountered (Dan Goodliffe, 2025-09-13)
|