path: root/src
Commit message | Author | Age
* Don't order rows when purgingDan Goodliffe5 days
| | | | | | | Purging rows in request_time order is not necessary (all rows will be deleted at some point in the process anyway) and is extremely expensive. Removing the ORDER BY clause gives a roughly 300x speed up.
* Switch to std::fstream and std::print for parkLogLinesDan Goodliffe2026-05-20
|
* Change bytesToHexRange into a transform view variableDan Goodliffe2026-05-20
| | | | | Simply create the transform as a constexpr variable and then apply it to any collection, without the restriction of passing it to a function.
* Add index for uninsertable lines with no failure detailDan Goodliffe2026-05-19
| | | | This index would be empty under normal conditions.
* Run the retry uninsertable process batchedDan Goodliffe2026-05-19
| | | | | Standard sized batch in a transaction, ordered by entity id. Includes early exit if terminated.
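
The batching with early exit described above could be sketched roughly as follows. This is an illustration only: `runBatched` and the `processBatch` callback are hypothetical names, and the real job presumably wraps each batch in a database transaction, which is left out here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical batch driver: calls processBatch repeatedly until it reports
// that nothing was left to do, or until the terminated flag is set.
// processBatch is assumed to handle one batch (one transaction in the real
// job) and return the number of items it processed.
template<typename Batch>
std::size_t runBatched(const std::atomic_bool & terminated, Batch && processBatch)
{
    std::size_t total = 0;
    while (!terminated) {
        const std::size_t done = processBatch();
        if (done == 0) {
            break; // nothing left to process
        }
        total += done;
    }
    return total;
}
```

Checking the flag between batches, rather than inside one, means a batch that has started always completes, and the next run picks up whatever remains.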
* Add job to retry insertion of log lines which had previously failedDan Goodliffe2026-05-18
| | | | | | Entities are reparsed and reinserted, removed on success. Failure to parse updates the entity type to UnparsableLine. Failure to insert again updates the detail with the reason.
* Extract ENTITY_IDS helperDan Goodliffe2026-05-17
| | | | Transform view for getting just the hash and record of a stored entity.
* Add ThreadSafeT helperDan Goodliffe2026-05-17
| | | | | | | Wraps a templated value and a templated mutex (defaults to shared_mutex) and provides safe access, locked with either a shared_lock (const value) or lock_guard (non-const value). Applies this to existingEntities.
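
One possible shape for such a wrapper is sketched below. The callback-based `read`/`write` interface and the class name `ThreadSafe` are assumptions made for illustration; the commit message only states that a value and a mutex are wrapped, with shared_lock for const access and lock_guard for non-const access.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical sketch: a value paired with a mutex (std::shared_mutex by
// default). Readers share the lock, writers hold it exclusively.
template<typename T, typename Mutex = std::shared_mutex>
class ThreadSafe {
public:
    // Const access under a shared lock; fn receives a const reference.
    template<typename Fn>
    auto read(Fn && fn) const
    {
        std::shared_lock lock {mutex};
        return std::forward<Fn>(fn)(value);
    }

    // Mutable access under an exclusive lock; fn receives a mutable reference.
    template<typename Fn>
    auto write(Fn && fn)
    {
        std::lock_guard lock {mutex};
        return std::forward<Fn>(fn)(value);
    }

private:
    T value {};
    mutable Mutex mutex;
};
```

Passing a callback keeps the lock's lifetime tied to the access, so a reference to the guarded value cannot easily outlive the lock.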
* Check terminated flag in jobPurgeOldLogs [webstat-0.5.1] Dan Goodliffe 2026-05-13
| | | | | | It's not critical to run to completion during shutdown, we can pick up where we left off on the next run. This will allow us to bail out instead of holding up the shutdown process.
* Fix checking of background job completenessDan Goodliffe2026-05-13
| | | | | | | | | | | | | OK, this is on me. future::valid does not tell you if the thread has completed and the result is ready. It tells you if there is some state you can get() maybe later. Here we replace those checks with a 0s wait and a test to see if the status is ready. The exit steps are updated to reflect this. Calling reset will invoke the future's destructor, which blocks until the thread is joined as needed. This should hopefully address the issue where the main thread would still block if it attempted to runJobAsNeeded for a job that was still running.
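
The difference between `valid()` and a zero-length wait can be sketched as below. `isReady` is a hypothetical helper name, not the function used in the repository; the standard library calls are real.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical helper: true only when the future has shared state AND its
// result can be collected without blocking. valid() alone only reports that
// there is state to get() at some point.
template<typename T>
bool isReady(const std::future<T> & job)
{
    using namespace std::chrono_literals;
    return job.valid() && job.wait_for(0s) == std::future_status::ready;
}
```

A future obtained from a promise is valid from the start but only becomes ready once a value is set; after `get()` it is no longer valid, so the helper reports false in both the "still running" and "already collected" cases.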
* Remove maxBatchesDan Goodliffe2026-05-13
| | | | It's not required; DB insertion occurs only in a background thread now.
* Force use of std::launch::async [webstat-0.5] Dan Goodliffe 2026-05-10
| | | | | Deferred policy is not good enough to avoid blocking the main thread... it would just mean it blocked later and that's not the point here.
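
A small sketch of why the launch policy matters. `ranOnOtherThread` is a hypothetical helper written for this illustration; it reports whether the task ran on a thread other than the one that called `get()`.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Hypothetical helper: launches a trivial task with the given policy and
// reports whether it executed on a different thread from the caller.
bool ranOnOtherThread(std::launch policy)
{
    const auto caller = std::this_thread::get_id();
    auto job = std::async(policy, [] { return std::this_thread::get_id(); });
    return job.get() != caller;
}
```

With `std::launch::deferred` the task only runs when the caller asks for the result, on the caller's own thread, so the work is postponed rather than moved off the main thread.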
* Be specific about values from entities tableDan Goodliffe2026-05-10
| | | | Fixes compatibility with PostgreSQL 18 which fails due to ambiguity.
* Handle completed curl operations in a jobDan Goodliffe2026-05-07
| | | | | | Removes the need to block the main thread from reading stdin while performing post curl operation actions, such as updating user agent details.
* Add support for conditional job executionDan Goodliffe2026-05-07
| | | | | Performs a check before launching a job thread, rather than just having it exit immediately.
* Only call entity insert handler if detail is nullDan Goodliffe2026-05-05
| | | | | | | Improves handling of entity inserts where the entity already exists and already has detail; does not call the onInsert handler. This avoids repeatedly fetching UA detail every time the UA is first seen by a process.
* Add override of insert helper for tuplesDan Goodliffe2026-05-05
| | | | | When Fields... is more than a single type, returns a tuple of fields, instead of a single value.
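
The single-value versus tuple return could be expressed along these lines. This is only a sketch of the technique: `unpackRow` is a hypothetical name and the real insert helper's signature is not shown in the log.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>

// Hypothetical sketch: with exactly one field the bare value is returned,
// otherwise the whole tuple of fields is returned.
template<typename... Fields>
auto unpackRow(const std::tuple<Fields...> & row)
{
    if constexpr (sizeof...(Fields) == 1) {
        return std::get<0>(row);
    }
    else {
        return row;
    }
}
```

The repository may well use separate overloads rather than `if constexpr`; both approaches give callers a plain value in the common single-field case.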
* Switch to std::map for existingEntities cacheDan Goodliffe2026-05-05
| | | | Insertion cost of flat_map too high when map grows large.
* Start curl operations from any threadDan Goodliffe2026-05-02
| | | | | Ingest is now background only, so don't limit where they're started from. Adds some unfortunate locking around the curl maps.
* Ingest log lines in a background threadDan Goodliffe2026-05-02
| | | | This prevents halting reading input during data insertion.
* Rename jobIngestParkedLines to jobReadParkedLinesDan Goodliffe2026-05-01
| | | | | Actual ingest is performed by the main process, jobReadParkedLines just reads the park file and adds it to the queue.
* Append unparked lines to queue in finalise functionDan Goodliffe2026-05-01
| | | | Fixes issue where queuedLines would be accessed from background thread.
* Return a callable from jobsDan Goodliffe2026-05-01
| | | | Allows safely running finalisation code in the main thread if required.
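
A rough sketch of the pattern, under stated assumptions: `Ingestor`, `Finaliser` and `startReadParkedLines` are hypothetical names, and the two hard-coded lines stand in for reading a park file. The background task touches no shared state; it returns a callable that the main thread runs later to apply the results.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <string>
#include <utility>
#include <vector>

using Finaliser = std::function<void()>;

// Hypothetical sketch of a job that defers its shared-state update.
struct Ingestor {
    std::vector<std::string> queuedLines; // only ever touched by the main thread

    std::future<Finaliser> startReadParkedLines()
    {
        return std::async(std::launch::async, [this]() -> Finaliser {
            // Background thread: collect results locally (placeholder data).
            std::vector<std::string> unparked {"line one", "line two"};
            // Hand back the step that must run on the main thread.
            return [this, lines = std::move(unparked)]() {
                queuedLines.insert(queuedLines.end(), lines.begin(), lines.end());
            };
        });
    }
};
```

The main thread calls `get()` on the future once the job is ready and then invokes the returned callable, so `queuedLines` is never accessed from the background thread.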
* Explicitly wait for and finalize any running jobs on exitDan Goodliffe2026-05-01
|
* Limit the number of lines stored at onceDan Goodliffe2026-05-01
| | | | | | | Limits the number of lines inserted per transaction, and the number of transactions before returning to reading input. Prevents long running transactions in the case when queued lines has grown in size.
* Add indexes on all entity references in access_logDan Goodliffe2026-04-26
|
* Add BRIN index to access_log.request_time and improve purgeDan Goodliffe2026-04-23
| | | | Purge is now fully request_time based and not hacked around id ranges.
* Revert "Save point only if there are new entities"Dan Goodliffe2026-04-20
| | | | This reverts commit a6d31ff1d8703eae9375b7ec1cd01b323d7e8e6e.
* Swap int for integer in schemaDan Goodliffe2026-04-18
| | | | Plays better with apgdiff
* Save point only if there are new entitiesDan Goodliffe2026-04-15
| | | | Line insert is only a single operation with no new entities.
* 4 fields is more than enough for Entity to be a fully-fledged typeDan Goodliffe2026-04-15
|
* Replace use of crc32 for entity idDan Goodliffe2026-04-15
| | | | | Entity value is MD5 hashed same as DB unique key, but the id itself is now taken from the DB primary key which is sequence generated.
* Introduce MD5 from libmd, use it for hashing queuedLines for park pathDan Goodliffe2026-04-11
|
* Return path of parked lines log file from parkQueuedLogLinesDan Goodliffe2026-04-10
| | | | Or the last errno on failure.
* Parse escaping in query strings [webstat-0.3.1] Dan Goodliffe 2026-03-27
|
* Revise stats and add signal handlers to log them and reset them [webstat-0.3] Dan Goodliffe 2026-03-25
| | | | Also logs them on main loop exit.
* Employ temporary/short files to handle errors reading/writing park logsDan Goodliffe2026-03-22
|
* Add missing -WshadowDan Goodliffe2026-03-22
|
* Add logging :-oDan Goodliffe2026-03-20
| | | | | | | Adds virtual log function, real implementation writes to syslog. Test implementation writes to BOOST_TEST_MESSAGE, perf implementation discards. Replaces existing prints to stderr and adds logs to all key points.
* Insert log entries in batchesDan Goodliffe2026-03-20
| | | | | | | | | | Store log lines in memory until a threshold is reached or idle occurs, then insert all the lines in a single transaction. Save points handle the case of insertion errors. On success the queue is cleared. Parked lines are also saved in bulk, which is only necessary if queued lines could not be inserted on shutdown; otherwise the queue simply grows until the ability to insert is restored. Importing parked lines just adds them to the queue and the normal process then follows.
* Gracefully handle SIGTERMDan Goodliffe2026-03-19
| | | | | Apache sends SIGTERM to the logger process to shut it down. Honestly I thought it would just close stdin and I should have checked.
* Count and return the number of parked lines ingestedDan Goodliffe2026-03-18
|
* Replace unique constraint on entity value with index on hashDan Goodliffe2026-03-18
| | | | | | | UNIQUE CONSTRAINT is limited to 2704 bytes, which prevents inserting large values. Here we swap to a unique index on the MD5 hash of the value. This should more than suffice given we already map to a 32-bit value for the id, and the index size is much, much smaller.
* Fix typo in access_log_view definitionDan Goodliffe2026-03-18
| | | | Replaces accidentally duplicated user_agent for correct content_type.
* Use std::future over std::thread for background jobsDan Goodliffe2026-03-17
| | | | | Easier checking if a job has completed [successfully] and resetting state for the next time.
* Don't start new curl operations outside the main threadDan Goodliffe2026-03-17
| | | | | Neither the curl handle nor the operation map is thread safe. This isn't ideal, but it does solve the problem in a safe manner.
* Execute jobs even when processing incoming logsDan Goodliffe2026-03-17
| | | | | Jobs run on background threads now, so we can happily run them even when we're busy.
* Run jobs on a background threadDan Goodliffe2026-03-17
|
* Process new field, content-type, in input streamDan Goodliffe2026-01-18
|
* Attempt to save uninsertable log lines to the entities tableDan Goodliffe2026-01-17
| | | | | | If that fails, we still park them as before, such as when the DB is unavailable. Those which are saved as entities require investigation into why they couldn't be saved, much like UnparsableLines.