path: root/src
Commit message | Author | Age
* Handle completed curl operations in a job | Dan Goodliffe | 5 days
  Removes the need to block the main thread from reading stdin while performing post curl operation actions, such as updating user agent details.
* Add support for conditional job execution | Dan Goodliffe | 5 days
  Performs a check before launching a job thread, rather than just having it exit immediately.
* Only call entity insert handler if detail is null | Dan Goodliffe | 7 days
  Improves handling of entity inserts where the entity already exists and already has detail; does not call the onInsert handler. This avoids repeatedly fetching UA detail every time the UA is first seen by a process.
* Add override of insert helper for tuples | Dan Goodliffe | 7 days
  When Fields... is more than a single type, returns a tuple of fields, instead of a single value.
* Switch to std::map for existingEntities cache | Dan Goodliffe | 7 days
  Insertion cost of flat_map too high when map grows large.
* Start curl operations from any thread | Dan Goodliffe | 10 days
  Ingest is now background only, so don't limit where they're started from. Adds some unfortunate locking around the curl maps.
* Ingest log lines in a background thread | Dan Goodliffe | 10 days
  This prevents halting reading input during data insertion.
* Rename jobIngestParkedLines to jobReadParkedLines | Dan Goodliffe | 11 days
  Actual ingest is performed by the main process, jobReadParkedLines just reads the park file and adds it to the queue.
* Append unparked lines to queue in finalise function | Dan Goodliffe | 11 days
  Fixes issue where queuedLines would be accessed from background thread.
* Return a callable from jobs | Dan Goodliffe | 11 days
  Allows safely running finalisation code in the main thread if required.
* Explicitly wait for and finalize any running jobs on exit | Dan Goodliffe | 11 days
* Limit the number of lines stored at once | Dan Goodliffe | 11 days
  Limits the number of lines inserted per transaction, and the number of transactions before returning to reading input. Prevents long-running transactions when the queue of lines has grown large.
* Add indexes on all entity references in access_log | Dan Goodliffe | 2026-04-26
* Add BRIN index to access_log.request_time and improve purge | Dan Goodliffe | 2026-04-23
  Purge is now fully request_time based and not hacked around id ranges.
* Revert "Save point only if there are new entities" | Dan Goodliffe | 2026-04-20
  This reverts commit a6d31ff1d8703eae9375b7ec1cd01b323d7e8e6e.
* Swap int for integer in schema | Dan Goodliffe | 2026-04-18
  Plays better with apgdiff
* Save point only if there are new entities | Dan Goodliffe | 2026-04-15
  Line insert is only a single operation with no new entities.
* 4 fields is more than enough for Entity to be a fully-fledged type | Dan Goodliffe | 2026-04-15
* Replace use of crc32 for entity id | Dan Goodliffe | 2026-04-15
  Entity value is MD5 hashed, the same as the DB unique key, but the id itself is now taken from the DB primary key, which is sequence generated.
* Introduce MD5 from libmd, use it for hashing queuedLines for park path | Dan Goodliffe | 2026-04-11
* Return path of parked lines log file from parkQueuedLogLines | Dan Goodliffe | 2026-04-10
  Or the last errno on failure.
* Parse escaping in query strings [webstat-0.3.1] | Dan Goodliffe | 2026-03-27
* Revise stats and add signal handlers to log them and reset them [webstat-0.3] | Dan Goodliffe | 2026-03-25
  Also logs them on main loop exit.
* Employ temporary/short files to handle errors reading/writing park logs | Dan Goodliffe | 2026-03-22
* Add missing -Wshadow | Dan Goodliffe | 2026-03-22
* Add logging :-o | Dan Goodliffe | 2026-03-20
  Adds virtual log function, real implementation writes to syslog. Test implementation writes to BOOST_TEST_MESSAGE, perf implementation discards. Replaces existing prints to stderr and adds logs to all key points.
* Insert log entries in batches | Dan Goodliffe | 2026-03-20
  Store log lines in memory until a threshold is reached or idle occurs, then insert all the lines in a single transaction. Save points handle the case of insertion errors. On success the queue is cleared. Parked lines are also saved in bulk; this is only necessary if queued lines could not be inserted on shutdown, else the queue simply grows until the ability to insert is restored. Importing parked lines just adds them to the queue and the normal process then follows.
* Gracefully handle SIGTERM | Dan Goodliffe | 2026-03-19
  Apache sends SIGTERM to the logger process to shut it down. Honestly I thought it would just close stdin and I should have checked.
* Count and return the number of parked lines ingested | Dan Goodliffe | 2026-03-18
* Replace unique constraint on entity value with index on hash | Dan Goodliffe | 2026-03-18
  UNIQUE CONSTRAINT is limited to 2704 bytes, which prevents inserting large values. Here we swap to a unique index on the MD5 hash of the value. This should more than suffice given we already map to a 32-bit id, and the index size is much, much smaller.
* Fix typo in access_log_view definition | Dan Goodliffe | 2026-03-18
  Replaces accidentally duplicated user_agent for correct content_type.
* Use std::future over std::thread for background jobs | Dan Goodliffe | 2026-03-17
  Easier checking if a job has completed [successfully] and resetting state for the next time.
* Don't start new curl operations outside the main thread | Dan Goodliffe | 2026-03-17
  Neither the curl handle nor the operation map is thread safe. This isn't ideal, but it does solve the problem in a safe manner.
* Execute jobs even when processing incoming logs | Dan Goodliffe | 2026-03-17
  Jobs run on background threads now, so we can happily run them even when we're busy.
* Run jobs on a background thread | Dan Goodliffe | 2026-03-17
* Process new field, content-type, in input stream | Dan Goodliffe | 2026-01-18
* Attempt to save uninsertable log lines to the entities table | Dan Goodliffe | 2026-01-17
  If that fails, we still park them as before, such as when the DB is unavailable. Those which are saved as entities require investigation into why they couldn't be saved, much like UnparsableLines.
* pg_format schema.sql and sql/*.sql | Dan Goodliffe | 2026-01-17
  No changes.
* Add job for purging old access log entries from the database | Dan Goodliffe | 2025-12-20
* Add a few no lint comments | Dan Goodliffe | 2025-12-20
* Add support for configuring frequency of parked line job | Dan Goodliffe | 2025-12-20
* Add utility for parsing an ISO-like duration | Dan Goodliffe | 2025-12-20
* Replace that awful magic number heavy mapping function | Dan Goodliffe | 2025-10-16
  Now a tuple of mapping functors and we pass each value through its corresponding converter.
* Refactor handling of new entity insert | Dan Goodliffe | 2025-10-15
  Replaces weird select with one thing with a function pointer stored in the type definition array.
* Update comments on custom_log format | Dan Goodliffe | 2025-10-15
* Add access_log_view | Dan Goodliffe | 2025-10-15
  A view pre-joined with entities, showing all the original data along with ids; ideal for human readable stuff.
* Allows handle curl things if there are any [webstat-0.2.2] | Dan Goodliffe | 2025-10-10
* Fix premature remembering of saved entity ids [webstat-0.2.1] | Dan Goodliffe | 2025-10-09
  Don't persist entity ids saved to the DB until the transaction is committed. Prevents the issue where a later DB operation fails, the transaction is rolled back, but we still think the entity has been saved.
* Fix up QuotedString/CLFString parsing | Dan Goodliffe | 2025-10-09
  Refactors CLFString in terms of QuotedString, but with the option of being null (nullopt). Moves the whole decode function into QuotedString's parser, fixing support for escaping of " which would otherwise prematurely end the string in the middle.
* Add PROPFIND to http_verb list | Dan Goodliffe | 2025-10-09
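
Several entries in this log describe the background job mechanism: jobs run on a background thread, std::future is used to check whether a job has completed, a job returns a callable so that finalisation can run on the main thread, and running jobs are waited for and finalised on exit. The following is a minimal sketch of that pattern, not the webstat implementation. Every name in it (JobRunner, Finaliser, start, poll, finish) is hypothetical, and the "park file" contents are stand-in data.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; nothing here is taken from the webstat sources.
using Finaliser = std::function<void()>;

struct JobRunner {
    std::vector<std::string> queuedLines; // only ever touched on the main thread
    std::future<Finaliser> running;       // valid() only while a job is in flight

    // Launch the job on a background thread unless one is already in flight.
    bool start()
    {
        if (running.valid()) {
            return false;
        }
        running = std::async(std::launch::async, [this]() -> Finaliser {
            // Background work: stand-in for reading a park file.
            std::vector<std::string> lines {"line 1", "line 2"};
            // Return the main-thread step as a callable instead of
            // touching queuedLines from this thread.
            return [this, lines = std::move(lines)]() {
                queuedLines.insert(queuedLines.end(), lines.begin(), lines.end());
            };
        });
        return true;
    }

    // Called from the main loop; never blocks.
    // Returns true if a finished job was finalised.
    bool poll()
    {
        using namespace std::chrono_literals;
        if (!running.valid() || running.wait_for(0s) != std::future_status::ready) {
            return false;
        }
        running.get()(); // get() rethrows a job exception and resets the future
        return true;
    }

    // Called on exit: wait for any running job, then finalise it.
    void finish()
    {
        if (running.valid()) {
            running.get()();
        }
    }
};
```

In this sketch the main loop would call poll() between reads of its input and finish() once before exiting. Because std::future::get() leaves the future invalid, the same member doubles as the "is a job running" flag, which is one way to get the easy state reset mentioned in the std::future entry.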