| Commit message (Collapse) | Author | Age |
| |
|
|
|
| |
Simply create the transform constexpr and then apply it to any
collection with the restriction of passing it to a function.
|
| |
|
|
| |
This index would be empty under normal conditions.
|
| |
|
|
|
| |
Standard sized batch in a transaction, ordered by entity id.
Includes early exit if terminated.
|
| |
|
|
|
|
| |
Entities are reparsed and reinserted, removed on success.
Failure to parse updates the entity type to UnparsableLine.
Failure to insert again updates the detail with the reason.
|
| |
|
|
| |
Transform view for getting just the hash and record of a stored entity.
|
| |
|
|
|
|
|
| |
Wraps a templated value and a templated mutex (defaults to shared_mutex)
and provides safe access, locked with either a shared_lock (const value)
or lock_guard (non-const value).
Applies this to existingEntities.
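
A rough sketch of what such a wrapper might look like. The class name `Guarded` and the callback-style `read`/`write` members are assumptions; the actual interface may expose the value differently.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <utility>
#include <vector>

// Hypothetical wrapper: owns a value and the mutex that protects it.
template<typename Value, typename Mutex = std::shared_mutex>
class Guarded {
public:
    Guarded() = default;

    explicit Guarded(Value initial) : value {std::move(initial)} { }

    // Const access: many readers may hold the shared lock at once.
    template<typename Func>
    auto read(Func && func) const
    {
        std::shared_lock lock {mutex};
        return std::forward<Func>(func)(value);
    }

    // Non-const access: exclusive lock for the duration of the call.
    template<typename Func>
    auto write(Func && func)
    {
        std::lock_guard lock {mutex};
        return std::forward<Func>(func)(value);
    }

private:
    Value value {};
    mutable Mutex mutex; // mutable so that read() can lock in a const context
};

inline void example()
{
    Guarded<std::vector<int>> ids;
    ids.write([](std::vector<int> & v) {
        v.push_back(42);
    });
    const std::size_t count = ids.read([](const std::vector<int> & v) {
        return v.size();
    });
    assert(count == 1);
}
```

Returning by value (plain `auto`) is a deliberate choice in this sketch, so that no reference to the protected value escapes the lock.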
|
| |
|
|
|
|
| |
It's not critical to run to completion during shutdown, we can pick up
where we left off on the next run. This will allow us to bail out
instead of holding up the shutdown process.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
OK, this is on me. future::valid does not tell you if the thread has
completed and the result is ready. It tells you if there is some state
you can get() maybe later. Here we replace those checks with a 0s wait
and a test to see if the status is ready.
The exit steps are updated to reflect this. Calling reset will invoke
the future's destructor, which blocks until the thread is joined as
needed.
This should hopefully address the issue where the main thread would
still block if it attempted to runJobAsNeeded for a job that was still
running.
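
The difference between the two checks can be sketched as follows. The helper name `isReady` is an assumption, and `std::promise` is used here only to control when the result becomes available without starting a thread.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical helper: true only when get() would return without blocking.
template<typename T>
bool isReady(const std::future<T> & future)
{
    using namespace std::chrono_literals;
    // valid() alone only says there is shared state; the zero-length wait
    // reports whether the result has actually been produced.
    return future.valid() && future.wait_for(0s) == std::future_status::ready;
}

inline void example()
{
    std::promise<int> promise;
    std::future<int> future = promise.get_future();

    assert(future.valid()); // state exists...
    assert(!isReady(future)); // ...but no result yet

    promise.set_value(7);
    assert(isReady(future));
    assert(future.get() == 7);

    assert(!future.valid()); // get() releases the shared state
    assert(!isReady(future));
}
```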
|
| |
|
|
| |
It's not required; DB insertion occurs only in a background thread now.
|
| |
|
|
|
| |
Deferred policy is not good enough to avoid blocking the main thread...
it would just mean it blocked later, and that's not the point here.
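
A small sketch of why the deferred policy does not help: the task only runs when the caller asks for the result, and it runs on the caller's thread. The function names below are assumptions for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Reports whether a task launched with the given policy ran on the calling thread.
inline bool runsOnCallingThread(std::launch policy)
{
    const auto caller = std::this_thread::get_id();
    auto future = std::async(policy, [] {
        return std::this_thread::get_id();
    });
    return future.get() == caller;
}

// A deferred task never becomes "ready" on its own; it stays deferred until get() or wait().
inline bool staysDeferred()
{
    using namespace std::chrono_literals;
    auto future = std::async(std::launch::deferred, [] {
        return 1;
    });
    return future.wait_for(0s) == std::future_status::deferred;
}

inline void example()
{
    assert(runsOnCallingThread(std::launch::deferred));
    assert(staysDeferred());
}
```

With `std::launch::async` the task runs on another thread instead, which is what keeps the main thread free.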
|
| |
|
|
| |
Fixes compatibility with PostgreSQL 18 which fails due to ambiguity.
|
| |
|
|
|
|
| |
Removes the need to block the main thread from reading stdin while
performing post curl operation actions, such as updating user agent
details.
|
| |
|
|
|
| |
Performs a check before launching a job thread, rather than just having
it exit immediately.
|
| |
|
|
|
|
|
| |
Improves handling of entity inserts where the entity already exists and
already has detail; does not call the onInsert handler. This avoids
repeatedly fetching UA detail every time the UA is first seen by a
process.
|
| |
|
|
|
| |
When Fields... is more than a single type, returns a tuple of fields,
instead of a single value.
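
One way this kind of return type selection could be written is with `if constexpr` on the size of the pack. Everything named below (`Row`, `parseField`, `readTuple`, `readRow`) is hypothetical and only stands in for whatever the real function reads from.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical row: one string per column.
using Row = std::vector<std::string>;

// Converts one column from text using stream extraction.
template<typename Field>
Field parseField(const std::string & text)
{
    std::istringstream stream {text};
    Field field {};
    stream >> field;
    return field;
}

// Builds a tuple from the first sizeof...(Fields) columns, in order.
template<typename... Fields, std::size_t... Index>
std::tuple<Fields...> readTuple(const Row & row, std::index_sequence<Index...>)
{
    return std::tuple<Fields...> {parseField<Fields>(row.at(Index))...};
}

// One field type: returns that value. Several: returns a tuple of them.
template<typename... Fields>
auto readRow(const Row & row)
{
    static_assert(sizeof...(Fields) >= 1, "at least one field type is required");
    if constexpr (sizeof...(Fields) == 1) {
        return parseField<Fields...>(row.at(0));
    }
    else {
        return readTuple<Fields...>(row, std::index_sequence_for<Fields...> {});
    }
}

static_assert(std::is_same_v<decltype(readRow<int>(std::declval<const Row &>())), int>);
static_assert(std::is_same_v<decltype(readRow<int, std::string>(std::declval<const Row &>())),
        std::tuple<int, std::string>>);

inline void example()
{
    const Row row {"42", "hello"};
    assert(readRow<int>(row) == 42);
    const auto [number, word] = readRow<int, std::string>(row);
    assert(number == 42);
    assert(word == "hello");
}
```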
|
| |
|
|
| |
Insertion cost of flat_map too high when map grows large.
|
| |
|
|
|
| |
Ingest is now background only, so don't limit where they're started
from. Adds some unfortunate locking around the curl maps.
|
| |
|
|
| |
This prevents halting reading input during data insertion.
|
| |
|
|
|
| |
Actual ingest is performed by the main process; jobReadParkedLines just
reads the park file and adds it to the queue.
|
| |
|
|
| |
Fixes issue where queuedLines would be accessed from background thread.
|
| |
|
|
| |
Allows safely running finalisation code in the main thread if required.
|
| | |
|
| |
|
|
|
|
|
| |
Limits the number of lines inserted per transaction, and the number of
transactions before returning to reading input.
Prevents long running transactions in the case where the queued lines
have grown in size.
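
The two limits can be sketched as a bounded drain loop. The names `BatchLimits` and `drainQueue`, and the callback standing in for "insert these lines in one transaction", are assumptions for illustration; no real database calls are shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical limits; the real values and names are not given in the log.
struct BatchLimits {
    std::size_t maxLinesPerTransaction;
    std::size_t maxTransactions;
};

// Drains at most maxTransactions batches of at most maxLinesPerTransaction lines.
// insertBatch stands in for inserting one batch inside a single transaction.
// Returns the number of transactions performed; remaining lines stay queued.
template<typename InsertBatch>
std::size_t drainQueue(std::deque<std::string> & queue, const BatchLimits & limits, InsertBatch && insertBatch)
{
    std::size_t transactions = 0;
    while (!queue.empty() && transactions < limits.maxTransactions) {
        const auto count = std::min(queue.size(), limits.maxLinesPerTransaction);
        const auto batchEnd = queue.begin() + static_cast<std::ptrdiff_t>(count);
        const std::vector<std::string> batch {queue.begin(), batchEnd};
        insertBatch(batch);
        queue.erase(queue.begin(), batchEnd);
        ++transactions;
    }
    return transactions;
}

inline void example()
{
    std::deque<std::string> queue(10, "line");
    std::vector<std::size_t> batchSizes;
    const auto transactions = drainQueue(queue, BatchLimits {3, 2}, [&batchSizes](const std::vector<std::string> & batch) {
        batchSizes.push_back(batch.size());
    });
    assert(transactions == 2);
    assert(batchSizes.size() == 2);
    assert(queue.size() == 4); // left for the next pass, after reading more input
}
```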
|
| | |
|
| |
|
|
| |
Purge is now fully request_time based and not hacked around id ranges.
|
| |
|
|
| |
This reverts commit a6d31ff1d8703eae9375b7ec1cd01b323d7e8e6e.
|
| |
|
|
| |
Plays better with apgdiff
|
| |
|
|
| |
Line insert is only a single operation with no new entities.
|
| | |
|
| |
|
|
|
| |
Entity value is MD5 hashed same as DB unique key, but the id itself is
now taken from the DB primary key which is sequence generated.
|
| | |
|
| |
|
|
| |
Or the last errno on failure.
|
| | |
|
| |
|
|
| |
Also logs them on main loop exit.
|
| | |
|
| | |
|
| |
|
|
|
|
|
| |
Adds virtual log function, real implementation writes to syslog.
Test implementation writes to BOOST_TEST_MESSAGE, perf implementation
discards.
Replaces existing prints to stderr and adds logs to all key points.
|
| |
|
|
|
|
|
|
|
|
| |
Store log lines in memory until a threshold is reached or idle occurs,
then insert all the lines in a single transaction. Save points handle the
case of insertion errors. On success the queue is cleared.
Parked lines also saved in bulk, only necessary if queued lines could
not be inserted on shutdown, else the queue simply grows until ability
to insert is restored. Importing parked lines just adds them to the
queue and the normal process then follows.
|
| |
|
|
|
| |
Apache sends SIGTERM to the logger process to shut it down. Honestly I
thought it would just close stdin and I should have checked.
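
A minimal sketch of reacting to SIGTERM by setting a flag that the main loop can poll. The names `terminated`, `installTerminateHandler` and `shouldStop` are assumptions; the actual handling in the project may differ.

```cpp
#include <cassert>
#include <csignal>

namespace {
    // Only a sig_atomic_t flag is touched inside the handler.
    volatile std::sig_atomic_t terminated = 0;

    void onTerminate(int)
    {
        terminated = 1;
    }
}

// Registers the handler; returns false if registration failed.
inline bool installTerminateHandler()
{
    return std::signal(SIGTERM, &onTerminate) != SIG_ERR;
}

// Polled by the main loop and long running jobs to exit early.
inline bool shouldStop()
{
    return terminated != 0;
}

inline void example()
{
    assert(installTerminateHandler());
    assert(!shouldStop());
    std::raise(SIGTERM); // simulate the signal being delivered
    assert(shouldStop());
}
```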
|
| | |
|
| |
|
|
|
|
|
| |
UNIQUE CONSTRAINT is limited to 2704 bytes, which prevents inserting
large values. Here we swap to a unique index on the MD5 hash of the
value. This should more than suffice given we already map to a 32-bit
value for the id, and the index size is much, much smaller.
|
| |
|
|
| |
Replaces accidentally duplicated user_agent for correct content_type.
|
| |
|
|
|
| |
Easier checking if a job has completed [successfully] and resetting state
for the next time.
|
| |
|
|
|
| |
Neither the curl handle nor the operation map is thread safe. This
isn't ideal, but it does solve the problem in a safe manner.
|
| |
|
|
|
| |
Jobs run on background threads now, so we can happily run them even when
we're busy.
|
| | |
|
| | |
|
| |
|
|
|
|
| |
If that fails, we still park them as before, such as when the DB is
unavailable. Those which are saved as entities require investigation into
why they couldn't be saved, much like UnparsableLines.
|
| |
|
|
| |
No changes.
|
| | |
|