Releases: deckhouse/prompp
v2.53.2-0.4.0-rc1
Features
- Added a feature flag `head_default_number_of_shards` to adjust the number of shards (default is 2). Increasing the number of shards improves write performance while potentially slightly slowing down read operations and increasing memory consumption. This feature flag is temporary and will be removed in favor of automatic shard count calculation in the future.
- Introduced a two-stage process for series selection queries by matchers (see the first sketch after this list). The first stage parses the regular expression using prefix trees from the index; it executes quickly but requires locks on the index while it runs. The second stage handles posting operations, which are resource-intensive due to data decoding and set operations on series IDs. Separating these stages reduces write locking time and increases read parallelism, since posting operations can work on lightweight snapshot states without blocking appends.
- Implemented optimistic non-exclusive relabeling locks for data updates (see the second sketch after this list). Since new series appear infrequently, an append operation whose data is already fully cached in relabeling does not lock the series container or indexes; an exclusive lock is taken only when new data must be added. This mechanism works only when intra-shard parallelization is enabled (disabled by default).
- Added a mechanism for executing tasks on a specific shard instead of all shards. This capability is essential for upcoming performance improvements.
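A minimal Go sketch of the two-stage selection, using illustrative types and names (`headIndex`, `postingsSnapshot`, and the map-based lookup are assumptions, not prompp's actual API): stage one resolves matchers under the index lock, stage two runs the expensive set operations on a snapshot without blocking appends.

```go
package main

import "sync"

// Illustrative types only; these are not prompp's actual structures.
type seriesID uint64
type matcher struct{ name, value string }

type headIndex struct {
	mtx      sync.RWMutex
	postings map[string][]seriesID // label pair -> sorted series IDs
}

// postingsSnapshot is an immutable view of the posting lists a query needs,
// so the expensive set operations can run without holding the index lock.
type postingsSnapshot struct{ lists [][]seriesID }

// Stage 1: resolve matchers under the read lock. In prompp this walks the
// prefix trees of the index; here a plain map lookup stands in for it.
func (ix *headIndex) resolve(ms []matcher) postingsSnapshot {
	ix.mtx.RLock()
	defer ix.mtx.RUnlock()
	lists := make([][]seriesID, 0, len(ms))
	for _, m := range ms {
		lists = append(lists, ix.postings[m.name+"="+m.value])
	}
	return postingsSnapshot{lists: lists}
}

// Stage 2: decode and intersect posting lists on the snapshot, off the lock,
// so concurrent appends are not blocked while this runs.
func (s postingsSnapshot) intersect() []seriesID {
	if len(s.lists) == 0 {
		return nil
	}
	result := s.lists[0]
	for _, l := range s.lists[1:] {
		result = intersectSorted(result, l)
	}
	return result
}

func intersectSorted(a, b []seriesID) []seriesID {
	var out []seriesID
	for i, j := 0, 0; i < len(a) && j < len(b); {
		switch {
		case a[i] == b[j]:
			out = append(out, a[i])
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return out
}
```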
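And a sketch of the optimistic relabeling path, again with hypothetical names: the append first checks the relabeling cache under a shared lock and upgrades to the exclusive lock only when an unknown series has to be registered.

```go
package main

import "sync"

// Hypothetical relabeling cache; not prompp's actual structure.
type relabelCache struct {
	mtx   sync.RWMutex
	known map[string]uint64 // label set key -> series ID
}

// lookupOrAdd is the optimistic path: appends whose series are already cached
// proceed under the shared lock; only genuinely new series take the exclusive
// lock on the container.
func (c *relabelCache) lookupOrAdd(key string, newID func() uint64) uint64 {
	// Optimistic stage: shared lock, no contention between appenders.
	c.mtx.RLock()
	id, ok := c.known[key]
	c.mtx.RUnlock()
	if ok {
		return id
	}

	// Exclusive stage: register a new series, re-checking in case another
	// goroutine won the race while we were unlocked.
	c.mtx.Lock()
	defer c.mtx.Unlock()
	if id, ok := c.known[key]; ok {
		return id
	}
	id = newID()
	c.known[key] = id
	return id
}
```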
Enhancements
- Added metrics tracking the waiting time for locks and head rotations. These metrics improve observability of internal delays and contention, enabling better diagnostics and tuning opportunities.
- Lock management is now handled inside task execution, depending on task type, rather than holding locks for the entire task duration. This can yield slight performance improvements when intra-shard parallelization is enabled by reducing unnecessary lock holding time.
v2.53.2-0.3.4
Fixes
- Processing Several Backslashes at the End of a Label Value. The metric parser incorrectly handled an even number of backslashes at the end of a label name or value.
- Handling Head in Querier on Rotation. In some cases, the querier could lose data during head rotation.
- Priority Semaphore on Head. In some specific setups, exclusive tasks such as reconfigure, rotate, or shutdown could get stuck waiting for the lock behind all normal-priority requests. A semaphore with a two-level priority interface now pushes priority tasks to the front of the waiter queue (see the sketch below).
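A minimal sketch of such a two-level priority semaphore in Go; the names and the exact waiting strategy are illustrative, not prompp's implementation.

```go
package main

import "sync"

// prioritySemaphore lets exclusive tasks (rotate, reconfigure, shutdown)
// acquire the resource ahead of any queued normal-priority waiters.
type prioritySemaphore struct {
	mu          sync.Mutex
	cond        *sync.Cond
	held        bool
	highWaiting int
}

func newPrioritySemaphore() *prioritySemaphore {
	s := &prioritySemaphore{}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// AcquireHigh jumps ahead of normal-priority waiters.
func (s *prioritySemaphore) AcquireHigh() {
	s.mu.Lock()
	s.highWaiting++
	for s.held {
		s.cond.Wait()
	}
	s.highWaiting--
	s.held = true
	s.mu.Unlock()
}

// Acquire waits until the semaphore is free and no high-priority task is queued.
func (s *prioritySemaphore) Acquire() {
	s.mu.Lock()
	for s.held || s.highWaiting > 0 {
		s.cond.Wait()
	}
	s.held = true
	s.mu.Unlock()
}

func (s *prioritySemaphore) Release() {
	s.mu.Lock()
	s.held = false
	s.mu.Unlock()
	s.cond.Broadcast()
}
```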
v2.53.2-0.3.3
Fixes
- Fixed Snapshot Handling in ChunkQuerier. Recent updates led to losing snapshots in ChunkQuerier, which caused incorrect behaviour of the RemoteRead API.
v2.53.2-0.3.2
Fixes
- Fixed Task Duplication in WAL Commits: Duplicate commit tasks were causing excessive disk access. Now a commit task is queued only when a WAL segment first reaches its sample limit (see the sketch below).
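A rough sketch of the corrected scheduling logic, with hypothetical names: the commit task is queued only on the first crossing of the segment's sample limit.

```go
package main

import "sync"

// walSegment is an illustrative stand-in for the real segment state.
type walSegment struct {
	mu           sync.Mutex
	samples      int
	sampleLimit  int
	commitQueued bool
}

func (w *walSegment) append(n int, queueCommit func()) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.samples += n
	// Queue the commit once; further appends past the limit do not
	// re-queue it, which previously caused excessive disk access.
	if w.samples >= w.sampleLimit && !w.commitQueued {
		w.commitQueued = true
		queueCommit()
	}
}
```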
Enhancements
- Increased the Sample Limit in WAL Segments: The previous soft limit of 10K, hardcoded as a constant, has been converted to a command-line flag with the default raised to 100K.
Features
- Added a Feature-flag to Disable Commits During RemoteWrite Requests. This is an experimental flag and will be replaced with a generalized persistence level setting in the future.
v2.53.2-0.3.1
Fixes
- Fixed Channel Overflow and Shard Goroutine Deadlock: A bug that caused channel overflow and deadlocks in shard goroutines has been fixed. The change ensures that tasks are added to the channel only from external goroutines, preventing these issues.
- Fixed Series Snapshot Memory Hanging: We've corrected an issue where series snapshots were not getting cleared from memory due to problems with Finalizers in Go. The snapshots involved pointers to memory allocated in C++, and the garbage collector did not always trigger the Finalizer, causing memory to linger.
- Corrected Potential Object Retention Errors in fastCGo Calls: There were potential errors related to object retention during fastCGo calls. While most of these were specific to test code, some could cause runtime errors in rare situations. These have now been addressed to improve stability.
Enhancements
- Optimized Series Copying During Rotation: We've made series copying during rotation much more efficient, reducing the time required by 7.5 times. To avoid pauses in the garbage collector, we're using the standard CGo mechanism for this process. Currently, this feature is under a feature flag and is being tested on select clusters to ensure stability and correctness. Once these tests are successful, we plan to enable it for all clusters.
- Revamped Task Execution System on Shards: The task execution system on shards has been restructured to separate series processing from data handling. Each now operates with its own queues and locks, which is expected to boost the requests per second (RPS) for both read and write operations.
- New Feature Flag for Multiple Goroutines per Shard: We've introduced a feature flag that allows running multiple goroutines per shard. This change is aimed at improving the scalability of read request handling, while still maintaining proper locking for exclusive write operations. This setup is particularly beneficial in scenarios where read requests heavily outweigh write requests. We are actively testing this feature on our clusters to determine the best concurrency levels before rolling out automatic tuning options.
- Optimized Internal Encoders and Decoders: We use StreamVByte encoding in the data storages. Several operations inside this encoding have been optimized to reduce instruction count and memory jumps, cutting CPU time for these operations by 10% (see the sketch below).
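For context, a scalar Go sketch of one common StreamVByte layout (a control stream with 2-bit length codes, followed by little-endian data bytes). prompp's actual codec lives in the C++ core and is vectorized; this only illustrates the format.

```go
package main

// decodeStreamVByte decodes n uint32 values from a StreamVByte stream:
// the control stream holds 2 bits per value giving the byte length (1-4),
// and the data stream holds the little-endian value bytes back to back.
func decodeStreamVByte(control, data []byte, n int) []uint32 {
	out := make([]uint32, 0, n)
	for i := 0; i < n; i++ {
		code := (control[i/4] >> uint((i%4)*2)) & 0x3
		length := int(code) + 1
		var v uint32
		for b := 0; b < length; b++ {
			v |= uint32(data[b]) << (8 * uint(b))
		}
		data = data[length:]
		out = append(out, v)
	}
	return out
}
```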
v2.53.2-0.2.6
Fixes
- Fill Sources in meta.json: The compactor writes the compaction.sources section of the meta.json file as the union of its parents' sources, so creating blocks with empty sources eventually leaves all blocks without sources. The Thanos compactor, in turn, relies on the list of sources to delete outdated blocks, so blocks with an empty source list are automatically subject to deletion.
v2.53.2-0.3.0
Enhancements
- Concurrent Data Ingestion: Removed the exclusive lock during data ingestion, allowing for concurrent processing of batches. Insertion tasks are split into four sequential subtasks: relabeling, resharding new series, cache updating, and data insertion (see the sketch after this list). This change speeds up insertions but may impact read performance. Future updates will focus on balancing read/write priorities.
- Improved Series Snapshot Management: Redesigned snapshot handling to create new snapshots only on memory reallocation. This reduces RAM usage by ~10% and improves read request processing times. Further improvements expected with stabilized series copying during rotations.
- Optimized Series Insertion: Minor optimizations for new series insertion. Noticeable 5% time savings when copying series during rotations.
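A rough Go sketch of the ingestion flow described above, with illustrative names and empty stage stubs: batches are processed concurrently, and each batch runs the four subtasks in order.

```go
package main

import "sync"

// batch and shard are illustrative stand-ins, not prompp's actual types.
type batch struct{ samples []float64 }
type shard struct{}

// ingest runs the four sequential subtasks for one batch.
func (s *shard) ingest(b batch) {
	relabeled := s.relabel(b)          // 1: relabeling
	created := s.reshardNew(relabeled) // 2: reshard series seen for the first time
	s.updateCaches(created)            // 3: cache updates
	s.insert(relabeled)                // 4: data insertion
}

// ingestAll processes batches concurrently instead of serializing them
// behind a single exclusive lock.
func (s *shard) ingestAll(batches []batch) {
	var wg sync.WaitGroup
	for _, b := range batches {
		wg.Add(1)
		go func(b batch) {
			defer wg.Done()
			s.ingest(b)
		}(b)
	}
	wg.Wait()
}

// Stage stubs, left empty in this sketch.
func (s *shard) relabel(b batch) batch    { return b }
func (s *shard) reshardNew(b batch) batch { return b }
func (s *shard) updateCaches(b batch)     {}
func (s *shard) insert(b batch)           {}
```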
v2.53.2-0.2.5
Fixes
- Infinite Recursion During Head Conversion: Fixed a bug in the logic where converting the head to a historical block could lead to infinite recursion.
- Memory Retention Issue in RemoteRead API: Fixed a memory retention issue with recoded chunks during raw chunk requests via the RemoteRead API. A memory pointer was incorrectly held, allowing the garbage collector to reuse memory while it was still being accessed, potentially leading to segmentation faults.
v2.53.2-0.2.4
Fixes
- Feature Flag for Series Copy During Rotation: The series copy operation during rotation has been placed behind a feature flag. This change addresses the high cost of the operation, which could temporarily render the service unavailable.
v2.53.2-0.2.3
Fixes
- Regular Expression Handling: Fixed a bug in regular expression handling that occasionally led to out-of-bounds errors and crashes. The code handling regular expressions now has additional test coverage, including fuzz testing under ASAN, uncovering no further issues.
Features
- Active LabelSets Copy during Rotation: Active labelSets are now copied from the previous head during a rotation. This reduces index update load during the first scrape interval post-rotation. While the rotation itself no longer impacts resource consumption, there is a slight CPU usage spike due to the compactor running afterward.
- RemoteRead Support for Raw Chunk Data: Added support for requesting raw chunk data via the RemoteRead protocol, enabling integration with external systems like Thanos. Since Prom++ encodes chunks in the active head differently from Prometheus, chunks are re-encoded upon request. Although this is not as efficient as Prometheus, it is more cost-effective than a full data unpack via RemoteRead.
Enhancements
- WAL Encoding Tweaks: The condition for selecting alternative timestamp encoding in the WAL encoder has been fixed. This generally results in a more compact WAL. Compatibility is maintained, and the previous incorrect condition caused no issues other than slightly increased disk usage.
- Multi-Architecture Docker Images: Added support for building multi-architecture Docker images.
- WAL Encoder Cleanup: Removed unused code from the WAL encoder, leading to a slight reduction in CPU usage.