Description
Component(s)
pkg/ottl
Is your feature request related to a problem? Please describe.
This is a copy of a comment I wrote on another issue, where I thought we were discussing this.
Why should we support grok?
In my opinion, grok is much more readable than raw regex, and it is very common among our users.
What I have in mind also includes custom pattern definitions. Given an ExtractGrokPattern function with a signature like this
ExtractGrokPattern(source, pattern, custom_patterns)
where custom_patterns is a map, and an input string such as
my beagle is colored BLUE
you could do
ExtractGrokPattern(source, "my %{FAVORITE_DOG:dog} is colored %{RGB:color}", {
"FAVORITE_DOG" : "beagle",
"RGB" : "RED|GREEN|BLUE"
})
and this would result in
{
"dog": "beagle",
"color": "BLUE"
}
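To make the proposed behavior concrete, here is a minimal Go sketch of how such a function could expand grok references into a regular expression and return the named captures. It is only an illustration under stated assumptions: `expandGrok` and `extractGrok` are hypothetical helpers, not existing pkg/ottl code, there are no built-in patterns, and field names are limited to word characters because Go capture group names cannot contain dots. The input used here includes the word "colored" so that it matches the pattern literally.

```go
package main

import (
	"fmt"
	"regexp"
)

// grokRef matches %{NAME} or %{NAME:field} references inside a grok pattern.
var grokRef = regexp.MustCompile(`%\{(\w+)(?::(\w+))?\}`)

// expandGrok rewrites grok references into plain regex syntax using defs.
// A reference with a field becomes a named capture group; definitions may
// reference other definitions. Hypothetical helper, for illustration only.
func expandGrok(pattern string, defs map[string]string, depth int) (string, error) {
	if depth > 10 {
		return "", fmt.Errorf("pattern definitions nested too deeply, possible cycle")
	}
	var firstErr error
	out := grokRef.ReplaceAllStringFunc(pattern, func(ref string) string {
		m := grokRef.FindStringSubmatch(ref)
		name, field := m[1], m[2]
		def, ok := defs[name]
		if !ok {
			if firstErr == nil {
				firstErr = fmt.Errorf("unknown grok pattern %q", name)
			}
			return ref
		}
		inner, err := expandGrok(def, defs, depth+1)
		if err != nil {
			if firstErr == nil {
				firstErr = err
			}
			return ref
		}
		if field == "" {
			return "(?:" + inner + ")"
		}
		return "(?P<" + field + ">" + inner + ")"
	})
	if firstErr != nil {
		return "", firstErr
	}
	return out, nil
}

// extractGrok mirrors the proposed ExtractGrokPattern(source, pattern, custom_patterns).
// It returns an empty map when the source does not match.
func extractGrok(source, pattern string, custom map[string]string) (map[string]string, error) {
	expr, err := expandGrok(pattern, custom, 0)
	if err != nil {
		return nil, err
	}
	re, err := regexp.Compile(expr)
	if err != nil {
		return nil, err
	}
	result := map[string]string{}
	m := re.FindStringSubmatch(source)
	if m == nil {
		return result, nil
	}
	for i, name := range re.SubexpNames() {
		if name != "" {
			result[name] = m[i]
		}
	}
	return result, nil
}

func main() {
	got, err := extractGrok(
		"my beagle is colored BLUE",
		"my %{FAVORITE_DOG:dog} is colored %{RGB:color}",
		map[string]string{"FAVORITE_DOG": "beagle", "RGB": "RED|GREEN|BLUE"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(got["dog"], got["color"]) // prints "beagle BLUE"
}
```

A real implementation would most likely build on an existing grok library with the standard pattern set rather than a hand-rolled expander like this one.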
This example is not very realistic, but an nginx example from our pipeline shows the beauty of it:
patterns:
  - (%{NGINX_HOST} )?"?(?:%{NGINX_ADDRESS_LIST:result.access.remote_ip_list}|%{NOTSPACE:source.address})
    - (-|%{DATA:user.name}) \[%{HTTPDATE:result.access.time}\] "%{DATA:result.access.info}"
    %{NUMBER:http.response.status_code:long} %{NUMBER:http.response.body.bytes:long}
    "(-|%{DATA:http.request.referrer})" "(-|%{DATA:user_agent.original})" %{NUMBER:result.access.http.request.length:long}
    %{NUMBER:result.access.http.request.time:double} \[%{DATA:result.access.upstream.name}\]
    \[%{DATA:result.access.upstream.alternative_name}\] (%{UPSTREAM_ADDRESS_LIST:result.access.upstream_address_list}|-)
    (%{UPSTREAM_RESPONSE_LENGTH_LIST:result.access.upstream.response.length_list}|-) (%{UPSTREAM_RESPONSE_TIME_LIST:result.access.upstream.response.time_list}|-)
    (%{UPSTREAM_RESPONSE_STATUS_CODE_LIST:result.access.upstream.response.status_code_list}|-) %{GREEDYDATA:result.access.http.request.id}
pattern_definitions:
  NGINX_HOST: (?:%{IP:destination.ip}|%{NGINX_NOTSEPARATOR:destination.domain})(:%{NUMBER:destination.port})?
  NGINX_NOTSEPARATOR: "[^\t ,:]+"
  NGINX_ADDRESS_LIST: (?:%{IP}|%{WORD})("?,?\s*(?:%{IP}|%{WORD}))*
  UPSTREAM_ADDRESS_LIST: (?:%{IP}(:%{NUMBER})?)("?,?\s*(?:%{IP}(:%{NUMBER})?))*
  UPSTREAM_RESPONSE_LENGTH_LIST: (?:%{NUMBER})("?,?\s*(?:%{NUMBER}))*
  UPSTREAM_RESPONSE_TIME_LIST: (?:%{NUMBER})("?,?\s*(?:%{NUMBER}))*
  UPSTREAM_RESPONSE_STATUS_CODE_LIST: (?:%{NUMBER})("?,?\s*(?:%{NUMBER}))*
  IP: (?:\[?%{IPV6}\]?|%{IPV4})
This pattern is complex, and writing it as a plain regex would be ugly.
Describe the solution you'd like
Add
ExtractGrokPattern(source, pattern, custom_patterns)
alongside ExtractPatterns, so that users have the option.
Grok uses regex under the hood anyway, but it provides a better experience.
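For comparison, this is roughly what the beagle example looks like when written by hand as a plain regex with named capture groups, which is the style a regex-based extraction function expects. The Go sketch below uses only the standard `regexp` package; `namedCaptures` is a hypothetical helper for illustration, not an OTTL function, and the input includes the word "colored" so that it matches the pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// handWritten is the plain-regex form of
// "my %{FAVORITE_DOG:dog} is colored %{RGB:color}" with both custom
// patterns inlined as named capture groups.
var handWritten = regexp.MustCompile(`my (?P<dog>beagle) is colored (?P<color>RED|GREEN|BLUE)`)

// namedCaptures returns the named groups of the first match, or an empty map
// when there is no match. Hypothetical helper, for illustration only.
func namedCaptures(re *regexp.Regexp, s string) map[string]string {
	out := map[string]string{}
	m := re.FindStringSubmatch(s)
	if m == nil {
		return out
	}
	for i, name := range re.SubexpNames() {
		if name != "" {
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	fmt.Println(namedCaptures(handWritten, "my beagle is colored BLUE"))
}
```

With two short patterns the difference is small. With reusable definitions such as the nginx ones, every reference has to be inlined by hand at each place it is used, which is where grok pays off.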
Describe alternatives you've considered
No response
Additional context
No response