Add "aws.ecs.task.id" detection to "resourcedetection" processor #8274

@mkielar

Is your feature request related to a problem? Please describe.
I'm running the OpenTelemetry Collector as a sidecar in my ECS services. It parses and filters the StatsD metrics that AppMesh/Envoy produces, and emfexporter then writes those metrics to CloudWatch through a CloudWatch log stream. This mostly works. However, when my ECS service scales to multiple instances, I often see the following error in my logs:

2022-03-07T10:05:33.439Z	warn	[email protected]/cwlog_client.go:84	cwlog_client: Error occurs in PutLogEvents, will search the next token and retry the request	
{
    "kind": "exporter",
    "name": "awsemf/statsd/envoy_metrics",
    "error": "InvalidSequenceTokenException: The given sequenceToken is invalid. The next expected sequenceToken is: 49626189108498028449043455519612405404976381845984773650\n{\n  RespMetadata: {\n    StatusCode: 400,\n    RequestID: \"a24432dd-4d17-44ae-b245-3877cfffabb7\"\n  },\n  ExpectedSequenceToken: \"49626189108498028449043455519612405404976381845984773650\",\n  Message_: \"The given sequenceToken is invalid. The next expected sequenceToken is: 49626189108498028449043455519612405404976381845984773650\"\n}"
}

This is caused by a race condition: two nodes now write to the same log stream in CloudWatch, and they corrupt each other's sequenceToken, which the AWS API requires in order to put logs to CloudWatch.
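
To make the race concrete, here is a toy model in Python. It is only a sketch of the behaviour I believe I'm seeing: `ToyLogStream` and `ToyWriter` are hypothetical stand-ins, not the CloudWatch API or the exporter's code, and the integer token is an assumption made for readability.

```python
class ToyLogStream:
    """In-memory stand-in for a log stream that demands the current sequence token."""

    def __init__(self):
        self._next_token = 0
        self.events = []

    def put(self, message, token):
        # Reject the write if the caller's token is not the one the stream expects.
        if token != self._next_token:
            raise ValueError(f"invalid sequence token, expected {self._next_token}")
        self.events.append(message)
        self._next_token += 1
        return self._next_token


class ToyWriter:
    """A writer that caches the token returned by its own last successful write."""

    def __init__(self, stream):
        self.stream = stream
        self.token = 0  # goes stale as soon as another writer uses the same stream

    def write(self, message):
        self.token = self.stream.put(message, self.token)


def second_write_outcome(shared_stream):
    """Write once from writer A, then once from writer B; report what happens to B."""
    stream_a = ToyLogStream()
    stream_b = stream_a if shared_stream else ToyLogStream()
    writer_a, writer_b = ToyWriter(stream_a), ToyWriter(stream_b)
    writer_a.write("from task A")
    try:
        writer_b.write("from task B")
    except ValueError:
        return "rejected"
    return "accepted"
```

With one stream per writer, neither writer can invalidate the other's cached token, which is why I want a per-task log stream name.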

Describe the solution you'd like
I was hoping to additionally configure the resourcedetection processor:

  "resource":
    "attributes":
    - "action": "insert"
      "from_attribute": "aws.ecs.task.id"
      "key": "TaskId"
    - "action": "insert"
      "from_attribute": "aws.ecs.task.arn"
      "key": "TaskARN"
  "resourcedetection":
    "detectors":
    - "env"
    - "ecs"

so that I would be able to use the {TaskId} dynamic field when configuring emfexporter, like this:

"awsemf/statsd/envoy_metrics":
    "dimension_rollup_option": "NoDimensionRollup"
    "log_group_name": "/aws/ecs/dev/hello-world"
    "log_stream_name": "emf/otel/statsd/envoy_metrics/{TaskId}"
    "namespace": "dev/AppMeshEnvoy"
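
For clarity, this is the substitution I expect for the dynamic field, sketched in Python. It is not the exporter's implementation: `expand_placeholders` is a hypothetical helper, and leaving an unknown placeholder untouched is my assumption about what should happen when the attribute is missing.

```python
import re

# Matches placeholders such as {TaskId} inside a log stream name template.
_PLACEHOLDER = re.compile(r"\{([A-Za-z0-9_.]+)\}")


def expand_placeholders(template, attributes):
    """Replace each {Key} in template with attributes[Key], if that key exists."""

    def substitute(match):
        key = match.group(1)
        # Assumption: an unknown key leaves the placeholder text unchanged.
        return str(attributes.get(key, match.group(0)))

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    resource_attributes = {"TaskId": "1a8d528834e046b183d4913feeaa16bc"}
    print(expand_placeholders("emf/otel/statsd/envoy_metrics/{TaskId}", resource_attributes))
```

Each task would then write to its own stream, because the task ID differs per task.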

However, when I run my service, I can see that only the following is detected by resourcedetection:

2022-03-07T13:11:17.808Z	info	internal/resourcedetection.go:139	detected resource information	
{
    "kind": "processor",
    "name": "resourcedetection",
    "resource": {
        "aws.ecs.cluster.arn": "arn:aws:ecs:eu-west-1:506501033716:cluster/dev",
        "aws.ecs.launchtype": "fargate",
        "aws.ecs.task.arn": "arn:aws:ecs:eu-west-1:506501033716:task/dev/1a8d528834e046b183d4913feeaa16bc",
        "aws.ecs.task.family": "dev-hello-world",
        "aws.ecs.task.revision": "43",
        "cloud.account.id": "506501033716",
        "cloud.availability_zone": "eu-west-1a",
        "cloud.platform": "aws_ecs",
        "cloud.provider": "aws",
        "cloud.region": "eu-west-1"
    }
}

Describe alternatives you've considered
I tried using TaskARN, but that just led to the log stream not being created at all. Most likely, the reason is that task ARNs contain characters that are illegal in a log stream name, so emfexporter fails silently, unable to create one.
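
As a rough illustration of what I'm asking for, the task ID can be taken from the last path segment of the task ARN. The Python sketch below is not the detector's code; both function names are hypothetical. The validity check assumes the documented CloudWatch Logs rule that a log stream name is 1 to 512 characters long and may not contain `:` or `*`.

```python
def task_id_from_arn(task_arn):
    """Return the last '/'-separated segment of an ECS task ARN."""
    return task_arn.rsplit("/", 1)[-1]


def is_valid_log_stream_name(name):
    """Check the length and forbidden-character rules for a log stream name."""
    return 1 <= len(name) <= 512 and ":" not in name and "*" not in name


if __name__ == "__main__":
    arn = "arn:aws:ecs:eu-west-1:506501033716:task/dev/1a8d528834e046b183d4913feeaa16bc"
    prefix = "emf/otel/statsd/envoy_metrics/"
    # The ARN contains colons, so a stream name built from it fails the check.
    print(is_valid_log_stream_name(prefix + arn))
    # The bare task ID has no forbidden characters.
    print(is_valid_log_stream_name(prefix + task_id_from_arn(arn)))
```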

Additional context
N/A.
