Skip to content

[Discuss] Improvements in CI pipeline for PRs #2870

Open
@david-luna

Description

@david-luna

Summary

The current CI pipeline that runs for pull requests provides a set of quality checks. Some of these checks are executed for the full monorepo even when the change-set is small and scoped to a single package. With the proper improvements we could speed up the feedback loop and resource consumption when working on a single package.

Current Status

As for now the pipeline has several tasks working in parallel like lint, pr title check, api peer check... and the unit and TAV tests. this discussion is focusing on these last 2 jobs because is where we could get more gains. Regarding tests the current pipeline could be depicted as (simplified version)

---
title: Current PR WorkFlow
---
flowchart TD
    UT[Unit Tests]
    TAV[Test All Versions]
    UTC[Compile All]
    UT18["Test All (nodejs@18)"]
    UT20["Test Diff (nodejs@20)"]
    UT22["Test Diff (nodejs@22)"]
    UTCOV[Upload Coverage]
    TAVC[Compile All]
    TAV18["Test Diff (nodejs@18)"]
    TAV20["Test Diff (nodejs@20)"]
    TAV22["Test Diff (nodejs@22)"]

    UT --> UTC
    UTC --> UT18
    UTC --> UT20
    UTC --> UT22
    UT18 --> UTCOV
    TAV --> TAVC
    TAVC --> TAV18
    TAVC --> TAV20
    TAVC --> TAV22
Loading

Some observations can be extracted form this setup:

  • sending the coverage of all instrumentations at once forces us to compile all packages. This may take up to 5min
  • Unit tests with a specific version of the package instrumented does not reflect the actual coverage of the tests. Some code paths are not taken because they depend on the version being instrumented.
  • Although we do TAV testing only on the affected packages we also compile everything.

Proposed changes

If we were able to do the checks (compile, test & coverage report) only for the affected packages that would reduce the time spent in most of the PRs. IMHO I think we can achieve that by relying on 2 features that we have in our tool belt. These are Codecov Flags and GH upload artifact action. The later was already used in #2493 to cache the compilation results and reuse them which reduced significantly the CPU usage.

Codecov Flags claims that you could encapsulate each project/package coverage report independently. This encapsulation will allow us to do only the compilation and test only for the affected packages and update their coverage independently.

As said before unit test is not enough to get the real coverage since running the tests with only one version of the instrumented library it will be unlikely to go through all code paths. Instead if we were able to run and collect coverage from test-all-versions the coverage report would be more realistic. Problem here is that TAV runs in different "CI nodes" with different nodejs versions and we need all test runs in the same place to generate a report for codecov. This can be achieved by using the upload artifact action to stash all test results in each machine for later download and merge in a single node and upload them to codecov

The process becomes more sequential but we gain in less usage of resources since we will only perform the compilation and tests of the affected packages which usually is reduced. The changed workflow would look like

---
title: Future PR WorkFlow
---
flowchart TD
    T[Tests]
    TC[Compile Affected]
    T18["Unit Test Affected (nodejs@18)"]
    T20["Unit Test Affected (nodejs@20)"]
    T22["Unit Test Affected (nodejs@22)"]
    TAV18["TAV Affected (nodejs@18)"]
    TAV20["TAV Affected (nodejs@20)"]
    TAV22["TAV Affected (nodejs@22)"]
    UP18["Stash coverage (nodejs@18)"]
    UP20["Stash coverage (nodejs@20)"]
    UP22["Stash coverage (nodejs@22)"]
    MC["Merge and upload coverage with flags"]

    T --> TC
    TC --> T18
    TC --> T20
    TC --> T22
    T18 --> TAV18
    T20 --> TAV20
    T22 --> TAV22
    TAV18 --> UP18
    TAV20 --> UP20
    TAV22 --> UP22
    UP18 --> MC
    UP20 --> MC
    UP22 --> MC
Loading

NOTE: the term affected is literal. Meaning that if a modified package has dependants in the same monorepo they will become affected too. So compilation and tests will be done for these as well (nx already takes care of it)

  • This only affects the JavaScript OpenTelemetry library
  • This may affect other libraries, but I would like to get opinions here first

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions