Malware detection is achieved using a combination of metadata and source code heuristics. Certain combinations of the results of these heuristics are indicators of a malicious package.

-1. **Empty Project Link**: If the package contains project links (e.g., documentation, Git Repositories),
-   the analyzer will further operate the heuristic `Unreachable Project Links` to analyze if all the project links are unreachable.
-2. **One Release**: Checks if there is only one release of the package. If the package contains multiple
-   releases, the checker will further check the release frequency through `High Release Frequency` and
-   `Unchanged Release` to see if the maintainers release multiple times in a short timeframe (threshold), and
-   whether the contents of the releases are identical.
-3. **Closer Release Join Date**: Considers the date when the maintainer registered their account (if
-   available). The checker will calculate the gap between the latest release date and the maintainer's account
-   registration date.
-4. **Suspicious Setup**: Checks whether the `setup.py` includes suspicious imports, such as `base64` for
-   encryption and `requests` for data exfiltration.
-
-## Supported Ecosystem: PyPI
-
-Define Seven Heuristics: `False` means suspicious and `True` means benign. `SKIP` means some metadata is missing, and the checker will skip the heuristic.
+A heuristic that fails returns `HeuristicResult.FAIL`, which indicates suspicious behavior. A heuristic that passes returns `HeuristicResult.PASS`, which indicates benign behavior. A heuristic that returns `HeuristicResult.SKIP` was not applicable to the package, either because of package details or because of its dependency on other heuristics. When a heuristic encounters a malformed package, a `HeuristicAnalyzerValueError` is raised. The following heuristics are currently run sequentially to gauge package maliciousness.

 1. **Empty Project Link**
    - **Description**: Checks whether the package contains any project links (e.g., documents or Git
-     Repositories). Many malicious activities do not include any project links.
-   - **Rule**: Return `FALSE` when there is only one project link; otherwise, return `TRUE`.
+     Repositories). Many malicious packages do not include any project links.
+   - **Rule**: Return `HeuristicResult.FAIL` when there are no project links; otherwise, return `HeuristicResult.PASS`.

 2. **Unreachable Project Links**
    - **Description**: Checks the accessibility of the project links. This is considered an auxiliary
      heuristic since no cases have met this heuristic.
-   - **Rule**: Return `FALSE` if all project links are unreachable; otherwise, return `TRUE`.
+   - **Rule**: Return `HeuristicResult.FAIL` if all project links are unreachable; otherwise, return `HeuristicResult.PASS`.
+   - **Dependency**: Will be run if the Empty Project Link heuristic passes.

 3. **One Release**
    - **Description**: Checks whether the package has only one release.
-   - **Rule**: Return `FALSE` if the package contains only one release; otherwise, return `TRUE`.
+   - **Rule**: Return `HeuristicResult.FAIL` if the package contains only one release; otherwise, return `HeuristicResult.PASS`.

 4. **High Release Frequency**
    - **Description**: Checks if the package released multiple versions within a short timeframe. We calculate
      the release frequency and define a default frequency threshold of 2 days.
-   - **Rule**: Return `FALSE` if the frequency is higher than the threshold; otherwise, return `TRUE`.
+   - **Rule**: Return `HeuristicResult.FAIL` if the frequency is higher than the threshold; otherwise, return `HeuristicResult.PASS`.
+   - **Dependency**: Will be run if the One Release heuristic passes.

 5. **Unchanged Release**
-   - **Description**: Checks if the content of releases remains unchanged.
-   - **Rule**: Return `FALSE` if the content of releases is identical; otherwise, return `TRUE`.
+   - **Description**: Checks if the content of releases remains unchanged using the `sha256` digest of the package source.
+   - **Rule**: Return `HeuristicResult.FAIL` if the content of any two releases is identical; otherwise, return `HeuristicResult.PASS`.
+   - **Dependency**: Will be run if the High Release Frequency heuristic fails.

 6. **Closer Release Join Date**
-   - **Description**: Checks the gap between the date the maintainer registered their account and the date
+   - **Description**: Checks the gap between the date the maintainer(s) registered their account and the date
      of the latest release. A default threshold of 5 days is defined.
-   - **Rule**: Return `FALSE` if the gap is less than the threshold; otherwise, return `TRUE`.
+   - **Rule**: Return `HeuristicResult.FAIL` if the gap is less than the threshold for any maintainer; otherwise, return `HeuristicResult.PASS`.

 7. **Suspicious Setup**
-   - **Description**: Checks the `setup.py` to see if there are suspicious imported modules, or
-     `install_requires` packages that are installed during the package installation process. We define two suspicious
-     keywords as the blacklist.
-   - **Rule**: Return `FALSE` if the package name contains suspicious keywords; otherwise, return `TRUE`.
+   - **Description**: Checks `setup.py` to see if there are suspicious imported modules, or
+     `install_requires` packages that are installed during the package installation process. The currently blacklisted packages are `base64` and `requests`. This heuristic is skipped if no `setup.py` file can be found in the package.
+   - **Rule**: Return `HeuristicResult.FAIL` if the name of any imported module or `install_requires` package contains a blacklisted keyword; otherwise, return `HeuristicResult.PASS`.
+   - **Dependency**: Will be run if the Closer Release Join Date heuristic fails.
+
+8. **Wheel Absence**
+   - **Description**: Checks for the presence of a wheel (`.whl`) file distributed with the specified package release.
+   - **Rule**: Return `HeuristicResult.FAIL` if there is no wheel file present with that package release; otherwise, return `HeuristicResult.PASS`.
+
+9. **Anomalous Version**
+   - **Description**: Checks if the version number is abnormally high, checking the epoch and major version against threshold values. This does account for common date-based version number (calendar versioning) patterns.
+   - **Rule**: Return `HeuristicResult.FAIL` if the major or epoch is abnormally high; otherwise, return `HeuristicResult.PASS`.
+   - **Dependency**: Will be run if the One Release heuristic fails.
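
The dependency rules in the list above can be sketched as a small sequential runner. This is an illustrative sketch only, not the analyzer's actual implementation: the `HeuristicResult` enum here is a local stand-in for the type named in the README, and `ORDER`, `DEPENDENCIES`, `run_heuristics`, and the snake_case heuristic keys are hypothetical names. It also assumes that a heuristic whose prerequisite did not produce the required result is reported as `SKIP`.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class HeuristicResult(Enum):
    """Local stand-in for the result type named in the README (assumed shape)."""
    PASS = "PASS"
    FAIL = "FAIL"
    SKIP = "SKIP"


# Hypothetical keys for the nine heuristics, in the order the README lists them.
ORDER: List[str] = [
    "empty_project_link",
    "unreachable_project_links",
    "one_release",
    "high_release_frequency",
    "unchanged_release",
    "closer_release_join_date",
    "suspicious_setup",
    "wheel_absence",
    "anomalous_version",
]

# heuristic -> (prerequisite, result the prerequisite must have produced),
# transcribed from the "Dependency" bullets.
DEPENDENCIES: Dict[str, Tuple[str, HeuristicResult]] = {
    "unreachable_project_links": ("empty_project_link", HeuristicResult.PASS),
    "high_release_frequency": ("one_release", HeuristicResult.PASS),
    "unchanged_release": ("high_release_frequency", HeuristicResult.FAIL),
    "suspicious_setup": ("closer_release_join_date", HeuristicResult.FAIL),
    "anomalous_version": ("one_release", HeuristicResult.FAIL),
}


def run_heuristics(
    checks: Dict[str, Callable[[], HeuristicResult]]
) -> Dict[str, HeuristicResult]:
    """Run each check in ORDER, skipping any whose prerequisite was not met."""
    results: Dict[str, HeuristicResult] = {}
    for name in ORDER:
        dependency = DEPENDENCIES.get(name)
        if dependency is not None:
            prerequisite, required = dependency
            if results.get(prerequisite) is not required:
                # Assumption: an unmet dependency is reported as SKIP.
                results[name] = HeuristicResult.SKIP
                continue
        results[name] = checks[name]()
    return results


if __name__ == "__main__":
    # Made-up outcomes for a single-release package with no project links.
    outcomes = {name: HeuristicResult.PASS for name in ORDER}
    outcomes["empty_project_link"] = HeuristicResult.FAIL
    outcomes["one_release"] = HeuristicResult.FAIL
    checks = {name: (lambda r=result: r) for name, result in outcomes.items()}
    for name, result in run_heuristics(checks).items():
        print(f"{name}: {result.name}")
```

In this made-up run, Unreachable Project Links, High Release Frequency, and Unchanged Release are skipped because their prerequisites did not produce the required result, while Anomalous Version runs because One Release failed.
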
+
+### Confidence Score Motivation

-## Heuristics-Based Analyzer: Scanning 1167 Packages from Trusted Organizations
+The original seven heuristics that started this work were Empty Project Link, Unreachable Project Links, One Release, High Release Frequency, Unchanged Release, Closer Release Join Date, and Suspicious Setup. These heuristics (excluding those with a dependency) were run on 1167 packages from trusted organizations, with the following results:

 | Heuristic Name | Count |
 |------------------| ----- |
@@ -64,4 +64,4 @@ Define Seven Heuristics: `False` means suspicious and `True` means benign. `SKIP
 | Frequent Release | 14 |
 | Suspicious Setup | 5 |

-**The result is used as a reference for the confidence score to lower the false positive rate.**
+These results were used as a reference for the confidence score provided in each suspicious combination.