
Commit 6e025b2

feat: add a check to analyze malicious Python packages (#750)
Signed-off-by: Yao-Wen-Chang <[email protected]>
1 parent 65f9e7e commit 6e025b2

30 files changed, +1972 -2 lines changed

pyproject.toml

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,7 @@ dependencies = [
     "jsonschema >= 4.22.0,<5.0.0",
     "cyclonedx-bom >=4.0.0,<5.0.0",
     "cyclonedx-python-lib[validation] >=7.3.4,<8.0.0",
+    "beautifulsoup4 >= 4.12.0,<5.0.0",
 ]
 keywords = []
 # https://pypi.org/classifiers/
@@ -74,6 +75,7 @@ dev = [
     "pip-audit >=2.5.6,<3.0.0",
     "pylint >=3.0.3,<4.0.0",
     "cyclonedx-bom >=4.0.0,<5.0.0",
+    "types-beautifulsoup4 >= 4.12.0,<5.0.0",
 ]
 docs = [
     "sphinx >=7.0.0,<8.0.0",

src/macaron/config/defaults.ini

Lines changed: 10 additions & 0 deletions
@@ -512,6 +512,10 @@ hostname = registry.npmjs.org
 attestation_endpoint = -/npm/v1/attestations
 request_timeout = 20

+[package_registry.pypi]
+request_timeout = 20
+hostname = pypi.org
+
 # Configuration options for selecting the checks to run.
 # Both the exclude and include are defined as list of strings:
 # - The exclude list is used to specify the checks that will not run.
@@ -547,3 +551,9 @@ request_timeout = 20
 exclude =
 # By default, we run all checks available.
 include = *
+
+[heuristic.pypi]
+releases_frequency_threshold = 2
+# The gap threshold.
+# The timedelta is the gap, in days, between the date the maintainer registered their PyPI account and the date of the latest release.
+timedelta_threshold_of_join_release = 5
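
The two thresholds added above could be read with the standard-library `configparser`, as in the minimal sketch below. The helper name `load_pypi_thresholds` and the inline INI string are assumptions made for illustration; they are not taken from this commit, and the project may load its defaults differently.

```python
import configparser

# Mirrors the [heuristic.pypi] section added to defaults.ini in this commit.
DEFAULTS_INI = """
[heuristic.pypi]
releases_frequency_threshold = 2
timedelta_threshold_of_join_release = 5
"""


def load_pypi_thresholds(ini_text: str) -> dict[str, int]:
    """Return both thresholds as integers (hypothetical helper, not project code)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["heuristic.pypi"]
    return {
        # The fallbacks repeat the defaults shown in the diff.
        "releases_frequency_threshold": section.getint(
            "releases_frequency_threshold", fallback=2
        ),
        "timedelta_threshold_of_join_release": section.getint(
            "timedelta_threshold_of_join_release", fallback=5
        ),
    }


thresholds = load_pypi_thresholds(DEFAULTS_INI)
```

Parsing the values with `getint` keeps later comparisons numeric, so a mistyped threshold fails at load time rather than in the middle of an analysis.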
Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
# Implementation of Heuristic Malware Detector

## Check

The heuristics run sequentially:

1. **Empty Project Link**: Checks whether the package declares any project links (e.g., documentation or Git
   repositories). If it does, the analyzer then runs the `Unreachable Project Links` heuristic to check whether all of the project links are unreachable.
2. **One Release**: Checks whether the package has only one release. If the package has multiple
   releases, the checker then runs `High Release Frequency` and `Unchanged Release` to see whether the
   maintainers released multiple times within a short timeframe (the threshold) and
   whether the released contents are identical.
3. **Closer Release Join Date**: Considers the date when the maintainer registered their account (if
   available). The checker calculates the gap between the latest release date and the maintainer's account
   registration date.
4. **Suspicious Setup**: Checks whether `setup.py` includes suspicious imports, such as `base64` for
   encoding or obfuscation and `requests` for data exfiltration.

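
The ordering described above can be sketched as follows. This is an illustrative outline only: the `Result` enum, `run_sequence`, and the stand-in heuristic functions are hypothetical names, and the sketch assumes that a follow-up heuristic is recorded as skipped when its precondition is not met.

```python
from enum import Enum
from typing import Callable


class Result(Enum):
    PASS = "pass"  # non-suspicious
    FAIL = "fail"  # suspicious
    SKIP = "skip"  # metadata missing or heuristic not applicable


Heuristic = Callable[[dict], Result]


def run_sequence(package: dict, heuristics: dict[str, Heuristic]) -> dict[str, Result]:
    """Run the heuristics in the order listed above (hypothetical driver)."""
    results: dict[str, Result] = {}

    # 1. Only probe link reachability when the package declares links.
    results["empty_project_link"] = heuristics["empty_project_link"](package)
    if results["empty_project_link"] is Result.PASS:
        results["unreachable_project_links"] = heuristics["unreachable_project_links"](package)
    else:
        results["unreachable_project_links"] = Result.SKIP

    # 2. Frequency and content checks need more than one release.
    results["one_release"] = heuristics["one_release"](package)
    for name in ("high_release_frequency", "unchanged_release"):
        if results["one_release"] is Result.PASS:
            results[name] = heuristics[name](package)
        else:
            results[name] = Result.SKIP

    # 3 and 4 run regardless of the earlier outcomes.
    for name in ("closer_release_join_date", "suspicious_setup"):
        results[name] = heuristics[name](package)
    return results


# Stand-in heuristics so the sketch runs end to end.
def _always_pass(package: dict) -> Result:
    return Result.PASS


def empty_project_link(package: dict) -> Result:
    return Result.PASS if package.get("project_links") else Result.FAIL


def one_release(package: dict) -> Result:
    return Result.FAIL if len(package.get("releases", [])) == 1 else Result.PASS


HEURISTICS: dict[str, Heuristic] = {
    "empty_project_link": empty_project_link,
    "unreachable_project_links": _always_pass,
    "one_release": one_release,
    "high_release_frequency": _always_pass,
    "unchanged_release": _always_pass,
    "closer_release_join_date": _always_pass,
    "suspicious_setup": _always_pass,
}

results = run_sequence({"project_links": {}, "releases": ["0.0.1"]}, HEURISTICS)
```

For the example package, which has no links and a single release, the three follow-up heuristics are recorded as `Result.SKIP` while the last two still run.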
## Supported Ecosystem: PyPI

Seven heuristics are defined. Each returns `FALSE` (suspicious), `TRUE` (non-suspicious), or `SKIP` (some metadata is missing, so the checker skips the heuristic).

1. **Empty Project Link**
   - **Description**: Checks whether the package contains any project links (e.g., documentation or Git
     repositories). Many malicious packages do not include any project link.
   - **Rule**: Return `FALSE` when the package has no project link; otherwise, return `TRUE`.

2. **Unreachable Project Links**
   - **Description**: Checks whether the project links are reachable. This is considered an auxiliary
     heuristic because no case has triggered it yet.
   - **Rule**: Return `FALSE` if none of the project links is reachable; otherwise, return `TRUE`.

3. **One Release**
   - **Description**: Checks whether the package has only one release.
   - **Rule**: Return `FALSE` if the package has only one release; otherwise, return `TRUE`.

4. **High Release Frequency**
   - **Description**: Checks whether the package released multiple versions within a short period. The checker
     calculates the release frequency and compares it with a default threshold of 2 days
     (`releases_frequency_threshold` in `defaults.ini`).
   - **Rule**: Return `FALSE` if the frequency is higher than the threshold; otherwise, return `TRUE`.

5. **Unchanged Release**
   - **Description**: Checks whether the content of the releases remains unchanged.
   - **Rule**: Return `FALSE` if the content of the releases is identical; otherwise, return `TRUE`.

6. **Closer Release Join Date**
   - **Description**: Checks the gap between the date the maintainer registered their account and the date
     of the latest release. The default threshold is 5 days (`timedelta_threshold_of_join_release` in `defaults.ini`).
   - **Rule**: Return `FALSE` if the gap is less than the threshold; otherwise, return `TRUE`.

7. **Suspicious Setup**
   - **Description**: Checks `setup.py` for suspicious imported modules and for suspicious
     `install_requires` packages, which are installed during the package installation process. Two suspicious
     keywords are defined as the blacklist.
   - **Rule**: Return `FALSE` if the name of an imported module or an `install_requires` package contains a suspicious keyword; otherwise, return `TRUE`.
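
The two date-based rules (heuristics 4 and 6) can be sketched as below. The function names are hypothetical, `None` stands in for `SKIP`, and the sketch assumes that "frequency" is measured as the average number of days between consecutive releases, so a frequency higher than the threshold means an average gap shorter than 2 days. The actual implementation may measure it differently.

```python
from datetime import datetime, timedelta
from typing import Optional


def high_release_frequency(
    release_dates: list[datetime], threshold_days: int = 2
) -> Optional[bool]:
    """False (suspicious) when releases are, on average, under threshold_days apart."""
    if len(release_dates) < 2:
        return None  # SKIP: a frequency needs at least two releases
    ordered = sorted(release_dates)
    # The consecutive gaps sum to (last - first), so this is the mean gap.
    average_gap = (ordered[-1] - ordered[0]) / (len(ordered) - 1)
    return not (average_gap < timedelta(days=threshold_days))


def closer_release_join_date(
    join_date: Optional[datetime],
    latest_release: Optional[datetime],
    threshold_days: int = 5,
) -> Optional[bool]:
    """False (suspicious) when the latest release is within threshold_days of sign-up."""
    if join_date is None or latest_release is None:
        return None  # SKIP: the registration or release date is unavailable
    gap = latest_release - join_date
    return not (gap < timedelta(days=threshold_days))


daily = [datetime(2024, 1, 1), datetime(2024, 1, 2), datetime(2024, 1, 3)]
monthly = [datetime(2024, 1, 1), datetime(2024, 2, 1)]
frequent = high_release_frequency(daily)
relaxed = high_release_frequency(monthly)
new_account = closer_release_join_date(datetime(2024, 1, 1), datetime(2024, 1, 3))
old_account = closer_release_join_date(datetime(2020, 1, 1), datetime(2024, 1, 3))
```

Here `frequent` and `new_account` evaluate to `False` (suspicious), while `relaxed` and `old_account` evaluate to `True`.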
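
One way to apply the Suspicious Setup rule without executing `setup.py` is to parse it with the standard-library `ast` module, as in this minimal sketch. The blacklist below reuses the two examples named in the Check section (`base64` and `requests`) as an assumption, the helper names are hypothetical, and only literal `install_requires` lists are handled.

```python
import ast

# Assumption: the two blacklist keywords are the examples given in the Check section.
SUSPICIOUS_KEYWORDS = ("base64", "requests")


def collect_names(setup_source: str) -> set[str]:
    """Collect imported module names and literal install_requires entries."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(setup_source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
        elif isinstance(node, ast.keyword) and node.arg == "install_requires":
            # Only literal lists or tuples of strings are inspected.
            if isinstance(node.value, (ast.List, ast.Tuple)):
                for element in node.value.elts:
                    if isinstance(element, ast.Constant) and isinstance(element.value, str):
                        names.add(element.value)
    return names


def suspicious_setup(setup_source: str) -> bool:
    """False (suspicious) when any collected name contains a blacklist keyword."""
    names = collect_names(setup_source)
    return not any(keyword in name for name in names for keyword in SUSPICIOUS_KEYWORDS)


FLAGGED = '''
import base64
from setuptools import setup

setup(name="demo", install_requires=["requests>=2.0"])
'''

CLEAN = '''
from setuptools import setup

setup(name="demo", install_requires=["numpy"])
'''

flagged_result = suspicious_setup(FLAGGED)
clean_result = suspicious_setup(CLEAN)
```

Because the source is only parsed and never run, a malicious `setup.py` cannot execute code during the check. The trade-off is that dynamically built dependency lists or imports hidden behind `__import__` are not seen by this sketch.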
## Heuristics-Based Analyzer: Scanning 1167 Packages from Trusted Organizations

| Heuristic Name     | Count |
| ------------------ | ----- |
| Lower Release      | 102   |
| Empty Link         | 45    |
| Links Missing      | 24    |
| Frequent Release   | 14    |
| Suspicious Setup   | 5     |

**The result is used as a reference for the confidence score to lower the false positive rate.**
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
# Copyright (c) 2022 - 2024, Oracle and/or its affiliates. All rights reserved.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
# Copyright (c) 2022 - 2024, Oracle and/or its affiliates. All rights reserved.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
