
try setting MAX_JOBS=4 for oom in arm wheel #1804

Closed

wants to merge 33 commits into from

Changes from all commits

33 commits
78a3d93
try setting MAX_JOBS=4 for oom in arm wheel
tinglvv Apr 26, 2024
160daf3
change to desired_cuda
tinglvv May 5, 2024
78dd24d
change desired_cuda check
tinglvv May 5, 2024
6eed272
change path
tinglvv May 6, 2024
fa2a485
remove libopenblas file
tinglvv May 7, 2024
19feff4
test only hopper for quicker tat
tinglvv May 7, 2024
3931c11
add back max_jobs=4
tinglvv May 9, 2024
f2f8250
cherrypick #1808
tinglvv May 9, 2024
71bc4f2
need maxjobs=4
tinglvv May 9, 2024
5dcd9dd
fix path to copy wheel
tinglvv May 9, 2024
3c9ff98
fix path to rm
tinglvv May 9, 2024
42ea493
try set max jobs to 5 as 4 is too slow
tinglvv May 9, 2024
f670d5b
cuda 9.0 for aarch64 only
tinglvv May 16, 2024
0eecef2
add libopenblas.so new location (from OpenBLAS)
tinglvv May 17, 2024
3841eaf
upgrade ACL version to 24.04 (1824)
tinglvv May 17, 2024
9f62e48
remove copy libarm_compute_core.so
tinglvv May 17, 2024
374e9e1
still need max_jobs=5 as 6 oom
tinglvv May 18, 2024
c49a757
aarch64: cd: fix issue with invoking cpu wheel build option (#1791)
snadampal May 2, 2024
d31681e
Update s390x builder (#1802)
AlekseiNikiforovIBM May 3, 2024
4fcabbe
Fix cuda windows validations update cuda driver. (#1810)
atalman May 6, 2024
a8f71c0
Revert "aarch64: upgrade ACL version to 24.04" (#1813)
atalman May 7, 2024
57425f4
Don't deactivate/remove conda on linux after validation (#1814)
atalman May 7, 2024
8d58e64
Add manylinux_2_28 image (#1816)
atalman May 10, 2024
5625515
Add manylinux_2_28 image - fix cmake (#1817)
atalman May 10, 2024
b47978f
Add Almalinux to manywheel build script (#1818)
atalman May 13, 2024
242fa68
[BE] Remove unused files and dead code (#1819)
atalman May 13, 2024
1f19db5
arch64: CD: add manylinux_2_28 docker build workflow (#1784)
snadampal May 13, 2024
455b572
Revert "[BE] Remove unused files and dead code" (#1821)
atalman May 13, 2024
85e8b9f
Add manylinux_2_28 cuda docker images (#1820)
atalman May 14, 2024
06ca292
[Validations] Turn off CUDA exception catch test (#1825)
atalman May 17, 2024
4116508
test with linker script enabled
tinglvv May 20, 2024
d1baef5
reapply acl version 24.04 as git history is messed
tinglvv May 20, 2024
d7ffad8
Use export USE_PRIORITIZED_TEXT_FOR_LD=1 instead of command line
tinglvv May 20, 2024
16 changes: 8 additions & 8 deletions .github/scripts/validate_binaries.sh
@@ -54,23 +54,23 @@ else
${PWD}/check_binary.sh
fi

# We are only interested in CUDA tests and Python 3.8-3.11. Not all requirement libraries are available for 3.12 yet.
if [[ ${INCLUDE_TEST_OPS:-} == 'true' && ${MATRIX_GPU_ARCH_TYPE} == 'cuda' && ${MATRIX_PYTHON_VERSION} != "3.12" ]]; then
source ./.github/scripts/validate_test_ops.sh
fi

if [[ ${TARGET_OS} == 'windows' ]]; then
python ./test/smoke_test/smoke_test.py ${TEST_SUFFIX}
else
python3 ./test/smoke_test/smoke_test.py ${TEST_SUFFIX}
python3 ./test/smoke_test/smoke_test.py ${TEST_SUFFIX} --runtime-error-check disabled
fi

if [[ ${TARGET_OS} == 'macos-arm64' ]]; then
export PATH=${OLD_PATH}
fi

# We are only interested in CUDA tests and Python 3.8-3.11. Not all requirement libraries are available for 3.12 yet.
if [[ ${INCLUDE_TEST_OPS:-} == 'true' && ${MATRIX_GPU_ARCH_TYPE} == 'cuda' && ${MATRIX_PYTHON_VERSION} != "3.12" ]]; then
source ./.github/scripts/validate_test_ops.sh
fi

# TODO: remove if statement currently this step is timing out on linx-aarch64
if [[ ${TARGET_OS} != 'linux-aarch64' ]]; then
# this is optional step
if [[ ${TARGET_OS} != linux* ]]; then
conda deactivate
conda env remove -n ${ENV_NAME}
fi
52 changes: 46 additions & 6 deletions .github/workflows/build-manywheel-images.yml
@@ -12,7 +12,9 @@ on:
paths:
- .github/workflows/build-manywheel-images.yml
- manywheel/Dockerfile
- manywheel/Dockerfile_2_28
- manywheel/Dockerfile_aarch64
- manywheel/Dockerfile_2_28_aarch64
- manywheel/Dockerfile_cuda_aarch64
- manywheel/Dockerfile_cxx11-abi
- manywheel/build_docker.sh
@@ -21,7 +23,9 @@
paths:
- .github/workflows/build-manywheel-images.yml
- manywheel/Dockerfile
- manywheel/Dockerfile_2_28
- manywheel/Dockerfile_aarch64
- manywheel/Dockerfile_2_28_aarch64
- manywheel/Dockerfile_cuda_aarch64
- manywheel/Dockerfile_cxx11-abi
- 'common/*'
@@ -56,6 +60,27 @@ jobs:
- name: Build Docker Image
run: |
manywheel/build_docker.sh
build-docker-cuda-manylinux_2_28:
runs-on: linux.12xlarge.ephemeral
strategy:
matrix:
cuda_version: ["12.4", "12.1", "11.8"]
env:
GPU_ARCH_TYPE: cuda-manylinux_2_28
GPU_ARCH_VERSION: ${{ matrix.cuda_version }}
steps:
- name: Purge tools folder (free space for build)
run: rm -rf /opt/hostedtoolcache
- name: Checkout PyTorch builder
uses: actions/checkout@v3
- name: Authenticate if WITH_PUSH
run: |
if [[ "${WITH_PUSH}" == true ]]; then
echo "${DOCKER_TOKEN}" | docker login -u "${DOCKER_ID}" --password-stdin
fi
- name: Build Docker Image
run: |
manywheel/build_docker.sh
build-docker-cuda-aarch64:
runs-on: linux.arm64.2xlarge
strategy:
@@ -107,6 +132,21 @@ jobs:
- name: Build Docker Image
run: |
manywheel/build_docker.sh
build-docker-cpu-manylinux_2_28:
runs-on: ubuntu-22.04
env:
GPU_ARCH_TYPE: cpu-manylinux_2_28
steps:
- name: Checkout PyTorch
uses: actions/checkout@v3
- name: Authenticate if WITH_PUSH
run: |
if [[ "${WITH_PUSH}" == true ]]; then
echo "${DOCKER_TOKEN}" | docker login -u "${DOCKER_ID}" --password-stdin
fi
- name: Build Docker Image
run: |
manywheel/build_docker.sh
build-docker-cpu-aarch64:
runs-on: linux.arm64.2xlarge
env:
@@ -122,10 +162,10 @@
- name: Build Docker Image
run: |
manywheel/build_docker.sh
build-docker-cpu-cxx11-abi:
runs-on: ubuntu-22.04
build-docker-cpu-aarch64-2_28:
runs-on: linux.arm64.2xlarge
env:
GPU_ARCH_TYPE: cpu-cxx11-abi
GPU_ARCH_TYPE: cpu-aarch64-2_28
steps:
- name: Checkout PyTorch
uses: actions/checkout@v3
@@ -137,10 +177,10 @@
- name: Build Docker Image
run: |
manywheel/build_docker.sh
build-docker-cpu-s390x:
runs-on: linux.s390x
build-docker-cpu-cxx11-abi:
runs-on: ubuntu-22.04
env:
GPU_ARCH_TYPE: cpu-s390x
GPU_ARCH_TYPE: cpu-cxx11-abi
steps:
- name: Checkout PyTorch
uses: actions/checkout@v3
2 changes: 1 addition & 1 deletion .github/workflows/validate-windows-binaries.yml
@@ -127,7 +127,7 @@ jobs:

printf '%s\n' ${{ toJson(inputs.release-matrix) }} > release_matrix.json
source /c/Jenkins/Miniconda3/etc/profile.d/conda.sh
if [[ ${MATRIX_GPU_ARCH_VERSION} == "12.1" ]]; then
if [[ ${MATRIX_GPU_ARCH_TYPE} == "cuda" ]]; then
./windows/internal/driver_update.bat
fi
source ./.github/scripts/validate_binaries.sh
2 changes: 1 addition & 1 deletion aarch64_linux/README.md
@@ -16,4 +16,4 @@ __NOTE:__ CI build is currently __EXPERMINTAL__
This app allows a person to build using AWS EC2 resources and requires AWS-CLI and Boto3 with AWS credentials to support building EC2 instances for the wheel builds. Can be used in a codebuild CD or from a local system.

### Usage
```build_aarch64_wheel.py --key-name <YourPemKey> --use-docker --python 3.8 --branch <RCtag>```
```build_aarch64_wheel.py --key-name <YourPemKey> --use-docker --python 3.8 --branch <RCtag>```
12 changes: 7 additions & 5 deletions aarch64_linux/aarch64_ci_build.sh
@@ -1,6 +1,8 @@
#!/bin/bash
set -eux -o pipefail

GPU_ARCH_VERSION=${GPU_ARCH_VERSION:-}

SCRIPTPATH="$( cd -- "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P )"
source $SCRIPTPATH/aarch64_ci_setup.sh

@@ -26,10 +28,10 @@ cd /
git config --global --add safe.directory /pytorch
pip install -r /pytorch/requirements.txt
pip install auditwheel
if [ -n "$GPU_ARCH_VERSION" ]; then
echo "BASE_CUDA_VERSION is set to: $GPU_ARCH_VERSION"
python /builder/aarch64_linux/aarch64_wheel_ci_build.py --enable-mkldnn --enable-cuda
else
echo "BASE_CUDA_VERSION is not set."
if [ "$DESIRED_CUDA" = "cpu" ]; then
echo "BASE_CUDA_VERSION is not set. Building cpu wheel."
python /builder/aarch64_linux/aarch64_wheel_ci_build.py --enable-mkldnn
else
echo "BASE_CUDA_VERSION is set to: $DESIRED_CUDA"
python /builder/aarch64_linux/aarch64_wheel_ci_build.py --enable-mkldnn --enable-cuda
fi
15 changes: 7 additions & 8 deletions aarch64_linux/aarch64_wheel_ci_build.py
@@ -78,7 +78,7 @@ def build_ArmComputeLibrary() -> None:
"clone",
"https://github.com/ARM-software/ComputeLibrary.git",
"-b",
"v23.08",
"v24.04",
"--depth",
"1",
"--shallow-submodules",
@@ -122,12 +122,10 @@ def update_wheel(wheel_path) -> None:
"/usr/local/cuda/lib64/libcudnn_cnn_train.so.8",
"/usr/local/cuda/lib64/libcudnn_ops_infer.so.8",
"/usr/local/cuda/lib64/libcudnn_ops_train.so.8",
"/opt/conda/envs/aarch64_env/lib/libopenblas.so.0",
"/opt/conda/envs/aarch64_env/lib/libgfortran.so.5",
"/opt/conda/envs/aarch64_env/lib/libgomp.so.1",
"/opt/OpenBLAS/lib/libopenblas.so.0",
"/acl/build/libarm_compute.so",
"/acl/build/libarm_compute_graph.so",
"/acl/build/libarm_compute_core.so",
]
# Copy libraries to unzipped_folder/a/lib
for lib_path in libs_to_copy:
@@ -140,10 +138,10 @@ def update_wheel(wheel_path) -> None:
os.system(f"cd {folder}/tmp/; zip -r {folder}/cuda_wheel/{wheelname} *")
shutil.move(
f"{folder}/cuda_wheel/{wheelname}",
f"/dist/{wheelname}",
f"{folder}/{wheelname}",
copy_function=shutil.copy2,
)
os.system(f"rm -rf {folder}/tmp {folder}/dist/cuda_wheel/")
os.system(f"rm -rf {folder}/tmp/ {folder}/cuda_wheel/")


def complete_wheel(folder: str) -> str:
@@ -201,8 +199,9 @@ def parse_arguments():
branch = "master"

print("Building PyTorch wheel")
build_vars = "CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000 "
os.system("python setup.py clean")
os.system("export USE_PRIORITIZED_TEXT_FOR_LD=1")
build_vars = "MAX_JOBS=5 CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000 "
os.system("cd /pytorch; python setup.py clean")

override_package_version = os.getenv("OVERRIDE_PACKAGE_VERSION")
if override_package_version is not None:
2 changes: 1 addition & 1 deletion aarch64_linux/build_aarch64_wheel.py
@@ -229,7 +229,7 @@ def build_ArmComputeLibrary(host: RemoteHost, git_clone_flags: str = "") -> None
print('Building Arm Compute Library')
acl_build_flags=" ".join(["debug=0", "neon=1", "opencl=0", "os=linux", "openmp=1", "cppthreads=0",
"arch=armv8a", "multi_isa=1", "fixed_format_kernels=1", "build=native"])
host.run_cmd(f"git clone https://github.com/ARM-software/ComputeLibrary.git -b v23.08 {git_clone_flags}")
host.run_cmd(f"git clone https://github.com/ARM-software/ComputeLibrary.git -b v24.04 {git_clone_flags}")
host.run_cmd(f"cd ComputeLibrary && scons Werror=1 -j8 {acl_build_flags}")


6 changes: 3 additions & 3 deletions check_binary.sh
@@ -330,7 +330,7 @@ fi
if [[ "$PACKAGE_TYPE" == 'libtorch' ]]; then
echo "Checking that MKL is available"
build_and_run_example_cpp check-torch-mkl
elif [[ "$(uname -m)" != "arm64" ]]; then
elif [[ "$(uname -m)" != "arm64" && "$(uname -m)" != "s390x" ]]; then
if [[ "$(uname)" != 'Darwin' || "$PACKAGE_TYPE" != *wheel ]]; then
if [[ "$(uname -m)" == "aarch64" ]]; then
echo "Checking that MKLDNN is available on aarch64"
@@ -354,7 +354,7 @@ if [[ "$PACKAGE_TYPE" == 'libtorch' ]]; then
echo "Checking that XNNPACK is available"
build_and_run_example_cpp check-torch-xnnpack
else
if [[ "$(uname)" != 'Darwin' || "$PACKAGE_TYPE" != *wheel ]]; then
if [[ "$(uname)" != 'Darwin' || "$PACKAGE_TYPE" != *wheel ]] && [[ "$(uname -m)" != "s390x" ]]; then
echo "Checking that XNNPACK is available"
pushd /tmp
python -c 'import torch.backends.xnnpack; exit(0 if torch.backends.xnnpack.enabled else 1)'
@@ -375,7 +375,7 @@ if [[ "$OSTYPE" == "msys" ]]; then
fi

# Test that CUDA builds are setup correctly
if [[ "$DESIRED_CUDA" != 'cpu' && "$DESIRED_CUDA" != 'cpu-cxx11-abi' && "$DESIRED_CUDA" != *"rocm"* ]]; then
if [[ "$DESIRED_CUDA" != 'cpu' && "$DESIRED_CUDA" != 'cpu-cxx11-abi' && "$DESIRED_CUDA" != *"rocm"* && "$(uname -m)" != "s390x" ]]; then
if [[ "$PACKAGE_TYPE" == 'libtorch' ]]; then
build_and_run_example_cpp check-torch-cuda
else