
Docker CUDA builds are failing as of 2023-08-09 #1473

Closed
@atalman

Description


Docker builds are failing:
https://github.com/pytorch/builder/actions/runs/5812495270/job/15758529340

build-docker-cuda(11.8) error:

342.4 terminate called after throwing an instance of 'boost::filesystem::filesystem_error'
342.4   what():  boost::filesystem::copy_file: No space left on device: "./builds/nsight_compute/target/linux-desktop-glibc_2_11_3-x64/libcuda-injection.so", "/usr/local/cuda-11.8/nsight-compute-2022.3.0/target/linux-desktop-glibc_2_11_3-x64/libcuda-injection.so"
342.6 ./cuda_11.8.0_520.61.05_linux.run: line 524:    41 Aborted                 (core dumped) ./cuda-installer --toolkit --silent
.....
Dockerfile:58
--------------------
  56 |     # Install CUDA
  57 |     ADD ./common/install_cuda.sh install_cuda.sh
  58 | >>> RUN bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh
  59 |     
  60 |     FROM base as intel
--------------------
ERROR: failed to solve: process "/bin/sh -c bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh" did not complete successfully: exit code: 134
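
Two details in the CUDA log fit together. The installer hit an uncaught boost::filesystem::filesystem_error ("No space left on device"), so the C++ runtime called std::terminate, which aborts the process. Docker then reports exit code 134. The following is a minimal Python sketch that assumes the usual shell convention: a child killed by signal N exits with status 128 + N. Under that convention it decodes 134 as SIGABRT. It also includes a free-space check that a build step could run before invoking the CUDA installer. The function names and the 10 GiB threshold are hypothetical, because the issue does not say how much space the installer actually needs.

```python
import shutil
import signal
from typing import Optional


def signal_name_for_exit_status(status: int) -> Optional[str]:
    """Return the signal name if a shell exit status encodes death by a signal.

    Shells such as bash report a child killed by signal N as 128 + N,
    so 134 maps to signal 6 (SIGABRT), matching "Aborted (core dumped)".
    """
    if 128 < status < 256:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            return None  # not a signal number known on this platform
    return None  # ordinary exit status, not a signal


def has_free_space(path: str, required_bytes: int) -> bool:
    """Check whether the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    print(signal_name_for_exit_status(134))
    # Hypothetical threshold: a placeholder, not a documented CUDA 11.8 requirement.
    print(has_free_space("/", 10 * 1024**3))
```

A check like this only detects the symptom. Whether the real fix is a larger runner disk or cleaning up earlier build layers is not determined by the log above.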

[Edit: not an issue with ROCm] ROCm 5.5 error:

#21 [base 7/9] RUN wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm &&     rpm -ivh epel-release-latest-7.noarch.rpm &&     rm -f epel-release-latest-7.noarch.rpm
#21 sha256:0b00038419e49f1f06dc7b880efd8e1cecc68ce4550dbcc15b57efc18844ac3c
#21 0.500 --2023-08-09 18:23:17--  http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
#21 0.501 Resolving dl.fedoraproject.org (dl.fedoraproject.org)... 38.145.60.23, 38.145.60.22, 38.145.60.24
#21 0.504 Connecting to dl.fedoraproject.org (dl.fedoraproject.org)|38.145.60.23|:80... connected.
#21 0.505 HTTP request sent, awaiting response... 403 Forbidden
#21 0.507 2023-08-09 18:23:17 ERROR 403: Forbidden.
#21 0.507 
#21 ERROR: executor failed running [/bin/sh -c wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm &&     rpm -ivh epel-release-latest-7.noarch.rpm &&     rm -f epel-release-latest-7.noarch.rpm]: exit code: 8
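
The ROCm step fails with exit code 8, which comes from wget rather than from the build script. The GNU Wget manual documents status 8 as "Server issued an error response," and that is consistent with the 403 Forbidden returned by dl.fedoraproject.org. The following is a small Python lookup for reading these codes in CI logs. The table is transcribed from the manual's "Exit Status" section. The helper name is hypothetical.

```python
# Exit statuses as listed in the GNU Wget manual ("Exit Status" section).
WGET_EXIT_STATUS = {
    0: "No problems occurred.",
    1: "Generic error code.",
    2: "Parse error (e.g. when parsing command-line options).",
    3: "File I/O error.",
    4: "Network failure.",
    5: "SSL verification failure.",
    6: "Username/password authentication failure.",
    7: "Protocol errors.",
    8: "Server issued an error response.",
}


def describe_wget_exit(status: int) -> str:
    """Map a wget exit status to its documented meaning."""
    return WGET_EXIT_STATUS.get(status, f"Unknown wget exit status {status}")


if __name__ == "__main__":
    print(describe_wget_exit(8))
```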
