
Commit 7869a1c

Updating featured image
1 parent: b1a5b23 · commit: 7869a1c

File tree: 1 file changed (+1, −1)


_posts/2022-12-12-scaling-pytorch-fsdp-for-training-foundation-models-on-ibm-cloud.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 layout: blog_detail
 title: "Scaling PyTorch FSDP for Training Foundation Models on IBM Cloud"
 author: Linsong Chu, Less Wright, Hamid Nazeri, Sophia Wen, Raghu Ganti, Geeta Chauhan
-featured-img: "/assets/images/scaling-pytorch-fsdp-image1-IBM_scaling_FSDP_visual.png"
+featured-img: "/assets/images/scaling-pytorch-fsdp-image1-IBM_scaling_FSDP_visual_new.png"
 ---
 
 Large model training using a cloud-native approach is of growing interest to many enterprises, given the emergence and success of [foundation models](https://research.ibm.com/blog/what-are-foundation-models). Some AI practitioners may assume that the only way to achieve high GPU utilization for distributed training jobs is to run them on HPC systems, such as those interconnected with InfiniBand, and may not consider Ethernet-connected systems. We demonstrate how the latest distributed training technique, Fully Sharded Data Parallel (FSDP) from PyTorch, successfully scales to models of 10B+ parameters using commodity Ethernet networking on IBM Cloud.
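
For reference, below is a minimal sketch of how a model is typically wrapped with PyTorch's FullyShardedDataParallel. The toy Transformer, tensor shapes, and learning rate are illustrative assumptions, not the configuration used in the post; launch with torchrun so each process drives one GPU.

```python
# Hypothetical minimal FSDP sketch (PyTorch >= 1.12); the toy model, shapes,
# and hyperparameters are placeholders, not the post's actual setup.
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group(backend="nccl")       # GPU collectives over the cluster network
    local_rank = int(os.environ["LOCAL_RANK"])    # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Transformer(d_model=256, nhead=8).cuda()
    model = FSDP(model)                           # shards params, grads, and optimizer state across ranks
    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

    src = torch.randn(10, 4, 256, device="cuda")  # (seq, batch, d_model) dummy batch
    tgt = torch.randn(10, 4, 256, device="cuda")
    loss = model(src, tgt).sum()                  # toy loss; real training would iterate over a dataset
    loss.backward()
    optim.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch: torchrun --nproc_per_node=<num_gpus> this_script.py
```

With no auto-wrap policy supplied, FSDP here shards the whole model as a single unit; the post's 10B+ parameter runs would rely on finer-grained wrapping and other tuning beyond this sketch.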
