Containers Tutorial
Docker is a platform for developing, shipping, and running applications inside containers. Containers are lightweight, portable, and ensure that applications run consistently across different environments. However, since we will be working on TACC’s HPC systems, this tutorial will be using Apptainer.
What is a Docker Image?
A Docker image is a pre-configured package that contains everything needed to run an application, including the code, runtime, libraries, and dependencies. Once an image is instantiated, it becomes a container, which is an isolated runtime environment.
Apptainer vs Container
Apptainer (formerly Singularity) is a containerization platform designed specifically for high-performance computing (HPC) environments, offering a solution optimized for scientific research and large-scale data processing. Unlike general containers like Docker, which require root privileges and are commonly used for development and cloud-based applications, Apptainer is built to run efficiently on shared systems, such as TACC’s supercomputers and clusters. It provides portability, reproducibility, and seamless integration with HPC job schedulers making it ideal for researchers who need to run complex applications in secure, isolated environments without compromising performance or requiring administrative access.
Prerequisites
- Before you begin, ensure that you have the following:
A working internet connection to download Docker.
Steps to Install PyTorch with CUDA on Docker
we call it this but its not really ON docker or using docker other than docker hub, is this the intended tutorial? im not sure if i have it right Apptainer runs Docker containers on HPC systems
Step 1: Install Docker (if not already installed)
Step 2: Run the SSH Command Use the following command to connect to TACC systems:
ssh <username>@<hostname>
(replace <username> with your TACC username and <hostname> with the system hostname)
Example: To connect to the Frontera system:
ssh username@frontera.tacc.utexas.edu
Step 3: Enter Your Password When prompted, type your TACC password. If this is your first time logging in, you may be required to set up or reset your password.
Step 4: Two-Factor Authentication TACC systems require two-factor authentication. Follow the on-screen prompts to complete the process.
Step 5: Request a Node If you try to download a Docker image right off the bat, your terminal will warn you!
(base) something$ idev -N 2 -n 2 -p rtx-dev -t 02:00:00
//explain what this means, the Flags This might take a while but you will know that you have successfully loaded into a node when your command line shows (base) some numbers and what not
Step 6: Load in Apptainer
(base) something$ module list
Currently Loaded Modules:
1) intel/19.1.1 4) autotools/1.2 7) hwloc/1.11.12 10) tacc-apptainer/1.3.3
2) impi/19.0.9 5) python3/3.7.0 8) xalt/2.10.34
3) git/2.24.1 6) cmake/3.24.2 9) TACC
(base) something$ module load tacc-apptainer
verify with:
(base) something$ type apptainer
apptainer is /opt/apps/tacc-apptainer/1.3.3/bin/apptainer
Step 5. Pull a Prebuilt PyTorch Docker Image
Instead of creating our own Dockerfile, we can use an official PyTorch image from DockerHub
Note
DockerHub is official cloud-based repository where developers store, share, and distribute Docker images. Similar to GitHub but for Docker containers.
Run the following command to pull the latest PyTorch image with CUDA support.
apptainer pull output.sif docker://pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
This will download the image and convert it into an Apptainer image format (.sif). You can replace “output.sif” with whatever you would like to name the file. Otherwise it will default to the name of the image.
Note
CUDA is an API that allows software to utilize NVIDIA GPUs for accelerated computing. This is essential for deep learning because GPUs process tasks much faster than CPUs. Since TACC machines have NVIDIA GPUs, we must use a CUDA-enabled PyTorch image to fully leverage GPU acceleration.
Step 7. Start an Interactive Apptainer Shell
Once the image is downloaded, we can enter the Apptainer shell by:
(base) something$ apptainer shell output.sif
Now we are in our own isolated environment free to do whatever we would like with it.
Step 8. Testing it Out
Once inside the container, switch over to your $SCRATCH directory and install this script.
(base) soemthing$ git clone https://github.com/pytorch/examples.git
(base) something$ torchrun --nproc_per_node=4 examples/distributed/ddp-tutorial-series/multigpu_torchrun.py 50 10
Step 9: Verifying the Script Execution Once you’ve executed the script, you can check the output directly in your terminal. If there are any issues or errors, they will be displayed in the terminal.
Conclusion
You have now successfully pulled a PyTorch image from Docker Hub, mounted local directories into the container, and run a Python script within an Apptainer container.
Special thanks to the Containers at TACC tutorial https://containers-at-tacc.readthedocs.io/en/latest/index.html
For further help, refer to the official Apptainer documentation at: https://apptainer.org/docs
First example, single node pytorch installation guide with just tacc machine Look at gabriels doc for differnt pytorch images
Second example, build docker file on local, push to docker hub, pull onto tacc system