Using HPC Clusters the Easy Way with ARC Connect

2016 Michigan IT Symposium

Mark Montague  •  LSA Information Technology
markmont@umich.edu

What is an HPC cluster?

  • A High Performance Computing cluster.
  • Many fast computers (called nodes).
    • Usually each with many processing cores.
    • Usually each with lots of RAM.
  • Connected together via a local area network.
    • Usually a low latency, high speed network such as InfiniBand.
  • Often with large amounts of high speed networked storage such as Lustre.
  • Sometimes with specialized hardware such as GPUs (graphics cards used for computation) or Xeon Phi cards.

What is an HPC cluster?

  • Software to let programs use multiple nodes simultaneously on computations that are too large for any single node (such as MPI, the Message Passing Interface).
  • Scheduler software to dynamically allocate resources (cores, memory) to users to meet their requirements and maximize value and efficiency.
  • Resource manager software to prepare nodes for compute jobs, start/monitor/end compute jobs, and deliver results back to users.

HPC versus other types of clusters

  • HPC clusters are optimized for speed first, scalability second, and everything else third.
  • Usually not fault tolerant.
  • Do not load balance (minimizing the number of nodes in use is usually better for power consumption than spreading jobs across all nodes equally).
  • Used for computation rather than running enterprise services.

Uses of HPC clusters

  • Deep learning
  • Genetic sequencing and assembly
  • Business intelligence
  • Protein folding
  • Simulation and modeling
  • Visualization

Flux

  • U-M's main HPC cluster.
    Anyone at U-M can buy time on Flux.
  • 1,286 nodes
    • The largest have 56 cores and 1,536 GB RAM each.
  • 21,729 cores.
  • 109 TB RAM.
  • 1.5 PB Lustre filesystem for scratch data.
  • 220 GPUs.
  • 40 Gbps networking (IP over InfiniBand)
    with < 5 μs latency.

But...

  • Flux was designed for batch (non-interactive) computing:
    Submit one or more jobs, wait for them to run, collect the results.
  • Limited support for interactive graphics.
  • Steep learning curve for the Linux command line:
    Managing files, Unix text editors, scripting.
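
As a rough sketch of the batch workflow described above (submit, wait, collect the results), the following writes a small PBS/Torque job script and prints the commands that would submit and follow it. The account, queue, and QOS values are copied from the interactive qsub example later in this deck; the job name, the script file name, and the placeholder workload are hypothetical.

```shell
#!/bin/sh
# Sketch only: job name, file name, and workload are hypothetical.
# Account, queue, and QOS values mirror the qsub example in this deck.
job_script="hello_flux.pbs"

cat > "$job_script" <<'EOF'
#!/bin/sh
#PBS -N hello_flux
#PBS -A support_flux
#PBS -q flux
#PBS -l nodes=1:ppn=2,mem=8gb,walltime=00:10:00,qos=flux
#PBS -j oe
#PBS -V

# PBS starts jobs in the home directory; move to the submission directory.
cd "$PBS_O_WORKDIR"

# Placeholder workload: replace with the real computation.
echo "Running on $(hostname) with $(wc -l < "$PBS_NODEFILE") cores"
EOF

# On a cluster login node you would then submit, monitor, and collect:
echo "qsub $job_script         # submit; prints a job ID"
echo "qstat -u \$USER           # check whether the job is queued or running"
echo "cat hello_flux.o<JOBID>  # read the combined output when it finishes"
```

The script only prints the qsub, qstat, and cat commands rather than running them, so it can be tried safely on a machine that has no scheduler installed.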

ARC Connect

  • Goal: Enable easier access to HPC resources.
    • Reduce the learning curve, especially for users who are not comfortable in command line environments.
    • Enable easier and more powerful interactive graphical use of applications on Flux.
    • Enhance collaboration.
  • Don't reinvent: The Texas Advanced Computing Center (TACC) graciously shared the code they wrote for the XSEDE Visualization Portal. We enhanced it here at U-M to meet our needs and contributed the enhancements back to TACC.

ARC Connect Enhancements

U-M enhancements contributed back to TACC:
  • Federated authentication:
    Shibboleth + mandatory MFA + CILogon
    • Use home institution accounts to facilitate collaboration.
    • Don't ask for passwords!
    • ITAR today, HIPAA on the roadmap.
  • Mandatory encryption for VNC sessions
    (no SSH tunnels needed!)
  • Multiple session support.
  • Web reverse proxying instead of port mapping for Jupyter, RStudio, and end user web apps.

ARC Connect

  • Functionality:
    • VNC desktops.
      • Via a simple-to-use HTML5/WebSocket based viewer in users' web browsers.
      • Via a high performance VNC viewer application.
      • Multiple people can collaborate simultaneously in either full-control or view-only mode.
    • Jupyter notebooks: Python 2, Python 3, R, and SAS.
    • RStudio Server IDE.
    • Secure access to end-user installed and maintained web applications.

Demo

  1. Log in to U-M ARC Connect using a Duke University NetID. Currently, non-U-M users are supported only in the development environment, pending finalization of the InCommon MFA Profile. Other institutions can be added individually in the meantime.
  2. Start a VNC desktop session.
  3. Show the built-in HTML5/WebSockets VNC viewer, noVNC.
  4. Show how a user can share their VNC desktop with a collaborator or a class.

Live demos and discussion

How would you like to spend the remaining time?

  • Jupyter notebooks live demo.
  • RStudio Server IDE live demo.
  • VNC desktop sessions and collaboration live demo.
  • HPC the regular way (non-interactive job submission) live demo.
  • User-installed HPC web applications.
  • ARC Connect architecture.
  • Future directions.
  • ...?

User-installed HPC web applications

ARC Connect has built-in support for certain web apps:
  • Jupyter
  • RStudio Server


But users can also use ARC Connect to set up and access their own web applications running on the cluster.

User-installed HPC web applications

To leverage the ARC Connect infrastructure to access their own web application, the end user:
  1. Installs the web app in their home directory on the HPC cluster.
  2. Submits a job to run the web application.
  3. Opens the following URL in their web browser, substituting the name of the compute NODE the job was assigned and the PORT number used by the web application:
    https://connect.arc-ts.umich.edu/w1/NODE/PORT/
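
The substitution in step 3 can be sketched as a small shell helper. The function name arc_connect_url, the node name node1234, and port 8080 are hypothetical; only the URL pattern comes from the slide above.

```shell
#!/bin/sh
# Sketch only: the function name and the example values are hypothetical.
# The URL pattern is the one given in step 3 above.
arc_connect_url() {
  node="$1"
  port="$2"
  # Reject an empty node name.
  case "$node" in
    "") echo "usage: arc_connect_url NODE PORT" >&2; return 1 ;;
  esac
  # Reject an empty or non-numeric port.
  case "$port" in
    ""|*[!0-9]*) echo "PORT must be a number" >&2; return 1 ;;
  esac
  printf 'https://connect.arc-ts.umich.edu/w1/%s/%s/\n' "$node" "$port"
}

# Example with a made-up compute node name and port:
arc_connect_url node1234 8080
# prints https://connect.arc-ts.umich.edu/w1/node1234/8080/
```

In practice NODE is whichever compute node the scheduler assigned to the job, and PORT is whatever port the web application reports when it starts.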

User-installed web application demo

Anvi'o is a web-based analysis and visualization platform for 'omics data.

# Log in to the cluster:
ssh username@flux-login.arc-ts.umich.edu

# Load the necessary software modules:
module load python-anaconda2/latest gsl \
  prodigal/2.6.2 hmmer mcl centrifuge anvio

# Submit an interactive compute job:
qsub -I -V -A support_flux -q flux \
  -l nodes=1:ppn=2,mem=8gb,walltime=04:00:00,qos=flux

# When the interactive job starts, run Anvi'o:
anvi-self-test

# The URL for accessing Anvi'o will be printed by the
# anvi-self-test command.

  1. Log in to the cluster.
  2. Load the software and start an interactive job.
  3. When the interactive job starts, run Anvi'o.
  4. Anvi'o starts and prints its URL.
  5. Paste the Anvi'o URL into Google Chrome.
  6. Log in via Shibboleth, with mandatory MFA.
  7. Then use Anvi'o, running on the HPC cluster!

Future directions

  • Perform a security assessment to allow ARC Connect to be used with Armis, the U-M HIPAA-aligned HPC cluster.
  • Expand the built-in web apps beyond Jupyter and RStudio Server.
  • Implement per-job user authorization lists for finer-grained sharing.
  • Make it easier for users to set up arbitrary web apps themselves.

Appendix: ARC Connect Architecture