Scyld ClusterWare HPC

Administrator's Guide

Penguin Computing, Inc.

Table of Contents
System Design Description
Scyld ClusterWare System Architecture
System Hardware Context
Network Topologies
System Data Flow
System Software Context
System Level Files
Scyld ClusterWare Technical Description
Compute Node Boot Procedure
BProc Distributed Process Space
Compute Node Categories
Compute Node States
Miscellaneous Components
Scyld ClusterWare Software Components
BeoBoot Tools
BProc Daemons
BProc Clients
ClusterWare Utilities
Configuring the Cluster with the BeoSetup GUI
BeoSetup Features
Starting BeoSetup
Full vs. Limited Privileges
Limiting Full Privileges
The BeoSetup Main Window
The File Menu
Configuration File...
Start Cluster
Service Reconfigure
Shutdown Cluster
The Settings Menu
Reread Cfg Files
The Help Menu
The Toolbar
Node Floppy
Node CD
Config Boot
The Node List Boxes
The Configured Nodes List
The Unknown Addresses List
The Ignored Addresses List
Configuring the Cluster Manually
Configuration Files
Command Line Tools
The Kernel Command Line
Useful Command Line Options
Adding New Kernel Modules
Accessing External License Servers
Configuring RSH for Remote Job Execution
Configuring RSH from Master to Compute Node
Configuring RSH from Compute Node to Master
Other Interconnects
Monitoring the Status of the Cluster
Monitoring Utilities
Cluster Monitoring Interfaces
Monitoring Daemons
Using the Data
BeoStatus File Menu
BeoStatus Modes
beostat and libbeostat
Managing Users on the Cluster
Managing User Accounts
Adding New Users
Removing Users
Managing User Groups
Creating a Group
Adding a User to a Group
Removing a Group
Controlling Access to Cluster Resources
What Node Ownership Means
Checking Node Ownership
Setting Node Ownership
Job Batching
Remote Administration and Monitoring
Command Line Tools
X Forwarding
Compute Node Boot Options
Compute Node Boot Media
Floppy Disk
Linux BIOS
Flash Disk
Changing Boot Settings
Adding Steps to the node_up Script
Per-Node Parameters
Other Per-Node Configuration Options
Error Logs
Disk Partitioning
Disk Partitioning Concepts
Disk Partitioning with Scyld ClusterWare
Master Node
Compute Nodes
Default Partitioning
Master Node
Compute Nodes
Partitioning Scenarios
Applying the Default Partitioning Scheme
Specifying a Manual But Homogeneous Partitioning Scheme
File Systems
File Systems on a Cluster
Local File Systems
Network/Cluster File Systems
NFS on Clusters
Configuration of NFS
Machine Configuration
Configuring the Metadata Server
Configuring the I/O Server and Clients
Starting the PVFS Daemons
Client Configuration
Client Native Library Interface
What Does ROMIO Do for Me?
Installation and Configuration of ROMIO
Other Cluster File Systems
What Happens When Compute Nodes Fail?
Compute Node Data
How Do I Protect My Application from Node Failure?
How Do I Prevent Node Failure?
Load Balancing
How Does a Scyld Cluster Load Balance?
Mapping Policy
Queuing Policy
How Can I Implement My Own Schedule Policy?
Extra Tools
Linux System Hardware Monitoring
How Do I Use lm_sensors?
What Are Possible Problems with lm_sensors?
Updating Software On Your Cluster
What Can't Be Updated
Special Directories, Configuration Files, and Scripts
What Resides on the Master Node?
/etc/beowulf Directory
/usr/lib/beoboot Directory
/var/beowulf Directory
/var/log/beowulf Directory
What Gets Put on the Compute Nodes at Boot Time?
Site-Local Startup Scripts