You don’t just deploy Slurm; you engineer the logic behind it. You understand the nuances of topology-aware scheduling, multifactor priority plugins, and backfill algorithms. You have production experience configuring fairshare decay and cgroup containment to ensure multi-tenant equity in high-contention environments.
You are comfortable debugging InfiniBand congestion and RDMA issues at the packet level. You speak eBPF, DPDK, and syscalls. You know exactly which kernel and block-layer tunables (read_ahead_kb, vm.dirty_ratio) to adjust to eliminate jitter and squeeze the last 5% of performance out of a B200 cluster.
You solve the "IO starvation" bottlenecks that baffle generalist sysadmins. You have deep architectural experience with parallel filesystems (WEKA, Lustre, VAST, GPFS). You know how to implement GPUDirect Storage to bypass CPU bounce buffers, and how to tune stripe counts for massive concurrent simulations.
Metranis is not a staffing agency; we are a decentralized guild of elite infrastructure architects. We do not do Tier-1 support. We enter the room when standard deployments hit the wall: when iowait spikes, when MPI jobs fragment, and when off-the-shelf configurations fail to saturate the GPUs.
You remain independent. You choose the engagements that fit your schedule and expertise. No non-competes, no "bench time."
We work exclusively on complex infrastructure for R1 research universities, HFT firms, pharma leaders, and major research institutions. No generic IT work.
We strip away the middle management. No daily standups, no busywork. Just deep technical execution with peers who respect your time.
If you are interested in future project work with the Metranis Collective, please submit your profile below.