HPC Server PCB Fabrication for Enterprise and HPC OEMs

Figure 1. HPC server PCB fabrication

Highleap Electronics performs HPC server PCB fabrication based on customer-provided designs or Gerber files for high-performance computing OEMs, national laboratories, university supercomputing centers, and Tier-1 HPC system vendors. We manufacture PCBs according to customer specifications, including compute node mainboards (dual- and quad-socket Xeon Max, EPYC HPC, A64FX, NVIDIA Grace/Grace Hopper carriers), InfiniBand HDR/NDR and Slingshot/Omni-Path interconnect cards, liquid-cooled cold-plate carriers, storage and login node boards, rack-level power and management boards, and HBM-host integration carriers.


What HPC OEMs Need from a PCB Fabricator (vs AI Training)
Compute Node Mainboard Fabrication — Xeon Max, EPYC HPC, ARM HPC
HBM-Host and Grace Hopper-Class Carrier Board Fabrication
InfiniBand HDR/NDR HCA and Interconnect Card Fabrication
Direct Liquid Cooling — Cold Plate Carriers, CDU & Manifold Boards
Memory Subsystem — DDR5, CXL, and HBM Power Delivery
Storage Node, Login Node & Cluster Management Boards
Quality, AVL Qualification & Long-Life Program Support
Engaging Highleap on Your HPC Cluster Build

1. What HPC OEMs Need from a PCB Fabricator (vs AI Training)

HPC server programs and AI training server programs look similar from a distance — both build dense compute hardware with high-bandwidth interconnect, both use liquid cooling at scale, both target a small number of high-value customers. The differences emerge in program timeline, qualification rigor, and lifecycle expectations. Our high-performance computing PCB manufacturing capability spans both AI/HPC convergence builds and traditional CPU-dense HPC clusters.

HPC procurement cycle differences

Procurement timeline: HPC system contracts run 2-4 years from RFP to system acceptance; AI hardware programs run 12-18 months.
Qualification rigor: national labs and university HPC centers typically require formal acceptance testing including PCB-level verification samples; many AI hardware programs accept hyperscaler-style AVL flows with less formality.
Customer concentration: the HPC market is highly concentrated (DOE labs, major universities, Top500-listed sites), and customer relationships span decades. The AI hardware market is more diffuse, with hyperscalers, sovereign AI programs, and enterprise AI buyers spread across many vendors.
Sustainment lifecycle: HPC systems frequently operate 7-10 years in service; AI training hardware refreshes every 2-3 years.

HPC engineering requirements that differ from AI training

CPU-dominant architectures still exist: while AI/HPC convergence is real, many HPC workloads (CFD, MD, climate modeling) remain CPU-dominant; compute nodes are typically dual-socket, with 4-8 sockets on fat nodes, and populate the maximum DDR5 channel count.
Memory capacity vs bandwidth balance: HPC workloads often need both — 4 TB+ per node with high bandwidth is not unusual; CXL expansion increasingly important.
Vector processing emphasis: AVX-512, SVE2, and similar vector ISAs drive specific routing and power-delivery requirements for sustained vector workloads.
Network/storage convergence: InfiniBand with RDMA for both interconnect and storage; one network rather than separate compute and storage networks.
Burst-buffer and tiered storage: dedicated burst-buffer nodes with high-performance NVMe and DAOS/Lustre/GPFS support; specialized storage carrier boards.

HPC procurement-side requirements

Formal RFP responses: detailed proposals matching customer technical and commercial requirements.
System acceptance support: sample boards and full documentation for customer acceptance testing.
Sustainment commitment: 7-10 year spare-parts availability commitment with documented EOL management.
Audit and traceability: some national lab and defense-adjacent programs require full supply chain audit trails.
Country-of-origin documentation: Buy America, Buy European, and sovereign-supply requirements where applicable.

2. Compute Node Mainboard Fabrication — Xeon Max, EPYC HPC, ARM HPC

HPC compute node mainboards anchor every cluster. The current platform generation centers on HBM-equipped CPUs and accelerator-CPU hybrids; we fabricate mainboards for each of the platform families below.

Intel Xeon Max (Sapphire Rapids HBM) and Granite Rapids HPC mainboards

Xeon Max: 64 GB HBM2e on package alongside up to 8 DDR5 channels per socket; significant power and thermal density.
Layer count: typically 16-22 layers on dual-socket Xeon Max mainboards — our multilayer PCB manufacturing line supports HPC mainboards from 12 layers up through 32+ layer dual-controller configurations.
Routing complexity: 8 DDR5 channels per socket × 2 sockets = 16 channels of DDR5 routing in addition to PCIe Gen5 and inter-socket UPI links.
UPI link routing: 4-link or 6-link UPI between sockets; tight signal-integrity requirements similar to PCIe Gen5.
Power delivery: 350-400W TDP per socket; dense VRM placement with low-impedance power distribution.
Material: I-Tera MT40 on UPI and PCIe Gen5 layers; FR408HR on DDR5 layers; 370HR for power layers.

AMD EPYC HPC (Genoa-X, Bergamo, Turin) mainboards

EPYC Genoa-X: 96 cores with 1 GB L3 cache (3D V-Cache); 12 DDR5 channels per socket.
EPYC Turin: next-generation; up to 192 cores per socket.
Layer count: 18-24 layers typical for dual-socket EPYC HPC mainboards due to higher DDR5 channel count.
Infinity Fabric inter-socket links: dense high-speed routing between sockets; similar PCB requirements to UPI.
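PCIe Gen5 lane count: 128 lanes per socket; in dual-socket systems part of each socket's lanes carries the inter-socket Infinity Fabric links, leaving roughly 128-160 lanes for PCIe; significant routing density even before considering DDR5.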

Fujitsu A64FX and ARM HPC mainboards

A64FX: 48 cores with 32 GB HBM2 in package; Armv8.2-A with SVE; powers the Fugaku supercomputer.
NVIDIA Grace: 72 Arm Neoverse V2 cores with up to 480 GB LPDDR5X per CPU (144 cores and up to 960 GB in the dual-die Grace CPU Superchip); aimed at HPC, AI, and data analytics.
Ampere Altra Max and AmpereOne: general-purpose ARM server CPUs increasingly used in HPC.
Layer count and complexity: similar to x86 HPC mainboards; LPDDR5X-equipped Grace mainboards have unique routing due to soldered-down memory rather than DIMM slots.

Quad-socket and eight-socket fat-node mainboards

Use case: in-memory database, large-shared-memory simulations, specialized HPC workloads that fit poorly on distributed memory.
Layer count: 22-28 layers due to socket interconnect complexity.
Inter-socket fabric: UPI scales poorly above 4 sockets (current EPYC platforms stop at two sockets); some quad-socket designs use ring topology, others use switched topology.
Memory capacity: 8-16 TB per node typical; significant DIMM real estate on the board.
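The 8-16 TB range follows from socket count, channel count, and DIMM size. A minimal sketch, assuming 8 DDR5 channels per socket, two DIMMs per channel, and 128 GB DIMMs (illustrative values chosen for this example, not platform specifications):

```python
def node_memory_tb(sockets: int, channels_per_socket: int = 8,
                   dimms_per_channel: int = 2, dimm_gb: int = 128) -> float:
    """Return installed DIMM capacity in TB (1 TB = 1024 GB)."""
    dimm_slots = sockets * channels_per_socket * dimms_per_channel
    return dimm_slots * dimm_gb / 1024


for sockets in (4, 8):
    slots = sockets * 8 * 2  # matches the assumed defaults above
    print(f"{sockets} sockets: {slots} DIMM slots, "
          f"{node_memory_tb(sockets):.0f} TB")
```

Under these assumptions the board carries 64 to 128 DIMM connectors, which is where the DIMM real estate goes.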

Figure 2. HPC server PCB

3. HBM-Host and Grace Hopper-Class Carrier Board Fabrication

The HBM-equipped CPU is one of the defining HPC processor architectures of this generation. Whether HBM is on-package (Xeon Max, A64FX) or attached through chip-to-chip interconnect (Grace Hopper Superchip), the host PCB has specific requirements for power delivery, thermal management, and signaling that differ from non-HBM mainboards.

Grace Hopper Superchip carrier fabrication

Architecture: NVIDIA Grace ARM CPU + Hopper GPU bonded via NVLink C2C (chip-to-chip); 900 GB/s coherent NVLink between CPU and GPU.
Carrier board construction: typically 18-24 layers.
NVLink C2C routing: short, high-speed coherent fabric between the two chips on the same carrier; tightest signal-integrity requirements on the board.
HBM3 (on Hopper die): not routed on PCB but supported by extensive power delivery to the Hopper die package.
LPDDR5X (Grace-attached): soldered on the carrier near the Grace chip; LPDDR5X routing at high data rate.
Material: Tachyon 100G or Megtron 7 for NVLink C2C; I-Tera MT40 for PCIe Gen5 host; FR408HR for LPDDR5X.

Power delivery for HBM-equipped CPUs

Power and current: HBM-equipped CPUs are rated at 400-500W TDP and can sit at that level under sustained vector workload; core rails carry sustained currents of 500-800A.
VRM placement: high-current VRMs with point-of-load placement under or adjacent to the CPU package.
Plane copper: 2-3 oz copper on core power planes; some designs use 4 oz on specific high-current rails.
Decoupling capacitor density: hundreds of capacitors per socket distributed across the power-delivery hierarchy.
Plane impedance: low-impedance plane design verified by power-integrity simulation at the design stage; PCB fabrication must hit the modeled stackup precisely.
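The relationship between package power, rail current, and plane impedance can be sketched with two textbook formulas: I = P / V and Z_target = V × ripple / ΔI. The 0.8 V core voltage, 3% ripple allowance, and 50% load step below are assumptions for illustration, and the sketch treats the full package power as if it were drawn from a single core rail, which is a simplification.

```python
def rail_current_a(power_w: float, rail_v: float) -> float:
    """Steady-state rail current from power and rail voltage (I = P / V)."""
    return power_w / rail_v


def target_impedance_mohm(rail_v: float, ripple_fraction: float,
                          transient_a: float) -> float:
    """Target PDN impedance in milliohms: allowed ripple voltage / load step."""
    return rail_v * ripple_fraction / transient_a * 1000.0


RAIL_V = 0.8        # assumed core voltage
RIPPLE = 0.03       # assumed 3% ripple allowance
LOAD_STEP = 0.5     # assumed transient of 50% of steady-state current

for power_w in (400, 500):
    current = rail_current_a(power_w, RAIL_V)
    z_mohm = target_impedance_mohm(RAIL_V, RIPPLE, current * LOAD_STEP)
    print(f"{power_w} W -> {current:.0f} A, target {z_mohm:.3f} mOhm")
```

With these assumed inputs the target lands well below one milliohm, which is why the stackup has to match the power-integrity model closely.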

HBM-host package fabrication considerations

HBM-equipped CPU packages are physically large and mechanically heavy. The PCB beneath them must support the package mechanically (under cold-plate clamping force or socket retention), thermally (transferring heat from the package backside and from VRMs), and electrically (delivering hundreds of amperes through power vias under the package). Tooling-hole tolerance, mechanical reinforcement features, and via-array design under the package all matter for HBM-host PCB fabrication in ways they don’t for conventional CPU mainboards.

4. InfiniBand HDR/NDR HCA and Interconnect Card Fabrication

HPC interconnect drives system-level performance more than any other single component. InfiniBand dominates the high-end HPC market with HDR (200 Gbps) and NDR (400 Gbps) products; HPE Slingshot (originally Cray) and Intel Omni-Path are alternatives on specific Top500 systems.

InfiniBand HDR HCA carrier board fabrication

Adapter chip: NVIDIA ConnectX-6 (HDR) or ConnectX-7 (HDR/NDR).
Port count: single-port or dual-port HDR cards typical.
PCIe Gen4 host interface: 16 lanes Gen4 to host CPU; some Gen5 hosts run the same card.
Network interface: QSFP56 connector for HDR cabling.
Layer count: 14-18 layers on dual-port HDR HCAs.
Material: I-Tera MT40 for PCIe and network routing; 370HR for power and ground.
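Whether the host interface can feed the network ports is a quick arithmetic check. The sketch below uses the per-lane transfer rates (16 GT/s for Gen4, 32 GT/s for Gen5) and 128b/130b line encoding; it ignores packet-level protocol overhead, so real throughput is somewhat lower. The function names are hypothetical.

```python
GT_PER_S = {"gen4": 16.0, "gen5": 32.0}  # per-lane transfer rate
ENCODING = 128 / 130                     # 128b/130b line encoding


def pcie_gbps(gen: str, lanes: int = 16) -> float:
    """Raw link bandwidth in Gbps after line encoding, one direction."""
    return GT_PER_S[gen] * lanes * ENCODING


def host_limited(gen: str, port_gbps: float, ports: int = 1,
                 lanes: int = 16) -> bool:
    """True when aggregate network rate exceeds the host link bandwidth."""
    return port_gbps * ports > pcie_gbps(gen, lanes)


print(f"Gen4 x16: {pcie_gbps('gen4'):.1f} Gbps")
print("single-port HDR host-limited:", host_limited("gen4", 200))
print("dual-port HDR host-limited:", host_limited("gen4", 200, ports=2))
```

A single 200 Gbps port fits within a Gen4 x16 link, while two ports running at full rate together do not.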

InfiniBand NDR HCA carrier board fabrication

Adapter chip: NVIDIA ConnectX-7 NDR with 400 Gbps capability.
PCIe Gen5 host interface: 16 lanes Gen5; back-drilling mandatory on host-side vias.
Network interface: OSFP connector for NDR cabling.
Layer count: 16-20 layers due to higher-speed routing density.
Material: Tachyon 100G or Megtron 7 on critical NDR network and Gen5 host signal layers.
Compensated launches: precision impedance-matched launches at PCIe edge connector and OSFP cage.
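Back-drilling matters because an unused via stub behaves like a quarter-wave resonator, with f ≈ c / (4 · L · √Dk). A minimal sketch of that estimate; the stub lengths and the Dk of 3.5 are assumed example values, not measurements from a specific stackup.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def stub_resonance_ghz(stub_mm: float, dk: float) -> float:
    """Quarter-wave resonance of a via stub in GHz."""
    stub_m = stub_mm / 1000.0
    return C_M_PER_S / (4.0 * stub_m * math.sqrt(dk)) / 1e9


NYQUIST_GHZ = 16.0  # 32 GT/s NRZ signaling has a 16 GHz Nyquist frequency

for stub_mm in (2.0, 0.25):  # assumed: before and after back-drilling
    f_res = stub_resonance_ghz(stub_mm, dk=3.5)
    print(f"{stub_mm} mm stub: resonance {f_res:.1f} GHz "
          f"({f_res / NYQUIST_GHZ:.1f}x Nyquist)")
```

In this example the long stub resonates close to the Gen5 Nyquist frequency, while the back-drilled stub pushes the resonance far out of band.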

HPE Slingshot and Omni-Path HCA fabrication

Slingshot: 200 Gbps interconnect on Cray/HPE EX-class systems (Frontier, El Capitan); Ethernet-compatible at the physical layer with HPC-specific congestion control.
Omni-Path: 100 Gbps interconnect originally developed by Intel and now maintained by Cornelis Networks; legacy but still in production on specific cluster classes.
HCA card construction: similar to InfiniBand HCAs; PCIe Gen4 host, custom network interface.

InfiniBand switch line card fabrication

NVIDIA Quantum-2 NDR switch: 64 ports × 400G per switch chip, typically exposed through 32 twin-port OSFP cages (800G per cage), or 128 ports × 200G.
Switch line card layer count: 28-32 layers.
Routing density: hundreds of differential pairs from switch ASIC to OSFP cages.
Material: Tachyon 100G throughout signal layers.
Power delivery: 800-1200W per switch chip; dense VRM design and heavy power plane copper.
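The routing density can be estimated from the port count. A rough sketch, assuming each 400G port is built from four 100 Gbps lanes and that every lane needs one transmit pair and one receive pair:

```python
def diff_pairs(ports: int, port_gbps: int, lane_gbps: int = 100) -> int:
    """Differential pairs between switch ASIC and cages (TX + RX per lane)."""
    lanes_per_port = port_gbps // lane_gbps
    return ports * lanes_per_port * 2


pairs = diff_pairs(ports=64, port_gbps=400)
print(f"64 x 400G: {pairs} differential pairs")
```

Under these assumptions a 64-port configuration needs 512 high-speed pairs, before counting reference clocks, management, and power.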

5. Direct Liquid Cooling — Cold Plate Carriers, CDU & Manifold Boards

Modern HPC systems are predominantly direct liquid cooled at the CPU socket. National laboratory exascale systems push this further with warm-water cooling enabling year-round free cooling. PCB design for liquid-cooled HPC has matured but still requires careful engineering and fabrication.

Cold-plate carrier board fabrication

Mechanical alignment: precision tooling holes for cold plate alignment with ±0.10 mm positional tolerance.
Board flatness: coplanarity within 0.15 mm across the CPU footprint to ensure uniform cold-plate contact.
Component height restrictions: components under the cold plate cannot exceed cold-plate-relief depth; tight DFM coordination with cold-plate vendor.
Mounting reinforcement: board carriers or stiffeners support clamping force; PCB design accommodates carrier interface.

CDU (coolant distribution unit) control board fabrication

Function: manage rack-level coolant flow, temperature setpoints, pump control, leak detection, alarms.
Architecture: typically embedded ARM SBC with industrial I/O (4-20 mA flow sensors, RTD temperature inputs, valve drivers).
Layer count: 6-8 layers; mixed analog/digital with appropriate isolation.
Material: high-Tg FR4 (Isola 370HR) for industrial reliability.
IPC Class: Class 3 acceptance for failure-mode-critical components.

Manifold sensor board fabrication

Distributed sensing: flow rate, temperature, pressure at multiple manifold points.
Small format: typically 60 × 40 mm with M12 industrial connectors.
Conformal coating: AR or UR coating standard for moisture protection.
Lead-free reflow: standard SAC305 process compatible with all materials specified.

Leak detection and drip pan boards

Capacitive leak sensors: distributed under cold plates and at potential leak points.
Drip pan electronics: drainage monitoring with alarm signaling to rack BMC.
Reliability: false-positive rate critical (false alarms shut down compute); false-negative rate also critical (missed leaks damage equipment).
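One common way to trade false positives against detection latency is to require several consecutive over-threshold samples before raising an alarm. A minimal sketch of that idea; the `LeakDebouncer` class, the threshold, and the sample count are hypothetical and not taken from any specific sensor or BMC firmware.

```python
class LeakDebouncer:
    """Raise an alarm only after N consecutive over-threshold readings."""

    def __init__(self, threshold: float, required_consecutive: int = 3):
        self.threshold = threshold
        self.required = required_consecutive
        self.count = 0

    def update(self, reading: float) -> bool:
        """Feed one normalized sensor reading; return True when alarming."""
        if reading > self.threshold:
            self.count += 1
        else:
            self.count = 0  # a single clean sample resets the streak
        return self.count >= self.required


detector = LeakDebouncer(threshold=0.5, required_consecutive=3)
samples = [0.1, 0.9, 0.2, 0.7, 0.8, 0.9]  # made-up readings
alarms = [detector.update(s) for s in samples]
print(alarms)
```

The isolated spike in the second sample is ignored, while the sustained rise at the end trips the alarm. A larger `required_consecutive` lowers false positives at the cost of slower detection.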

Figure 3. Storage Server PCB Manufacturer

6. Memory Subsystem — DDR5, CXL, and HBM Power Delivery

HPC workload memory requirements drive specific PCB fabrication choices for DDR5 routing, CXL memory expansion, and HBM power delivery on the carrier.

DDR5 routing for HPC mainboards

Channel count: 8 channels per socket (Intel Xeon Max), 12 channels per socket (AMD Genoa-X).
Data rate: DDR5-4800 to DDR5-6400 depending on platform; rising with each generation.
Impedance: 40Ω single-ended ±10% standard; ±7% achievable.
Length matching: intra-byte ±2 mil; byte-to-byte tolerance varies by speed.
Material selection: FR408HR for short channels; I-Tera MT40 for longer channels or higher data rates; 370HR acceptable on short low-rate runs.
Glass-weave: 1080 or 2113 glass on signal layers for high-rate DDR5.
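The channel counts and data rates above translate directly into peak memory bandwidth per socket. A short sketch, assuming a 64-bit (8-byte) data path per DDR5 channel and ignoring refresh and protocol overhead:

```python
def ddr5_peak_gbs(mt_per_s: int, channels: int,
                  bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s for one socket."""
    return mt_per_s * bytes_per_transfer * channels / 1000


for label, rate, channels in (("8-channel DDR5-4800", 4800, 8),
                              ("12-channel DDR5-4800", 4800, 12),
                              ("12-channel DDR5-6400", 6400, 12)):
    print(f"{label}: {ddr5_peak_gbs(rate, channels):.1f} GB/s")
```

Each step up in channel count or data rate adds routed signals or tightens loss budgets, which is what drives the layer-count and material choices listed above.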

CXL memory expansion fabrication

CXL 2.0 attachment: PCIe Gen5 PHY; CXL memory expanders attach via PCIe slots or specialized CXL connectors.
HPC use case: memory pooling to expand effective per-node memory beyond DIMM capacity; particularly valuable for in-memory analytics, large-grid simulations, and AI/HPC convergence workloads.
Fabrication requirement: Gen5-class precision on differential pair impedance and back-drilling.
Backplane builds: CXL switching introduces a new class of backplane and switch boards in HPC systems.

HBM power delivery on HBM-equipped CPU carriers

While HBM stacks are integrated in-package on Xeon Max, A64FX, and Hopper GPU products, the host PCB must still deliver enough current for both the host die and the HBM stacks. Power planes carry currents beyond conventional CPU mainboard design assumptions, so dense via patterns under the package are needed to minimize voltage droop. Power-integrity simulation at design time is non-negotiable, and PCB fabrication must hit the simulated stackup within ±5% dielectric thickness tolerance to preserve the modeled performance. Impedance control on all critical Gen5 differential pairs is required throughout the build.
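Checking a built stackup against the simulated one is a per-layer comparison against the ±5% dielectric thickness tolerance. A minimal sketch; the layer names and thickness values are made up for illustration, and `out_of_tolerance` is a hypothetical helper.

```python
def out_of_tolerance(simulated_um: dict, measured_um: dict,
                     tol: float = 0.05) -> list:
    """Return (layer, fractional deviation) for layers outside +/- tol."""
    failures = []
    for name, target in simulated_um.items():
        deviation = (measured_um[name] - target) / target
        if abs(deviation) > tol:
            failures.append((name, round(deviation, 4)))
    return failures


# Dielectric thicknesses in micrometers (example values only)
simulated = {"L1-L2 prepreg": 75.0, "L2-L3 core": 100.0, "L3-L4 prepreg": 90.0}
measured = {"L1-L2 prepreg": 77.0, "L2-L3 core": 106.5, "L3-L4 prepreg": 88.0}

print(out_of_tolerance(simulated, measured))
```

In practice the measured values would come from microsection data, and any flagged layer would prompt a review against the power-integrity and impedance models.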

7. Storage Node, Login Node & Cluster Management Boards

An HPC cluster contains many board types beyond compute nodes. Storage, login, and management infrastructure boards are smaller programs by volume but critical to system reliability.

Storage node fabrication

Lustre/GPFS storage servers: dual-socket Xeon or EPYC mainboards with high-density NVMe and SAS storage backplanes.
NVMe storage backplanes: U.2/U.3 hot-swap backplanes with PCIe Gen4/Gen5 routing to storage controllers.
DAOS storage nodes: persistent-memory-based storage for next-generation HPC storage tiers; specialized NVMe backplane and PMEM module integration.
Burst-buffer nodes: high-density NVMe servers serving as I/O acceleration tier between compute nodes and parallel filesystem.

Login node and management server fabrication

Login node mainboards: general-purpose server mainboards (single or dual socket) supporting user shell and job submission; not performance-critical, but reliability-critical.
Management servers: cluster scheduler (Slurm, LSF, PBS Pro) hosts; database servers for accounting and monitoring.
Standard server PCB practices: mid-loss material at most; cost-optimized stackups.

Top-of-rack and management switch boards

1 GbE management switches: separate management Ethernet network isolated from data fabric.
10/25 GbE service network switches: for storage access and management traffic.
Out-of-band management: serial console concentrators, IPMI/Redfish gateways.

Rack-level power and infrastructure

48V power shelf boards: centralized power distribution similar to AI training rack architecture.
Rack BMC mainboards: rack-level management of compute, storage, network, cooling.
Environmental sensor boards: temperature, humidity, vibration, intrusion monitoring.

8. Quality, AVL Qualification & Long-Life Program Support

HPC programs are characterized by long timelines and high stakes. Once a Top500 system is operational, board failures translate directly into research downtime — and the customer may need spare parts 7-10 years after the original procurement.

HPC quality flow requirements

IPC Class 3 acceptance: standard for compute and interconnect boards in HPC systems.
Microsection sampling: first article 100%, in-process sampling at high frequency (typically 1 per panel on critical builds).
S-parameter test: coupon-level insertion loss and return loss to 40 GHz on high-speed boards.
Thermal cycling validation: coupon-level reliability testing per IPC-TM-650 methods.
Documentation per delivery: CoC, mill certs, electrical test, impedance, S-parameter, AOI, microsection, visual inspection records.

AVL qualification for HPC OEMs

Pre-qualification audit: site visit by customer quality team; review of equipment, process, and documentation.
First article qualification: 25-200 piece sample build with full documentation.
Process validation: SPC data demonstrating process capability; control plans and FMEA documentation.
Customer-side acceptance: environmental testing, system-level validation, formal sign-off.
AVL maintenance: ongoing performance metrics, periodic audits, joint quality reviews.
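Process capability in SPC data is commonly summarized as Cpk = min(USL − μ, μ − LSL) / (3σ). A minimal sketch using a hypothetical 85 Ω differential impedance target with ±5% limits and made-up coupon measurements:

```python
from statistics import mean, stdev


def cpk(samples: list, lsl: float, usl: float) -> float:
    """Process capability index from sample mean and sample std deviation."""
    mu = mean(samples)
    sigma = stdev(samples)
    return min(usl - mu, mu - lsl) / (3 * sigma)


TARGET_OHM = 85.0                # assumed differential impedance target
LSL = TARGET_OHM * 0.95          # lower spec limit at -5%
USL = TARGET_OHM * 1.05          # upper spec limit at +5%

# Made-up coupon measurements in ohms
impedance_ohm = [84.6, 85.3, 85.9, 84.8, 85.1, 85.4, 84.9, 85.2]

print(f"Cpk = {cpk(impedance_ohm, LSL, USL):.2f}")
```

A Cpk above roughly 1.33 is a widely used threshold for a capable process, though the acceptance value is set by each customer's quality requirements.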

Long-life program support

Spare parts inventory: committed spare-parts production capacity for 7-10 years after initial deployment.
Material lifecycle monitoring: we track laminate and component PCNs to identify long-lifecycle risk early.
Documented substitutability: for each qualified material, documented near-equivalent alternatives.
End-of-life management: 18-24 months before EOL, joint customer-supplier planning for last-time buys.
Archive retention: all design files, process records, and quality data retained for program duration + 7 years.

9. Engaging Highleap on Your HPC Cluster Build

For HPC OEMs, system integrators, and national-lab hardware partners evaluating PCB fabrication partners, our engagement model accommodates the long timelines and rigorous qualification flows characteristic of HPC procurement:

Pre-RFP technical consultation: at the request of customer engineering teams, we provide stackup recommendations, material guidance, and capability statements supporting RFP responses.
Proposal support: formal proposals with detailed pricing, lead time, capacity commitment, and quality documentation matching customer RFP requirements.
Prototype and qualification builds: 25-200 piece samples with full documentation supporting customer acceptance testing.
Production releases: capacity reservation aligned with system delivery schedule; coordinated change control flow.
Sustainment support: spare parts production and end-of-life management through full system operational lifetime.

Highleap is ISO 9001 and IATF 16949 certified, with AS9100D-aligned process flow available for HPC programs requiring defense-and-aerospace-grade quality documentation. We manufacture HPC server PCBs from 8 layers to 32+ layers, with HDI sequential lamination, controlled impedance to ±5%, heavy copper to 4 oz, and full surface finish coverage. Our high-speed digital line uses laser direct imaging at 25 µm resolution and HDI PCB manufacturing for the dense BGA fanout required around HBM-equipped CPUs and Grace Hopper carriers. Our customer base includes HPC system integrators, national laboratory hardware partners, and university supercomputing centers across multiple Top500 program generations.

Submit Gerber files, drill data, stackup specification, target quantities, and program timeline through our online quote portal for a 24-hour response. For RFP support, formal proposal development, or sustainment planning on existing HPC programs, our HPC team can engage directly to discuss scope and timeline. For related capability content, see our pages on server PCB manufacturing and rigid PCB capability covering layer counts, copper weight, and surface finish coverage for HPC mainboard programs.