
Sycomp Storage in Google Cloud: Architectural Overview

Introduction

Sycomp Storage enables you to run your high-performance computing (HPC), AI/ML, and big data workloads in Google Cloud. With our solution you can concurrently access data from thousands of VMs, reduce costs by automatically managing tiers of storage, and run your applications on-premises or in Google Cloud. Sycomp Storage is available in the Google Marketplace, can be deployed in minutes, and supports access to your data via NFS and the native IBM Storage Scale client.

This document provides architectural guidance to help you use Sycomp Storage to design and deploy a POSIX-compliant file system for HPC, big data, or other applications that require a high level of storage performance.

Audience

This document was written for technical personnel responsible for designing or deploying POSIX-compliant file systems using Sycomp Storage.

Overview of IBM Storage Scale

Sycomp chose IBM Storage Scale because it is a proven parallel file system that securely manages large volumes of data. The high-performance, feature-rich file system is well suited to HPC, AI/ML, big data, and other applications requiring a POSIX-compliant shared file system. With adaptable storage capacity and performance scaling, Sycomp Storage can support your demanding workloads.

You choose how you want to access your data stored in IBM Storage Scale: via NFS, the native Storage Scale client, or both concurrently. Adding storage or nodes is easy and can be done while the data remains online, so you can start small and grow the cluster as your needs evolve.

Google Marketplace Solution

You deploy an instance of Sycomp Storage using Terraform modules provided by Sycomp through the Google Marketplace. When deploying a cluster, you choose:

  • Whether to use Scale clients deployed by Sycomp Storage or access the data from your existing application VMs.

  • The amount and type of storage.

  • The machine types for Sycomp Storage servers and application clients.

The Sycomp Storage license includes the use of IBM Storage Scale and Red Hat software with how-to and break-fix support. You choose the machine and storage types when Sycomp Storage is deployed, so the costs for VMs, storage, and other Google Cloud resources are not included in the Sycomp Storage cost.

Why Sycomp Storage?

Sycomp Storage Advantages

  1. Easily deploy IBM Storage Scale optimized for your workload.

  2. Performance tuning best practices automatically applied.

  3. Access to technical advisors throughout the subscription term for monitoring and advice for your ever-changing workloads.

  4. Upgrade support for new versions of Sycomp Storage including upgrade testing to prevent issues.

  5. Seamless bidirectional data mobility from on-premises to cloud and cloud to cloud.

  6. Simplified deployment and maintenance via Sycomp developed and maintained tools.

  7. Fully managed option for Sycomp Storage.

  8. AFM Gateway configuration and tuning.

Sycomp Storage vs. Other Solutions

Sycomp Storage implements IBM Storage Scale, the industry-leading clustered file system. There are many other file system solutions, but none of them matches the breadth of features and field-proven maturity of IBM Storage Scale.

Storage Scale features include:

  • Web-based administration and monitoring.

  • Advanced storage features including directory level snapshots, file cloning, immutability, encryption, and data replication. These features are included with no additional software license fees.

  • On-demand asynchronous data archive and retrieval from another Storage Scale cluster, an NFS server, or public cloud object stores with Active File Management (AFM).

Sycomp Storage enhances Storage Scale:

  • Deploy a cluster in minutes.

  • Expand a cluster in minutes by adding storage or VMs.

  • Pay-as-you-go pricing.

  • Google Cloud integration.

  • Choose the Google machine types and storage types that meet your capacity, performance, and budget requirements.

Deployment Models

Different ways you can deploy a Sycomp Storage cluster.

Sycomp Storage

When deploying Sycomp Storage you can choose the amount of storage, the number of VMs running the IBM Storage Scale native client, the machine types to use, and other options. Optionally, you can use NFS or install the IBM Storage Scale client on your application nodes to give them access to the data.

Figure 1 is an example of a Sycomp Storage deployment in Google Cloud. The Sycomp tools deploy the IBM Storage Scale cluster and a set of VMs running the IBM Storage Scale native client that can be used to run your HPC or big data application. In this example, Google balanced Persistent Disk was used for reliability; you can choose which type of Persistent Disk you want to use.

The VMs running IBM Storage Scale Native Client and the VMs using NFS can concurrently access the data stored in the Sycomp Storage cluster. All resources are deployed within a single project.

Local SSD

You can reduce the number of Network Shared Disk (NSD) server VMs needed to reach a performance target by using Google Local SSD.

Local SSD is ephemeral storage that offers a high number of input/output operations per second (IOPS) and low latency. To prevent data loss, Sycomp Storage leverages Storage Scale replication for Local SSD storage pools. With replication enabled, the storage pool can survive the failure of a storage VM or a Local SSD. In the event of a failure, Storage Scale automatically creates new replicas to protect against future failures.
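To make the replication trade-off concrete, here is a minimal Python sketch of two-way replication arithmetic. The 100 TB raw capacity is a hypothetical example; the 4.6 GB/s write rate is the per-VM Local SSD figure quoted later in this document.

```python
# Minimal sketch of two-way Storage Scale replication arithmetic.
# The 100 TB raw capacity is a hypothetical example.
REPLICATION_FACTOR = 2  # two copies of every data block

def usable_capacity_tb(raw_tb: float) -> float:
    """Each block is stored twice, so usable capacity is half of raw."""
    return raw_tb / REPLICATION_FACTOR

def effective_write_gbps(raw_gbps: float) -> float:
    """Each write lands on two VMs at once, halving client-visible throughput."""
    return raw_gbps / REPLICATION_FACTOR

print(usable_capacity_tb(100.0))   # 50.0 TB usable from 100 TB raw
print(effective_write_gbps(4.6))   # 2.3 GB/s effective write throughput
```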

Application cluster

Whether you want to deploy multiple machine types, add a set of application nodes to handle increased year-end processing, or quickly spin up a DR application cluster, an application cluster is a good way to manage your compute nodes.

Figure: Example of an application cluster. Once deployed, this cluster is attached to a Sycomp Storage cluster containing a file system.

A Storage Scale application cluster accesses an existing file system using Storage Scale multi-cluster. Sycomp Storage tools make it easy to maintain application clusters.

File System Encryption

Your deployment can be enhanced by enabling data encryption with an HA pair of VMs running IBM Security Guardium Key Lifecycle Manager (GKLM) to manage and protect your encryption keys.

Sycomp Storage automates the deployment of an HA pair of GKLM server VMs.

Cluster Expansion

You can add storage, NFS servers, and clients to an existing Sycomp Storage cluster. Storage capacity is added by adding NSD servers; each new server gets the same disk configuration as the existing NSD servers, which keeps storage balanced across them. You can add NFS servers to increase NFS serving capacity. Adding clients in groups allows you to customize the names and machine types of each set of clients, giving you more control over your deployment.

Sycomp Storage ensures that cluster expansion maintains IBM Storage Scale best practices when storage or nodes are added to the cluster: it automatically deploys the correct version of software, places disks in the correct pools, and evenly distributes disks across NSD servers for optimal performance.
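The following Python sketch illustrates the kind of balanced, round-robin disk placement described above. It is illustrative only and is not Sycomp's actual tooling.

```python
# Illustrative only (not Sycomp's actual tooling): round-robin disk
# placement that keeps disk counts balanced across NSD servers.
from itertools import cycle

def distribute_disks(disks: list[str], servers: list[str]) -> dict[str, list[str]]:
    """Assign disks to servers in round-robin order so counts stay even."""
    placement: dict[str, list[str]] = {s: [] for s in servers}
    for disk, server in zip(disks, cycle(servers)):
        placement[server].append(disk)
    return placement

disks = [f"disk{i}" for i in range(8)]
servers = ["nsd1", "nsd2", "nsd3", "nsd4"]
print(distribute_disks(disks, servers))  # two disks land on each server
```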

Intelligent Data Caching

Sycomp Storage allows you to create a copy of a remote data set, stored in Google Cloud Storage or on an NFS share, in a high-performance POSIX file system. The caching tool has an extensive set of features:

  • Data on demand - File metadata and data are copied into the Sycomp Storage file system when a file is accessed. A directory listing in the Sycomp Storage file system fetches the file metadata, and a file open initiates a copy of the file from the remote NFS or object store.

  • Data prefetch - To avoid wasting GPU time, you can ask Sycomp Storage to prefetch a set of data before your compute job starts. You can prefetch a directory tree or provide a custom list of files (see the sketch after this list).

  • Read-only - You can create a read-only copy of the source data, or a copy that allows local modifications but never pushes the changes back to the source.

  • Writable - You can create a data copy that pushes changes made in the Sycomp Storage file system back to the NFS or object source.

  • Many-to-one - You can have Sycomp Storage clusters deployed around the world accessing the same object bucket. This allows you to run your job where there is available hardware.

  • Configurable - You decide how often to check for source updates and how long to wait before pushing changes.
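As an illustration of the custom file list mentioned under Data prefetch, here is a minimal Python sketch that builds a list of files to prefetch. The paths, file pattern, and list-file format are hypothetical; consult Sycomp for the format your deployment expects.

```python
# Minimal sketch: build a custom file list for prefetching, e.g. the
# input set of a training job. Paths, pattern, and list-file format
# are hypothetical.
from pathlib import Path

def write_prefetch_list(source_dir: str, pattern: str, list_file: str) -> int:
    """Collect files matching pattern under source_dir into a newline-separated list."""
    files = sorted(p for p in Path(source_dir).rglob(pattern) if p.is_file())
    Path(list_file).write_text("\n".join(str(p) for p in files) + "\n")
    return len(files)

n = write_prefetch_list("/mnt/scale/dataset", "*.tfrecord", "/tmp/prefetch.list")
print(f"{n} files queued for prefetch")
```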

Integration with Google Cloud Storage

Once the cluster is deployed, you can enable the Storage Scale connection to cloud object storage (COS) to automatically push your application data to Google Cloud Storage or to dynamically hydrate the Sycomp Storage cluster from an existing bucket.

This makes it easy for you to use existing Google Cloud Storage data in a Sycomp Storage file system or push your file system data into the object store. You can add gateway VMs to increase the bandwidth of data movement between the file system and the object store.

Moving data to and from Google Cloud Storage opens a wide range of options that can reduce storage costs, enable data sharing across applications, and support bursting workloads to the cloud.
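Before hydrating a cluster from an existing bucket, you may want to inspect the source data first. Here is a minimal sketch using the google-cloud-storage Python client; the bucket name and prefix are placeholders.

```python
# Minimal sketch: summarize an existing bucket before hydrating a file
# system from it. Bucket name and prefix are placeholders; requires the
# google-cloud-storage package and application default credentials.
from google.cloud import storage

def summarize_bucket(bucket_name: str, prefix: str = "") -> None:
    """Count objects and total bytes under a prefix in a GCS bucket."""
    client = storage.Client()
    count, total_bytes = 0, 0
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        count += 1
        total_bytes += blob.size or 0
    print(f"{count} objects, {total_bytes / 1e12:.2f} TB under gs://{bucket_name}/{prefix}")

summarize_bucket("my-dataset-bucket", "training/")
```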

Hybrid Deployment

You can integrate your on-premises systems with Google Cloud with bidirectional data flow.

Connect to an on-premises Storage Scale cluster or an NFS server, or use Google Cloud Storage, to share data between your on-premises environment and your Sycomp Storage deployment in Google Cloud. Sycomp Storage can automatically move data to and from any NFS data source, on-premises or in the cloud.

Automated data movement can be integrated with a job scheduler like Slurm or IBM Spectrum LSF to ensure data is where you need it, so you do not pay for idle cloud compute time.
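As a sketch of this integration, the following Python fragment gates a Slurm job submission on the arrival of a data set. The sentinel-file convention and job script name are hypothetical.

```python
# Illustrative sketch: gate a Slurm job on data availability so you do
# not pay for compute while data is still in flight. The sentinel-file
# convention and job script name are hypothetical.
import subprocess
import time
from pathlib import Path

def wait_for_data(sentinel: str, poll_seconds: int = 60) -> None:
    """Block until the data-movement step drops a sentinel file."""
    while not Path(sentinel).exists():
        time.sleep(poll_seconds)

wait_for_data("/mnt/scale/dataset/.transfer_complete")
subprocess.run(["sbatch", "train_job.sh"], check=True)  # submit only once data is in place
```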

About Sycomp

Sycomp A Technology Company, Inc.

Sycomp is a global IT services and logistics provider with extensive expertise in cloud, data center, endpoint management, and security solutions. Sycomp’s diverse team of consultants and engineers delivers on the company’s mission to tackle challenging global IT projects through its state-of-the-art integration and warehouse centers and global technology partnerships. Headquartered in the heart of Silicon Valley, California, Sycomp has successfully shipped, deployed, and managed complex IT projects and supporting assets in more than 150 countries, helping its Fortune 500 customers and global partners realize a world without boundaries. Visit sycomp.com for more information.

For more information, contact hpc@sycomp.com to set up a technical overview.

Figure captions (images omitted):

  • Storage Scale web interface: Nodes page showing the list of nodes in the cluster with aggregate server data rates and CPU utilization.

  • An example of a Sycomp Storage deployment on Google Cloud. The Terraform tools deploy the IBM Storage Scale cluster and a set of VMs running the IBM Storage Scale native client that you can use to run your application.

  • A Sycomp Storage cluster using Local SSD. All the functionality is the same, except data is lost if a VM is powered off or there is a system failure.

  • A client-only cluster can be deployed and attached to an existing Storage Scale cluster.

  • An example of a basic deployment with the addition of an HA pair of key servers to support file system encryption.

  • Sycomp Storage cluster with AFM to cloud object storage (COS) configured to communicate with Google Cloud Storage.

  • Your on-premises application can use AFM COS through Google Cloud Storage, or AFM directly, to create a hybrid cloud environment with a Sycomp Storage deployment in Google Cloud.

Design Guidelines

Use the following guidelines to design a POSIX file system that meets the requirements of your workload. The guidelines provide a framework to help you assess the storage requirements of your workload.

  • Evaluate the available storage options.

  • Size your Sycomp Storage cluster.

Workload requirements

Identify the storage requirements of your high-performance workloads. Define your current requirements, making sure you consider future growth. Use the following questions as a starting point to identify the requirements of your workload:

  • How much storage capacity do you need today? In a year?

  • How many I/O operations per second (IOPS) or how much throughput in GB/s do you need? Do you need additional capacity to achieve the performance target?

  • Will data need to be moved between on-premises and the cloud?

  • Is there existing data in a Google Cloud Storage bucket that needs to be moved into the cluster?

  • Do you want to schedule data movement from one storage type to another? For example, migrate a file from the file system to a Google Cloud Storage bucket.

  • Do you need persistent storage, scratch storage, or both?

  • Do you have a backup software vendor that you use?

  • Do you have a DR strategy?

  • How many application nodes need access to the data?

Storage Options

When deploying your cluster, you can choose to use either Google Persistent Disk or Local SSD.

Persistent Disk and Local SSD

You should use Persistent Disk for most deployments; several Persistent Disk types are available, as shown in the table below. Use Local SSD for performance-critical applications. Local SSD is ephemeral storage, so you get high performance, but you need to use IBM Storage Scale replication for reliability.

Disk Type            | Description
pd-standard          | (HDD) Best for capacity
pd-balanced          | Best for bandwidth
pd-ssd               | Best for IOPS
pd-extreme           | IBM Storage Scale optimizes the use of pd-balanced and pd-ssd such that pd-extreme is not commonly needed
hyperdisk-balanced   | Google's newest generation of network block storage
hyperdisk-throughput | Google's newest generation of network block storage
hyperdisk-extreme    | Google's newest generation of network block storage
local-ssd            | Best for IOPS and bandwidth, though since it is not persistent it requires the use of Storage Scale replication

Choosing the Right Storage

Sycomp recommends that you use Persistent Disk unless you need high performance and density for ephemeral data. Local SSD can provide up to 9.3 GB/s of read throughput per VM (as of February 2024) and 4.6 GB/s of write throughput, which becomes 2.3 GB/s of effective write throughput with Storage Scale replication enabled (because the data is written in two places at once). With Persistent Disk you can achieve high IOPS with a throughput of up to 4.8 GB/s per VM.

Persistent Disks offer high IOPS and larger capacities than Local SSD, and because they are persistent they can be used on VMs that are shut down when not in use.
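The arithmetic behind these trade-offs is simple enough to capture in a few lines of Python. The 100 GB/s aggregate read target below is a hypothetical example; the per-VM rates are the February 2024 figures quoted above.

```python
# Worked arithmetic from the per-VM figures above (as of February 2024).
# The 100 GB/s aggregate read target is a hypothetical example.
from math import ceil

LOCAL_SSD_READ_GBPS = 9.3    # per-VM read throughput
LOCAL_SSD_WRITE_GBPS = 4.6   # per-VM write throughput, before replication
PD_READ_GBPS = 4.8           # Persistent Disk per-VM throughput
REPLICATION_FACTOR = 2       # Local SSD pools replicate every write

effective_write = LOCAL_SSD_WRITE_GBPS / REPLICATION_FACTOR  # 2.3 GB/s

def vms_for_read_target(target_gbps: float, per_vm_gbps: float) -> int:
    """Storage VMs needed to reach an aggregate read-throughput target."""
    return ceil(target_gbps / per_vm_gbps)

print(effective_write)                                # 2.3
print(vms_for_read_target(100, LOCAL_SSD_READ_GBPS))  # 11 VMs with Local SSD
print(vms_for_read_target(100, PD_READ_GBPS))         # 21 VMs with Persistent Disk
```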

Networking

You can deploy your cluster with egress speeds of up to 200 Gbit/s per VM. When selecting a network, it is best to match the NSD server networking with the storage performance (Persistent Disk or Local SSD) available to the machine. For example, for Local SSD storage you can use a 75 Gbit/s network interface to match the storage performance, whereas with Persistent Disk SSD you can use a 32 Gbit/s egress option.

For Storage Scale client nodes, choose a network interface speed that matches your application requirements. Because a Storage Scale client accesses data in parallel from all the NSD servers, a client can utilize more network bandwidth than a single storage server can provide.
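A quick conversion shows how storage throughput maps to the network egress figures above: 1 GB/s of storage traffic needs roughly 8 Gbit/s of egress, ignoring protocol overhead.

```python
# Quick conversion: 1 GB/s of storage throughput needs roughly
# 8 Gbit/s of network egress (ignoring protocol overhead).
def required_egress_gbit(storage_gbps: float) -> float:
    return storage_gbps * 8

print(required_egress_gbit(9.0))  # 72.0 -> a 75 Gbit/s NIC is a good match
print(required_egress_gbit(4.0))  # 32.0 -> fits a 32 Gbit/s egress option
```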

The network topology used by Sycomp Storage includes a frontend network for client data traffic and a backend network for internal cluster traffic. The VPC is deployed within a single GCP project. Sycomp Storage can create the VPC, or you can choose to use an existing one.

NFS Architecture Options

There are two ways to deploy NFS servers using Sycomp Storage: you can place the NFS servers on the same virtual machines (VMs) as the NSD servers, or on separate VMs. When designing your architecture, consider:

  • The type of storage you are using

  • The size of your deployment

  • The performance you need from NFS and the Storage Scale (NSD) client.

If your goal is maximum I/O throughput, the storage type you choose can determine the most cost-effective cluster architecture.

Single-Tier NFS

A single-tier architecture is beneficial when you deploy small clusters or when the speed of the storage justifies deploying a high-performance machine type.

With Local SSD storage, a single Google VM can read up to 9 GB/s, which requires at least 75 Gbit/s egress tier_1 networking to reach full performance. This level of networking comes with machine types that have a large amount of memory and a large number of vCPU cores, so the NFS server can utilize the additional memory and cores, optimizing the deployment for cost performance. When running an NFS server on the same VM as an NSD server, some of the VM network bandwidth is consumed by NSD server traffic in addition to the NFS client traffic. This is fine if the resulting NFS client bandwidth meets your requirements, so consider it when choosing machine and storage types.

Benefits of a single-tier NFS server architecture:

  • Simplifies small deployments

  • NSD servers don’t require a large amount of memory or many CPU cores, so the NFS service can utilize the extra CPU and memory available when high-performance networking is used.

  • Good if the required numbers of NFS and NSD servers are similar.

Two-Tier NFS

A two-tier NFS architecture is beneficial when you have large clusters of NSD servers and need fewer NFS servers.

For example, a Google VM using Persistent Disk can achieve approximately 1.2 GB/s (as of February 2023); therefore, to read data at 300 GB/s, at least 250 NSD server VMs are required. These VMs do not need to be large or have high egress rates, so an n2-standard-16 VM is sufficient to support that throughput, while a single NFS server VM with 75 Gbit/s tier_1 egress can provide more than 9 GB/s. In this case, separating out the NFS server VMs helps because you need far fewer NFS servers than NSD servers to reach your desired throughput.
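The sizing arithmetic from this example, in a few lines of Python. The NFS server count is derived from the quoted per-VM rates and is illustrative.

```python
# Sizing arithmetic from the example above (per-VM rates as of Feb. 2023,
# expressed in MB/s to keep the math exact).
TARGET_MBPS = 300_000   # 300 GB/s aggregate read target
PD_NSD_MBPS = 1_200     # Persistent Disk NSD server, per VM (~1.2 GB/s)
NFS_MBPS = 9_000        # NFS server with 75 Gbit/s tier_1 egress (~9 GB/s)

def ceil_div(a: int, b: int) -> int:
    """Ceiling division without floating-point rounding surprises."""
    return -(-a // b)

print(ceil_div(TARGET_MBPS, PD_NSD_MBPS))  # 250 NSD server VMs
print(ceil_div(TARGET_MBPS, NFS_MBPS))     # 34 NFS server VMs
```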

Benefits of a two-tier NFS server architecture:

  • Reduce compute costs by using smaller machine types for the many NSD server VMs.

  • Use high memory nodes for NFS servers to improve caching.

  • Scale the number of NFS servers independent of the number of NSD servers to optimize cost.

Should I use NFS or the Storage Scale client?

You can use NFS or the Storage Scale NSD client to access the data in the Storage Scale file system. Which one is best depends on your use case.

Table 1: When to use the Storage Scale client vs. NFS for data access.

                                    | Storage Scale Client | NFS
Client Throughput                   | Best                 | Good
Client IOPS                         | Best                 | Good
Access from non-cloud client        | Good                 | Best
Multi-client concurrent file access | Best                 | Good

Performance

We ran NFS and NSD protocol read performance tests using the IOR benchmarking tool. We tested two cluster architectures: one using Google Persistent Disk and the other using Local SSD.

Test Cluster Configuration

                        | Extreme Persistent Disk | Local SSD
Number of NSD Servers   | 128                     | 38
Number of NFS Servers   | 64                      | 47
Number of Test Clients  | 256                     | 150
NSD Server Machine Type | n2-standard-64          | n2-standard-80
NFS Server Machine Type | n2-standard-80          | n2-standard-80
Client Machine Type     | n2-standard-16          | n2-standard-48

Results

                   | Persistent Disk | Local SSD
NFS Read (MiB/sec) | 480,000         | 328,976
NSD Read (MiB/sec) | 519,589         | 333,150
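As a rough back-of-the-envelope check, dividing the aggregate NSD read rates by the NSD server counts from the configuration table gives approximate average per-server rates:

```python
# Back-of-the-envelope: average per-NSD-server read rate, derived from
# the aggregate results and the server counts in the tables above.
pd_per_server = 519_589 / 128   # ~4,059 MiB/s per Extreme PD NSD server
ssd_per_server = 333_150 / 38   # ~8,767 MiB/s per Local SSD NSD server
print(f"{pd_per_server:,.0f} MiB/s per Persistent Disk NSD server")
print(f"{ssd_per_server:,.0f} MiB/s per Local SSD NSD server")
```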

Validating Your Design Using Sycomp RISE

It is often not possible to deploy your entire application to test a new environment. Sycomp RISE was developed to help you validate the I/O performance of a new deployment for an existing application without deploying the application itself. Sycomp RISE tests I/O performance against a file namespace that is identical in structure and file sizes (but not file data) to your current environment. This application-specific validation of a new environment gives you a way to test before you migrate a production application, which can greatly reduce risk.

If you have questions about RISE contact Sycomp at scaleongcp@sycomp.com.

Figure captions (images omitted):

  • The network topology includes separate subnets depending on your needs.

  • Example with NSD and NFS servers running on the same VMs with Google Local SSD storage.

  • Example using separate NFS server VMs with Google Persistent Disk.