Cloud Storage Acceleration

One of the key advantages of Cloud Storage is the ability to access Cloud buckets from any location around the globe. However, this also creates a problem for most cloud products: slow communication lines and long distances limit transfer speeds. BridgeSTOR has developed technology that sends data into the Cloud quickly and efficiently.

Asynchronous I/O

Although most operating systems today use advanced caching methods, few have asynchronous I/O. Asynchronous I/O allows buffers to be sent in parallel to a target device, which means Coronado V-NAS Access Points may open multiple connections to Cloud Storage. When a buffer fills, it can begin data reduction, which relies on time-consuming algorithms. While that is occurring, another buffer can accept additional I/O. Once data reduction is complete, the buffer is sent off to Cloud Storage. In high-speed environments, multiple buffers may be sent to Cloud Storage in parallel, allowing for high-speed cloud ingest.
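The overlap of data reduction and upload can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BridgeSTOR's implementation: zlib compression stands in for the data reduction step, a dictionary stands in for Cloud Storage, and the names reduce_buffer, upload, ingest and max_connections are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def reduce_buffer(buffer: bytes) -> bytes:
    """Stand-in for the time-consuming data reduction step (assumed: zlib)."""
    return zlib.compress(buffer)


def upload(index: int, payload: bytes, store: dict) -> int:
    """Hypothetical upload; a real one would use its own cloud connection."""
    store[index] = payload
    return index


def ingest(buffers, max_connections: int = 4) -> dict:
    """Reduce and upload buffers concurrently instead of one at a time."""
    store: dict = {}

    def pipeline(item):
        index, buffer = item
        return upload(index, reduce_buffer(buffer), store)

    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        # map() submits every buffer up front, so up to max_connections
        # buffers are being reduced or uploaded at any moment.
        list(pool.map(pipeline, enumerate(buffers)))
    return store
```

Each worker thread plays the role of one connection: while one buffer is still being reduced, another can already be on its way to the store.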

Post Processing

Coronado V-NAS Access Points provide optional post processing. Post processing is ideal for backup situations where a read/write cache is used to get data off servers quickly during a backup period. When files are transferred into Coronado, they are written to the local read/write cache and logged. At a user-defined time, a background task processes the log and moves the data into the cloud.

If a Coronado V-NAS Global View Manager is in use, the name of each file written into the cache is also written to the Coronado V-NAS Global View Manager, which reserves the file so that other Coronado V-NAS Global Access Points cannot use it. Any other Coronado V-NAS Global Access Point that tries to access the file will receive an access error. The Coronado V-NAS Global Access Point that wrote the file still has full access to it. For example, if that Access Point deletes the file, the file is removed and its name becomes available to other users. Once the file is properly entered into the system, other Coronado V-NAS Global Access Points can pull down the data.
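The cache, log and reservation behavior described above can be modeled roughly as follows. This is a minimal sketch, not the product's code: the class and method names are hypothetical, dictionaries stand in for the local cache and the cloud bucket, and the sketch assumes the reservation is lifted when the background task moves the file into the cloud.

```python
class GlobalViewManager:
    """Hypothetical stand-in for the file-name reservation role."""

    def __init__(self):
        self.owners = {}  # file name -> access point that reserved it

    def reserve(self, name, access_point):
        # setdefault keeps the first owner if the name is already reserved.
        return self.owners.setdefault(name, access_point) == access_point

    def release(self, name):
        self.owners.pop(name, None)


class AccessPoint:
    def __init__(self, ap_id, manager, cloud):
        self.ap_id, self.manager, self.cloud = ap_id, manager, cloud
        self.cache = {}  # local read/write cache
        self.log = []    # file names waiting to be moved into the cloud

    def write(self, name, data):
        if not self.manager.reserve(name, self.ap_id):
            raise PermissionError(f"{name} is reserved by another access point")
        self.cache[name] = data
        self.log.append(name)

    def read(self, name):
        owner = self.manager.owners.get(name)
        if owner is not None and owner != self.ap_id:
            raise PermissionError(f"{name} is reserved by {owner}")
        if name in self.cache:
            return self.cache[name]
        return self.cloud[name]

    def process_log(self):
        """Background task, run at the user-defined time."""
        for name in self.log:
            self.cloud[name] = self.cache[name]
            self.manager.release(name)  # assumed: upload ends the reservation
        self.log.clear()
```

In this model a second access point gets a PermissionError while the file sits in the writer's cache, and can read it from the cloud once process_log() has run.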

Global Single File Instance

Another way to address high-performance challenges is single file instance. But how is all the single file instance data kept in sync across locations? BridgeSTOR solves this problem with a feature called Global Single File Instance, which allows all global locations to maintain a single file instance and compress data into a common cloud bucket. When Global Single File Instance is enabled, CSFS uses a Cloud Hash Table that resides in the Coronado V-NAS Global View Manager and may be located in the Cloud or a local clustered VM data store that is specifically assigned to a single Cloud bucket. CSFS communicates with the Coronado V-NAS Global View Manager Hash Table over a standard TCP socket interface designed to be firewall friendly. The Coronado V-NAS Global View Manager then brokers hash lookups and the allocation of common metadata so that all file systems can store and retrieve shared files in a Cloud bucket. For instance, if a user in New York writes a file in single file instance form to an Amazon bucket and a user in Chicago then sends the same file, only the metadata (not the data) is sent to the Cloud, resulting in a dramatic speed improvement.
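The New York and Chicago example can be illustrated with a small model of a shared hash table. Everything here is an assumption made for illustration: the broker is an in-process object rather than a service reached over a TCP socket, the bucket is a dictionary, whole files are hashed with SHA-1, and the names HashTableBroker, claim and store_file are hypothetical.

```python
import hashlib


class HashTableBroker:
    """In-process stand-in for the shared hash table that brokers lookups."""

    def __init__(self):
        self.known = set()

    def claim(self, digest: str) -> bool:
        """Return True if the digest is new, so the caller must upload data."""
        if digest in self.known:
            return False
        self.known.add(digest)
        return True


def store_file(site, name, data, broker, bucket) -> int:
    """Store a file and return how many data bytes had to be sent."""
    digest = hashlib.sha1(data).hexdigest()
    sent = 0
    if broker.claim(digest):
        bucket["data"][digest] = data  # first copy anywhere: send the data
        sent = len(data)
    bucket["meta"][(site, name)] = digest  # metadata is always recorded
    return sent


broker = HashTableBroker()
bucket = {"data": {}, "meta": {}}
report = b"quarterly report" * 100
sent_ny = store_file("new-york", "q1.doc", report, broker, bucket)
sent_chi = store_file("chicago", "q1-copy.doc", report, broker, bucket)
```

Here sent_ny equals the full file size while sent_chi is zero: the second site adds only a metadata entry that points at the data already in the bucket.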

How Our Global Single File Instance Works

BridgeSTOR’s single file instance also works at the sub-file level. This means that even if only a small portion of a file is a duplicate, that portion is not sent into the Cloud.

With CSFS Sub-File Block Single File Instance, data blocks are “fingerprinted” using a hashing algorithm (SHA-1) that produces an effectively unique, “shorthand” identifier for each data block. These fingerprints, along with the blocks of data that produced them, are indexed, compressed, optionally encrypted and then retained. Blocks whose fingerprints have been seen before are not stored again, leaving a single instance of each unique data block along with its corresponding fingerprint.

Once the block fingerprint value has been calculated, the single file instance engine compares it against all previously generated fingerprints to determine whether the block is unique (new) or has been processed before (a duplicate). The speed of these index search and update operations is at the heart of a data reduction system’s throughput. All new fingerprints, along with their corresponding full data representations, are transferred to the Cloud in a compressed and optionally encrypted form, allowing WAN optimization in-flight and storage optimization in the Cloud.
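The fingerprint, lookup and transfer steps can be sketched as follows. This is a minimal illustration, not the CSFS engine: the 4 KiB block size is an assumption (the text does not state one), a Python set stands in for the fingerprint index, zlib stands in for the compression step, encryption is omitted, and the name reduce_blocks is hypothetical.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed block size for illustration only


def reduce_blocks(data: bytes, index: set, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep only unseen ones.

    Returns (recipe, new_blocks): recipe lists the fingerprint of every
    block in order; new_blocks maps each new fingerprint to its compressed
    block, which is all that would need to be transferred.
    """
    recipe = []
    new_blocks = {}
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha1(block).digest()
        if fingerprint not in index:       # index search
            index.add(fingerprint)         # index update
            new_blocks[fingerprint] = zlib.compress(block)
        recipe.append(fingerprint)
    return recipe, new_blocks
```

A file made of blocks A, B, A yields a three-entry recipe but only two new blocks, and a later file that reuses block B sends nothing for it. The set lookup is the operation whose speed the paragraph above identifies as central to throughput.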

CSFS allows the Hash Table to be memory resident. The amount of memory required to hold the Hash Table depends on the amount of physical capacity in use and the single file instance block size. The Hash Table may also be swapped to disk, which increases latency but allows a smaller memory footprint.
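A back-of-the-envelope estimate shows how capacity and block size drive the memory requirement. The per-entry layout below is an assumption, not the actual CSFS format: each entry is taken to hold a 20-byte SHA-1 digest plus a hypothetical 12 bytes of location and bookkeeping data.

```python
SHA1_DIGEST_BYTES = 20  # size of a SHA-1 digest


def hash_table_bytes(physical_capacity_bytes: int,
                     block_size_bytes: int,
                     entry_overhead_bytes: int = 12) -> int:
    """Estimate memory for one hash table entry per stored block."""
    # Ceiling division: a partial trailing block still needs an entry.
    entries = -(-physical_capacity_bytes // block_size_bytes)
    return entries * (SHA1_DIGEST_BYTES + entry_overhead_bytes)


TIB = 2**40
large_blocks = hash_table_bytes(TIB, 64 * 1024)  # 64 KiB blocks
small_blocks = hash_table_bytes(TIB, 4 * 1024)   # 4 KiB blocks
```

Under these assumptions, 1 TiB of physical capacity needs about 512 MiB of table memory with 64 KiB blocks and about 8 GiB with 4 KiB blocks, which is why a smaller block size may push the table to disk.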

CSFS Sub-File Block Single File Instance also has special logic for optimizing virtual machine images by mathematically aligning data into a CSFS single file instance block. This special algorithm for virtual machines adds little latency to the system and guarantees all blocks will be properly reduced to a single instance. CSFS automatically invokes this optimization to maximize the data reduction of virtual machine images.
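The text does not describe the alignment algorithm itself, so the following only illustrates why alignment matters for fixed-size blocks. It assumes a hypothetical 512-byte image header in front of otherwise identical content; the header size, block size and function name are all made up for the example.

```python
import hashlib


def fingerprints(data: bytes, block_size: int, start: int = 0) -> set:
    """SHA-1 fingerprints of fixed-size blocks, beginning at offset start."""
    return {
        hashlib.sha1(data[i:i + block_size]).digest()
        for i in range(start, len(data), block_size)
    }


BLOCK = 4096
HEADER = 512  # hypothetical image header size

# 16 KiB of non-repeating content, and the same content inside an "image".
payload = b"".join(i.to_bytes(4, "big") for i in range(4096))
image = bytes(HEADER) + payload

# Blocks cut from offset 0 straddle the header, so nothing matches.
unaligned_shared = fingerprints(image, BLOCK) & fingerprints(payload, BLOCK)

# Cutting blocks from the end of the header lines them up again.
aligned_shared = fingerprints(image, BLOCK, start=HEADER) & fingerprints(payload, BLOCK)
```

Without alignment the two copies share no fingerprints; once blocks start at the header boundary, all four blocks match and can be reduced to a single instance.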

When an enterprise establishes a common repository for shared files in the Cloud across multiple locations, the ability to optimize that Cloud Storage via global single file instance is a virtual “no-brainer”, yet most “Cloud Storage Gateways” do not offer this feature. Leveraging CSFS Global Single File Instance, Coronado V-NAS Access Points offer significant savings in Cloud Storage capacity and cost for organizations that need to share and collaborate on files across multiple locations.