Coping with Data Growth in Radiology – Turning to the Cloud
By R. Todd Thomas, CIO, Austin Radiological Association
As published in Healthcare Tech Outlook, August 2015
Austin Radiological Association (ARA) is no novice when it comes to medical image storage. Our firm, which is based in Austin, TX, has not only stored digital images since 2001, but has also offered image storage services to other healthcare providers for the past 12 years. Today, our more than 90 radiologists read and store the vast majority of central Texas’ imaging studies, which means that we add about 900,000 radiological studies to our archive every year. Keeping our infrastructure ahead of that rate of storage growth has been a challenge.
When we first started using digital images in 2001, we went big, digitizing everything except mammography and storing images on our first Fibre Channel storage area network (SAN), which had a high-performance array. By 2003, shortly after our first client came aboard, we already had to move to a larger array. It’s important to note that we decided early in this process not to archive studies to tape. Our radiologists felt that retrieval times from tape were too slow and wanted all current studies and comparison studies available to them in under three seconds.
In 2004 we needed yet more space, so we added a lower-cost CAS (content-addressable storage) array and adopted some ILM (information lifecycle management) strategies, moving images from our SAN over to CAS after 30 days, a process that was also replicated to our new secondary datacenter. By 2006 we had grown to 58 terabytes of usable storage under management.
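The 30-day rule described above can be illustrated with a minimal sketch of an age-based tiering check. This is not ARA's actual tooling: the function names, the study identifiers, and the choice to keep a study that is exactly 30 days old on the SAN are all assumptions made for illustration.

```python
from datetime import date

# From the article: images move from the SAN to CAS after 30 days.
SAN_RETENTION_DAYS = 30


def storage_tier(study_date: date, today: date) -> str:
    """Return the tier a study belongs on under a simple age-based rule."""
    age_days = (today - study_date).days
    # Assumption: a study exactly 30 days old still lives on the SAN.
    return "SAN" if age_days <= SAN_RETENTION_DAYS else "CAS"


def studies_to_move(studies: dict[str, date], today: date) -> list[str]:
    """List the (hypothetical) study IDs that are due to move to CAS."""
    return sorted(
        study_id
        for study_id, study_date in studies.items()
        if storage_tier(study_date, today) == "CAS"
    )


if __name__ == "__main__":
    # Hypothetical study IDs and dates, for illustration only.
    archive = {"study-A": date(2004, 1, 1), "study-B": date(2004, 2, 20)}
    print(studies_to_move(archive, today=date(2004, 3, 1)))
```

In a real archive the same decision would usually be driven by the PACS or the storage platform's own policy engine rather than by a script.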
In 2007, ARA decided to move into digital mammography, and our storage use exploded. We needed an array that could be quickly deployed and expanded, so we chose a clustered storage solution for mammography because of its ability to serve large filesets quickly. But by 2008 we were consuming space at 24 terabytes per year, a growth rate that forced us to move off our CAS platform entirely. We expanded our clustered storage array and migrated 150 terabytes of data. Today we add over 36 terabytes of imaging data annually.
Our data growth is accelerating primarily because file sizes are exploding. When we moved into 3D breast imaging earlier this year, we found that those files are 20 times larger than 2D mammography files. If we kept to the current storage refresh cycle, our growth projections show that in 2018 we would face a 555-terabyte migration, and by 2022 we’d need to migrate additional petabytes. By 2024, we are projected to ingest 3 petabytes annually on mammography alone.
It is clear that simply adding nodes won’t do the trick, because eventually they won’t be backwards compatible. At some point, we will need to do a forklift upgrade of our entire archive. It took us 10 months to migrate 150 terabytes; I can’t imagine how long it would take to migrate a petabyte or more.
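To put that worry into numbers, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that migration time scales linearly with data volume at the rate we observed (150 terabytes in 10 months) and that a petabyte is 1,000 terabytes; real migrations rarely scale this cleanly.

```python
# Observed figures from the article.
OBSERVED_TB = 150.0
OBSERVED_MONTHS = 10.0


def migration_months(size_tb: float) -> float:
    """Estimate migration time, assuming the observed rate holds (linear scaling)."""
    rate_tb_per_month = OBSERVED_TB / OBSERVED_MONTHS  # 15 TB per month
    return size_tb / rate_tb_per_month


if __name__ == "__main__":
    # 555 TB is the projected 2018 migration; 1,000 TB stands in for "a petabyte".
    for size_tb in (555, 1000):
        months = migration_months(size_tb)
        print(f"{size_tb} TB: about {months:.0f} months ({months / 12:.1f} years)")
```

Under those assumptions the 555-terabyte migration alone would take roughly three years, and a full petabyte more than five.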
After some research, we identified three alternatives to the classic buy-expand-retire-buy-migrate cycle:
► Outsource the entire archive: Companies like Dell will provide the hardware, software, and services to run the imaging archive. This solution typically consists of a hardware array on premises, replicating to a similar array at an offsite location.
► Software-defined storage: The customer provides the disk shelves and installs software that transforms those shelves into a scale-out storage archive. Because IT controls the hardware, the customer decides when a forklift upgrade is needed.
► The cloud: These solutions provide the lowest amount of management overhead, but also require more “homework.” But storing in the cloud means no more image migrations–just pay for additional storage when it’s needed. This option was the most intriguing to us.
Offerings from public cloud providers like Amazon and Microsoft come with a fairly complicated array of charges, and dealing directly with these providers is challenging, both technically (they run on an object store architecture, which doesn’t handle files well at all) and operationally (it’s difficult to get hold of a human being for support). So we looked into file storage services that leverage the public cloud. These services use dedicated hardware appliances to deliver performance to the data center while leveraging the cloud for scale and the global synchronization of files.
Ultimately, we chose Nasuni Cloud NAS. Nasuni provides a simple bill in which we pay them for usable storage: backups are included, but we aren’t charged for them. Before moving into production, we needed to ensure that the performance of Nasuni would match the incumbent. When images were stored in the cache, Nasuni performed 21 percent faster than the incumbent in transferring the first image and 7 percent faster in delivering a complete study. When the images were not stored in the cache and needed to be pulled from the cloud, performance was, of course, slower: 49 percent slower than the 3-second baseline for the first image, and 8.5 percent slower than the incumbent to deliver a complete study. On average it took just a few seconds more to pull an entire case from the cloud.
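As a quick worked example of what those percentages mean in seconds, the sketch below converts "49 percent slower than the 3-second baseline" into a time and then blends cached and uncached reads. The cache hit ratio and the cached first-image time used here are hypothetical placeholders, not figures we measured.

```python
def slower_by(baseline_s: float, percent: float) -> float:
    """Time taken when something is `percent` slower than `baseline_s` seconds."""
    return baseline_s * (1 + percent / 100)


def expected_first_image_s(hit_ratio: float, cached_s: float, uncached_s: float) -> float:
    """Average first-image time for a given fraction of reads served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cached_s + (1.0 - hit_ratio) * uncached_s


if __name__ == "__main__":
    # From the article: uncached first image is 49% slower than the 3-second baseline.
    uncached = slower_by(3.0, 49)
    print(f"Uncached first image: about {uncached:.2f} s")

    # Hypothetical inputs: 95% of reads hit the cache, and a cached first
    # image arrives right at the 3-second target.
    blended = expected_first_image_s(hit_ratio=0.95, cached_s=3.0, uncached_s=uncached)
    print(f"Blended first image: about {blended:.2f} s")
```

The uncached figure works out to roughly 4.5 seconds for the first image, which is the delay our radiologists notice.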
Given that nearly all recent studies are stored in the cache, and given the cost savings and the management and scale benefits that Nasuni provides us, the business people feel it’s worth it, though the radiologists are still hesitant to sign off on waiting those few extra seconds per case. There are networking solutions that allow IT to create layer 2 connections directly to cloud-storage-provider datacenters, reducing those few extra seconds, but those carry monthly charges of their own. Until we shrink that delay to an acceptable level, we have migrated only our mammography data, both 2D and 3D, to the cloud. In any case, it’s clear that the cloud is the future for the storage of digital medical images. There’s no other way we could scale efficiently enough to cope with data growth and not break the bank. And there are other advantages as well with a service like Nasuni: a consolidated, simple monthly bill that charges only for usable storage; automatic data protection; centralized management; and one throat to choke when we encounter issues.