Affiliation:
1. Florida International University, Miami, FL
2. Samsung Semiconductor Inc., Mountain View, CA
3. Northeastern University, Boston, MA
Abstract
The demand for high-performance I/O in Storage-as-a-Service (SaaS) is increasing day by day. To address this demand, NAND flash-based solid-state drives (SSDs) are commonly used in data centers as cache or top tiers in the storage rack, owing to their superior performance compared to traditional hard disk drives (HDDs). Meanwhile, with the capital expenditure of SSDs declining and their storage capacity increasing, all-flash data centers are evolving to serve cloud services better than SSD-HDD hybrid data centers. During this transition, the biggest challenge is how to reduce the Write Amplification Factor (WAF) and improve the endurance of SSDs, since these devices have a limited number of program/erase cycles. A specific case is that storing data with different lifetimes (i.e., I/O streams with similar temporal fetching patterns such as re-access frequency) in one single SSD can cause high WAF, reduce endurance, and degrade the performance of the SSD. Motivated by this, multi-stream SSDs have been developed to enable data with different lifetimes to be stored in different SSD regions. The logic behind this is to reduce the internal movement of data: when garbage collection is triggered, there is a high chance that the pages of a data block are either all invalid or all valid. However, the limitation of this technology is that the system needs to manually assign the same streamID to data with a similar lifetime. Unfortunately, when data arrives, it is not known how important this data is and how long it will stay unmodified. Moreover, according to our observation, with different definitions of lifetime (i.e., different calculation formulas based on selected features previously exhibited by data, such as sequentiality and frequency), streamID identification may have varying impacts on the final WAF of multi-stream SSDs. Thus, in this article, we first develop a portable and adaptable framework to study the impacts of different workload features and their combinations on write amplification. We then propose a feature-based stream identification approach, which automatically correlates the measurable workload attributes (such as I/O size, I/O rate, and so on) with high-level workload features (such as frequency, sequentiality, and so on) and determines the right combination of workload features for assigning streamIDs. Finally, we develop an adaptable stream assignment technique to assign streamIDs dynamically for changing workloads. Our evaluation results show that our automated approach to stream detection and separation can effectively reduce the WAF by using appropriate features for stream assignment with minimal implementation overhead.
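The effect of lifetime-based separation on write amplification can be illustrated with a toy model. The sketch below is not the authors' framework: the function names (`place`, `gc_copies`, `write_amplification`), the block size, and the workload are hypothetical, and it assumes an oracle that knows each page's expiry time in advance, which the abstract notes is not available in practice. It takes WAF to be total flash page writes divided by host page writes.

```python
def place(expiry_times, pages_per_block, stream_of):
    """Fill flash blocks in arrival order, keeping one open block per stream ID.

    expiry_times: per-page time at which the page becomes invalid (oracle assumption).
    stream_of:    hypothetical policy mapping an expiry time to a stream ID.
    """
    open_blocks = {}
    blocks = []
    for expiry in expiry_times:
        sid = stream_of(expiry)
        block = open_blocks.get(sid)
        if block is None or len(block) == pages_per_block:
            block = []                 # open a fresh block for this stream
            blocks.append(block)
            open_blocks[sid] = block
        block.append(expiry)
    return blocks


def gc_copies(blocks, now):
    """Count valid pages that must be copied out of blocks holding invalid pages."""
    copies = 0
    for block in blocks:
        valid = sum(1 for expiry in block if expiry > now)
        if valid < len(block):         # block has space worth reclaiming
            copies += valid            # valid pages are relocated before erase
    return copies


def write_amplification(host_pages, copied_pages):
    """WAF = (host writes + garbage-collection copies) / host writes."""
    return (host_pages + copied_pages) / host_pages


# Toy workload: short-lived and long-lived pages arrive interleaved.
writes = [1, 100] * 4
mixed = place(writes, 4, lambda expiry: 0)                       # single stream
split = place(writes, 4, lambda expiry: 0 if expiry < 50 else 1)  # two streams

waf_mixed = write_amplification(len(writes), gc_copies(mixed, now=10))
waf_split = write_amplification(len(writes), gc_copies(split, now=10))
print(waf_mixed, waf_split)
```

In this toy case the single-stream layout leaves every block half valid at garbage-collection time, while the two-stream layout yields one fully invalid block and one fully valid block, so no pages need to be copied.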
Funder
National Science Foundation
National Science Foundation Career
Samsung Semiconductor Inc. Research Grant
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture