
it is the O_PONIES issue again!

Posted May 7, 2019 16:27 UTC (Tue) by walex (guest, #69836)
Parent article: Issues around discard

“but the FTL can take an exorbitant amount of time when gigabytes of files are deleted; read and write performance can be affected.”

This is alluded to in the article by C. Mason and others, and it is typical of devices that lack a supercapacitor-backed cache/buffer: they must commit every delete to flash.

So-called "enterprise" devices have supercapacitor-backed caches and can do deletes (and random writes) a lot faster. The situation is rather similar to RAID host adapters with a cache, where a BBU (battery backup unit) makes a huge difference.
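As a rough aside (mine, not anything from the article): the kernel exposes what a block device advertises for discard, and what kind of write cache it claims, through sysfs. A minimal Python sketch, with the device name as an assumption; note that sysfs can only say whether the cache is volatile ("write back") or not -- whether that cache is supercapacitor-backed is a datasheet question:

from pathlib import Path

dev = "sda"  # assumption: substitute the device under test
queue = Path(f"/sys/block/{dev}/queue")
for attr in ("discard_granularity", "discard_max_bytes", "write_cache"):
    # Standard queue attributes; write_cache prints
    # "write back" (volatile) or "write through".
    print(attr, "=", (queue / attr).read_text().strip())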

It is the famous O_PONIES and Eternal September issue that never goes away, because every year there is a new batch of newbie sysadmins and programmers who don't understand persistence and caches, and just want O_PONIES.

People familiar with using SSDs for journaling in a Ceph storage layer know how enormous a difference a supercapacitor-backed SSD cache makes...
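For anyone who wants to measure that difference rather than take it on faith, a minimal sketch (my own, with a hypothetical scratch path, not anything from this thread): time synchronous 4 KiB writes, which is roughly the I/O pattern a Ceph journal generates. Drives without power-loss protection typically manage a few hundred of these per second, while supercapacitor-backed drives can acknowledge from cache and do tens of thousands:

import os
import time

PATH = "/mnt/testssd/journal-probe.bin"  # hypothetical scratch file on the SSD under test
BLOCK = b"\0" * 4096                     # journal-sized synchronous write
N = 1000

fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o600)
start = time.monotonic()
for _ in range(N):
    os.write(fd, BLOCK)  # O_DSYNC: each write must be durable before returning
elapsed = time.monotonic() - start
os.close(fd)

print(f"{N / elapsed:.0f} durable writes/s, {elapsed / N * 1000:.2f} ms average")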



it is the O_PONIES issue again!

Posted May 7, 2019 17:11 UTC (Tue) by naptastic (guest, #60139)

I inherited a Samsung 950 Evo when it was retired from service after ~2 years. Once it was installed, I checked the SMART data. I couldn't believe the "Total LBAs written" number.

"How the hell did you write SEVEN HUNDRED TERABYTES to this drive in two years‽"

It was the Ceph journal drive.

it is the O_PONIES issue again!

Posted May 7, 2019 21:23 UTC (Tue) by walex (guest, #69836)

«"How the hell did you write SEVEN HUNDRED TERABYTES to this drive in two years‽"

It was the Ceph journal drive.»

And that is also because the 950 EVO does not have a persistent (supercapacitor-backed) cache, so all 700 TB will have hit the flash chips, even if a lot of it was probably ephemeral. Anyhow, using the 950 EVO as a Ceph journal device, especially at that high rate of journaling (roughly 40 GB/hour), probably cost Ceph a lot in latency.
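Back-of-the-envelope arithmetic for that rate (mine, not anything stated above):

tb_written = 700                  # from the parent comment
hours = 2 * 365 * 24              # "~2 years" of service, ~17,520 hours
print(tb_written * 1000 / hours)  # ≈ 40 GB/hour, i.e. ~11 MB/s sustained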

it is the O_PONIES issue again!

Posted May 10, 2019 0:01 UTC (Fri) by miquels (guest, #59247)

I have several SSDs in production that have written not 700 TB, but 7600 TB. In 4 years' time. Datacentre SSDs FTW :)

=== START OF INFORMATION SECTION ===
Device Model:     Samsung SSD 845DC PRO 800GB
User Capacity:    800,166,076,416 bytes [800 GB]
Sector Size:      512 bytes logical/physical

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   099   099   010    -    3
  9 Power_On_Hours          -O--CK   093   093   000    -    34281
 12 Power_Cycle_Count       -O--CK   099   099   000    -    4
177 Wear_Leveling_Count     PO--C-   076   076   005    -    9158
179 Used_Rsvd_Blk_Cnt_Tot   PO--C-   099   099   010    -    3
180 Unused_Rsvd_Blk_Cnt_Tot PO--C-   099   099   010    -    7037
241 Total_LBAs_Written      -O--CK   094   094   000    -    16410885339592
242 Total_LBAs_Read         -O--CK   097   097   000    -    7734700749043
250 Read_Error_Retry_Rate   -O--CK   100   100   001    -    0
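
Converting the raw value above (my arithmetic): Total_LBAs_Written counts 512-byte logical sectors, so:

lbas = 16410885339592         # Total_LBAs_Written from the SMART dump
bytes_written = lbas * 512    # 512-byte logical sectors
print(bytes_written / 1e12)   # ≈ 8402 decimal TB
print(bytes_written / 2**40)  # ≈ 7642 TiB -- the "7600 TB" quoted above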

