LWN: Comments on "Issues around discard" https://lwn.net/Articles/787272/
This is a special feed containing comments posted to the individual LWN article titled "Issues around discard".

Issues around discard https://lwn.net/Articles/788600/ zdzichu <div class="FormattedComment"> For firmware, I think the Linux Vendor Firmware Service (<a href="https://fwupd.org/">https://fwupd.org/</a>) can be used to update SSD firmware under Linux.<br> </div> Thu, 16 May 2019 05:40:33 +0000

Issues around discard https://lwn.net/Articles/788598/ scientes <div class="FormattedComment"> The problem is that the firmware is non-free software. I've had SSDs that just fail with TRIM, for example, and you are given a Windows-only update tool with a binary-blob firmware. I just threw it out and decided never to buy from that company again.<br> <p> These firmwares run on ARM M-profile cores, and they should be free software.<br> <p> There is a project in this direction: <a rel="nofollow" href="http://www.openssd.io/">http://www.openssd.io/</a><br> </div> Thu, 16 May 2019 04:05:07 +0000

Issues around discard https://lwn.net/Articles/788322/ roblucid <div class="FormattedComment"> Maybe the kernel needs to give user space the ability to pass it some hints: something like a drivecap file plus a utility that then configures the kernel's policy for the drive's characteristics? This is how BSD used to tune for HDDs, when sectors per cylinder mattered (way before HDDs switched to LBA addressing).<br> <p> Then SSD vendors (or large customers) could characterise how their drive is expected to be used: LBA reuse, discard penalties, phantom discards, and the like. You might even be able to tune for service life, with user space logging expected degraded performance.<br> </div> Tue, 14 May 2019 08:57:04 +0000

We should not look at discard as a uniform feature in the first place https://lwn.net/Articles/788313/ hmh <div class="FormattedComment"> It entirely depends on the competence of the vendor that wrote the device firmware, and on whoever designed the queue protocol not being crazy enough to forget about write collisions between queues.<br> <p> A discard really is just a write as far as ordering and races/collisions go.<br> </div> Tue, 14 May 2019 02:23:25 +0000

We should not look at discard as a uniform feature in the first place https://lwn.net/Articles/788204/ Fowl <div class="FormattedComment"> What happens if a write and a discard race?<br> </div> Mon, 13 May 2019 12:39:01 +0000

There are two very different types of TRIM command https://lwn.net/Articles/788038/ GoodMirek <div class="FormattedComment"> In my eyes, there is a significant difference between write and discard.<br> If a write fails, it can cause data loss, which is a critical issue and therefore requires transaction safety.<br> If a discard fails, the worst that can happen is some performance and wear deterioration, which is a negligible issue.<br> </div> Fri, 10 May 2019 13:20:28 +0000

it is the O_PONIES issue again! https://lwn.net/Articles/788008/ miquels I have several SSDs in production that have written not 700 TB, but 7600 TB. In 4 years' time. Datacentre SSDs FTW :)
<pre>
=== START OF INFORMATION SECTION ===
Device Model:     Samsung SSD 845DC PRO 800GB
User Capacity:    800,166,076,416 bytes [800 GB]
Sector Size:      512 bytes logical/physical

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS   VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK  099   099   010    -    3
  9 Power_On_Hours          -O--CK  093   093   000    -    34281
 12 Power_Cycle_Count       -O--CK  099   099   000    -    4
177 Wear_Leveling_Count     PO--C-  076   076   005    -    9158
179 Used_Rsvd_Blk_Cnt_Tot   PO--C-  099   099   010    -    3
180 Unused_Rsvd_Blk_Cnt_Tot PO--C-  099   099   010    -    7037
241 Total_LBAs_Written      -O--CK  094   094   000    -    16410885339592
242 Total_LBAs_Read         -O--CK  097   097   000    -    7734700749043
250 Read_Error_Retry_Rate   -O--CK  100   100   001    -    0
</pre> Fri, 10 May 2019 00:01:13 +0000
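As a quick sanity check on the figure above: the Total_LBAs_Written raw value converts to bytes by multiplying by the sector size (assuming, as the "Sector Size" line suggests, that the counter is in 512-byte units), and the "7600 TB" turns out to be TiB:

<pre>
16,410,885,339,592 LBAs × 512 bytes/LBA = 8,402,373,293,871,104 bytes
                                        ≈ 8.4 × 10^15 bytes
                                        ≈ 7,642 TiB  (the "7600 TB" above)
</pre>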
There are two very different types of TRIM command https://lwn.net/Articles/787813/ masoncl <div class="FormattedComment"> "That's why mentioning that there are both blocking and nonblocking TRIMs matters so much, because if non-blocking TRIM is available it effectively works like a write as to queueing too, thus there is very little to be gained from backgrounding TRIMs like XFS does. Apart from love of overcomplexity, which seems rather common in the design of XFS."<br> <p> Actual queueing support for discards does change the math a bit, but the fundamental impact on the latency of other operations is still a problem. Sometimes it's worse, because you're just allowing the device to saturate itself with slow operations.<br> <p> The XFS async trim implementation is pretty reasonable, and it can be a big win in some workloads. Basically anything that gets pushed out of the critical section of the transaction commit can have a huge impact on performance. The major thing it's missing is a way to throttle new deletes from creating a never-ending stream of discards, but I don't think any of the filesystems are doing that yet.<br> </div> Wed, 08 May 2019 15:26:04 +0000

Issues around discard https://lwn.net/Articles/787789/ nilsmeyer <div class="FormattedComment"> There are a lot of websites doing hardware testing already; however, few of them test with Linux (Phoronix and ServeTheHome come to mind). I think they could and should add discard testing; with Phoronix, at least, the procedure is somewhat standardized. Another test I would really like to see is fsync() performance, since that shows you the actual, durable write performance of the drive.<br> </div> Wed, 08 May 2019 06:41:30 +0000
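A minimal sketch of the kind of fsync() latency test described above might look like the following; the file name, 4 KiB write size, and iteration count are arbitrary placeholders, and a serious benchmark would control for caching and use a dedicated tool such as fio instead:

<pre>
/* fsync-bench.c: rough fsync() latency probe (illustrative sketch only).
 * Build: cc -O2 -o fsync-bench fsync-bench.c
 * Usage: ./fsync-bench /path/on/device/testfile   (path is a placeholder)
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <testfile>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];
    memset(buf, 0xab, sizeof(buf));

    enum { ITERS = 1000 };
    struct timespec t0, t1;
    double total_ms = 0, worst_ms = 0;

    for (int i = 0; i < ITERS; i++) {
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
            perror("write"); return 1;
        }
        /* Time only the durable-commit part of each write. */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (fsync(fd) < 0) { perror("fsync"); return 1; }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e6;
        total_ms += ms;
        if (ms > worst_ms) worst_ms = ms;
    }
    printf("fsync: avg %.3f ms, worst %.3f ms over %d 4KiB writes\n",
           total_ms / ITERS, worst_ms, ITERS);
    close(fd);
    return 0;
}
</pre>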
There are two very different types of TRIM command https://lwn.net/Articles/787774/ walex <blockquote><p>“XFS will allow the commit to finish and let the trims continue floating down in the background”</p></blockquote> <p>Indeed, but a discard is pretty much like a write, so it could be handled the same way; why then would XFS do that complicated stuff? The obvious reason is that if discarding is handled synchronously, as Btrfs and <tt>ext4</tt> do, then issuing blocking (non-queued) TRIM can cause long freezes, which is indeed why many people don't use the <tt>discard</tt> mount option but instead run <tt>fstrim</tt> every now and then at quiet times.</p> <p>That's why mentioning that there are both blocking and nonblocking TRIMs matters so much, because if non-blocking TRIM is available it effectively works like a write as to queueing too, thus there is very little to be gained from backgrounding TRIMs like XFS does. Apart from love of overcomplexity, which seems rather common in the design of XFS.</p> <p>Put another way, pretty much the entire TRIM debate has been caused by the predominance of blocking TRIM in the SATA installed base of consumer flash SSDs (the other, minor reason has been the numerous TRIM-related bugs in many models of flash SSDs, which are just part of the numerous bugs of many models of flash SSDs).</p> Tue, 07 May 2019 23:44:04 +0000

it is the O_PONIES issue again! https://lwn.net/Articles/787764/ walex <blockquote><p>«"How the hell did you write SEVEN HUNDRED TERABYTES to this drive in two years‽"</p> <p>It was the Ceph journal drive.»</p> </blockquote> <p>And that is also because the 950 EVO does not have a persistent (supercapacitor-backed) cache, and thus all 700 TB will have hit the flash chips, even if a lot of it probably was just ephemeral. Anyhow, using the 950 EVO as a Ceph journal device, especially at that high rate of journaling (38 GB/hour), probably cost Ceph a lot in latency.</p> Tue, 07 May 2019 21:23:14 +0000

There are two very different types of TRIM command https://lwn.net/Articles/787763/ masoncl <div class="FormattedComment"> I was talking about async discards in a slightly different context. Btrfs and ext4 will block the transaction commit until we've finished trimming the things we deleted during the transaction. XFS will allow the commit to finish and let the trims continue floating down in the background, while making sure not to reuse the blocks until the trim is done.<br> <p> Depending on the device, the async approach can be much faster, but it can also lead to a very large queue of discards, without any way for the application to wait for completion.<br> </div> Tue, 07 May 2019 21:19:42 +0000
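For contrast with the asynchronous discards just described, the fstrim approach mentioned earlier in the thread is synchronous from the caller's point of view: fstrim(8) is essentially a wrapper around the FITRIM ioctl, which returns only once the filesystem has finished issuing discards for the requested range. A minimal sketch; the mount point is a placeholder, and filesystems without trim support return EOPNOTSUPP:

<pre>
/* fitrim.c: ask a mounted filesystem to discard its free space, which is
 * what fstrim(8) does under the hood (illustrative sketch).
 * Build: cc -O2 -o fitrim fitrim.c
 * Usage: sudo ./fitrim /mnt/ssd        (mount point is a placeholder)
 */
#include <fcntl.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct fstrim_range range = {
        .start  = 0,
        .len    = UINT64_MAX,  /* the whole filesystem */
        .minlen = 0,           /* no minimum extent size */
    };
    /* Blocks until the filesystem has issued discards for all free space. */
    if (ioctl(fd, FITRIM, &range) < 0) { perror("FITRIM"); return 1; }

    /* On return, the kernel updates range.len to the bytes trimmed. */
    printf("trimmed %llu bytes\n", (unsigned long long)range.len);
    close(fd);
    return 0;
}
</pre>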
it is the O_PONIES issue again! https://lwn.net/Articles/787738/ naptastic <div class="FormattedComment"> I inherited a Samsung 950 Evo after it was retired from ~2 years of service. Once it was installed, I checked the SMART data, and I couldn't believe the "Total LBAs written" number.<br> <p> "How the hell did you write SEVEN HUNDRED TERABYTES to this drive in two years‽"<br> <p> It was the Ceph journal drive.<br> </div> Tue, 07 May 2019 17:11:14 +0000

There are two very different types of TRIM command https://lwn.net/Articles/787736/ walex <blockquote><p>“XFS does discard asynchronously, while ext4 and Btrfs do it synchronously.”</p></blockquote> <p>The discussion throughout the article and here is made less useful by a vital omission: there is no mention that the first edition of the TRIM command for SATA was "blocking" ("synchronous"), but there is now a variant that is non-blocking ("asynchronous").</p> <p>Essentially all the problems reported with 'discard' are due to the use of the first, "blocking" variant, which unfortunately is the only one that has been implemented on most of the SATA flash SSD installed base so far. <a href="https://en.wikipedia.org/wiki/Trim_(computing)#Shortcomings">Wikipedia says</a>: <blockquote><p>“The original version of the TRIM command has been defined as a non-queued command by the T13 subcommittee, and consequently can incur massive execution penalty if used carelessly, e.g., if sent after each filesystem delete command. The non-queued nature of the command requires the driver to first wait for all outstanding commands to be finished, issue the TRIM command, then resume normal commands.”</p></blockquote> <p>SAS/SCSI and NVMe have similar commands with different semantics; I particularly like the "write zeroes" command of NVMe.</p> Tue, 07 May 2019 16:43:19 +0000
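For reference, on Linux these per-transport commands (ATA TRIM, SCSI UNMAP, NVMe Deallocate / Write Zeroes) are all reached through the same block-layer discard path; from user space, blkdiscard(8) exercises it via the BLKDISCARD ioctl, and the kernel issues whichever command the device supports. A minimal, destructive sketch; the device path and the 1 GiB range are placeholders, and this erases data:

<pre>
/* blkdiscard-range.c: discard a byte range on a block device, as
 * blkdiscard(8) does (illustrative sketch; THIS DESTROYS DATA).
 * Build: cc -O2 -o blkdiscard-range blkdiscard-range.c
 * Usage: sudo ./blkdiscard-range /dev/sdX   (device path is a placeholder)
 */
#include <fcntl.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <blockdev>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_WRONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* {offset, length} in bytes; must be aligned to the logical block size. */
    uint64_t range[2] = { 0, 1ULL << 30 };  /* first 1 GiB, arbitrary example */

    /* The kernel translates this into ATA TRIM, SCSI UNMAP, or NVMe
     * Deallocate, depending on what the device actually speaks. */
    if (ioctl(fd, BLKDISCARD, range) < 0) { perror("BLKDISCARD"); return 1; }

    close(fd);
    return 0;
}
</pre>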
it is the O_PONIES issue again! https://lwn.net/Articles/787735/ walex <blockquote><p>“but the FTL can take an exorbitant amount of time when gigabytes of files are deleted; read and write performance can be affected.”</p></blockquote> <p>This is alluded to in the text by C. Mason and others, but that is typical of devices that don't have a supercapacitor-backed cache/buffer: they must commit every delete to flash.</p> <p>So-called "enterprise" devices have supercapacitor-backed caches, and can do deletes (and random writes) a lot faster. The situation is rather similar to RAID host adapters with a cache, where a BBU makes a huge difference.</p> <p>It is the famous <tt>O_PONIES</tt> and <q>eternal September</q> issue that never goes away, because every year there is a new batch of newbie sysadmins and programmers who don't get persistence and caches, and just want <tt>O_PONIES</tt>.</p> <p>People familiar with using SSDs for journaling in a Ceph storage layer know how enormous a difference a supercapacitor-backed SSD cache makes...</p> Tue, 07 May 2019 16:27:44 +0000

Issues around discard https://lwn.net/Articles/787697/ shentino <div class="FormattedComment"> Discard isn't just for SSDs.<br> <p> In essence, discard is at this point a fundamental storage operation, just like reads and writes.<br> <p> LVM thin pools, for example, use high-level discards as cues to deallocate committed pool space, which may well provoke the thin pool itself to cascade discards to its own storage.<br> <p> It's also used in virtualization.<br> <p> VMs that issue discards to virtual block devices can likewise provoke the hypervisor into deallocating the storage or space occupied by whatever it stores on the host to back the device. A guest OS issuing discards to its virtual drives, even ones presented as "spinning rust", can help a hypervisor optimize how it manages the storage on the host.<br> <p> Discards are a big opportunity for higher layers like these to give lower layers housekeeping opportunities beyond just letting an SSD garbage-collect.<br> <p> They should be sent liberally, at every opportunity. If anything, the overhead of managing them should encourage lower layers to take advantage of the information.<br> </div> Tue, 07 May 2019 13:52:58 +0000

We should not look at discard as a uniform feature in the first place https://lwn.net/Articles/787688/ hmh <div class="FormattedComment"> Sometimes it feels like the real use for "TRIM" ("discard") on flash-based, old-style storage devices with advanced FTLs (i.e. SATA- or SAS-attached SSDs) is being forgotten. It is there to *reduce needless copying of stale pages of data* by the SSD itself, i.e. to reduce the need for background block writes. It is not a speedy way to delete data blocks, or if it is, someone forgot to properly notify the device vendors about it -- it is a flash endurance saver.<br> <p> When you either TRIM or overwrite an LBA, the SSD gets the implied information that the old block is not going to be reused, and it can be scheduled to be *erased*.<br> <p> OTOH, when a filesystem prefers to direct writes to a new LBA and TRIM/discard is never done on the old, now-freed blocks, those old blocks are going to be copied around by the SSD firmware to free up erase blocks (much like memory compaction tries to do to create huge pages). This wastes flash write cycles, increases on-device fragmentation, and reduces the number of "erased and ready to be used" flash pages. It also eventually drives the SSD into the dreaded "slow as an old floppy drive" state.<br> <p> So, what "discard" is really useful for on [old-style non-NVMe?] SSDs is vastly different from why one would use "discard" on, e.g., a thin-provisioned volume. And it is *not* any less important.<br> </div> Tue, 07 May 2019 11:47:11 +0000

Issues around discard https://lwn.net/Articles/787680/ kdave <blockquote>Martin Petersen said. If performance was bad for a device, the maker could recommend mounting without enabling discard; if the kernel developers had simply consistently enabled discard, vendors would have fixed their devices by now.</blockquote> ... <strike>vendors would have fixed their devices</strike> users would simply bug filesystem developers until discard is off by default, with vendors doing nothing. No matter how much I'd like this approach to work, it does not work as expected in practice. I think vendors respond to $$$ and to big companies asking for things, but, for example, see where we are with the erase block size. From our view it is a simple thing, yet there has been no change, AFAIK, with answers ranging from "trade secret" to "you don't need to know". And I'm afraid this won't change. Tue, 07 May 2019 08:09:19 +0000

Issues around discard https://lwn.net/Articles/787620/ fuhchee <div class="FormattedComment"> Did the possibility of running online measurements of TRIM performance come up? The filesystem could learn the actual contemporaneous characteristics of TRIMs of various sizes during lulls in operation.<br> </div> Mon, 06 May 2019 18:01:15 +0000
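As far as I know, nothing like this exists in the kernel today, but a crude user-space approximation of the measurement suggested above could time discards of increasing extent sizes on a scratch device. A sketch under those assumptions; it is destructive, the device path is a placeholder, and since discarding an already-discarded range may be a near no-op, a realistic probe would write each range first:

<pre>
/* discard-probe.c: time discards of increasing sizes to characterize a
 * device (illustrative sketch; DESTROYS DATA on the target device).
 * Build: cc -O2 -o discard-probe discard-probe.c
 * Usage: sudo ./discard-probe /dev/sdX   (scratch device only!)
 */
#include <fcntl.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <scratch-blockdev>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_WRONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Probe extent sizes from 1 MiB up to 1 GiB, doubling each time.
     * A real probe would vary the offsets and pre-write the ranges so
     * that the discards are not no-ops. */
    for (uint64_t len = 1ULL << 20; len <= 1ULL << 30; len <<= 1) {
        uint64_t range[2] = { 0, len };
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (ioctl(fd, BLKDISCARD, range) < 0) { perror("BLKDISCARD"); return 1; }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e6;
        printf("%8llu KiB: %.3f ms\n", (unsigned long long)(len >> 10), ms);
    }
    close(fd);
    return 0;
}
</pre>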