
Go filesystems and file embedding

July 30, 2020

This article was contributed by Ben Hoyt

The Go team has recently published several draft designs that propose changes to the language, standard library, and tooling: we covered the one on generics back in June. Last week, the Go team published two draft designs related to files: one for a new read-only filesystem interface, which specifies a minimal interface for filesystems, and a second design that proposes a standard way to embed files into Go binaries (by building on the filesystem interface). Embedding files into Go binaries is intended to simplify deployments by including all of a program's resources in a single binary; the filesystem interface design was drafted primarily as a building block for that. There has been a lot of discussion on the draft designs, which has been generally positive, but there are some significant concerns.

Russ Cox, technical lead of the Go team, and Rob Pike, one of the creators of Go, are the authors of the design for the filesystem interface. Cox is also an author of the design for file embedding along with longtime Go contributor Brad Fitzpatrick. Additionally, Cox created YouTube video presentations of each design for those who prefer that format (the filesystem interface video and the file-embedding video). Both designs are quick to note that they are not (yet) formal proposals:

This is a Draft Design, not a formal Go proposal, because it describes a potential large change that addresses the same need as many third-party packages and could affect their implementations (hopefully by simplifying them!). The goal of circulating this draft design is to collect feedback to shape an intended eventual proposal.

Many smaller language and library changes are discussed on the GitHub issue tracker, but for these larger discussions the Go team is trying to use r/golang Reddit threads to scale the discussion — GitHub issues do not have any form of threading, so multiple conversations are hard to keep track of. There is a Reddit thread for each draft—the filesystem interface thread and the file-embedding thread—with quite a few comments on each. There is also a lengthy Hacker News thread that discusses the file-embedding design.

A filesystem interface

The crux of the filesystem interface design is a single-method interface named FS in a new io/fs standard library package:

    type FS interface {
        Open(name string) (File, error)
    }

This means that every filesystem implementation must at least implement the ability to open a file by name, returning a File as well as an error. The File interface is defined as follows:

    type File interface {
        Stat() (os.FileInfo, error)
        Read(buf []byte) (int, error)
        Close() error
    }

In other words, a file must be able to provide file information like that returned from stat(), must be readable, and must be closable. These are the bare minimum that a conforming filesystem needs to provide, but an implementation "may also provide other methods to optimize operations or add new functionality". The standard library's file type (os.File) already implements these three methods, so it is a conforming fs.File implementation.
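To make the two interfaces concrete, here is a minimal sketch of an in-memory filesystem that satisfies them. The FS and File interfaces are copied locally from the draft text rather than imported, and memFS, memFile, memInfo, and readAll are hypothetical names invented for this illustration; none of them come from the design itself.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"time"
)

// FS and File mirror the interfaces quoted from the draft design.
type FS interface {
	Open(name string) (File, error)
}

type File interface {
	Stat() (os.FileInfo, error)
	Read(buf []byte) (int, error)
	Close() error
}

// memFS is a hypothetical read-only filesystem backed by a map.
type memFS map[string][]byte

// memFile is an open handle on one entry of a memFS.
type memFile struct {
	name string
	data []byte
	r    *bytes.Reader
}

func (m memFS) Open(name string) (File, error) {
	data, ok := m[name]
	if !ok {
		return nil, &os.PathError{Op: "open", Path: name, Err: os.ErrNotExist}
	}
	return &memFile{name: name, data: data, r: bytes.NewReader(data)}, nil
}

func (f *memFile) Read(buf []byte) (int, error) { return f.r.Read(buf) }
func (f *memFile) Close() error                 { return nil }
func (f *memFile) Stat() (os.FileInfo, error) {
	return memInfo{name: f.name, size: int64(len(f.data))}, nil
}

// memInfo is a minimal os.FileInfo describing a regular file.
type memInfo struct {
	name string
	size int64
}

func (i memInfo) Name() string       { return i.name }
func (i memInfo) Size() int64        { return i.size }
func (i memInfo) Mode() os.FileMode  { return 0444 }
func (i memInfo) ModTime() time.Time { return time.Time{} }
func (i memInfo) IsDir() bool        { return false }
func (i memInfo) Sys() interface{}   { return nil }

// Compile-time checks: memFile and the standard *os.File both satisfy File.
var (
	_ File = (*memFile)(nil)
	_ File = (*os.File)(nil)
)

// readAll opens name on fsys and reads its whole content.
func readAll(fsys FS, name string) (string, error) {
	f, err := fsys.Open(name)
	if err != nil {
		return "", err
	}
	defer f.Close()
	data, err := io.ReadAll(f)
	return string(data), err
}

func main() {
	fsys := memFS{"hello.txt": []byte("hello, world\n")}
	text, err := readAll(fsys, "hello.txt")
	fmt.Printf("%q %v\n", text, err)
}
```

Any code written against FS, such as readAll() here, works unchanged whether the files live in a map, on disk, or somewhere else.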

If a File is actually a directory, the file information returned by Stat() will indicate that; in that case, the File returned from Open() must also implement the Readdir() method on top of the File interface. Readdir() returns a list of os.FileInfo objects representing the files inside the directory.

Filesystem implementations can expose additional functionality using what the design calls an "extension interface", which is an interface that "embeds a base interface and adds one or more extra methods, as a way of specifying optional functionality that may be provided by an instance of the base interface." For example, it is common to read a whole file at once, and for in-memory filesystem implementations, it may be inefficient to do this using Open(), multiple calls to Read(), and Close(). In cases like this, a developer could implement the ReadFile() method as defined in the ReadFileFS extension interface:

    type ReadFileFS interface {
        FS  // embed the filesystem interface (Open method)
        ReadFile(name string) ([]byte, error)
    }

Along with the extension interface, the design adds a ReadFile() helper function to the io/fs package that checks the filesystem for the ReadFileFS extension and uses it if it exists; otherwise, it falls back to performing the open/read/close sequence. There are various other extension interfaces defined in the draft design, including StatFS, ReadDirFS, and GlobFS. The design does not provide ways to rename or write files, but that could also be done using extensions.
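The check-then-fall-back behavior of such a helper can be sketched as follows. This is an illustration under assumptions, not the design's actual implementation: the interfaces are copied locally from the draft text, and strFile, plainFS, and fastFS are hypothetical types that exist only to exercise both code paths.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"strings"
)

// FS, File, and ReadFileFS mirror the interfaces quoted from the draft design.
type FS interface {
	Open(name string) (File, error)
}

type File interface {
	Stat() (os.FileInfo, error)
	Read(buf []byte) (int, error)
	Close() error
}

type ReadFileFS interface {
	FS
	ReadFile(name string) ([]byte, error)
}

// ReadFile uses the ReadFileFS extension when fsys provides it and
// otherwise falls back to the open/read/close sequence.
func ReadFile(fsys FS, name string) ([]byte, error) {
	if rf, ok := fsys.(ReadFileFS); ok {
		return rf.ReadFile(name)
	}
	f, err := fsys.Open(name)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return io.ReadAll(f)
}

// strFile is a hypothetical File backed by a string; Stat is left unimplemented.
type strFile struct{ *strings.Reader }

func (strFile) Stat() (os.FileInfo, error) {
	return nil, errors.New("stat: not implemented in this sketch")
}
func (strFile) Close() error { return nil }

// plainFS implements only Open, so ReadFile must take the fallback path.
type plainFS map[string]string

func (p plainFS) Open(name string) (File, error) {
	s, ok := p[name]
	if !ok {
		return nil, os.ErrNotExist
	}
	return strFile{strings.NewReader(s)}, nil
}

// fastFS adds the ReadFile extension and counts how often it is used.
type fastFS struct {
	plainFS
	fastReads int
}

func (f *fastFS) ReadFile(name string) ([]byte, error) {
	s, ok := f.plainFS[name]
	if !ok {
		return nil, os.ErrNotExist
	}
	f.fastReads++
	return []byte(s), nil
}

func main() {
	plain := plainFS{"motd": "hello"}
	fast := &fastFS{plainFS: plainFS{"motd": "hello"}}

	a, _ := ReadFile(plain, "motd") // open/read/close fallback
	b, _ := ReadFile(fast, "motd")  // ReadFileFS fast path
	fmt.Println(string(a), string(b), fast.fastReads)
}
```

Callers always go through the helper, so they get the optimized path when it is available without needing to know which kind of filesystem they were handed.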

In addition to the new io/fs types and helper functions, the design suggests changes to various standard library packages to make use of the new FS interface. For example, adding a ParseFS() method to the html/template package to allow parsing templates from an in-memory filesystem, or making the archive/zip package implement FS so that developers can treat a zip file as a filesystem and use it wherever FS is allowed.
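As a rough illustration of template parsing from an in-memory filesystem, the sketch below uses the API as it later shipped in Go 1.16 (io/fs, testing/fstest, and the ParseFS() function in html/template); that is an assumption relative to the draft, whose details may differ. The render() function and the template content are invented for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
	"testing/fstest"
)

// render parses templates from an in-memory filesystem and executes one.
// fstest.MapFS stands in for any fs.FS value, such as a zip archive or
// a set of embedded files.
func render(name string) (string, error) {
	fsys := fstest.MapFS{
		"templates/page.html": {Data: []byte("<h1>Hello, {{.}}!</h1>")},
	}

	tmpl, err := template.ParseFS(fsys, "templates/*.html")
	if err != nil {
		return "", err
	}

	// Parsed templates are named after the base name of each file.
	var buf bytes.Buffer
	if err := tmpl.ExecuteTemplate(&buf, "page.html", name); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := render("gopher")
	fmt.Println(out, err)
}
```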

Much of the feedback on the Reddit discussion has been positive, and it seems like an interface of this kind is something that developers want. However, one of the criticisms made by several people is about the drawbacks of extension interfaces. "Acln0" summarized the concerns:

I have only one observation to make, related to extension interfaces and the extension pattern. I am reminded of http.ResponseWriter and the optional interfaces the http package makes use of. Due to the existence of these optional interfaces, wrapping http.ResponseWriter is difficult. Doing it "generically" involves a combinatorial explosion of optional interfaces, and it's easy to go wrong in a way that looks like this: "we added status logging by wrapping http.ResponseWriter, and now HTTP/2 push doesn't work anymore, because our wrapper hides the Push method from the handlers downstream".

Peter Bourgon, a well-known Go blogger and speaker, believes that this use of extension interfaces means that it "becomes infeasible to use the (extremely useful) decorator pattern. That's really unfortunate. To me that makes the proposal almost a non-starter; the decorator pattern is too useful to break in this way." The decorator pattern wraps an interface and adds some functionality. It is often used for logging or authentication middleware in web servers; in the context of filesystems it would likely be used to add a caching or transformation layer. If a middleware author does not take into account the various optional interfaces, the resulting wrapper will not support them. Nick Craig-Wood, author of Rclone, a cloud-storage tool written in Go, likes the proposal but expressed similar concerns: "Extension (or optional as I usually call them) interfaces are a big maintenance burden - wrapping them is really hard".
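The problem these commenters describe can be reproduced in a few lines. In this sketch the interfaces are copied locally from the draft text; memFS and loggingFS are hypothetical types, and memFS stubs out Open() because only the method sets matter for the demonstration.

```go
package main

import (
	"fmt"
	"os"
)

// FS, File, and ReadFileFS mirror the interfaces quoted from the draft design.
type FS interface {
	Open(name string) (File, error)
}

type File interface {
	Stat() (os.FileInfo, error)
	Read(buf []byte) (int, error)
	Close() error
}

type ReadFileFS interface {
	FS
	ReadFile(name string) ([]byte, error)
}

// memFS is a hypothetical filesystem that offers the ReadFile extension.
// Its Open is a stub that always fails; only the method set matters here.
type memFS map[string][]byte

func (m memFS) Open(name string) (File, error) {
	return nil, &os.PathError{Op: "open", Path: name, Err: os.ErrNotExist}
}

func (m memFS) ReadFile(name string) ([]byte, error) {
	data, ok := m[name]
	if !ok {
		return nil, &os.PathError{Op: "readfile", Path: name, Err: os.ErrNotExist}
	}
	return data, nil
}

// loggingFS is a decorator that records each Open call. It forwards Open
// but knows nothing about ReadFile, so the extension is hidden from callers.
type loggingFS struct {
	inner  FS
	opened []string
}

func (l *loggingFS) Open(name string) (File, error) {
	l.opened = append(l.opened, name)
	return l.inner.Open(name)
}

// hasReadFile reports whether fsys exposes the ReadFileFS extension.
func hasReadFile(fsys FS) bool {
	_, ok := fsys.(ReadFileFS)
	return ok
}

func main() {
	inner := memFS{"a.txt": []byte("data")}
	wrapped := &loggingFS{inner: inner}
	fmt.Println(hasReadFile(inner), hasReadFile(wrapped)) // true false
}
```

To preserve the fast path, the wrapper would have to implement ReadFile() itself and forward it conditionally; repeating that for every combination of optional interfaces is the "combinatorial explosion" mentioned above.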

The design states that "enabling that kind of middleware is a key goal for this draft design", so it would seem wise for the design's authors to tackle this problem head on. Cox hasn't yet proposed a solution, but acknowledged the issue: "It's true - there's definitely a tension here between extensions and wrappers. I haven't seen any perfect solutions for that."

Another concern came from "TheSwedeheart" regarding contexts (the standard way in Go to explicitly propagate timeouts, cancellation signals, and request-scoped values down a call chain): "One thing I'm missing to migrate [his virtual filesystem] over to this is support for propagating contexts to each operation, for cancellation." Cox replied that a library author could "probably pass the context to a constructor that returns an FS with the context embedded in it, and then have that context apply to the calls being made with that specific FS." As "lobster_johnson" pointed out, this goes against the context package's guideline to explicitly pass context as the first function argument, not store a context inside a struct. However, Cox countered with an example of http.Request doing something similar: "Those are more guidelines than rules. [...] Sometimes it does make sense."
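One way to read Cox's suggestion is sketched below. It is only an interpretation under assumptions: the interfaces are copied locally from the draft text, and WithContext, ctxFS, emptyFS, canceled, and demo are hypothetical names. The sketch checks the context only when a file is opened; interrupting a read that is already blocked would need support from the underlying filesystem.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
)

// FS and File mirror the interfaces quoted from the draft design.
type FS interface {
	Open(name string) (File, error)
}

type File interface {
	Stat() (os.FileInfo, error)
	Read(buf []byte) (int, error)
	Close() error
}

// emptyFS is a hypothetical stand-in for a real (for example, network)
// filesystem; it contains no files.
type emptyFS struct{}

func (emptyFS) Open(name string) (File, error) {
	return nil, &os.PathError{Op: "open", Path: name, Err: os.ErrNotExist}
}

// ctxFS stores a context alongside the filesystem it wraps.
type ctxFS struct {
	ctx   context.Context
	inner FS
}

// WithContext is the "constructor" in this sketch: every call made
// through the returned FS is governed by ctx.
func WithContext(ctx context.Context, inner FS) FS {
	return ctxFS{ctx: ctx, inner: inner}
}

func (c ctxFS) Open(name string) (File, error) {
	// Refuse to start new work once the context is cancelled or expired.
	if err := c.ctx.Err(); err != nil {
		return nil, &os.PathError{Op: "open", Path: name, Err: err}
	}
	return c.inner.Open(name)
}

// canceled reports whether err was caused by context cancellation.
func canceled(err error) bool { return errors.Is(err, context.Canceled) }

// demo opens a file before and after cancelling the stored context.
func demo() (before, after error) {
	ctx, cancel := context.WithCancel(context.Background())
	fsys := WithContext(ctx, emptyFS{})
	_, before = fsys.Open("data.txt")
	cancel()
	_, after = fsys.Open("data.txt")
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println(canceled(before), canceled(after)) // false true
}
```

A caller that needs a different deadline per request would create a new wrapped FS for each request, which is cheap here because ctxFS holds only two fields.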

There are of course the usual bikeshedding threads that debate naming; "olegkovalov" said: "I'm somewhat scared about io/fs name, fs is a good variable name, it'll cause many troubles to the users when io/fs will appear". After some back-and-forth, Cox stressed the need for a short name to keep the focus on application developers rather than on the filesystem implementers:

You're focusing on the file system implementers instead of the users. Code referring to things like os.FileInfo, os.ModeDir, os.PathError, os.ErrNotExist will all now refer canonically to fs.FileInfo, fs.ModeDir, fs.PathError, fs.ErrNotExist. Those seem much better than, say, filesystem.ErrNotExist. And far more code will be referring to those names than implementing file systems.

Embedding files in binaries

The other draft design proposes a way to embed files (or "static assets") in Go binaries and read their contents at runtime. This simplifies releases and deployments, since developers can simply copy around a large binary with no external dependencies (for SQL snippets, HTML templates, CSS and JavaScript assets for a web application, and so on). As the document points out, there are already over a dozen third-party tools that can do this, but "adding direct support to the go command for the basic functionality of embedding will eliminate the need for some of these tools and at least simplify the implementation of others". Including embedding in the standard go tool will also mean there is no pre-build step to convert files to data in Go source code, and no need to commit those generated files to version control.

The authors of the design make it clear that this is a tooling change, not a Go language change:

Another explicit goal is to avoid a language change. To us, embedding static assets seems like a tooling issue, not a language issue. Avoiding a language change also means we avoid the need to update the many tools that process Go code, among them goimports, gopls, and staticcheck.

The go tool already looks for special comments in Go source files for various things, including // +build tags to include certain files only on specific architectures, and //go:generate comments that tell go generate what commands to run for code-generation purposes. This file-embedding design proposes a new //go:embed comment directive that goes directly above a variable declaration and tells go build to include those files in the resulting binary associated with the variable. Here is a concrete example:

    // The "content" variable holds our static web server content.
    //go:embed image/* template/*
    //go:embed html/index.html
    var content embed.Files

This would make go build include all the files in the image and template directories, as well as the html/index.html file, and make them accessible via the content variable (which is of type embed.Files). The embed package is a new standard library package being proposed that contains the API for accessing the embedded files. In addition, the embed.Files type implements the fs.FS interface from the filesystem design discussed above, allowing the embedded files to be used directly with other standard library packages like net/http and html/template, as well as any third-party packages that support the new filesystem interface.

The design limits the scope of the proposal in an important way. There are many ways that the data in the files could be transformed before being included in the binary: data compression, TypeScript compilation, image resizing, and so on. This design takes a simple approach of just including the raw file data:

It is not feasible for the go command to anticipate or include all the possible transformations that might be desirable. The go command is also not a general build system; in particular, remember the design constraint that it never runs user programs during a build. These kinds of transformations are best left to an external build system, such as Make or Bazel, which can write out the exact bytes that the go command should embed.

Again, the feedback on the Reddit thread was mostly positive, with comments like this one from "bojanz": "This looks like a great start. Thank you for tackling this." There are a few minor suggestions, such as a comment by "zikaeroh" in favor of adding a more powerful path-matching API that supports double star for recursive path matching, like glob('**/*.png', recursive=True) in Python. Kevin Burke, who is the maintainer of a file-embedding package, suggested also storing a cryptographic hash of each file's content so the developer does not have to hash the file at runtime: "This is useful for e.g. cache busting on a static file server".
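For readers unfamiliar with the cache-busting use case that Burke mentions, the sketch below derives a versioned file name from a content hash. It computes the hash at runtime, which is exactly the work his suggestion would move to build time; hashedName() and its naming scheme are hypothetical and not part of either design.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
	"strings"
)

// hashedName inserts a short content hash before the file extension,
// turning "css/site.css" into "css/site.<hash>.css". A URL built from
// this name changes whenever the file content changes, so browsers
// never serve a stale cached copy.
func hashedName(name string, content []byte) string {
	sum := sha256.Sum256(content)
	tag := hex.EncodeToString(sum[:4]) // first 4 bytes, 8 hex digits
	ext := path.Ext(name)
	return strings.TrimSuffix(name, ext) + "." + tag + ext
}

func main() {
	fmt.Println(hashedName("css/site.css", []byte("body { margin: 0 }")))
}
```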

One of the repeated critiques is from developers who don't like overloading source code comments with the special //go:embed syntax. "Saturn_vk" stated bluntly, "I really don't like the fact that comments are being abused for actual work", and Hacker News commenter "breakingcups" strongly advocated for the use of a project file instead of directives in comments:

Again, more magic comments.

The proposed feature is great, but the unwillingness of the Go team to use a separate, clearly defined project file or at the very least a separate syntax in your code file leads them to stuff every additional feature into comments, a space shared by human notetaking.

Cox summed up his thinking about this with the following comment, which compares the syntax with #pragma for C:

For what it's worth, we already have //go:generate and a few other lesser known ones. And there is a separate draft design to replace // +build with //go:build. At that point we will be completely consistent: these kinds of directives begin with //go:. The point is to look enough like a comment to make tools that don't need to know ignore them, but enough not like a comment to signal to people that something special is going on.

C uses #pragma foo for this. Go simply spells #pragma as //go:.

Next up

There is a fair amount of community support for both draft designs, particularly the more user-facing proposal for file embedding. Many developers are already using third-party file-embedding libraries to simplify their deployments and these efforts will standardize that tooling. It seems likely that the designs will be refined and turned into full proposals. With Go 1.15 due out on August 1, it's possible that these proposals would be ready for Go 1.16 (scheduled for six months out), but if there needs to be another round of feedback (for example, regarding the problems with extension interfaces), they are more likely to be included in Go 1.17 in a year's time.



Go filesystems and file embedding

Posted Jul 30, 2020 17:40 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (14 responses)

The inconsistency from the Go team is frustrating. On one hand they are stoically opposed to any kind of thread-local storage (which I don't mind at all), on the other hand they are introducing new APIs that don't accept contexts.

And without contexts it's not possible to do context-based logging at all.

Go filesystems and file embedding

Posted Jul 30, 2020 17:48 UTC (Thu) by mgk (guest, #74833) [Link] (5 responses)

While there are places contexts don't make sense, I'm 100% behind you that there's a lot coming out without context support, that is a head scratcher: Like filesystems. I had hoped we stopped assuming filesystems were always available, always fast, never hung, etc. back in the 90's...

Go filesystems and file embedding

Posted Jul 31, 2020 7:23 UTC (Fri) by ibukanov (subscriber, #3942) [Link] (4 responses)

On the other hand, Linux still lacks a good story for cancelable local file reads. Adding context to file-related APIs would be a lie on the most popular server system.

Go filesystems and file embedding

Posted Jul 31, 2020 7:38 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

Not all filesystems are local. Cancellation is very much possible for NFS (for which I'm actually already using a Go-based client).

Go filesystems and file embedding

Posted Jul 31, 2020 9:14 UTC (Fri) by benhoyt (subscriber, #138463) [Link] (1 responses)

By cancellation in this context, do you mean timeouts/deadlines? Or actually being able to cancel a read/write at any given time? If it's timeouts, does it work to specify a read/write timeout when you open the filesystem client (or Open() a file), rather than for example having a ctx/timeout on every Read call? Similar to what Russ Cox asks here: https://www.reddit.com/r/golang/comments/hv976o/qa_iofs_d...

Go filesystems and file embedding

Posted Jul 31, 2020 14:28 UTC (Fri) by ibukanov (subscriber, #3942) [Link]

By cancelable in my initial response I meant ability to interrupt a given blocked call so it returns immediately. Timeouts are a subset of that.

What would be ideal is if a FileHandle could be used as a pseudo-channel in select statements in Go. But that requires a very non-trivial implementation, especially on Linux.

Go filesystems and file embedding

Posted Jul 31, 2020 12:35 UTC (Fri) by gray_-_wolf (subscriber, #131074) [Link]

Context is not used just for cancellation. For example, we are passing logger in it. So for those cases it would still be useful even if you cannot actually cancel the operation.

Go filesystems and file embedding

Posted Jul 30, 2020 19:08 UTC (Thu) by ehiggs (subscriber, #90713) [Link] (4 responses)

How would TLS even work in the face of goroutines that can be scheduled in different threads?

Go filesystems and file embedding

Posted Jul 30, 2020 19:10 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

Each goroutine would have its own TLS context (GLS?), with potential automatic inheritance for new goroutines.

In other words, exactly like pprof labels work right now: https://rakyll.org/profiler-labels/

Go filesystems and file embedding

Posted Jul 31, 2020 7:54 UTC (Fri) by ibukanov (subscriber, #3942) [Link] (2 responses)

Automatic inheritance for thread locals does not work. Java, where new threads are not created as often as in Go, provides an interface to manually deal with the inheritance, but even with that subtle bugs are possible.

And any thread-local with mutable state that is shared between threads pretty much implies that it uses some form of locking internally. At which point using a global map with a mutex looks like a reasonable replacement for a thread-local, especially for such a heavy thing as Context.

Note I agree that it sucks that a language with GC does not provide a simple way to associate a piece of data with its threads, but at least one can see where the resistance is coming from.

Go filesystems and file embedding

Posted Jul 31, 2020 8:35 UTC (Fri) by smurf (subscriber, #17840) [Link]

Define "does not work". My experience (async Python code with contextvars) says otherwise.

Go filesystems and file embedding

Posted Jul 31, 2020 19:56 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Inheritable thread locals work just fine in Java. The problem is that Java is a bit old-fashioned regarding threading, so a lot of parallel stuff is done in shared global thread pools because thread creation is expensive. This kinda makes inheritable thread-locals a moot point.

On the other hand, in Go pretty much nobody uses long-lived goroutine pools so GLS variables can work just fine. Moreover, pprof labels already work this way (there's no way to read them, you can only set them).

Go filesystems and file embedding

Posted Jul 31, 2020 0:02 UTC (Fri) by benhoyt (subscriber, #138463) [Link] (2 responses)

I'm in two minds about context. It's definitely useful, but I agree with Michal Štrba's 2017 article that it's like a virus that starts polluting all your APIs with "ctx context.Context" (which apart from being something you have to add everywhere, the naming is stutter-y). I personally wouldn't mind "goroutine local storage" for values, though I think that can be solved in other ways too (eg: in an HTTP context, a map of *http.Request to data). And of course there's the very hack-ish libraries that add GLS, like jtolio/gls. As for timeouts, those can usually be done more explicitly, or with an overall timeout per client. For cancellation, I have much less experience and don't know if there's a good solution there.

Go filesystems and file embedding

Posted Jul 31, 2020 4:31 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

There's no other choice than to carry an explicit context object in the absence of GLS. I don't mind the absence at all, actually. It leads to cleaner code, because developers stop doing things like storing database transactions in thread locals.

And quite a few companies actually have a code style that enforces context as the last (or first) parameter to every method in the exported interfaces.

Go filesystems and file embedding

Posted Aug 14, 2020 17:43 UTC (Fri) by HelloWorld (guest, #56129) [Link]

Passing a context around is trivial in purely functional programming, it's called the Reader Monad. It's built right into some modern effect systems like ZIO...


Copyright © 2020, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds