Simple is good enough

Posted Nov 25, 2010 21:44 UTC (Thu) by i3839 (guest, #31386)
In reply to: Simple is good enough by mjthayer
Parent article: LPC: Life after X

Problem is that in the case of html, you generally lose the formatting information because that's not in the part you copied, but higher up or in a css file. So there is no native format, you don't want to copy raw html code into a word processor, nor the plain text, but something that more or less looks like what you copied. Not to mention that usually the selected part is "broken" html because not all tags are closed. So it's not that simple and I don't think it's safe to get rid of the list support.

For images and other data formats with an obvious raw format are much easier and better suited for automatic convertion. That can be done automatically without changing the API.

Simple is good enough

Posted Nov 25, 2010 21:56 UTC (Thu) by mjthayer (guest, #39183) [Link]

> Problem is that in the case of html, you generally lose the formatting information because that's not in the part you copied, but higher up or in a css file. So there is no native format, you don't want to copy raw html code into a word processor, nor the plain text, but something that more or less looks like what you copied.

Just for interest I copied some text in Firefox and ran my clipboard format viewer. Here are the results:

$ ../tmp/viewclipformats
Found clipboard format: TIMESTAMP
Found clipboard format: TARGETS
Found clipboard format: MULTIPLE
Found clipboard format: text/html
Found clipboard format: text/_moz_htmlcontext
Found clipboard format: text/_moz_htmlinfo
Found clipboard format: UTF8_STRING
Found clipboard format: COMPOUND_TEXT
Found clipboard format: TEXT
Found clipboard format: STRING
Found clipboard format: text/x-moz-url-priv

Without knowing, it wouldn't surprise me if one of those contained both the html and the formatting information, which I think should be feasible with my proposal too.

Simple is good enough

Posted Nov 26, 2010 22:26 UTC (Fri) by mjthayer (guest, #39183) [Link] (1 responses)

You also have to ask, when an application puts HTML data into the clipboard, what data it is actually putting there. When I select a section of text, pictures and whatever in Firefox and copy I get HTML data in the clipboard. But Firefox can't just put the source of the document from the point where the selection begins to the point where it ends into the clipboard, as it is announcing HTML data, and as you point out, that wouldn't be HTML, it would be broken HTML. So Firefox has no choice but to massage the HTML data anyway, and if it is doing that already, adding the style information inline is no great hardship.

Of course, if you want to reuse that data as is as HTML for some other web page then you are probably out of luck, but if you think of it that makes no sense anyway - if you want to do that you should probably be copying the source of the HTML as plain text. If you select and copy part of a page in Firefox, chances are that what you are actually about to do is to paste it either as plain text (the text visible on the page, not the HTML source) or as formatted text into e.g. OpenOffice.

And if you were copying the data inside some visual HTML editor, it would probably still not make sense for the editor to insert the data as naive HTML - chances are there would be no way to paste the data in any form resembling the source of the page the editor was generating, and in any case, if you were trying to get at the generated source it would make more sense to ask the editor directly than copying and pasting to get at it. In fact I would expect the visual editor to use some internal format which was not valid HTML at all when copying to the clipboard, but which another instance of the editor would know what to do with when pasting it. It might provide a filter to convert it to HTML, but not for the purposes of viewing the source - you don't use the clipboard for that - but rather as a stepping stone for converting it to OOXML or something else.

Hope that made sense, as I am rather short of sleep currently. I would really like to be clear that I am not trying to argue for the sake of arguing here, but rather because responding to the points you make forces me to think things through myself.

Simple is good enough

Posted Nov 27, 2010 10:39 UTC (Sat) by i3839 (guest, #31386) [Link]

> I would really like to be clear that I am not trying to argue for the
> sake of arguing here, but rather because responding to the points you
> make forces me to think things through myself.

Same here, we're trying to figure out if a list of formats is really needed, or if always providing only one and having convertors is sufficient. This choice determines the API, so it's pretty important.

Only having one format and providing convertors is simpler, but less complete. My main concern is that it's not always sufficient, or that it makes implementing copy harder than necessary for some applications, because they have to create one "complete" format and convertors.

Another concern is that you convert from simple->complex->simple, when also supporting a complex type, hoping that the "simple" in the end is the same as what you started with. So the unrelated complex type makes simple types more complex too, with too much room for errors in my opinion. Or in other words, copying simple types is not simple anymore, if you also copy a complex one.

Lastly, I don't really see a way to support multiple types when pasting. It should be the pasting program's decision what type to paste, if it supports multiple types. I don't see another way than supporting a list of types in the pasting API anyway, and then you can as well support lists in the copy API too.

I think you make too many assumptions about what the user or pasting program expects in your line of thinking.

All in all I think the automatic convertion idea is good, but not always sufficient. Combined with today's multiple format support in applications, I think it's best to support multiple formats, but to encourage convertor usage when possible.

Then when someone pastes something the lists are compared, and if they have no common format, a convertor is used.

A list of formats is basically "more of the same", so I think the added complexity, both for the API and implementation, is small enough.

Now we just have to find some time to implement this. I think I'll give it a stab next week. I'll keep you informed (my email address is indan@nul.nu).