00:15
<Luke Warlow>
Do we need a getHTML function on the document interface? We can parse to a document but can't currently serialise from one (not while preserving shadow DOM, anyway). You can call getHTML on the documentElement but that loses the html element in the output (because that's effectively innerHTML and we'd need outerHTML). A getHTML on the document object itself could also serialise the compat mode back to the doctype? Which feels like it could be useful if you're trying to ensure lossless round-tripping of parsing and serialisation?
01:31
<Domenic>
That issue just feels like an area where the community needs to go off and get a widely-used custom element (or React component, I'm not picky) before spending all this time in standards space...
01:32
<Domenic>
Do we need a getHTML function on the document interface? We can parse to a document but can't currently serialise from one (not while preserving shadow DOM, anyway). You can call getHTML on the documentElement but that loses the html element in the output (because that's effectively innerHTML and we'd need outerHTML). A getHTML on the document object itself could also serialise the compat mode back to the doctype? Which feels like it could be useful if you're trying to ensure lossless round-tripping of parsing and serialisation?
I would like that. Serializing the DOCTYPE in particular is nice. jsdom has its own serialization function just for that, because documentElement.outerHTML misses it.
07:08
<annevk>
It might be a little weird that parseHTMLUnsafe() is a static method then and not just document.setHTMLUnsafe().
07:10
<annevk>
But I guess it's fine. Serializing the doctype seems good, but we should be opinionated about it and not support all inputs as-is.
07:22
<Luke Warlow>
But I guess it's fine. Serializing the doctype seems good, but we should be opinionated about it and not support all inputs as-is.
Yeah okay maybe lossless was the wrong choice of word. I think the compatMode should serialise to the presence or lack of a standard Doctype
07:24
<Luke Warlow>
It might be a little weird that parseHTMLUnsafe() is a static method then and not just document.setHTMLUnsafe().
I guess that's fine though because it ensures the state of the document object is consistent when parsing?
07:29
<annevk>
Hmm, the state for non-document nodes is not. But in theory we could also add setHTMLUnsafe() to document if we ever wanted to parse without allocation. Prolly not worth it for the foreseeable future though.
07:33
<Luke Warlow>
I got thinking on this because if we had document getHTML, you could use trusted types and the sanitizer APIs to always safely set the srcdoc attribute on an iframe without needing any dependencies (unless there are some issues I've not thought of). mXSS is what I was missing. And also just because it does feel like a missing piece of the serialization puzzle.
07:41
<Domenic>
Yeah okay maybe lossless was the wrong choice of word. I think the compatMode should serialise to the presence or lack of a standard Doctype
I don't think compatmode should impact it. It should be whether the Document has a DocumentType node. And then it should serialize according to https://html.spec.whatwg.org/#html-fragment-serialisation-algorithm .
07:43
<Luke Warlow>
I don't think compatmode should impact it. It should be whether the Document has a DocumentType node. And then it should serialize according to https://html.spec.whatwg.org/#html-fragment-serialisation-algorithm .
I was thinking back to how we used to do it where I worked before but maybe we were doing it badly 😅
07:44
<Domenic>
Interesting that the parser preserves name, system ID, and public ID, but the serializer only keeps the name.
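A minimal sketch of the doctype-node approach described above, assuming a hypothetical helper named serializeDocumentWithDoctype (not a proposed API). It emits a doctype only when the document has a DocumentType node, and keeps only the name, as the HTML fragment serialization algorithm does for DocumentType nodes. Plain objects stand in for a Document so the sketch also runs outside a browser.

```javascript
// Hypothetical helper for illustration only; not part of any spec or proposal.
function serializeDocumentWithDoctype(doc) {
  // Emit a doctype only if a DocumentType node exists. Only the name is kept;
  // the public ID and system ID are dropped.
  const doctype = doc.doctype ? `<!DOCTYPE ${doc.doctype.name}>` : "";
  // outerHTML keeps the <html> element itself, unlike an innerHTML-style call.
  const root = doc.documentElement ? doc.documentElement.outerHTML : "";
  return doctype + root;
}

// Stand-in objects (assumptions) that expose the same properties as a Document.
const rootElement = { outerHTML: "<html><head></head><body></body></html>" };
const docWithDoctype = {
  doctype: {
    name: "html",
    publicId: "-//W3C//DTD HTML 4.01//EN",
    systemId: "http://www.w3.org/TR/html4/strict.dtd",
  },
  documentElement: rootElement,
};
const docWithoutDoctype = { doctype: null, documentElement: rootElement };

const serializedWith = serializeDocumentWithDoctype(docWithDoctype);
const serializedWithout = serializeDocumentWithDoctype(docWithoutDoctype);
```

In a browser the same function could be called with the real document object; note that the public and system IDs in the stand-in object do not appear in the output.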
07:45
<Luke Warlow>
I'll open an issue on the HTML spec proposing this idea and we can discuss the details further there. Just wanted to gauge whether we thought it was any good or not.
07:52
<Domenic>
I can't find any way to trigger the HTML fragment serializing algorithm with a doctype, sad.
07:53
<Domenic>
I also either found a bug or am blind https://github.com/whatwg/dom/issues/1278
07:54
<freddy>
Huh, I learned there is document.doctype but it doesn't serialize to a doctype string?
07:54
<freddy>
(testing in Firefox)
07:55
<freddy>
I've used document.body.parentElement.outerHTML in the past, but you'd have to guess and attach the right doctype. Either way. I think there's value in a getter for the document.
07:55
<Domenic>
It doesn't have any serializing methods or properties
08:39
<Ms2ger>
Did we manage to remove the getters on doctype?
08:57
<freddy>
There were some? Well, then apparently...yes? :)
09:16
<annevk>
I guess I disagree with Domenic. I like the idea of serializing based on quirks mode a lot better. That way we either have no doctype, an almost doctype, or the doctype in the output.
09:25
<Luke Warlow>
I raised https://github.com/whatwg/html/issues/10280
09:26
<freddy>
We also discussed this in our Sanitizer issue triage and thought that a document.setHTML might be neat, such that one could do iframe.contentDocument.setHTML() without a parsing/serialization round trip in between for that specific use case. https://github.com/WICG/sanitizer-api/issues/117#issuecomment-2060735167 has more.
11:55
<annevk>
freddy: note that it would require quite a number of downstream changes (none too hard I suspect) as we currently only consider possible element children
14:42
<Luke Warlow>
new XMLSerializer().serializeToString(document.doctype) will give you a string representation back. But to your point, this obviously goes via XML serialisation rather than HTML.
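To illustrate how the XML route differs from the HTML one, here is a rough sketch of what XML serialization produces for a doctype, as far as I understand the DOM Parsing rules. The function name xmlStyleDoctype is hypothetical, well-formedness checks are omitted, and the input is a plain object standing in for a DocumentType node. Unlike the HTML serializer, this keeps the public and system IDs.

```javascript
// Hypothetical approximation of XML serialization for a DocumentType-like object.
// Assumption: validity checks on the public and system IDs are skipped.
function xmlStyleDoctype(doctype) {
  let markup = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId !== "") {
    markup += ' PUBLIC "' + doctype.publicId + '"';
  }
  if (doctype.systemId !== "" && doctype.publicId === "") {
    // SYSTEM is only written when there is no public ID.
    markup += " SYSTEM";
  }
  if (doctype.systemId !== "") {
    markup += ' "' + doctype.systemId + '"';
  }
  return markup + ">";
}

// Stand-in doctype objects (assumptions, not real DOM nodes).
const plainDoctype = { name: "html", publicId: "", systemId: "" };
const legacyDoctype = { name: "html", publicId: "", systemId: "about:legacy-compat" };
const strictDoctype = {
  name: "html",
  publicId: "-//W3C//DTD XHTML 1.0 Strict//EN",
  systemId: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
};

const plainOut = xmlStyleDoctype(plainDoctype);
const legacyOut = xmlStyleDoctype(legacyDoctype);
const strictOut = xmlStyleDoctype(strictDoctype);
```

For the plain HTML doctype both routes give the same string; they only diverge when a public or system ID is present.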
23:23
<zcorpan>
Why did we use labels and not milestones for Stages?