WHATWG on 2022-04-06

06:35	<annevk>	Yeah, the plan is to obsolete `embed`/`object`. `object` in particular has some capabilities that are unique to it still, but we're chipping away at those.
08:02	<annevk>	GitHub's "smart URL paste" continues to trip me up, I keep forgetting it's there and ending up with a result I don't want
08:03	<annevk>	And I kinda worry that if I learn it I'll expect it in other places and run into trouble there
09:23	<freddy>	I'm looking for something that resets a whole document, including expando attributes etc. - I suppose I could emulate this by serializing+parsing the whole document or by cloning it. But, ideally, I'm looking for existing patterns for this "reset" or at least individual puzzle pieces. It seems to me that even the puzzle piece that removes "all" event handlers from an element seems to be a "no" already :/
11:54	<annevk>	freddy: "document open steps" does a bunch of that
11:55	<freddy>	I did see https://html.spec.whatwg.org/multipage/dynamic-markup-insertion.html#opening-the-input-stream, but its "replace the document with a new object" wording sounded to me as if the corresponding element tree would be gone?!
11:56	<freddy>	OK, I realize I only read the non-normative stuff (I feel bad now)
11:57	<annevk>	freddy: it definitely doesn't do sanitization though; for sanitize() I'm still curious what the actual envisioned usage patterns are
11:58	<annevk>	But we do have primitives such as "remove all listeners" in the platform, but they're not widely available
11:59	<freddy>	Yeah, so the thing is that there's an apparent lack of "inside knowledge" from the sanitizer folks (freddy, otherdaniel, koto) about the potential state that could be attached to the document. The lack of knowledge led to a security-conservative "we want it all gone" design goal. Similar to the other security-conservative choices we made (can't just offer a string-to-string API and hope for the best, can't parse a string with implicit/wrong context element, etc.).
12:00	<freddy>	So, one idea is that I'm trying to find out if there's an algorithm that would a) shine some light on the apparent lack of details and b) alleviate the concerns by being an existing thing.
12:00	<annevk>	Again though, what's the actual input here? If you start with a tree, don't you have to trust that tree already?
12:01	<freddy>	On the other hand, I know that all other sanitizer libraries traverse a DOM tree based on the input (or parse it based on the input) and not even re-use those elements but create completely new ones and copy the bits over selectively. Which is a bit like a clone, but more... iterative
12:01	<freddy>	No. You don't. E.g., `<iframe sandbox="allow-same-origin" src="bad stuff">` contains attacker-controlled stuff and you want to "promote" it into the current document, but need it to be sanitized first
12:02	<freddy>	so `sanitize(iframe.contentDocument)`
12:03	<freddy>	OK, admittedly a `sandbox`ed iframe wouldn't have event listeners (because it doesn't script). But maybe your privilege-separation mechanism is `<iframe sandbox="allow-scripts" src="some-sandbox-domain-for-user-content">` (google, github, mdn, bugzilla use those sandbox-domains A LOT). And if you'd want to promote that into a current document you wouldn't iframe it, but instead `XHR()` it and sanitize the returned `document`
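
The rebuild-and-copy pattern freddy describes at 12:01 (walk the untrusted tree, create brand-new nodes, copy over only selected bits) might be sketched roughly as below. This is a minimal illustration, not how any particular sanitizer library or the Sanitizer API works. The `TreeNode` types, the `rebuild` function, and both allowlists are hypothetical, and a plain object tree stands in for real DOM nodes so the sketch runs without a browser. It filters by tag and attribute name only and does not inspect attribute values (a `javascript:` URL in `href` would pass), so it is not a usable security policy.

```typescript
// Hypothetical minimal tree types; a real sanitizer would walk DOM nodes instead.
type TextNode = { kind: "text"; value: string };
type ElementNode = {
  kind: "element";
  tag: string;
  attrs: Record<string, string>;
  children: TreeNode[];
  // Stand-in for expandos, listeners, and other state attached to a live node.
  state?: Record<string, unknown>;
};
type TreeNode = TextNode | ElementNode;

// Illustrative allowlists only (assumed for this sketch), not a vetted policy.
const ALLOWED_TAGS = new Set(["div", "p", "a", "b", "i"]);
const ALLOWED_ATTRS = new Set(["href", "title"]);

// Build a fresh tree from an untrusted one, copying only allowlisted parts.
function rebuild(node: TreeNode): TreeNode | null {
  if (node.kind === "text") {
    return { kind: "text", value: node.value };
  }
  if (!ALLOWED_TAGS.has(node.tag)) {
    return null; // drop the element together with its whole subtree
  }
  const attrs: Record<string, string> = {};
  for (const [name, value] of Object.entries(node.attrs)) {
    if (ALLOWED_ATTRS.has(name)) attrs[name] = value;
  }
  const children: TreeNode[] = [];
  for (const child of node.children) {
    const copy = rebuild(child);
    if (copy !== null) children.push(copy);
  }
  // A new object is returned and `state` is never copied, so nothing that was
  // attached to the original node carries over.
  return { kind: "element", tag: node.tag, attrs, children };
}
```

Because every output node is newly created, attached state is lost by construction rather than by enumerating and removing it, which is the property the "we want it all gone" goal asks for.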
16:13	<vrafaeli>	Let me explain one thing that I'm concerned about now. Imagine the following scenario. You page through several PDFs, and in between (while the loading happens, for example) you don't want to show the document. But I guess we don't want the iframe/embed/object to be removed from the DOM and then added again, because that might cause a performance penalty? We can hide this HTML element using CSS I guess, which is probably optimal. One of the alternatives is to put "about:blank" in the "src" attribute. But then "iframe" will lose the type information because it doesn't have the "type" attribute. So I'm concerned that in this alternative the browser's PDF engine will get "remounted", which might cause a performance penalty? (This is perhaps a browser-specific thing, but I'd appreciate your thoughts on this.)
16:39	<annevk>	vrafaeli: the only thing I know about performance is that you want to measure first
16:40	<annevk>	freddy: how would you get iframe.contentDocument unless it's same-origin?
16:47	<freddy>	annevk: (Sorry, I somehow changed direction mid-sentence above?!) I suppose the more realistic scenario is that the content is on a sandbox domain that is CORS-enabled and you fetch it via XHR? OK, I see...then the follow-up question is why would anyone add stuff to the document before sanitizing? Mh. I suppose assuming that a framework is operating in mysterious ways? Realistically, people don't write all the code themselves. I admit that it's getting handwavy. 😕 But given that we took the conservative approach with all string-returning APIs, why don't we take the conservative approach when accepting documents?
16:48	<freddy>	Actually, I really have to disconnect. I will be back tomorrow.
17:24	<annevk>	freddy: the conservative approach is to just add setHTML() until we understand this better, imo
17:53	<ntim>	https://twitter.com/argyleink/status/1511761383024521218 pretty sure this should use <popup>