| 00:05 | <MikeSmith> | aleray: if that is in fact a message coming from the html5lib module and not lxml itself, gsnedders might have a clue |
| 00:07 | <MikeSmith> | aleray: a code grep indicates that it's an lxml message |
| 00:07 | <MikeSmith> | src/lxml/lxml.etree.c |
| 00:09 | <aleray> | MikeSmith, thanks. I found this solution: `html = ''.join(c for c in html if valid_xml_char_ordinal(c))` on a forum |
| 00:09 | <MikeSmith> | ah, that's a generated file |
| 00:09 | <MikeSmith> | ah OK |
| 00:09 | <aleray> | not sure if it strips anything important though |
| 00:10 | <aleray> | or just junk characters (the HTML is generated with CKEditor from Word documents) |
| 00:10 | <aleray> | Here is the link to the forum: http://www.itsprite.com/pythonfiltering-out-certain-bytes-in-python/ |
| 00:10 | MikeSmith | looks |
| 00:12 | <MikeSmith> | looks like http://stackoverflow.com/questions/8733233/filtering-out-certain-bytes-in-python might also be useful |
| 00:12 | <aleray> | MikeSmith, same post actually. Thanks for pointing to the source though |
| 00:13 | <MikeSmith> | ah ok |
| 00:13 | <MikeSmith> | these days I always go to StackOverflow first |
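For reference, a minimal sketch of the character-filtering approach discussed above. The `valid_xml_char_ordinal` helper is the one aleray's one-liner assumes; this version simply checks the code point ranges allowed by XML 1.0, which is what the linked Stack Overflow answer does. The sample `html` value is illustrative only.

```python
def valid_xml_char_ordinal(c):
    """True if the character c is allowed by XML 1.0."""
    codepoint = ord(c)
    return (
        codepoint in (0x9, 0xA, 0xD)
        or 0x20 <= codepoint <= 0xD7FF
        or 0xE000 <= codepoint <= 0xFFFD
        or 0x10000 <= codepoint <= 0x10FFFF
    )

# Example input containing a control character that lxml would reject.
html = "<p>hello\x0bworld</p>"

# Strip disallowed characters before handing the markup to lxml/html5lib.
html = "".join(c for c in html if valid_xml_char_ordinal(c))
```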
| 00:15 | <aleray> | MikeSmith, so it seems to work. I have another small issue: I parse an HTML fragment with no root element. My code fails because it expects a root node, but I get a list of elements instead |
| 00:16 | <MikeSmith> | aleray: yeah I understand that problem there but can't be of much help just at the moment. Will be freed up in about 2 hours if you're still around |
| 00:17 | <aleray> | MikeSmith, thanks. I'll probably be sleeping by then. Any pointers so I can search for it myself? |
| 00:43 | <gsnedders> | aleray: known bug, but kinda horrible to fix accurately and hence not done yet :\ |
| 00:43 | <gsnedders> | aleray: I'm increasingly leaning towards just hacking together some horrible fix for it that should at least /mostly/ fix it |
| 01:02 | <aleray> | gsnedders, hi; are you talking about the invalid XML characters? |
| 01:03 | <aleray> | I'm glad the solution I found seems to be working, at least |
| 01:04 | <aleray> | gsnedders, maybe you could help me with the other thing, that is, parsing a fragment with lxml and getting a tree rather than a list of elements |
| 01:06 | <aleray> | the code here: http://dpaste.com/0XQHK9D raises an `AttributeError: 'list' object has no attribute 'xpath'` |
| 01:06 | <aleray> | because `tree = parser.parseFragment(html)` returns a list |
| 01:07 | <aleray> | because my fragment contains several elements |
| 01:13 | <aleray> | etree and lxml behave differently. See http://dpaste.com/3SE8103 and http://dpaste.com/3G7N7JS |
| 01:13 | <aleray> | I'd like the etree behaviour with lxml, so I could use xpath on it |
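A possible workaround for the list-of-nodes problem, sketched under the assumption (matching the dpaste above) that `parseFragment` with the lxml treebuilder returns a plain list of top-level nodes; exactly what appears in that list (e.g. whether top-level text shows up as bare strings) may vary by html5lib version, and the sample markup and variable names here are illustrative, not aleray's code.

```python
import html5lib
from lxml import etree

# A fragment with several top-level elements, as in the pasted example.
html = "<p>first</p>intervening text<p>second</p>"

# With the lxml treebuilder this returns a list of top-level nodes.
fragment = html5lib.parseFragment(html, treebuilder="lxml",
                                  namespaceHTMLElements=False)

# Wrap everything in a synthetic <div> so there is a single root to query.
root = etree.Element("div")
for node in fragment:
    if isinstance(node, str):
        # Bare text: attach it to the previous element's tail,
        # or to the wrapper's text if nothing precedes it.
        if len(root):
            root[-1].tail = (root[-1].tail or "") + node
        else:
            root.text = (root.text or "") + node
    else:
        root.append(node)

print(root.xpath(".//p"))  # xpath now works on the wrapped fragment
```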
| 08:39 | <zewt_> | "filename.zip may harm your browsing experience, so Chrome has blocked it" nice of Chrome to ask my permission before "blocking it" a harmless file that I now have to download again in firefox |
| 08:39 | <zewt_> | all browsers have turned to crap |
| 08:49 | <zewt_> | when did it become okay for browsers to override the user on their own system? It bodes deeply ill for the future of the web |
| 08:53 | <JonathanC> | https://www.w3.org/community/ - the DataSheets group needs members |
| 10:19 | <aleray> | I'm stuck with my problem from yesterday: using lxml, the parseFragment method gives me a list of nodes |
| 10:19 | <aleray> | with etree it gives me one single element |
| 10:19 | <aleray> | because I have a list, I can't use methods like "xpath" |
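If html5lib is not a hard requirement, lxml.html itself can parse a multi-element fragment and wrap it in a single parent element, giving one root to call `xpath()` on. This is only an alternative sketch; it uses lxml's own HTML parser, whose error recovery differs from html5lib's, and the sample markup is illustrative.

```python
import lxml.html

html = "<p>first</p><p>second</p>"

# create_parent wraps the top-level nodes in a single <div>, so the result
# is one element instead of a list and xpath() works on it directly.
root = lxml.html.fragment_fromstring(html, create_parent="div")
print(root.xpath(".//p"))
```

lxml also ships an `lxml.html.html5parser` module that wraps html5lib, which may be worth a look if html5lib's parsing behaviour specifically is needed.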
| 11:39 | <annevk> | 2016 is close and the Location object is still poorly understood and defined: https://lists.w3.org/Archives/Public/www-archive/2015Oct/0051.html |
| 12:16 | <gsnedders> | aleray: hmm… both options seem kinda bad :\ |
| 14:26 | <aleray> | gsnedders, hi, what do you mean? |
| 19:24 | <nox> | annevk: Should an url object's query become null again when its URLSearchParams becomes empty? |