00:05
<MikeSmith>
aleray: if that is in fact a message coming from the html5lib module and not lxml itself, gsnedders might have a clue
00:07
<MikeSmith>
aleray: a code grep indicates that it's an lxml message
00:07
<MikeSmith>
src/lxml/lxml.etree.c
00:09
<aleray>
MikeSmith, thanks. I found this solution: `html = ''.join(c for c in html if valid_xml_char_ordinal(c))` on a forum
00:09
<MikeSmith>
ah, that's a generated file
00:09
<MikeSmith>
ah OK
00:09
<aleray>
not sure if it strips any important thing though
00:10
<aleray>
or just junk characters (the HTML is generated with ckeditor from word documents)
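
On the worry that the filter might strip something important: the log never shows the body of `valid_xml_char_ordinal`. The sketch below assumes it implements the XML 1.0 `Char` production, as in the Stack Overflow answer linked later in the log. Under that assumption, only control characters other than tab, LF and CR are dropped, along with surrogates and U+FFFE/U+FFFF. Ordinary text, including accented letters and emoji, passes through. `strip_invalid_xml_chars` is a hypothetical wrapper name, not part of any library.

```python
def valid_xml_char_ordinal(c):
    """Return True if character c is allowed by the XML 1.0 Char production."""
    codepoint = ord(c)
    return (
        0x20 <= codepoint <= 0xD7FF          # most ordinary text
        or codepoint in (0x9, 0xA, 0xD)      # tab, line feed, carriage return
        or 0xE000 <= codepoint <= 0xFFFD     # private use area up to U+FFFD
        or 0x10000 <= codepoint <= 0x10FFFF  # supplementary planes
    )


def strip_invalid_xml_chars(html):
    """Hypothetical wrapper: drop every character that XML 1.0 does not allow."""
    return "".join(c for c in html if valid_xml_char_ordinal(c))
```

For Word-derived HTML, the characters removed are most likely stray control characters, though that is a guess about the input and not something the log confirms.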
00:10
<aleray>
Here is the link to the forum: http://www.itsprite.com/pythonfiltering-out-certain-bytes-in-python/
00:10
MikeSmith
looks
00:12
<MikeSmith>
looks like http://stackoverflow.com/questions/8733233/filtering-out-certain-bytes-in-python might also be useful
00:12
<aleray>
MikeSmith, same post actually. thanks for pointing to the source though
00:13
<MikeSmith>
ah ok
00:13
<MikeSmith>
these days I always go to StackOverflow first
00:15
<aleray>
MikeSmith, so it seems to work. I have another small issue: I parse an HTML fragment with no root element. my code fails because it expects a root node and I get a list of elements instead
00:16
<MikeSmith>
aleray: yeah I understand that problem there but can't be of much help just at the moment. Will be freed up in about 2 hours if you're still around
00:17
<aleray>
MikeSmith, thanks. i'll be sleeping probably. Any direction to search myself?
00:43
<gsnedders>
aleray: known bug, but kinda horrible to fix accurately and hence not done yet :\
00:43
<gsnedders>
aleray: I'm increasingly leaning towards just hacking together some horrible fix for it that should at least /mostly/ fix it
01:02
<aleray>
gsnedders, Hi; talking about the invalid xml characters?
01:03
<aleray>
I'm glad the solution I found seems to be working at least
01:04
<aleray>
gsnedders, maybe you could help me with the other thing, that is to be able to parse a fragment with lxml and get a tree rather than a list of elements
01:06
<aleray>
the code here: http://dpaste.com/0XQHK9D raises an `AttributeError: 'list' object has no attribute 'xpath'`
01:06
<aleray>
because `tree = parser.parseFragment(html)` returns a list
01:07
<aleray>
because my fragment contains several elements
01:13
<aleray>
etree and lxml behave differently. See http://dpaste.com/3SE8103 and http://dpaste.com/3G7N7JS
01:13
<aleray>
I'd like the etree behaviour with lxml, so I could use xpath on it
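
One possible workaround for the list result, sketched under assumptions: gather the nodes returned by `parser.parseFragment(html)` under a single synthetic root so that `.xpath()` is available. `wrap_fragment` and the `div` tag are hypothetical choices, not html5lib or lxml API. The sketch also assumes the list may contain plain strings for text that is not attached to any element.

```python
from lxml import etree


def wrap_fragment(nodes, tag="div"):
    """Hypothetical helper: gather fragment nodes under one new root element."""
    root = etree.Element(tag)
    for node in nodes:
        if isinstance(node, str):
            # Bare text: attach it as the tail of the previous element,
            # or as the root's text when no element has been added yet.
            if len(root):
                last = root[-1]
                last.tail = (last.tail or "") + node
            else:
                root.text = (root.text or "") + node
        else:
            # Appending moves the element, with its tail text, under root.
            root.append(node)
    return root
```

With the code from the paste, this would be used roughly as `tree = wrap_fragment(parser.parseFragment(html))`, after which `tree.xpath(...)` works on the wrapper element. The extra wrapper has to be accounted for when serializing the result.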
08:39
<zewt_>
"filename.zip may harm your browsing experience, so Chrome has blocked it" nice of Chrome to ask my permission before "blocking" a harmless file that I now have to download again in Firefox
08:39
<zewt_>
all browsers have turned to crap
08:49
<zewt_>
when did it become okay for browsers to override the user on his own system? it bodes deeply ill for the future of the web
08:53
<JonathanC>
https://www.w3.org/community/ DataSheets - need members
10:19
<aleray>
I'm stuck with my problem from yesterday: with lxml, the parseFragment method gives me a list of nodes
10:19
<aleray>
with etree it gives me one single element
10:19
<aleray>
because I have a list, I can't use methods like "xpath"
11:39
<annevk>
2016 is close and the Location object is still poorly understood and defined: https://lists.w3.org/Archives/Public/www-archive/2015Oct/0051.html
12:16
<gsnedders>
aleray: hmm… both options seem kinda bad :\
14:26
<aleray>
gsnedders, hi, what do you mean?
19:24
<nox>
annevk: Should a URL object's query become null again when its URLSearchParams becomes empty?