#whatwg on 2007-06-06

03:12	<MikeSmith>	(asked the following over on public-html, but want to ask here also)
03:12	<MikeSmith>	I think it would be useful to have somewhere a high-level "What problems we are trying to solve with HTML5" description.
03:12	<MikeSmith>	Suggestions?
03:13	<MikeSmith>	I think "interoperability among browsers" may be a big one (and interoperable error handling).
03:14	<MikeSmith>	maybe also, "better support for writing Web applications (instead of just Web documents)"
03:15	<zcorpan>	that part is interesting
03:15	<MikeSmith>	... "rigorously/thoroughly documenting conformant application behavior"
03:15	<zcorpan>	because html5 introduces new features for web apps, people think that html5 is good for web apps, but xhtml2 is better for "structured documents"
03:16	<MikeSmith>	zcorpan - yeah, I can see that being an inference that some would draw
03:17	<zcorpan>	some also think that xhtml2 is better because you can embed svg and other xml namespaces in it, and not realising that you can do the same in xhtml5
03:18	<othermaciej>	we could even make it possible in HTML
03:19	<zcorpan>	yeah
03:25	<MikeSmith>	I think that engaging much in that discussion might obscure that the important distinction is that HTML5 has as probably its primary goal to precisely specify behavior of conformant UAs (HTML processing applications), less so conformant authoring applications
03:26	<MikeSmith>	XHTML2 spec does not really have that as a goal (as far as I can see)
03:26	<MikeSmith>	I guess my take on the which-is-better-for-authoring thing is, it's mostly a matter of what your authoring requirements are, or a matter of taste ... anyway, not something worth battling about
03:27	<MikeSmith>	or to put it another way, author in whatever language you want, as long as you transform your source into conformant HTML5 before delivering it to UAs
03:29	<MikeSmith>	for many use cases, neither authoring directly in HTML5 nor in XHTML2 is the best choice
03:29	<MikeSmith>	but authoring instead in some custom vocabulary whose content models closely match your content, then transforming that to what you deliver to UAs
03:30	<othermaciej>	HTML5 does indeed define conforming documents and therefore what conforming authoring tools must output
03:30	<othermaciej>	I think authoring in HTML is better than authoring in a custom vocabulary if you are going to do anything dynamic
03:30	<othermaciej>	because then your script code can act directly on the model rather than a transformed view
03:33	<karlUshi>	othermaciej: are you promoting the end of MySQL ;)
03:34	<MikeSmith>	othermaciej - I guess my point about that is there are a range of opinions about what's best for authoring, and it's open to debate and there's more value as far as communicating "what problems is HTML5 trying to solve" in focusing on the stuff that's not really debatable
03:35	<karlUshi>	agreed with MikeSmith
03:37	<othermaciej>	well, HTML5 aims to make things better for UA interoperability, as a target format, and as a direct authoring format
09:41	<annevk>	http://edward.oconnor.cx/2007/BarCamp-San-Diego/
09:59	<annevk>	seems the <pre>\n "hack" needs to be implemented for <textarea> as well jeremyb
09:59	<annevk>	euh jgraham
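The <pre>\n "hack" annevk refers to can be pictured roughly as follows. This is a minimal sketch, assuming the rule is "drop one newline that directly follows the start tag"; the function name is hypothetical and the code is an illustration, not spec text.

```python
def strip_leading_newline(tag_name, text):
    """Drop a single newline that immediately follows a <pre> or <textarea> start tag.

    `tag_name` and `text` are illustrative parameters: the element's name and the
    raw character data that follows its start tag.
    """
    if tag_name in ("pre", "textarea") and text.startswith("\n"):
        return text[1:]  # only the first newline is ignored
    return text
```

Only one newline is removed, so an author who wants the content to start with a blank line has to write two.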
10:02	<annevk>	the spec now deals with the entities but not yet with the incorrect bytes...
10:35	<annevk>	http://ln.hixie.ch/?start=1181118077&count=1
10:36	<othermaciej>	I've been reading that
10:36	<othermaciej>	lots of amusing lines
10:40	annevk	barely has the time for rewriting the CSSOM
10:40	<annevk>	there's only a few people on this planet who understand CSS well enough to write a spec for it
10:41	<annevk>	then there's only a few people who are good at writing specifications
10:41	<annevk>	the intersection of both is Hixie I think
10:42	annevk	wants <datagrid> to handle sortable tables without scripting
10:45	<tantek>	annevk, there's also very few who have the time to write/edit a CSS spec
10:47	<annevk>	yeah, it'd be much like the full-time job HTML5 is
10:48	<othermaciej>	the combination of having the time and the ability rules out a whole lot of possible candidates
10:49	<annevk>	all of them so far :(
10:49	<othermaciej>	I guess one could also add "inclination"
10:50	<annevk>	same goes for some of the SVG stuff btw...
10:51	<tantek>	annevk, I'd rather see a CSS Shapes draft before an SVG rewrite
10:51	<othermaciej>	in the SVG WG, having spec-writing ability and knowledge of relevant technology does not seem to be a requirement for editorship
10:51	<tantek>	it seems that most uses of shapes on the web are decorative, not content
10:51	<tantek>	thus more appropriate to be done as a "styling"
10:51	<tantek>	and besides, markup is for marking up text
10:52	<annevk>	you could use XBL to hide the SVG images from actual content
10:54	<othermaciej>	SVG does contain some but not all of the needed capabilities, but it makes them all pretty awkward to use in combination with HTML markup
11:36	<hsivonen>	btw, I'm finding that the tokenizer can easily be implemented as a recursive descent tokenizer without an explicit state variable
11:37	<hsivonen>	wrapper loops are needed for attributes and the data state to avoid arbitrarily deep recursion
11:38	<mikeday>	buffering?
11:38	<mikeday>	oh, you're doing SAX
11:38	<hsivonen>	mikeday: I intend to do SAX with and without buffering, DOM and XOM
11:38	<mikeday>	so the recursion is just for eg. see a '<', call parseStartTag() ?
11:38	<hsivonen>	mikeday: this is the Tokenizer only
11:39	mikeday	nods
11:39	<hsivonen>	mikeday: yes
11:39	<mikeday>	so if you don't use arbitrary recursion, technically you don't need to recurse at all, right?
11:39	<annevk>	if you introduce new states...
11:39	<annevk>	and allow it to start in arbitrary states
11:39	<mikeday>	you could just jump around inside a big single function
11:40	<hsivonen>	mikeday: well, to avoid stack overflow regardless of input, I have a loop around the attribute states so that the stack rewinds back to the loop between attributes
11:41	<mikeday>	right, eg. while getAttribute() ...
11:41	<hsivonen>	yes
11:41	<mikeday>	but you don't actually need recursive calls, if you're not parsing a recursive grammar
11:41	<mikeday>	it's just for convenience structuring your code, yes?
11:41	<hsivonen>	this is for code structuring, yes
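The structure hsivonen and mikeday describe (one function per construct, with a wrapper loop so the call stack unwinds between attributes) might look something like the sketch below. It is written in Python rather than Java for brevity, handles only unquoted attribute values, and every function name here is hypothetical; it is not hsivonen's code and not the HTML5 tokenizer algorithm.

```python
def read_while(text, pos, stop):
    """Consume characters until one in `stop` (or end of input) is reached."""
    start = pos
    while pos < len(text) and text[pos] not in stop:
        pos += 1
    return text[start:pos], pos


def read_attribute(text, pos):
    """Return ((name, value), new_pos), or (None, new_pos) at the end of the tag."""
    while pos < len(text) and text[pos] == " ":
        pos += 1  # skip whitespace between attributes
    if pos >= len(text) or text[pos] == ">":
        return None, min(pos + 1, len(text))
    name, pos = read_while(text, pos, " =>")
    value = ""
    if pos < len(text) and text[pos] == "=":
        value, pos = read_while(text, pos + 1, " >")
    return (name, value), pos


def parse_start_tag(text, pos):
    """Parse a start tag; `pos` is the index just after '<'."""
    name, pos = read_while(text, pos, " >")
    attrs = {}
    # Wrapper loop: control returns here after every attribute, so the call
    # depth stays constant no matter how many attributes the input contains.
    while True:
        attr, pos = read_attribute(text, pos)
        if attr is None:
            break
        attrs[attr[0]] = attr[1]
    return name, attrs, pos
```

For example, `parse_start_tag('<meta http-equiv=x content=y>z', 1)` yields the tag name, a dictionary of the two attributes, and the index of the character after `>`.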
11:42	<hsivonen>	also, I am assuming that a straight final method invocation in Java is going to be faster than state lookup plus method dispatch
11:42	<mikeday>	hmm, Java has no goto, right? :)
11:43	<hsivonen>	mikeday: no goto at the .java level
11:43	<mikeday>	right
11:43	<mikeday>	sounds pretty good then :)
11:43	<hsivonen>	mikeday: my reasoning is that this is as good as it gets without goto and jump arithmetic based on input token
11:43	<othermaciej>	the way to code a state machine is a loop with a switch statement
11:44	<othermaciej>	not via dynamic method dispatch
11:44	<mikeday>	s/the way/a way/ :)
11:44	<othermaciej>	the efficient way
11:45	<hsivonen>	othermaciej: you get as many method invocations either way, right?
11:45	<othermaciej>	hsivonen: well, I'm not sure why you contrasted "static final method invocation" with the other option
11:46	<hsivonen>	straight--not static
11:46	<hsivonen>	othermaciej: if you have one method per state
11:47	<hsivonen>	othermaciej: and state B follows A, why would I return to a dispatch loop in between?
11:47	<othermaciej>	depends on whether function calls are more expensive in your language than conditional branches
11:48	<hsivonen>	othermaciej: ah, you are assuming that I could do away with function calls
11:48	<hsivonen>	othermaciej: I am assuming one method per state either way for code structuring sanity
11:48	<hsivonen>	(since this is human-maintained code--not generated code)
11:48	<hsivonen>	I'm hoping the HotSpot does some inlining
11:49	<othermaciej>	well, with the switch, the compiler and/or the Java runtime can definitely inline everything into the switch statement
11:49	<mikeday>	I guess a parsing DSL that compiled down to Java byte code could help
11:49	<othermaciej>	if each processing method is final and they don't call each other
11:49	<hsivonen>	othermaciej: good point
11:49	<hsivonen>	othermaciej: thanks
11:50	<othermaciej>	anyway I don't know which way would be faster in Java
11:50	<othermaciej>	I don't have a lot of experience performance-tuning Java code
11:50	<othermaciej>	(though I do have performance-tuning experience in general)
11:50	<hsivonen>	yeah, this is guesswork without either benchmarking or knowing what HotSpot inlines
11:51	<hsivonen>	ok. I'll convert to a switch that is potentially inlineable
11:52	<mikeday>	a bit of premature optimisation going on here perhaps :)
11:52	<mikeday>	by the way, have you done meta charset detection yet?
11:52	<hsivonen>	mikeday: written--not run
11:52	<othermaciej>	depends on whether hsivonen finds it easier to code a finite state machine or a recursive descent parser
11:53	<mikeday>	hmm, I better hurry up then, I've been dragging my feet over it
11:53	<hsivonen>	mikeday: in C?
11:53	<mikeday>	yes
11:54	<othermaciej>	mikeday: you're writing an HTML5 parser in C?
11:54	<mikeday>	yes, that's why I come here, to make me feel guilty enough to work on it some more
11:54	<hsivonen>	hmm. come to think of it, I still think the way I have coded this is potentially a bit more efficient if HotSpot does deep inlines
11:55	<hsivonen>	perhaps I leave the optimization for later after all
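For comparison, the "loop with a switch statement" style othermaciej describes keeps an explicit state variable and dispatches on it once per input character. The sketch below uses a Python if/elif chain in place of a Java switch; the state names, the token shapes, and the simplified syntax (unquoted attribute values only) are all assumptions made for illustration, not the states defined by HTML5.

```python
# Hypothetical states for a heavily simplified tag syntax.
DATA, TAG_NAME, ATTR_NAME, ATTR_VALUE = range(4)


def tokenize(text):
    """Tokenize text into ("text", s) and ("tag", name, attrs) tuples."""
    state = DATA
    tokens = []
    buf = []  # characters collected for the current piece
    name, attrs, attr = "", {}, ""

    def take():
        """Return the buffered characters as a string and clear the buffer."""
        s = "".join(buf)
        buf.clear()
        return s

    for ch in text:  # the dispatch loop: one state lookup per character
        if state == DATA:
            if ch == "<":
                if buf:
                    tokens.append(("text", take()))
                name, attrs = "", {}
                state = TAG_NAME
            else:
                buf.append(ch)
        elif state == TAG_NAME:
            if ch in " >":
                name = take()
                if ch == ">":
                    tokens.append(("tag", name, attrs))
                    state = DATA
                else:
                    state = ATTR_NAME
            else:
                buf.append(ch)
        elif state == ATTR_NAME:
            if ch == "=":
                attr = take()
                state = ATTR_VALUE
            elif ch in " >":
                if buf:
                    attrs[take()] = ""  # attribute without a value
                if ch == ">":
                    tokens.append(("tag", name, attrs))
                    state = DATA
            else:
                buf.append(ch)
        elif state == ATTR_VALUE:
            if ch in " >":
                attrs[attr] = take()
                if ch == ">":
                    tokens.append(("tag", name, attrs))
                    state = DATA
                else:
                    state = ATTR_NAME
            else:
                buf.append(ch)
    if state == DATA and buf:
        tokens.append(("text", take()))
    return tokens
```

Because the state is an ordinary variable, this form can also be started in an arbitrary state, which is the property annevk raises earlier in the discussion. Which form is faster in a given runtime is, as the participants say, guesswork without benchmarking.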
11:56	<annevk>	a colleague did some testing on tokenization in C/C++ versus Python and JavaScript
11:56	<annevk>	C: ~1ms, Python: ~100ms, JavaScript: ~500ms
11:56	<mikeday>	lucky no one is writing a parser in JavaScript I guess
11:56	<mikeday>	...or ARE they
11:57	<hsivonen>	annevk: I would expect buffering to matter a lot in that case (and string object creation)
11:57	<mikeday>	I guess you could use a dictionary and avoid creating new string objects where possible
11:57	<mikeday>	eg. precache tag names and attribute names
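mikeday's suggestion of precaching tag and attribute names could be sketched as a small lookup table. The class and method names below are hypothetical. Note that this still builds a temporary string for each lookup, so it only avoids keeping many duplicate name objects alive; it does not answer hsivonen's question about inspecting characters without creating strings.

```python
class NameCache:
    """Return one shared string object per distinct tag or attribute name."""

    def __init__(self, known=()):
        # Pre-populate with names expected to be common.
        self._names = {n: n for n in known}

    def get(self, chars):
        """Map an iterable of characters to the cached string for that name."""
        name = "".join(chars)  # temporary string, discarded if already cached
        return self._names.setdefault(name, name)
```

A tokenizer would call `cache.get(buffer)` wherever it would otherwise store a freshly built name string.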
11:59	<othermaciej>	which JavaScript implementation?
11:59	<hsivonen>	mikeday: how do you look at a character in Python or JS without creating a string object?
11:59	<othermaciej>	JavaScript suffers from the boxed/unboxed distinction for strings there I guess
11:59	<othermaciej>	if you actually use string methods
12:00	<Philip`>	HotSpot should be happy with inlining methods even when they're not final or are potentially recursive
12:00	<mikeday>	hsivonen, buffer file to array of int instead?
12:01	<hsivonen>	Philip`: these are final and to a finite recursion depth
12:01	<hsivonen>	mikeday: ok
12:01	<othermaciej>	array of int is much less efficient than a string in JS
12:01	<annevk>	othermaciej, string methods were used, Opera 9.2 was used for testing I think
12:01	<Philip`>	(e.g. http://java.sun.com/developer/technicalArticles/Networking/HotSpot/inlining.html (from 1999) talks about inlining non-final methods, and it just remembers enough to undo the optimisation if its assumptions are ever violated)
12:02	<annevk>	Python is already a hundred times slower...
12:02	<mikeday>	how about Ruby? :)
12:02	<othermaciej>	it's hard to beat C
12:02	<annevk>	We really need a C implementation if we want to use it for surveys and such
12:02	<othermaciej>	except sometimes with C++
12:02	<mikeday>	surveys?
12:02	<annevk>	Well, surveys covering lots of pages...
12:02	<othermaciej>	if I were doing it I would use C++
12:02	<mikeday>	bleh.
12:03	<annevk>	mikeday, like the research Ian did
12:03	<Philip`>	Even with a thousand pages, the Python one is unpleasantly slow :-(
12:03	<mikeday>	oh, right
12:03	<othermaciej>	and probably at least two open source HTML5 parsers will be written in C++ sooner or later
12:03	annevk	wonders how hard the browser parsers are to extract
12:03	<hsivonen>	IIRC, HotSpot beats C for some problems
12:03	<hsivonen>	will be interesting to see if this is one of them :-)
12:04	<mikeday>	Java beats C for malloc()
12:04	<mikeday>	so... don't use malloc :)
12:04	<hsivonen>	:-)
12:05	<othermaciej>	annevk: our current HTML parser does the DOM building, so probably not that easily separable
12:05	<Philip`>	C always wins because you can implement a JVM in it :-)
12:06	<annevk>	othermaciej, that's what I thought, main reason why I think having a third would be good
12:07	<othermaciej>	having a standalone one would be nice, if it was packaged well
12:08	<mikeday>	indeed.
12:09	<mikeday>	yay, testhtml is parsing attribute names and getting "http-equiv"
12:10	<othermaciej>	if I were doing it for fun, I'd do C++ implementation, C API
12:10	<othermaciej>	but if I have hobby coding time it will probably be spent on WebKit hacking
12:11	<mikeday>	hmm, not getting attribute values though. That's slightly useless.
12:12	<hsivonen>	(fwiw, recursive call to a finite depth with loops in the right places seems to be how others have written XML parsers in Java)
12:12	<mikeday>	that's also how libxml2 is written I think
15:16	annevk	started testing <base> himself
15:16	<annevk>	tests here: http://tc.labs.opera.com/html/base/
15:40	<annevk>	seems that IE7 happily does dynamic changes
15:40	annevk	just added 005 and 006
15:41	<annevk>	I actually already figured that out while testing XMLHttpRequest but never fed it back to the HTML5 spec I think
15:52	<annevk>	So open questions: support xml:base? do dynamic changes affect baseURI or also inserted <img> etc? support xml:base in text/html? reverse engineer IE7 href= handling?
15:52	<annevk>	dunno, just baseURI, dunno, if it's simple...
16:36	<gsnedders>	ARGH!
16:36	<gsnedders>	people are _still_ arguing <p/> is a self-closing element, citing the validator (which is correctly parsing it as a NET)!
16:37	<gsnedders>	they're also saying HTML isn't SGML, despite me citing the spec.
16:37	<annevk>	correctly?
16:37	<annevk>	point them to HTML5 and tell them that that's what browsers will implement
16:37	<gsnedders>	well, under the spec it is parsing it as
16:38	<gsnedders>	annevk: the argument is in the context of what the HTML 4.01 spec says.
16:38	<gsnedders>	annevk: which most certainly is SGML.
16:38	<annevk>	HTML4 is irrelevant
16:38	<gsnedders>	agreed
16:49	<annevk>	One use case for minlength= is search systems that don't work with less than 4 characters
16:49	<annevk>	However, I think that's a usability problem with those search systems and not really a good use case...
16:55	<Philip`>	minlength= and maxlength= seem largely unrelated, since the latter stops you entering out-of-range strings but the former presumably only stops you submitting out-of-range strings (because it'd be really horrible UI if it didn't let you delete and retype the contents of the box - though I suppose I've seen people do that anyway...)
16:55	<annevk>	good point
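The asymmetry Philip` points out can be made concrete with a small sketch: a maximum is enforced while typing, a minimum only when submitting. Both function names are hypothetical, and the behaviour shown for minlength= is the one Philip` presumes, not something taken from a spec.

```python
def accept_keystroke(current, ch, maxlength):
    """maxlength-style check: refuse input that would exceed the limit."""
    if len(current) + len(ch) <= maxlength:
        return current + ch
    return current  # the extra character is simply not entered


def can_submit(value, minlength):
    """minlength-style check: applied only at submission time."""
    return len(value) >= minlength
```

With this split the user can always delete and retype the contents of the box; a too-short value is only rejected when the form is submitted.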
17:29	<duryodhan>	why were the digital signatures left out of the new Web Forms spec?? what patent problems are you referring to?
17:35	<annevk>	duryodhan, care to elaborate?
17:36	<duryodhan>	The Web Forms doc ...
17:37	<duryodhan>	sez that Digital Signatures weren't addressed because of Patent concerns
17:37	<duryodhan>	what are these concerns ?
17:37	<duryodhan>	cos GPG/PGP etc. are easily available ...
17:37	<annevk>	oh, I see
17:39	<annevk>	duryodhan, e-mail the list if you have a solution
17:42	<duryodhan>	I wish :D
17:42	<duryodhan>	I don't know the problem ...
23:34	<Hixie>	the internet is very quiet today
23:59	<Hixie>	i wonder if i can resolve the <base> problems by simply waiting a few more months for IE7 to get more market share and for the pages that break to get fixed...
23:59	<Dashiva>	A tempting plan
23:59	<kingryan>	but that only works if you don't talk about it, right?