Discussion:
[e-lang] XML support in E
Thomas Leonard
2009-10-26 17:08:56 UTC
Hi,

Could someone give some simple examples showing how to read and write
XML using E?

The XMLQuasiParser looks reasonable, but says it is deprecated in favour
of using Terms. I can build Terms manually, but I can't see how to turn
serialised XML into a Term, or a Term into serialised XML.

I found this, but it doesn't say which methods to call:

http://www.erights.org/data/terml/embeddings.html

There's also Term.updoc in the source, which shows how to convert a sml
DOM to a term, but not the other way around. Also, sml doesn't support
all of XML, e.g.

? sml`<foo/>`
# problem: Failed: syntax error: expected '>' but got '/' at 4

Thanks,
--
Dr Thomas Leonard
IT Innovation Centre
2 Venture Road
Southampton
Hampshire SO16 7NP

Tel: +44 0 23 8076 0834
Fax: +44 0 23 8076 0833
mailto:***@it-innovation.soton.ac.uk
http://www.it-innovation.soton.ac.uk
Kevin Reid
2010-01-12 22:06:28 UTC
Post by Thomas Leonard
Could someone give some simple examples showing how to read and write
XML using E?
The XMLQuasiParser looks reasonable, but says it is deprecated in favour
of using Terms. I can build Terms manually, but I can't see how to turn
serialised XML into a Term, or a Term into serialised XML.
http://www.erights.org/data/terml/embeddings.html
There's also Term.updoc in the source, which shows how to convert a sml
DOM to a term, but not the other way around. Also, sml doesn't support
all of XML, e.g.
? sml`<foo/>`
# problem: Failed: syntax error: expected '>' but got '/' at 4
There is no E-styled XML library built into E; this is certainly
something which ought to be addressed.

Such a library should of course use immutable trees (vs. e.g. DOM
which is mutable) and have quasiliteral/pattern support for users.


E-on-JavaScript's Updoc-to-HTML component is an example of XML
manipulation in E: it uses the Java DOM libraries (and, when a real E
XML library exists, should be converted to use it).
http://wiki.erights.org/wiki/E-on-JavaScript


There are two different approaches which could be used for an E XML
library: one is to use the Term-tree objects, and merely write an
xml__quasiParser which allows one to use the XML-in-TermL embedding,
as well as facilities for reading/writing XML documents. The other is
to have a distinct object type for XML tree nodes; this has the
advantage that its methods can be optimized for the needs of XML
applications, and its __printOn would show XML rather than TermL.

Parsing and printing could be handled for starters by using Java's
builtin XML facilities; I'm not sure exactly how much could be reused
vs. reimplemented.

Questions:

* Would you be interested in working on the project of an XML library
for E?

* If I were to work on it, would you use it and give feedback?
--
Kevin Reid <http://switchb.org/kpreid/>
Mark Miller
2010-01-12 22:34:44 UTC
I have become ever more attracted to JsonML <http://jsonml.org/> as a way to
handle XML data. Since E's term trees already handle JSON, I suggest using
the JsonML mapping of XML data into JSON structures. I suggest we would then
have no need for an XML quasi-parser. We could just use the term tree
quasiparser in quasi-JsonML format, to manipulate XML trees as translated to
JsonML structures. Even for text markup, which is the best case for the XML
vs JsonML comparison, I still find JsonML notation more readable than XML.
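For reference, JsonML (as described on jsonml.org) encodes each element as an array with three parts. First comes the tag name, then an optional object of attributes, then the children, which are strings or further element arrays. The testData list used later in this thread has exactly that shape.

Below is a minimal sketch of the XML-to-JsonML direction in Python rather than E. It uses the standard library's xml.etree.ElementTree and, for brevity, ignores namespaces, comments and processing instructions. The function name `to_jsonml` is made up for illustration.

```python
import xml.etree.ElementTree as ET

def to_jsonml(elem):
    """Convert an ElementTree element into a JsonML-style nested list."""
    node = [elem.tag]
    if elem.attrib:
        node.append(dict(elem.attrib))   # optional attribute object
    if elem.text:
        node.append(elem.text)           # text before the first child
    for child in elem:
        node.append(to_jsonml(child))
        # Text following a child element (its "tail") belongs to the parent.
        if child.tail:
            node.append(child.tail)
    return node

doc = ET.fromstring('<tr><td class="MyTD">#550758</td><td>x<b>y</b>z</td></tr>')
print(to_jsonml(doc))
# ['tr', ['td', {'class': 'MyTD'}, '#550758'], ['td', 'x', ['b', 'y'], 'z']]
```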
_______________________________________________
e-lang mailing list
http://www.eros-os.org/mailman/listinfo/e-lang
--
Text by me above is hereby placed in the public domain

Cheers,
--MarkM
Kevin Reid
2010-01-12 23:11:35 UTC
Post by Mark Miller
I have become ever more attracted to JsonML <http://jsonml.org/> as
a way to handle XML data. Since E's term trees already handle JSON,
I suggest using the JsonML mapping of XML data into JSON structures.
I suggest we would then have no need for an XML quasi-parser. We
could just use the term tree quasiparser in quasi-JsonML format, to
manipulate XML trees as translated to JsonML structures. Even for
text markup, which is the best case for the XML vs JsonML
comparison, I still find JsonML notation more readable than XML.
Against JsonML:

* I do not see a specification for JsonML; not even as much of one as
JSON started with.

* It does not appear to have any handling of XML namespaces; this is
fatal. The programmer must not be required to manually manage prefix
declarations inside of XML tree-composing code or they will get it
wrong (or have to think hard about issues that the software could be
handling for them).

* The point of JsonML as described is to handle XML documents as
native JavaScript values; in that case, we might as well use E lists
and maps, rather than the double-embedding inside the JSON subset of
term-trees.

For having an XML quasiparser:

* I disagree regarding readability, at least in that I want to have
the choice of either notation.

* E's goals of robustness and security suggest that we should provide
facilities which are attractive safe substitutes for plain-text string
interpolation. Having an XML quasiparser means that it is *trivial* to
write xml`<title>$docTitle</title>` instead of `<title>$docTitle</title>`
and get “XSS”/“injection” “protection”, even if the programmer
knows nothing of the details of the XML tree representation.
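The same hazard and fix can be shown outside E. This Python sketch uses only the standard library and is not E's quasi-parser. It contrasts plain string interpolation with building the element as a tree, where the title is stored as character data and escaped by the serializer. The title string is an arbitrary hostile example.

```python
import xml.etree.ElementTree as ET

doc_title = 'Tom & Jerry <script>alert(1)</script>'

# Plain string interpolation: markup characters in the title leak into the
# document structure (the "injection" problem).
unsafe = f"<title>{doc_title}</title>"

# Tree construction: the title is character data, and the serializer escapes
# it, so it cannot introduce new elements.
title = ET.Element("title")
title.text = doc_title
safe = ET.tostring(title, encoding="unicode")

print(unsafe)  # <title>Tom & Jerry <script>alert(1)</script></title>
print(safe)    # <title>Tom &amp; Jerry &lt;script&gt;alert(1)&lt;/script&gt;</title>
```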
--
Kevin Reid <http://switchb.org/kpreid/>
Kevin Reid
2010-01-14 12:43:49 UTC
Post by Kevin Reid
Post by Mark Miller
I have become ever more attracted to JsonML <http://jsonml.org/> as
a way to handle XML data. Since E's term trees already handle JSON,
I suggest using the JsonML mapping of XML data into JSON structures.
I suggest we would then have no need for an XML quasi-parser. We
could just use the term tree quasiparser in quasi-JsonML format, to
manipulate XML trees as translated to JsonML structures. Even for
text markup, which is the best case for the XML vs JsonML
comparison, I still find JsonML notation more readable than XML.
* I do not see a specification for JsonML; not even as much of one as
JSON started with.
* It does not appear to have any handling of XML namespaces; this is
fatal. The programmer must not be required to manually manage prefix
declarations inside of XML tree-composing code or they will get it
wrong (or have to think hard about issues that the software could be
handling for them).
I hereby retract the first of these statements: I failed to scroll
down on the main page and find the grammar.

Regarding XML namespaces, jsonml.org states *exactly the wrong* thing
(for an in-memory XML representation), requiring programmers to work
[...]
From jsonml.org:
XML Namespaces
JsonML supports namespaces the same way that namespaces were handled
in XML 1.0. The element name is a concatenation of the namespace
prefix, the colon ':' character, and the element local-name.
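To see why prefix-based names are a poor in-memory representation, compare Python's ElementTree (Python, not E). It resolves prefixes at parse time and stores expanded names of the form {namespace-URI}local-name. The namespace URI urn:example:ns below is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The same element in the same namespace, written with two different prefixes.
a = ET.fromstring('<h:p xmlns:h="urn:example:ns">hi</h:p>')
b = ET.fromstring('<x:p xmlns:x="urn:example:ns">hi</x:p>')

# Prefixes are resolved during parsing; only the namespace URI survives.
print(a.tag)            # {urn:example:ns}p
print(a.tag == b.tag)   # True

# With JsonML-style "prefix:local-name" strings the two names would be "h:p"
# and "x:p", and code comparing names would have to track xmlns declarations.
```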
--
Kevin Reid <http://switchb.org/kpreid/>
Thomas Leonard
2010-01-14 10:45:38 UTC
Post by Kevin Reid
Post by Thomas Leonard
Could someone give some simple examples showing how to read and write
XML using E?
[...]
Post by Kevin Reid
There is no E-styled XML library built into E; this is certainly
something which ought to be addressed.
Such a library should of course use immutable trees (vs. e.g. DOM
which is mutable) and have quasiliteral/pattern support for users.
E-on-JavaScript's Updoc-to-HTML component is an example of XML
manipulation in E: it uses the Java DOM libraries (and, when a real E
XML library exists, should be converted to use it).
http://wiki.erights.org/wiki/E-on-JavaScript
There are two different approaches which could be used for an E XML
library: one is to use the Term-tree objects, and merely write an
xml__quasiParser which allows one to use the XML-in-TermL embedding,
as well as facilities for reading/writing XML documents. The other is
to have a distinct object type for XML tree nodes; this has the
advantage that its methods can be optimized for the needs of XML
applications, and its __printOn would show XML rather than TermL.
Parsing and printing could be handled for starters by using Java's
builtin XML facilities; I'm not sure exactly how much could be reused
vs. reimplemented.
* Would you be interested in working on the project of an XML library
for E?
* If I were to work on it, would you use it and give feedback?
We'd certainly be interested in testing any XML support. I don't think
I'd be able to help with the implementation though: a selling point for
E was compatibility with existing Java libraries, so telling my manager
I need to spend time implementing XML support is unlikely to fly ;-)

My main concern here is about speed. I temporarily used the JSON support
in part of my code, and it noticeably slows the GUI down when used, even
for quite small documents! Here's a little test case, timing how long it
takes to serialise an example document:

# Example document taken from http://jsonml.org/
def testData := [
    "table", ["class" => "MyTable", "style" => "background-color:yellow"],
    ["tr",
        ["td", ["class" => "MyTD", "style" => "border:1px solid black"], "#550758"],
        ["td", ["class" => "MyTD", "style" => "background-color:red"], "Example text here"]],
    ["tr",
        ["td", ["class" => "MyTD", "style" => "border:1px solid black"], "#993101"],
        ["td", ["class" => "MyTD", "style" => "background-color:green"], "127624015"]],
    ["tr",
        ["td", ["class" => "MyTD", "style" => "border:1px solid black"], "#E33D87"],
        ["td", ["class" => "MyTD", "style" => "background-color:blue"],
            "\u00a0", ["span", ["style" => "background-color:maroon"], "\u00a9"], "\u00a0"]]]

def jsonSurgeon := <elib:serial.deJSONKit>.makeSurgeon()

# Run op once as a warm-up, then print the time taken by each of three
# batches of five calls.
def timeIt(op) {
    op()
    for x in 1..3 {
        def start := timer.now()
        for y in 1..5 {
            op()
        }
        def finish := timer.now()
        println(`Took ${finish-start} ms`)
    }
}

println("Using jsonSurgeon...")
timeIt(fn { jsonSurgeon.serialize(testData) } )

println("Using normal printing...")
timeIt(fn { `$testData` } )

On my laptop, the results are:

Using jsonSurgeon...
Took 3533 ms
Took 2600 ms
Took 2323 ms
Using normal printing...
Took 6 ms
Took 5 ms
Took 6 ms

QuasiParser support would be very nice too (for ensuring correct
quoting, as you mentioned in a later email), and we'd certainly need
namespaces.

Thanks,
--
Dr Thomas Leonard
Kevin Reid
2010-01-14 12:40:50 UTC
Post by Thomas Leonard
Post by Kevin Reid
* Would you be interested in working on the project of an XML library
for E?
* If I were to work on it, would you use it and give feedback?
We'd certainly be interested in testing any XML support. I don't think
I'd be able to help with the implementation though: a selling point for
E was compatibility with existing Java libraries, so telling my manager
I need to spend time implementing XML support is unlikely to fly ;-)
I'll look into it. Er, how soon do you need it? I'm rather busy, but I
do *want* an E project; I just need to know how to prioritize this.
Post by Thomas Leonard
My main concern here is about speed. I temporarily used the JSON support
in part of my code, and it noticeably slows the GUI down when used, even
for quite small documents! Here's a little test case, timing how long it
# Example document taken from http://jsonml.org/
def testData := ["table", ["class" => "MyTable", "style" =>
"background-color:yellow"], ["tr", ["td", ["class" => "MyTD",
"style" => "border:1px solid black"], "#550758"], ["td", ["class" =>
"MyTD", "style" => "background-color:red"], "Example text here"]],
["tr", ["td", ["class" => "MyTD", "style" => "border:1px solid
black"], "#993101"], ["td", ["class" => "MyTD", "style" =>
"background-color:green"], "127624015"]], ["tr", ["td", ["class" =>
"MyTD", "style" => "border:1px solid black"], "#E33D87"], ["td",
["class" => "MyTD", "style" => "background-color:blue"], "\u00a0",
["span", ["style" => "background-color:maroon"], "\u00a9"],
"\u00a0"]]]
def jsonSurgeon := <elib:serial.deJSONKit>.makeSurgeon()
Oh, deJSONKit I wouldn't recommend for current practical purposes: the
Data-E subsystem is relatively slow no matter what kit you use. (I
don't know how much of this is simply due to inefficient E
interpretation, vs. what serialization must do, or anything else.) Use
the JSON subset of term-trees instead. Just wrap your JSON literals in
term`...`, to start with.
--
Kevin Reid <http://switchb.org/kpreid/>
Thomas Leonard
2010-01-15 11:12:04 UTC
Post by Kevin Reid
Post by Thomas Leonard
  * Would you be interested in working on the project of an XML
library
    for E?
  * If I were to work on it, would you use it and give feedback?
We'd certainly be interested in testing any XML support. I don't think
I'd be able to help with the implementation though: a selling point for
E was compatibility with existing Java libraries, so telling my manager
I need to spend time implementing XML support is unlikely to fly ;-)
I'll look into it. Er, how soon do you need it? I'm rather busy, but I
do *want* an E project; I just need to know how to prioritize this.
Over the next few months we'll be connecting it up to systems that use
XML, but if there's no native E support then we'll just import and use
the normal Java DOM libraries.
Post by Kevin Reid
Post by Thomas Leonard
My main concern here is about speed. [...]
Oh, deJSONKit I wouldn't recommend for current practical purposes: the
Data-E subsystem is relatively slow no matter what kit you use. (I
don't know how much of this is simply due to inefficient E
interpretation, vs. what serialization must do, or anything else.) Use
the JSON subset of term-trees instead. Just wrap your JSON literals in
term`...`, to start with.
OK, I can create JSON documents easily enough using

def Term := <type:org.quasiliteral.term.Term>
def serialised := (testData :Term).asText()

This is nice and fast, and I assume it could produce XML in a similar way.

I can turn the serialised string back into a term like this:

def TermParserMaker := <import:org.quasiliteral.term.makeTermParser>
TermParserMaker(serialised)

But how do I turn that term back into the E data-structure (i.e.
reverse the effect of :Term)?
--
Dr Thomas Leonard ROX desktop / Zero Install
GPG: 9242 9807 C985 3C07 44A6 8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA BD8E 0713 3F96 CA74 D8BA
Kevin Reid
2010-01-15 12:47:52 UTC
Post by Thomas Leonard
OK, I can create JSON documents easily enough using
def Term := <type:org.quasiliteral.term.Term>
def serialised := (testData :Term).asText()
This is nice and fast, and I assume it could produce XML in a
similar way.
Huh, I didn't know that worked except for leaf types. (I find that the
relevant code is in
org.erights.e.meta.org.quasiliteral.astro.AstroGuardSugar.)
Post by Thomas Leonard
def TermParserMaker := <import:org.quasiliteral.term.makeTermParser>
TermParserMaker(serialised)
But how do I turn that term back into the E data-structure (i.e.
reverse the effect of :Term)?
I don't know of a facility to do this. (deJSONKit does so the long way
around.) MarkM?
--
Kevin Reid <http://switchb.org/kpreid/>
Mark Miller
2010-01-17 01:07:03 UTC
Post by Kevin Reid
Post by Thomas Leonard
OK, I can create JSON documents easily enough using
def Term := <type:org.quasiliteral.term.Term>
def serialised := (testData :Term).asText()
This is nice and fast, and I assume it could produce XML in a similar way.
Huh, I didn't know that worked except for leaf types. (I find that the
relevant code is in
org.erights.e.meta.org.quasiliteral.astro.AstroGuardSugar.)
Yes. The first test case at src/jsrc/org/quasiliteral/term/Term.updoc is

? [3=>4, "a"=>'x', [2,3]=>[4,5]]:Term
# value: term`{3: 4,
# "a": 'x',
# [2, 3]:
# [4, 5]}`

which exercises some of the interesting cases.
Post by Kevin Reid
Post by Thomas Leonard
def TermParserMaker := <import:org.quasiliteral.term.makeTermParser>
TermParserMaker(serialised)
But how do I turn that term back into the E data-structure (i.e.
reverse the effect of :Term)?
I don't know of a facility to do this. (deJSONKit does so the long way
around.) MarkM?
For leaf data terms, their __conformTo already knows how to auto-coerce to
their primitive data value.

Kevin, regarding non-leaf terms, I recall you once wrote such a guard as an
experiment, but I don't think we ever added it to the E library. As I
recall, at the time you may have called it Termish. But that conflicts with
a different use of that name in the org.quasiliteral.term package. I may be
misremembering.
--
Text by me above is hereby placed in the public domain

Cheers,
--MarkM
Kevin Reid
2010-01-17 01:19:55 UTC
Post by Mark Miller
Post by Kevin Reid
Post by Thomas Leonard
def TermParserMaker := <import:org.quasiliteral.term.makeTermParser>
TermParserMaker(serialised)
But how do I turn that term back into the E data-structure (i.e.
reverse the effect of :Term)?
I don't know of a facility to do this. (deJSONKit does so the long
way around.) MarkM?
For leaf data terms, their __conformTo already knows how to auto-
coerce to their primitive data value.
Kevin, regarding non-leaf terms, I recall you once wrote such a
guard as an experiment, but I don't think we ever added it to the E
library. As I recall, at the time you may have called it Termish.
But that conflicts with a different use of that name in the
org.quasiliteral.term package. I may be misremembering.
That does seem like something I might have written, but I don't recall
where I might have done so. I did write a guard called Termish, but it
was just an incidental example of using trinary-define; what it did
was match recursive trees of ["tagname", subterms...].
http://www.eros-os.org/pipermail/e-lang/2005-August/010942.html
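Going by that one-line description, the shape Termish accepted can be written as a recursive predicate. This is a Python sketch of the shape check only: a plain function, not an E guard. It is written from the description above rather than from the linked code, and the name `is_termish` is made up.

```python
def is_termish(value):
    """True if value looks like ["tagname", subterm, ...], recursively."""
    return (isinstance(value, list)
            and len(value) >= 1
            and isinstance(value[0], str)
            and all(is_termish(sub) for sub in value[1:]))

print(is_termish(["foo", ["bar"], ["baz", ["qux"]]]))  # True
print(is_termish(["foo", "bar"]))                      # False: "bar" is not a list
```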
--
Kevin Reid <http://switchb.org/kpreid/>
Mark Miller
2010-01-17 02:01:23 UTC
Post by Kevin Reid
That does seem like something I might have written, but I don't recall
where I might have done so. I did write a guard called Termish, but it
was just an incidental example of using trinary-define; what it did
was match recursive trees of ["tagname", subterms...].
http://www.eros-os.org/pipermail/e-lang/2005-August/010942.html
Now that I see it, I'm rather sure that is what I was remembering. But it
inspired a good solution:

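def AntiTerm {
    to coerce(specimen, optEjector) {
        # Leaf data terms coerce to their primitive data value.
        if (specimen.getOptData() =~ data :notNull) { return data }
        # Otherwise destructure tag(args...), coercing each argument recursively.
        def term`@{tag :String}(@{args :List[AntiTerm]}*)` exit optEjector := specimen
        # [a, b] becomes an E list.
        if (tag == ".tuple.") { return args }
        # {k: v, ...} becomes an E map, provided every element is a key/value pair.
        if (tag == ".bag.") {
            def result := [].asMap().diverge()
            for arg in args {
                if (arg =~ [`.attr.`, key, value]) {
                    result[key] := value
                } else {
                    return [tag] + args
                }
            }
            return result.snapshot()
        }
        # Any other term becomes [tag, args...].
        return [tag] + args
    }
}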


? term`{[1,2]:[3,4], foo:oo(bar)}` :AntiTerm
# value: [[1, 2] => [3, 4], ["foo"] => ["oo", ["bar"]]]
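To make the conversion rules explicit, here is the same logic transliterated into Python. `Term`, `leaf`, `tup` and `attr` are hypothetical stand-ins for E's term objects: a tag, argument terms, and optional leaf data. Python dict keys must be hashable, so list keys are turned into tuples, a step E does not need.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    """Hypothetical stand-in for an E term: tag, child terms, optional leaf data."""
    tag: str
    args: tuple = ()
    data: object = None

def _hashable(value):
    # E maps accept list keys; Python dicts need hashable keys.
    if isinstance(value, list):
        return tuple(_hashable(v) for v in value)
    return value

def anti_term(term):
    """Convert a Term tree into plain lists/dicts, following AntiTerm's rules."""
    if term.data is not None:             # leaf data term: its primitive value
        return term.data
    args = [anti_term(a) for a in term.args]
    if term.tag == ".tuple.":             # .tuple.(a, b) -> [a, b]
        return args
    if term.tag == ".bag.":               # .bag.(.attr.(k, v), ...) -> {k: v}
        result = {}
        for arg in args:
            if isinstance(arg, list) and len(arg) == 3 and arg[0] == ".attr.":
                result[_hashable(arg[1])] = arg[2]
            else:
                return [term.tag] + args  # not a pure map: generic form
        return result
    return [term.tag] + args              # any other tag: [tag, args...]

def leaf(value):
    return Term(".leaf.", (), value)

def tup(*items):
    return Term(".tuple.", items)

def attr(key, value):
    return Term(".attr.", (key, value))

# Rough equivalent of term`{[1,2]:[3,4], foo:oo(bar)}`
example = Term(".bag.", (
    attr(tup(leaf(1), leaf(2)), tup(leaf(3), leaf(4))),
    attr(Term("foo"), Term("oo", (Term("bar"),))),
))
print(anti_term(example))
# {(1, 2): [3, 4], ('foo',): ['oo', ['bar']]}
```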
--
Text by me above is hereby placed in the public domain

Cheers,
--MarkM
Thomas Leonard
2010-02-22 15:42:07 UTC
Post by Kevin Reid
Post by Thomas Leonard
OK, I can create JSON documents easily enough using
def Term := <type:org.quasiliteral.term.Term>
def serialised := (testData :Term).asText()
This is nice and fast, and I assume it could produce XML in a
similar way.
Huh, I didn't know that worked except for leaf types. (I find that the
relevant code is in
org.erights.e.meta.org.quasiliteral.astro.AstroGuardSugar.)
Yes. The first test case at src/jsrc/org/quasiliteral/term/Term.updoc is
? [3=>4, "a"=>'x', [2,3]=>[4,5]]:Term
# value: term`{3: 4,
# "a": 'x',
# [2, 3]:
# [4, 5]}`
which exercises some of the interesting cases.
This fails though:

? def Term := <type:org.quasiliteral.term.Term>
? def jsonSurgeon := <elib:serial.deJSONKit>.makeSurgeon()

? def data := [["hello\nworld"]]
? def text := (data:Term).asText()
? def data2 := jsonSurgeon.unserialize(text)

? data == data2
# value: true

The pretty printing inserts a space into the string after the \n:

? println(text)
[["hello
world"]]
--
Dr Thomas Leonard
Kevin Reid
2010-02-22 16:32:58 UTC
Post by Thomas Leonard
? println(text)
[["hello
world"]]
This is definitely a bug, in that Terms should have read/print
consistency. Condensed test cases:

? println(term`foo(["hello\nworld"])`.asText())
foo(["hello
world"])
? println(term`foo("hello\nworld")`.asText())
foo("hello
world")
? println(term`foo(bar("hello\nworld"))`.asText())
foo(bar("hello
world"))

The first and third are wrong, the second is acceptable.
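Why a raw newline breaks read/print consistency can be shown in a few lines. This Python sketch uses the json module as a stand-in for E's term printer, and `indent` is a hypothetical, deliberately naive pretty-printing step. Once the newline is printed raw, any indentation added to later lines lands inside the string literal. Printing it as the escape sequence \n keeps the literal on one line.

```python
import json

def indent(text, prefix="  "):
    """Naive pretty-printing step: indent every line of already-printed text."""
    return "\n".join(prefix + line for line in text.split("\n"))

value = ["hello\nworld"]

# Newline printed raw inside the literal: the indentation ends up in the string.
raw = '["hello\nworld"]'
corrupted = json.loads(indent(raw), strict=False)  # strict=False tolerates raw newlines
print(corrupted)      # ['hello\n  world']

# Newline printed as the escape sequence \n: the value survives indentation.
escaped = json.dumps(value)
print(escaped)        # ["hello\nworld"]
print(json.loads(indent(escaped)) == value)   # True
```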
--
Kevin Reid <http://switchb.org/kpreid/>
Thomas Leonard
2011-09-08 12:21:07 UTC
Post by Kevin Reid
Post by Thomas Leonard
OK, I can create JSON documents easily enough using
def Term :=<type:org.quasiliteral.term.Term>
def serialised := (testData :Term).asText()
This is nice and fast, and I assume it could produce XML in a
similar way.
Huh, I didn't know that worked except for leaf types. (I find that the
relevant code is in
org.erights.e.meta.org.quasiliteral.astro.AstroGuardSugar.)
Yes. The first test case at src/jsrc/org/quasiliteral/term/Term.updoc is
? [3=>4, "a"=>'x', [2,3]=>[4,5]]:Term
# value: term`{3: 4,
# "a": 'x',
# [2, 3]:
# [4, 5]}`
which exercises some of the interesting cases.
? def Term :=<type:org.quasiliteral.term.Term>
? def jsonSurgeon :=<elib:serial.deJSONKit>.makeSurgeon()
? def data := [["hello\nworld"]]
? def text := (data:Term).asText()
? def data2 := jsonSurgeon.unserialize(text)
? data == data2
# value: true
? println(text)
[["hello
world"]]
On further investigation, it seems that unescaped newlines aren't
allowed in JSON strings anyway:

http://stackoverflow.com/questions/42068/how-do-i-handle-newlines-in-json

This patch turns newlines into "\n" sequences and no longer escapes "'"
as "\'" (an escape sequence which JSON doesn't allow):

http://gitorious.org/~tal-itinnov/repo-roscidus/it-innovation/commit/1b0a82891d305059c3732bc33b6702b41e944acf

I spotted these problems when trying to parse E's JSON output using
Python. The changes also fix <elib:serial.deJSONKit>.
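Both restrictions are easy to check with a strict JSON parser such as Python's json module. This is an independent illustration, not the E code or the patch; `accepts` is a made-up helper.

```python
import json

def accepts(text):
    """Return True if Python's (strict) JSON parser accepts text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(accepts('"line1\\nline2"'))  # True:  escaped newline
print(accepts('"line1\nline2"'))   # False: raw newline inside a string
print(accepts('"it\\\'s"'))        # False: \' is not a JSON escape
print(accepts('"it\'s"'))          # True:  a bare ' needs no escaping
```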
--
Dr Thomas Leonard
IT Innovation Centre
Gamma House, Enterprise Road,
Southampton SO16 7NS, UK


tel: +44 23 8059 8866

mailto:***@it-innovation.soton.ac.uk
http://www.it-innovation.soton.ac.uk/
Kevin Reid
2011-09-08 13:33:31 UTC
Post by Thomas Leonard
On further investigation, it seems that newlines aren't allowed in JSON
http://stackoverflow.com/questions/42068/how-do-i-handle-newlines-in-json
This patch turns newlines into "\n" sequences and doesn't quote "'"
http://gitorious.org/~tal-itinnov/repo-roscidus/it-innovation/commit/1b0a82891d305059c3732bc33b6702b41e944acf
I spotted these problems when trying to parse E's JSON output using
Python. The changes also fix <elib:serial.deJSONKit>.
Looks OK, but please audit other users of StringHelper to see if any need it the other way.

I would object to escaping newlines for the readability reduction, except that not doing so means that pretty-printing indentation changes the content, which is a bug.
--
Kevin Reid <http://switchb.org/kpreid/>
Thomas Leonard
2011-09-08 14:13:31 UTC
Post by Kevin Reid
Post by Thomas Leonard
On further investigation, it seems that newlines aren't allowed in JSON
http://stackoverflow.com/questions/42068/how-do-i-handle-newlines-in-json
This patch turns newlines into "\n" sequences and doesn't quote "'"
http://gitorious.org/~tal-itinnov/repo-roscidus/it-innovation/commit/1b0a82891d305059c3732bc33b6702b41e944acf
I spotted these problems when trying to parse E's JSON output using
Python. The changes also fix <elib:serial.deJSONKit>.
Looks OK, but please audit other users of StringHelper to see if any need it the other way.
I would object to escaping newlines for the readability reduction, except that not doing so means that pretty-printing indentation changes the content, which is a bug.
Note that there are (now) two methods in StringHelper:

- quote is mostly as before, except it no longer escapes "'"
- the new quoteIncludingNewline method also quotes newlines

The change to "quote" isn't strictly necessary, but it seemed sensible
and better than trying to explain why the two methods handled "'"
differently.

This causes the following change:

? "'foo'\n'bar'"

Before:

# value: "\'foo\'
# \'bar\'"

After:

# value: "'foo'
# 'bar'"
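Here is a rough Python model of the two behaviours described above. It is not the Java source of StringHelper, and the snake_case names are hypothetical. Only backslash, double quote and newline are handled, whereas a real quoting routine must also deal with other control characters.

```python
def quote(s):
    """Escape backslash and double quote, but not single quote; newlines
    are left as literal line breaks."""
    out = ['"']
    for ch in s:
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        else:
            out.append(ch)
    out.append('"')
    return "".join(out)

def quote_including_newline(s):
    """Like quote, but also writes each newline as the two characters \\n."""
    # quote() never adds newlines, so every newline here came from s itself.
    return quote(s).replace("\n", "\\n")

print(quote("'foo'\n'bar'"))                    # "'foo'<newline>'bar'"
print(quote_including_newline("'foo'\n'bar'"))  # "'foo'\n'bar'"
```

The second form is also valid JSON, which is the property needed for the Python interoperability problem mentioned earlier.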
--
Dr Thomas Leonard
Kevin Reid
2010-01-17 01:11:01 UTC
Thomas Leonard:

I'm sketching out an XML library. I've got it parsing document
fragments inside an XML quasiliteral (xml`<a>foo</a> <b/> c`). I'm
also experimenting with providing XPath as the primary means of
descending into trees.

I'm currently implementing it by wrapping DOM trees with an immutable
interface, as this seems like both the simplest path and one which
minimizes the amount of (currently slow) E code executed in the high-
repeat-count paths.
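The "immutable interface over a mutable DOM" idea can be sketched in a few lines. This Python version wraps an xml.etree.ElementTree element rather than a Java DOM node, and `FrozenXml` with its methods is hypothetical, not the library's API. It copies the tree on construction, refuses attribute assignment, and hands out only wrapped children or fresh copies.

```python
import copy
import xml.etree.ElementTree as ET

class FrozenXml:
    """Hypothetical read-only wrapper around an ElementTree element.

    The wrapped tree is a private deep copy, so neither the caller's original
    element nor anything handed out later can be used to mutate it.
    """
    __slots__ = ("_elem",)

    def __init__(self, elem):
        object.__setattr__(self, "_elem", copy.deepcopy(elem))

    def __setattr__(self, name, value):
        raise AttributeError("FrozenXml is immutable")

    @property
    def tag(self):
        return self._elem.tag

    def get(self, name, default=None):
        return self._elem.get(name, default)

    def children(self):
        # Wrap each child too, so no mutable node escapes.
        return [FrozenXml(child) for child in self._elem]

    def to_element(self):
        """Return a fresh mutable copy for code that expects a plain element."""
        return copy.deepcopy(self._elem)

src = ET.fromstring('<a x="1"><b/></a>')
frozen = FrozenXml(src)
src.set("x", "2")          # mutating the original...
print(frozen.get("x"))     # 1  (...does not affect the wrapper)
```

Copying on every `children()` call is wasteful. A real implementation would more likely share the already-private subtree, which is also the kind of cost the quoted paragraph wants to keep out of slow interpreted code.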

I don't yet have quasi value holes or pattern holes, so you can't use
`` syntax to construct or to pattern-match XML, and there also aren't
any methods to actually get text content out of the tree.


I chose not to go the TermL-XML-embedding path because I would have to
write much additional code to make it as *accurate* as I want this
library to be. However, in the future I imagine the internal
representation of this library being replaced with Term-trees and the
objects being wrappers around Terms instead of around DOM.


Before I proceed further, I think we should construct a list of design
goals, particularly your immediate requirements, so as to make sure
this library becomes useful.

Here's the list I sort of have in mind:


* A data type representing immutable (sub-)trees of XML documents. The
tree should preserve all information in the XML Infoset.

* XML fragments can be written as quasiliterals in the program.

* These fragments can have quasi-value-holes so as to compose XML
documents. That is:
    def foo := xml`<a/>`
    def bar := xml`<b>$foo</b>`
results in bar having the value
    xml`<b><a/></b>`
.

* It is possible to construct a customized XML quasiparser with a
given set of namespace declarations.

* There are means to traverse and pattern-match XML trees. Currently I
have two plans in mind for this:
  1. XPath expressions can be used as subscripts. Example:
       ? xml`<a>xyz</a><b>foo</b><a>bar</a>`[xpath`a/text()`]
       # value: [xml`xyz`, xml`bar`]
     (This is a working-right-now example.)

  2. Pattern matching, as offered currently by term-trees:
       def xml`<input type="text" name="@name" value="@value">` := elem

     (However, pattern-matching style raises issues of adding syntax and
     semantics for repetitions, as well as don't-care vs. strict matching
     of additional attributes, elements, and text.)

  These two styles can be usefully combined:
      for xml`<html:input type="text" name="@name" value="@value">`
            in form[xpath`//html:input`] {
          map[name] := value
      }

* An XML fragment consisting solely of text should coerce to a String
and vice versa.

* There are straightforward, text-encoding-correct ways to read and
write XML documents (that is, convert between XML trees and strings,
byte arrays, character streams, and binary streams).
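As a rough point of comparison for the composition and traversal items above, here is how the same operations look with Python's standard xml.etree.ElementTree. This is not the proposed E API. `compose` is a hypothetical helper standing in for $-holes, and ElementTree supports only a subset of XPath with no `text()` step, so `a/text()` is approximated by reading `.text`.

```python
import copy
import xml.etree.ElementTree as ET

def compose(tag, *children):
    """Build <tag> from children, like filling $-holes: elements are inserted
    as (copied) nodes; strings become character data, escaped on output."""
    elem = ET.Element(tag)
    last = None
    for child in children:
        if isinstance(child, str):
            if last is None:
                elem.text = (elem.text or "") + child
            else:
                last.tail = (last.tail or "") + child
        else:
            node = copy.deepcopy(child)   # never share or mutate the caller's node
            node.tail = None              # drop text that followed it elsewhere
            elem.append(node)
            last = node
    return elem

# Roughly: def foo := xml`<a/>`; def bar := xml`<b>$foo</b>`
foo = ET.Element("a")
bar = compose("b", foo)
print(ET.tostring(bar, encoding="unicode"))   # <b><a /></b>

# A multi-element fragment, wrapped in a dummy <root> because ElementTree
# only parses single-rooted documents.
frag = ET.fromstring("<root><a>xyz</a><b>foo</b><a>bar</a></root>")
print([a.text for a in frag.findall("a")])    # ['xyz', 'bar']

# Collect name => value from every text <input>, at any depth.
form = ET.fromstring(
    '<form><input type="text" name="user" value="alice"/>'
    '<p><input type="text" name="lang" value="E"/></p>'
    '<input type="submit" value="Go"/></form>')
fields = {i.get("name"): i.get("value")
          for i in form.findall(".//input[@type='text']")}
print(fields)   # {'user': 'alice', 'lang': 'E'}
```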


All of what I've listed so far is either already implemented or
seems reasonably straightforward (given that we're running in Java)
except for (a) implementing quasi-holes (without writing a whole new
augmented-XML parser) and (b) the pattern matching facility, which
would be a good bit of nontrivial from-scratch design and code.


So, tell me what *you* think you need.
--
Kevin Reid <http://switchb.org/kpreid/>
Thomas Leonard
2010-01-17 19:03:19 UTC
Post by Kevin Reid
I'm sketching out an XML library. I've got it parsing document
fragments inside an XML quasiliteral (xml`<a>foo</a> <b/> c`). I'm
also experimenting with providing XPath as the primary means of
descending into trees.
I'm currently implementing it by wrapping DOM trees with an immutable
interface, as this seems like both the simplest path and one which
minimizes the amount of (currently slow) E code executed in the high-
repeat-count paths.
Makes sense. We'd probably want methods to get the underlying Document
(or a copy, more likely) so we can pass it to existing Java code and
wrap the results again.
[...]
Post by Kevin Reid
* A data type representing immutable (sub-)trees of XML documents. The
tree should preserve all information in the XML Infoset.
* XML fragments can be written as quasiliterals in the program.
* These fragments can have quasi-value-holes so as to compose XML
  def foo := xml`<a/>`
  def bar := xml`<b>$foo</b>`
results in bar having the value
  xml`<b><a/></b>`
.
Sounds good. How do namespaces combine? Sometimes you have a lot of
child elements using the same namespace and it's handy if it ends up
as a single namespace declaration on the root element. Are you
planning to preserve prefixes in any way (or just auto-number them as
xmlns:n0, xmlns:n1, etc)? In the past I've used a scheme where we keep
prefix mappings as a hint when parsing the XML and use them if
possible when serialising, but combine multiple prefixes into one
where possible or create new prefixes where there are conflicts so
that we end up with all the mappings defined on the root element.
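The prefix scheme described above can be sketched as a small function. This is Python, and `assign_prefixes` is a hypothetical name. It assumes every namespace URI used in the tree is known up front and that parsed prefixes are kept as an ordered list of (prefix, URI) hints. Each URI gets one prefix declared on the root: the first hinted prefix that is still free, or else a generated n0, n1, ... that avoids prefixes already taken. Default-namespace (unprefixed) declarations are not handled.

```python
def assign_prefixes(uris, hints):
    """Choose one prefix per namespace URI, to be declared on the root element.

    uris:  namespace URIs in the order they occur in the tree.
    hints: (prefix, uri) pairs remembered from parsing, in document order.
    """
    chosen = {}    # uri -> prefix
    taken = set()
    for prefix, uri in hints:
        # First usable hint wins; later hints for the same URI are merged away,
        # and a prefix already given to another URI is a conflict, so skip it.
        if uri in uris and uri not in chosen and prefix not in taken:
            chosen[uri] = prefix
            taken.add(prefix)
    counter = 0
    for uri in uris:
        if uri not in chosen:
            while f"n{counter}" in taken:
                counter += 1
            chosen[uri] = f"n{counter}"
            taken.add(chosen[uri])
    return chosen

uris = ["urn:a", "urn:b", "urn:c"]
hints = [("x", "urn:a"), ("x", "urn:b"), ("y", "urn:a")]
prefixes = assign_prefixes(uris, hints)
print(" ".join(f'xmlns:{p}="{u}"' for u, p in prefixes.items()))
# xmlns:x="urn:a" xmlns:n0="urn:b" xmlns:n1="urn:c"
```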
Post by Kevin Reid
* It is possible to construct a customized XML quasiparser with a
given set of namespace declarations.
* There are means to traverse and pattern-match XML trees. Currently I
    ? xml`<a>xyz</a><b>foo</b><a>bar</a>`[xpath`a/text()`]
    # value: [xml`xyz`, xml`bar`]
  (This is a working-right-now example.)
Both this and XPath would be very useful. Would this work?

def xml`<@elementName @attrs*>@content</@elementName>` := ...

When matching on XML I mostly want to ignore unmatched attributes, but
Post by Kevin Reid
  (However, pattern-matching style raises issues of adding syntax and
  semantics for repetitions, as well as don't-care vs. strict matching
  of additional attributes, elements, and text.)
        in form[xpath`//html:input`] {
      map[name] := value
  }
* An XML fragment consisting solely of text should coerce to a String
and vice versa.
* There are straightforward, text-encoding-correct ways to read and
write XML documents (that is, convert between XML trees and strings,
byte arrays, character streams, and binary streams).
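In Java terms, "text-encoding-correct" mostly means never decoding the bytes yourself. Hand the parser raw bytes so it can apply the BOM and the `encoding` declaration itself, and let the serializer encode the output and write a matching declaration. A minimal sketch with the standard JAXP classes (the `XmlIO` class name is hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

final class XmlIO {
    // Parse from raw bytes, not a String: the parser then detects the encoding
    // from the BOM / XML declaration instead of relying on a guessed charset.
    static Document parse(byte[] bytes) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Serialize to bytes in the requested encoding; the serializer also writes
    // an XML declaration naming that encoding.
    static byte[] serialize(Document doc, String encoding) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```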
All of what I've listed so far is either already implemented or
seeming reasonably straightforward (given that we're running in Java)
except for (a) implementing quasi-holes (without writing a whole new
augmented-XML parser) and (b) the pattern matching facility, which
would be a good bit of nontrivial from-scratch design and code.
So, tell me what *you* think you need.
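The `//html:input` loop in the quoted text (partially lost in this archive) fills a map from input names to values. Below is a rough Java equivalent using `javax.xml.xpath`, assuming XHTML input; `FormInputs` is a hypothetical name. The `html` prefix used in the expression has to be bound through a `NamespaceContext`, because JAXP XPath resolves expression prefixes there rather than through the document's own `xmlns` declarations:

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FormInputs {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Binds the prefix "html" for use inside XPath expressions.
    private static final NamespaceContext NS = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "html".equals(prefix) ? XHTML : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri) {
            return XHTML.equals(uri) ? "html" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            return XHTML.equals(uri) ? Collections.singletonList("html").iterator()
                                     : Collections.<String>emptyIterator();
        }
    };

    // Rough equivalent of: for each //html:input element, map[name] := value
    static Map<String, String> collect(String xhtml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required for namespaced XPath steps to match
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            xp.setNamespaceContext(NS);
            NodeList inputs = (NodeList) xp.evaluate("//html:input", doc, XPathConstants.NODESET);
            Map<String, String> map = new LinkedHashMap<>();
            for (int i = 0; i < inputs.getLength(); i++) {
                Element input = (Element) inputs.item(i);
                map.put(input.getAttribute("name"), input.getAttribute("value"));
            }
            return map;
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```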
Sounds like just what we need. Thanks!
--
Dr Thomas Leonard ROX desktop / Zero Install
GPG: 9242 9807 C985 3C07 44A6 8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA BD8E 0713 3F96 CA74 D8BA
Kevin Reid
2010-01-17 20:44:30 UTC
Post by Thomas Leonard
Post by Kevin Reid
I'm currently implementing it by wrapping DOM trees with an immutable
interface, as this seems like both the simplest path and one which
minimizes the amount of (currently slow) E code executed in the high-
repeat-count paths.
Makes sense. We'd probably want methods to get the underlying Document
(or a copy, more likely) so we can pass it to existing Java code and
wrap the results again.
Perhaps this should be part of a generic input-output scheme, like
javax.xml.transform uses. Or not.
Post by Thomas Leonard
Sounds good. How do namespaces combine? Sometimes you have a lot of
child elements using the same namespace and it's handy if it ends up
as a single namespace declaration on the root element. Are you
planning to preserve prefixes in any way (or just auto-number them as
xmlns:n0, xmlns:n1, etc)? In the past I've used a scheme where we keep
prefix mappings as a hint when parsing the XML and use them if
possible when serialising, but combine multiple prefixes into one
where possible or create new prefixes where there are conflicts so
that we end up with all the mappings defined on the root element.
In the long run, this should be an option; currently, it will be
"whatever Java does" if Java does in fact ensure namespace
consistency. In general, I will preserve prefixes, since XML Infoset
says they are significant. Note that e.g. XSLT documents make
references to prefixes inside of attribute values (XPath expressions)
which makes renaming hairy.
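The JDK's namespace-aware DOM keeps the source prefix on each node, and the XSLT case can be seen directly with it. In the minimal Java sketch below (hypothetical `PrefixDemo`), the `select` attribute of an XSLT instruction contains the QName `my:price`. Resolving that QName means looking up `my` among the declarations in scope at that element. A serializer that renamed `my` would leave the attribute text pointing at an undeclared prefix:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class PrefixDemo {
    // Namespace-aware parsing: each node records its namespace URI, local name,
    // and the prefix it was written with.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Resolve the prefix of a QName stored in attribute text, as in XSLT's
    // select="my:price", against the xmlns declarations in scope at that element.
    // Returns null for an unprefixed name or an undeclared prefix.
    static String namespaceOfQNameIn(Element context, String attrName) {
        String qname = context.getAttribute(attrName);
        int colon = qname.indexOf(':');
        if (colon < 0) return null;
        return context.lookupNamespaceURI(qname.substring(0, colon));
    }
}
```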
Post by Thomas Leonard
Post by Kevin Reid
* It is possible to construct a customized XML quasiparser with a
given set of namespace declarations.
* There are means to traverse and pattern-match XML trees. Currently I
? xml`<a>xyz</a><b>foo</b><a>bar</a>`[xpath`a/text()`]
# value: [xml`xyz`, xml`bar`]
(This is a working-right-now example.)
Both this and XPath would be very useful. Would this work?
When matching on XML I mostly want to ignore unmatched attributes, but
This general sort of syntax is the sort of thing I want to support.
You'd have to write </> or </...>, not repeat @elementName; E doesn't
support that style of pattern matching.

The tricky part is supporting the quasi-holes while still using a
standard XML parser (for ease of implementation, efficiency, and well-
tested correctness). This means I substitute into each hole in the
template a unique value which I can find in the result tree afterward.
However, these values need to be valid XML syntax in the relevant
context.

The contexts I intend to support are:
* General XML content: <foo>@bar</foo>
* Element name: <@bar/>
* Attributes: <foo @bar>
* Attribute values: <foo baz="@bar">
* Processing instruction content: <?@foo bla bla @bar?>

The basic strategy I plan to use for the values is to generate a
pseudorandom string, check that it isn't in the input, then substitute
it in and look for it in the output. The tricky part is constructing a
string which is valid in the context. I don't think such a string can be
generated from purely local context (without lexing the template), because
no single string is well-formed both as an attribute and as an attribute value.
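Here is a minimal Java sketch of this marker strategy. It is restricted to element content, the first case in the list, because that is the one context where a bare string is always acceptable. The sketch makes several assumptions:

* The template arrives as the literal segments between holes.
* Every hole is in content position.
* Each value is a DOM element.
* The template has a single root.

`ContentHoles`, `fill` and the marker format are hypothetical. Each hole becomes marker, index, marker. After parsing, the text nodes are scanned for that pattern and split around copies of the values:

```java
import java.io.StringReader;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

final class ContentHoles {
    private static final SecureRandom RNG = new SecureRandom();

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Pick a pseudorandom marker that occurs in none of the template segments.
    static String freshMarker(List<String> segments) {
        while (true) {
            String marker = "hole" + Long.toHexString(RNG.nextLong());
            boolean clash = false;
            for (String s : segments) {
                if (s.contains(marker)) { clash = true; break; }
            }
            if (!clash) return marker;
        }
    }

    // segments.size() must be values.size() + 1; every hole must be in element content.
    static Document fill(List<String> segments, List<? extends Node> values) {
        if (segments.size() != values.size() + 1) {
            throw new IllegalArgumentException("need exactly one more segment than values");
        }
        String marker = freshMarker(segments);
        StringBuilder src = new StringBuilder(segments.get(0));
        for (int i = 0; i < values.size(); i++) {
            // marker + index + marker is plain character data: valid in content only.
            src.append(marker).append(i).append(marker).append(segments.get(i + 1));
        }
        Document doc = parse(src.toString());
        Pattern hole = Pattern.compile(Pattern.quote(marker) + "(\\d+)" + Pattern.quote(marker));
        List<Text> texts = new ArrayList<>();
        collectText(doc, texts); // snapshot first, because the loop below edits the tree
        for (Text t : texts) {
            String data = t.getData();
            Matcher m = hole.matcher(data);
            Node parent = t.getParentNode();
            int last = 0;
            while (m.find()) {
                insertText(doc, parent, t, data.substring(last, m.start()));
                // importNode copies the value, so the caller's tree is never modified.
                Node value = values.get(Integer.parseInt(m.group(1)));
                parent.insertBefore(doc.importNode(value, true), t);
                last = m.end();
            }
            if (last > 0) { // at least one hole was found in this text node
                insertText(doc, parent, t, data.substring(last));
                parent.removeChild(t);
            }
        }
        return doc;
    }

    private static void insertText(Document doc, Node parent, Node before, String text) {
        if (!text.isEmpty()) parent.insertBefore(doc.createTextNode(text), before);
    }

    private static void collectText(Node n, List<Text> out) {
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Text) out.add((Text) c);
            else collectText(c, out);
        }
    }
}
```

With segments `<b>` and `</b>` and the element parsed from `<a/>` as the single value, `fill` yields a `b` element whose only child is a copy of `a`, matching the `bar`/`foo` example earlier in the thread.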

Therefore, the hole support will need a custom lexer, which I will
write in Java for practical efficiency. ...On the other hand, maybe I
should just take an existing XML parser and extend it to support holes?
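The custom lexer only needs to track enough state to tell which of the five contexts each hole is in. Below is a rough Java sketch of that state machine (hypothetical `HoleLexer`). It does not handle comments, CDATA sections, DOCTYPE declarations, or a `?>` split across a hole, all of which a real lexer would need to deal with:

```java
import java.util.ArrayList;
import java.util.List;

// The five hole contexts from the list above.
enum HoleContext { CONTENT, ELEMENT_NAME, ATTRIBUTES, ATTRIBUTE_VALUE, PROCESSING_INSTRUCTION }

final class HoleLexer {
    // segments are the literal template pieces between holes, so
    // segments.size() == number of holes + 1. Returns one context per hole.
    static List<HoleContext> classify(List<String> segments) {
        HoleContext state = HoleContext.CONTENT;
        char quote = 0; // quote character that opened the current attribute value
        List<HoleContext> result = new ArrayList<>();
        for (int k = 0; k < segments.size(); k++) {
            String s = segments.get(k);
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                switch (state) {
                    case CONTENT:
                        if (c == '<') {
                            boolean pi = i + 1 < s.length() && s.charAt(i + 1) == '?';
                            state = pi ? HoleContext.PROCESSING_INSTRUCTION : HoleContext.ELEMENT_NAME;
                            if (pi) i++;
                        }
                        break;
                    case ELEMENT_NAME: // also entered by end tags such as </foo>
                        if (c == '>') state = HoleContext.CONTENT;
                        else if (c == '/' || Character.isWhitespace(c)) state = HoleContext.ATTRIBUTES;
                        break;
                    case ATTRIBUTES:
                        if (c == '>') state = HoleContext.CONTENT;
                        else if (c == '"' || c == '\'') { quote = c; state = HoleContext.ATTRIBUTE_VALUE; }
                        break;
                    case ATTRIBUTE_VALUE:
                        if (c == quote) state = HoleContext.ATTRIBUTES;
                        break;
                    case PROCESSING_INSTRUCTION:
                        if (c == '?' && i + 1 < s.length() && s.charAt(i + 1) == '>') {
                            state = HoleContext.CONTENT;
                            i++;
                        }
                        break;
                }
            }
            // A hole follows every segment except the last.
            if (k < segments.size() - 1) result.add(state);
        }
        return result;
    }
}
```

For `<foo baz="@bar">` the segments are `<foo baz="` and `">`, and the single hole is classified as `ATTRIBUTE_VALUE`. A hole filler could then choose a marker shape that is valid for that context, such as a bare name for `ELEMENT_NAME` or `name="value"` for `ATTRIBUTES`.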

For now, XPath should be sufficient to extract data values from XML.
--
Kevin Reid <http://switchb.org/kpreid/>
Thomas Leonard
2010-02-01 16:23:41 UTC
Post by Kevin Reid
Post by Thomas Leonard
Post by Kevin Reid
I'm currently implementing it by wrapping DOM trees with an immutable
interface, as this seems like both the simplest path and one which
minimizes the amount of (currently slow) E code executed in the high-
repeat-count paths.
[...]
Post by Kevin Reid
Post by Thomas Leonard
Sounds good. How do namespaces combine?
[...]
Post by Kevin Reid
In the long run, this should be an option; currently, it will be
"whatever Java does" if Java does in fact ensure namespace
consistency. In general, I will preserve prefixes, since XML Infoset
says they are significant. Note that e.g. XSLT documents make
references to prefixes inside of attribute values (XPath expressions)
which makes renaming hairy.
Good point.
Post by Kevin Reid
Post by Thomas Leonard
Post by Kevin Reid
* It is possible to construct a customized XML quasiparser with a
given set of namespace declarations.
* There are means to traverse and pattern-match XML trees. Currently I
? xml`<a>xyz</a><b>foo</b><a>bar</a>`[xpath`a/text()`]
# value: [xml`xyz`, xml`bar`]
(This is a working-right-now example.)
[...]
Can we get hold of this code from somewhere to test it?

Thanks,
--
Dr Thomas Leonard
Kevin Reid
2010-02-01 16:49:41 UTC
Post by Thomas Leonard
Post by Kevin Reid
* It is possible to construct a customized XML quasiparser with a
given set of namespace declarations.
* There are means to traverse and pattern-match XML trees. Currently I
? xml`<a>xyz</a><b>foo</b><a>bar</a>`[xpath`a/text()`]
# value: [xml`xyz`, xml`bar`]
(This is a working-right-now example.)
[...]
Can we get hold of this code from somewhere to test it?
It exists in a not-yet-published Git repository.

I would have gotten it cleaned up and available, but schoolwork and a
cold have wiped out my free time/enthusiasm for the past week. Sorry!
--
Kevin Reid <http://switchb.org/kpreid/>
Thomas Leonard
2010-02-12 15:55:04 UTC
Post by Kevin Reid
Post by Thomas Leonard
Post by Kevin Reid
* It is possible to construct a customized XML quasiparser with a
given set of namespace declarations.
* There are means to traverse and pattern-match XML trees. Currently I
? xml`<a>xyz</a><b>foo</b><a>bar</a>`[xpath`a/text()`]
# value: [xml`xyz`, xml`bar`]
(This is a working-right-now example.)
[...]
Can we get hold of this code from somewhere to test it?
It exists in a not-yet-published Git repository.
I would have gotten it cleaned up and available, but schoolwork and a
cold have wiped out my free time/enthusiasm for the past week. Sorry!
No problem; I've made a temporary (string-based) one which we're using
while the proper version is being developed:

http://barooga.it-innovation.soton.ac.uk/cgi-bin/gitweb.cgi?p=labs/e-prototype;a=blob;f=src/main/e/gria/tools/xml.emaker;h=41b1281450570c2deb7c18b0e259fdfcc7aca2a9;hb=e9105f9e46db8680a3843cbafe9dae73e0d6cdec
--
Dr Thomas Leonard
Kevin Reid
2010-02-15 00:38:18 UTC
Post by Thomas Leonard
Post by Kevin Reid
It exists in a not-yet-published Git repository.
I would have gotten it cleaned up and available, but schoolwork and a
cold have wiped out my free time/enthusiasm for the past week. Sorry!
No problem; I've made a temporary (string-based) one which we're using
http://barooga.it-innovation.soton.ac.uk/cgi-bin/gitweb.cgi?p=labs/e-prototype;a=blob;f=src/main/e/gria/tools/xml.emaker;h=41b1281450570c2deb7c18b0e259fdfcc7aca2a9;hb=e9105f9e46db8680a3843cbafe9dae73e0d6cdec
The not-yet-published repository is now published, at <git://switchb.org/e-xml>.
(I haven't set up gitweb or other browsing due to lack of time to
research and do so. In fact, I didn't really have the time to do what
I've just done to get this code working such that it does a reasonable
amount, but ...)

The code is *VERY* rough, badly-factored, and it probably isn't
actually good for much yet. Tell me what you need immediately, and/or
provide patches!

(I'm guessing the answer will be "value-holes in XML literals".)

Here's how I run the test cases:

$ rlwrap rune -cpa classpath
? rune(["/path/to/updoc.e", "test"])

(By the way, your code will fail to generate proper results if the XML
literal contains a "$" since it doesn't unescape them in substitute/1.)
--
Kevin Reid <http://switchb.org/kpreid/>
Kevin Reid
2010-01-17 01:22:55 UTC
Post by Thomas Leonard
def TermParserMaker := <import:org.quasiliteral.term.makeTermParser>
TermParserMaker(serialised)
Oh, by the way, the ...Maker is a deprecated naming convention. This
variable should be called "makeTermParser" (just as the adjusted class
name FQN is).
--
Kevin Reid <http://switchb.org/kpreid/>