[e-lang] Semicolon wrangling (was Re: [friam] Minutes)

Discussion:

Kevin Reid

2012-10-20 00:11:57 UTC

Today's Friam

Thank you for writing this down! I'll try to fill in my own notes on the topic, using yours as a skeleton.

One question is whether semicolon is separator, or terminator.
I think that everyone agrees that commas are separators but Kevin noted that some languages allow terminal commas.
Perhaps sqrt(x,) means the same as sqrt(x).
(I don’t like that.)

For what it's worth, as a general rule in such languages, this is only permitted, or at least only used in practice, in uniform/variable-length things such as collection literals, not function arguments.

There was further discussion, mostly heat, on automatic addition of semicolons at line breaks.
(I like the Algol 68 stance that semicolon is a binary operator that finishes evaluating its left operand before it begins to evaluate its right operand.
The value is that of the 2nd operand.
This is an associative operator and establishes semicolon as a separator.
I grant the problem of accidental return values which must be addressed.
To me omitting semis is like omitting commas, or even plus signs.)

This is a fair summary of the issue. I raised today's syntactic topics mostly in search of a nice answer (nicer than E's current ones) to the accidental return value problem, which is critical when your values are authority-bearing.
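
The binary-operator reading of the semicolon can be made concrete with a minimal Python sketch (Python rather than E or Algol 68). The helper names `step` and `seq`, and the choice to model operands as thunks, are assumptions made for illustration only; nothing here is taken from an actual implementation.

```python
log = []

def step(name):
    """Return a thunk that records its own evaluation and yields its name."""
    def thunk():
        log.append(name)
        return name
    return thunk

def seq(left, right):
    """';' as a binary operator on thunks: finish left, then run right."""
    def thunk():
        left()            # the left value is discarded
        return right()    # the value of the whole is the right value
    return thunk

foo, bar, baz = step("foo"), step("bar"), step("baz")

# Associativity: both groupings evaluate in the same order, with the same value.
log.clear()
left_grouped = seq(seq(foo, bar), baz)()
left_order = list(log)

log.clear()
right_grouped = seq(foo, seq(bar, baz))()
right_order = list(log)
```

The accidental-return hazard shows up directly in this model: whatever the last operand happens to return becomes the value of the whole sequence, whether or not the programmer meant to expose it.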

An elegant yet horrible approach to keep the semicolon operator is that

foo(); bar(); baz()

yields the value of baz whereas

foo(); bar(); baz();

does not.

There was an idea for some syntactic construct consisting of a sequence of expressions to be evaluated in order, and whose value is the value of one of the expressions preceded by a caret.

That is,

foo(); bar(); ^baz()

yields the value of baz(); or, as generalized by Alan Karp,

foo(); ^bar(); baz()

yields the value of bar() but also still evaluates baz() after bar(). Alan thought this would be useful, and I slightly agree; a simple use case is where you wish to initialize some object in an imperative way. In current E syntax plus this proposal,

{
^def foo := makeFoo()
foo.setBar(baz)
}

would return foo.
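
A rough Python model of the caret idea, as a sketch under stated assumptions: `run_block`, the `(marked, thunk)` pair encoding, the `Foo` class and the `env` dictionary are all hypothetical stand-ins, and the rule that the last marked step wins when several are marked is an arbitrary choice that the proposal does not specify.

```python
def run_block(steps):
    """Evaluate every step in order; return the value of the step marked '^'.

    `steps` is a list of (marked, thunk) pairs. With no marked step the
    block yields None, so no value escapes by accident.
    """
    result = None
    for marked, thunk in steps:
        value = thunk()
        if marked:
            result = value
    return result

class Foo:
    def __init__(self):
        self.bar = None

    def set_bar(self, value):
        self.bar = value
        return "set_bar result that should not leak"

env = {}

def define_foo():
    env["foo"] = Foo()
    return env["foo"]

block_value = run_block([
    (True, define_foo),                          # ^def foo := makeFoo()
    (False, lambda: env["foo"].set_bar("baz")),  # foo.setBar(baz)
])
```

Here `block_value` is the initialized object, and the return value of `set_bar` is dropped even though it was evaluated last.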

A straightforward and easily-readable, yet unpleasantly verbose, syntax which I think would completely eliminate the accidental return value problem is (brackets are a placeholder):

[foo(); bar()] baz() # returns baz()
[foo(); bar(); baz()] # does not

That is, sequencing can ONLY be performed within the special sequence-brackets, and values do not escape from sequence-brackets. The problem I see with this is that then the typical imperative method/function would have two sets of brackets: the outer { scope brackets } and the inner [ sequence brackets ].

The concept which I am most fond of at the moment (which isn't to say that it's actually the best option) starts from the premise that: if we can write sequences with only newline as separator (as we can in E), and we can write an object's methods with only newline as separator (as we can in E), why can't we write the elements of a list with newline as separator (and remove the hazard of forgetting the comma and getting a sequencing instead)?

In the Lisp family, sequencing is done by an operator (begin in Scheme, progn in Common Lisp) which might as well be simply a function which returns its last argument. Therefore, let that be how it is done. For example purposes, let's spell it 'do'; then

[
foo()
do(
bar()
baz()
)
]

is a list whose elements are the value of foo() and the value of baz(). This essentially makes "," and ";" the same thing. The disadvantage of this is that we have an extra symbol "do"; but we could fix that by saying that plain parentheses do the same:

[
foo()
(
bar()
baz()
)
]

in which case we have actually come full circle to the current E syntax, except without the comma, and essentially reintroduced the C comma operator. This last step is arguably therefore a bad idea, in that the comma operator allows you to discard values in ways which *locally* look just like contexts that don't.
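
The "sequencing is just a function that returns its last argument" idea can be tried out in Python, which evaluates call arguments left to right. This is only a sketch: `do` and `traced` are illustrative names, and Python still requires the commas that the proposal would replace with newlines.

```python
calls = []

def traced(name):
    """Build a function that logs its call and returns a recognizable value."""
    def f():
        calls.append(name)
        return name + "-value"
    return f

foo, bar, baz = traced("foo"), traced("bar"), traced("baz")

def do(*values):
    """Sequencing as an ordinary function: the arguments were already
    evaluated left to right, so just hand back the last one."""
    return values[-1] if values else None

elements = [
    foo(),
    do(
        bar(),
        baz(),
    ),
]
```

The list ends up with two elements, the values of foo() and baz(), while bar() is still evaluated in between and its value discarded.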

Kevin espoused defining a language semantics leaving precise syntax flexible.

Rather, that the language's AST should do a sufficiently good job of preserving formatting and comments that a programmer would not object to making use of source text which is the output of a program-transformer (such as a refactoring tool) written in terms of the AST.

A consequence, but not the primary goal, of this is that it is possible to have multiple surface syntaxes; the primary goal is to enable refactoring tools, as well as surface syntax *upgraders* (that is, if we go mad and decide that "else if" should be written "elif", we can trivially write a tool which does the conversion in a sound fashion).
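
As a small illustration of a formatting-preserving transformer, here is a Python sketch that renames an identifier while leaving comments, strings and layout untouched. It is a stand-in, not the design described above: it works on tokens via the standard `tokenize` module rather than on an AST (Python's standard `ast` module drops comments), and `rename_identifier` is a hypothetical helper.

```python
import io
import tokenize

def rename_identifier(source, old, new):
    """Replace NAME tokens equal to `old`, splicing by position so that
    everything else in the text stays byte-for-byte the same."""
    lines = source.splitlines(keepends=True)
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    hits = [t for t in tokens if t.type == tokenize.NAME and t.string == old]
    # Work right to left so earlier column offsets stay valid.
    for tok in reversed(hits):
        row, col = tok.start
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)

before = "x = foo(1)  # call foo here\ns = 'foo'\n"
after = rename_identifier(before, "foo", "bar")
```

Only the call site changes; the comment and the string literal that also contain "foo" are left alone, which is the property a programmer would want before trusting tool-generated source.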

--
Kevin Reid <http://switchb.org/kpreid/>

William ML Leslie

2012-10-20 02:10:08 UTC

Post by Kevin Reid

One question is whether semicolon is separator, or terminator.
I think that everyone agrees that commas are separators but Kevin noted
that some languages allow terminal commas.
Perhaps sqrt(x,) means the same as sqrt(x).
(I don’t like that.)

For what it's worth, as a general rule in such languages, this is only
permitted, or at least only used in practice, in uniform/variable-length
things such as collection literals, not function arguments.

Some languages do support this in function arguments, I don't think it
is such a bad thing. It means one less thing to have to worry about
when generating code from a language that doesn't have a sensible
reduce/join, or generating code from within emacs (it would be
/really/ useful in SQL).

If the meaning of the comma there can be confused with some other
usage of commas within the language, there's probably a deeper issue.

Post by Kevin Reid

I grant the problem of accidental return values which must be addressed.
To me omitting semis is like omitting commas, or even plus signs.)

This point seems the most interesting to me. If the target language
doesn't have implicit return (from functions/methods) the issue is
somewhat mitigated because you can see immediately if the value is
being discarded or used somehow.

Something I find unclear in the remaining discussion is the precedence
of the operations mentioned. I took it that a newline was supposed to
be just like a semicolon, but then I wonder if

^def foo := makeFoo(); foo.setBar(baz)

binds foo to the result of makeFoo() or foo.setBar(baz).

I mean that, if this last-value feature of semicolon has any use, it
would be to allow the user to eventually bind the result to a name,
but this appears to be confused in the example.

I suppose you could make brackets or braces mandatory - it just looks
like binding has lower precedence to me.

--
William Leslie

Lex Spoon

2012-10-21 05:14:03 UTC

On Fri, Oct 19, 2012 at 10:10 PM, William ML Leslie

Post by William ML Leslie
Some languages do support this in function arguments, I don't think it
is such a bad thing. It means one less thing to have to worry about
when generating code from a language that doesn't have a sensible
reduce/join, or generating code from within emacs (it would be
/really/ useful in SQL).

Python is one such. In Python, for most any form of sequence in the
syntax, you can optionally add an extra separator at the end of the
list. This rule includes arguments to function calls; f(x,y,z,) is
the same as f(x,y,z).

The trailing commas help with code generation, and also for humans
editing code. For example, consider the following code in Python or
JavaScript:

names = [
"Fred",
"Wilma",
"Bam Bam",
]

The trailing comma makes the code more regular, which makes it easier
to edit. For example, you can swap any two lines in the list without
having to fix up the commas afterwards.

Once you allow a trailing comma for lists, it is hard to resist
allowing it for function calls (Python only):

set_names(
"Fred",
"Wilma",
"Bam Bam",
)
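
A quick runnable check of the Python behaviour, as a sketch: the body of `set_names` is made up for illustration. It also shows the one place in Python where a trailing comma does change the meaning, which bears on the earlier point about a comma being confused with another usage.

```python
def set_names(*names):
    """Hypothetical body: just return the arguments as a list."""
    return list(names)

with_trailing = set_names(
    "Fred",
    "Wilma",
    "Bam Bam",
)
without_trailing = set_names("Fred", "Wilma", "Bam Bam")

# Caveat: in a parenthesized expression (not a call), a trailing comma
# creates a one-element tuple instead of being ignored.
plain = ("Fred")
single = ("Fred",)
```

The two calls are equivalent, whereas `plain` is a string and `single` is a tuple.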

It gets worse if you have nesting. Consider the following example of
JSON, which does not permit trailing commas:

{
"name": "Fred",
"child0": {
"name": "Fred, Jr."
},
"child1": {
"name": "Mary"
}
}

It would be easier to edit code like the above if you were allowed to
put a comma at the end of every line. Or better yet, none of them.
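
The strictness of JSON is easy to confirm with Python's standard `json` module. A minimal sketch; the two sample documents are shortened variants of the example above.

```python
import json

strict = '{"name": "Fred", "child0": {"name": "Fred, Jr."}}'
trailing = '{"name": "Fred", "child0": {"name": "Fred, Jr."},}'

parsed = json.loads(strict)

# The same document with one trailing comma is rejected outright.
try:
    json.loads(trailing)
    trailing_accepted = True
except json.JSONDecodeError:
    trailing_accepted = False
```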

Lex

Jonathan S. Shapiro

2012-10-21 07:56:51 UTC

In most languages, support for "trailing separators" (that is: terminators)
tends to be very *ad hoc*. For example, a trailing ',' eventually came to
be permitted in C/C++ enumerations, but not in arguments.