On the syntax of closures in PHP, part 2

My last post on the length of closures in PHP provoked a constructive debate on reddit. Several opinions were expressed, both positive and negative. I want to dwell on just one comment:

So, why not take the next step and prepare an actual (and completer) RFC that the core group might even consider for inclusion?

Challenge accepted!

Well, this is easier said than done. Making a proposal at this level requires that you have at least implemented something, tested it and seen that it works and does not break anything else. So, I downloaded a snapshot of PHP’s source code, set up my environment and started experimenting.

Before going any further, I have to admit that this was my first exposure to PHP’s source code and that I have never had any professional or academic experience with real-world compilers. So, I had to tame the bison (that’s the name of the parser generator PHP uses). Anyway, with the help of Stack Overflow and other online material, I managed to implement a draft that works.

Defining the goal

In the previous article, I analyzed why a shorter syntax for a specific subset of anonymous functions is necessary. Today, I am going to implement such a syntax without questioning its necessity at all.

I am particularly interested in those functions that:

  1. Given an input, return an output and do nothing else in between (I am not interested in event handlers or callbacks in general).
  2. Usually have only one statement (which is a return).
  3. Usually take only one argument.

Here are some examples of such functions:

// A mapper:
function($x){ return $x->Name; }       

// A comparer:
function($x,$y){ return $x-$y; }

// A predicate:
function($x)use($y){ return $x==$y; }
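
To make the three roles concrete, here is a minimal sketch that passes each function to a standard array function. The sample data and the variable names ($people, $numbers, $matches) are assumptions made up for illustration.

```php
<?php
// Hypothetical sample data: three objects with a Name property.
$people = array();
foreach (array('Carol', 'alice', 'Bob') as $name) {
    $p = new stdClass();
    $p->Name = $name;
    $people[] = $p;
}

// A mapper: one input, one output.
$names = array_map(function($x){ return $x->Name; }, $people);

// A comparer: returns a negative, zero or positive number, as usort() expects.
$numbers = array(3, 1, 2);
usort($numbers, function($x,$y){ return $x-$y; });

// A predicate: $y is captured from the enclosing scope through "use".
$y = 2;
$matches = array_filter($numbers, function($x)use($y){ return $x==$y; });
```

In all three calls the closure is the noisiest part of the line, which is exactly the subset of functions targeted here.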

Keep in mind that, in a more general form, the current syntax also allows the arguments and the lexical variables to have type hints (simple or qualified) or to be declared to be passed by reference, like these:

// A mapper with type hints
function(MyClass $x){ return $x->Name; }

// A mapper with qualified type hints
function(\MyNamespace\MyClass $x){ return $x->Name; }

// A mapper with qualified type hints and references
function & (\MyNamespace\MyClass &$x){ return $x->Name; }

Some constraints

It is certain that we will have to introduce some kind of new symbol. This is not easy, as PHP already uses all the single-character symbols that appear on a standard keyboard. The last one available was “\”, which is now used as the namespace separator. That’s a shame given that symbols like “#” (a second way to start a comment) or “@” (the error-suppression operator) were, in my opinion, wasted. Therefore, any new symbol is bound to have at least 2 characters.

A second constraint here is that the new syntax has to begin with something distinctive, so that the compiler understands that it is entering a lambda-parsing mode. This is probably a limitation of my poor compiler knowledge, apologies for that. As a result, the ultra-short C# syntax “x=>x.Name” is not possible, because its distinctive symbol is not at the beginning. Anyway, since compile time and run time are not exactly independent in PHP installations without bytecode caching, this limitation turns out to be useful, because it could make the compile time shorter by a tiny fraction.

Finally, what I am trying to do is not to change the behavior of the language, but to give an alternative syntax. So, any new syntax will have to behave exactly as the old syntax did. In other words, the lexical variables (the “use” clause) have to stay.
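
The behavior that has to be preserved can be sketched as follows. This is a small illustration of how the existing “use” clause captures variables, by value or by reference; the variable names are assumptions chosen for the example.

```php
<?php
$y = 1;

// "use" copies $y at the moment the closure is created.
$byValue = function($x)use($y){ return $x==$y; };

// "&" in the use clause binds the variable itself instead of a copy.
$byRef = function($x)use(&$y){ return $x==$y; };

$y = 2;

$r1 = $byValue(1);  // true: still compares against the captured 1
$r2 = $byRef(1);    // false: compares against the current value, 2
```

Any shorter notation has to keep both capture modes, which is why the separator between arguments and lexical variables appears in every attempt that follows.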

Let’s see what we can do

If we take a look at the bare characters of the expression, we can separate the useful information from the noise of the syntax.

function($x)use($y){ return $x==$y; }
.........$x.....$y..........$x==$y...
<---A--->  <-B->  <----C--->      <D>

From the above anatomy, it is clear that if we replace the noise with something shorter we will have our problem solved.

First of all, the last segment (D) is totally useless here, since we always have just one statement. We can get rid of it and focus on the other three segments. A first approach could be this:

Attempt 1:    λ( $x | $y ). $x==$y

Cool, but not highly practical unless you are from Greece. However, it shows that A and C can group together the arguments and point towards the result, while B is just a separator. Let’s select symbols for A (the lambda) and C (the dot) first:

Attempt 2:    \\ $x | $y ~> $x==$y

Not good. The two backslashes could be followed by a fully qualified type hint, resulting in three consecutive backslashes. Also, the curvy arrow does not look strong enough.

Attempt 3:    ?? $x | $y ?= $x==$y

Not bad, I like the ?= symbol as it could be translated as “given something return something”. However, the symbol for lambda is not good because it may follow the ? of a ternary operator resulting in three consecutive question marks.

Attempt 4:    :{ $x | $y }= $x==$y

I like the emphasis in grouping but one would expect to see statements and not arguments between curly brackets.

Attempt 5:    :[ $x | $y ]= $x==$y

This is better, as square brackets are commonly used for mapping. However, the “]=” conflicts with array assignment (as in $a[0]=1), not to mention that the lambda looks like a sad face :-( !

Attempt 6:    =| $x | $y |= $x==$y

Still a conflict (with T_OR_EQUAL, the token for “|=”), but the face is not so sad any more :-) and it looks similar to Ruby’s lambdas. I don’t like the fact that we have three pipes. Let’s improve it a little bit:

Attempt 7:    =| $x : $y |=> $x==$y

That’s good. It looks like a long arrow! However, we have another conflict this time. Given the sequence ==|, the tokenizer cannot tell whether it stands for T_IS_EQUAL followed by a bitwise “|”, or for an assignment “=” followed by the new lambda token “=|”.
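
Both conflicts can be observed with the stock tokenizer. The sketch below uses PHP’s token_get_all() and token_name(); the helper token_names() is a hypothetical name introduced for this example, and the output reflects the unmodified lexer, not the patched one described in this post.

```php
<?php
// Hypothetical helper: names of the tokens the stock lexer produces for $code.
function token_names($code) {
    $names = array();
    foreach (token_get_all('<?php ' . $code) as $tok) {
        if (!is_array($tok)) {
            // Single-character tokens come back as plain strings.
            $names[] = $tok;
        } elseif ($tok[0] !== T_OPEN_TAG && $tok[0] !== T_WHITESPACE) {
            $names[] = token_name($tok[0]);
        }
    }
    return $names;
}

// Attempt 7: the lexer greedily reads "==" before it could ever see "=|".
$attempt7 = token_names('$a==|$b');

// Attempt 6: "|=" is already a single token of its own.
$attempt6 = token_names('$a|=$b');
```

Because the lexer always takes the longest match, an opening symbol that starts with “=” can be swallowed by whatever precedes it, while “~|” cannot be mistaken for the tail of an existing operator.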

Attempt 8:    ~| $x : $y |=> $x==$y

We have a winner! There are no conflicts, the token for the lambda looks like an actual lambda (slightly rotated clockwise), the arguments are nicely grouped and the arrow is strong and very PHP-like.

Code golf

With all white space removed, the new syntax has fewer than half the characters of the original one. I would also dare to say that it is even more readable.

function($x)use($y){return$x==$y;}
<------------- 34 chars --------->

~|$x:$y|=>$x==$y
<-- 16 chars -->
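
The counts are easy to double-check. A minimal sketch that measures both strings with strlen(); the variable names are assumptions for the example.

```php
<?php
$long  = 'function($x)use($y){return$x==$y;}';
$short = '~|$x:$y|=>$x==$y';

$longLen  = strlen($long);
$shortLen = strlen($short);
$ratio    = $shortLen / $longLen;

// Prints: 34 vs 16 chars, ratio 0.47
printf("%d vs %d chars, ratio %.2f\n", $longLen, $shortLen, $ratio);
```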

Edge cases

As we have not changed the actual content, our syntax can be used exactly as the initial one, even in more complex cases, like these:

~| &$x |=> ++$x
~| |=> 'A constant'
~| \MyNamespace\MyClass $x |=> $x->Name
~| \MyNamespace\MyClass $x : &$y |=> $x->Name . $y
etc.
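
For reference, here is a sketch of the long forms these short forms would stand for, assuming the new syntax is a pure one-to-one rewrite of the existing one. The stand-in class and the class_alias() call exist only so that the snippet runs on its own; they are not part of the proposal.

```php
<?php
// Stand-in for \MyNamespace\MyClass so the sketch is self-contained (hypothetical).
class MyClassStandIn { public $Name = 'Ann'; }
class_alias('MyClassStandIn', 'MyNamespace\MyClass');

// ~| &$x |=> ++$x
$increment = function(&$x){ return ++$x; };

// ~| |=> 'A constant'
$constant = function(){ return 'A constant'; };

// ~| \MyNamespace\MyClass $x |=> $x->Name
$name = function(\MyNamespace\MyClass $x){ return $x->Name; };

// ~| \MyNamespace\MyClass $x : &$y |=> $x->Name . $y
$y = '!';
$suffixed = function(\MyNamespace\MyClass $x)use(&$y){ return $x->Name . $y; };

$n = 1;
$r = $increment($n);   // $n is modified in place through the reference
$obj = new \MyNamespace\MyClass();
$y = '?';              // visible to $suffixed, because $y is captured by reference
```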

However, we did not cover the return by reference case. This could be done with the introduction of yet another operator like this:

// Lambda with return by reference
~&| $x |=> $x->Name
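
In today’s syntax, return by reference looks like the sketch below, assuming the short form above simply maps to “function &(...)”. The object and variable names are made up for the example.

```php
<?php
// Long form that the short form above would presumably stand for.
$nameRef = function &($x){ return $x->Name; };

$obj = new stdClass();
$obj->Name = 'old';

// Both the declaration and the call site need "&" to bind a reference.
$ref = &$nameRef($obj);
$ref = 'new';   // writes through to $obj->Name
```

Without the “&” at the call site, $ref would receive a copy and the object would stay unchanged.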

Finally, there are cases where the single statement rule is not convenient. So maybe we should also allow the return expression to be replaced by the standard statement block:

// Lambda with statement block
~| $x |=>{ $t = $x->GetTag(); return is_null($t) ? '' : $t->Name; }
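
The long form of this lambda can be run today. In the sketch below, the classes Tag and Item are hypothetical and exist only to give GetTag() something to return.

```php
<?php
// Hypothetical classes, used only to exercise the closure.
class Tag {
    public $Name;
    function __construct($name){ $this->Name = $name; }
}
class Item {
    private $tag;
    function __construct($tag = null){ $this->tag = $tag; }
    function GetTag(){ return $this->tag; }
}

// ~| $x |=>{ $t = $x->GetTag(); return is_null($t) ? '' : $t->Name; }
$tagName = function($x){ $t = $x->GetTag(); return is_null($t) ? '' : $t->Name; };

$with    = $tagName(new Item(new Tag('php')));  // 'php'
$without = $tagName(new Item());                // ''
```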

Conclusion

The final syntax is flexible enough to cover all cases. This does not mean that it should replace the longer form altogether. It has the advantage of being shorter, but it does so by sacrificing some clarity. Choosing between the short and the long form is left to the programmer’s discretion.

I am sure that some people will like the syntax I propose, while some others (probably the majority) will not. It is a matter of taste, anyway. Given the constraints I have described, there could be better solutions. I am looking forward to hearing some of them.

As for me, now that I have something that compiles and passes the tests, I feel more comfortable submitting a proposal to be considered by the core group. Wish me luck.
