Late static binding in C#

For the last couple of years I have been working in PHP, though not by choice, and naturally I curse all the time about the lack of all those little things that are taken for granted in C#. Today, however, I am going to talk about Late Static Binding, a feature of PHP 5.3 that does not exist in C#.

Late Static Binding, to put it simply, is the ability to override static functions. It is the static counterpart of late dynamic binding, i.e. the ability to override methods, which is a key feature of object-oriented programming.

Let’s see an example.

// PHP Code:
class A {
  public static function GetName(){
    return 'A';
  }
  public static function SayHello(){
    return 'Hello from '.static::GetName();
  }
}

class B extends A {
  public static function GetName(){
    return 'B';
  }
}

A::SayHello();  // returns 'Hello from A'
B::SayHello();  // returns 'Hello from B'

This seems to be very natural. Had I used dynamic methods, that would have been straightforward, because dynamic methods are always late-bound. However, I want to use static functions. This is something that is not supported by many languages. In PHP, this behavior is possible with the use of the static:: scope, which is late-bound.

On the other hand, this is not possible in C#. Let’s translate the code, word for word:
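
To make the difference concrete, here is a minimal sketch that puts the early-bound self:: scope next to the late-bound static:: scope. The class names Base and Child and the two Hello methods are made up for this illustration and are not part of the example above.

```php
<?php
// Hypothetical classes, used only for this sketch.
class Base {
    public static function GetName() {
        return 'Base';
    }
    // self:: is bound to the class in which the method is written.
    public static function EarlyHello() {
        return 'Hello from ' . self::GetName();
    }
    // static:: is bound at run time to the class named in the call.
    public static function LateHello() {
        return 'Hello from ' . static::GetName();
    }
}

class Child extends Base {
    public static function GetName() {
        return 'Child';
    }
}

$early = Child::EarlyHello();  // 'Hello from Base'
$late  = Child::LateHello();   // 'Hello from Child'
```

The only difference between the two methods is the scope keyword, which is exactly the part that C# has no equivalent for.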

// C# Code:
public class A {
  public static string GetName(){
    return "A";
  }
  public static string SayHello(){
    return "Hello from " + GetName();  // here is the problem
  }
}

public class B : A {
  public new static string GetName(){  // "new" makes the hiding of A.GetName() explicit
    return "B";
  }
}

A.SayHello();    // returns "Hello from A"
B.SayHello();    // returns "Hello from A"

Oops! The static function GetName() is early bound, which means that it is always resolved as A.GetName(), even if it is called from the static context of the class B.

A pattern to the rescue

The problem is simple. The base class does not know which is the current static context and it always assumes it to be A.

This can be solved with generics. The current static context can be passed as a generic parameter to the base class, so the class A can become A<T>. The generic parameter T is the current static context. So the class B has to extend A<B> because we want all the static calls to be bound to B.

However, adding a generic parameter to a class changes the class signature. In other words, A is not the same as A<T>. Yet, in C#, it is possible to have both! So, the class A has to extend A<A>, because we want all the static calls to be bound to A! Remember that this is possible because A is not the same as A<T> and nothing forbids us from setting T to A.

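Furthermore, the parameter T is constrained to derive from A<T>, which means that it follows the interface of A<T>. That’s exactly what we want to achieve. All static functions inside A<T> can be resolved from the static context of T. There is one catch: C# does not allow a static member to be called on a type parameter, so T.GetName() is rejected by the compiler. So, instead of calling GetName(), which is early bound, we look up GetName() on typeof(T) at run time with reflection.

All these may sound like a vicious circle, but the following code actually compiles and gives the solution we want.

// C# Code:
public class A<T> where T : A<T> { // vicious but compiles
  public static string GetName(){
    throw new System.Exception("Not implemented");
  }
  public static string SayHello(){
    // T.GetName() is not valid C#, so GetName() is resolved on T through reflection
    return "Hello from " + (string)typeof(T).GetMethod("GetName").Invoke(null, null);
  }
}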

public class A : A<A> {  // cool, right?
  public new static string GetName(){  // "new" makes the hiding explicit
    return "A";
  }
}

public class B : A<B> {
  public new static string GetName(){
    return "B";
  }
}

A.SayHello();   // returns "Hello from A"
B.SayHello();   // returns "Hello from B"

So, the trick here was to separate the class into two parts. The first part will have a generic parameter which represents the current static context and will contain all the static functions. The other part will be the ordinary class which will continue to work as if nothing has changed.

Stay tuned, in the next post I am going to analyze a trickier example with more levels of inheritance.

Short closures for PHP: An implementation

In this article, I present an implementation of an alternative short syntax for closures in PHP. I believe it to be an elegant solution and so I have begun writing a proposal for inclusion in the next version of PHP.

What it does

My implementation provides an alternative short syntax for anonymous functions and closures, focusing on those that have a single return statement.

// Current syntax:
$f = function( $x ){ return $x->Name; };
$g = function( $x )use( $y ){ return $x==$y; };

// Proposed alternative syntax:
$f = | $x |=> $x->Name;
$g = | $x : $y |=> $x==$y;

All in all, I have tried to eliminate the syntax noise by reducing the keystrokes in the non-significant parts of the expression:

function(      9 chars --->      |      1 char
)use(          5 chars --->      :      1 char
){return       8 chars --->      |=>    3 chars
;}             2 chars --->             0 chars
              --------                  -------
              24 chars                  5 chars

Advantages (besides fewer keystrokes)

  1. The code is more readable (more on this below).
  2. The syntax is backwards compatible. The short lambda and the longer equivalent are totally interchangeable.
  3. Variable scoping mechanics have not changed at all. There is nothing new here but syntactic sugar.
  4. Lambda expressions are searchable. The |=> operator is unique.
  5. The |=> operator is similar to =>. That’s good because they both relate to mapping. In addition, the : operator is similar to :: and that’s also good because they both relate to scoping.
  6. Although it is not the primary use case, the proposed implementation supports type hinting, passing by reference and return by reference just like the longer alternative. There is also an option to fall back to a statement block if the single return statement is not enough.

Some disadvantages

  1. The short syntax is clear and readable, but can become confusing when used in the wrong places. Yet, isn’t this the case with every feature?
  2. A short lambda without arguments ( ||=>… ) conflicts with the logical OR operator ( || ), resulting in a syntax error. A way to avoid this is to insert whitespace between the two pipes, but this breaks the invariant that whitespace is not significant in PHP. This is a minor problem and there are ways to deal with it at the compiler level.

Readability

I want to dwell on readability a little longer, because that was one of the reasons that a similar proposal for a short array syntax was rejected.

I maintain that the syntax I propose is not just some Perl-like hocus-pocus to avoid writing some extra bytes of code, but it is actually more readable than the old one. This is because it focuses on what to do and not on how to do it.

Let’s try to translate the above examples into plain English.

function( $x ){ return $x->Name; }

A function that takes an argument $x and returns its name

| $x |=> $x->Name

Given an $x, get its name

How is this different? First of all, because I don’t care whether this is a function or not. All I want is the name of a given $x. Of course, in order to get the name, a callback function is going to be called for the argument $x, but this is just a technical detail that is not the most important thing in my clause.

The same goes for the second example:

function( $x )use( $y ){ return $x==$y; }

A function that takes an argument $x, captures the variable $y and returns whether these two are equal.

| $x : $y |=> $x==$y

Given an $x for the current $y, see whether these two are equal.

A more complicated example

$add = function( $x ){
  return function( $y )use( $x ){
    return $x+$y;
  };
};

A function that takes an argument $x and returns another function that takes an argument $y, captures the first argument $x and returns their sum.

Now, that sounds complicated, doesn’t it? But it is not. It is a typical application of currying that you can find in the first pages of any functional programming tutorial. Yet, there is so much syntax noise here that the pattern is hardly visible.

With the new short syntax, the above example could be rewritten like this:

$add = |$x|=> |$y:$x|=> $x+$y;

Given an $x, wait for a $y for this $x, to get their sum.

Anyone with a little exposure to functional languages can immediately recognize the pattern.
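
For readers who have not met the pattern before, here is a small sketch of how the curried function is used. It is written in the existing long syntax, since that is the only one that runs today. The names $addFive, $a, $b and $c are made up for the illustration.

```php
<?php
// The curried function from the example above, in the existing long syntax.
$add = function ($x) {
    return function ($y) use ($x) {
        return $x + $y;
    };
};

// Supplying the first argument yields a new function that remembers it.
$addFive = $add(5);
$a = $addFive(3);   // 8
$b = $addFive(10);  // 15

// Both arguments can also be supplied in one expression (PHP 7 or later).
$c = $add(1)(2);    // 3
```

Each call to $add produces an independent closure, so $addFive keeps its own copy of $x no matter how many other adders are created.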

Some real world applications

In a recent article, I presented three cases that could benefit from a shorter closure syntax. With the current syntax they look like this:

// Case 1: Filtering (with a LINQ-style Iterator)
$result = $a->Where(
  function($x)use($name){
    return $x->Name == $name;
  });

// Case 2: Mapping (with a LINQ-style Iterator)
$result = $a->Select(
  function($x){
    return $x->Name;
  })
  ->Join(', '); 

// Case 3: Null avoiding (with a Maybe Monad)
$result = $x->Select( function($x){ return $x->Name; } );

With the new syntax, we can have this code:

// Case 1: Filtering
$result = $a->Where( |$x:$name|=> $x->Name==$name ); 

// Case 2: Mapping
$result = $a->Select( |$x|=>$x->Name )->Join(', ');

// Case 3: Null avoiding
$result = $x->Select( |$x|=>$x->Name );

Conclusion

I believe that PHP’s readability will be improved with the proposed syntax and this is why I will try to convince the developers of PHP to include it in the next version.

On the syntax of closures in PHP, part 2

My last post on the length of closures in PHP provoked a constructive debate on reddit. Several opinions were expressed, positive and negative. I want to dwell on just one comment:

So, why not take the next step and prepare an actual (and completer) RFC that the core group might even consider for inclusion?

Challenge accepted!

Well, this is easier said than done. Making a proposal at this level requires that you have at least implemented something, tested it and seen that it works and does not break anything else. So, I downloaded a snapshot of PHP’s source code, set up my environment and started experimenting.

Before going any further, I have to admit that this was my first exposure to PHP’s source code and that I have never had any professional or academic experience with real world compilers. So, I had to tame the bison (that’s the name of the parser generator that PHP uses). But, anyway, with the help of Stack Overflow and other online material, I managed to implement a draft that works.

Defining the goal

In the previous article, I analyzed why a shorter syntax for a specific subset of anonymous functions is necessary. Today, I am going to implement such a syntax without questioning its necessity at all.

I am particularly interested in those functions that:

  1. Given an input, return an output and do nothing else in between (I am not interested in event handlers or callbacks in general).
  2. Usually have only one statement (which is a return).
  3. Usually take only one argument.

Here are some examples of such functions:

// A mapper:
function($x){ return $x->Name; }       

// A comparer:
function($x,$y){ return $x-$y; }

// A predicate:
function($x)use($y){ return $x==$y; }
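
To show where such functions typically end up, here is a short sketch that hands each of the three to a built-in PHP array function. The sample data ($people, $numbers) is invented for the illustration.

```php
<?php
// Hypothetical sample data: objects with a Name property.
$people = array(
    (object) array('Name' => 'Ann'),
    (object) array('Name' => 'Bob'),
);

// A mapper, passed to array_map:
$names = array_map(function ($x) { return $x->Name; }, $people);
// $names is array('Ann', 'Bob')

// A comparer, passed to usort:
$numbers = array(3, 1, 2);
usort($numbers, function ($x, $y) { return $x - $y; });
// $numbers is array(1, 2, 3)

// A predicate, passed to array_filter (array_values re-indexes the result):
$y = 2;
$matches = array_values(
    array_filter($numbers, function ($x) use ($y) { return $x == $y; })
);
// $matches is array(2)
```

In all three calls the interesting part is the short expression after return, while everything around it is ceremony.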

Keep in mind that, in a more general form, the current syntax also allows the arguments and the lexical variables to have type hints (simple or qualified) or to be declared to be passed by reference, like these:

// A mapper with type hints
function(MyClass $x){ return $x->Name; }

// A mapper with qualified type hints
function(\MyNamespace\MyClass $x){ return $x->Name; }

// A mapper with qualified type hints and references
function & (\MyNamespace\MyClass &$x){ return $x->Name; }

Some constraints

It is certain that we will have to resort to the introduction of some kind of new symbol. This is not easy, as PHP already uses all the single-character symbols that appear on a standard keyboard. The last one available was “\” which is now used as a namespace separator. That’s a shame given that symbols like “#” or “@” were, in my opinion, wasted. Therefore, any new symbol is bound to have at least 2 characters.

A second constraint here is that the new syntax has to begin with something distinctive, so that the compiler understands that we are entering a lambda parsing mode. Probably, this is a limitation of my poor compiler knowledge, apologies for that. As a result, the ultra short C# syntax “x=>x.Name” is not possible because the distinctive symbol is not at the beginning. Anyway, since compile time and run time are not exactly independent in PHP installations without byte code caching, this limitation turns out to be useful because it could make the compile time shorter by a tiny fraction.

Finally, what I am trying to do is not to change the behavior of the language, but to give an alternative syntax. So, any new syntax will have to behave exactly as the old syntax did. In other words, the lexical variables (the “use” clause) have to stay.
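
As a reminder of the behavior that has to be preserved, here is a minimal sketch of how the “use” clause works today. A variable listed in it is copied when the closure is created, unless it is marked with & to be captured by reference. The variable names are made up for the illustration.

```php
<?php
$y = 1;

// "use ($y)" copies the value of $y at the moment the closure is created.
$byValue = function ($x) use ($y) { return $x + $y; };

// "use (&$y)" binds the variable itself, so later changes are visible.
$byReference = function ($x) use (&$y) { return $x + $y; };

$y = 10;

$first  = $byValue(1);      // 2, still sees the old $y
$second = $byReference(1);  // 11, sees the new $y
```

Any alternative syntax has to keep both forms of capture, otherwise it would not be interchangeable with the long form.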

Let’s see what we can do

If we take a look at the bare characters of the expression, we can separate the useful information from the noise of the syntax.

function($x)use($y){ return $x==$y; }
.........$x.....$y..........$x==$y...
<---A--->  <-B->  <----C--->      <D>

From the above anatomy, it is clear that if we replace the noise with something shorter we will have our problem solved.

First of all, the last segment (D) is totally useless here, since we always have just one statement. We can get rid of it and focus on the other three segments. A first approach could be this:

Attempt 1:    λ( $x | $y ). $x==$y

Cool, but not highly practical unless you are from Greece. However, it shows that A and C can group together the arguments and point towards the result, while B is just a separator. Let’s select symbols for A (the lambda) and C (the dot) first:

Attempt 2:    \\ $x | $y ~> $x==$y

Not good. The first two slashes could be followed by a qualified type hint resulting in three consecutive slashes. Also the curvy arrow does not look strong enough.

Attempt 3:    ?? $x | $y ?= $x==$y

Not bad, I like the ?= symbol as it could be translated as “given something return something”. However, the symbol for lambda is not good because it may follow the ? of a ternary operator resulting in three consecutive question marks.

Attempt 4:    :{ $x | $y }= $x==$y

I like the emphasis in grouping but one would expect to see statements and not arguments between curly brackets.

Attempt 5:    :[ $x | $y ]= $x==$y

This is better, as square brackets are commonly used for mapping. However, the “]=” conflicts with array assignment, not to mention that the lambda looks like a sad face :-( !

Attempt 6:    =| $x | $y |= $x==$y

Still a conflict (with T_OR_ASSIGN), but the face is not so sad any more :-) and it looks similar to Ruby’s lambdas. I don’t like the fact that we have three pipes. Let’s improve it a little bit:

Attempt 7:    =| $x : $y |=> $x==$y

That’s good. It looks like a long arrow! However, we have another conflict this time. Given the sequence ==|, the tokenizer cannot tell whether it stands for T_EQUALS+T_BITWISE_OR or for T_ASSIGN+T_LAMBDA.

Attempt 8:    ~| $x : $y |=> $x==$y

We have a winner! There are no conflicts, the token for the lambda looks like an actual lambda (slightly rotated clockwise), the arguments are nicely grouped and the arrow is strong and very PHP-like.

Code golf

With all white space removed, the new syntax has fewer than half the characters of the initial one. I would also dare to say that it is even more readable.

function($x)use($y){return$x==$y;}
<------------- 34 chars --------->

~|$x:$y|=>$x==$y
<-- 16 chars -->

Edge cases

As we have not changed the actual content, our syntax can be used exactly as the initial one, even in more complex cases, like these:

~| &$x |=> ++$x
~| |=> 'A constant'
~| \MyNamespace\MyClass $x |=> $x->Name
~| \MyNamespace\MyClass $x : &$y |=> $x->Name . $y
etc.

However, we did not cover the return by reference case. This could be done with the introduction of yet another operator like this:

// Lambda with return by reference
~&| $x |=> $x->Name

Finally, there are cases where the single statement rule is not convenient. So maybe we should also allow the return expression to be replaced by the standard statement block:

// Lambda with statement block
~| $x |=>{ $t = $x->GetTag(); return is_null($t) ? '' : $t->Name; }

Conclusion

The final syntax is flexible enough to cover all cases. This does not mean that it should replace the longer form altogether. It has the advantage of being shorter, but it does so by sacrificing some clarity. Choosing between the short or the long form is left to the programmer’s discretion.

I am sure that some people will like the syntax I propose, while some others (probably the majority) will not. It is a matter of taste, anyway. Given the constraints I have described, there could be better solutions. I am looking forward to hearing some of them.

As for me, now that I have something that compiles and passes the tests, I feel more comfortable filing a proposal to be considered by the core group. Wish me luck.

On the syntax of closures in PHP

Let me be straight: I find the syntax of anonymous functions and closures in PHP annoying and disturbing. It may be clear and it may be easy to understand, but it is way too long. Is this a real problem? In my opinion, it is.

Closures are a great weapon for every programmer. They offer a way to encapsulate not only data and states but also common tasks that appear in almost every program, such as filtering, mapping and null checking, which I analyze below.

Yet, the syntax of closures is so long that their application in such tasks makes the code longer and less elegant.

In the following examples I am going to use some structures that I won’t define because their implementation is out of the scope of this article. I want to focus on the application of the structures, supposing that somebody has already done the hard work behind the scenes.
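
For readers who prefer to have something concrete in mind, here is one way such a structure could look. This is only a sketch under my own assumptions: the class name Seq and the ToArray method are made up, and the implementation is eager for brevity. The method names Where, Select and Join are the ones used in the examples that follow.

```php
<?php
// Hypothetical wrapper around an array; the name Seq is made up for this sketch.
class Seq {
    private $items;

    public function __construct(array $items) {
        $this->items = $items;
    }

    // Keeps only the items for which the predicate returns true.
    public function Where($predicate) {
        return new Seq(array_values(array_filter($this->items, $predicate)));
    }

    // Replaces every item with the result of the mapper.
    public function Select($mapper) {
        return new Seq(array_map($mapper, $this->items));
    }

    // Concatenates the items into a single string.
    public function Join($separator) {
        return implode($separator, $this->items);
    }

    public function ToArray() {
        return $this->items;
    }
}

$a = new Seq(array(
    (object) array('Name' => 'Ann'),
    (object) array('Name' => 'Bob'),
));

$names = $a->Select(function ($x) { return $x->Name; })->Join(', ');
// $names is 'Ann, Bob'

$name = 'Bob';
$onlyBob = $a->Where(function ($x) use ($name) { return $x->Name == $name; })
             ->ToArray();
// $onlyBob holds the single object whose Name is 'Bob'
```

Every method except Join and ToArray returns a new Seq, which is what allows the calls to be chained.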

Case 1: Filtering

Suppose that you have a list of objects out of which you want to pick those that meet some criteria. Let’s say that your list is $a and you want to pick only the objects whose Name property is equal to $name.

Here is how you would do that in a traditional imperative way:

// PHP imperative way
$result = array();
foreach ($a as $x)
  if ($x->Name == $name)
    $result[] = $x;

Here is how you would go in a functional way (using some smart data structure):

// PHP functional way
$result = $a->Where(
  function($x)use($name){
    return $x->Name == $name;
  });

As you can see, the code is not at all shorter, not to mention the fact that I had to write $name twice.

Can’t we have something better? Sure, here is the same code in C#:

// C# functional way
result = a.Where(x => x.Name == name);

Case 2: Mapping

This is a common scenario. From a list of objects $a return a comma separated string of all Names.

// PHP imperative way
$b = array();
foreach ($a as $x)
  $b[] = $x->Name;
$result = implode(', ',$b);

Here is the functional equivalent, again with the help of some data structure:

// PHP functional way
$result = $a->Select(
  function($x){
    return $x->Name;
  })
  ->Join(', ');

Once again this is not shorter or simpler…

Of course, in C#, the same code looks effortless:

// C# functional way
result = a.Select(x => x.Name).Join(", ");

Case 3: Null checking

You know what I am talking about, don’t you? Straight into code:

// PHP imperative way
$result = is_null($x) ? null : $x->Name;

The functional way would imply the use of some Maybe Monad:

// PHP functional way
$result = $x->Select( function($x){ return $x->Name; } );

Once more, this is longer and more complex!

Of course, C# is once again better:

// C# functional way
result = x.Select(v => v.Name);  // the lambda parameter cannot reuse the name x in C#

Conclusion

In all three examples, the functional way has many advantages over the imperative one, because:

  1. The desired behavior is abstracted away and therefore there will be fewer points of failure.
  2. The result is given from just an expression and not a block of statements. This makes it easier to apply more methods on top of it, in a continuation.
  3. The functional way is lazy.
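
The third point deserves a small illustration. The sketch below is one possible way to get lazy filtering and mapping in PHP, using generators (which need PHP 5.5 or later). The function names lazyWhere and lazySelect are made up, and this is not the structure used in the examples above.

```php
<?php
// Hypothetical helpers built on generators (PHP 5.5 or later).
function lazyWhere($items, $predicate) {
    foreach ($items as $item) {
        if ($predicate($item)) {
            yield $item;
        }
    }
}

function lazySelect($items, $mapper) {
    foreach ($items as $item) {
        yield $mapper($item);
    }
}

$calls = 0;
$numbers = array(1, 2, 3, 4, 5, 6);

// Building the pipeline runs none of the callbacks.
$pipeline = lazySelect(
    lazyWhere($numbers, function ($x) use (&$calls) { $calls++; return $x % 2 == 0; }),
    function ($x) { return $x * 10; }
);
$callsBefore = $calls;  // 0

// Pulling only the first result stops as soon as one item passes the filter.
$first = null;
foreach ($pipeline as $value) {
    $first = $value;
    break;
}
$callsAfter = $calls;   // 2: the predicate saw 1 and 2, then iteration stopped
```

Nothing is computed until a value is requested, and nothing beyond what is needed to produce that value.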

However, the syntax of PHP’s closures is an obstacle. The programmer is given the choice between simple (but naive) and complex (but state of the art). It does not have to be that way, as C# has shown us!

Anonymous functions and closures have so many applications that I believe a more elegant syntax would benefit everyone.

PHP code completion and PHPDoc

I love code completion. It makes my life easier. I can finish my work faster, not to mention the nice feeling you get when you see it work. It feels like placing the final piece in Tetris.

Yet, when it comes to dynamically typed languages like PHP, its implementation is not easy. The IDEs cannot derive all the required information from the code, so they have to resort to other techniques. One of the most common is to take advantage of annotated comments, which in the case of PHP are defined by phpDocumentor, an open source tool.

However, we must never forget that phpDocumentor was not designed with code completion in mind. It’s a topic out of its scope. So, I believe that some extensions to the standard have to be made.

The single most wanted extension is the ability to use the self and the static keywords as legal replacements of class names, the same way this is allowed in PHP code. There are two design patterns that would benefit from it: the Factory pattern and the Fluent Interface.

Simple example:

class Control {

    /** @return static */
    public static function Make() {        // factory pattern
        return new static();
    }

    private $label;

    /** @return static */
    public function WithLabel($value) {    // fluent interface
        $this->label = $value;
        return $this;
    }

    public function Render() {
        // rendering happens here…
    }

}

class Textbox extends Control {

    private $text;

    /** @return static */
    public function WithText($value) {     // fluent interface
        $this->text = $value;
        return $this;
    }

}

And here is how one could use these classes:

Textbox::Make()           // <-- late static binding, returns Textbox
    ->WithLabel('foo')    // <-- late dynamic binding, returns Textbox
    ->WithText('bar')     // <-- normal binding, returns Textbox
    ->Render();

Currently, IDEs have no way to determine the return type of the first call, and so cannot offer code completion for any of the following calls.

Many suggest the use of an undocumented feature of phpDocumentor, which is @return $this. This idea has some flaws:

  1. It is not semantically correct, because $this stands for an object, not for a class.
  2. It cannot be used in static methods.
  3. The current implementation incorrectly substitutes it with the method’s class (that is the class self) without taking into account late binding (that would result in the class static).
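
The difference between the two keywords is easy to see at run time. Here is a minimal sketch, with the hypothetical classes Widget and Button standing in for Control and Textbox:

```php
<?php
// Hypothetical classes, used only for this sketch.
class Widget {
    // new self() always creates the class in which the method is written.
    public static function MakeSelf() {
        return new self();
    }
    // new static() creates the class on which the method was called.
    public static function MakeStatic() {
        return new static();
    }
}

class Button extends Widget {
}

$viaSelf   = get_class(Button::MakeSelf());    // 'Widget'
$viaStatic = get_class(Button::MakeStatic());  // 'Button'
```

An annotation that can only say self would describe the first result correctly but not the second, which is the one the factory and the fluent interface rely on.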

Let’s hope that the inclusion of self and static will be implemented soon to make life easier by a tiny bit.

A PHP maybe monad

I believe that the single most frequent bug in almost all applications is the Null-Reference-Exception bug, followed by the Unhandled-Exception bug. And I think that both of these problems have their origin in the way we write code. Take for instance the following piece of code:

echo $x->GetType()->GetName();

This simple code is evil! What happens if no “Type” object is found? There are two usual approaches. Either GetType will return a null value or it will throw an exception.

In the first case, the correct code is boring:

$type = $x->GetType();
echo is_null($type) ? 'Untyped' : $type->GetName();

In the latter one, the correct code is even more boring:

try {
    echo $x->GetType()->GetName();
}
catch (NotFoundException $ex){
    echo 'Untyped';
}

Because all three forms are accepted by the language, we tend to use the first one because it is simpler, even though it is error prone!

There is an answer coming from the functional programming world that can change that. It’s called “monads” and can enforce some discipline in our code. Here I will try to handle the Null-Reference-Exception problem by using an imitation of the Maybe monad for PHP.

Let’s say that the GetType method returns null if no “Type” is found. The Null-Reference-Exception problem is caused by the fact that GetType returns either some Type object or a null. What if we changed that? Let’s make the GetType method always return an object of a specially crafted class called Maybe:

class Maybe {
    private $object;
    public function __construct($object=null){
        $this->object = $object;
    }
    public function IsNull(){
        return is_null($this->object);
    }
    public function Select( $function ){
        if (is_null($this->object))
            return $this;
        else
            return new Maybe($function($this->object));
    }
    public function GetValueOr($default){
        if (is_null($this->object))
            return $default;
        else
            return $this->object;
    }
}

Have you noticed that there is no exposure of the object without checking for null first?

Now let’s say that GetType returns a Maybe object. Our final code will become like this:

echo $x->GetType()
  ->Select( function($x){ return $x->GetName(); })
  ->GetValueOr( 'Untyped' );

It’s a little bit long, but this is because PHP is verbose when it comes to anonymous functions. Yet, the cool thing is that there is no way to write something shorter that could hide a Null-Reference-Exception problem.

PHP is a dynamically typed language, so compile-time type checking is not an option. No bug can be seen until run time. Still, with the use of this pattern, we made the error-prone code produce an error every time it is called, and not only some of the times. This, in my opinion, is a big advantage.
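
To see the pattern work end to end, here is a self-contained sketch. The Maybe class is repeated in a trimmed form so that the snippet runs on its own. The classes Item and ItemType are hypothetical stand-ins for the object $x and its “Type”, and are not part of the code above.

```php
<?php
class Maybe {
    private $object;
    public function __construct($object = null) {
        $this->object = $object;
    }
    public function Select($function) {
        if (is_null($this->object))
            return $this;
        else
            return new Maybe($function($this->object));
    }
    public function GetValueOr($default) {
        if (is_null($this->object))
            return $default;
        else
            return $this->object;
    }
}

// Hypothetical stand-ins for the objects discussed above.
class ItemType {
    public function GetName() { return 'Book'; }
}

class Item {
    private $type;
    public function __construct($type = null) { $this->type = $type; }
    // Always returns a Maybe, never a bare null.
    public function GetType() { return new Maybe($this->type); }
}

$getName = function ($t) { return $t->GetName(); };

$typed   = new Item(new ItemType());
$untyped = new Item();

$a = $typed->GetType()->Select($getName)->GetValueOr('Untyped');    // 'Book'
$b = $untyped->GetType()->Select($getName)->GetValueOr('Untyped');  // 'Untyped'
```

Note that calling GetName() directly on the result of GetType() now fails for both items, typed or not, because Maybe has no such method.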

Let’s make a language!

The other day, I found an old book (1988) with the title “LET’S BUILD A COMPILER!” by Jack Crenshaw. I loved the way it demystified the art of crafting a language from scratch. It’s like filling a gap between theoretical computer science papers and day to day programming.

So to cut the story short, I am thinking of making a language. Just for the fun of it. I don’t have illusions that this is going to be the next C#. I do this just to help me understand why some things in programming work the way they do.

So, which will be the characteristics that I would like this language to have? I have to admit that this is still kind of blurry in my mind. Let’s write one or two points to clear it out:

  1. It will compile into PHP! Strange thought, right? Well, I have my reasons. PHP is not only a language but also a virtual machine. Actually, PHP files can be compiled into bytecode that will be executed by the engine, just as Java files are compiled into JVM bytecode or C# files are compiled into CIL bytecode. The difference is that there are many languages that target the JVM (for example Scala) and many languages that target the CLI (for example VB.NET, F# etc). PHP is a language with many flaws which is however ubiquitous. What if there was an alternative?
  2. It will be strongly typed. I have watched the battle between strongly and loosely typed languages for years and I see myself on the side of strongly typed languages. My point of view is that, should there be a bug that the compiler can find, let it find it!
  3. It will have strong support for metaprogramming. I will explain that in a later post.
  4. I want to test things from an engineering point of view. I do not care either for performance or mathematical proofs.
  5. Syntactic sugar is important. I want to make the language as DRY as possible.

Of course, all these are rather ambitious. All in all, I want to run my experiments and maybe gain some knowledge out of it.