The slopes of these red lines, g, are the sub-gradients of relu(z) at z=0, and we assume g is uniformly distributed! Recall that for a random variable g with probability density p(g) on the interval [a,b], the expectation is:
$$\mathbb{E}[g] = \int_a^b g \, p(g) \, dg$$
And when g is uniformly distributed in the interval [a, b], p(g) must be:
$$p(g) = \frac{1}{b-a}, \qquad a \le g \le b$$
This way, the area under the curve is exactly 1. More specifically, this area is a rectangle whose length is $b-a$ and whose width is $\frac{1}{b-a}$, and clearly its area is 1:
$$(b-a) \cdot \frac{1}{b-a} = 1$$
Otherwise, p(g) could not be a valid probability density! In our case, the interval [a,b] is [0,1], since that is the range of possible slopes for these sub-gradients.
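As a quick sanity check of the normalization argument, here is a minimal Python sketch that approximates the area under p(g) = 1/(b-a) with a midpoint Riemann sum. The helper names `uniform_pdf` and `area_under_pdf` are hypothetical, chosen just for this illustration. The area should come out as (approximately) 1 for any interval, including our [0,1].

```python
def uniform_pdf(g: float, a: float, b: float) -> float:
    """Density of the uniform distribution on [a, b]; zero outside."""
    return 1.0 / (b - a) if a <= g <= b else 0.0


def area_under_pdf(a: float, b: float, n: int = 10_000) -> float:
    """Approximate the integral of p(g) over [a, b] with the midpoint rule."""
    width = (b - a) / n
    # Evaluate the density at the midpoint of each of the n sub-intervals.
    return sum(uniform_pdf(a + (i + 0.5) * width, a, b) for i in range(n)) * width


# The rectangle has length (b - a) and width 1 / (b - a), so its area is 1.
print(area_under_pdf(0.0, 1.0))
print(area_under_pdf(-2.0, 3.0))
```

The midpoint rule is exact for a constant density up to floating-point rounding, so any tiny deviation from 1 is just round-off.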

Now, back to computing the expectation of these bloody sub-gradients (i.e., the slopes of the infinitely many red lines at z=0), represented by g:
$$\mathbb{E}[g] = \int_a^b g \, p(g) \, dg = \int_a^b \frac{g}{b-a} \, dg$$
From there let’s replace [a,b] with [0,1]:
$$\mathbb{E}[g] = \int_0^1 \frac{g}{1-0} \, dg = \int_0^1 g \, dg$$
So:
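$$\mathbb{E}[g] = \left[\frac{g^2}{2}\right]_0^1 = \frac{1^2}{2} - \frac{0^2}{2} = \frac{1}{2} = 0.5$$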
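The expectation E[g] for g uniform on [0,1] can also be checked numerically. This is a minimal sketch with hypothetical helper names: one function integrates g · p(g) with the midpoint rule, and the other draws random slopes uniformly from [0,1] (a Monte Carlo estimate) and averages them. Both should land near 0.5.

```python
import random


def expected_slope_integral(a: float = 0.0, b: float = 1.0, n: int = 100_000) -> float:
    """Midpoint-rule approximation of E[g] = integral of g * 1/(b - a) over [a, b]."""
    width = (b - a) / n
    density = 1.0 / (b - a)
    return sum((a + (i + 0.5) * width) * density for i in range(n)) * width


def expected_slope_monte_carlo(a: float = 0.0, b: float = 1.0,
                               samples: int = 200_000, seed: int = 0) -> float:
    """Average of slopes g drawn uniformly from [a, b] (Monte Carlo estimate of E[g])."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    return sum(rng.uniform(a, b) for _ in range(samples)) / samples


print(expected_slope_integral())      # close to 0.5
print(expected_slope_monte_carlo())   # close to 0.5, up to sampling noise
```

The midpoint rule is exact for the linear integrand g, so the first estimate differs from 0.5 only by round-off, while the Monte Carlo estimate fluctuates with the random seed.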
So, the expected value of the sub-gradient, taken over the infinitely many candidate sub-gradients, is 0.5! I know! We ended up with the midpoint of the range [0,1]. However, I find the “expectation” argument more convincing than the “midpoint” argument. So, when z=0 (as rare as that is), define the candidate sub-gradient to be:
$$\mathrm{relu}'(z) := \begin{cases} 0, & z < 0 \\ 0.5, & z = 0 \\ 1, & z > 0 \end{cases}$$
Moral of the story: you can choose any value in the range [0,1] and your ANN will still train. However, I like my expectation argument, as it gives a consistent rationale rather than just picking an arbitrary value!
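Here is a minimal Python sketch of how this choice could look in code. The function names and the `at_zero` parameter are hypothetical, not taken from any library: the sub-gradient at z=0 is a parameter restricted to [0,1], with 0.5 (the expected slope) as the default.

```python
def relu(z: float) -> float:
    """relu(z) = max(0, z)."""
    return max(0.0, z)


def relu_subgradient(z: float, at_zero: float = 0.5) -> float:
    """Derivative of relu for z != 0; at z == 0, return a chosen sub-gradient.

    `at_zero` must lie in [0, 1], the set of valid sub-gradients of relu at 0.
    The default of 0.5 is the expected slope of a uniformly distributed g.
    """
    if not 0.0 <= at_zero <= 1.0:
        raise ValueError("at_zero must be in [0, 1]")
    if z > 0.0:
        return 1.0
    if z < 0.0:
        return 0.0
    return at_zero


print([relu_subgradient(z) for z in (-2.0, 0.0, 3.0)])  # [0.0, 0.5, 1.0]
```

Passing `at_zero=0.0` or `at_zero=1.0` gives the other obvious choices; any value in between is also a valid sub-gradient, which is exactly why the network still trains.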