Friday, May 1, 2015

Optimizing your JavaScript with Functional Programming

Overview

Generally speaking, there are many ways to optimize your code. For instance, you could re-evaluate your core algorithms and perhaps try to speed up the execution of some loops. Also, you can adopt some best practices like avoiding large string concatenation or excessive use of closures. But, instead of reducing the execution time of your functions, what if you could avoid calling unnecessary functions? Being that JavaScript is a functional programming language, there are a couple of techniques we can use to reduce the execution time of your programs:

  1. Lazy Evaluation
  2. Memoization

Before we get started, here are some concepts you will need to understand in order make sense of the rest:

Pure functions: A function is pure when it has no side effects, i.e does not read/modify any external data.

Referential transparency (RT): Piggy-backing on pure functions, RT functions are ones which compute their return values based solely on the input provided. It's the closest thing to a mathematical function. Because its result is consistent with respect to the input, RT functions are very easy to reason about and test.

Now, Let's jump in:

Lazy Evaluation

In JavaScript, function composition plays an important part in implementing lazy evaluation. Lazy evaluation is a “call by need” execution strategy whereby a function is invoked only when needed. Non-strict languages like Haskell have built-in lazy evaluation consistently in every call; however, JavaScript is a strict language, which means that all functions are evaluated eagerly. Consider the following example: Suppose we wanted to extract a finite subset of numbers from an infinite set. What? Bear with me. We can create a function called range(begin, end) which creates an array of whole numbers from a specified beginning to end:


var firstTenItems = range(1, Infinity).take(10);

firstTenItems();//-> 1, 2, 3, 4....10?

You might wondering: how can this function call ever terminate if we are generating an infinite list of numbers? In a non-strict language like Haskell, we could write an equivalent program that will actually terminate and take the first 10 items, efficiently. In JavaScript, this similar program will generate an infinite loop since it will try to eagerly evaluate the range function first. Because lazy evaluation is not built-in to JavaScript, the only alternative is to emulate it with an API, which is based on functional composition in order to circumvent JavaScript’s eager evaluation mechanism at its core by a functional library called Lazy.js.


    var evenNumbers = Lazy.generate(function (x) {           
        return 2 * x;
    }); //-> 2, 4, 6, 8,...

    evenNumbers.take(300).each(doWork);                      

The Lazy.generate function is essentially a stream data structure. A stream is an array-like data structure that behaves very closely to a linked list (in a language like Java). It can be used to generate an infinite amount of elements. While this sounds bizarre, behind the scenes it is the power of lazy evaluation via function composition that allows this to happen. The code previously shown is a classical problem of separating a function that can generate infinite number of possible values, and a selector that chooses the appropriate ones.

Using Lazy.js allows us to modularize this program effectively as the initialization of the number stream is deferred until the each function is called. Without it, you would be forced to write code like looks similar to:

var threshold = 10;
for (var i = 0; i < Infinity; i+=2) {
  if (i >= threshold) {
      break;
  }
  else {
    var item = items[i]; 
    
    // do work on item
  }
}
This one-off code isn’t reusable at all, takes longer to write and it’s harder to maintain. Using Lazy.js we will avoid writing code like this in favor of a more declarative approach that takes advantage of lazy evaluation. Because it avoids unnecessary computations (or at the very least defer to only when needed), lazy evaluation has also shown to speed up the execution of JavaScript programs.

Memoization

Another form of laziness is by avoiding computing a function call altogether. In traditional object-oriented systems this is accomplished by placing a cache or proxy layer that is checked before a function is called. Upon return, the result of the function is given a key that references it uniquely and this key-value pair is persisted in the cache.

Functional languages implement a caching mechanism akin to the object-oriented model but built-in to the platform so a function’s result is stored and can be retrieved when calling it again with the same input. This technique is called Memoization and it relies on the notion of referentially transparent functions. Recall that pure functions always return the same value on same inputs.
Through Memoization the set of input parameters are encoded as the key to a function’s result, internally cached, and looked up when the function is invoked again on the same input.

Consider a function that computes a secure hash based on a user’s unique username and password. Depending on the type of algorithm used, this function can be very expensive to call. Generally, with secure encryption the more intense the algorithm the stronger the hash will be.

function generateSecureHash(username, password) {
 // run expensive algorithm

 return hash;
}
generateSecureHash('jim', 'Jim@Pwd!');
Based on the premise of pure functions, calling generateSecureHash again on this same input bypasses the computation and returns the memorized result. The effectiveness of Memoization is directly proportional to the level modularity in your code. As you decompose big tasks into a greater number of smaller, single-purpose functions, Memoization can be used to achieve a higher level of fine-grained caching.

Like lazy evaluation, Memoization is not a native JavaScript feature, but can be installed by augmenting JavaScript’s extensible Function object, and looks like this:

var computeFactors = (function factors(num) {
 
// run algorithm

 return factors;
}).memoize();

computeFactors(100); //-> [1,2,4,8,16,32,64]

Again, subsequent calls with 100 will be lazy and just return the memorized result. The beauty of this level of caching in functional languages and in JavaScript is can be implemented without adding any boilerplate code of placing cache checks/adds all over our functions. Here's a look at implementing memoization by extending the Function object:


Function.prototype.memoized = function (key) {

        if (this.length === 0) {
            console.log("Cannot memoize function call with empty arguments");
            return;
        }
        else {
            // Handle identifiable objects
            if (key.getId && (typeof key.getId === 'function')) {
                // Extract objs ID
                key = key.getId();
            }
            else {
                // Last resort, handle objects by JSON-stringifying them
                if (!isNumeric(key) && !isString(key)) {
                    // stringify key
                    key = JSON.stringify(key);
                }
            }
        }

        // Create a local cache
        this._cache = this._cache || {};
        var val = this._cache[key];

        if (val === undefined || val === null) {
            //alert("Cache miss, apply function");
            console.log("Cache miss");
            val = this.apply(this, arguments);
            this._cache[key] = val;
        }
        else {
            console.log("Cache hit");
        }
        return val;
    };

    // Enhance Function prototypes

    // Provide ability for functions to memoize (or be proxied)
    Function.prototype.memoize = function () {

        var fn = this;

        if (fn.length === 0 || fn.length > 1)
            return fn;  // only handle functions with a single formal argument

        return function () {
            return fn.memoized.apply(fn, arguments);
        };
    };
Functional languages cede mechanisms for lazy evaluation to the language platform and free developers from having to worry about implementing and testing this. In JavaScript, we can do the same by letting with high-level constructs like compose and memoize handle any optimizations in our code. This is preferable because it means that we are free to test our functions independently of any optimization mechanism. Hopefully we've enjoyed this blog post and begin to embraces functional programming is your programming paradigm of choice! Stay tuned for more!

Resources

  1. http://danieltao.com/lazy.js/

Tuesday, March 3, 2015

Functional Programming - Preserving Type Safety

Overview

Functional programming is a development paradigm centered around the use of functions. It's not just a matter of using functions, but about the perspective and thought process you exercise when faced with a programming problem. In other words, it is a framework for the mind and your code.
The goal of FP is to abstract operations on data with functions in order to maximize reusability and reduce mutation of state and side effects in your application. It so happens that by doing this, your code is much more scalable, readable, and testable.
Functional programming demands that functions have 2 important qualities:
  1. Be first-class citizens
  2. Be of high order
Functions are first-class when they can be created and assigned to a variable. Functions are high-order when they can be used in combination with other functions whether it is passed in as an argument or returned from a function. Let's expand this notion further. When a function is equivalent to a value, it follows a principle of substitutability, which means that a function call's return value can be replaced and inlined into an expression without altering its result. Consider a quick JavaScript example:


   function areaSquare(a) {
        return a * a;
   }

   function volumeCube(areaOfSquare, a) {
        return areaOfSquare * a;
   }

   a = 2;
   area = areaSquare(a);         // -> 4
   volume = volumeCube(area, a); // -> 8
The following expressions are equivalent:

   volume = a * a * a; 

   volume = areaSquare(a) * a; 

   volume = volumeCube(areaSquare(a)); 

Why JavaScript?

Simply put, omnipresence. JavaScript is everywhere. For many years, the language of the web, and now it's also breaking into the server market. Most importantly, JavaScript is a functional language; it actually supports writing in both functional and object oriented styles. This, I argue, should be the only way in which we should be using JavaScript.

Furthermore, JavaScript is a non-statically typed language. Though types are enforced, it is done at runtime and not at compile time. In fact you don't include type information in your code. Considering the invariant that referentially transparent functions need to be predictable and return the same value on same input when called, it follows naturally that functions need to be consistent in the type of value or object they return. Referentially transparent functions have no side effects and depend only on the input provided.

Functions like:

Date.now();

and

   var counter = 0;                    // global
   function incrementCounterTo(num) {
 return counter += num;    
   }

are not considered pure due to side effects that break referential transparency.

Preserving Type Safety


In order to talk about the issue of type safety, consider this code:

   var months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul",
                 "Aug", "Sep", "Oct", "Nov", "Dec"];
   function getMonthName(mo) {
      if (months[mo - 1] !== undefined) {
         return months[mo - 1];
      } 
      else {
         throw new Error("Invalid Month Error!");
      }
   }

It's clear that for every valid month number, this function will always return same month name, which is string. However, for other inputs (mo > 12) this function throws an exception, so type is not preserved across input values in the domain of this function. Exceptions causes the stack to unwind and set external program flags for termination, which cause side effects to occur. Calling this code looks like the following:

    try {
        var m = getMonthName(13);

        // do work with m
    }
    catch (e) {
        return e.message;
    }  

Also, this function violates referential transparency, for you can't under any circumstances inline an exception statement as part of a bigger expression, as it will alter the entire meaning of the expression.

Let's look at a more functional approach that will solve all of these issues.

Optional Data Type

The solution is based on a core functional programming concept called monads. In particular, we will be dealing with the Optional monad, also called the Maybe monad in some functional programming languages. Here is the code for the Optional JavaScript object (some details omitted for brevity):


var Optional = (function () {

    // private constructor
    function Option(val) {
        var _value = val || null;

        // public methods

        this.get = function () {
            if (!this.isPresent()) {
                throw 'NoSuchElementError';
            }
            return _value;
        };
        
        this.map = function (fn) {
            if (!this.isPresent()) {
                return Optional.empty();
            }
            if (typeof fn == 'function') {
                return Optional.ofNullable(fn.call(_value));
            }
            return Optional.ofNullable(_value[fn].call(_value));
        };

        this.getOrElse = function (other) {
            return _value !== null ? _value : other;
        };
    }

    return {

        empty: function () {
            return Object.freeze(new Option());
        },
        
        of: function (val) {
            if (val === null) {
                throw 'NoSuchElementError';
            }
            return Object.freeze(new Option(val));
        },
        
        ofNullable: function (val) {
            var inst = val !== null ? this.of(val) : this.empty();
            return Object.freeze(inst);
        }
    }
})();

There are many ways of doing this, I like the notion of immutability in objects, which is why I choose to freeze objects as they are created. Using Optional we keep referential transparency as well as preserve type safety in our functions. This has many benefits:
  1. Create a clear contract for our functions
  2. Prevent our callers from having to place boilerplate exception handling code into every call
  3. Provide a more succinct and fluent API experience
Let's look at how we can improve our sample problem:

   function findMonthName(months, mo) {
        if (months[mo - 1] !== undefined) {
            return Optional.of(months[mo - 1]);
        }
        else {
            return Optional.empty();
        }
    }

This allows you to be more idiomatic about how you handle failures. Your calling code looks like:

findMonthName(months, 13).getOrElse('No month found');

As you can see this function keeps its contract by returning an Optional object to encapsulate success and failure states.

Here is another use case for this functional pattern. Consider the following simple objects:

    function Address(country) {
        var _country = country;

        this.getCountry = function() {
            return _country;
        }
    }

    function Person(addr) {
        var _address = addr;

        this.getAddress = function() {
            return _address;
        }
    }

    function Customer(person) {
        var _person = person;

        this.getPerson = function() {
            return _person;
        }
    }

It would be nice to be able to access the address of a customer as:
customer.getPerson().getAddress().getCountry()
And expect it all to work all the time.....Right! You will probably want to guard against this access. In imperative and object-oriented programming, you will be writing code like this:

    var p = customer.getPerson();
    if(p !== null) {
        var a = p.getAddress();
        if(a !== null) {
            var c = a.getCountry();
            return c;
        }
        return null;
    }
    return null;

Can you spot any issues with this code? For starters, the incessant null checking that occurs only to return null again because there is no other sensible value. So, the caller of this function also must perform another round of null checks.

Let's now see a more scalable and idiomatic approach, the functional way:

   customer.getPerson()
            .map('getAddress')
            .map('getCountry')
            .getOrElse('Unknown');    

If any value is missing in the chain of calls, this will short-circuit and return 'Unknown'. We do this to avoid all of the null checks. All we need to do is tweak our API based on Optional:

   function Customer(person) {
        var _person = person;

        this.getPerson = function() {
            return Optional.ofNullable(_person);
        }
    }

And that's it!
Functional programming is a very important and powerful way of programming. Experts in the field have realized that applications written in a functional style tend to be more expressive, readable, and correct. Referentially transparent functions can have their result values substituted inline in place of the actual function call. Finally, using Optional can enhance your code in many ways by making it shorter to write as well as fluent.

Tuesday, January 6, 2015

Secrets of a Functional Ninja

Overview

JavaScript is a general-purpose, single-threaded, functional programming language for the web. Douglas Crockford in his post, JavaScript: The Most Misunderstood Programming Language, mentions that JavaScript has more in common with functional languages like Lisp or Scheme, than many other popular languages such as C or Java. This notion of "functional" is possible due its very powerful Function object, which is the core of Functional Programming and JavaScript alike.

To become a JavaScript expert, all you need to understand well is the interplay between the core pillars of the language, the rest falls into place:



In this post, I will talk about some of the nice features of the JavaScript Function object. In addition, I will talk about one of the most misunderstood, and yet powerful, features of JavaScript functions, closures.

The Function

According to John Resig in his book Secrets of a JavaScript Ninja, the most important skill that separates average JavaScript programmers from great ones lies in the understanding of functions --the very foundation of the language is based on this notion. Let's look at some of the nice properties of JavaScript functions.

Functions are first-class objects

Functions in JS can be treated just like any other object: they can be assigned to variables, returned from functions, and passed as arguments into other functions. Essentially, a function is a value. The only characteristic that separates a function from an object is that the latter can be invoked, via the ( ) operator.

Functions are JavaScript's "unit of work" (comparable to what classes are in Object Oriented  languages) and the main abstraction layer. All code that gets executed should run within the context of a function. As a result you can safely assume that functions can be nested within functions. Consider the browser's onLoad() function to be equivalent to JavaScript's main method. All code should begin within this function, creating something called the the global context --the window object.


<body onload="main();">

    ...
    <script type="text/javascript>"
      function main() {

          // application code starts here
      }
   </script>   
</body>

Aside from being "callable",  functions, like objects, can have properties and you can dynamically assign properties to it. Properties are equivalent to instance data. Let's look in detail at 2 important function properties: name and length.

Functions have a  name property; omitting it will create an anonymous function. This property is set to the name of the function in the function signature at the time of declaration. Anonymous functions are used typically as assigned to a variable (for later execution), used as immediately invoking functions, or passed in as function arguments in the form of a callback, like the setTimeout() API. Callbacks are very useful when handling asynchronous events: user input, browser events,  intervals, etc.

Another important function property is length which represents the number of named (formal) parameters with which the function was declared. length is used a lot in framework or library code to perform introspection on the function called in order to provide some nice functionality.

console.log(function(a,b) {}.length);  // returns 2

Functions can be invoked in 4 different ways:
  1. As a standalone function. This is the most straight-forward call. It occurs when the ( ) operator is applied to the function name or immediately invoked applied to a un-named function object.
  2. As a method (enabling object-oriented programming). It occurs when the ( ) operator is applied to a property of an object. 
  3. As a constructor (bringing a new object to life). It occurs when preceding the function invocation with the keyword new. In the absence of any explicit return value, the new object is returned. It is best practice to name these functions with a starting capital letter. 
  4. By calling the apply(...) and call(...) methods. Used when the intent is to force the function context to be a certain object. In these calls, the first argument is the owning object. Passing null has the interesting effect of setting the global context (namely, window)  to be the owning object.   

// 1
function myFunc() {
  // do something
}
myFunc();


// 2
var obj = {
   myFunc: function () {
      // do something
   }
};
obj.myFunc();

// 3
function Person(fname, lname) {
   this.firstname = fname;
   this.lastname = lname;
}
var person = new Person();


//4
Array.prototype.slice.call(null, [1,2,3,4.....n]);



All function object invocations (whichever of the 4 categories above) are syntactic sugar for the implicitly created Function() constructor. 

var func = new Function([arg1, [arg2, ..[argN]]], functionBody);

Where functionBody is a string containing all of the statements comprising the function body.

Variadic Functions

At function invocation, function arguments are mapped one-to-one to function parameters. However, there are cases where a user will supply more arguments to a function than the list of parameters it declares to accept.

Unlike statically types languages, JavaScript has no overloaded functions; instead, it handles parameter passing in the following manner:
  1. If more actual arguments are supplied than there are formal parameters, the extra arguments are not given explicit names. JavaScript provides another mechanism to retrieve them implicitly via the arguments object. Oddly so, the arguments object is not a JavaScript array, it is actually an array-like object with a length property and overloaded indexing operator. 
  2. If there are more formal parameters than there are actual arguments supplied to the function call, the extra parameters will be set to undefined
In addition to the implicit arguments object, all JavaScript functions (because they are first-class objects) are passed an implicit reference to this, similar to classes in OO languages.  The this reference is included in the a function's context. Function context is not straight-forward to understand, but it is JavaScript's most important feature, especially when combined with closures. I will explain this more clearly in the next sections to come.

If you come from OO languages, like Java, the this refers to the object within which an instance method is contained. In other words, it is a factor of how and where the function is declared. In JavaScript, it is a factor of how the function is invoked, based on the 4 invocation methods listed previously.

Scope

A function is scoped within the function that created it. If a function or variable is declared outside of a function scope, it will have global scope, which basically just translates into a dynamic property being added to the window object. Also, variables created without the keyword var will automatically be assigned to the global scope. By the way this can be very disastrous as it pollutes the global namespace. I just follow a simple rule, do not modify the global namespace.

To avoid namespace pollution, good JavaScript programmers write their code under some kind of function or object scope. Also, variables in the global scope become victims to "global change;" in other words, they can be changed in any place in the program, causing a debugging nightmare. The JQuery library and many others follow a pattern  called Immediately Invoked Function Expressions (IIFE), similar to:

(function() {

   // declare all variables and functions here   

}).call(this);

Function scope (similar to dynamic scoping explained below), and hence the reference to this, varies depending on the manner with which the function is being invoked, similar to before:
  1. As a standalone function. The function is in global scope, the window object owns it 
  2. As a method. The function becomes a property of said object and this refers to it. 
  3. As a constructor. The this implicitly reference is passed to the function and references the newly created object. All properties of the new object are in reachable via this
  4. By calling the apply(...) and call(...) methods. Both methods accept the owning object, which this will refer to. Essentially, the developers are in control of what the context will be.  
Throughout this post, I have identified several types of scoping rules: 

Global Scope

This is the simplest form of scoping. We covered this in detail in this post. Basically, objects created at the top level whether they are functions or variables have global scope. Also, objects created anywhere without the keyword var take on global scope. The owning object is the window object.

Lexical Scope

In this type, the visibility of a variable is based on its textual position. 


someVar = "Global";

function func() {
   var someVar = "Middle";

   return function(e) {
      var someVar = "Inner";
      alert(someVar); // will print "Inner"
   }
}
Lexical scope starts binding from the inner most scope and works outward. This is the "tightest" scoping mechanism available. JavaScript partially implements lexical scoping, but more appropriately, it implements function scoping, explained later. Languages with lexical scoping suffer from a downside called "Variable Shadowing". As you can see in the code snippet above, the variable someVar was shadowed with different layers all the way down to the innermost scope. In larger code segments, this can prove very hard to trace. 

Dynamic Scope

We covered this in detail throughout this post. The this reference depends on how a function is called: object, function, or global. Using call( )  or apply( ), developers can pass a dynamic reference to act as the context object, or pass null to indicate it belongs to the global context.

One common technique for supplying a scratch-pad to work on for the variable "this" is to pass an empty object into the call method, as such: 

someFunction.call({}, 10000);

Function Scope

Very similar to dynamic scoping. All bindings are constrained by the area bounding it, namely a function. This  is JavaScript's main scoping mechism.

function strangeIdentity() {

   for(var i=0; i<n; i++);  // notice semicolon

   return i;
}
strangeIdentity(138);  // returns 138

JavaScript groups all variable declarations to the top of the function, in a process called hoisting. JavaScript hoists all variable declarations within a function. This is somewhat confusing when compared with languages with block scope.  As good JavaScript developers, it is preferred that variables declared within functions (and within any scope) are done at the beginning.  Let's take a closer look.

Block Scope

Present in most other programming languages. Not implemented in JavaScript. As far as scope is concerned, JavaScript will ignore the curly braces in some situations in favor of function scope.  

function scopeTest() {
   for(var i = 0; i <= 5; i++) {
      var insideFor = i;
   }
   alert(insideFor);
}

In the following code, we would expect the script to fail indicating that the variable insideFor had not been declared. This is not the case, under hoisting this variable declaration was "moved up" to the top of the function and is thus visible within the entire function scope. The variable insideFor is hoisted to the top and given a value of undefined until the initialization happens within the for-loop.

To sum up, scoping can be present in the following ways:
  1. The value of the this binding, in the case of an object context or function context.
  2. The execution context defined by the value of the this binding, for cases when using apply( )  or call( )
  3. The “lifetime” of a variable. Variables exist from the moment they are declared until the function ends. Variables in the global context are accessible anywhere in the script.
  4. Variable value resolution scheme or lexical binding. Depending on where a variable appears, it's scope is bound to the function that contains it. 

Closures

A widely written about and debated topic in the JavaScript community. Closures have been a feature of functional programming languages such as Haskell for quite some time. Using them properly can have immense effects on your code.

Simply put, a closure is a scope that is created upon a function declaration allowing it access to any variables declared outside of the function --or within the enclosing (hence the name) scope.

Obviously, global variables will be included in the mix since as they are explicitly added as properties of the window object. The idea that drives developers crazy and at the same time why this is so powerful is that functions can be called at any point in time, after their declaration. So, access to that information when the function is called is preserved even after that scope has gone away.

Languages that support closures store the state of any accessible variable definitions together with the function declaration at runtime, in the functions stack. Imagine creating a "safety bubble" around all variables present, including function parameters, at the point of function declaration. This bubble will not be garbage collected until the reference to the function is gone.

function createScaleFunction(FACTOR) {
   return function(v) {
      return _.map(v, function(n) {
        return (n* FACTOR);
      });
   };
}


var scale10 = createScaleFunction(10);
scale10([1,2,3]);   //=> [10, 20, 30]

The FACTOR variable in this case is "remembered" or stored as part of the definition which gets returned to the user for later execution.  Notice that when we invoke scale10 the function is has already gone out of scope, which is impressive.  As shown before, another important use of closures is together with IIEFs to create private instance variables. In this code:


(function() {
   var _myVar = {

   };

}).call(this);

The variable _myVar will only be visible within the context or "bubble" of the immediately execution function. 

So, what's up with JavaScript?

Now we take a look at a somewhat obscure and also misunderstood JavaScript keyword, with.  The with statement is used to bound or put all properties of a JS object within the current scope, allowing us to access them without having to quality the owning object.

Support for this keyword is likely to be deprecated in new versions of JavaScript since ECMAScript 5 already prohibits its use under strict mode; however, this statement has been very controversial an debated among JavaScript experts, and you are likely to run into it in written code.

Let's take a quick look:

var someFunc = function () { return "ABC"};

var myObj = {
   someProp = 4,
   someFunc = function () { return "XYZ"};
};
with(myObj) {
   // bound the scope of myObj into the curly braces
   var xyz = someFunc();  // Who owns variable xyz? [window]
   alert(xyz);  // Prints XYZ
}

The following code above will print out "ABC " since we used the with statement to create a scope bound to object myObj. Some uses of with can be:

  1. Shortening references to an object in a deep hierarchy making code syntactically more legible (arguably harder to follow, according to Douglas Crockford)
  2. Exposing properties as top-level to a template

While reading an object's properties in an unqualified manner might look nice at a glance and make code shorter, writing to properties is very ambiguous and not consistent with what you might think. If you are within an object's scope bound by with, you would expect that assigning to or creating a new variable within this scope will implicitly create it under myObj. In other words, variable abc would belong to myObj. This is not the case. It will actually create it in the global context, which is consistent with the scoping rules mentioned previously. Only functions create new scopes, and a with block is no exception.

There is a better alternative to using with, you can use immediate functions as shown below to wrap a block of code and set the proper object scope.

Conclusion

The most powerful and yet complex concept to understand about JavaScript is that functions can be called in many different ways. Function context and scope, including the this reference,  depend on the manner in which function are invoked. While this is very powerful, it can be quite confusing.

In this post we took a brief tour of the JavaScript Function object as well as some inherent characteristics such as scope and closures. It is important to remember that closures should not be overused as they drag with it necessary memory to hold references in a function's outer scope.

Finally we looked briefly at the soon-to-be-deprecated with keyword. This can be used to achieve very powerful things at the expense of compromising readability.

Now that we have covered functions, objects, and scopes in detail stay tuned for my next blog post on Functional Javascript.

Resources

  1. Resig, John and Bibeault, Bear. Secrets of a JavaScript Ninja. Manning Publications
  2. http://javascript.crockford.com/javascript.html
  3. http://yuiblog.com/blog/2006/04/11/with-statement-considered-harmful/

Friday, December 5, 2014

Dart: A Best of Breed

I have playing with Dart quite a bit recently, and I think that the set of features it has makes it a great platform to develop small to large enterprise class applications. I have written code in languages such as Java, PHP, Python, and JavaScript, and I can see that Dart has borrowed many of the cool features of each into one great development platform. In this post I will highlight some of the core features of Dart in comparison to other languages I have used.

Get ready for this mouthful: Dart is an open source, class-based, optionally typed, structured, programming language for rich browser-based applications. Its familiar C-like syntax and scalable programming constructs, make it ideal for building single-page web applications.... ok....

Similar to other platforms such as Google Web Toolkit or CoffeeScript, you can run Dart by compiling to native JavaScript code. This will make adoption of this platform a bit easier for browser vendors other than Google. However, unlike GWT, which compiles for different targeted browsers, Dart is designed to compile your application to a single JS file targeting only the modern browsers.  Dart is optionally typed (also called documentary types), which means you can use var to assign any type of variable: String, Object, int, etc. You can use the static types as well. Documentary types improve tool integration features such as: code completion, compiler validations, and code-level documentation, which makes developer's lives easier. Darts type system can be considered a hybrid between Java's strong, static typing and JavaScript pure dynamic typing. If no type information is provided on a variable, dynamic is assumed.

In addition, Dart has a single-threaded concurrency model based on a concept called Isolates for parallel execution of code. Isolates are great because they don't have a shared-memory model, unlike Java Threads. When compiled to JavaScript, Isolates translate to HTML5 Web Workers. This stateless concurrency model is based on a message-passing middleware used to communicate from one isolate to another. In fact, the Dart runtime starts out (from its main method) within the context of an isolate; essentially, every script containing a "main" method runs in its own isolate.

Dart code can spawn a new Isolate, like Java does a for a new Thread. Isolates are different from threads in that they cannot share memory. They are ideal for creating a plugin architecture since they can be used to load external code dynamically in very much the same way to Python and PHP, but because it run in its own Isolate it will run in its own protected memory space.

On a much smaller scale, Dart also supports the development of Server-Side applications using the Dart Virtual Machine. On the server, Dart features nice I/O libraries for manipulating files and sockets. Most of the Dart libraries work on the server, except for dart:html which is in charge of DOM manipulation.

Dartium, the Dart developer edition of Chromium, is Google's open source version of Chrome. Dartium executes Dart code natively in the browser through an embedded Dart VM. For browsers that do not support Dartium, which is any other browser other than Chrome, there are a set of tools in the Dart ecosystem that might be useful:
  1. dart2js: Compiles all dart files into a single javascript file. 
  2. pub: a package manager much like Maven for Java or Composer for PHP. You can publish your libraries in Github.
  3. dartdoc: generated formatted HTML documentation (like javadoc). dartdoc comments start with "///" instead of "/**"
Like Java, Dart has single inheritance, multiple interfaces model; all classes inherit from Object. You use the keywords extends and implements, respectively. This is similar to Java's and other's inheritance model and is light years better than JavaScrip's prototypal inheritance. 
You can have public and private types and private fields are indicated via the "_" underscore. Other languages such as Python work this way as well. The latter, applies also for methods, functions, and even classes.
The "this" keyword is much more simple to understand as it refers to the instance of the class itself (like in Java) and not to the owner of the class (like in JavaScript).
You can split an application into modules or libraries --or packages. Each library is contained within its own .dart file. Libraries are defined using the "library" keyword --analogous to "package". This is a much better improvement over JavaScript since it suffers a lot from namespace pollution and variable collisions. In Dart, you can prefix (or qualify) libraries in order avoid class collisions when importing 2 libraries. For instance:

import "../my/lib/dart" as mine;

mine.commonFunction("do something"); // this will be unique
Now let's spend some time with a very powerful feature in Dart, functions. Like JavaScript, functions are first-class objects in Dart. Top-Level functions allow developers to pass functions as arguments to other functions, assign functions to variables, and dynamically call functions by name. In addition, function arguments (both objects and primitives) are passed by reference.

Dart provides a cool feature called Factory Constructors that allows class designers control the concrete implementation class for a given interface. This is really useful when you intend to have one implementation of an interface, but also to hide the underlying implementation of a class. In Java, you would have to implement it via a Factory Method pattern. In fact, String and int are implemented this way, via factory constructors. For instance:


abstract class IButton {
  
  click();

  factory IButton() {
     return new GenericButton();
  }
}

class GenericButton implements IButton {
  
   click() {
      return "Button has been clicked!";
   }

}

class CloseButton implements IButton {
  ...
}
Another cool thing about constructor methods in Dart are the initializer parameters:

class MyClass {

   var color;
   
   MyClass(color) {
      this.color = color;
   }
}

// you can also use

   MyClass(this.color) {

   }

Dart has several notations to express functions: long notatin and and short notation (similar to a Java8 lambda expression). Here is an example using short-hand syntax for getters and setters:


class MyClass {

   var _color;  // private property

   Color get color => _color;
  
   set color(Color c) => this._color = c;
   
}   

All functions in Dart return a value. Short-hand functions always return the result of evaluating the expression. Default is null. Long notation will employ the use of the "return" keyword just like any other language. You may also qualify a function as returning void. 

Dart supports closures, just like in Javascript. Closures close -over (or wrap) any local variables defined at the time of a function declaration.

Finally, Unit Testing Dart applications is easy with the built-in unittest library. Creating a test is nothing more than declaring a test function and using the Expect class to define assertions in your code. For instance:


test ("Test name", () {
   
   // anonymous function containing test body
   Expect.isNotNull(someObj);

});

Resources

  1. Dart In Action.
  2. dartlang.org

Monday, November 24, 2014

Apache Mahout: Clustering

Overview

People tend to naturally and unconsciously group things together. For instance, honey and sugar are "sweet".  When we taste items exhibiting a similar taste, we immediately describe it as "sweet."

This activity that we as humans do so quickly and instinctively is called clustering, or grouping.

The components involved in a Mahout clustering solution contain:

  1. An algorithm implementation used to perform grouping
  2. A notion of similarity and dissimilarity 
  3. A stopping condition representing when groups can no longer be formed
Circles are a good way to visualize clustering:



The center of the circle is defined as the centroid, or mean (average) of a cluster. This is the point whose coordinates are the average of the xy coordinates in the cluster. The closer items are to a cluster's center, the more similar they are.

Clustering is all about finding the similarities between 2 points or 2 items in the xy plane. Mahout contains a few clustering algorithms such as: k-means, fuzzy k-means, and canopy. In this case, "k" refers to the number of clusters to form. In the example above, "k" has been set to 3.

A Mahout clustering algorithm involves the following steps:


  1. Create a SequenceFile with the input vectors (these are boolean points in the xy space) 
  2. Create a SequenceFile with the initial cluster centers. 
  3. Pick similarity measure to use, such as: EuclidianDistanceMeasure 
  4. Define A convergenceThreshold indicating when to stop 
  5. Define number of iterations to perform 
  6. Create the Vector implementation used in input files 
Cluster centers (in step 2) are sometimes estimated or guessed; but with non-trivial data, this can be very challenging. Even if the estimated centers are way off, the k-means algorithm will adjust the centers at each iteration by computing the average center, or centroid in each cluster.

Getting perfect clusters is a science in it of itself as there are many parameters to tweak in the algorithm.

Other distance measures available in Mahout are the: SquaredEuclidianDistanceMeasure, ManhattanDistanceMeasure, CosineDistanceMeasure, TanimotoDistanceMeasure, among others.

Resources

  1. Mahout in Action. O'Reilly