Tuesday, January 6, 2015

Secrets of a Functional Ninja

Overview

JavaScript is a general-purpose, single-threaded, functional programming language for the web. Douglas Crockford in his post, JavaScript: The Most Misunderstood Programming Language, mentions that JavaScript has more in common with functional languages like Lisp or Scheme, than many other popular languages such as C or Java. This notion of "functional" is possible due its very powerful Function object, which is the core of Functional Programming and JavaScript alike.

To become a JavaScript expert, all you need to understand well is the interplay between the core pillars of the language, the rest falls into place:



In this post, I will talk about some of the nice features of the JavaScript Function object. In addition, I will talk about one of the most misunderstood, and yet powerful, features of JavaScript functions, closures.

The Function

According to John Resig in his book Secrets of a JavaScript Ninja, the most important skill that separates average JavaScript programmers from great ones lies in the understanding of functions --the very foundation of the language is based on this notion. Let's look at some of the nice properties of JavaScript functions.

Functions are first-class objects

Functions in JS can be treated just like any other object: they can be assigned to variables, returned from functions, and passed as arguments into other functions. Essentially, a function is a value. The only characteristic that separates a function from an object is that the latter can be invoked, via the ( ) operator.

Functions are JavaScript's "unit of work" (comparable to what classes are in Object Oriented  languages) and the main abstraction layer. All code that gets executed should run within the context of a function. As a result you can safely assume that functions can be nested within functions. Consider the browser's onLoad() function to be equivalent to JavaScript's main method. All code should begin within this function, creating something called the the global context --the window object.


<body onload="main();">

    ...
    <script type="text/javascript>"
      function main() {

          // application code starts here
      }
   </script>   
</body>

Aside from being "callable",  functions, like objects, can have properties and you can dynamically assign properties to it. Properties are equivalent to instance data. Let's look in detail at 2 important function properties: name and length.

Functions have a  name property; omitting it will create an anonymous function. This property is set to the name of the function in the function signature at the time of declaration. Anonymous functions are used typically as assigned to a variable (for later execution), used as immediately invoking functions, or passed in as function arguments in the form of a callback, like the setTimeout() API. Callbacks are very useful when handling asynchronous events: user input, browser events,  intervals, etc.

Another important function property is length which represents the number of named (formal) parameters with which the function was declared. length is used a lot in framework or library code to perform introspection on the function called in order to provide some nice functionality.

console.log(function(a,b) {}.length);  // returns 2

Functions can be invoked in 4 different ways:
  1. As a standalone function. This is the most straight-forward call. It occurs when the ( ) operator is applied to the function name or immediately invoked applied to a un-named function object.
  2. As a method (enabling object-oriented programming). It occurs when the ( ) operator is applied to a property of an object. 
  3. As a constructor (bringing a new object to life). It occurs when preceding the function invocation with the keyword new. In the absence of any explicit return value, the new object is returned. It is best practice to name these functions with a starting capital letter. 
  4. By calling the apply(...) and call(...) methods. Used when the intent is to force the function context to be a certain object. In these calls, the first argument is the owning object. Passing null has the interesting effect of setting the global context (namely, window)  to be the owning object.   

// 1
function myFunc() {
  // do something
}
myFunc();


// 2
var obj = {
   myFunc: function () {
      // do something
   }
};
obj.myFunc();

// 3
function Person(fname, lname) {
   this.firstname = fname;
   this.lastname = lname;
}
var person = new Person();


//4
Array.prototype.slice.call(null, [1,2,3,4.....n]);



All function object invocations (whichever of the 4 categories above) are syntactic sugar for the implicitly created Function() constructor. 

var func = new Function([arg1, [arg2, ..[argN]]], functionBody);

Where functionBody is a string containing all of the statements comprising the function body.

Variadic Functions

At function invocation, function arguments are mapped one-to-one to function parameters. However, there are cases where a user will supply more arguments to a function than the list of parameters it declares to accept.

Unlike statically types languages, JavaScript has no overloaded functions; instead, it handles parameter passing in the following manner:
  1. If more actual arguments are supplied than there are formal parameters, the extra arguments are not given explicit names. JavaScript provides another mechanism to retrieve them implicitly via the arguments object. Oddly so, the arguments object is not a JavaScript array, it is actually an array-like object with a length property and overloaded indexing operator. 
  2. If there are more formal parameters than there are actual arguments supplied to the function call, the extra parameters will be set to undefined
In addition to the implicit arguments object, all JavaScript functions (because they are first-class objects) are passed an implicit reference to this, similar to classes in OO languages.  The this reference is included in the a function's context. Function context is not straight-forward to understand, but it is JavaScript's most important feature, especially when combined with closures. I will explain this more clearly in the next sections to come.

If you come from OO languages, like Java, the this refers to the object within which an instance method is contained. In other words, it is a factor of how and where the function is declared. In JavaScript, it is a factor of how the function is invoked, based on the 4 invocation methods listed previously.

Scope

A function is scoped within the function that created it. If a function or variable is declared outside of a function scope, it will have global scope, which basically just translates into a dynamic property being added to the window object. Also, variables created without the keyword var will automatically be assigned to the global scope. By the way this can be very disastrous as it pollutes the global namespace. I just follow a simple rule, do not modify the global namespace.

To avoid namespace pollution, good JavaScript programmers write their code under some kind of function or object scope. Also, variables in the global scope become victims to "global change;" in other words, they can be changed in any place in the program, causing a debugging nightmare. The JQuery library and many others follow a pattern  called Immediately Invoked Function Expressions (IIFE), similar to:

(function() {

   // declare all variables and functions here   

}).call(this);

Function scope (similar to dynamic scoping explained below), and hence the reference to this, varies depending on the manner with which the function is being invoked, similar to before:
  1. As a standalone function. The function is in global scope, the window object owns it 
  2. As a method. The function becomes a property of said object and this refers to it. 
  3. As a constructor. The this implicitly reference is passed to the function and references the newly created object. All properties of the new object are in reachable via this
  4. By calling the apply(...) and call(...) methods. Both methods accept the owning object, which this will refer to. Essentially, the developers are in control of what the context will be.  
Throughout this post, I have identified several types of scoping rules: 

Global Scope

This is the simplest form of scoping. We covered this in detail in this post. Basically, objects created at the top level whether they are functions or variables have global scope. Also, objects created anywhere without the keyword var take on global scope. The owning object is the window object.

Lexical Scope

In this type, the visibility of a variable is based on its textual position. 


someVar = "Global";

function func() {
   var someVar = "Middle";

   return function(e) {
      var someVar = "Inner";
      alert(someVar); // will print "Inner"
   }
}
Lexical scope starts binding from the inner most scope and works outward. This is the "tightest" scoping mechanism available. JavaScript partially implements lexical scoping, but more appropriately, it implements function scoping, explained later. Languages with lexical scoping suffer from a downside called "Variable Shadowing". As you can see in the code snippet above, the variable someVar was shadowed with different layers all the way down to the innermost scope. In larger code segments, this can prove very hard to trace. 

Dynamic Scope

We covered this in detail throughout this post. The this reference depends on how a function is called: object, function, or global. Using call( )  or apply( ), developers can pass a dynamic reference to act as the context object, or pass null to indicate it belongs to the global context.

One common technique for supplying a scratch-pad to work on for the variable "this" is to pass an empty object into the call method, as such: 

someFunction.call({}, 10000);

Function Scope

Very similar to dynamic scoping. All bindings are constrained by the area bounding it, namely a function. This  is JavaScript's main scoping mechism.

function strangeIdentity() {

   for(var i=0; i<n; i++);  // notice semicolon

   return i;
}
strangeIdentity(138);  // returns 138

JavaScript groups all variable declarations to the top of the function, in a process called hoisting. JavaScript hoists all variable declarations within a function. This is somewhat confusing when compared with languages with block scope.  As good JavaScript developers, it is preferred that variables declared within functions (and within any scope) are done at the beginning.  Let's take a closer look.

Block Scope

Present in most other programming languages. Not implemented in JavaScript. As far as scope is concerned, JavaScript will ignore the curly braces in some situations in favor of function scope.  

function scopeTest() {
   for(var i = 0; i <= 5; i++) {
      var insideFor = i;
   }
   alert(insideFor);
}

In the following code, we would expect the script to fail indicating that the variable insideFor had not been declared. This is not the case, under hoisting this variable declaration was "moved up" to the top of the function and is thus visible within the entire function scope. The variable insideFor is hoisted to the top and given a value of undefined until the initialization happens within the for-loop.

To sum up, scoping can be present in the following ways:
  1. The value of the this binding, in the case of an object context or function context.
  2. The execution context defined by the value of the this binding, for cases when using apply( )  or call( )
  3. The “lifetime” of a variable. Variables exist from the moment they are declared until the function ends. Variables in the global context are accessible anywhere in the script.
  4. Variable value resolution scheme or lexical binding. Depending on where a variable appears, it's scope is bound to the function that contains it. 

Closures

A widely written about and debated topic in the JavaScript community. Closures have been a feature of functional programming languages such as Haskell for quite some time. Using them properly can have immense effects on your code.

Simply put, a closure is a scope that is created upon a function declaration allowing it access to any variables declared outside of the function --or within the enclosing (hence the name) scope.

Obviously, global variables will be included in the mix since as they are explicitly added as properties of the window object. The idea that drives developers crazy and at the same time why this is so powerful is that functions can be called at any point in time, after their declaration. So, access to that information when the function is called is preserved even after that scope has gone away.

Languages that support closures store the state of any accessible variable definitions together with the function declaration at runtime, in the functions stack. Imagine creating a "safety bubble" around all variables present, including function parameters, at the point of function declaration. This bubble will not be garbage collected until the reference to the function is gone.

function createScaleFunction(FACTOR) {
   return function(v) {
      return _.map(v, function(n) {
        return (n* FACTOR);
      });
   };
}


var scale10 = createScaleFunction(10);
scale10([1,2,3]);   //=> [10, 20, 30]

The FACTOR variable in this case is "remembered" or stored as part of the definition which gets returned to the user for later execution.  Notice that when we invoke scale10 the function is has already gone out of scope, which is impressive.  As shown before, another important use of closures is together with IIEFs to create private instance variables. In this code:


(function() {
   var _myVar = {

   };

}).call(this);

The variable _myVar will only be visible within the context or "bubble" of the immediately execution function. 

So, what's up with JavaScript?

Now we take a look at a somewhat obscure and also misunderstood JavaScript keyword, with.  The with statement is used to bound or put all properties of a JS object within the current scope, allowing us to access them without having to quality the owning object.

Support for this keyword is likely to be deprecated in new versions of JavaScript since ECMAScript 5 already prohibits its use under strict mode; however, this statement has been very controversial an debated among JavaScript experts, and you are likely to run into it in written code.

Let's take a quick look:

var someFunc = function () { return "ABC"};

var myObj = {
   someProp = 4,
   someFunc = function () { return "XYZ"};
};
with(myObj) {
   // bound the scope of myObj into the curly braces
   var xyz = someFunc();  // Who owns variable xyz? [window]
   alert(xyz);  // Prints XYZ
}

The following code above will print out "ABC " since we used the with statement to create a scope bound to object myObj. Some uses of with can be:

  1. Shortening references to an object in a deep hierarchy making code syntactically more legible (arguably harder to follow, according to Douglas Crockford)
  2. Exposing properties as top-level to a template

While reading an object's properties in an unqualified manner might look nice at a glance and make code shorter, writing to properties is very ambiguous and not consistent with what you might think. If you are within an object's scope bound by with, you would expect that assigning to or creating a new variable within this scope will implicitly create it under myObj. In other words, variable abc would belong to myObj. This is not the case. It will actually create it in the global context, which is consistent with the scoping rules mentioned previously. Only functions create new scopes, and a with block is no exception.

There is a better alternative to using with, you can use immediate functions as shown below to wrap a block of code and set the proper object scope.

Conclusion

The most powerful and yet complex concept to understand about JavaScript is that functions can be called in many different ways. Function context and scope, including the this reference,  depend on the manner in which function are invoked. While this is very powerful, it can be quite confusing.

In this post we took a brief tour of the JavaScript Function object as well as some inherent characteristics such as scope and closures. It is important to remember that closures should not be overused as they drag with it necessary memory to hold references in a function's outer scope.

Finally we looked briefly at the soon-to-be-deprecated with keyword. This can be used to achieve very powerful things at the expense of compromising readability.

Now that we have covered functions, objects, and scopes in detail stay tuned for my next blog post on Functional Javascript.

Resources

  1. Resig, John and Bibeault, Bear. Secrets of a JavaScript Ninja. Manning Publications
  2. http://javascript.crockford.com/javascript.html
  3. http://yuiblog.com/blog/2006/04/11/with-statement-considered-harmful/

Friday, December 5, 2014

Dart: A Best of Breed

I have playing with Dart quite a bit recently, and I think that the set of features it has makes it a great platform to develop small to large enterprise class applications. I have written code in languages such as Java, PHP, Python, and JavaScript, and I can see that Dart has borrowed many of the cool features of each into one great development platform. In this post I will highlight some of the core features of Dart in comparison to other languages I have used.

Get ready for this mouthful: Dart is an open source, class-based, optionally typed, structured, programming language for rich browser-based applications. Its familiar C-like syntax and scalable programming constructs, make it ideal for building single-page web applications.... ok....

Similar to other platforms such as Google Web Toolkit or CoffeeScript, you can run Dart by compiling to native JavaScript code. This will make adoption of this platform a bit easier for browser vendors other than Google. However, unlike GWT, which compiles for different targeted browsers, Dart is designed to compile your application to a single JS file targeting only the modern browsers.  Dart is optionally typed (also called documentary types), which means you can use var to assign any type of variable: String, Object, int, etc. You can use the static types as well. Documentary types improve tool integration features such as: code completion, compiler validations, and code-level documentation, which makes developer's lives easier. Darts type system can be considered a hybrid between Java's strong, static typing and JavaScript pure dynamic typing. If no type information is provided on a variable, dynamic is assumed.

In addition, Dart has a single-threaded concurrency model based on a concept called Isolates for parallel execution of code. Isolates are great because they don't have a shared-memory model, unlike Java Threads. When compiled to JavaScript, Isolates translate to HTML5 Web Workers. This stateless concurrency model is based on a message-passing middleware used to communicate from one isolate to another. In fact, the Dart runtime starts out (from its main method) within the context of an isolate; essentially, every script containing a "main" method runs in its own isolate.

Dart code can spawn a new Isolate, like Java does a for a new Thread. Isolates are different from threads in that they cannot share memory. They are ideal for creating a plugin architecture since they can be used to load external code dynamically in very much the same way to Python and PHP, but because it run in its own Isolate it will run in its own protected memory space.

On a much smaller scale, Dart also supports the development of Server-Side applications using the Dart Virtual Machine. On the server, Dart features nice I/O libraries for manipulating files and sockets. Most of the Dart libraries work on the server, except for dart:html which is in charge of DOM manipulation.

Dartium, the Dart developer edition of Chromium, is Google's open source version of Chrome. Dartium executes Dart code natively in the browser through an embedded Dart VM. For browsers that do not support Dartium, which is any other browser other than Chrome, there are a set of tools in the Dart ecosystem that might be useful:
  1. dart2js: Compiles all dart files into a single javascript file. 
  2. pub: a package manager much like Maven for Java or Composer for PHP. You can publish your libraries in Github.
  3. dartdoc: generated formatted HTML documentation (like javadoc). dartdoc comments start with "///" instead of "/**"
Like Java, Dart has single inheritance, multiple interfaces model; all classes inherit from Object. You use the keywords extends and implements, respectively. This is similar to Java's and other's inheritance model and is light years better than JavaScrip's prototypal inheritance. 
You can have public and private types and private fields are indicated via the "_" underscore. Other languages such as Python work this way as well. The latter, applies also for methods, functions, and even classes.
The "this" keyword is much more simple to understand as it refers to the instance of the class itself (like in Java) and not to the owner of the class (like in JavaScript).
You can split an application into modules or libraries --or packages. Each library is contained within its own .dart file. Libraries are defined using the "library" keyword --analogous to "package". This is a much better improvement over JavaScript since it suffers a lot from namespace pollution and variable collisions. In Dart, you can prefix (or qualify) libraries in order avoid class collisions when importing 2 libraries. For instance:

import "../my/lib/dart" as mine;

mine.commonFunction("do something"); // this will be unique
Now let's spend some time with a very powerful feature in Dart, functions. Like JavaScript, functions are first-class objects in Dart. Top-Level functions allow developers to pass functions as arguments to other functions, assign functions to variables, and dynamically call functions by name. In addition, function arguments (both objects and primitives) are passed by reference.

Dart provides a cool feature called Factory Constructors that allows class designers control the concrete implementation class for a given interface. This is really useful when you intend to have one implementation of an interface, but also to hide the underlying implementation of a class. In Java, you would have to implement it via a Factory Method pattern. In fact, String and int are implemented this way, via factory constructors. For instance:


abstract class IButton {
  
  click();

  factory IButton() {
     return new GenericButton();
  }
}

class GenericButton implements IButton {
  
   click() {
      return "Button has been clicked!";
   }

}

class CloseButton implements IButton {
  ...
}
Another cool thing about constructor methods in Dart are the initializer parameters:

class MyClass {

   var color;
   
   MyClass(color) {
      this.color = color;
   }
}

// you can also use

   MyClass(this.color) {

   }

Dart has several notations to express functions: long notatin and and short notation (similar to a Java8 lambda expression). Here is an example using short-hand syntax for getters and setters:


class MyClass {

   var _color;  // private property

   Color get color => _color;
  
   set color(Color c) => this._color = c;
   
}   

All functions in Dart return a value. Short-hand functions always return the result of evaluating the expression. Default is null. Long notation will employ the use of the "return" keyword just like any other language. You may also qualify a function as returning void. 

Dart supports closures, just like in Javascript. Closures close -over (or wrap) any local variables defined at the time of a function declaration.

Finally, Unit Testing Dart applications is easy with the built-in unittest library. Creating a test is nothing more than declaring a test function and using the Expect class to define assertions in your code. For instance:


test ("Test name", () {
   
   // anonymous function containing test body
   Expect.isNotNull(someObj);

});

Resources

  1. Dart In Action.
  2. dartlang.org

Monday, November 24, 2014

Apache Mahout: Clustering

Overview

People tend to naturally and unconsciously group things together. For instance, honey and sugar are "sweet".  When we taste items exhibiting a similar taste, we immediately describe it as "sweet."

This activity that we as humans do so quickly and instinctively is called clustering, or grouping.

The components involved in a Mahout clustering solution contain:

  1. An algorithm implementation used to perform grouping
  2. A notion of similarity and dissimilarity 
  3. A stopping condition representing when groups can no longer be formed
Circles are a good way to visualize clustering:



The center of the circle is defined as the centroid, or mean (average) of a cluster. This is the point whose coordinates are the average of the xy coordinates in the cluster. The closer items are to a cluster's center, the more similar they are.

Clustering is all about finding the similarities between 2 points or 2 items in the xy plane. Mahout contains a few clustering algorithms such as: k-means, fuzzy k-means, and canopy. In this case, "k" refers to the number of clusters to form. In the example above, "k" has been set to 3.

A Mahout clustering algorithm involves the following steps:


  1. Create a SequenceFile with the input vectors (these are boolean points in the xy space) 
  2. Create a SequenceFile with the initial cluster centers. 
  3. Pick similarity measure to use, such as: EuclidianDistanceMeasure 
  4. Define A convergenceThreshold indicating when to stop 
  5. Define number of iterations to perform 
  6. Create the Vector implementation used in input files 
Cluster centers (in step 2) are sometimes estimated or guessed; but with non-trivial data, this can be very challenging. Even if the estimated centers are way off, the k-means algorithm will adjust the centers at each iteration by computing the average center, or centroid in each cluster.

Getting perfect clusters is a science in it of itself as there are many parameters to tweak in the algorithm.

Other distance measures available in Mahout are the: SquaredEuclidianDistanceMeasure, ManhattanDistanceMeasure, CosineDistanceMeasure, TanimotoDistanceMeasure, among others.

Resources

  1. Mahout in Action. O'Reilly

Monday, November 17, 2014

Apache Mahout: An Introduction

Overview

The web grows more and more each day. Social networking has created a vehicle for users to voice their opinion about pretty much anything in the world. All this public data can be used to drastically benefit a company's business. However,  companies struggle to keep up. Manual approaches to creating reports and analysis become overwhelming. As a result, businesses have resorted to using recommender engines to reduce the level of noise and make sensible statements out of all the data they receive. Good examples of this are: Amazon, Netflix, Youtube, eHarmony, among others.

Mahout is a Java open source machine learning (or collective intelligence) library from Apache born out of the popular text search engine Lucene. Conceived in 2008, and close to its 1.0 release, it was  designed with scalability and extensibility in mind. Being an Apache product, it features nice integration with the popular Map-Reduce implementation Apache Hadoop.

Mahout provides a framework for implementing three basic things:
  1. Recommendations (Netflix, Amazon, dating sites, social networks)
  2. Clustering (Google News) 
  3. Classification (spam detection)
Recommendations are the most widely used technique today. People tend to have relatively similar preferences. Based on a user's preference for a certain object, and other similar user's preferences, a relationship between a user and an object can be established. Clustering techniques help identify structure and hierarchy among a large collection of data. Finally, classification techniques can be used to decide how much an object is or isn't part of some category. For instance, Google decides whether an contains a person's face in it. It can also be used to classify patterns as usual or unusual.

Implementation

At its core, Mahout uses primitive-based vector data sets:  [userId, itemId, preference value]. Data is only numeric for storage efficiency and performance to avoid Java's object overhead, which adds 140% more space needed to store. Since this is an introduction, in this post I will only focus on recommendations, which is the reason most people are attracted to Mahout.

Recommendations and Collaborative Filtering

Recommenders algorithms are data-intensive in nature. These techniques require no knowledge of the actual attributes of the items themselves. Using mathematical principles, recommenders produce a recommendation solely based on the relationship between users and items.

Mahout thrives on boolean data composed of a [userId, itemId]. If available, a preference value can added (much like a rating) indicating the strength of the user's preference to this item.



All algorithms relate a user to an item, but there are some differences:

User-Based Recommenders

These algorithms recommend based upon the notion of similarities among users. The following pseudo code describes the user-based algorithm process at a high level:


for every other user w
   compute a similarity s between u and v
   retain top users, ranked by similarity, as a neighborhood n

for every item i that some user in n has a preference for,
    but that u has no preference for yet
  for every other user v in n that has a preference for i
    compute a similarity s between u and v
    incorporate v's preference for i, weighted by s, into a running average
return the top items, ranked by weighted average


Explaining all possible user-based algorithms is outside of the scope of this post. In general, every used-based recommender in Mahout involves the inter-play of the following components:

  • Data model, implemented via DataModel
  • User-User similarity metric, implemented via a UserSimilarity
  • User neighborhood definition, implemented via UserNeighborhood
  • Recommender Engine, implemented via a Recommender

These algorithms can become slower as the number of users in the system increases. 

Item-Based Recommender

A cousin of the User-based recommender, Item-based algorithms are derived from how similar items are to other items. Similar to the components above, item-based algorithms contain the following components:
  • Data model, implemented via DataModel
  • Item-Item similarity metric, implemented via a ItemSimilarity
  • User neighborhood definition, implemented via UserNeighborhood
  • Recommender engine, implemented via a Recommender
A pseudo view of this algorithm will look like the following:


for every item i that u has no preference for yet
  for every item j that u has a preference for
    compute a similarity s between i and j
    add u's preference for j, weighted by s, into a running average
return the top items, ranked by weighted average

Some item-item similarities are more fixed which are good candidate for pre-computation in order to return results quicker. For instance, it's likely that two CD albums from bands such as as Metallica and Guns & Roses will continue to be similar to each other this year and the next. However, caching and pre-computation can be memory intensive (such is the case with Slope-One Recommender).

These algorithms can become slower as the number of items in the system increases. 

Evaluating Results

It is recommended to use an evaluator to score your algorithm. Using an evaluator you can split the dataset as training data and test data using a threshold value. Using list of computations and math, it will determine how "good" your recommendations are.  The smaller the score, the better the recommendation is. I highly recommend using an evaluator to experiment and tweak your recommendation engine.

Conclusion

We took a brief look at two canonical recommendation techniques: user-based and item-based. The runtime of a user based algorithm goes up as the number of users increases. The runtime of a item based algorithms goes up as the number of items increases.

Depending on your needs, you can pick an item based recommender if you have small number of items for an infinite amount of users; the converse could be used as criteria to pick a user-based approach.

Due to performance requirements and efficiency, you might need to tweak some JVM settings. For instance, the book recommends setting


-Xmx=2048  -XX:+UseParallelGC -XX:+UseParallelOldGC


I highly recommend taking a look at this framework. The 1.0 release is coming up shortly and it seems to be getting a lot of traction in the community. For large scale data projects, I highly recommended implementing Mahout on top of a Hadoop installation.

Resources

  1. Mahout in Action. O'Reilly
  2. https://www.youtube.com/watch?v=zvfKH9Yb0s0

Wednesday, October 29, 2014

Streaming into the Future with Dart

Overview

Dart is a scalable language built primarily for the web. It contains a rich set of libraries, an emerging community, and its own Eclipse-based development environment that make development straightforward, from very small scripts to fully-featured applications. However, you can also write purely server side or standalone applications with Dart. In this post, I will write about the asynchronous nature of Dart as represented in its concept of Streams and Futures.

Streams

Part of the data:sync library, Streams provide an asynchronous flow of data which can be as small as single mouse clicks to large sequential chunks of large file; also, Streams can be used to handle user-generated events.

Since Dart is a single-threaded programming language, the of Streams is essential to handle all types of events and performing file I/O.

Futures

In very simple terms, a Future represents a value that is to be delivered in the future, asynchronously; in other words, Futures are non-blocking calls that allow the application to compute values without halting the execution of the script. This is one of the pillars of scalability. Futures are implemented in the dart:async library and used extensively throughout the dart:io library.

Futures not only return values ("future succeeds") but also errors ("future fails"). In order to handle a value or an error, receivers of a Future can register callback handlers that will be called once the value or error is available, respectively. In some ways, this is very similar to the onload( ) and onerror( ) functions in the Javascript XMLHttpRequest object.

The result of registering a pair of callbacks is a new Future called the "successor." The two most important methods are: then( ) and catchError(). The latter handles any errors emitted by the calling Future or "predecessor" and the former is called when it completes.

If a successor (callback) is not registered in the event that the Future throws an error, the error will bubble-up to the global error handler. This similar to a catch all handler. The successor that handles a value is registered with the then( ) method, and similarly for catchError( ).

Because Futures register one or more pairs of callbacks that are in itself a future successor, depending of which errors you want to handle, you have two options: you can use sequential handlers or parallel ones. I think this makes the code a bit hard to follow sometimes since you need to rely on "purposeful" indentation and use of braces. If a Future fails to ever complete, then no callback will ever be called.

Futures have a nice static method called Future.forEach( ), which you can use to invoke an asynchronous operation on each element of an Iterable or Collection. This can be used for modifying multiple files or broadcasting a message to multiple sockets.

Here is an example of a simple use of futures and streams: a server and client.

/**
 * Dart Client
 * Author: Luis Atencio
 */
import 'dart:io';
import 'dart:convert' show UTF8, JSON;

void main() {
    
    Socket thisSocket;
     
    Socket.connect(InternetAddress.LOOPBACK_IP_V4, 49633).
      then((Socket sock) { 
        thisSocket = sock;        
      });    

    // Connect standard in to the socket 
  stdin.listen((data) => 
      thisSocket.write(
        new String.fromCharCodes(data).trim() + '\n'));
      
}


/**
 * Dart Server
 * Author: Luis Atencio
 */
import 'dart:io';
import 'dart:convert';

void main() {
  // Create an HTTP server and bind it to port 4040
  ServerSocket.bind(InternetAddress.LOOPBACK_IP_V4, 49633).
    then((ServerSocket server) {
      server.listen(listenForRequests);
    }).
  catchError((e) => print(e.toString()));
}


void listenForRequests(Socket client) {
  
  print('Connection from '
      '${client.remoteAddress.address}:${client.remotePort}');
    
    BytesBuilder builder = new BytesBuilder();
    client.listen((List buffer){
        builder.add(buffer);
        String str = UTF8.decode(builder.takeBytes());
        print('String from client:  ' + str);
    }, onDone: () {      
      client.close();
    });    
}

Resources

  1. https://www.dartlang.org/
  2. https://www.dartlang.org/docs/tutorials/httpserver/
  3. http://www.w3schools.com/xml/xml_http.asp
  4. https://www.dartlang.org/docs/tutorials/futures/
Add to Flipboard Add to Flipboard Magazine.