Big Data on Azure – HDInsight

The HDInsight service on Azure has been in preview for some time. I have been anxious to start working with it as the idea of being able to leverage Hadoop using my favorite .NET programming language has a great appeal. Sadly I had never been able to successfully launch a cluster. Not, that is, until today. Perhaps I had not been patient enough in previous attempts, although on most tries I waited over an hour. Today, however, I was able to launch a cluster in the West US region that was up and running in about 15 minutes.

Once the cluster is running it can be managed through a web-based dashboard. It appears, however, that the dashboard will be eliminated in the future and that management will be done using PowerShell. I do hope that some kind of console interface remains but that may or may not be the case.

Figure 1. HDInsight Web-based dashboard

To make it easy to get started Microsoft provides some sample job flows. You can simply deploy any or all of these jobs to the provisioned cluster, execute the job and look at the output. All the necessary files to define the job flow and programming logic are supplied. These can also be downloaded and examined. I wanted to use a familiar language to write my mapper and reducer so I selected the C# sample. This is a simple word count job which is quite commonly used as an easily understood application of Map/Reduce. In this case the mapper and reducer are just simple C# console programs that read and write to stdin and stdout which are redirected to files or Azure Blob storage in the job flow.

Figure 2. Word count mapper and reducer C# code

One thing that is pretty cool about the Microsoft BI stack is that it is pretty straightforward to work with HDInsight output using the Microsoft BI Tools. For example the output from the job above can be consumed in Excel using the Power Query add-in.

Figure 3. Consuming HDInsight data in Excel using Power Query

That, however, is a discussion topic for another time!

If you are interested in learning more about Big Data, Cloud Computing or using Excel for Business Intelligence why not consider attending one of the new Learning Tree courses?

Kevin Kell


Learning Tree Logo

Cloud Computing Training

Learning Tree offers over 210 IT training and Management courses, including Cloud Computing training.

Enter your e-mail address to follow this blog and receive notifications of new posts by e-mail.

Join 53 other followers

Follow Learning Tree on Twitter

Archives

Do you need a customized Cloud training solution delivered at your facility?

Last year Learning Tree held nearly 2,500 on-site training events worldwide. To find out more about hosting one at your location, click here for a free consultation.
Live, online training
.NET Blog

%d bloggers like this: