Archive for the 'Cloud Computing' Category

What’s new in the world of Big Data and Cloud

Three things you need to know from the week of September 7 – 13, 2014

Apache Cassandra –

Cassandra continues to gather momentum as a preferred NoSQL database, both in terms of commercial backing and performance. Apache Cassandra v2.1 was announced on September 11 at the Cassandra Summit.

The most important change is a significant improvement in performance (up to 2x). Fortunately, the API remains stable. The NoSQL environment continues to be a battleground between different products optimized for, and targeted at, different applications, ranging from document stores to tables and key-value pairs.

HP’s Purchase of Eucalyptus –

By contrast, the cloud market is starting to stabilize around a few offerings. HP’s announcement that it had purchased Eucalyptus was greeted with surprise, as HP is a major contributor to its competitor, OpenStack.

HP is clearly trying to differentiate itself from the other systems suppliers, such as Cisco, Dell and IBM, by having its own AWS-compatible approach. Eucalyptus has already developed such a platform, and HP management must have decided that purchasing the company to obtain a working AWS-compatible platform would be less costly than creating one from scratch.

Maybe a merger is in the cards?

Big Success with Big Data –

More than 90 percent of executives at organizations that are actively leveraging analytics in their production environments are satisfied with the results, according to a poll of more than 4,300 technology and business executives published last week by Accenture plc.

Executives report that big data is delivering business outcomes across a wide spectrum of strategic corporate goals, from new revenue generation and new market development to enhancing the customer experience and improving enterprise-wide performance. Organizations regard big data as extremely important and central to their digital strategy, and the report concludes that only a negligible fraction of enterprises are failing to realize what they consider adequate returns on their data investments.

Learning Tree’s Expert Cloud Instructor Kevin Jackson Announces Multiple Speaking Engagements

Kevin Jackson, a certified Learning Tree cloud computing instructor and Learning Tree Cloud Computing Curriculum Initiative Manager, is set to speak at two exciting cloud computing conferences in June.

On June 16, 2014, Mr. Jackson will speak at the inaugural “Cloud for Vets” training class at Veterans 360 in San Diego, CA, and on June 21, 2014, he will speak at the Congress of Cloud Computing 2014 at the Dalian World Expo Center in China.

Veterans 360 Services, a San Diego non-profit organization, is launching a new program aimed at helping veterans transition from the military into meaningful careers in emerging cloud technology services. Traditional IT is rapidly transitioning to cloud technology, and this organization aims to give veterans the cloud computing skills they need to succeed in the industry. Learning Tree is a proud supporter of this organization. Learn more at their website: http://vets360.org/blog/

Next, Mr. Jackson will speak on Cloud Services Brokerage for International Disaster Response at BIT’s 3rd Annual World Congress of Cloud Computing in China. The event aims to strengthen technical and business ties in cloud computing by bringing together experts and industry leaders to share technological advancements and experiences within the industry.

Kevin L. Jackson is the Founder and CEO of GovCloud Network, a management consulting firm specializing in helping corporations adapt to the new cloud computing environment. Through his “Cloud Musings” blog, Mr. Jackson has been recognized as one of Cloud Computing Journal’s “World’s 30 Most Influential Cloud Bloggers” (2009, 2010), one of the Huffington Post’s “Top 100 Cloud Computing Experts on Twitter” (2013) and the author of a FedTech Magazine “Must Read Federal IT Blog” (2012, 2013).

To learn more about Learning Tree’s cloud computing curriculum, click here.

EC2 Security Revisited

A couple of weeks ago I was teaching Learning Tree’s Amazon Web Services course at our lovely Chicago-area Education Center in Schaumburg, IL. In that class we provision a lot of AWS resources, including several machine instances on EC2 for each attendee. Usually everything goes pretty smoothly. That week, however, we received an email from Amazon: they had received a complaint that one of the instances we launched was making Denial of Service (DoS) attacks against other hosts on the Internet, which is specifically forbidden in the user agreement.

I doubted that any of the course attendees were intentionally doing this, so I suspected that the machine had been hacked. The machine was based on an AMI from Bitnami and used public key authentication, though, so it was puzzling how someone could have obtained the private key. In any case, we immediately terminated the instance and launched a new one to take its place for the rest of the course.

In Learning Tree’s Cloud Security Essentials course we teach that the only way to truly know what is on an AMI is to launch an instance and do an inventory of it. I was pretty sure we had done that for this AMI but we might have missed something. I decided that I would do some further investigation this week when I got a break from teaching.

Serendipitously, when I sat down this morning there was another email from Amazon:

>>

Dear AWS Customer,

Your security is important to us.  Bitrock, the creator of the Bitnami AMIs published in the EC2 Public AMI catalog, has made us aware of a security issue in several of their AMIs.  EC2 instances launched from these AMIs are at increased risk of access by unauthorized parties.  Specifically, AMIs containing PHP versions 5.3.x before 5.3.12 and 5.4.x before 5.4.2 are vulnerable and susceptible to attacks via remote code execution.   It appears you are running instances launched from some of the affected AMIs so we are making you aware of this security issue. This email will help you quickly and easily address this issue.

This security issue is described in detail at the following link, including information on how to correct the issue, how to detect signs of unauthorized access to an instance, and how to remove some types of malicious code:

http://wiki.bitnami.com/security/2013-11_PHP_security_issue

Instance IDs associated with your account that were launched with the affected AMIs include:

(… details omitted …)

Bitrock has provided updated AMIs to address this security issue which you can use to launch new EC2 instances.  These updated AMIs can be found at the following link:

http://bitnami.com/stack/roller/cloud/amazon

If you do not wish to continue using the affected instances you can terminate them and launch new instances with the updated AMIs.

Note that Bitnami has removed the insecure AMIs and you will no longer be able to launch them, so you must update any CloudFormation templates or Autoscaling groups that refer to the older insecure AMIs to use the updated AMIs instead.

(… additional details omitted …)

<<

So it seems there was a security issue in the AMI that had gone undetected. This is not uncommon, as new exploits are continually discovered. That is why software must be continually patched and updated with the latest service releases, and since Amazon EC2 is an Infrastructure as a Service (IaaS) offering, this is the user’s responsibility.

It was nice to have a resolution to the issue, which had been bothering me since it occurred. It was also nice that Amazon sent out this email and specifically identified the instances that could have a problem, along with links to specific instructions I could follow to harden each instance and a new AMI I could use to replace them.

In the end I think we will be replacing the AMI we use in the course. This situation was an example of the shared responsibility for security that exists between the cloud provider and the cloud consumer. You don’t always know exactly if you have a potential security issue until you look for it. Even then you may not be totally sure until something actually happens. In this case once the threat was identified the cloud provider moved quickly to mitigate damage.

Kevin Kell

Big, Simple or Elastic?

I recently audited Learning Tree’s Hadoop Development course. That course is listed under the “Big Data” curriculum. It was a pretty good course. During the course, though, I got to thinking “What is ‘Big Data’ anyway?”

As far as I have been able to deduce, many things that come from Google carry the prefix “Big” (e.g. BigTable). Since the original MapReduce came out of work Google was doing internally back in 2004, we get the term “Big Data”. I guess maybe if MapReduce had come out of Amazon we would now be talking about SimpleData or ElasticData instead, but I digress. Oftentimes these terms end up being hyped and confusing anyway. Anyone remember the state of Cloud Computing four or five years ago?

What is often offered as a definition, and I don’t necessarily disagree, is “data too large to fit into traditional storage”. That usually means too big for a relational database management system (RDBMS). Sometimes, too, the nature of the data (i.e. structured, semi-structured or unstructured) comes into play. So what now? Enter NoSQL.

It seems to me that what is mostly meant by NoSQL is storing data as key/value pairs, although there are other alternatives as well. A key/value store is also often referred to as a dictionary, hash table, or associative array. It doesn’t matter what you call it; the idea is the same: give me the key, and I will return to you the value. The key or the value may be a simple or a complex data type. Often the exact physical details (i.e. indexing) of how this occurs are abstracted from the consumer. Also, some storage implementations seek to replicate the SQL experience for users already familiar with the RDBMS paradigm.
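For illustration, here is the key/value contract sketched in Python using a plain dictionary. Real NoSQL stores add persistence, partitioning and replication on top of this, and the keys and values below are just made-up examples.

# A minimal sketch of the key/value idea: give me the key, I return the value.
store = {}

# The value can be a simple type or a complex (nested) structure.
store["user:1001"] = {"name": "Alice", "roles": ["admin", "billing"]}
store["page:/home:hits"] = 4213

# Lookup is by key; indexing details stay hidden from the consumer.
profile = store.get("user:1001")   # returns the nested dict, or None if absent
print(profile["name"])             # Alice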

In any particular problem domain you should store your data in the manner that makes the most sense for your application. You should not always be constrained to think in terms of relational tables, file systems, or anything else. Ultimately you have the choice to store nothing more meaningful than blobs of data. Should you do that? Not necessarily and not always. There are a lot of good things about structured storage in general and relational databases in particular.

Probably the most popular framework for processing Big Data is Hadoop. Hadoop is an Apache project which, among other things, implements MapReduce. Analyzing massive amounts of data also requires heavy duty computing resources. For this reason Big Data and Cloud Computing often complement one another.

In the cloud you can very easily, quickly and inexpensively provision massive clusters of high-powered servers to analyze vast amounts of data stored wherever and however is most appropriate. You have the choice of building your own machines from scratch or consuming one of the higher-level services provided. Amazon’s Elastic MapReduce (EMR) service, for example, is a managed Hadoop cluster available as a service.
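As a rough sketch of what that looks like, the snippet below uses boto3 (the current AWS SDK for Python) to request a small managed Hadoop cluster from EMR. The cluster name, release label, instance types and log bucket are placeholders, not values from any real deployment.

import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Ask EMR for a small Hadoop cluster: one master plus three core nodes.
response = emr.run_job_flow(
    Name="demo-hadoop-cluster",                  # placeholder name
    ReleaseLabel="emr-6.15.0",                   # assumed current EMR release
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 4,
        "KeepJobFlowAliveWhenNoSteps": True,     # keep the cluster up between jobs
    },
    LogUri="s3://my-example-bucket/emr-logs/",   # placeholder log bucket
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster starting:", response["JobFlowId"])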

Still, there are many organizations that build their own Hadoop clusters on-premises and will continue to do so. For that there are a number of packaged distributions available (Cloudera, Hortonworks, EMC), or you can download Hadoop directly from Apache. So, whether you use the cloud or not, it is pretty easy to get started with Hadoop.

To learn more about the various technologies and techniques used to process and analyze Big Data, Learning Tree currently offers four hands-on courses.

All are available in person at a Learning Tree education center and remotely via AnyWare.

Kevin Kell

The Cloud goes to Hollywood

Earlier this week I attended a one day seminar presented by Amazon Web Services in Los Angeles entitled “Digital Media in the AWS Cloud”. Since I was involved in a media project recently I wanted to see what services Amazon and some of their partners offer specifically to handle media workloads. Some of these services I had worked with before and others were new to me.

The five areas of consideration are:

  1. Ingest, Storage and Archiving
  2. Processing
  3. Security
  4. Delivery
  5. Automating workflows

Media workflows typically involve many huge files. To facilitate moving these assets into the cloud, Amazon offers AWS Direct Connect, a service that lets you bypass the public Internet and create a dedicated network connection into AWS with transfer speeds of up to 10 Gb/s. A fast file transfer product from Aspera and an open source tool called Tsunami UDP were also showcased as ways to reduce upload time. Live data is typically uploaded to S3 and then archived in Glacier. It turns out the archiving can be accomplished automatically by simply setting a lifecycle rule on a bucket that moves objects to Glacier on a certain date or when they reach a specified age. Pretty cool. I had not tried that before but I certainly will now!
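Here is a minimal sketch of such a lifecycle rule using boto3, the AWS SDK for Python. The bucket name, prefix and 30-day threshold are placeholders chosen for illustration.

import boto3

s3 = boto3.client("s3")

# Automatically move objects under raw/ to Glacier 30 days after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-media-ingest-bucket",            # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-raw-footage",
                "Filter": {"Prefix": "raw/"},   # only objects under raw/
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "GLACIER"}
                ],
            }
        ]
    },
)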

For processing, Amazon has recently added a service called Elastic Transcoder. Although technically still in beta, this service looks extremely promising. It provides a cost-effective way to transcode video files in a highly scalable manner using the familiar cloud on-demand, self-service provisioning and payment model. This lowers the barrier to entry for smaller studios that may previously have been unable to afford the large capital investment required for on-premises transcoding capabilities.
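A hedged sketch of submitting a job to Elastic Transcoder with boto3 follows. It assumes a pipeline (which ties an input bucket to an output bucket) already exists; the pipeline ID, object keys and preset ID are all placeholders.

import boto3

transcoder = boto3.client("elastictranscoder", region_name="us-east-1")

# Transcode one source file into a web-friendly rendition using a preset.
job = transcoder.create_job(
    PipelineId="1111111111111-abcde1",          # placeholder pipeline ID
    Input={"Key": "raw/interview-master.mov"},  # object in the pipeline's input bucket
    Output={
        "Key": "web/interview-720p.mp4",        # object written to the output bucket
        "PresetId": "1351620000001-000010",     # assumed ID of a 720p system preset
    },
)
print("Job submitted:", job["Job"]["Id"])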

In terms of security, I was delighted to learn that AWS complies with the best practices established by the Motion Picture Association of America (MPAA) for the storage, processing and privacy of media assets. This means that developers who create solutions on top of AWS are responsible only for ensuring compliance at the operating system and application layers. It seems that Hollywood, with its very legitimate security concerns, is beginning to trust Amazon’s shared responsibility model.

Delivery is accomplished using Amazon’s CloudFront service, which caches media files at globally distributed edge locations that are geographically close to users. CloudFront works very nicely in conjunction with S3, but it can also be used to cache static content from any web server, whether it is running on EC2 or not.

Finally, workflows can be automated using the Simple Workflow Service (SWF). This service provides a robust way to coordinate tasks and manage state asynchronously for use cases that involve multiple AWS services. In this way, the entire pipeline from ingest through processing can be specified in a workflow, then scaled and repeated as required.
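As a rough sketch, the snippet below starts one execution of a hypothetical media workflow through SWF with boto3. It assumes the domain and workflow type have already been registered, and every name here is a placeholder.

import json
import boto3

swf = boto3.client("swf", region_name="us-east-1")

# Kick off one run of an (assumed pre-registered) ingest-and-transcode workflow.
run = swf.start_workflow_execution(
    domain="media-pipeline",                       # placeholder, assumed registered
    workflowId="ingest-2014-06-01-001",            # unique ID for this execution
    workflowType={"name": "IngestAndTranscode", "version": "1.0"},
    taskList={"name": "media-deciders"},
    input=json.dumps({"source": "raw/interview-master.mov"}),
    executionStartToCloseTimeout="3600",           # seconds, passed as a string
)
print("Started run:", run["runId"])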

So, in summary, there is an AWS offering for many of the requirements involved in producing a small or feature-length film. The elastic scalability of the services allows both small and large players to compete by paying only for the resources they need and use. In addition, there are many specialized AMIs available in the AWS Marketplace that are built specifically for media processing. That, however, is a discussion for another time!

To learn more about how AWS can be leveraged to process your workload (media or otherwise) you might like to attend Learning Tree’s Hands-on Amazon Web Services course.

Kevin Kell

Big Data on Azure – HDInsight

The HDInsight service on Azure has been in preview for some time. I have been eager to start working with it, as the idea of being able to leverage Hadoop from my favorite .NET programming language has great appeal. Sadly, I had never been able to successfully launch a cluster. Not, that is, until today. Perhaps I had not been patient enough in previous attempts, although on most tries I waited over an hour. Today, however, I was able to launch a cluster in the West US region that was up and running in about 15 minutes.

Once the cluster is running it can be managed through a web-based dashboard. It appears, however, that the dashboard will be eliminated in the future and that management will be done using PowerShell. I do hope that some kind of console interface remains but that may or may not be the case.

Figure 1. HDInsight Web-based dashboard

To make it easy to get started, Microsoft provides some sample job flows. You can simply deploy any or all of these jobs to the provisioned cluster, execute the job and look at the output. All the necessary files to define the job flow and programming logic are supplied, and these can also be downloaded and examined. I wanted to use a familiar language to write my mapper and reducer, so I selected the C# sample. This is a simple word count job, commonly used as an easily understood application of MapReduce. In this case the mapper and reducer are just simple C# console programs that read from stdin and write to stdout, which are redirected to files or Azure Blob storage in the job flow.

Figure 2. Word count mapper and reducer C# code
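The course sample itself is C#, but the streaming pattern is language-agnostic; here is a rough Python equivalent I put together to show the shape of it. The mapper and reducer are plain console programs reading stdin and writing stdout, and the job flow handles the redirection.

import sys

def mapper():
    # Emit "word<TAB>1" for every word on every input line.
    for line in sys.stdin:
        for word in line.split():
            print(word.lower() + "\t1")

def reducer():
    # Input arrives sorted by key, so counts for each word are contiguous.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(current + "\t" + str(total))
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(current + "\t" + str(total))

if __name__ == "__main__":
    # Hypothetical usage: "python wordcount.py map" or "python wordcount.py reduce".
    mapper() if sys.argv[1:] == ["map"] else reducer()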

One nice thing about the Microsoft BI stack is that it is pretty straightforward to work with HDInsight output using the Microsoft BI tools. For example, the output from the job above can be consumed in Excel using the Power Query add-in.

Figure 3. Consuming HDInsight data in Excel using Power Query

That, however, is a discussion topic for another time!

If you are interested in learning more about Big Data, Cloud Computing or using Excel for Business Intelligence why not consider attending one of the new Learning Tree courses?

Kevin Kell

Google Compute Engine Revisited

It has been a while since I have written anything about Google cloud computing. I started to look at Google Compute Engine over a year ago but was stopped because it was in limited preview and I could not access it. GCE has been generally available since May, so I thought I would check back to see what has happened.

To use GCE you sign into Google’s Cloud Console using your Google account. From the Cloud Console you can also access the other Google cloud services: App Engine, Cloud Storage, Cloud SQL and BigQuery, and you can create a Cloud Project that utilizes the various services.

Figure 1. Google Cloud Console

Unlike App Engine, which lets you create projects for free, GCE requires billing to be enabled up front. This, of course, will require you to create a billing profile and provide a credit card number. After that is done you can walk through a series of steps to launch a virtual machine instance. This is pretty standard stuff for anyone who has used other IaaS offerings.

Figure 2. Creating a new GCE instance

The choice of machine images is certainly much more limited than with other IaaS vendors I have used. At this time there seem to be only four available, and they are all Linux based. Presumably Google and/or the user community will add more as time passes. It is nice to see per-minute charge granularity, which in actual fact is based on a minimum charge of 10 minutes and then 1-minute increments beyond that (a five-minute test run, for example, is billed as ten minutes, or roughly two cents at the rate below). The smallest instance type I saw, though, was priced at $0.115 per hour, which makes GCE considerably more expensive than EC2, Azure and Rackspace. When you click the Create button it only takes a couple of minutes for your instance to become available.

Connecting to the instance seemed to me a little more complicated than with other providers. I am used to using PuTTY as my SSH client since I work primarily on a Windows machine. I had expected to be able to create a key pair when I launched the instance, but I was not given that option. To access the newly created instance with PuTTY you have to create a key pair using a third-party tool (such as PuTTYgen) and then upload the public key to GCE. You can do this through the Cloud Console by creating an entry in the instance Metadata with a key of sshKeys and a value in the format <username>:<public_key>, where <username> is the username you want to create and <public_key> is the actual value of the public key (not the filename), which can be copied from the PuTTYgen dialog. A bit of extra work, but arguably a better practice anyway from a security perspective.

Figure 3. Creating Metadata for the public key
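To make the format concrete, here is a tiny Python snippet that assembles the metadata value described above. The username and key material are placeholders; the sshKeys key name and <username>:<public_key> layout are the ones described in the post.

# Build the sshKeys metadata value in the <username>:<public_key> form.
username = "kevin"                            # placeholder username to create
public_key = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQAB...example-key-from-puttygen"

metadata_key = "sshKeys"
metadata_value = username + ":" + public_key  # paste this as the metadata value
print(metadata_key, "=", metadata_value)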

After that is done it is straightforward to connect to the instance using PuTTY.

Figure 4. Connected to GCE instance via PuTTY

At this point I do not believe that Google Compute Engine is a competitive threat to established IaaS providers such as Amazon EC2, Microsoft Azure or Rackspace. To me the most compelling reason to prefer GCE over other options would be the easy integration with other Google cloud services. No doubt GCE will continue to evolve. I will check back on it again soon.

Kevin Kell

