Friday, September 12, 2014

How do I begin with Hadoop?

"Tell me and I forget. Teach me and I remember. Involve me and I learn."

- Benjamin Franklin

I'm a big fan of practical learning; "implement as you learn" is my mantra for learning anything. Because Hadoop is open source, it gives you a great opportunity to get your hands dirty while you read about it. There are plenty of free resources online to help you get started, and in this post I'm going to list some of the good ones I've come across.

Getting Started with Hadoop

Depending on your level of interest in learning and exploring Hadoop, you can enroll in any of the free online fundamentals courses offered by Big Data University or watch the video tutorials from edureka on YouTube. Neither source requires you to sign in with your corporate email ID, and both give a basic overview of what Hadoop is. And of course, the documentation provided by Apache helps in understanding it in detail; alternatively, you can read the Yahoo Hadoop tutorial.


After you have read enough and feel like getting some hands-on experience, you have two tracks to choose from:
  1. Install it locally on your laptop

  2. Access it via a free Virtual Machine (VM)

The easier way to learn is, of course, the VMs. They are available for different versions of Hadoop and let you quickly start trying out what's in store for you. They do require you to sign in with your corporate email ID, but they come with a good knowledge base and tutorials followed by exercises to practice what you've learned. The good ones are from Cloudera and Hortonworks. However, since these companies are platform service providers, you'll hear a sales pitch quite often. That's not bad at all, given the resources they provide to the community. The only limitation I see with a VM is that you cannot perform a POC (proof of concept) on your company's in-house live data source, unless you are willing to copy a snapshot of it into the VM to play with.

On the other hand, you can install Hadoop on your own laptop or on a test server in your organization and set things up the way you want. It means doing everything from scratch, which helps you learn better, but again, it depends on what your interest is. Some good references for Hadoop installation are the Apache installation guide and the single-node cluster setup tutorial by Michael G. Noll.

Happy learning!

5 comments:

  1. Thanks for your site! I came to it through a LinkedIn comment. Thanks for pointing out the Big Data University courses!!

  2. Hi Uday, thank you for your article. It is helping me a lot.

    Could you please suggest some books for learning Hadoop?

  3. Hello Mathan, thanks for visiting! Given the diversity of applications in Hadoop, I suggest reading online. However, to start with, Hadoop: The Definitive Guide is good; you can download the PDF version from IT eBooks for free.

  4. Hi Uday, Hadoop: The Definitive Guide is definitely a great book. Nice post, by the way. I am having some problems running MapReduce on a 5-node cluster with an input file of around 1 to 2 GB: it takes much longer than on a single node! Please reply so that I can describe the problem.

    Thanks
