
Monday 25 August 2014

Hadoop 1.0

some rights reserved by intelfreepress

I got an opportunity to work with Hadoop on a recent project. Understanding the architecture conceptually and then learning to use it is fun. I am still learning it, though. This part explains Hadoop 1.0, which I believe helped me understand 2.0 well.

Before diving into Hadoop, it makes sense to clear up a few jargon terms that will come our way.


Oozie is like a cron job. In addition to running jobs on a time basis, it also allows running jobs on particular conditions, like:
i. Run Job B when Job A completes
ii. Run Job B when a particular file is present.
Oozie is a kind of job coordinator. When you write an Oozie job, you specify the workflow you want it to follow in an XML file.
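For a rough idea of how a job is kicked off once the workflow XML and a job.properties file are in place, a minimal sketch using the Oozie CLI (the server URL and file names are assumptions):

oozie job -oozie http://localhost:11000/oozie -config job.properties -run
oozie job -oozie http://localhost:11000/oozie -info <job-id>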

Hive is a data warehousing system for Hadoop.

Pig Latin is a data analysis language. Those who are not Java programmers and may not be able to write complex Java code can use Pig to run MapReduce jobs on Hadoop. The syntax is somewhat similar to the Erlang programming language, where you could say load this file, group by, etc. A simple program of 2-3 lines, which in Java may take some effort, can be written easily here. The limitation is that you may not be able to write very complex logic in Pig. If you have something as simple as a tab-separated log file that you just want to read data from, you can use Pig.

Mahout is a machine learning library. It provides the common machine learning algorithms already in place; Mahout essentially wraps existing algorithms so that you can use them along with Hadoop.

HBase is the Hadoop database. It is a NoSQL kind of DB that lives inside Hadoop, on top of the lower file system layer, HDFS.

HDFS, the Hadoop Distributed File System, sits on top of your existing filesystem. In the native OS the filesystem could be ext3, ext4, etc. HDFS sits as a pseudo filesystem on top of your existing filesystem so that you can visualize all the different filesystems on different nodes and clusters as one.

Flume and Sqoop: Flume handles unstructured data like tweets and logs, or semi-structured data like XML. Sqoop handles structured data. We can configure Flume to read data from the Apache log directory: /var/log/apache2/error.log
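As a hedged example of the structured-data side, a Sqoop import of a MySQL table into HDFS might look like this (host, database, table and target directory are hypothetical):

sqoop import --connect jdbc:mysql://dbhost/sales --table orders --username dbuser -P --target-dir /data/orders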

Hadoop has two very important parts.
1. Storage unit
2. Processing unit



The storage unit is where you store, read and write data, i.e. HDFS. HDFS is distributed across nodes and is natively redundant, meaning it is redundant by default and you do not need to configure RAID or any other software or hardware for this. By default it makes 3 copies of each block; this number is configurable to any number of copies you need.
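For example, assuming a log file already copied into HDFS (paths here are hypothetical; the cluster-wide default lives in the dfs.replication property of hdfs-site.xml), the number of copies kept for that file can be changed with setrep:

hadoop fs -put access.log /logs/access.log
hadoop fs -setrep -w 5 /logs/access.log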

Hadoop has a master-slave architecture. It has a master node and a few slave nodes that talk to the master. The master node in Hadoop is known as the NameNode, and all the nodes running DataNodes are the slaves. Different instances can have the same or different types of filesystems, but HDFS is laid across all these nodes. So when you run the command "hadoop fs -ls /" it doesn't matter which node a file is on; the listing pulls files from all the DataNodes (data node 1, data node 2, data node 3...) and shows them together. The NameNode keeps track of all the data blocks.

The processing unit is provided by the MapReduce framework. This can be called the brain of Hadoop. When you supply a job to Hadoop, it splits the job according to the processors you have. If you have a Hadoop cluster with two nodes, each node with one processor, you can run two processes in the cluster in parallel.
Locality of reference: Earlier, when you wanted to process data you would move the data close to the processing unit. This required you to first copy the data: it would be pulled into RAM and then processed there. This is fine when dealing with small amounts of data, but with Hadoop you will be dealing with data in terabytes and petabytes, and copying or moving that much data is time consuming. To avoid any data movement, Hadoop instead moves the processing itself to where the data is present. This is called locality of reference.
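To see the processing unit in action, the word-count example that ships with Hadoop can be run against a directory in HDFS; a rough sketch, assuming the examples jar bundled with a Hadoop 1.x install (jar and path names vary by distribution):

hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount /logs /logs-wordcount
hadoop fs -ls /logs-wordcount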

(To be contd..)

Sunday 27 July 2014

Mumbai Technology Meetup - DevOps Special



On July 27th at 10 am a DevOps special meetup was conducted at Directiplex, Mumbai. It's rare to see a meetup given as much importance as a full technology conference. Speakers from different organizations were present and shared tremendous knowledge and experience, free of cost. The meetup started at 10 am (a little late) and went on till 5.30 pm. The agenda itself was very appealing.

No entry fee. It's a free event. Just ensure you learn and make use of that learning :-)

======================================

11.00 am - 12.00 pm : SaltStack [incl. LXC basic]: by Rigved Rakshit - Directi

Rigved introduced LXC: its setup, its concepts, how it is similar to/different from Docker, and some commands and configs. Due to lack of time he could not cover SaltStack though.

=====================================

12.00 pm - 1.00 pm : Configuration Management at Rackspace by Shaunak Kashyap - Rackspace

Shaunak conducted this over Hangout from a timezone 12 hours away. He showed how Rackspace uses Ansible to automate provisioning and other CM tasks.

======================================

1.00 pm - 1.30 pm : Chef Fundamentals and DevOps by Sanju Burkule - OpexSoftware

Sanju gave a brief introduction to Chef and explained how OpexSoftware, a Chef partner, conducts professional Chef training with certifications. Sanju also shared his knowledge of how Chef differs from Puppet, as he has used both.

======================================

1.30 pm - 2.00 pm : Lunch - Lets not talk about this. Blame the rain.

======================================

2.00 pm - 3.00 pm: Puppet [incl. preparatory VirtualBox fundamentals] by Ashish Chandra. - Reliance Jio

Ashish gave an introduction to Puppet basics and showed how easy it is to set up a Puppet master and get going. He also shared some of the scripts he uses to provision 500 instances in 6-7 minutes.

=======================================

3.00 pm - 4.00 pm: Ansible by Aditya Patawari - BrowserStack

This was the 2nd time I met Aditya; we met earlier at RootConf in Bangalore. Aditya gave an introduction to Ansible and how it is better than/different from Chef/Puppet.

=======================================

4.00 pm - 5.00 pm : Capistrano by Mayur Rokade - Directi

Mayur conducted a live demo of how to use Cap for deploys, along with a little intro to and setup of Capistrano.

=======================================

5.00 pm - 6.00 pm : Docker Fundamentals by Augustine Correa - Organizer of the event

Wednesday 16 July 2014

Bugzilla Mail Sending Issue


Lately an issue was assigned to me where Bugzilla email notification failed with a 504 gateway timeout error. We use the Gmail service for sending mails.

After checking the configuration, everything seemed to be just fine, except that the email was not getting sent, and updating any issue in Bugzilla was sure to produce a 504 error.

After a little debugging we got rid of the 504 by disabling the email service, but this was not quite what we wanted. After googling a bit I learned that Bugzilla did not support Gmail as SMTP earlier, but now it does, and that we need to install a few external packages for this. I found a tonne of articles with some misleading information, or perhaps I was doing something wrong there.

I applied this patch first in the Bugzilla setup directory.

patch < mypatchfile

After reading a few blogs, I installed the Net-SMTP-SSL package with the CPAN shell:

perl -MCPAN -e shell

cpan> install Net::SMTP::SSL

./checksetup.pl

Check for Net-SMTP-SSL (v1.01)     ok: found v1.01

Later I tried sending notifications with SMTP in Administration > Parameters > Email. (Many articles say that you will see a Gmail option or a TLS option; however, even after installing many packages I didn't see any of those.) I was unable to send a mail with SMTP even after many tries.

I finally switched to Sendmail; there was a delay, but the mail was getting sent now. The old problem was still there, though: every time you update an issue, you get a 504. I suspect that because there was a delay in sending the mail, the page would wait for the mail to be sent before rendering, and since that took too long, a timeout value either in Apache or in the Bugzilla config produced the 504 page.

Next I set the use_mailer_queue option to ON and started the jobqueue.pl daemon. Now the mails are getting sent with no 504. I still suspect that it might have worked without the Net-SMTP-SSL package too.
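For reference, a rough sketch of managing the queue daemon from the Bugzilla install directory (check ./jobqueue.pl --help for the exact options your version supports):

./jobqueue.pl start
./jobqueue.pl stop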

RootConf 2014


I attended a conference and workshop on DevOps and cloud infrastructure - RootConf 2014 - at Bangalore from 14th May to 17th May 2014.

Go Continuous Integration Tool developed by Thoughtworks

Go is a CI and release management tool developed by ThoughtWorks. It helps manage the build, test it and finally release it. It allows you to distribute your build across many systems, so you can run your software on different platforms and make sure it runs on all of them. You can even divide your tests and run them in parallel on different systems; that way you get faster results. All the environments can be managed centrally, so you can promote builds from one environment to the next.

Simple steps:
1. Install the Go Agent software on all the machines that are part of your system/cloud (a rough sketch of this follows the list).
2. Next configure all the agents to connect to the Go Server.
3. Finally, approve every build from the management dashboard.
4. Associate relevant resource tags for the appropriate build tasks with compatible agents (e.g. resource linux, etc.)
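A hedged sketch of steps 1 and 2 on a Debian-based agent machine (the package file name and server address are assumptions; the actual artifacts come from the Go downloads page):

sudo dpkg -i go-agent-*.deb          # install the downloaded Go agent package
sudo vim /etc/default/go-agent       # set GO_SERVER to the Go server's hostname or IP
sudo /etc/init.d/go-agent start      # the agent registers with the server and awaits approval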

SELinux : Security Enhanced Linux

This session discussed how SELinux is useful and how people do not understand its importance. Some security policies were discussed, as well as the three important modes of SELinux: enforcing, permissive and disabled. The hands-on session demonstrated the behavior of these 3 modes.
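For reference, the three modes can be inspected and switched from the shell:

getenforce                      # prints Enforcing, Permissive or Disabled
sudo setenforce 0               # switch to permissive until the next reboot
sudo setenforce 1               # switch back to enforcing
sudo vim /etc/selinux/config    # set SELINUX=enforcing|permissive|disabled to persist across reboots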

Docker : Light weight linux container

The dry-run of this session was conducted in the Docker meetup we attended. It was a nice revision, and some additional parts of Docker were covered in the session. Docker is a containerization tool that helps you make a light-weight Linux container to pack, ship and run your application anywhere. It is an easy tool to learn, with very few commands needed to make your own Dockerfile. Containers can be shipped by running your own image registry (or by using Docker's image registry), pushing the image from a dev environment and then pulling it in stage, test or prod. You are sure to see the same environment as dev, so no more "it works on my machine" excuses.

Ansible : Configuration Management Tool

This is another configuration management tool, just like Chef. The pros of the tool are:
1. No client-server architecture.
2. Very easy to install, and almost no configuration needed.
3. Very simple to write playbooks; a non-programmer can understand and write the code in no time since it's just a YAML file.

Cons are:
1. Doesn't work on Windows.
2. Not much support available since it's relatively new.
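As a quick illustration of how little setup is needed, a minimal sketch against a hypothetical inventory file and playbook name:

ansible all -i hosts -m ping              # agentless check that every inventory host is reachable over SSH
ansible-playbook -i hosts site.yml        # apply the playbook to the inventory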

Jenkins : Continuous Integration

The Continuous Integration workshop was pretty much the way we do it traditionally. The workshop covered the basics of Jenkins: how a job is made and how a build is tested and automated.

Conference Update 

Most of the conference talks focused on Docker, LXC and Puppet/Ansible. Self-healing techniques were introduced and sounded like something we could try, wherein the servers would first intelligently check whether a problem could be solved on their own with various scripts based on the type of issue. A technique to on-board a new team member through hands-on practice, rather than merely asking them to read documentation, was discussed in brief. Tsuru is a new tool we heard of that may help us simplify and automate Docker as well. A demo on testing your infrastructure with Kitchen was shown, which is exactly how we learned it; part of it showed integration with Docker, which is something we need to implement. An introduction to the Microsoft Azure cloud gave us an idea of how it differs from other cloud providers. Heartbleed was discussed in brief along with a few other security threats, together with a demo of how Heartbleed actually works.

Wednesday 4 June 2014

Install a Patched Ruby Interpreter With Rbenv and Ruby-build for 2.0.0-p247


Installation of Ruby 2.0.0-p247 recently had some issues with the OpenSSL package on CentOS 6.5. I had to patch the version to get it running. The following script was written later to automate the patch.

#!/bin/sh
rm ~/.rbenv/cache/* -rf
mkdir /tmp/build
# download the stock ruby-build definition for 2.0.0-p247
wget https://raw.github.com/sstephenson/ruby-build/master/share/ruby-build/2.0.0-p247
cp 2.0.0-p247 /tmp/build/
# download and patch the ruby sources
wget http://ftp.ruby-lang.org/pub/ruby/2.0/ruby-2.0.0-p247.tar.gz
tar xvzf ruby-2.0.0-p247.tar.gz
cd ruby-2.0.0-p247
curl https://gist.githubusercontent.com/spkane/8059362/raw/01585dcf6b33254124566f4521a3946e6f26e0a9/ruby-2.0.0-p247-openssl-el65.patch | patch -p1
cd ..
tar -cvzf ruby-2.0.0-p247-openssl.tar.gz ruby-2.0.0-p247
# patch the ruby-build version definition to point at the patched tarball
# (write to the copy under /tmp/build; redirecting back onto the same file would truncate it)
sed 's|"2.0.0-p247.*|"2.0.0-p247-openssl.tar.gz" "file:///tmp/ruby-build/2.0.0-p247"|' < 2.0.0-p247 > /tmp/build/2.0.0-p247
# install the patched version
rbenv install /tmp/build/2.0.0-p247
rbenv rehash

Tuesday 3 June 2014

Integrating Docker with Chef


Ever since Docker was introduced to me, the first thing that came to my mind was that Docker is a replacement for Chef: Docker does almost everything that Chef does and provides a light-weight solution, while Chef just configures the system. Well, I was wrong. I read many articles, heard many talks and came to the conclusion that it is not Chef vs Docker, it is Chef with Docker. When they work together, you get a powerful way to deliver fast, light and efficiently.

I conducted a demo at Docker-Bangalore meetup #4 to show how integration with Chef and Docker works. More information on the meetup page.

What is Chef?
  • Configuration Management Tool.
  • Helps write infrastructure as code.
  • Users write code called ‘recipes’ to help manage and configure servers and applications.
  • Runs in client/server as well as stand-alone configuration called chef-solo.
  • Ensures each resource is properly configured and is in the desired state.
  • More on my blog Learning Chef.
What is Docker?
  • Light weight Linux container that packages an application and all its dependencies in a virtual container.
  • The container can be packed, shipped and the application can run on any Linux machine on public cloud, private cloud or bare metal.
  • Does not include OS.
  • Layers the changes made in the container just like a version control system, which allows you to reach any previous state within no time.
  • More on my blog Docker.
Why use Chef+Docker, What are the issues with Vagrant+Chef?
  • If each project is in its own VM, resource usage is prohibitive. 
  • With all the projects in a single VM, management will be difficult: different versions of the same software for different projects. 
  • Building and rebuilding take a long time, and partial builds or rebuilds are difficult unless a snapshot is explicitly created at various stages. 
  • If you rely on external dependencies (which we do), full rebuilds can fail due to one or more broken dependencies. 
  • If your stack has multiple components (web, db, cache, etc) and they are installed in 1 Vagrant VM, the resulting setup differs from production. Or you could use multiple VMs… but point # 1 holds.
Chef Pros:
  • Great at provisioning (knife bootstrap and plugins)
  • Configuring services – writing your infrastructure as code.
  • Testing your Infrastructure
Chef Cons:
  • Packaging applications
  • Deployments and rollbacks
  • Dynamic service delivery
Docker Pros:
  • Packaging applications
  • Deployments and rollbacks
  • Dynamic service delivery
Docker Cons:
  • Managing persistent storage
  • Complex networking
  • Monolithic services
How to get Chef and Docker work together?
The key is understanding the Developers and Operations workflow and making them work together.

Developers own:
  • Code and Libraries
  • Build and Test Automation
  • System Packages
  • Runtime Configuration
  • Release Management
  • Logs monitoring
  • Horizontal Scaling
Operations own the Platform:
  • Hosts
  • Routers
  • Monitoring
  • Logs
  • Security
  • Backing Services
Basically,
Developers take control of the containers
&
Operations own everything outside the containers

Setting up Chef and Docker to work together:
  • Install Chef workstation with Omnibus Installer.
  • Create an account on manage.opscode.com or create your own chef-server.
  • Create a vagrant node or a cloud instance.
  • Install the community docker cookbook with knife's site install (see the example commands after this list).
  • Create your own cookbook which depends on docker.
  • Upload all cookbooks.
  • Bootstrap the node.
  • Log in and verify that the node has the required images.
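A hedged sketch of the knife commands behind those steps (the cookbook and node names here are hypothetical):

knife cookbook site install docker                 # fetch the community docker cookbook into the chef-repo
knife cookbook upload --all                        # push all cookbooks to the Chef server
knife bootstrap <node-ip> --sudo -x username -P password -N dockernode -r 'recipe[my_docker_app]'
ssh username@<node-ip> 'sudo docker images'        # verify the expected images were pulled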
References:
  • Brian Flad's chef-docker cookbook
  • Gabriel Monroy's ChefConf presentation
  • StackOverflow, Google groups and Quora

Friday 23 May 2014

Learning Chef - Part - II


some rights reserved by Matt Ray

...continued from  Learning Chef - Part - I

Consider that you have to install an application. You first install and configure the application on a single server; it could be a developer's laptop/workstation. In order to set up the application you have to perform various installation procedures, i.e. install packages, start services, manage the database, etc.
After some time you are going to make the application available to more users than your laptop/workstation can handle, so you will need to add a database server and make this a multi-tier application. Now we have one server handling the application requests and a separate server for the database. To avoid data loss we will add another app server and database server to keep the data redundant.
As time passes the load on the servers increases, with more people trying to access them, so we may need to add a scaling solution like a load balancer.
As application usage and the number of users increase, we will need to add more and more app servers and more load balancers for them.
As the database is not able to cope with the high demand, we need to add a DB cache to the existing solution, making the infra even more complex.

Chef is infrastructure as code. Using Chef you can programmatically provision and configure components. Chef ensures that each node complies with the policy. Policies are determined by the configurations included in each node's run list, and you define the policy in your Chef configuration. Your policy states what state each resource should be in, but not how to get there. chef-client pulls the policy from the Chef server and enforces it on the node. The policy states what needs to be installed but not how to install it; Chef is intelligent enough to figure that out, and enforces the policy based on the resources that you specified.

Setup:

Set up a Chef environment first by setting up the Chef workstation; use the following command on Ubuntu:


curl -L https://www.opscode.com/chef/install.sh | sudo bash
 
Log in to manage.opscode.com and download the starter kit there. Extract the chef-repo to your home directory. It should show the following contents.

cd chef-repo
ls
  .berkshelf
  .chef
  cookbooks
  roles
  .gitignore
  Berksfile
  chefignore
  README.md
  Vagrantfile

Check the .chef directory present in the repo; it should show the following content.

cd .chef
  knife.rb
  org-validator.pem
  user.pem

knife.rb holds your Chef server configuration so that the workstation can be identified by the Chef server.

vim knife.rb
  # See http://docs.opscode.com/config_rb_knife.html for more information on knife configuration options

  current_dir = File.dirname(__FILE__)
  log_level                :info
  log_location             STDOUT
  node_name                "user"
  client_key               "#{current_dir}/user.pem"
  validation_client_name   "org-validator"
  validation_key           "#{current_dir}/org-validator.pem"
  chef_server_url          "https://api.opscode.com/organizations/user"
  cache_type               'BasicFile'
  cache_options( :path => "#{ENV['HOME']}/.chef/checksums" )
  cookbook_path            ["#{current_dir}/../chef-repo/cookbooks"]

Writing Recipes:

package "apache2" do
  action :install
end

template "/etc/apache2/apache.conf" do
  source "apache2.conf.erb"
  owner "root"
  group "root"
  mode "0644"
  variable(:allow_override => "All")
  notifies :reload, "service[apache2]

service "apache2" do
  action [:enable,:start]
  supports :reload => true
end

Lets consider the above recipe and understand it.
Each recipe has resources in it. The resources have :
- types -> package, template, service are the types of resources in the code
- names -> apache2, /etc/apache2/apache.conf, apache2(service) are the names of the resources in the code
- parameters -> source "apache2.conf.erb", owner "root", group "root", mode "0644", supports :reload => true
- action to put the resource on desired state -> action :install, action [:enable,:start]
- send notification to other resources -> notifies :reload, "service[apache2]"


A cookbook can be created by the command
knife cookbook create cookbookname
This will automatically create the cookbook along with all the necessary files inside it. You can list the available cookbooks, delete a cookbook, and upload or download a cookbook using different knife commands. You can check all the options with
knife cookbook --help

To bootstrap a new node you need to use the following command:
knife bootstrap hostname --sudo -x username -P password --ssh-port 2222 -N nodename

Creating Environments
Many times you will want the development environment and production environment to have slightly different configurations, e.g. xdebug installed on dev but not on production, PayPal enabled on prod but not on dev, etc. Chef allows you to define different environments and to assign different nodes to a particular environment.

Create a directory called environments in the chef-repo. Add the following content to a file dev.rb in it.

name "dev"
description "The dev environment"

create another file called prod.rb there and add the following content to it.

name "prod"
description "The prod environment"

Upload the environments to the Chef server.
You can verify whether the environments were created on the Chef server by logging in there.
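For example, to upload and verify with knife:

knife environment from file environments/dev.rb
knife environment from file environments/prod.rb
knife environment list        # both environments should show up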

Creating roles:
Roles provide a way to apply a group of recipes and attributes to all the nodes performing a particular function, e.g. all the nodes that work as DB servers can be assigned a db server role, and accordingly all the DB-server-specific recipes can be applied to those nodes.

Roles can be created in the following manner. Create a roles directory in the chef-repo if it does not exist, and add a file base.rb there with the following content.

name "base"
description "Base role applied to all nodes."
run_list(
  "recipe[users::sysadmins]",
  "recipe[sudo]",
  "recipe[apt]",
  "recipe[git]",
  "recipe[build-essential]",
  "recipe[vim]"
)
override_attributes(
  :authorization => {
    :sudo => {
      :users => ["ubuntu", "vagrant"],
      :passwordless => true
    }
  }
)

Here the run_list method defines a list of recipes to be applied to all the nodes that have the base role. The override_attributes method lets us override the default attributes used by the recipes in the list. Here we are overriding attributes used by the sudo cookbook so that the "vagrant" and "ubuntu" users can run sudo without entering a password.

Next create another role Webserver by creating a file webserver.rb in the roles directory with the following content.

name "webserver"
description "Web server role"
all_env = [
  "role[base]",
  "recipe[php]",
  "recipe[php::module_mysql]",
  "recipe[apache2]",
  "recipe[apache2::mod_php5]",
  "recipe[apache2::mod_rewrite]",
]

run_list(all_env)

env_run_lists(
  "_default" => all_env,
  "prod" => all_env,
  #"dev" => all_env + ["recipe[php:module_xdebug]"],
  "dev" => all_env,
)

Here the env_run_lists method is used in the role to define different run lists for different environments. To simplify things we create an all_env array to define the common run list for all environments, and then merge in any additional run-list items unique to each environment.

Next create another role db_master.rb file with following contents:

name "db_master"
description "Master database server"

all_env = [
  "role[base]",
  "recipe[mysql::server]"
]

run_list(all_env)

env_run_lists(
  "_default" => all_env,
  "prod" => all_env,
  "dev" => all_env,
)

Upload the created roles to the Chef server and also verify them. Upload roles with:
knife role from file roles/base.rb
knife role from file roles/webserver.rb
knife role from file roles/db_master.rb
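The roles can then be verified from the workstation as well, for example:

knife role list
knife role show webserver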


Setting up a user account for sys-admin
Define a user account for yourself on all the nodes with admin privileges. This can be done by defining a data bag for the users cookbook, with attributes that describe the user account to create.

mkdir -p data_bags/users
vim data_bags/users/$USER.json

Add the following to the $USER.json
{
  "id": "jkg",
  "ssh_keys": "ssh-rsa ...SecretKey... roshan4074@gmail.com",
  "groups": [ "sysadmin", "dba", "devops" ],
  "uid": 2001,
  "shell": "\/bin\/bash"
}

Upload the data bag to the Chef server as well, and verify:
knife data bag create users
knife data bag from file users $USER.json
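To verify from the workstation, the data bag and its item can be listed back (the item id comes from the sample JSON above):

knife data bag show users
knife data bag show users jkg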

Sunday 11 May 2014

Meetup.com practices

Some rights reserved by Christian Senger
For quite some time I have been attending meetups/sessions organised through meetup.com at different locations. I have seen that the general practice is quite similar everywhere. 
  • The Agenda is posted on the meetup group by the organizers/speakers . 
  • People mark RSVP (even when most of them won't show up). 
  • Attendance of less than 30 percent is seen.
  • The actual event starts at least 30-40 minutes (or more) later than scheduled because of latecomers.
  • The speakers and the participants just socialize or sit idle.
  • The latecomers always give the same reasons: couldn't find the location / stuck in traffic.
  • Many first-time visitors probably have no clue what the meetup is all about and expect the basics to be covered first.
  • The organizers would probably consider reviewing the basics based on the majority.
  • Most of the time the meetup covers everything as per the agenda.
  • A break with some snacks/refreshments and for socializing.
  • Meetup concludes with an informal planning for the next meetup to be arranged.
  • Feedback mail received for the meetup.
Finally it all goes well here, but I see a few problems that could be avoided. 

Latecomers :  
This is something that can't be avoided entirely. However, I honestly feel it could be minimized to a certain extent. Also, most of the organisers already practice the points below, and I think it helps them.
  • While giving a time, keep a buffer of at least 15-20 minutes as certain things like traffic cannot be avoided.
  • Provide a Google Maps link for attendees to locate the venue fast. Mentioning an important landmark nearby could help too.
  • Information about the general traffic situation at that time will also help people leave a little early, e.g. if the meetup is conducted in the evening at a busy location, it could take hours to reach.
  • Directions to the location would help too, e.g. if someone comes by bus, the bus stop to ask the conductor for and the bus number to look for; if by rickshaw, the nearest landmark to the location and the approximate walking distance, in case it is complicated to find.

Newcomers :
Many meetups will have new faces who expect the basics or introductory material to be covered so they can keep up with the pace. When the meetup starts directly, these people don't get most of the things and then probably won't join the next meetup either. Covering the basics time and again, however, bores the regulars as it just eats up their time.
  • Newcomers need to make an attempt to reach the venue as early as possible and ask questions to the organizers or the people present there and make the maximum out of the time to understand the introductory part.
  • The organizers could play slides from previous or introductory meetups so that beginners/newcomers know what was covered; perhaps covering the basics during the first 15-30 minute buffer can also be a good idea, provided the newcomers come early.
  • Beginners can also read about past meetups, check whether slides are available on the page, review them to understand what topics were covered, and then join the meetup. It's just like attending classes in college and reading brief info on the topic before the lecture.

Socialize : 
Less participation is seen in terms of socializing at a few meetups. You never know who could help you with the kind of problems you are stuck on at the office. I have come across so many situations where I don't know what approach to follow for a particular problem, and the experts I meet at meetups/conferences have really splendid and simple solutions that I could not have thought of. Not only will they give you a solution, they will also explain why that solution is the best one to use. Michael Ducy from Chef (formerly Opscode) provided a really good, simple and descriptive answer about the best approach to follow for Chef-Docker integration.

Incorrect RSVP :
One practice that has been seen is that people simply click attending/going and then do not turn up for the meetup. In almost every meetup I see attendance of less than 30%. Planning to attend a meetup in advance and marking RSVP is good; however, if the plan changes, updating the RSVP is good practice too. It keeps the organizers updated and helps them arrange the event well. It gives me a feeling that people just RSVP yes to a meetup because it's free. It becomes difficult for the organisers to arrange the meetup because of incorrect RSVPs. I have seen last-minute arrangements being made at many meetups, and I will surely not blame the organisers for it; it's just that they can't trust the RSVPs and have to rely on last-minute attendance.
  • Update your RSVP whenever you change your decision for any reason.
  • Mention reason in the comments if your decision changes due to any reason so that the co-participants will look for you in the next meetup. Everyone is an important member in the meetup.
  • If you plan to bring a friend/colleague along with you, update the RSVP to reflect the change.

Feedback :
The organizers keep looking for feedback on the meetup they organized voluntarily and selflessly (marketing the brand can be ignored for now). The attendance is as low as 30%, and the feedback is even less, close to around 10%. Honest feedback helps the organizers organize better in future. 
  • Always provide a feedback after the meetup verbally as well as on the meetup page.
  • Let the people know that it was nice to see them at the meetup. This builds a good network, there are high chances that next time they would attend the meetup to meet you and socialize further. 
  • Its all about giving respect and getting it back.

Friday 9 May 2014

Docker - Lightweight Linux Container



Docker: It's a tool that helps you pack, ship and run any application as a light-weight Linux container. More on https://www.docker.io/

Works best on Linux kernel 3.8; Ubuntu 12.04 Precise has 3.2 and needs to be upgraded. 

To install Docker on Ubuntu 12.04, first upgrade the kernel:

sudo apt-get update
sudo apt-get install linux-image-generic-lts-raring linux-headers-generic-lts-raring

sudo reboot

To check docker version:
sudo docker version

Client version: 0.11.1
Client API version: 1.11
Go version (client): go1.2.1
Git commit (client): fb99f99
Server version: 0.11.1
Server API version: 1.11
Git commit (server): fb99f99
Go version (server): go1.2.1
Last stable version: 0.11.1

To check info about docker installed:
sudo docker info

Containers: 1
Images: 9
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 11
Execution Driver: native-0.2
Kernel Version: 3.11.0-20-generic
WARNING: No swap limit support

To pull an existing docker image:
sudo docker pull <imagename>

sudo docker pull busybox

HelloWorld in docker:
sudo docker run busybox echo HelloWorld

Search for an existing image in the index:
docker search <image-name>
sudo docker search stackbrew/ubuntu
NAME                       DESCRIPTION                                     STARS     OFFICIAL   TRUSTED
stackbrew/ubuntu           Barebone ubuntu images                          36                   
jprjr/stackbrew-node       A stackbrew/ubuntu-based image for Docker,...   2                    [OK]
hcvst/erlang               Erlang R14B04 based on stackbrew/ubuntu         0                    [OK]
stackbrew/ubuntu-upstart                                                   0                    


Pull an existing image:
sudo docker pull ubuntu

Pulling repository ubuntu
a7cf8ae4e998: Pulling dependent layers 
3db9c44f4520: Downloading [=================>                                 ] 22.18 MB/63.51 MB 2m19s
74fe38d11401: Pulling dependent layers 
316b678ddf48: Pulling dependent layers 
99ec81b80c55: Pulling dependent layers 
5e019ab7bf6d: Pulling dependent layers 
511136ea3c5a: Download complete 
6cfa4d1f33fb: Download complete 


To check the available images:
sudo docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
ubuntu              13.10               5e019ab7bf6d        2 weeks ago         180 MB
ubuntu              saucy               5e019ab7bf6d        2 weeks ago         180 MB
ubuntu              12.04               74fe38d11401        2 weeks ago         209.6 MB
ubuntu              precise             74fe38d11401        2 weeks ago         209.6 MB
ubuntu              12.10               a7cf8ae4e998        2 weeks ago         171.3 MB
ubuntu              quantal             a7cf8ae4e998        2 weeks ago         171.3 MB
ubuntu              14.04               99ec81b80c55        2 weeks ago         266 MB
ubuntu              latest              99ec81b80c55        2 weeks ago         266 MB
ubuntu              trusty              99ec81b80c55        2 weeks ago         266 MB
ubuntu              raring              316b678ddf48        2 weeks ago         169.4 MB
ubuntu              13.04               316b678ddf48        2 weeks ago         169.4 MB
busybox             latest              2d8e5b282c81        2 weeks ago         2.489 MB
ubuntu              10.04               3db9c44f4520        2 weeks ago         183 MB
ubuntu              lucid               3db9c44f4520        2 weeks ago         183 MB

To run a command within an image:
docker run <image> <command>

sudo docker run ubuntu echo HelloWorld
HelloWorld

To install something on an Ubuntu image:
sudo docker run ubuntu apt-get install -y <package>

To find the ID of the container: 
sudo docker ps -l
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                      PORTS               NAMES
0dac167b178d        ubuntu:14.04        ps aux              12 minutes ago      Exited (0) 12 minutes ago                       goofy_bell

committing changes made to the images:
docker commit 0da firstcommit
723aa6ead77a14ff05cd2c640163345ec5a36fa9a4c757a6872a1ec919ab9345

To get log of the present container:
sudo docker logs 0dac167b178d
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   7132   644 ?        Rs   09:10   0:00 ps aux

To inspect the details of an image or container
sudo docker inspect <id: 3-4 characters of id will work too>
sudo docker inspect 0da

<json output>

Push container image to the index
sudo docker push ubuntu


Creating Dockerfile:

All instructions in a Dockerfile are of the form 
INSTRUCTION arguments

Instructions are not case sensitive, but CAPS are recommended. The first instruction in any Dockerfile is the FROM instruction. The syntax is:

FROM <image>
FROM ubuntu

This will look for the image in Docker index. You can also search docker index by the command docker search

Next is the RUN instruction. The RUN instruction will execute any commands on the current image. After executing, it will also commit the changes. The committed image can be used for the next instructions from the Dockerfile. This way the committed changes form a layer of changes just like any other source code control system. Syntax of RUN command:
RUN <command>
RUN apt-get install -y apache2

Here the RUN instruction is equivalent to docker run <image> <command> followed by docker commit <container_id>, where <image> is automatically replaced with the current image and <container_id> is the container produced by that run.
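Spelled out manually, that equivalence looks roughly like this (the image, package and target image name are just examples):

sudo docker run ubuntu apt-get install -y apache2    # run the command in a fresh container from the image
sudo docker ps -l                                    # note the ID of the container that just exited
sudo docker commit <container_id> my-apache-image    # commit the container as a new image layer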

Once you have created your Dockerfile you can use docker build to create your image from it. You can use the command in the following ways.

Create a Dockerfile with the content
FROM ubuntu
RUN apt-get install -y memcached

Save and close the file. If the file is in your present directory:
docker build .
If the file is in some other location:
docker build path/to/file
If passing through STDIN
docker build - < Dockerfile
If passing through github URL
docker build github.com/roshan4074

You can check the resulting image with the command:
sudo docker images

To apply a tag to an image, use the docker tag command: 
sudo docker tag <image_id> <repository>:<tag>

To add a comment in a Dockerfile, use the "#" symbol followed by the text

To specify the contact info of the Maintainer of the Dockerfile:
MAINTAINER Name contact@email

To trigger a command as soon as a container starts, use ENTRYPOINT instruction
ENTRYPOINT echo "Hello, Container Started"

Another format for the ENTRYPOINT instruction is 
ENTRYPOINT ["echo", "Hello, Container Started"]
This is the preferred format

e.g
ENTRYPOINT ["wc", "-l"]

To execute commands as a particular user, use the USER instruction
USER roshan

To open a particular port for a process, use the EXPOSE instruction
EXPOSE 8080

Saturday 26 April 2014

Learning Chef - Part - I

some rights reserved by Matt Ray
The information here has been collected from Nathen Harvey's video tutorials on Chef's website and from Chef's official documentation. Before starting with the tutorial, I thought it would be better to understand the common jargon used in Chef.
Three primary entities: workstation, chef-server, node
  • Chef-Work Station : System from where the configuration management professional / devops / sys admin will be working.
  • Chef-Server : System/server where all the infrastructure-as-code will be stored. The Chef server also has many other features that we will see later.
  • Nodes : Servers in your infrastructure that will be managed by Chef. They may represent a physical server or a virtual server, and may represent hardware you own or multiple compute instances in a public or private cloud. Each node belongs to one organization, and other organizations may not have access to it. Each node belongs to one environment, e.g. staging or production. Each node can have zero or more roles. An application called chef-client runs on each of the nodes. The chef-client gathers the current system configuration, downloads the desired system configuration from the Chef server, and configures the node such that it adheres to the policy defined.
  • Knife : Command line utility that acts as an interface between the local chef-repo (on the workstation) and the server. Knife lets you manage nodes, cookbooks and recipes, roles, stores of JSON data (including encrypted data), environments, and cloud resources (including provisioning), along with the installation of Chef on management workstations and searching of indexed data on the Chef server. You can even extend knife with plugins for managing cloud resources, e.g. the knife-ec2, knife-rackspace and knife-vcloud plugins.
  • Cookbooks : A cookbook is a container to describe or to contain our configuration data. It contains the recipes, and can also include templates, files, custom resources, etc.
  • Recipes : A Recipe is a configuration file that describes resources and their desired state. A recipe can install and configure software components, Manage files, Deploy Applications, Execute other recipes, etc.
    • Sample Recipe
package "apache2" // 1st resource is package and chef knows that it should be installed on the server. If the package doesn’t exist it will install it.

template "/etc/apache2/apache.conf" do //Next resource is a template. The template will manage a file at /etc/apache2/apache.conf
source "apache2.conf.erb"
owner "root"
group "root"
mode "0644"
variable(:allow_override => "All")
notifies :reload, "service[apache2] //state of the apache2 if apache2.conf exist it knows that it doesn’t need to create that file. However chef needs to make sure that the file has proper contents. So it will generate a temporary file. It will use the source that we specified above apache2.conf.erb and then it will also use any variable content that we specified, i.e AllowOverride All. Once the temporary file is created, Chef will then compare the two files. If they are the same, chef will discard the temporary file and then move on to the next resource, then notifies line will be ignored. However if the two files are different. The chef-client will discard the version on the disk and place the temporary file into the proper location, i.e overwrite existing file. Whenever the overwrite happens a notification will be sent. Then it will tell Apache to reload with new configs.
end

service "apache2" do //service should be enabled and start automatically
action [:enable,:start]
supports :reload => true
end
  • Roles : A way of identifying different types of servers, e.g. load balancer, app server, DB cache, DB, monitoring, etc. Roles may include a list of configs to be applied, called a run list, and may include data attributes for configuring infra, i.e. ports to listen on, lists of apps to be deployed.
  • Data bags : Stores of json data
  • Attributes : Attributes are mentioned in cookbooks/recipes. An attribute gives detail about the node. It describes the state of the node: before the chef-client run, the present state, and the state after the chef-client run.
  • Resources : Items that we sysadmins manipulate to manage complexity. i.e Networking, Files, Directories, Symlinks, Mounts, Registry key, Scripts, Users, Groups, Packages, Services File-systems etc. Resource represent a piece of the system and its desired state. e.g a package to be installed, A service to be running, A file to be generated, a cronjob to be configured, a user to be managed, etc.
  • Ohai : Ohai is a tool used to detect the attributes on the node. These attributes are then passed to the chef client at the beginning of the chef-client run. Ohai is installed on a node as a part of chef-client installation. Ohai has the following types of attributes: Platform details, Network usage, Memory usage, Processor usage, Kernel data, Hostnames, FQDNs, etc. So, Ohai is a utility that will give you all the information of your system level data.
  • Shef : Chef-Shell was earlier known as Shef. It is a recipe debugging tool that allows breakpoints within recipes. It runs as an irb session.
  • Environments : Environments can be development, test, staging and production. They may contain data attributes specific to an environment. You start with a single environment, e.g. _default is the first. Examples include different names/URLs for payment services, the location of the package repository, the version of Chef configs, etc.
  • Run List : The joining of a node to a set of policies is called a run list. The chef-client will download all the necessary components that make up the run list, e.g. recipe[ntp::client], recipe[users], role[webserver]. A run list is a collection of policies that a node should follow. chef-client obtains the run list from the Chef server and ensures the node complies with the policy in it.
  • Search : You can search for nodes with Roles, Find topology data, i.e IP addresses, hostnames, FQDNs. Searchable index about your infrastructure. e.g load balancer needs to know which application should I be sending requests to? Chef-client can ask Chef-server which application servers are available and which application server should I be sending load to. And in return the chef server can send a list of nodes and then the load balancer can figure out which one based on the hostname or IP address or Fully Qualified Domain Name.
  • Organization : Everyone has their own infra and won't manage anyone else's infra. Organizations are independent tenants on Enterprise Chef, so these could be different companies, business units or departments.

Wednesday 23 April 2014

Automation for VMware vCloud Director using Chef's knife-vcloud - Part-II

Version 1.2.0


Some rights reserved by Phil Wiffen

For some reason, with the previous repo I could not see the list of all vApps; only some of them (a mixture of Chef and non-Chef nodes) were seen. So I went ahead with another version of the knife-vcloud plugin, which solved my problem to a large extent.
Plugin is available at https://github.com/astratto/knife-vcloud

Configuration used:
  • CentOS 6.5
  • Chef 11.8.2
  • knife-vcloud 1.2.0
The following steps were used to complete the automation process.
Installation is fairly simple:
gem install knife-vcloud
gem list | grep vcloud
- Check whether, after entering the above commands, you see the gem knife-vcloud. If yes, the setup was successful. If not, something went wrong.

cd ~/.chef
vim knife.rb
Configuration is almost automated:
knife vc configure

You will be prompted for vcloud_url, login and password. After entering the details check that the details you entered are reflected in the knife.rb file.

knife[:vcloud_url] = 'https://vcloud.server.org'
knife[:vcloud_org_login] = 'vcloud_organization'
knife[:vcloud_user_login] = 'vcloud_user'
knife[:vcloud_password] =

Note: The organization was not updated for me, and it kept giving authorization failure for quite sometime. If you see that the organization is not updated automatically, please update it manually in the knife.rb file.

The subsequent commands also change for detailed listings. Although the documentation in many instances says that the name of the VM or vApp should suffice to pull up the required details, note that in many instances you will be required to enter the ID and not just the name.

To see the list of catalog items

[root@chefworkstation ~]# knife vc catalog show All_ISOs
Description: All ISO Dumps
Name                                           ID                                          
CentOS-6.3                                          WhAtEvEr-Id-tO-bE-SeEn1       
CentOS-6.4_x64                                   WhAtEvEr-Id-tO-bE-SeEn2        
Ubuntu-copy                                           WhAtEvEr-Id-tO-bE-SeEn3        

To see details of the organization

[root@chefworkstation ~]# knife vc org show MYORG
CATALOGS                                                                 
Name                                  ID                                 
All_ISOs                                  WhAtEvEr-Id-tO-bE-SeEn4
Master Catalog                        WhAtEvEr-Id-tO-bE-SeEn5
                                                                         
VDCs                                                                     
Name                                  ID                                 
MyorgVDC-Tier1     WhAtEvEr-Id-tO-bE-SeEn6
MyorgVDC-Tier2        WhAtEvEr-Id-tO-bE-SeEn7
MyorgVDC-Tier3        WhAtEvEr-Id-tO-bE-SeEn8

NETWORKS                                                                 
Name                                  ID                                 
MyorgNet-Router                   WhAtEvEr-Id-tO-bE-SeEn9

TASKLISTS                                                                
Name                                  ID                                 
                        WhAtEvEr-Id-tO-bE-SeEn10
To create a new vApp:

[root@chefworkstation ~]# knife vc vapp create MyorgVDC-Tier1 chefnode2 "Just Created node2" WhAtEvEr-Id-tO-bE-SeEn
vApp creation...
Summary: Status: error - time elapsed: 52.012 seconds
WARNING: ATTENTION: Error code 400 - The following IP/MAC addresses have already been used by running virtual machines: MAC addresses: 10:20:30:40:50:0f IP addresses: 192.168.0.20 Use the Fence vApp option to use same MAC/IP. Fencing allows identical virtual machines in different vApps to be powered on without conflict, by isolating the MAC and IP addresses of the virtual machines.
vApp created with ID: WhAtEvEr-Id-tO-bE-SeEn

Note that there were certain problems here that were corrected later.
To show the details of the created vApp:
[root@chefworkstation ~]# knife vc vapp show WhAtEvEr-Id-tO-bE-SeEn1
Note: --vdc not specified, assuming VAPP is an ID
Name: chefnode2
Description: Just Created node2
Status: stopped
IP: 192.168.0.12
Networks
MyorgNet-Router
   Gateway      Netmask        Fence Mode  Parent Network       Retain Network
      192.168.0.1  255.255.255.0  bridged     MyorgNet-Router  false        
      VMs
      Name    Status   IPs           ID                                    Scoped ID                          
      centos  stopped  192.168.0.12  WhAtEvEr-Id-tO-bE-SeEn  WhAtEvEr-Id-tO-bE-SeEn

To show the vm specific details:

[root@chefworkstation ~]# knife vc vm show WhAtEvEr-Id-tO-bE-SeEn --vapp MyvApp_Chef
Note: --vapp and --vdc not specified, assuming VM is an ID
VM Name: centos
OS Name: CentOS 4/5/6 (64-bit)
Status: stopped
Cpu                                          
Number of Virtual CPUs  1 virtual CPU(s)     

Memory                                       
Memory Size             2048 MB of memory    

Disks                                        
Hard disk 1             16384 MB             
Hard disk 2             16384 MB             

Networks                                     
MyorgNet-Router                          
Index                 0                    
Ip                    192.168.0.12         
External ip                                
Is connected          true                 
Mac address           10:20:30:40:50:0f    
Ip allocation mode    MANUAL               

Guest Customizations                         
Enabled                 false                
Admin passwd enabled    true                 
Admin passwd auto       false                
Admin passwd                                 
Reset passwd required   false                
Computer name           centos
  

To set new info to the vm:

[root@chefworkstation ~]# knife vc vm set info --name ChefNewNode WhAtEvEr-Id-tO-bE-SeEn --vapp MyvApp_Chef centos
Note: --vapp and --vdc not specified, assuming VM is an ID
Renaming VM from centos to ChefNewNode
Summary: Status: success - time elapsed: 7.09 seconds

To update other info:


[root@chefworkstation ~]# knife vc vm set info --ram 512 WhAtEvEr-Id-tO-bE-SeEn --vapp MyvApp_Chef
Note: --vapp and --vdc not specified, assuming VM is an ID
VM setting RAM info...
Summary: Status: success - time elapsed: 9.843 seconds

To edit network info:


[root@chefworkstation ~]# knife vc vm network edit WhAtEvEr-Id-tO-bE-SeEn MyorgNet-Router --net-ip 192.168.0.117 --ip-allocation-mode MANUAL
Note: --vapp and --vdc not specified, assuming VM is an ID
Forcing parent network to itself
VM network configuration...
Guest customizations must be applied to a stopped VM, but it's running. Can I STOP it? (Y/N) Y
Stopping VM...
Summary: Status: success - time elapsed: 7.092 seconds
VM network configuration for MyorgNet-Router...
Summary: Status: success - time elapsed: 6.783 seconds
Forcing Guest Customization to apply changes...
Summary: Status: success - time elapsed: 22.639 seconds

To show the changes made:

[root@chefworkstation ~]# knife vc vm show WhAtEvEr-Id-tO-bE-SeEn
Note: --vapp and --vdc not specified, assuming VM is an ID
VM Name: ChefNewNode
OS Name: CentOS 4/5/6 (64-bit)
Status: running

Cpu                                          
Number of Virtual CPUs  1 virtual CPU(s)     

Memory                                       
Memory Size             512 MB of memory     

Disks                                        
Hard disk 1             16384 MB             
Hard disk 2             16384 MB             

Networks                                     
MyorgNet-Router                          

Index                 0                    
Ip                    192.168.0.117        
External ip                                
Is connected          true                 
Mac address           10:20:30:40:50:0f    
Ip allocation mode    MANUAL               

Guest Customizations                         
Enabled                 true                 
Admin passwd enabled    true                 
Admin passwd auto       false                
Admin passwd                                 
Reset passwd required   false                

Computer name           centos

Reference Links: