
Tuesday, 26 April 2016

Configuration Management with Ansible



What is Ansible?
An open-source IT automation tool that handles:
- Application Deployment
- Multi-tier Orchestration
- Configuration Management

Why Ansible?
- Agentless architecture
- Operates over SSH
- Configuration as data, not as code
- Written in Python
- Self-documenting, human-readable playbooks
- Feature rich - more than 150 modules, and new modules are easy to write
- Full configuration management and deployment

Installing Ansible:
- Python package index: pip install ansible
- OS package manager: sudo apt-get install ansible
- From source: git clone https://github.com/ansible/ansible.git and run setup.py

Ansible Modes:
Playbook mode - executes a series of tasks in the order written in a playbook
Non-playbook (ad-hoc) mode - executes a single Ansible module command against target hosts
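The ad-hoc mode is worth trying first. A couple of illustrative commands, assuming Ansible is installed and an inventory file named hosts exists (both names are placeholders):
------------
# Ping every host in the inventory using the "ping" module
ansible all -i hosts -m ping

# Run an arbitrary command on the "webservers" group using the "command" module
ansible webservers -i hosts -m command -a "uptime"
------------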

Getting Started:
- Clone the parallax repo
- The repo contains an ansible.cfg file with the following contents:
------------
[defaults]
# more at http://docs.ansible.com/intro_configuration.html#the-ansible-configuration-file
# host_key_checking=False
remote_user=user
-----------
This file holds global configuration settings that adjust Ansible's behaviour.

Playbooks:
A playbook is a collection of 'plays': configuration policies that get applied to defined groups of hosts.
A sample playbook looks like this:
-----------
- name: Install all the packages and stuff required for an EXAMPLE SERVICE
  hosts: example_servers
  user: user
  sudo: yes
  roles:
    - common
    - mongodb
    - zeromq
    - service_example
    - nodejs
#    - nginx
#    - python
#    - postgresql
#    - redis
#    - memcached
#    - deployment
------------

Anatomy of a Playbook:
A sample playbook structure is as follows:
-----------
.
├── example_servers.yml
├── group_vars
│   ├── all
│   └── example_servers
├── host_vars
│   └── example-repository
├── hosts
├── repository_server.yml
├── roles
│   ├── __template__
│   ├── common
│   ├── gridfs
│   ├── memcached
│   ├── mongodb
│   ├── nginx
│   ├── nodejs
│   ├── redis
│   ├── repository
│   ├── service_example
│   └── zeromq
└── site.yml
--------------
If we look at the tree we see a few YAML files and a few directories. There is also a file called 'hosts'. The hosts file is the Ansible inventory file; it maps hosts to their host groups.
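The inventory is a plain INI-style file that maps hosts into groups; a minimal illustrative sketch (host names and addresses here are made up):
------------
[example_servers]
example-server-01 ansible_ssh_host=192.168.1.10

[repository]
example-repository ansible_ssh_host=192.168.1.11
------------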

Simple Playbook:
--- # The three dashes at the top tell you this is a YAML ("YAML Ain't Markup Language") file; Ansible playbooks are written in YAML.
- name: install and start apache   # "name" defines the name of the play
  hosts: webservers                # "hosts" tells which hosts the play targets
  user: root                       # "user" is the system user Ansible will use to execute the tasks below

  tasks:                           # under "tasks" you define which modules to use and their configuration
    - name: install httpd
      yum: name=httpd state=present     # the "yum" module installs httpd

    - name: start httpd
      service: name=httpd state=started # the "service" module starts the httpd service
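A playbook like this is executed against the inventory with the ansible-playbook command (file names here are illustrative):
------------
ansible-playbook -i hosts apache.yml
------------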

Ansible Architecture:
Ansible runs from a control machine, which can simply be your laptop. It has an inventory of hosts and a set of modules, and a series of playbooks define the automation tasks. It pushes the required modules out to the managed servers over SSH; each module runs, the result is returned, and the module is then removed from the system. No agents are necessary for this process: SSH and Python are the only requirements.

How does a playbook work?
When you execute a playbook, the first thing that happens is fact gathering: Ansible collects a lot of useful facts about each remote system, which can be used later in playbooks, templates and config files as variables. The tasks in the playbook are then performed, say 'install apache', and we get a 'changed' response, which means something has been changed on the system. If you run the same playbook again, you will not get the 'changed' response the second time, because the changes were already made in the first run: the expected state we told Ansible to enforce was already in place, so it did nothing. This is the idempotency of Ansible.
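The idempotency shows up in the PLAY RECAP printed at the end of a run; abridged, illustrative output (the host name is made up):
------------
# First run: httpd was installed and started
web01    : ok=3    changed=2    unreachable=0    failed=0

# Second run: the desired state already holds, so nothing changes
web01    : ok=3    changed=0    unreachable=0    failed=0
------------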

Host Inventory: Basics
Host inventory can come from several different places; it is usually a list of hosts that you organize into groups. It can come from a file, a directory of files, or a cloud provisioning environment like EC2, Rackspace etc.

contd...

Saturday, 10 October 2015

Infrastructure Monitoring with Nagios


Image Credits : xmodulo

Server management is a real pain, and the pain keeps getting worse as more and more servers get added to the infrastructure. So how do organizations cope with huge server farms and datacenters in place? How can super admins promise an SLA of 99.99% uptime with very low response and resolution times? Quite obviously the answer is server monitoring solutions. It would be terribly tedious for a human to monitor servers 24x7, especially when most of the systems are stable and only once in a while is some manual intervention needed.

So what is it that really needs to be monitored? It varies from one organization to another. For a web development platform, the response time of a page may matter a lot; the kind of traffic, 4xx's and 5xx's, could be a concern too. Then there are disk space, CPU, memory, swap space, particular processes and services running, DB server replication, reads and writes, number of connections, query execution time and many more parameters together. Most of these checks are required by all organizations. Out of the many monitoring tools out there, one of the most used is Nagios.

Nagios is an open source software application that helps in monitoring systems, networks and infrastructure. Nagios sits on top of Linux, so whatever you can do with Linux can also be done through Nagios. The best part of using Nagios is its plugin-based architecture and the hundreds of plugins it supports, which literally allow you to monitor anything.

Nagios comes with multiple notable features that set it apart. It uses standard protocols, i.e. TCP, UDP and ICMP, for monitoring servers across the network. You can perform multiple resource checks on any host using the NRPE addon; the checks vary from CPU and disk to RAM and many more. Not just resource checks: you can also add event handlers that perform certain actions when certain events are noticed. Checks are performed at specified intervals; by default the interval is 5 minutes. There are 2 types of checks: Active - those that are Nagios-initiated, and Passive - those that are initiated externally.

Nagios consists of various objects that need to be defined and used.

  1. Hosts : Hosts are the systems/servers that need to be monitored in the infrastructure. Nagios also provides the facility to group sets of hosts together for a better monitoring experience; say, you can group all web servers together in a "WebServers" host group. Typically a host definition may look like:
    define host{
        use                             linux-box
        host_name                       test_host
        alias                           CentOS 6
        address                         5.175.142.66
        }
  2. Services : Services are the checks that need to be performed. There is a wide range of service checks that can be performed on any host. Just like host groups, service checks can also be grouped together; e.g. if you need to check the CPU utilization of all servers together, you may group them that way. A service definition may look like:
    define service{
        use                     generic-service
        host_name               test_host
        service_description     CPU Load
        check_command           check_nrpe!check_load
        }
  3. Contacts : Contacts are the people who need to be notified when an event occurs. You can configure contacts to be sent emails, SMSes, or even custom messages via any service that allows messaging. Contacts can also be grouped together into a contact group. E.g. a notification about some process getting shut down on a QA server may not necessarily bother the Admin; in such a case the notification can be sent only to the QA group. A contact definition will look like:
    define contact{
        name                            generic-contact
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r,f,s
        host_notification_options       d,u,r,f,s
        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-email
        register                        0
        }
  4. Commands : Commands define the exact command line that will be executed on the remote host while performing a particular check. They are the simplest way to get a particular check executed; you may also pass bash commands to perform any particular check. A command definition may look like:
    define command{
        command_name check_nrpe
        command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
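Under the hood, a check command simply runs a plugin that prints one status line and exits with a Nagios status code: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. A minimal custom plugin sketch in bash (the plugin name and thresholds are made up):

```shell
#!/bin/bash
# check_rootdisk - illustrative plugin: warn/critical on root filesystem usage

check_rootdisk() {
    local warn=80 crit=90 usage
    # Column 5 of `df -P` is the capacity percentage, e.g. "42%"
    usage=$(df -P / | awk 'NR==2 {gsub("%",""); print $5}')
    if [ "$usage" -ge "$crit" ]; then
        echo "DISK CRITICAL - / at ${usage}%"; return 2
    elif [ "$usage" -ge "$warn" ]; then
        echo "DISK WARNING - / at ${usage}%"; return 1
    else
        echo "DISK OK - / at ${usage}%"; return 0
    fi
}

# Nagios reads the plugin's exit status
status=0
check_rootdisk || status=$?
```

Point the command_line of a command definition at a script like this and Nagios interprets the exit status automatically.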
  5. Time Period : If downtime is scheduled regularly at particular hours and you don't want Nagios to send you any alerts during those hours, you can achieve this by adding a time period definition. It looks like:
    define timeperiod{
        timeperiod_name 24x7-except-night-12-2
        alias           24x7 Except 00:00 - 02:00
        sunday          02:00-23:59
        monday          02:00-23:59
        tuesday         02:00-23:59
        wednesday       02:00-23:59
        thursday        02:00-23:59
        friday          02:00-23:59
        saturday        02:00-23:59
        }
You can also set a monitoring schedule for a particular object if you do not want to add it to an existing service/host check. This allows you to explicitly look at a particular check.
Writing the same definition for all services and hosts can become a real pain, even if you decide to copy-paste the definitions. Templates come to help here: you define a template with all the necessary details of a definition and simply reuse that template everywhere in the configs. A typical template definition looks like:
define host{
        name                            generic-host    
        notifications_enabled           1               
        event_handler_enabled           1               
        flap_detection_enabled          1               
        process_perf_data               1               
        retain_status_information       1               
        retain_nonstatus_information    1               
        notification_period             24x7            
        register                        0               
        }

define contact{
        name                            generic-contact         
        service_notification_period     24x7                    
        host_notification_period        24x7                    
        service_notification_options    w,u,c,r,f,s             
        host_notification_options       d,u,r,f,s               
        service_notification_commands   notify-service-by-email 
        host_notification_commands      notify-host-by-email    
        register                        0                        
        }
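A template is pulled into a concrete definition through the `use` directive; the host name and address below are illustrative:

define host{
        use             generic-host    ; inherit all settings from the template
        host_name       web01
        alias           Web Server 01
        address         192.168.1.20
        }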

Monitoring in Nagios is parallel, i.e. a number of host and service checks run simultaneously. This can be resource consuming, but it is always better than sequential monitoring, as you can be sure that all your servers are doing well and don't have to wait too long for any kind of update. Add-ons for Nagios are simple to make and contribute to the Nagios community. The configs are all split up and simple to understand too. Nagios has huge documentation and plenty of help examples for quickly getting started.

Happy Monitoring!!

Tuesday, 18 August 2015

Software Configuration Management System

Picture credits : Paul Downey

Any application would generally consist of web servers, application servers, memcache systems, SQL and NoSQL database servers, load balancers, messaging queues, etc. Although this is pretty much enough, as a precaution we also ensure proper redundancies, so that whenever there is a failure we have a backup plan in place to handle it. In order to keep track of server performance we also have logging servers, analytics servers and monitoring servers in place. All these servers need to be available again within no time in case something goes wrong (which does go wrong).

In traditional systems the admin managed all this by wiring up the servers, SSHing into them and maintaining them throughout. There was nothing wrong with the idea except the time taken to get things done: when something went wrong, you got into that machine and spent hours finding out what went wrong and correcting it, after declaring a good long downtime. With a configuration management (CM) system in place, we now describe the state of a server and use a tool that simply ensures the server stays in that state throughout. The CM system ensures that the right packages are installed, that config files have the correct values and permissions, that the expected services are running on the host system, and much more.

Software deployment is another concern that a DevOps person has to take care of, and it is at times addressed by CM tools too, although that may not always be considered good practice. Deployment is the process whereby the software written by a company is built/compiled/processed and the required binaries, static files and other necessary files are copied to the server; the expected services are started as well. This was mostly done using some scripting language, and now we have deployment-specific tools that have their own advantages over plain scripts, rollback being an important one. Capistrano and Fabric are famous ones.

Many a time the deployment process involves multiple remote servers. In complex environments the order of execution of deployment tasks plays an important role: a deployment may fail if one step runs before another it depends on. E.g. the database server needs to be up and running before the web server is brought up. Or, in a high availability environment, servers first need to be taken out of the load balancer one by one before deployment, and added back to the load balancer after a successful deployment. This automated arrangement, coordination and management of complex systems is called orchestration.
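As one concrete illustration, in Ansible this rolling pattern can be sketched with the `serial` keyword plus pre/post tasks; the lb_remove/lb_add and deploy_release scripts below are hypothetical placeholders:
------------
# Illustrative rolling deployment: one host at a time,
# out of the load balancer, deploy, back in.
- hosts: webservers
  serial: 1                      # take hosts one at a time
  pre_tasks:
    - name: take host out of the load balancer (placeholder script)
      command: /usr/local/bin/lb_remove {{ inventory_hostname }}
  tasks:
    - name: deploy the new release (placeholder script)
      command: /usr/local/bin/deploy_release
  post_tasks:
    - name: add host back to the load balancer (placeholder script)
      command: /usr/local/bin/lb_add {{ inventory_hostname }}
------------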

With a bunch of IaaS providers in the cloud market, virtualization has picked up huge pace. The evaluation of any new CM tool that comes to the IT world is largely based on the number of cloud providers it supports. An important feature of a CM tool is provisioning: the process of spinning up servers on a cloud provider automatically. Many CM tools have plugins written to communicate with many cloud providers. Chef, Ansible, Puppet, CFEngine and Salt have already become favorites for many out there.

I have personally used Ansible and Chef as of now. Cloud is fun indeed .. :)