Saturday 10 October 2015

Infrastructure Monitoring with Nagios


Image Credits : xmodulo

Server management is a real pain, and the pain keeps getting worse as more and more servers get added to the infrastructure. So how do organizations cope with huge server farms and datacenters in place? How can super admins promise an SLA of 99.99% uptime with very low response and resolution times? Quite obviously, the answer is server monitoring solutions. It would be terribly tedious for a human to monitor servers 24x7, especially when most of the systems are stable and manual intervention is needed only once in a while.

So what is it that really needs to be monitored? It varies from one organization to another. For a web development platform, the response time of a page may matter a lot. The kind of traffic, the 4xx's and 5xx's, could be a concern too. Disk space, CPU, memory, swap space, particular processes and services, DB server replication, reads/writes, the number of connections, query execution times and many more parameters all matter together. Most of these checks are required by all organizations. Out of the many monitoring tools out there, one of the most used is Nagios.

Nagios is an open source software application that helps in monitoring systems, networks and infrastructure. Nagios sits on top of Linux, and hence whatever you can do with Linux can also be done with Nagios. The best part of using Nagios is its plugin-based architecture and the hundreds and thousands of plugins it supports, which literally allow you to monitor anything.

Nagios comes with multiple notable features that make it stand out. It uses the standard protocols, i.e. TCP, UDP and ICMP, for monitoring servers across the network. You can perform multiple resource checks on any host using the NRPE addon; the checks vary from CPU and disk to RAM and many more. Not just resource checks, you can also add event handlers that perform certain actions when certain events are noticed. Checks are performed at specified intervals, 5 minutes by default. There are 2 types of checks: Active, the ones that are Nagios-initiated, and Passive, the ones that are initiated externally.
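
To test a check by hand before wiring it into the configs, you can invoke the NRPE plugin directly. A minimal sketch, assuming the default source-install plugin path and reusing the host address from the example below :

/usr/local/nagios/libexec/check_nrpe -H 5.175.142.66 -c check_load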

Nagios consists of various objects that need to be defined and used.

  1. Hosts : Hosts are the systems/servers that need to be monitored in the infrastructure. Nagios also provides the facility to group a set of hosts together for a better monitoring experience; say, you can group all web servers together in a "WebServers" host group (a sketch of this follows the list). Typically a host definition may look like : "define host{
    use                             linux-box 
    host_name                       test_host 
    alias                           CentOS 6 
    address                         5.175.142.66 
    }"
  2. Services : Services are the checks that need to be performed. There is a wide range of service checks that can be performed on any host. Just like host groups, service checks can also be grouped together; e.g. if you need to check the CPU utilization of all servers together, you may group them that way. A service definition may look like : "define service{
            use                     generic-service
            host_name               test_host
            service_description     CPU Load
            check_command           check_nrpe!check_load 
            }"
  3. Contacts : Contacts are the people who need to be contacted when a notification has to be sent for any event that occurs. You can configure contacts to receive emails, SMSes, or even custom messages on any service that allows messaging. Contacts can also be grouped together into a contact group; e.g. a notification about some process getting shut down on a QA server may not necessarily bother the Admin, and in such a case the notification can be sent only to the QA group. A contact definition will look like : "define contact{
            name                            generic-contact
            service_notification_period     24x7
            host_notification_period        24x7
            service_notification_options    w,u,c,r,f,s
            host_notification_options       d,u,r,f,s
            service_notification_commands   notify-service-by-email
            host_notification_commands      notify-host-by-email
            register                        0
            }"
  4. Commands : Commands define the exact command that will be executed on the remote hosts while performing a particular check. These are the simplest way to get a particular check executed; you may also pass bash commands to perform any particular check. A command definition may look like : "define command{
            command_name check_nrpe
            command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
            }"
  5. Time Period : If a downtime is scheduled regularly at a particular time and you don't want Nagios to send you any alerts during those hours, you can achieve this by adding a time period definition. This looks like : "define timeperiod{
            timeperiod_name 24x7-except-night-12-2
            alias           24x7 Except 00:00 - 02:00
            sunday          02:00-23:59
            monday          02:00-23:59
            tuesday         02:00-23:59
            wednesday       02:00-23:59
            thursday        02:00-23:59
            friday          02:00-23:59
            saturday        02:00-23:59
    }"
You can also set a monitoring schedule for a particular object if you do not want it to follow the existing service/host checks. This lets you control exactly when a particular check runs.
Writing the same definition for every service and host can become a real pain, even if you decide to copy-paste the definitions. Templates come to help here. You can define a template with all the necessary details of a definition and simply use the same template everywhere in the configs. A typical template definition looks like :
define host{
        name                            generic-host    
        notifications_enabled           1               
        event_handler_enabled           1               
        flap_detection_enabled          1               
        process_perf_data               1               
        retain_status_information       1               
        retain_nonstatus_information    1               
        notification_period             24x7            
        register                        0               
        }

define contact{
        name                            generic-contact         
        service_notification_period     24x7                    
        host_notification_period        24x7                    
        service_notification_options    w,u,c,r,f,s             
        host_notification_options       d,u,r,f,s               
        service_notification_commands   notify-service-by-email 
        host_notification_commands      notify-host-by-email    
        register                        0                        
        }
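
The register 0 line marks these as templates rather than real monitored objects. Any object can then inherit the whole definition through the use directive; for instance, the host from the first example could simply be written as :

define host{
        use                             generic-host
        host_name                       test_host
        alias                           CentOS 6
        address                         5.175.142.66
        }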

Monitoring in Nagios is parallel, i.e. a number of host and service checks run simultaneously. This can be resource-consuming, but it is always better than sequential monitoring, as you can be sure that all your servers are doing well and don't have to wait too long for any kind of update. Add-ons for Nagios are simple to build and contribute to the Nagios community. The configs are all split up and simple to understand too. Nagios has huge documentation and help examples for quickly getting started.

Happy Monitoring!!

Tuesday 18 August 2015

Software Configuration Management System

Picture credits : Paul Downey

Any application would generally consist of web servers, application servers, Memcache systems, SQL and NoSQL database servers, load balancers, messaging queues, etc. Although this is pretty much enough, as a precaution we also ensure proper redundancies so that whenever there is a failure we have a backup plan in place to handle it. In order to keep track of server performance we also have logging servers, analytics servers and monitoring servers in place. All these servers need to be available again within no time in case something goes wrong (which it does).

In traditional systems the admin guy managed all this by managing the wiring of the servers, SSHing into them and maintaining them throughout. There was nothing wrong with the idea except the time taken to get the process done. When something went wrong, he would get into that machine and spend hours finding out what went wrong and correcting it, after declaring a good long downtime. With a configuration management (CM) system in place, we instead describe a state for a server and use a tool that just ensures the server stays in that state throughout. The CM system ensures that the right packages are installed, that config files have correct values and permissions, that the expected services are running on the host system, and much more.
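
As a small illustration of this declarative style, here is a hypothetical sketch using Ansible ad-hoc commands (Ansible being one of the tools mentioned below); the "webservers" group and the httpd package are assumed names :

# Ensure the right package is installed and the expected service is running
ansible webservers -m yum -a "name=httpd state=present"
ansible webservers -m service -a "name=httpd state=started"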

Software deployment is another concern that a DevOps person has to take care of, and it is at times addressed by CM tools too, although that may not always be considered good practice. Deployment is the process where the software written/developed by a company is built/compiled/processed, the required binaries, static files and other necessary files are copied to the server, and the expected services are started as well. This was mostly done using some scripting language, and now we have deployment-specific tools that have their own advantages over scripting languages, rollback being an important one. Capistrano and Fabric are famous ones.
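
With Capistrano, for instance, both the deploy and the rollback are single commands (the "production" stage name is an assumption for this sketch) :

cap production deploy            # build and ship the latest release
cap production deploy:rollback   # fall back to the previous release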

Many a time the deployment process involves multiple remote servers. In complex environments, the order of execution of tasks plays an important role in the deployment process. A deployment may fail if an expected event occurs before another: e.g. the database server needs to be up and running before the web server is brought up. Or, in a high availability environment, servers need to first be taken out of the load balancer one by one before deployment and later added back to the load balancer after a successful deployment. This automated arrangement, coordination and management of complex systems is called orchestration.
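
A rough shell sketch of that rolling pattern, where lb-ctl and deploy.sh are hypothetical stand-ins for the real load balancer API and deploy job :

for host in web01 web02 web03; do
    lb-ctl remove "$host"               # take the server out of rotation
    ssh "$host" 'bash -s' < deploy.sh   # deploy while it serves no traffic
    lb-ctl add "$host"                  # add it back after a successful deploy
done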

With a bunch of IaaS providers in the cloud market, virtualization has picked up huge pace. The evaluation of any new CM tool that comes to the IT world is largely done based on the number of cloud providers it supports. An important feature of a CM tool is provisioning, the process of spinning up servers on a cloud provider automatically. Many CM tool providers have plugins written to communicate with various cloud providers. Chef, Ansible, Puppet, CFEngine and Salt have already become favorites for many out there.

I have personally used Ansible and Chef as of now. Cloud is fun indeed .. :)

Thursday 23 April 2015

s3cmd to push large files greater than 5GB to Amazon S3

image credits:  Stefano Bertolo

Use the s3cmd command line utility to push files to Amazon S3.

Install s3cmd from s3tools.org or
apt-get install s3cmd OR yum install s3cmd

Configure s3cmd by
vim ~/.s3cfg

<Paste the following into it and add your access-key and secret-key>

[default]
access_key = YOUR-ACCESS-KEY-HERE
access_token = 
add_encoding_exts = 
add_headers = 
bucket_location = US
cache_file = 
cloudfront_host = cloudfront.amazonaws.com
default_mime_type = binary/octet-stream
delay_updates = False
delete_after = False
delete_after_fetch = False
delete_removed = False
dry_run = False
enable_multipart = True
encoding = UTF-8
encrypt = False
expiry_date = 
expiry_days = 
expiry_prefix = 
follow_symlinks = False
force = False
get_continue = False
gpg_command = /usr/bin/gpg
gpg_decrypt = %(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_encrypt = %(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_passphrase = 
guess_mime_type = True
host_base = s3.amazonaws.com
host_bucket = %(bucket)s.s3.amazonaws.com
human_readable_sizes = False
ignore_failed_copy = False
invalidate_default_index_on_cf = False
invalidate_default_index_root_on_cf = True
invalidate_on_cf = False
list_md5 = False
log_target_prefix = 
max_delete = -1
mime_type = 
multipart_chunk_size_mb = 15
preserve_attrs = True
progress_meter = True
proxy_host = 
proxy_port = 0
put_continue = False
recursive = False
recv_chunk = 4096
reduced_redundancy = False
restore_days = 1
secret_key = YOUR-SECRET-KEY-HERE
send_chunk = 4096
server_side_encryption = False
simpledb_host = sdb.amazonaws.com
skip_existing = False
socket_timeout = 300
urlencoding_mode = normal
use_https = True
use_mime_magic = True
verbosity = WARNING
website_endpoint = http://%(bucket)s.s3-website-%(location)s.amazonaws.com/
website_error = 
website_index = index.html

You can see how to use s3cmd at: http://s3tools.org/usage
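
A few of the basics from that page, reusing the bucket from the example below (access.log is just a placeholder file name) :

s3cmd mb s3://apache-logs                 # make a bucket
s3cmd put access.log s3://apache-logs/    # upload a file
s3cmd ls s3://apache-logs/                # list the bucket contents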

Here I came across a typical scenario where I could not upload files greater than 5GB (Amazon S3 caps a single PUT at 5GB, so anything bigger has to go via multipart upload). You could handle this in two ways:


  1. Using the --multipart-chunk-size-mb flag: s3cmd put --multipart-chunk-size-mb=4096 201412.tar.gz s3://apache-logs/. I could not do this since I had an older version of s3cmd installed and did not really have time to download and install the newer version.
  2. Splitting the large file into smaller files using the split command and then uploading them.
  • Original file
-rw-r--r--. 1 root root 5.4G Jan 20 06:54 201412.tar.gz
  • Split Command
split -b 3G 201412.tar.gz "201412"
  • Post Split

-rw-r--r--. 1 root root 3.0G Apr 23 06:41 201412aa
-rw-r--r--. 1 root root 2.4G Apr 23 06:43 201412ab

  • Upload the files

s3cmd put 201412* s3://apache-logs/
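
To restore the original archive later, fetch the parts back and stitch them together in order :

s3cmd get s3://apache-logs/201412aa
s3cmd get s3://apache-logs/201412ab
cat 201412aa 201412ab > 201412.tar.gz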

Saved some time :)