Many SparkleFormation templates make extensive use of the cfn-init and cfn-signal commands provided by the aws-cfn-bootstrap module, utilities authored by Amazon Web Services. Amazon’s recommended install method seems to be calling easy_install against an unversioned tarball artifact:

easy_install https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz

Here easy_install downloads the artifact, unpacks it, reads its dependencies, connects to the PyPI package index, retrieves information about where to get those dependencies, and so on. This all works well enough, until one of the many different package sources for one of the module’s dependencies begins to behave erratically. On more than one occasion this process has taken so long to return an error from a misbehaving artifact source that all subsequent stack deployments fail due to timeouts.

Having been bitten by this more than once, I determined that vendorizing the aws-cfn-bootstrap code, along with its dependencies, would probably be the best way to make my builds more reliable.

Initially I experimented with virtualenv, but ultimately found it difficult to use for manufacturing a truly portable artifact for this purpose. Additional literature review indicated that repackaging aws-cfn-bootstrap and its dependencies as Python wheels might be just what I needed.

On a default Amazon AMI, I installed pip via the prescribed installation method:

$ curl --silent -O https://bootstrap.pypa.io/get-pip.py
$ python get-pip.py
Downloading/unpacking pip
  Downloading pip-1.5.6-py2.py3-none-any.whl (1.0MB): 1.0MB downloaded
Installing collected packages: pip
Successfully installed pip
Cleaning up...
$ pip --version
pip 1.5.6 from /usr/lib/python2.6/site-packages (python 2.6)
$ pip install wheel
Downloading/unpacking wheel
  Downloading wheel-0.24.0-py2.py3-none-any.whl (63kB): 63kB downloaded
Requirement already satisfied (use --upgrade to upgrade): argparse in /usr/lib/python2.6/site-packages (from wheel)
Installing collected packages: wheel
Successfully installed wheel
Cleaning up...

With pip and wheel installed, building wheels for each module can be done in one simple command:

$ pip wheel -w aws-cfn-bootstrap-wheelhouse https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz
Downloading/unpacking https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz
  Downloading aws-cfn-bootstrap-latest.tar.gz (441kB): 441kB downloaded
  Running setup.py (path:/var/folders/17/2k89dx490h77lxq1dx76229m0000gn/T/pip-H4qLj3-build/setup.py) egg_info for package from https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz

Downloading/unpacking python-daemon>=1.5.2 (from aws-cfn-bootstrap==1.4)
  Downloading python-daemon-1.6.1.tar.gz (47kB): 47kB downloaded
  Running setup.py (path:/private/var/folders/17/2k89dx490h77lxq1dx76229m0000gn/T/pip_build_cwj/python-daemon/setup.py) egg_info for package python-daemon

Downloading/unpacking pystache>=0.4.0 (from aws-cfn-bootstrap==1.4)
  Downloading pystache-0.5.4.tar.gz (75kB): 75kB downloaded
  Running setup.py (path:/private/var/folders/17/2k89dx490h77lxq1dx76229m0000gn/T/pip_build_cwj/pystache/setup.py) egg_info for package pystache
    pystache: using: version '5.4.2' of <module 'setuptools' from '/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/setuptools-5.4.2-py2.7.egg/setuptools/__init__.pyc'>

Downloading/unpacking setuptools (from python-daemon>=1.5.2->aws-cfn-bootstrap==1.4)
  Downloading setuptools-6.1-py2.py3-none-any.whl (533kB): 533kB downloaded
  Saved ./aws-cfn-bootstrap-wheelhouse/setuptools-6.1-py2.py3-none-any.whl
Downloading/unpacking lockfile>=0.9 (from python-daemon>=1.5.2->aws-cfn-bootstrap==1.4)
  Downloading lockfile-0.10.2-py2-none-any.whl
  Saved ./aws-cfn-bootstrap-wheelhouse/lockfile-0.10.2-py2-none-any.whl
Building wheels for collected packages: python-daemon,pystache,aws-cfn-bootstrap
  Running setup.py bdist_wheel for python-daemon
  Destination directory: /Users/cwj/aws-cfn-bootstrap-wheelhouse
  Running setup.py bdist_wheel for pystache
  Destination directory: /Users/cwj/aws-cfn-bootstrap-wheelhouse
  Running setup.py bdist_wheel for aws-cfn-bootstrap
  Destination directory: /Users/cwj/aws-cfn-bootstrap-wheelhouse
Successfully built python-daemon pystache aws-cfn-bootstrap
Cleaning up...

The aws-cfn-bootstrap-wheelhouse directory we specified has been created and now contains a .whl file for the aws-cfn-bootstrap module and each of its dependencies:

$ ls -1 aws-cfn-bootstrap-wheelhouse
aws_cfn_bootstrap-1.4-py2-none-any.whl
lockfile-0.10.2-py2-none-any.whl
pystache-0.5.4-py2-none-any.whl
python_daemon-1.6.1-py2-none-any.whl
setuptools-6.1-py2.py3-none-any.whl

Creating a tarball of this directory yields an artifact I can place in an S3 bucket for my infrastructure, alongside my own copy of get-pip.py. I have versioned these artifacts with a date stamp in their file names, and because there’s nothing proprietary about them, I have marked them as world-readable. After updating our bootstrap code in the appropriate SparkleFormation registry, the relevant bootstrap script reads as follows:

curl -o /tmp/get-pip.py https://s3.amazonaws.com/my_infrastructure_bucket/get-pip-12172014.py
curl -o /tmp/aws-cfn-bootstrap-wheelhouse.tar.gz https://s3.amazonaws.com/my_infrastructure_bucket/aws-cfn-bootstrap-wheelhouse-12172014.tar.gz
python /tmp/get-pip.py
tar -zxvf /tmp/aws-cfn-bootstrap-wheelhouse.tar.gz -C /tmp
pip install --no-index --find-links=/tmp/aws-cfn-bootstrap-wheelhouse aws-cfn-bootstrap

This process is relatively simple and can be distilled into a CI/CD pipeline job, but, as I have been unable to find tagged versions of the module, it may only be appropriate to build new artifacts on a manual trigger.

This article was originally published as part of the 2014 AWS Advent series.

Introduction

This article assumes some familiarity with CloudFormation concepts such as stack parameters, resources, mappings and outputs. See the AWS Advent CloudFormation Primer for an introduction.

Although CloudFormation templates are billed as reusable, many users will attest that as these monolithic JSON documents grow larger, they become “all encompassing JSON file[s] of darkness,” and actually reusing code between templates becomes a frustrating copypasta exercise.

From another perspective these JSON documents are actually just hashes, and with a minimal DSL we can build these hashes programmatically. SparkleFormation provides a Ruby DSL for merging and compiling hashes into CFN templates, and helpers which invoke CloudFormation’s intrinsic functions (e.g. Ref, Attr, Join, Map).

SparkleFormation’s DSL implementation is intentionally loose, imposing little of its own opinion on how your template should be constructed. Provided you are already familiar with CloudFormation template concepts and some minimal amount of Ruby, the rest is merging hashes.

Templates

Just as with CloudFormation, the template is the high-level object. In SparkleFormation we instantiate a new template like so:

SparkleFormation.new(:foo)

But an empty template isn’t going to help us much, so let’s step into it and at least insert the required AWSTemplateFormatVersion specification:

SparkleFormation.new(:foo) do
  _set('AWSTemplateFormatVersion', '2010-09-09')
end

In the above case we use the _set helper method because we are setting a top-level key with a string value. When we are working with hashes we can use a block syntax, as shown here adding a parameter to the top-level Parameters hash that CloudFormation expects:

SparkleFormation.new(:foo) do
  _set('AWSTemplateFormatVersion', '2010-09-09')

  parameters(:food) do
    type 'String'
    description 'what do you want to eat?'
    allowed_values %w( tacos nachos hotdogs )
  end
end

Reusability

SparkleFormation provides primitives to help you build templates out of reusable code, namely:

  • Components
  • Dynamics
  • Registries

Components

Here’s a component we’ll name environment which defines our allowed environment parameter values:

SparkleFormation.build do
  _set('AWSTemplateFormatVersion', '2010-09-09')
  parameters(:environment) do
    type 'String'
    default 'test'
    allowed_values %w( test staging production )
  end
end

Resources, parameters and other CloudFormation configuration written into a SparkleFormation component are statically inserted into any template which loads the component via the load method. Now all our stack templates can reuse the same component, so updating the list of environments across our entire infrastructure becomes a snap. Once a template has loaded a component, it can then step into the configuration provided by the component to make modifications.

In this template example we load the environment component (above) and override the allowed values for the environment parameter the component provides:

SparkleFormation.new(:perpetual_beta).load(:environment).overrides do
  parameters(:environment) do
    allowed_values %w( test staging )
  end
end

Dynamics

Whereas components are loaded once at the instantiation of a SparkleFormation template, dynamics are inserted one or more times throughout a template. They iteratively generate unique resources based on the name and optional configuration they are passed when inserted.

In this example we insert a launch_config dynamic and pass it a config object containing a run list:

SparkleFormation.new('zookeeper').load(:base).overrides do
  dynamic!(:launch_config, 'zookeeper', :run_list => ['role[zookeeperd]'])

  ...

end

The launch_config dynamic (not pictured) can then use intrinsic functions like Fn::Join to insert data passed in the config deep inside a launch configuration, as in this case where we want our template to tell Chef what our run list should be.
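
Since the launch_config dynamic itself isn’t pictured, here is a minimal sketch of the shape such a dynamic might take. The launch configuration properties and the chef-client invocation are illustrative assumptions, not the actual dynamic:

SparkleFormation.dynamic(:launch_config) do |name, config = {}|
  resources("#{name}_launch_config".to_sym) do
    type 'AWS::AutoScaling::LaunchConfiguration'
    properties do
      image_id ref!(:ami_id)
      user_data base64!(
        join!(
          "#!/bin/bash -v\n",
          # splice the run list passed via config into the node's first Chef run
          "chef-client -r '", (config[:run_list] || []).join(','), "'\n"
        )
      )
    end
  end
end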

Registries

Similar to dynamics, a registry entry can be inserted at any point in a SparkleFormation template or dynamic. For example, a registry entry can be used to share the same metadata between both AWS::AutoScaling::LaunchConfiguration and AWS::EC2::Instance resources.
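
No registry entries are pictured in this article, so here is a rough, assumption-laden sketch: the SfnRegistry constant, the _camel_keys_set(:auto_disable) call (which keeps cfn-init keys from being camel-cased), and the registry! insertion helper follow SparkleFormation’s documentation, but the exact registration API has varied between versions:

SfnRegistry.register(:bootstrap_metadata) do
  metadata('AWS::CloudFormation::Init') do
    # prevent cfn-init configuration keys from being camel-cased
    _camel_keys_set(:auto_disable)
    config do
      commands do
        bootstrap do
          command 'bash /tmp/bootstrap.sh'
        end
      end
    end
  end
end

# ...then, inside a template or dynamic, splice the shared metadata
# into any resource that needs it:
resources(:app_server) do
  type 'AWS::EC2::Instance'
  registry!(:bootstrap_metadata)
end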

Translating a ghost of AWS Advent past

This JSON template from a previous AWS Advent article provisions a single EC2 instance into an existing VPC subnet and security group:

{
    "AWSTemplateFormatVersion" : "2010-09-09",

    "Description" : "make an instance, based on region, ami, subnet, and security group",

    "Parameters" : {

        "KeyName" : {
            "Description" : "Name of an existing EC2 KeyPair to enable SSH access to the instance",
            "Type" : "String"
        },

        "VpcId" : {
            "Type" : "String",
            "Description" : "VpcId of your existing Virtual Private Cloud (VPC)"
        },

        "SubnetId" : {
            "Type" : "String",
            "Description" : "SubnetId of an existing subnet in your Virtual Private Cloud (VPC)"
        },

        "AmiId" : {
            "Type" : "String",
            "Description" : "AMI to use"

        },

        "SecurityGroupId": {
            "Type" : "String",
            "Description" : "SecurityGroup to use"
        }

    },

    "Resources" : {

        "Ec2Instance" : {
            "Type" : "AWS::EC2::Instance",
            "Properties" : {
                "ImageId" : { "Ref" : "AmiId" },
                "SecurityGroupIds" : [{ "Ref" : "SecurityGroupId" }],
                "SubnetId" : { "Ref" : "SubnetId" },
                "KeyName" : { "Ref" : "KeyName" },
                "UserData" : { "Fn::Base64" : { "Fn::Join" :
                  ["", [
                        "#!/bin/bash -v\n",
                        "curl http://aprivatebucket.s3.amazonaws.com/bootstrap.sh -o /tmp/bootstrap.sh\n",
                        "bash /tmp/bootstrap.sh\n",
                        "# If all went well, signal success\n",
                        "cfn-signal -e $? -r 'Chef Server configuration'\n"
                    ]]}}
            }
        }
    },

    "Outputs" : {
        "InstanceId" : {
            "Value" : { "Ref" : "Ec2Instance" },
            "Description" : "Instance Id of newly created instance"
        },

        "Subnet" : {
            "Value" : { "Ref" : "SubnetId" },
            "Description" : "Subnet of instance"
        },

        "SecurityGroupId" : {
            "Value" : { "Ref" : "SecurityGroupId" },
            "Description" : "Security Group of instance"
        }
    }

}

Not terrible, but the JSON is a little hard on the eyes. Here’s the same thing in Ruby, using SparkleFormation:

SparkleFormation.new(:vpc_instance) do
  set!('AWSTemplateFormatVersion', '2010-09-09')
  description 'make an instance, based on region, ami, subnet, and security group'

  parameters do
    key_name do
      type 'String'
      description 'Name of an existing EC2 KeyPair to enable SSH access to the instance'
    end
    vpc_id do
      type 'String'
      description 'VpcId of your existing Virtual Private Cloud (VPC)'
    end
    subnet_id do
      type 'String'
      description 'SubnetId of an existing subnet in your Virtual Private Cloud (VPC)'
    end
    ami_id do
      type 'String'
      description 'AMI to use'
    end
    security_group_id do
      type 'String'
      description 'SecurityGroup to use'
    end
  end

  resources(:ec2_instance) do
    type 'AWS::EC2::Instance'
    properties do
      image_id ref!(:ami_id)
      security_group_ids [ref!(:security_group_id)]
      subnet_id ref!(:subnet_id)
      key_name ref!(:key_name)
      user_data base64!(
        join!(
          "#!/bin/bash -v\n",
          "curl http://aprivatebucket.s3.amazonaws.com/bootstrap.sh -o /tmp/bootstrap.sh\n",
          "bash /tmp/bootstrap.sh\n",
          "# If all went well, signal success\n",
          "cfn-signal -e $? -r 'Chef Server configuration'\n"
        )
      )
    end
  end

  outputs do
    instance_id do
      description 'Instance Id of newly created instance'
      value ref!(:ec2_instance)
    end
    subnet do
      description 'Subnet of instance'
      value ref!(:subnet_id)
    end
    security_group_id do
      description 'Security group of instance'
      value ref!(:security_group_id)
    end
  end

end

Without taking advantage of any of SparkleFormation’s special capabilities, this translation is already a few lines shorter and easier to read as well. That’s a good start, but we can do better.

The template format version specification and parameters required for this template are common to any stack where EC2 compute resources may be used, whether they be single EC2 instances or Auto Scaling Groups, so let’s take advantage of some SparkleFormation features to make them reusable.

Here we have a base component that inserts the common parameters into templates which load it:

SparkleFormation.build do
  set!('AWSTemplateFormatVersion', '2010-09-09')

  parameters do
    key_name do
      type 'String'
      description 'Name of an existing EC2 KeyPair to enable SSH access to the instance'
    end
    vpc_id do
      type 'String'
      description 'VpcId of your existing Virtual Private Cloud (VPC)'
    end
    subnet_id do
      type 'String'
      description 'SubnetId of an existing subnet in your Virtual Private Cloud (VPC)'
    end
    ami_id do
      type 'String'
      description 'AMI you want to use'
    end
    security_group_id do
      type 'String'
      description 'SecurityGroup to use'
    end
  end

  outputs do
    subnet do
      description 'Subnet of instance'
      value ref!(:subnet_id)
    end
    security_group_id do
      description 'Security group of instance'
      value ref!(:security_group_id)
    end
  end

end

Now that the template version and common parameters have moved into the new base component, we can make use of them by loading that component as we instantiate our new template, specifying that the template will override any pieces of the component where the two intersect.

Let’s update the SparkleFormation template to make use of the new base component:

SparkleFormation.new(:vpc_instance).load(:base).overrides do

  description 'make an instance, based on region, ami, subnet, and security group'

  resources(:ec2_instance) do
    type 'AWS::EC2::Instance'
    properties do
      image_id ref!(:ami_id)
      security_group_ids [ref!(:security_group_id)]
      subnet_id ref!(:subnet_id)
      key_name ref!(:key_name)
      user_data base64!(
        join!(
          "#!/bin/bash -v\n",
          "curl http://aprivatebucket.s3.amazonaws.com/bootstrap.sh -o /tmp/bootstrap.sh\n",
          "bash /tmp/bootstrap.sh\n",
          "# If all went well, signal success\n",
          "cfn-signal -e $? -r 'Chef Server configuration'\n"
        )
      )
    end
  end

  outputs do
    instance_id do
      description 'Instance Id of newly created instance'
      value ref!(:ec2_instance)
    end
  end
end

Because the base component includes the parameters we need, the template no longer explicitly describes them.

Advanced tips and tricks

Since SparkleFormation is Ruby, we can get a little fancy. Let’s say we want to build 3 subnets into an existing VPC. If we know the VPC’s /16 subnet we can provide it as an environment variable (export VPC_SUBNET="10.1.0.0/16"), and then reference that variable in a template that generates additional subnets:

SparkleFormation.build do
  set!('AWSTemplateFormatVersion', '2010-09-09')

  octets = ENV['VPC_SUBNET'].split('.').slice(0,2).join('.')

  subnets = %w(1 2 3)

  parameters(:vpc_id) do
    type 'String'
    description 'Existing VPC ID'
  end

  parameters(:route_table_id) do
    type 'String'
    description 'Existing VPC Route Table'
  end

  subnets.each do |subnet|
    resources("vpc_subnet_#{subnet}".to_sym) do
      type 'AWS::EC2::Subnet'
      properties do
        vpc_id ref!(:vpc_id)
        cidr_block octets + '.' + subnet + '.0/24'
        availability_zone 'us-west-2a'
      end
    end

    resources("vpc_subnet_route_table_association_#{subnet}".to_sym) do
      type 'AWS::EC2::SubnetRouteTableAssociation'
      properties do
        route_table_id ref!(:route_table_id)
        subnet_id ref!("vpc_subnet_#{subnet}".to_sym)
      end
    end
  end
end

Of course we could place the subnet and route table association resources into a dynamic, so that we could just call the dynamic with some config:

subnets.each do |subnet|
  dynamic!(:vpc_subnet, subnet, :subnet_cidr => octets + '.' + subnet + '.0/24')
end
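
The vpc_subnet dynamic isn’t shown above, but a minimal sketch assembled from the resources in the previous template might look like this (the block signature is the standard one for SparkleFormation dynamics; everything else is lifted from the template above):

SparkleFormation.dynamic(:vpc_subnet) do |name, config = {}|
  resources("vpc_subnet_#{name}".to_sym) do
    type 'AWS::EC2::Subnet'
    properties do
      vpc_id ref!(:vpc_id)
      cidr_block config[:subnet_cidr]
      availability_zone 'us-west-2a'
    end
  end

  resources("vpc_subnet_route_table_association_#{name}".to_sym) do
    type 'AWS::EC2::SubnetRouteTableAssociation'
    properties do
      route_table_id ref!(:route_table_id)
      subnet_id ref!("vpc_subnet_#{name}".to_sym)
    end
  end
end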

Okay, this all sounds great! But how do I operate it?

SparkleFormation by itself does not implement any means of sending its output to the CloudFormation API. In the simplest case, a SparkleFormation template named ec2_example.rb can be compiled to JSON which you can use with CloudFormation as usual:

require 'sparkle_formation'
require 'json'

puts JSON.pretty_generate(
  SparkleFormation.compile('ec2_example.rb')
)

The knife-cloudformation plugin for Chef’s knife command adds sub-commands for creating, updating, inspecting and destroying CloudFormation stacks described by SparkleFormation code or plain JSON templates. Using knife-cloudformation does not require Chef to be part of your toolchain; it simply leverages knife as an execution platform.

Advent readers may recall a previous article on strategies for reusable CloudFormation templates which advocates a “layer cake” approach to deploying infrastructure using CloudFormation stacks:

The overall approach is that your templates should have sufficient parameters and outputs to be re-usable across environments like dev, stage, qa, or prod and that each layer’s template builds on the next.

Of course this is all well and good, until we find ourselves, once again, copying and pasting. This time it’s stack outputs instead of JSON, but again, we can do better.

The recent 0.2.0 release of knife-cloudformation adds a new --apply-stack parameter which makes operating “layer cake” infrastructure much easier.

When passed one or more instances of --apply-stack STACKNAME, knife-cloudformation will cache the outputs of the named stack and use the values of those outputs as the default values for parameters of the same name in the stack you are creating.

For example, a stack “coolapp-elb” which provisions an ELB and an associated security group has been configured with the following outputs:

$ knife cloudformation describe coolapp-elb
Resources for stack: coolapp-elb
Updated                  Logical Id                Type                                     Status
Status Reason
2014-11-17 22:54:28 UTC  CoolappElb               AWS::ElasticLoadBalancing::LoadBalancer
CREATE_COMPLETE
2014-11-17 22:54:47 UTC  CoolappElbSecurityGroup  AWS::EC2::SecurityGroup
CREATE_COMPLETE

Outputs for stack: coolapp-elb
Elb Dns: coolapp-elb-25352800.us-east-1.elb.amazonaws.com
Elb Name: coolapp-elb
Elb Security Group: coolapp-elb-CoolappElbSecurityGroup-JSR4RUT66Z66

The values of the ElbName and ElbSecurityGroup outputs would be of use to us in attaching an app server auto scaling group to this ELB, and we could use those values automatically by setting parameter names in the app server template which match the ELB stack’s output names:

SparkleFormation.new(:coolapp_asg) do

  parameters(:elb_name) do
    type 'String'
  end

  parameters(:elb_security_group) do
    type 'String'
  end

  ...

end

Once our coolapp_asg template uses parameter names that match the output names from the coolapp-elb stack, we can deploy the app server layer “on top” of the ELB layer using --apply-stack:

$ knife cloudformation create coolapp-asg --apply-stack coolapp-elb

Similarly, if we use a SparkleFormation template to build our VPC, we can set a number of VPC outputs that will be useful when building stacks inside the VPC:

  outputs do
    vpc_id do
      description 'VPC ID'
      value ref!(:vpc_id)
    end
    subnet_id do
      description 'VPC Subnet ID'
      value ref!(:subnet_id)
    end
    route_table_id do
      description 'VPC Route Table'
      value ref!(:route_table)
    end
  end

This ‘apply stack’ approach is just the latest way in which the SparkleFormation tool chain can help you keep your sanity when building infrastructure with CloudFormation.

Further reading

I hope this brief tour of SparkleFormation’s capabilities has piqued your interest. For some AWS users, the combination of SparkleFormation and knife-cloudformation helps to address a real pain point in the infrastructure-as-code tool chain, easing the development and operation of layered infrastructure.

Here’s some additional material to help you get started:

Introduction

I have a lot of warm feelings for Sensu, a flexible, scalable open source monitoring framework. At Needle our team has used Chef to build a Sensu instance for each of our environments, allowing us to test our automated monitoring configuration before promoting it to production, just like any other code we deploy.

Speaking of deploying code, isn’t it obnoxious to see alerts from your monitoring system when you know that your CM tool or deploy method is running? We think so too, so I set about writing a Chef handler to take care of this annoyance.

Sensu API and Stashes

Among Sensu’s virtues is its RESTful API which provides access to the data Sensu servers collect, such as clients & events.

The API also exposes an interface to stashes. Stashes are arbitrary JSON documents, so any JSON formatted data can be stored under the /stashes API endpoint.

Sensu handlers are expected to check the stashes under the /stashes/silence path when processing events, and silence events whose client has a matching stash at /stashes/silence/$CLIENT or whose client and check match a stash at /stashes/silence/$CLIENT/$CHECK.
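
For example, silencing and unsilencing a client amounts to a POST and a DELETE against the stashes endpoint. Here’s a minimal Ruby sketch, assuming a Sensu API listening on its default port (4567) and a hypothetical client named app01:

require 'net/http'
require 'json'

uri = URI('http://localhost:4567/stashes/silence/app01')
http = Net::HTTP.new(uri.host, uri.port)

# create a silence stash for the client 'app01'
post = Net::HTTP::Post.new(uri.path, 'Content-Type' => 'application/json')
post.body = JSON.generate('timestamp' => Time.now.to_i)
http.request(post)

# delete the stash, unsilencing the client
http.request(Net::HTTP::Delete.new(uri.path))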

Chef

Chef’s handler functionality can be used to trigger certain behaviors in response to specific situations during a chef-client run. At this time there are three different handler types implemented by Chef::Handler:

  • start handlers, triggered when the defined aspect of a chef-run starts
  • exception handlers, triggered when the defined aspect of a chef-run fails
  • report handlers, triggered when the defined aspect of a chef-run succeeds

Tying it all together

Combined, Sensu’s stash API endpoint and Chef’s exception and report handlers provide an excellent means for Chef to silence Sensu alerts during the time it is running on a node.

We achieved our goal by implementing Chef::Handler::Sensu::Silence, which runs as a start handler, and Chef::Handler::Sensu::Unsilence, which runs as both an exception and a report handler. All of this is bundled up in our chef-sensu-handler cookbook.

The cookbook installs and configures the handler using the node['chef_client']['sensu_api_url'] attribute. Once configured, the handler will attempt to create a stash under /stashes/silence/$CLIENT when the Chef run starts, and delete that stash when the Chef run fails or succeeds.

We also wanted to guard against conditions where Chef could fail catastrophically and its exception handlers might not run. To counter that possibility, the handler writes a timestamp and owner name into the stash it creates when silencing the client:

{ "timestamp": 1380133104, "owner": "chef" }

We then authored a Sensu plugin, check-silenced.rb, which compares the timestamp in existing silence stashes against a configurable timeout (in seconds). Once configured as part of our Sensu monitoring system, this plugin acts as a safety net which prevents clients from being silenced too long.
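
The plugin itself lives with the rest of our monitoring code, but the core idea is simple enough to sketch. This is a simplified illustration rather than the actual plugin, and it assumes the shape of the Sensu 0.x API’s GET /stashes response:

require 'net/http'
require 'json'

timeout = 3600 # seconds; the real plugin accepts this as a configurable option

# fetch all stashes and keep any silence stash older than the timeout
stashes = JSON.parse(Net::HTTP.get(URI('http://localhost:4567/stashes')))
stale = stashes.select do |stash|
  stash['path'].start_with?('silence/') &&
    Time.now.to_i - stash['content']['timestamp'].to_i > timeout
end

if stale.empty?
  puts 'OK: no stale silence stashes'
  exit 0
else
  puts "WARNING: stale silence stashes: #{stale.map { |s| s['path'] }.join(', ')}"
  exit 1
end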

Since releasing a Campfire LWRP for Chef a few weeks ago, my team has evaluated and subsequently transitioned to Atlassian’s HipChat service. Luckily I was able to reuse the framework I had already created for Campfire as the basis for a HipChat LWRP.

The LWRP should work with any modern version of Chef. When you use include_recipe to access the LWRP in your own recipes, the default recipe for this cookbook will install the required ‘hipchat’ gem.

Attributes

  • room - the name of the room you would like to speak into (required).
  • token - authentication token for your HipChat account (required).
  • nickname - the nickname to be used when speaking the message (required).
  • message - the message to speak. If a message is not specified, the name of the hipchat_msg resource is used.
  • notify - toggles whether or not users in the room should be notified by this message (defaults to true).
  • color - sets the color of the message in HipChat. Supported colors include: yellow, red, green, purple, or random (defaults to yellow).
  • failure_ok - toggles whether or not to catch the exception if an error is encountered connecting to HipChat (defaults to true).

Usage example

include_recipe 'hipchat'

hipchat_msg 'bad news' do
  room 'The Pod Bay'
  token '0xdedbeef0xdedbeef0xdedbeef'
  nickname 'HAL9000'
  message "Sorry Dave, I'm afraid I can't do that: #{some_error}"
  color 'red'
end

Availability

You can find this cookbook on github or on the Opscode community site.

This weekend I decided that I’d had enough of reusing the same pattern for manipulating tags on EC2 instances across multiple recipes. Since Opscode already publishes an aws cookbook with providers for other AWS resources, I figured it would be worthwhile to create a provider for manipulating these tags and contribute it back upstream.

The result of this Saturday project is the resource_tag LWRP. Source available here, Opscode ticket here.

Actions

  • add - Add tags to a resource.
  • update - Add or modify existing tags on a resource – this is the default action.
  • remove - Remove tags from a resource, but only if the specified values match the existing ones.
  • force_remove - Remove tags from a resource, regardless of their values.

Attribute Parameters

  • aws_secret_access_key, aws_access_key - passed to Opscode::Aws::Ec2 to authenticate; required.
  • tags - a hash of key value pairs to be used as resource tags (e.g. { "Name" => "foo", "Environment" => node.chef_environment }); required.
  • resource_id - resources whose tags will be modified. The value may be a single ID as a string or multiple IDs in an array. If no resource_id is specified the name attribute will be used.

Usage

resource_tag can be used to manipulate the tags assigned to one or more AWS resources, i.e. EC2 instances, EBS volumes or EBS volume snapshots.

include_recipe "aws"
aws = data_bag_item("aws", "main")

# Assigning tags to a node to reflect its role and environment:
aws_resource_tag node['ec2']['instance_id'] do
  aws_access_key aws['aws_access_key_id']
  aws_secret_access_key aws['aws_secret_access_key']
  tags({"Name" => "www.example.com app server",
        "Environment" => node.chef_environment})
  action :update
end

# Assigning a set of tags to multiple resources, e.g. ebs volumes in a disk set:
aws_resource_tag 'my awesome raid set' do
  aws_access_key aws['aws_access_key_id']
  aws_secret_access_key aws['aws_secret_access_key']
  resource_id [ "vol-d0518cb2", "vol-fad31a9a", "vol-fb106a9f", "vol-74ed3b14" ]
  tags({"Name" => "My awesome RAID disk set",
        "Environment" => node.chef_environment})
end

When setting tags on the node’s own EC2 instance, I recommend wrapping resource_tag resources in a conditional like if node.has_key?('ec2') so that your recipe will still run on Chef nodes outside of EC2 as well.
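
Concretely, wrapping the first usage example above in that guard looks like this:

include_recipe "aws"
aws = data_bag_item("aws", "main")

# only tag the instance when the node is actually running in EC2
if node.has_key?('ec2')
  aws_resource_tag node['ec2']['instance_id'] do
    aws_access_key aws['aws_access_key_id']
    aws_secret_access_key aws['aws_secret_access_key']
    tags({"Name" => "www.example.com app server",
          "Environment" => node.chef_environment})
    action :update
  end
end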

Like many small teams, Needle uses 37signals’ Campfire chat platform for collaborating online. Along with messages exchanged between coworkers, we also use Campfire for announcing new git commits, jira tickets and successful application deployments.

Since the code I’ve been using to send messages from Chef recipes to Campfire is virtually identical between a number of our cookbooks, I decided to turn that code into a LWRP that anyone can use in their own recipes. The cookbook for this LWRP is available on github.

Requirements

  • a Campfire API token (these are unique to each Campfire user, so if you want your messages to come from a particular user, get their token)
  • the tinder gem (installed by the campfire::default recipe)

Attributes

  • subdomain - the subdomain for your Campfire instance (required)
  • room - the name of the room you would like to speak into (required)
  • token - authentication token for your Campfire account (required)
  • message - the message to speak. If a message is not specified, the name of the campfire_msg resource is used.
  • paste - toggles whether or not to send the message as a monospaced “paste” (defaults to false)
  • play_before - play the specified sound before speaking the message
  • play_after - play the specified sound after speaking the message
  • failure_ok - toggles whether or not to catch the exception if an error is encountered connecting to Campfire (defaults to true)

A list of emoji and sounds available in Campfire can be found here: http://www.emoji-cheat-sheet.com/

Usage examples

include_recipe 'campfire'

campfire_msg 'bad news' do
  subdomain 'example'
  room 'Important Stuff'
  token '0xdedbeef0xdedbeef0xdedbeef'
  message "I have some bad news... there was an error: #{some_error}"
  play_after 'trombone'
end

Chef’s deploy and deploy_revision resources provide a useful mechanism for deploying applications as part of a chef-client or chef-solo run, without depending on an external system (e.g. Capistrano). Many Chef users learning to use these resources for the first time will find that they also need to install an SSH deploy key and an SSH wrapper script for Git before they can make effective use of these deploy resources, and that the Chef wiki doesn’t provide much documentation around this issue.

Enter deploy_wrapper: a Chef definition which handles the installation of an SSH deploy key and SSH wrapper script to be used by a deploy or deploy_revision resource.

Before deploy_wrapper, a recipe to configure the required resources to make an automated deploy or deploy_revision possible might look something like this:

directory '/root/.ssh' do
  owner "root"
  group "root"
  mode 0700
end

directory '/opt/myapp/shared' do
  owner "root"
  group "root"
  mode 0755
  recursive true
end

deploy_key = data_bag_item('keys', 'myapp_deploy_key')

template "/root/.ssh/myapp_deploy_key" do
  source "deploy_key.erb"
  owner "root"
  group "root"
  mode 0600
  variables({ :deploy_key => deploy_key })
end

template "/opt/myapp/shared/myapp_deploy_wrapper.sh" do
  source "ssh_wrapper.sh.erb"
  owner "root"
  group "root"
  mode 0755
  variables({
    :deploy_key_path => "/root/.ssh/myapp_deploy_key"
  })
end

deploy_revision "/opt/myapp" do
  repository node['myapp']['repository']
  revision node['myapp']['revision']
  ...
  ssh_wrapper "/opt/myapp/shared/myapp_deploy_wrapper.sh"
end

Not counting the source of the template files for these resources, that’s almost 30 lines of code just to set the stage for a deployment. It didn’t take long for me to grow tired of reusing this rather verbose pattern across a growing number of recipes.

Here’s how I accomplish the same thing with the deploy_wrapper definition:

deploy_key = data_bag_item('keys', 'myapp_deploy_key')

deploy_wrapper "myapp" do
  ssh_wrapper_dir "/opt/myapp/shared"
  ssh_key_dir "/root/.ssh"
  ssh_key_data deploy_key
  sloppy true
end

deploy_revision "/opt/myapp" do
  repository node['myapp']['repository']
  revision node['myapp']['revision']
  ...
  ssh_wrapper "/opt/myapp/shared/myapp_deploy_wrapper.sh"
end

Much better, right? Well, a lot shorter anyway. Now let’s talk about what the deploy_wrapper parameters used in the above example are doing.

The ssh_key_dir and ssh_wrapper_dir parameters specify directories which will be created by Chef. In the case of ssh_wrapper_dir, the git SSH wrapper script will automatically be created in this directory following the pattern “APPNAME_deploy_wrapper.sh”, using the value of the name parameter (in this case, myapp) in place of “APPNAME”.

Similarly, an SSH key file containing the data passed to the ssh_key_data parameter will be created in the directory specified as the value for the ssh_key_dir parameter. The key file will be named following the pattern “APPNAME_deploy_key”, using the value of the name parameter (myapp) in place of “APPNAME”.

The sloppy parameter is the only optional one. Because the default configuration of most SSH installations requires manual verification when accepting a remote host’s key for the first time, the sloppy parameter allows one to toggle strict host key checking (StrictHostKeyChecking) on or off.

When the value for sloppy is true, the wrapper script will accept any host key without prompting. The default value for sloppy is false, meaning that additional Chef resources, or … *gasp* … manual intervention, will be required in order to set up a known_hosts file before deployments can run successfully.

Monitoring with Pingdom

Swedish firm Pingdom offers a flexible, affordable service for monitoring the availability and response time of web sites, applications and other services. At Needle we provision an instance of our chat server for each partner we work with, and as a result I’ve found myself creating a Pingdom service check to monitor each of these instances. As you might imagine, this is a rather repetitive task, and the configuration is basically the same for each service check – a process ripe for automation!

Thankfully Pingdom provides a REST API for interacting with the service programmatically, which has made it possible for me to write a Chef LWRP for creating and modifying Pingdom service checks. Source available here: http://github.com/cwjohnston/chef-pingdom

Requirements

Requires Chef 0.7.10 or higher for Lightweight Resource and Provider support. Chef 0.10+ is recommended as this cookbook has not been tested with earlier versions.

A valid username, password and API key for your Pingdom account are required.

Recipes

This cookbook provides an empty default recipe which installs the required json gem (version <= 1.6.1). Chef already requires this gem, so it’s really just included in the interests of completeness.

Libraries

This cookbook provides the Opscode::Pingdom::Check library module which is required by all the check providers.

Resources and Providers

This cookbook provides a single resource (pingdom_check) and corresponding provider for managing Pingdom service checks.

pingdom_check resources support the actions add and delete, add being the default. Each pingdom_check resource requires the following resource attributes:

  • host - indicates the hostname (or IP address) which the service check will target
  • api_key - a valid API key for your Pingdom account
  • username - your Pingdom username
  • password - your Pingdom password

pingdom_check resources may also specify values for the optional type and check_params attributes.

The type attribute will accept one of the following service check types. If no value is specified, the check type will default to http.

  • http
  • tcp
  • udp
  • ping
  • dns
  • smtp
  • pop3
  • imap

The optional check_params attribute is expected to be a hash containing key/value pairs which match the type-specific parameters defined by the Pingdom API. If no check_params attribute is provided, the Pingdom defaults for the given check type will be used.

Usage

In order to utilize this cookbook, put the following at the top of the recipe where Pingdom resources are used:

include_recipe 'pingdom'

The following resource would configure an HTTP service check for the host foo.example.com:

pingdom_check 'foo http check' do
  host 'foo.example.com'
  api_key node[:pingdom][:api_key]
  username node[:pingdom][:username]
  password node[:pingdom][:password]
end

The resulting HTTP service check would be created using all the Pingdom defaults for HTTP service checks.

The following resource would configure an HTTP service check for the host bar.example.com utilizing some of the parameters specific to the HTTP service check type:

pingdom_check 'bar.example.com http status check' do
  host 'bar.example.com'
  api_key node[:pingdom][:api_key]
  username node[:pingdom][:username]
  password node[:pingdom][:password]
  check_params :url => "/status",
               :shouldcontain => "Everything is OK!",
               :sendnotificationwhendown => 2,
               :sendtoemail => "true",
               :sendtoiphone => "true"
end

Caveats

At this time I consider the LWRP to be incomplete. The two major gaps are as follows:

  • Changing the values for check_params does not actually update the service check’s configuration. I have done most of the initial work to implement this (available in the check-updating branch on github), but there are still bugs.
  • The LWRP has no support for managing contacts.

Future

  • Add update action for service checks which modifies existing checks to match the values from check_params
  • Add enable and disable actions for service checks
  • Add support for managing contacts (pingdom_contact resource)
  • Convert TrueClass attribute values to "true" strings
  • Validate classes passed as check_params values
  • Eliminate the need to manually look up contact IDs when setting contactids in check_params

Introduction

Recently I have been experimenting with the logging-as-a-service platform at Loggly. It seems pretty promising, and there’s a free tier for those who are indexing less than 200MB per day.

Since I am using Chef to manage my systems, I decided I would take a crack at writing a LWRP that would allow me to manage devices and inputs on my Loggly account through Chef. This makes it possible for new nodes to register themselves as Loggly devices when they are provisioned, without requiring me to make a trip to the Loggly control panel. The resulting cookbook is available here: http://github.com/cwjohnston/chef-loggly

Requirements

  • Valid Loggly account username and password
  • json ruby gem

Required node attributes

  • node['loggly']['username'] - Your Loggly username.
  • node['loggly']['password'] - Your Loggly password.

In the future these attributes should be made optional so that usernames and passwords can be specified as parameters for resource attributes.

Recipes

  • default - simply installs the json gem. Chef requires this gem as well, so it should already be available.
  • rsyslog - creates a loggly input for receiving syslog messages, registers the node as a device on that input and configures rsyslog to forward syslog messages there.

Resources

loggly_input - manage a log input

Attributes

  • domain - The subdomain for your loggly account
  • description - An optional descriptor for the input
  • type - The kind of input to create. May be one of the following:
    • http
    • syslogudp
    • syslogtcp
    • syslog_tls
    • syslogtcp_strip
    • syslogudp_strip

Actions

  • create - create the named input (default)
  • delete - delete the named input

Usage

loggly_input "production-syslog" do
    domain "examplecorp"
    type "syslogtcp"
    description "syslog messages from production nodes"
    action :create
end

loggly_device - manage a device which sends logs to an input

The name of a loggly_device resource should be the IP address for the device. Loggly doesn’t do DNS lookups; it just wants the device’s IP.

Resource Attributes

  • username - Your Loggly username. If no value is provided for this attribute, the value of node['loggly']['username'] will be used.
  • password - Your Loggly password. If no value is provided for this attribute, the value of node['loggly']['password'] will be used.
  • domain - The subdomain for your loggly account
  • input - the name of the input this device should be added to

Resource Actions

  • add - add the device to the named input (default)
  • delete - remove the device from the named input

Usage

loggly_device node[:ipaddress] do
    domain "examplecorp"
    input "production-syslog"
    action :add
end