Showing posts with label Solaris. Show all posts

Monday, May 31, 2010

Improving Satan and Solaris SMF

One of the features of Solaris that we rely on heavily in our production environment at work is the Service Management Facility, or SMF for short. SMF can start, stop and restart services, track dependencies between services and use them to optimize the boot process, and much more. Especially handy in a production environment is that SMF keeps track of the processes that a particular service started, and if a process dies, SMF restarts its service.

One gripe I have with SMF is that its process monitoring capabilities are rather simple. A process associated with a contract (service) must die in order for SMF to get the idea that something is wrong and that the service should be restarted. In practice, more often than not a process gets into a weird state that prevents it from working properly, yet it doesn't die. Failures might include excessive CPU or memory usage, or even application-level failures that can be detected only by interacting with the application (e.g. an HTTP health check). SMF in its current implementation is incapable of detecting these failures. And this is where Satan comes into play.

Satan is a small Ruby script that monitors a process and, following the Crash-only Software philosophy, kills it when a problem is detected. It then relies on SMF to detect the process death(s) and restart the given service. I fell in love with the simplicity of Satan (which was inspired by God) and started exploring the feasibility of using it to improve the reliability of SMF on our production servers.
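The crash-only loop described above can be sketched in a few lines of shell. This is only an illustration of the idea, not Satan's actual Ruby code: the function name `watch` and its arguments (`check_cmd`, `max_failures`, `on_unhealthy`, `interval`) are hypothetical, and a POSIX-style shell such as bash or ksh is assumed.

```shell
# Hypothetical watchdog sketch (not Satan's real implementation).
#   check_cmd     command that exits 0 while the service is healthy
#   max_failures  consecutive failed checks tolerated before acting
#   on_unhealthy  command to run once the threshold is reached
#   interval      seconds between checks (defaults to 10)
watch() {
  check_cmd=$1
  max_failures=$2
  on_unhealthy=$3
  interval=${4:-10}
  failures=0
  while :; do
    if $check_cmd; then
      failures=0                      # a healthy check resets the streak
    else
      failures=$((failures + 1))
      if [ "$failures" -ge "$max_failures" ]; then
        $on_unhealthy                 # e.g. kill the monitored process
        return 0
      fi
    fi
    sleep "$interval"
  done
}
```

For example, `watch my_health_check 3 my_kill_action 10` would run a (made-up) `my_health_check` every 10 seconds and call `my_kill_action` after three failures in a row, leaving the restart itself to SMF.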

Upon a code review of the script, I noticed several things that I wished were implemented differently. Here are some:
  • Satan watches processes rather than services as defined via SMF
  • One Satan instance is designed to watch many different processes for different services, which adds unnecessary complexity and lacks isolation
  • Satan is merciless (what a surprise! :-) ) and uses kill -9 without a warning
  • Satan has no test suite!!! :-( (i.e. I must presume that it doesn't work)


Thankfully the source code was out there on GitHub and licensed under the BSD license, so it was just a matter of a few keystrokes to fork it (open source FTW!). By the time I was done with my changes, there wasn't much of the original source code left, but oh well :-)

I'm happy to present to you http://github.com/IgorMinar/satan for review and comments. The main changes I made are the following:
  • One Satan instance watches a single SMF service and its one or more processes
  • The single-service design allows monitoring to be suspended automatically via SMF dependencies while the monitored service is being started, restarted or disabled
  • Several bugfixes around how rule failures and recoveries are counted before a service is deemed unhealthy
  • Satan first tries to invoke svcadm restart, and only if the restart doesn't occur within a specified grace period does it use kill -9 to kill all processes for the given contract (service)
  • Satan now has a decent RSpec test suite (more on that in my previous post)
  • Improved HTTP condition with a timeout setting
  • New JVM free heap space condition to monitor those pesky JVM memory leaks
  • Extensible design now allows for new monitoring conditions (rules) to be defined outside of the main Satan source code
As always, there are more things to improve and extend, but I'm hoping that my Satan fork will be a decent version that will allow us to keep our services running more reliably. If you have suggestions or comments, feel free to leave feedback.
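The "graceful first, forceful later" change from the list above might look roughly like the following in shell. This is a sketch under stated assumptions, not the fork's Ruby code: `restart_or_kill` is a hypothetical name, the script assumes that `svcs -H -o ctid` prints the service's contract ID (or `-` when there is none), that a restarted service shows up under a new contract ID, and that Solaris `pkill` accepts `-c` to match processes by contract.

```shell
# Hypothetical sketch: ask SMF for a restart, fall back to SIGKILL.
restart_or_kill() {
  fmri=$1                 # service FMRI to restart
  grace=${2:-30}          # seconds to wait before giving up on SMF
  old_ctid=$(svcs -H -o ctid "$fmri" | tr -d ' ')
  svcadm restart "$fmri"
  waited=0
  while [ "$waited" -lt "$grace" ]; do
    sleep 1
    waited=$((waited + 1))
    new_ctid=$(svcs -H -o ctid "$fmri" | tr -d ' ')
    # Assumption: a new, non-empty contract ID means the restart happened
    if [ -n "$new_ctid" ] && [ "$new_ctid" != "-" ] && [ "$new_ctid" != "$old_ctid" ]; then
      return 0
    fi
  done
  # Grace period expired: kill every process in the old contract
  pkill -9 -c "$old_ctid"
}
```

Polling the contract ID is just one way to detect that a restart happened; comparing the service's start time would be another.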

Tuesday, January 13, 2009

Using ZFS with Mac OS X 10.5

A few days ago I got a new MacBook Pro. While waiting for it to be delivered, I started thinking about how I wanted to lay out the installation of the OS. For a long time I had wanted to try the ZFS file system on a Mac, and this looked like a wonderful opportunity. Getting rid of HFS+, which was causing me lots of problems (especially its case-insensitive incarnation), sounded like a dream come true.

If you've never heard of ZFS before, check out this good 5min screencast of some of the important features.

A brief Google search revealed that there are several people using and developing ZFS for Mac. There is a Mac ZFS porting project at http://zfs.macosforge.org and I found a lot of good info at AlBlue's blog.

Some noteworthy info:
  • The current ZFS port (build 119) is based on ZFS code that shipped with Solaris build 72
  • It's currently not possible to boot Mac OS X from a ZFS filesystem
  • Finder integration is not perfect yet - Finder lists a ZFS pool as an unmountable drive under devices
  • There are several reports of kernel panics, most of which appeared in connection to the use of cheap external USB disks (I haven't experienced any)
  • There are a bunch of minor issues, which I'm sure will eventually go away.
None of the above was a show stopper for me, so I went ahead with the installation. My plan was simple - repartition the internal hard drive to a small bootable partition and a large partition used by ZFS, which will hold my home directory and other filesystems.

Install ZFS

Even though Mac OS X 10.5 comes with ZFS support, that support is read-only. To really use ZFS, the full ZFS implementation must be installed.

The installation is very simple and can be done by following these instructions: http://zfs.macosforge.org/trac/wiki/downloads. Alternatively, AlBlue created a fancy installer for the lazy ones out there.

Repartition Disk

Once ZFS was installed and the OS rebooted, I could repartition the internal disk. If you are using an external hard drive, you'll most likely need to use the zpool command instead.

First let's check what the disk looks like:
$ diskutil list
/dev/disk0
#:                       TYPE NAME                    SIZE       IDENTIFIER
0:      GUID_partition_scheme                        *298.1 Gi   disk0
1:                        EFI                         200.0 Mi   disk0s1
2:                  Apple_HFS boot                    297.8 Gi   disk0s2
Good, the internal disk was identified as /dev/disk0 and it currently contains an EFI (boot) slice and ~300G data slice/partition. Let's repartition the disk so that it contains two data partitions.
$ sudo diskutil resizeVolume disk0s2 40G ZFS tank 257G
Password:
Started resizing on disk disk0s2 boot
Verifying
Resizing Volume
Adjusting Partitions
Formatting new partitions
Formatting disk0s3 as ZFS File System with name tank
[ + 0%..10%..20%..30%..40%..50%..60%..70%..80%..90%..100% ]
Finished resizing on disk disk0
/dev/disk0
#:                       TYPE NAME                    SIZE       IDENTIFIER
0:      GUID_partition_scheme                        *298.1 Gi   disk0
1:                        EFI                         200.0 Mi   disk0s1
2:                  Apple_HFS boot                    39.9 Gi    disk0s2
3:                        ZFS tank                    252.0 Gi   disk0s3


Great, the disk was repartitioned and the existing data partition, which I call boot, was resized into a smaller 40GB partition and the extra space was used to create a ZFS pool called tank. Btw all the data on the boot partition was preserved.

Let's check my new pool:
$ zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
tank                    256G    360K    256G     0%  ONLINE     -
$ zpool status
pool: tank
state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
 still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
 pool will no longer be accessible on older software versions.
scrub: none requested
config:

 NAME        STATE     READ WRITE CKSUM
 tank        ONLINE       0     0     0
   disk0s3   ONLINE       0     0     0

errors: No known data errors
The warning above just means that a newer ZFS on-disk format is available but is not used by the current pool. As far as I could find, there are no benefits to upgrading to the new format on a Mac, and if I did upgrade, I would lose compatibility with Macs that have only the read-only ZFS support.
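If you manage more than one pool, the same warning can be spotted without reading the whole status output. The helper below is a small sketch that assumes `zpool status` prints the `pool:` and `status:` lines in the form shown above; the function name `pools_with_old_format` is made up for illustration.

```shell
# Hypothetical helper: print the names of pools that `zpool status`
# reports as using an older on-disk format.
pools_with_old_format() {
  zpool status | awk '
    $1 == "pool:"           { pool = $2 }    # remember the current pool name
    /older on-disk format/  { print pool }   # the warning belongs to that pool
  '
}
```

Before deciding on an upgrade, `zpool upgrade -v` lists the on-disk format versions that the installed ZFS software supports.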

Create Filesystems

So now that the new pool exists, I can create a shiny new filesystem using a single command:
$ sudo zfs create tank/me3x
$ zfs list
NAME        USED  AVAIL  REFER  MOUNTPOINT
tank        388K   252G   270K  /Volumes/tank
tank/me3x    19K   252G    19K  /Volumes/tank/me3x
To configure this new filesystem as my home directory, I created a temporary admin account, logged in under this account and mounted the ZFS fs as /Users/me3x:
$ sudo mv /Users/me3x /Users/me3x.hfs
$ sudo zfs set mountpoint=/Users/me3x tank/me3x
$ sudo cp -Rp /Users/me3x.hfs/. /Users/me3x
That's it. My Mac account now resides on a ZFS file system. Now I can finally enjoy all the benefits of using ZFS on my OpenSolaris box in my office as well as on my Mac. Bye bye HFS, I won't miss you! 
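Before deleting the old HFS+ copy of the home directory, it may be worth checking that the files really arrived on the new filesystem. A minimal sketch, assuming a plain recursive `diff` is good enough (it compares file contents, not ownership or permissions); `verify_copy` is a hypothetical helper name:

```shell
# Hypothetical helper: compare two directory trees by content.
# Prints "identical" and returns 0 when diff finds no differences.
verify_copy() {
  src=$1
  dst=$2
  if diff -r "$src" "$dst" >/dev/null 2>&1; then
    echo "identical"
  else
    echo "differs"
    return 1
  fi
}

# Usage for the migration above (run as the temporary admin user):
#   verify_copy /Users/me3x.hfs /Users/me3x && echo "safe to remove the old copy"
```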

Saturday, September 02, 2006

Solaris + Lighttpd + FastCGI + SSL HowTo

Update (07-05-18): You might also want to check out CoolStack, an optimized open source software stack for the Sun Solaris Operating System.

I hadn't heard about the Lighttpd (Lighty) web server before I became interested in Ruby on Rails (RoR). Lighttpd is a fast, scalable, secure, flexible and lightweight web server, and in the RoR community it is the preferred production web server. One of the main advantages of Lighttpd compared to Apache, for example, is the built-in support for FastCGI and its very easy yet flexible configuration.

When I was preparing the production environment for one of the RoR applications I have been working on lately, I found that even though there are quite a lot of HowTos for installing Lighttpd on Linux and Mac OS, I could not find a single one that describes this procedure on Solaris / OpenSolaris.

While setting up the production environment I got stuck on some steps, and I hope that this HowTo will help anyone trying to deploy a RoR application on Solaris avoid the problems I had.

Targeted configuration: Solaris 10 + Lighttpd + FastCGI + SSL

1. Prerequisites
I started out with the core installation of Solaris 10, which means that I needed to add some packages. Before you start, make sure you have these Solaris packages installed:
  • SUNWhea, SUNWtoo >>> needed by lighttpd
  • SUNWlibmr, SUNWlibm >>> needed by fastcgi
If you plan to use only the Sun compiler to compile everything, you will also need the SUNWbtool and SUNWsprot packages.
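A quick way to see which of these packages are still missing is to ask `pkginfo`. The loop below is a sketch that relies on `pkginfo -q` exiting with a non-zero status for a package that is not installed; the function name `missing_packages` is hypothetical.

```shell
# Hypothetical helper: print every package from the argument list
# that pkginfo does not report as installed.
missing_packages() {
  for pkg in "$@"; do
    pkginfo -q "$pkg" || echo "$pkg"
  done
}

# Usage:
#   missing_packages SUNWhea SUNWtoo SUNWlibmr SUNWlibm
```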

From blastwave.org via pkg-get get:
  • openssl >>> needed by lighttpd for ssl
And one more thing: you must get the Sun C Compiler, otherwise you won't be able to install the ruby-fastcgi bindings (explained later). The Sun C Compiler is part of Sun Studio, which you can download for free here (a free SDN account is needed). The website describes two ways to get the compiler:
  • option 1) 202MB - works
  • option 2) 365MB - haven't tried
Once downloaded, follow the "Setup" instructions on the website above.

Now we should be ready to start.

2. Lighttpd
First we should install Lighttpd, which can be downloaded from here. (I used gcc to compile this one.)
./configure --with-openssl=/opt/csw --with-ldap --with-bzip2 --with-zlib
The output from the configure command should end like this:
Features:

enabled:
auth-crypt
auth-ldap
compress-bzip2
compress-deflate
compress-gzip
large-files
network-ipv6
network-openssl
regex-conditionals
...
...


Make sure that "network-openssl" is in the list of enabled features. If it is not there, check the output of the configure tests. If you see that openssl was found but right below it is:
checking for BIO_f_base64 in -lcrypto... no
check your configure arguments and make sure that --with-openssl=/path/to/openssl is set correctly (if installed via pkg-get, it will reside in /opt/csw). If this is not done properly, everything will compile correctly but the --with-openssl option will be silently ignored!

Let's finish up the installation:

make
make install
Test that everything went well with the -v switch:
$ lighttpd -v
lighttpd-1.4.11 (ssl) - a light and fast webserver
Build-Date: Aug 30 2006 15:19:28
If there is no "(ssl)" after the version, something is wrong. Check that you set --with-openssl=/path correctly.
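If the build is scripted, the same check can be automated. The snippet below is a minimal sketch that only looks for the literal "(ssl)" marker in the version output shown above; `has_ssl_support` is a made-up name.

```shell
# Hypothetical helper: reads `lighttpd -v` output on stdin and
# succeeds only if the "(ssl)" marker is present.
has_ssl_support() {
  grep '(ssl)' >/dev/null
}

# Usage:
#   lighttpd -v | has_ssl_support || echo "lighttpd was built without SSL" >&2
```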
3. FastCGI
Now we need to install FastCGI which can be downloaded from here:
./configure   --prefix=/usr/local
make
make install

4. ruby-fcgi

The next step is to install the ruby-fcgi bindings. This is where I got really stuck. To be able to install them, you need the Sun C Compiler mentioned earlier. If you installed Ruby and RubyGems via pkg-get as I did, Ruby comes preconfigured to use the Sun compiler by default, even if one is not installed. And if you don't have this compiler, you will see all sorts of weird errors when installing gems with native extensions.

Once you have the compiler, installation is trivial. If you are building it manually, download sources here:
ruby install.rb config -- --with-fcgi-dir=/usr/local
ruby install.rb setup
ruby install.rb install
or when using ruby gems:
gem install fcgi -- --with-fcgi-dir=/usr/local
Note the double "--". That is not a typo; it is simply how parameters are passed to extconf.rb.

If you see an error like this:
install.rb: entering config phase...
---> lib
<--- lib
---> ext
---> ext/fcgi
/opt/csw/bin/ruby /root/ruby-fcgi-0.8.6/ext/fcgi/extconf.rb
checking for fcgiapp.h... yes
checking for FCGX_Accept() in -lfcgi... no
<--- ext/fcgi
<--- ext
Note this part:
checking for fcgiapp.h... yes
checking for FCGX_Accept() in -lfcgi... no
It most likely means that you are using GCC and not the Sun C Compiler. Check that cc in your path points to the Sun C Compiler, and also check that the environment variable CC is not set to gcc.

Let's test whether fcgi and the ruby-fcgi bindings were installed correctly by running these commands in irb:
irb(main):001:0> require 'fcgi.so'
=> true
irb(main):001:0> require 'fcgi'
=> true
If both "require" calls return true, you have successfully installed FastCGI and Ruby is able to invoke it.

5. SSL Certificate
If you don't have your server certificate yet, you can create a self signed one like this:
openssl req -new -x509 -keyout server.pem -nodes -out server.pem -days 1000
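The command above asks for the certificate details interactively and, because of -nodes, leaves an unencrypted private key in server.pem. A non-interactive variant could look like this sketch; the function name `make_self_signed`, the use of `-subj` to supply the subject, and the restrictive umask are additions for illustration, not part of the original procedure.

```shell
# Hypothetical helper: create a self-signed certificate and key in one
# PEM file without prompting.
#   $1 output file, $2 common name (host name), $3 validity in days
make_self_signed() {
  pem=$1
  cn=$2
  days=${3:-1000}
  (
    umask 077    # the file holds an unencrypted key, so keep it private
    openssl req -new -x509 -nodes -subj "/CN=$cn" \
      -keyout "$pem" -out "$pem" -days "$days"
  )
}

# Usage:
#   make_self_signed server.pem www.example.com 1000
```

The result can be inspected with `openssl x509 -in server.pem -noout -subject -dates`.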
6. Configuring Lighty
Create lighttpd.conf
server.port = 443
server.bind = "0.0.0.0"
server.modules = ( "mod_rewrite", "mod_fastcgi", "mod_accesslog" )
url.rewrite = ( "^/$" => "index.html", "^([^.]+)$" => "$1.html" )
server.error-handler-404 = "/dispatch.fcgi"
server.document-root = "/path_to_your_app/public"
server.errorlog = "/path_to_your_app/log/server.log"
accesslog.filename = "/path_to_your_app/log/access_log"
ssl.engine = "enable"
ssl.pemfile = "/path_to_your_pem_file/server.pem"

fastcgi.server = (".fcgi" =>
( "localhost" =>
(
"min-procs" => 10,
"max-procs" => 10,
"socket"    => "/tmp/yourapp.fcgi.socket",
"bin-path"  => "/path_to_your_app/public/dispatch.fcgi",
"bin-environment" => ( "RAILS_ENV" => "production" )
))
)

mimetype.assign = (
".css"        =>  "text/css",
".gif"        =>  "image/gif",
".html"       =>  "text/html",
".jpeg"       =>  "image/jpeg",
".jpg"        =>  "image/jpeg",
".js"         =>  "text/javascript",
".pdf"        =>  "application/pdf",
".png"        =>  "image/png",
".txt"        =>  "text/plain",
)

7. Ta-da!
Let's start the server

lighttpd -f lighttpd.conf
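A typo in lighttpd.conf is easier to catch before the server goes to the background. The wrapper below is a sketch that assumes your lighttpd build supports the -t switch (test the config file and exit); `start_lighty` is a hypothetical name.

```shell
# Hypothetical wrapper: only start lighttpd if the config parses.
start_lighty() {
  conf=${1:-lighttpd.conf}
  if lighttpd -t -f "$conf"; then
    lighttpd -f "$conf"
  else
    echo "refusing to start: $conf did not pass the syntax check" >&2
    return 1
  fi
}

# Usage:
#   start_lighty lighttpd.conf
```

Once the server is up, `openssl s_client -connect localhost:443` is one way to confirm that it answers over SSL.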

I hope that these instructions helped you get Lighty running on Solaris.

If you have other recommendations or have problems with the installation, feel free to leave a comment.

Enjoy..

PS: A big thanks to my friend J who helped me figure out that you need Sun C compiler for ruby-fcgi to compile.

UPDATE: added reference to SUNWbtool and SUNWsprot packages in case you want to compile everything with Sun's compiler.