Showing posts with label Ruby. Show all posts

Monday, May 31, 2010

Improving Satan and Solaris SMF

One of the Solaris features that we rely on heavily in our production environment at work is the Service Management Facility, or SMF for short. SMF can start, stop and restart services, track dependencies between services and use them to optimize the boot process, and lots more. Particularly handy in a production environment is that SMF keeps track of the processes that a particular service started, and if one of those processes dies, SMF restarts the service.

One gripe I have with SMF is that its process monitoring capabilities are rather simple. A process associated with a contract (service) must die in order for SMF to get the idea that something is wrong and that the service should be restarted. In practice, more often than not a process gets into a weird state that prevents it from working properly, yet it doesn't die. Failures might include excessive CPU or memory usage, or even application-level failures that can be detected only by interacting with the application (e.g. an HTTP health check). SMF in its current implementation is incapable of detecting these failures. And this is where Satan comes into play.

Satan is a small Ruby script that monitors a process and, following the Crash-only Software philosophy, kills it when a problem is detected. It then relies on SMF to detect the process death(s) and restart the given service. I fell in love with the simplicity of Satan (which was inspired by God) and started exploring the feasibility of using it to improve the reliability of SMF on our production servers.

Upon a code review of the script, I noticed several things that I wished were implemented differently. Here are some:
  • Satan watches processes rather than services as defined via SMF
  • One Satan instance is designed to watch many different processes for different services, which adds unnecessary complexity and lacks isolation
  • Satan is merciless (what a surprise! :-) ) and uses kill -9 without a warning
  • Satan has no test suite!!! :-( (i.e. I must presume that it doesn't work)


Thankfully the source code was out there on GitHub and licensed under the BSD license, so it was just a matter of a few keystrokes to fork it (open source FTW!). By the time I was done with my changes, there wasn't much of the original source code left, but oh well :-)

I'm happy to present to you http://github.com/IgorMinar/satan for review and comments. The main changes I made are the following:
  • One Satan instance watches a single SMF service and its one or more processes
  • The single-service design allows monitoring to be suspended automatically via SMF dependencies while the monitored service is being started, restarted or disabled
  • Several bugfixes around how rule failures and recoveries are counted before a service is deemed unhealthy
  • At first Satan tries to invoke svcadm restart, and only if the restart doesn't occur within a specified grace period does it use kill -9 to kill all processes for the given contract (service)
  • Satan now has a decent RSpec test suite (more on that in my previous post)
  • Improved HTTP condition with a timeout setting
  • New JVM free heap space condition to monitor those pesky JVM memory leaks
  • Extensible design now allows for new monitoring conditions (rules) to be defined outside of the main Satan source code
As always, there are more things to improve and extend, but I'm hoping that my Satan fork will be a decent version that will allow us to keep our services running more reliably. If you have suggestions or comments, feel free to leave feedback.
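
The restart-then-kill escalation described in the list above can be sketched roughly as follows. This is a minimal illustration and not code from the Satan fork: the RestartEscalator class, its callable parameters and the default grace period are all hypothetical, and the actual svcadm restart and kill -9 invocations are left to the caller.

```ruby
# Hypothetical sketch of "restart gracefully first, kill only as a last resort".
# None of these names come from the Satan source code.
class RestartEscalator
  def initialize(restart:, restarted:, kill:, grace_period: 30, poll_interval: 1,
                 sleeper: ->(seconds) { sleep(seconds) })
    @restart = restart            # callable that asks SMF to restart the service
    @restarted = restarted        # callable returning true once the restart happened
    @kill = kill                  # callable that kills every process in the contract
    @grace_period = grace_period  # seconds to wait before escalating
    @poll_interval = poll_interval
    @sleeper = sleeper            # injectable so tests don't have to wait
  end

  # Returns :restarted if the graceful restart succeeded in time, :killed otherwise.
  def heal
    @restart.call
    waited = 0
    while waited < @grace_period
      return :restarted if @restarted.call
      @sleeper.call(@poll_interval)
      waited += @poll_interval
    end
    return :restarted if @restarted.call
    @kill.call
    :killed
  end
end

# Usage with stand-in callables (a real setup would shell out instead).
log = []
stuck = RestartEscalator.new(
  restart:   -> { log << :restart },
  restarted: -> { false },          # the service never comes back
  kill:      -> { log << :kill },
  grace_period: 3,
  sleeper:   ->(_seconds) {}        # don't actually sleep in the example
)
outcome = stuck.heal
puts "outcome: #{outcome}, actions: #{log.inspect}"
```

Passing the commands in as callables keeps the escalation logic testable without a Solaris box, which is the same idea the next post applies to parsing ps output.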

Testing matters, even with shell scripts

A few months ago, we migrated our production environment at work from Solaris 10 to OpenSolaris. We loved the change because it allowed us to take advantage of the latest inventions in Solaris land. All was good and dandy until one day one of our servers ran out of disk space and died. WTH? We have monitoring scripts that alert us long before we get even close to running out of space, yet no alert was issued this time. While investigating the cause of this incident, we found out that our monitoring scripts, which work well on Solaris 10, didn't monitor the disk space correctly on OpenSolaris. When I asked our sysadmins whether they had any tests that could validate their scripts, they laughed at me.

Fast forward a few months. A few days ago I started looking at Satan to augment the self-healing capabilities of Solaris SMF (think initd or launchd on steroids). At first sight I loved the simplicity of the solution, but one thing that startled me during the code review was that there were no tests for the code, except for some helper scripts that made manual testing a bit less painful. At the same time, I spotted several bugs that would have resulted in unwanted behavior.

Satan relies on invoking Solaris commands from Ruby, parsing their output and acting upon it. Thanks to its no-BS nature, Ruby makes for an excellent choice when it comes to writing programs that interact with the OS by executing commands. There are several ways to do this, but the most popular looks like this:
ps_output = `ps -o pid,pcpu,rss,args -p #{pid}`

All you need to do is to stick the command into backticks and optionally use #{variable} for variable expansion. To get a hold of the output, just assign the return value to a variable.
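
As a small illustration that should run on any Unix-like system (echo stands in for a real Solaris command here, and the variable names are made up for the example):

```ruby
name = "world"
output = `echo hello #{name}`   # runs the command in a subshell and captures its stdout
status = $?                     # Process::Status of the command that just finished

puts output                     # the captured text, including the trailing newline
puts status.success?            # whether the command exited with status 0
```

Note that backticks only capture standard output; the exit status has to be read separately from $?.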

Now if you stick a piece of code like this in the middle of the ruby script you get something next to untestable:
module PsParser
 def ps(pid)
   out_raw = `ps -o pid,pcpu,rss,args -p #{pid}`
   out = out_raw.split(/\n/)[1].split(/ /).delete_if {|arg| arg == "" or arg.nil? }
   { :pid=>out[0].to_i,
     :cpu=>out[1].to_i,
     :rss=>out[2].to_i*1024,
     :command=>out[3..out.size].join(' ') }
 end
end

With the code structured (or unstructured) like this, you'll never be able to test if the code can parse the output correctly. However if you extract the command execution into a separate method call:
module PsParser
 def ps(pid)
   out = ps_for_pid(pid).split(/\n/)[1].split(/ /).delete_if {|arg| arg == "" or arg.nil? }
   { :pid=>out[0].to_i,
     :cpu=>out[1].to_i,
     :rss=>out[2].to_i*1024,
     :command=>out[3..out.size].join(' ') }
 end

 private
 def ps_for_pid(pid)
   `ps -o pid,pcpu,rss,args -p #{pid}`
 end
end

You can now open the module and redefine the ps_for_pid in your tests like this:
require 'ps_parser'

PS_OUT = {
 1 => "  PID %CPU    RSS ARGS
12790   2.7 707020 java",
 2 => "  PID %CPU    RSS ARGS
12791  92.7 107020 httpd"
}

module PsParser
 def ps_for_pid(pid)
   PS_OUT[pid]
 end
end

And now you can simply call the ps method and check whether the fake output stored in PS_OUT is being parsed correctly. The concept is the same as when mocking web services or other complex classes, but applied to running system commands and programs.
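
Put together, such a check might look like the sketch below. It uses plain raise-based assertions instead of RSpec so that it runs without any gems; the FakeHost class is a hypothetical holder for the mixin, and the module bodies are repeated only to keep the example self-contained.

```ruby
module PsParser
  def ps(pid)
    out = ps_for_pid(pid).split(/\n/)[1].split(/ /).delete_if { |arg| arg == "" or arg.nil? }
    { :pid => out[0].to_i,
      :cpu => out[1].to_i,
      :rss => out[2].to_i * 1024,
      :command => out[3..out.size].join(' ') }
  end

  private

  def ps_for_pid(pid)
    `ps -o pid,pcpu,rss,args -p #{pid}`
  end
end

# Canned ps output, keyed by a fake pid.
PS_OUT = {
  1 => "  PID %CPU    RSS ARGS\n12790   2.7 707020 java",
  2 => "  PID %CPU    RSS ARGS\n12791  92.7 107020 httpd"
}

# Reopen the module so that no real ps command is executed.
module PsParser
  def ps_for_pid(pid)
    PS_OUT[pid]
  end
end

# Hypothetical class whose only job is to mix in the parser.
class FakeHost
  include PsParser
end

java = FakeHost.new.ps(1)
httpd = FakeHost.new.ps(2)

raise "wrong pid" unless java[:pid] == 12790
raise "wrong rss" unless java[:rss] == 707020 * 1024
raise "wrong command" unless httpd[:command] == "httpd"
puts "parsed: #{java.inspect}"
```

A check like this also documents that :cpu is truncated to an integer by to_i (2.7 becomes 2), which is easy to miss when only reading the parser.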

To conclude, what makes you more confident about software you want to rely on: an empty test folder, or all-green results from a test/spec suite?

Saturday, January 31, 2009

Benchmarking JRuby on Rails

Last night, while working on a project I found a really neat use of Rails Components, but I also noticed that this part of Rails is deprecated, among other reasons because it's slow.

Well, how slow? During my quest to find out, I collected some interesting data, and even more importantly put JRuby and MRI Ruby face to face.

Disclaimer: the benchmarks were not done on a well isolated and specially configured test harness, but I did my best to gather data with informational value. All the components were used with OOB settings.

Setup

  • ruby 1.8.6 (2008-03-03 patchlevel 114) [universal-darwin9.0] + Mongrel Web Server 1.1.4
  • jruby 1.1.6 (ruby 1.8.6 patchlevel 114) (2008-12-17 rev 8388) [x86_64-java] + GlassFish gem version: 0.9.2
  • common backend: mysql5 5.0.75 Source distribution (InnoDB table engine, Rails pool set to 30)


Benchmarks

I used Faban, an excellent high-quality benchmarking framework, for my tests. I was lazy, so I only used fhb (very similar to ab, but without its flaws) to run simple benchmarks:
  • simple request benchmark: bin/fhb -r 60/120/5 -c 10 http://localhost:3000/buckets/1
  • component request benchmark: bin/fhb -r 60/120/5 -c 10 http://localhost:3000/bucket1/object1
Both tests were run with JRuby as well as with MRI Ruby, and in addition I ran the tests with Rails in single-threaded as well as multi-threaded mode. I didn't use Mongrel clusters or GlassFish pooled instances - there was always only one Ruby instance serving all the requests.

Results

ruby 1.8.6 + mongrel
---------------------------------
simple action + single-threaded:
ops/sec: 210.900
% errors: 0.0
avg. time: 0.047
max time: 0.382
90th %: 0.095

simple action + multi-threaded:
ops/sec: 226.483
% errors: 0.0
avg. time: 0.044
max time: 0.180
90th %: 0.095

component action + single-threaded:
ops/sec: 132.950
% errors: 0.0
avg. time: 0.075
max time: 0.214
90th %: 0.130

component action + multi-threaded:
ops/sec: 131.775
% errors: 0.0
avg. time: 0.076
max time: 0.279
90th %: 0.125

jruby 1.1.6 + glassfish gem 0.9.2
----------------------------------
simple action + single-threaded:
ops/sec: 141.417
% errors: 0.0
avg. time: 0.070
max time: 0.259
90th %: 0.115

simple action + multi-threaded:
ops/sec: 247.333
% errors: 0.0
avg. time: 0.040
max time: 0.318
90th %: 0.065

component action + single-threaded:
ops/sec: 107.858
% errors: 0.0
avg. time: 0.092
max time: 0.595
90th %: 0.145

component action + multi-threaded:
ops/sec: 179.042
% errors: 0.0
avg. time: 0.055
max time: 0.357
90th %: 0.085


Platform/Action   Simple     +/-        Component   +/-
Ruby ST           210 ops      0%       132 ops       0%
Ruby MT           226 ops      7.62%    131 ops      -0.76%
JRuby ST          141 ops    -32.86%    107 ops     -18.94%
JRuby MT          247 ops     17.62%    179 ops      35.61%
(ST - single-threaded; MT - multi-threaded)
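
The +/- columns appear to be the change relative to the Ruby ST row, computed from the truncated ops/sec figures. The arithmetic can be re-checked with a short script; the constant and method names here are made up for the example.

```ruby
# ops/sec from the results above, truncated to whole numbers as in the summary table
OPS = {
  "Ruby ST"  => { simple: 210, component: 132 },
  "Ruby MT"  => { simple: 226, component: 131 },
  "JRuby ST" => { simple: 141, component: 107 },
  "JRuby MT" => { simple: 247, component: 179 }
}
BASELINE = OPS["Ruby ST"]

# Percentage change of ops relative to the baseline, rounded to two decimals.
def relative_change(ops, baseline)
  ((ops - baseline) * 100.0 / baseline).round(2)
end

OPS.each do |platform, ops|
  simple = relative_change(ops[:simple], BASELINE[:simple])
  component = relative_change(ops[:component], BASELINE[:component])
  puts format("%-9s simple %+7.2f%%   component %+7.2f%%", platform, simple, component)
end
```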

Conclusion

From my tests it appears that MRI is faster in single-threaded mode, but JRuby makes up for the loss big time in the multi-threaded tests. It's also interesting to see that the multi-threaded mode gives MRI (green threads) a performance boost for the simple action, but it's nowhere close to the boost that JRuby (native threads) can squeeze out of using multiple threads.

During the tests I noticed that Rails was reporting more time spent in the db when using JRuby (2-80ms) compared to MRI (1-3ms). I don't know how reliable this data is, but I wonder if this is the bottleneck that is holding JRuby back in single-threaded mode.

Friday, January 18, 2008

MultiSudokuSolver - A fun project I worked on during the Xmas break

While I was visiting my family in Slovakia during the winter of 2005/2006, I found a sudoku magazine in my dad's apartment. I like this kind of stuff, so I immediately started solving one puzzle after another, until I got to a "Speciality MultiSudoku" puzzle:



This puzzle is composed of five regular 9x9 puzzles that are interconnected, and you can't solve any of them alone: you need to solve all five at the same time, using intermediate results from one puzzle to make progress in the puzzles it is connected to. It sounded like a good challenge, so I started working on it. After two or three nights I realized that I had made a mistake!! Ughhh. I erased everything I had solved and started from scratch. After another few nights I found another problem, got turned off by this puzzle and put it away.

I came across the puzzle once again when my wife and I returned to Slovakia this winter. I found the puzzle in the apartment and, being a person who likes challenges and doesn't get turned off by failures for too long, I decided to erase everything and this time be very careful and solve the puzzle once and for all.

After about two evenings I found something I didn't want to see. An error!!! It was hard to believe it, but there it was. I tried to fix it, but if you don't spot an error in a sudoku puzzle early enough, you'll spend more time fixing it than if you started from scratch.

Do you think I felt like starting from scratch? No! But I couldn't let this be. So since I had 10 or so days off during the Christmas break (thanks Sun!!!), I decided to use my brain in a more productive way and to write a small program that would solve the puzzle for me.

Given my interest in Ruby and JRuby and the type of problem I was about to solve, the language choice was an easy one to make.

A few days later I had a script that solved the puzzle for me within 1.07 seconds. Ya! You heard right! Solved not in a few evenings but in just a little more than one second.


Top Left:
718|956|324
452|387|619
936|142|587
-----------
341|528|796
695|713|842
287|694|153
-----------
523|461|978
869|275|431
174|839|265

Top Right:
635|428|971
214|679|853
897|531|426
-----------
976|382|514
152|764|389
348|915|267
-----------
463|857|192
529|146|738
781|293|645

Bottom Left:
728|154|396
194|836|752
653|279|184
-----------
519|368|427
346|725|918
872|491|563
-----------
231|987|645
467|512|839
985|643|271

Bottom Right:
215|849|637
348|627|591
976|153|824
-----------
869|431|752
154|972|368
732|568|149
-----------
521|796|483
483|215|976
697|384|215

Center:
978|215|463
431|678|529
265|439|781
-----------
813|754|692
649|382|157
527|961|834
-----------
396|847|215
752|196|348
184|523|976


Lessons learned: "automate automate automate!" and "Don't work hard, but work smart!" :)

The algorithm is very simple. The model is based on the simplest unit - a cell which is part of a row, column and a square. Each puzzle consists of 9 columns, 9 rows and 9 squares. Cells are flexible enough to be part of more than one square, row or a column at a time, so I can have cells that are present in more than one puzzle at a time. With a flexible model like this, all that the program needs to do is to use some basic rules to eliminate candidates and determine cell values. This type of solver is often referred to as human-style solver, because it uses the same techniques used by humans.
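
To make the model more concrete, here is a minimal sketch of the idea. It is not taken from multi_sudoku_solver.rb: the Cell and Group classes and the propagate method are hypothetical names, and only the most basic rule is shown (a solved cell removes its value from the candidates of every peer in every group it belongs to).

```ruby
require 'set'

# Hypothetical model: a cell keeps its remaining candidates and a list of every
# group (row, column or square) it belongs to, possibly in several puzzles.
class Cell
  attr_reader :candidates, :groups

  def initialize(value = nil)
    @candidates = value ? Set[value] : Set.new(1..9)
    @groups = []
  end

  def solved?
    @candidates.size == 1
  end

  def value
    solved? ? @candidates.first : nil
  end

  # Removes a candidate; returns true only if something actually changed.
  def eliminate(digit)
    return false if solved? || !@candidates.include?(digit)
    @candidates.delete(digit)
    true
  end
end

class Group
  attr_reader :cells

  def initialize(cells)
    @cells = cells
    cells.each { |cell| cell.groups << self }
  end
end

# Repeatedly removes each solved cell's value from its peers until nothing changes.
def propagate(cells)
  changed = true
  while changed
    changed = false
    cells.select(&:solved?).each do |cell|
      cell.groups.each do |group|
        group.cells.each do |peer|
          next if peer.equal?(cell)
          changed = true if peer.eliminate(cell.value)
        end
      end
    end
  end
end

# Two rows from different puzzles that share one cell.
shared = Cell.new
row_a = (1..8).map { |digit| Cell.new(digit) } + [shared]
row_b = [shared] + (1..7).map { |digit| Cell.new(digit) } + [Cell.new]
Group.new(row_a)
Group.new(row_b)
propagate((row_a + row_b).uniq)
puts "shared cell: #{shared.value}, last cell of row B: #{row_b.last.value}"
```

In this toy setup the shared cell can only be resolved from row A, and the last cell of row B can only be resolved once the shared cell is known, which is the kind of cross-puzzle dependency the MultiSudoku relies on.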

If anyone is interested in having a look at the source code it can be downloaded from here: multi_sudoku_solver.rb. A word of warning - the code is not cleaned up nor documented. This was just a fun project that I worked on, and the fact that I achieved what I set out to achieve was a good enough milestone to call this project done.

After this experience I don't feel like solving sudoku puzzles any more. They are just repetitive problems that are well suited for an automated solution. Most importantly - I actually had more fun writing the program than I had solving the puzzle :).

Wednesday, September 12, 2007

NetBeans IDE 6.0 Beta 1 is Out!

While searching for a nightly build of NetBeans I noticed that NB6 Beta 1 is out. Great!

I'm looking forward to using the new version. After spending the last couple of weeks (or months?) developing in NB6M10, I can't wait to see what this release brings.

The first thing that struck me, though, is the lack of the "drag&drop" installation on MacOS, which was replaced with a "pkg-style" installer. I liked the drag&drop installation better - it's more mac-ish. I hope the surprises that follow will only be pleasant.

Thanks, NB guys!

Download link: http://bits.netbeans.org/download/6.0/milestones/latest/


Tuesday, April 17, 2007

script type="text/ruby" Impossible? Ha!

Check out Dion Almaer's blog with a description of the prototype and a demo page.

Saturday, September 02, 2006

Solaris + Lighttpd + FastCGI + SSL HowTo

Update (07-05-18): You might also want to check out CoolStack, an optimized open source software stack for the Sun Solaris Operating System.

I hadn't heard about the Lighttpd (Lighty) web server before I became interested in Ruby on Rails (RoR). Lighttpd is a fast, scalable, secure, flexible and lightweight web server, and it is the preferred production web server in the RoR community. One of the main advantages of Lighttpd compared to Apache, for example, is its built-in support for FastCGI and its very easy yet flexible configuration.

When I was preparing the production environment for one of the RoR applications I was working on lately, I found that even though there are quite a lot of HowTos for installing lighttpd on Linux and MacOS, I could not find a single one that describes the procedure on Solaris / OpenSolaris.

While setting up the production environment I got stuck on some steps, and I hope that this HowTo will help anyone trying to deploy a RoR application on Solaris to avoid the problems I had.

Targeted configuration: Solaris 10 + Lighttpd + FastCGI + SSL

1. Prerequisites
I started out with the core installation of Solaris 10, which means that I needed to add some packages. Before you start, make sure you have these Solaris packages installed:
  • SUNWhea, SUNWtoo >>> needed by lighttpd
  • SUNWlibmr, SUNWlibm >>> needed by fastcgi
If you plan to use only the Sun compiler to compile everything, you will also need the SUNWbtool and SUNWsprot packages.

From blastwave.org via pkg-get get:
  • openssl >>> needed by lighttpd for ssl
And one more thing: you must get the Sun C Compiler, otherwise you won't be able to install the ruby-fastcgi bindings (explained later). The Sun C Compiler is part of Sun Studio, which you can download for free here (a free SDN account is needed). The website describes two ways to get the compiler:
  • option 1) 202MB - works
  • option 2) 365MB - haven't tried
Once downloaded, follow the "Setup" instructions on the website above.

Now we should be ready to start.

2. Lighttpd
At first we should install Lighttpd which can be downloaded from here. (I used gcc to compile this one)
./configure --with-openssl=/opt/csw --with-ldap --with-bzip2 --with-zlib
The output from the configure command should end like this:
Features:

enabled:
auth-crypt
auth-ldap
compress-bzip2
compress-deflate
compress-gzip
large-files
network-ipv6
network-openssl
regex-conditionals
...
...


Make sure that "network-openssl" is in the list of enabled features. If it is not there, check the output of the configure tests, and if you see that openssl was found and right below it is:
checking for BIO_f_base64 in -lcrypto... no
check your configure arguments and make sure that --with-openssl=/path/to/openssl is set correctly (if installed via pkg-get, it will reside in /opt/csw). If this is not done properly, everything will compile correctly but the --with-openssl option will be silently ignored!

Let's finish up the installation:

make
make install
Test whether everything went well with the -v switch:
$ lighttpd -v
lighttpd-1.4.11 (ssl) - a light and fast webserver
Build-Date: Aug 30 2006 15:19:28
If there is no "(ssl)" after the version, something is wrong; check whether you set --with-openssl=/path correctly.
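
If you want to automate this check (for example in a deployment script), the "(ssl)" marker can be tested for in a few lines of Ruby. This is only a sketch: the ssl_enabled? helper is a hypothetical name, and the sample output without SSL is an assumption based on the description above rather than captured output.

```ruby
# Hypothetical helper: decides from the text printed by `lighttpd -v`
# whether the binary was built with SSL support.
def ssl_enabled?(version_output)
  version_line = version_output.lines.first.to_s
  version_line.include?("(ssl)")
end

# Sample output copied from the step above.
with_ssl = "lighttpd-1.4.11 (ssl) - a light and fast webserver\n" \
           "Build-Date: Aug 30 2006 15:19:28\n"
# Assumed shape of the output when SSL support is missing.
without_ssl = "lighttpd-1.4.11 - a light and fast webserver\n"

puts ssl_enabled?(with_ssl)
puts ssl_enabled?(without_ssl)

# On a real machine the input would come from the command itself:
#   ssl_enabled?(`lighttpd -v`)
```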
3. FastCGI
Now we need to install FastCGI which can be downloaded from here:
./configure   --prefix=/usr/local
make
make install

4. ruby-fcgi

The next step is to install the ruby-fcgi bindings. This is where I got really stuck. To be able to install them, you need the Sun C Compiler mentioned earlier. If you installed ruby and ruby gems via pkg-get as I did, ruby comes preconfigured to use the Sun compiler by default, even if one is not installed. And if you don't have this compiler, you will see all sorts of weird errors when installing gems with native extensions.

Once you have the compiler, installation is trivial. If you are building it manually, download sources here:
ruby install.rb config -- --with-fcgi-dir=/usr/local
ruby install.rb setup
ruby install.rb install
or when using ruby gems:
gem install fcgi -- --with-fcgi-dir=/usr/local
Note the double "--". That is not a typo; it's simply how parameters are passed to extconf.rb.

If you see an error like this:
install.rb: entering config phase...
---> lib
<--- lib
---> ext
---> ext/fcgi
/opt/csw/bin/ruby /root/ruby-fcgi-0.8.6/ext/fcgi/extconf.rb
checking for fcgiapp.h... yes
checking for FCGX_Accept() in -lfcgi... no
<--- ext/fcgi
<--- ext
Note this part:
checking for fcgiapp.h... yes
checking for FCGX_Accept() in -lfcgi... no
It most likely means that you are using GCC and not the Sun C Compiler. Check that cc in your path points to the Sun C Compiler, and that the environment variable CC is not set to gcc.

Let's test whether fcgi and the ruby-fcgi bindings were installed correctly by running these commands in irb:
irb(main):001:0> require 'fcgi.so'
=> true
irb(main):001:0> require 'fcgi'
=> true
If both "require" calls return true, you have successfully installed FastCGI and ruby is able to invoke it.

5. SSL Certificate
If you don't have your server certificate yet, you can create a self signed one like this:
openssl req -new -x509 -keyout server.pem -nodes -out server.pem -days 1000
6. Configuring Lighty
Create lighttpd.conf
server.port = 443
server.bind = "0.0.0.0"
server.modules = ( "mod_rewrite", "mod_fastcgi", "mod_accesslog" )
url.rewrite = ( "^/$" => "index.html", "^([^.]+)$" => "$1.html" )
server.error-handler-404 = "/dispatch.fcgi"
server.document-root = "/path_to_your_app/public"
server.errorlog = "/path_to_your_app/log/server.log"
accesslog.filename = "/path_to_your_app/log/access_log"
ssl.engine = "enable"
ssl.pemfile = "/path_to_your_pem_file/server.pem"

fastcgi.server = (".fcgi" =>
( "localhost" =>
(
"min-procs" => 10,
"max-procs" => 10,
"socket"    => "/tmp/yourapp.fcgi.socket",
"bin-path"  => "/path_to_your_app/public/dispatch.fcgi",
"bin-environment" => ( "RAILS_ENV" => "production" )
))
)

mimetype.assign = (
".css"        =>  "text/css",
".gif"        =>  "image/gif",
".html"       =>  "text/html",
".jpeg"       =>  "image/jpeg",
".jpg"        =>  "image/jpeg",
".js"         =>  "text/javascript",
".pdf"        =>  "application/pdf",
".png"        =>  "image/png",
".txt"        =>  "text/plain",
)

7. Ta-da!
Let's start the server

lighttpd -f lighttpd.conf

I hope that these instructions helped you to get Lighty on Solaris.

If you have other recommendations or have problems with the installation, feel free to leave a comment.

Enjoy..

PS: A big thanks to my friend J who helped me figure out that you need Sun C compiler for ruby-fcgi to compile.

UPDATE: added reference to SUNWbtool and SUNWsprot packages in case you want to compile everything with Sun's compiler.