apache – Grey Panthers Savannah

Proxying pypi / npm / etc for fun and profit!

gpanther — Wed, 05 Feb 2014 15:26:00 +0000

Package managers for source code (like pypi, npm, nuget, maven, gems, etc) are great! We should all use them. But what happens if the central repository goes down? Suddenly all your continuous builds / deploys fail through no fault of your own. Here is a way to prevent that:

Configure Apache as a caching proxy fronting these services. This means that you can tolerate downtime for the services and you have quicker builds (since you don’t need to contact remote servers). It also has a security benefit (you can firewall off your build server such that it can’t make any outgoing connections) and it’s nice to avoid consuming the bandwidth of those registries (especially since they are provided for free).

Without further ado, here are the config bits for Apache 2.4

/etc/apache2/force_cache_proxy.conf – the general configuration file for caching:


# Security - we don't want to act as a proxy to arbitrary hosts
ProxyRequests Off
SSLProxyEngine On
 
# Cache files to disk
CacheEnable disk /
CacheMinFileSize 0
# cache up to 100MB
CacheMaxFileSize 104857600
# Expire cache in one day
CacheMinExpire 86400
CacheDefaultExpire 86400
# Try really hard to cache requests
CacheIgnoreCacheControl On
CacheIgnoreNoLastMod On
CacheStoreExpired On
CacheStoreNoStore On
CacheStorePrivate On
# If remote can't be reached, reply from cache
CacheStaleOnError On
# Provide information about cache in reply headers
CacheDetailHeader On
CacheHeader On
 
# Only allow requests from localhost
<Location />
        Order Deny,Allow
        Deny from all
        Allow from 127.0.0.1
</Location>
 
<Proxy *>
        # Don't send X-Forwarded-* headers - don't leak local hosts
        # And some servers get confused by them
        ProxyAddHeaders Off
</Proxy>


# Small timeout to avoid blocking the build too long
ProxyTimeout    5

Now with this prepared we can create the individual configurations for the services we wish to proxy:

For pypi:


# pypi mirror
Listen 127.1.1.1:8001

<VirtualHost 127.1.1.1:8001>
        Include force_cache_proxy.conf

        ProxyPass         /  https://pypi.python.org/ status=I
        ProxyPassReverse  /  https://pypi.python.org/
</VirtualHost>

For npm:


# npm mirror
Listen 127.1.1.1:8000

<VirtualHost 127.1.1.1:8000>
        Include force_cache_proxy.conf

        ProxyPass         /  https://registry.npmjs.org/ status=I
        ProxyPassReverse  /  https://registry.npmjs.org/
</VirtualHost>

After configuration you need to enable the site (a2ensite) as well as the needed modules (a2enmod – ssl, cache, cache_disk, proxy, proxy_http).

Finally you need to configure your package manager clients to use these endpoints:

For npm you need to edit ~/.npmrc (or use npm config set) and add registry = http://127.1.1.1:8000/

For Python / pip you need to edit ~/.pip/pip.conf (I recommend having download-cache as per Stavros’s post):


[global]
download-cache = ~/.cache/pip/
index-url = http://127.1.1.1:8001/simple/

If you use setuptools (why!? just stop and use pip :-)), your config is ~/.pydistutils.cfg:


[easy_install]
index_url = http://127.1.1.1:8001/simple/

Also, if you use buildout, the needed config adjustment in buildout.cfg is:


[buildout]
index = http://127.1.1.1:8001/simple/

This is mostly it. If your client is using any kind of local caching, you should clear your cache and reinstall all the dependencies to ensure that Apache has them cached on the disk. There are also dedicated solutions for caching the repositories (for example devpi for python and npm-lazy-mirror for node), however I found them somewhat unreliable and with Apache you have a uniform solution which already has things like startup / supervision implemented and which is familiar to most sysadmins.
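
To check that a reply really came out of the cache, you can look at the X-Cache header which CacheHeader On adds to responses (values look like "HIT from localhost"). Here is a minimal Python sketch of such a check. The helper names cache_status and fetch_status are my own invention, and the URL in the note below assumes the pypi proxy from above is listening on 127.1.1.1:8001.

```python
import urllib.request


def cache_status(headers):
    """Return 'HIT', 'MISS' or 'REVALIDATE' from an X-Cache header, or None."""
    value = headers.get("X-Cache")
    if not value:
        return None
    # mod_cache writes values such as "HIT from localhost"
    return value.split()[0].upper()


def fetch_status(url):
    """Fetch url (through the proxy) and report the cache status of the reply."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return cache_status(response.headers)
```

Calling fetch_status("http://127.1.1.1:8001/simple/") twice in a row should report a MISS followed by a HIT if the caching works as intended.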

Virtually Hosted SSL – almost there

gpanther — Fri, 07 Aug 2009 14:00:00 +0000

Virtual hosting (hosting multiple sites on the same IP address) became possible with HTTP/1.1 because it declares the “Host” header, which specifies which one of the (possibly) multiple sites hosted on the same IP address you would like to reach (a small side-effect is that when you use the IP address of a site, you might get a different site, since the web-server doesn’t know which one to pick).

However this wasn’t possible with SSL, because the certificate was sent before the headers and a certificate is specific for a site (at least the run-of-the-mill ones), so the webserver didn’t know which certificate to pick. When I heard on the SANS Daily Stormcast that the newest version of Apache included a way to do this, I was enthusiastic and intrigued at the same time, so I went looking and found the following things:

It is done by doing the initial communication in plaintext and then “upgrading” to TLS. I wonder just how much is in plaintext? (see the What’s new document – the mod_ssl section specifically)
The official RFC for this is RFC 2817. The RFC specifies both methods for upgrading – before and after the actual request – so the devil will be in the ~~details~~ implementation
There is no browser support for this as of this moment, so it is pretty much useless (until IE + IIS start supporting it, it is little more than a cool option). But at least we have a reference implementation

Bonus article: The First Few Milliseconds of an HTTPS Connection

Picture taken from AMagill’s photostream with permission.

Disabling mod_deflate for certain files

gpanther — Sat, 14 Jun 2008 08:45:00 +0000

I received a question regarding my compressed HTTP post. It goes something like this:

I want to use a PHP script as a kind of transparent proxy (that is, when I request a file, it downloads it from another URL and serves it up to me), but mod_deflate keeps eating my Content-Length header.

My advice is of course to selectively disable mod_deflate for the given PHP script. This would mean putting something like the following in your .htaccess file:

SetEnvIfNoCase Request_URI get_file\.php$ no-gzip dont-vary

Where get_file.php is the name of the script. Some things to remember here:

the dot character needs to be escaped, because it is a meta-character for regular expressions (meaning any character), so you must prefix it with a backslash to mean the dot character
Request_URI represents the path part of the URL up to the filename, without the scheme, the host or any query parameters. For example if you have the URL http://example.com/get_file.php?file=talk.mp3, Request_URI will contain /get_file.php. However be aware that Apache contains a nice trick which can be used to generate nice URLs but which can affect this: if the given file/directory is not found, it goes up the path until it finds a file. For example, I could have written http://example.com/get_file.php/talk.mp3, and if the script contains the logic to serve up talk.mp3, we can have a nice URL. The side effect is however that the Request_URI is now /get_file.php/talk.mp3 and the regular expression must be adjusted accordingly (into something like \.mp3$)
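
The effect of the missing backslash is easy to try out. The sketch below uses Python's re module purely as an illustration: Apache uses its own regular expression library, but for a pattern this simple the dot behaves the same way. The matches helper is a made-up name.

```python
import re

# Unescaped, "." matches any character; escaped, only a literal dot.
unescaped = re.compile(r"get_file.php$", re.IGNORECASE)
escaped = re.compile(r"get_file\.php$", re.IGNORECASE)


def matches(pattern, request_uri):
    """True if the pattern matches anywhere in the request URI."""
    return pattern.search(request_uri) is not None
```

With these, matches(unescaped, "/get_fileXphp") is true while matches(escaped, "/get_fileXphp") is not, and neither pattern matches "/get_file.php/talk.mp3" because of the end anchor.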

A final word of warning: if your host allows you to open files via URLs (with readfile for example), run, run far away, because this is a very insecure configuration for PHP and chances are that the server (especially if it is shared between multiple users) will be pwned quickly.

Serving up authenticated static files

gpanther — Wed, 25 Jul 2007 13:14:00 +0000

Two components which are usually found in web applications are authentication and static files. In this post I will try to show how these two interact. The post will refer to PHP and Apache specifically, since these are the platforms I’m familiar with, however the ideas are generally applicable.

The advantages of static files are: cacheability out of the box (with a dynamically generated result this is very hard to get right) and less overhead when serving up (even more so if something specialized is used like tiny httpd). However you might feel the need to apply authentication to the static files also (that is only users with proper privileges can have access to them). Of course you want to retain the advantages of caching and low overhead as much as possible.

One option (and probably the one with less overhead and ultimately simpler to implement) is to use mod_auth_mysql on the directory hosting the static files and generate a random long (!) username and password for each user session, insert them to the authentication table, and modify the links to the resources to include these credentials. For example, a link in this case might look like this:


http://w7PLTHUDxK:xLaLGkku8O@example.com/static/image.jpg

The advantage of this approach is that we get all those wonderful things like content type or cache headers (or even zlib compression if we configured it) for free. The main pitfall is choosing where to do the cleanup (removing this temporary user from the table). The session destroy handler is not good enough since it won’t be called if the user doesn’t properly log out. One solution would be to do repeated “garbage collections” on the tables (in this case care must be taken to set this garbage collection interval to be the same as or larger than the session timeout interval, since otherwise the access might “go away” from under the users’ feet while they are still logged on). Another option would be to add a user id column to the table and use the “REPLACE INTO” SQL command (which is AFAIK unique to MySQL, not standard) to ensure that the temporary user table has at most as many users as the main user table.
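
Both cleanup strategies can be sketched in a few lines. The Python below is only an illustration under several assumptions: it uses an in-memory style SQLite table instead of MySQL (SQLite happens to understand REPLACE INTO too), the table name temp_auth, its columns and the 30 minute timeout are all made up, and the passwords are stored in plain text for brevity where a real mod_auth_mysql setup would use whatever format it is configured for.

```python
import secrets
import sqlite3
import time

SESSION_TIMEOUT = 30 * 60  # seconds; assumed to equal the session timeout


def create_table(db):
    db.execute(
        "CREATE TABLE IF NOT EXISTS temp_auth ("
        " user_id INTEGER PRIMARY KEY,"
        " username TEXT NOT NULL,"
        " password TEXT NOT NULL,"
        " created REAL NOT NULL)"
    )


def issue_credentials(db, user_id, now=None):
    """Create (or replace) the temporary credentials of one user."""
    now = time.time() if now is None else now
    username = secrets.token_urlsafe(16)
    password = secrets.token_urlsafe(16)
    # The primary key on user_id plus REPLACE INTO guarantees
    # at most one temporary row per real user.
    db.execute(
        "REPLACE INTO temp_auth (user_id, username, password, created)"
        " VALUES (?, ?, ?, ?)",
        (user_id, username, password, now),
    )
    return username, password


def collect_garbage(db, now=None):
    """Delete credentials older than the session timeout; return the count."""
    now = time.time() if now is None else now
    cursor = db.execute(
        "DELETE FROM temp_auth WHERE created < ?", (now - SESSION_TIMEOUT,)
    )
    return cursor.rowcount
```

The returned pair would then be spliced into the links, and collect_garbage would be run from something like a cron job at an interval no shorter than the session timeout.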

A quick note: all the above can of course be done with static authentication also (that is a hardcoded username and password in the .htaccess file). This is a very simple solution (and easier to apply, since mod_auth_mysql might not be installed/enabled on all the webservers, but mod_auth is on most of them), but it is insecure: it can not be used to separate users (ie. to have files which only certain users can access) and because it does not expire automatically, one leaked link is enough for search engines / other crawlers to find the files.

This is all well and good, but what if you don’t have control over the server configuration? While I strongly recommend against using shared PHP hosting, some people might be in this situation. The solution is to recreate (at least some of) Apache’s functionality.

The first step is to put the actual static files outside your web root (preferably) or to deny access to the folder where the files are placed with .htaccess (less preferable). If the files were to reside in a public folder, this system would provide obfuscation at best and would be equivalent to a 302 or 301 redirect at worst.

The next step is to decide on the method of referencing your static file. You have three options:

Put the file name directly in as a GET parameter (for example get_static.php?fn=image.jpg)
Use mod_rewrite to simulate a directory structure (static/image.jpg which will be rewritten by a rule into the form shown at the previous point)
Use the fact that Apache walks up the path until it finds the first file / directory, so you can do something like get_static.php/image.jpg

The second and third options are the ones I recommend. The reason behind this is that it gives the browser the illusion that you are dealing with different files which can help it do proper caching without relying on the ETag mechanism discussed later.

I would like to pause for a moment and remind everybody that security is a big concern in the web world, since you are practically putting your code out for everybody, meaning that anybody can come and try to break it. One particular concern with these types of scripts (which read and echo back to you arbitrary files) is path traversal. This attack is easy to demonstrate with the following example:

Let’s say that the script works by taking the filename given, concatenating it with the directory (which for this example is /home/abcd/static/) and echoing back the given file. Now if I supply in the filename something like ../../../etc/passwd, the resulting path will be /home/abcd/static/../../../etc/passwd, meaning that I can read any file the web server has access to. And before the Windows guys start jumping up and down saying that this is a *nix problem, the example is very easy to translate to Windows.

Now your first reaction would be to disallow (blacklist) the usage of the . character in the path, but don’t go this way. Rather, define the rule which your files will follow and verify that the supplied parameters follow that rule. For example the filenames will contain one or more alphanumeric, underscore or dash characters and will have a png, jpg, css or js extension. This translates into the regular expression ^[a-z0-9_-]+\.(png|jpg|css|js)$. Be sure to include the start and end anchors (otherwise the input only has to contain a substring matching the rule, the whole string doesn’t have to match it) and watch out for other regular expression gotchas. As an added security measure use the realpath function (which resolves things like symbolic links or .. sequences) before performing any further verification.
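
The whitelist plus realpath combination could look something like the sketch below. It is written in Python rather than PHP just to keep it short and testable. The function name resolve_static is hypothetical, the directory is the one from the example above, and os.path.realpath plays the role of PHP's realpath.

```python
import os
import re

STATIC_DIR = "/home/abcd/static"  # directory from the example above
ALLOWED_NAME = re.compile(r"[a-z0-9_-]+\.(png|jpg|css|js)")


def resolve_static(filename, static_dir=STATIC_DIR):
    """Return the absolute path of a permitted file, or None."""
    # Whitelist first: fullmatch() anchors at both ends of the string.
    if not ALLOWED_NAME.fullmatch(filename):
        return None
    base = os.path.realpath(static_dir)
    path = os.path.realpath(os.path.join(base, filename))
    # After resolving symlinks and "..", the file must still be inside base.
    if os.path.commonpath([base, path]) != base:
        return None
    return path
```

The second check looks redundant given the whitelist, but it also catches the case where a file inside the directory is a symbolic link pointing somewhere else.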

Now we have the file, and need to generate the headers. The important headers are:

Content-Length – this is very straightforward, it is the size of the file in bytes.
Content-Type – this can be obtained using the mime_content_type function, however be aware that sometimes it fails to identify the correct type and action must be taken to correct it (for example a CSS file might be identified as text/plain, but it must be served up as text/css to work in all the browsers)
Cache headers – depending on how long you think the clients / intermediate proxies should cache your content, these must be set accordingly.
ETag – this is a header which helps the browser distinguish between multiple content sources from the same URL. For example if the link to one image is http://example.com/image.php?id=1 and to the second one http://example.com/image.php?id=2, without an ETag these can end up representing the same cache entry, meaning that you can have situations where the second image is displayed instead of the first or vice-versa, because the browser operates under the assumption that they are the same and pulls one out of the cache when the other should be used. ETags can be an arbitrary alphanumeric string, so for example you could use the MD5 hash of the file (and no, there is no information disclosure vulnerability here which would warrant the usage of salted hashes for example, because the user is already getting the file! S/he can recalculate the MD5 of it if s/he wishes!)
Content-Encoding – if you wish and it makes sense to compress your content, be sure to output the proper Content-Encoding header. Also make sure to adjust the Content-Length header, otherwise you could have some serious breakage.
Accept-Ranges – if you wish to enable resume support for the file (that is for the client to be able to start downloading from the middle of the file for example), you need to provide (and handle, as described below) this header.
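
To make the list a bit more concrete, here is a rough Python sketch which assembles these headers for an uncompressed file, using the standard header names Content-Length and Accept-Ranges. The function name static_headers and the one hour max-age are assumptions of mine, and the mimetypes module guesses from the file extension only (unlike mime_content_type, which looks at the content).

```python
import hashlib
import mimetypes


def static_headers(filename, body, max_age=3600):
    """Build response headers for body, the raw bytes of filename."""
    content_type, _ = mimetypes.guess_type(filename)
    return {
        "Content-Length": str(len(body)),
        # Fall back to a generic type when the guess fails
        "Content-Type": content_type or "application/octet-stream",
        "Cache-Control": "max-age=%d" % max_age,
        # ETag values are quoted strings on the wire
        "ETag": '"%s"' % hashlib.md5(body).hexdigest(),
        "Accept-Ranges": "bytes",
    }
```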

The script also needs to take into account the request headers:

If-Modified-Since – the browser is checking the validity of the cached object, so the script should return a 304 (Not Modified) status if the content didn’t change and provide no content body.
Accept-Encoding – this should be checked before providing compressed (gzipped) content. Also, beware that some older browsers falsely claim to support gzipped content.
Range – if you specified that you handle ranges, you must look out for this header and send only what was requested. This of course can further be complicated with compression, in which case you need to take the specified chunk, compress it, make sure to output the correct Content-Length, and then send it
If-None-Match – if you supplied an ETag when serving up the content, it will (should) be returned to you in this header when the browser is doing cache checking
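
The cache validation and range handling can be sketched as two small functions. This is Python and deliberately simplified: the names is_not_modified and parse_range are made up, the validator sent back by the browser is read from If-None-Match, weak validators (the W/ prefix) are not treated specially, and only a single byte range is handled.

```python
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers, etag, last_modified):
    """Decide whether a 304 answer is appropriate.

    etag is the quoted value sent earlier; last_modified is the file's
    modification time as seconds since the epoch.
    """
    # Header names are case-insensitive, so normalise them first.
    headers = {k.lower(): v for k, v in request_headers.items()}
    if "if-none-match" in headers:
        candidates = [t.strip() for t in headers["if-none-match"].split(",")]
        return "*" in candidates or etag in candidates
    if "if-modified-since" in headers:
        try:
            since = parsedate_to_datetime(headers["if-modified-since"])
        except (TypeError, ValueError):
            return False  # unparsable date: send the full response
        # HTTP dates have one-second resolution, so drop the fraction.
        return int(last_modified) <= since.timestamp()
    return False


def parse_range(header, size):
    """Parse a single 'bytes=start-end' range; return (start, end) or None."""
    unit, _, spec = header.partition("=")
    if unit.strip().lower() != "bytes" or "," in spec:
        return None  # only one byte range is handled in this sketch
    first, _, last = spec.strip().partition("-")
    try:
        if first == "":  # suffix form: the last N bytes
            length = int(last)
            if length <= 0:
                return None
            start, end = max(size - length, 0), size - 1
        else:
            start = int(first)
            end = int(last) if last else size - 1
    except ValueError:
        return None
    end = min(end, size - 1)
    if start < 0 or start > end:
        return None
    return start, end
```

A None from parse_range would translate into either ignoring the header or answering with a 416 status, depending on how strict you want to be.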

After I had written all this up, I found that there is a PHP extension which provides most of the functions for this: HTTP. Use it. It’s much easier than rolling your own and you have less chance to miss some corner cases (like the fact that as per HTTP/1.1 header names are case-insensitive, meaning that If-Modified-Since and iF-mOdIfIeD-sInCe are the same thing and should be treated the same).

PS. I didn’t mention it, but this mechanism can also be used to hide the real file names. This might be needed when for whatever reason you don’t want to divulge them (because file names can provide additional information which you might not want your users to have). This can be achieved by using an additional step and giving the user a token which is translated into a file name at the server. These tokens can be:

Generated from the file name
Arbitrarily chosen
Created using a random process
Created using a deterministic process

For maximum security I recommend to go with arbitrarily chosen random tokens for each file (otherwise an attacker might break the security by trying other IDs – for example if the IDs are numeric, s/he can try other numbers – or by guessing the file names and applying the generator function on it and checking the existence of the file).
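
A tiny sketch of the random token variant, in Python. The class name TokenMap is hypothetical and the mapping lives in memory only, whereas a real application would keep it in its database.

```python
import secrets


class TokenMap:
    """Maps unguessable random tokens to real file names (in memory)."""

    def __init__(self):
        self._by_token = {}
        self._by_name = {}

    def token_for(self, filename):
        """Return the token of filename, creating one on first use."""
        if filename not in self._by_name:
            token = secrets.token_urlsafe(16)  # 128 bits of randomness
            self._by_name[filename] = token
            self._by_token[token] = filename
        return self._by_name[filename]

    def resolve(self, token):
        """Return the file name behind token, or None if it is unknown."""
        return self._by_token.get(token)
```

Because the tokens carry no information about the file names and are not sequential, neither guessing names nor incrementing IDs gets an attacker anywhere.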

Update: I’ve looked at using mod_xsendfile with PHP, however it seems to be a dormant project (the latest posted version is for Apache 2.0, nothing there for 2.2 :-(). Another option which may be worth exploring is the following (if you are using PHP as a loadable module rather than CGI): use virtual to redirect the request to the static files. You can even find a good example in the comments.

Compressed HTTP

gpanther — Tue, 24 Jul 2007 06:26:00 +0000

The HTTP standard allows for the delivered content to be compressed (to be more precise it allows for it to be encoded in different ways, one of the encoding being compression). Under Apache there are two simple ways to do this:

Using the mod_deflate Apache module
If you have mod_php activated, setting the zlib.output_compression variable in the php.ini file to 1

I won’t go into much detail on the configuration options, however I want to describe one little quirk, which is logical in hindsight but which I struggled with a little: you can lose the Content-Length header on files which don’t fit in your compression buffer from the start. This is of course logical because:

Headers must be sent before the content
If the server must do several read from file – compress – output cycles to compress the whole file, it can’t possibly predict accurately (to the byte level) how large / small the compressed version of the file will be. Getting it wrong is risky because client software might rely on this and could lock up in a wait cycle or display incomplete data.

Update: if you want to selectively disable mod_deflate for certain files because of this (or other reasons), check out this post about it.

You can observe this effect when downloading (large) files especially, since the absence of a Content-Length header means that the client can’t show a progress bar indicating the percentage you downloaded (this is what I observed at first and then went on to investigate the causes).

One more remark regarding the getting the Content-Length wrong part. One (fairly) common case where this can be an issue is with PHP scripts which output Content-Length headers and the compression is done via zlib.output_compression. The problem is that mod_php doesn’t remove the Content-Length header, which almost certainly has a larger value than the size of the compressed data. This causes the hanging, incomplete downloads symptom. To be even more confusing:

When using HTTP/1.1 and keep-alive this problem manifests itself.
When keep-alive is inactive, the problem disappears (sort-of). What actually happens is that the Content-Length is still wrong, but the actual connection is closed by the server after sending all the data (since no keep-alive = one request per connection). This usually works with clients (both curl and Firefox interpreted it as download complete), but other client software might choose to interpret the condition as a failed/corrupted download.

The possible solutions would be:

Perform the compression inside your PHP script (possibly caching the compressed version on-disk if it makes sense) and output the correct (ie. the one corresponding to the compressed data) Content-Length header. This is more work, but you will retain the progress-bar when downloading files
Use mod_deflate to perform the compression, which removes the Content-Length header if it can’t compress the whole data at once (this is not specified in the documentation, but – the beauty of open source – you can take a peek at the source code – the ultimate documentation. Just search for apr_table_unset(r->headers_out, "Content-Length"); ). This will kill the progress bar (for the reasons discussed before). To get back the progress bar, you could increase the DeflateBufferSize configuration parameter (which is by default set to 8k) to be larger than the largest file you wish to serve, or deactivate compression for the files which will be downloaded (rather than displayed).
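
The first solution boils down to "compress everything first, then count". Here is a minimal sketch of that idea in Python rather than PHP. The function name compressed_response is made up, the whole body is held in memory, and the Accept-Encoding parsing ignores q-values for brevity.

```python
import gzip


def compressed_response(body, accept_encoding=""):
    """Return (headers, payload), gzip-compressing when the client allows it."""
    encodings = [e.split(";")[0].strip().lower()
                 for e in accept_encoding.split(",")]
    if "gzip" in encodings:
        payload = gzip.compress(body)
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
    else:
        payload = body
        headers = {}
    # The length is taken from what is actually sent, not from the original.
    headers["Content-Length"] = str(len(payload))
    return headers, payload
```

Since the length is computed after compression, the header and the bytes on the wire always agree, which is exactly what the zlib.output_compression case gets wrong.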

A final remark: the HTTP protocol also supports the uploaded data being compressed (this can be useful for example when uploading larger files), as shown by the following blurb in the mod_deflate documentation:

The mod_deflate module also provides a filter for decompressing a gzip compressed request body. In order to activate this feature you have to insert the DEFLATE filter into the input filter chain using SetInputFilter or AddInputFilter.

…

Now if a request contains a Content-Encoding: gzip header, the body will be automatically decompressed. Few browsers have the ability to gzip request bodies. However, some special applications actually do support request compression, for instance some WebDAV clients.

When I saw this, I was ecstatic, since I was searching for something like this for some of my projects. If this works, it means that I can:

Use a protocol (HTTP) for file upload which has libraries in many programming languages
Use a protocol which needs only one port (as opposed to FTP) and can be made secure if necessary (with SSL/TLS)
Use compression, just like rsync can (and, although it can’t create binary diffs on its own, when the uploaded files are not used for synchronization, this is not an issue)

Obviously there must be some drawbacks

It seems to be an Apache-only feature (I didn’t find anything which could indicate support in IIS or even some clear RFC to document how this should work)
It can’t be negotiated! This is a huge drawback. When the server side compression is used, the process is the following:
- The client sends an Accept-Encoding: gzip header along with the request
- The server checks for this header and if present, compresses the content (except for the cases where the client claims support but doesn’t really have it)
However, the fact that the client is the first to send means that there is no way for the server to signal its (in)capability to accept gzip encoding. Even the fact that it’s Apache and previously served up compressed content doesn’t guarantee that it can handle it, since the input and output filters are two separate things. So the options available are:
- Use gzip (possibly preceding it with a heuristic detection like the one described before – is it Apache and does it serve up gzip compressed content), and if the server returns an error code, try without gzip
- The option which I will take – use this only with your own private servers where you configured them properly.
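
For completeness, the first option (try gzip, fall back to plain on an error) can be sketched like this in Python. The send callable is purely hypothetical: it stands for whatever performs the actual HTTP request and returns the numeric status code. Keep in mind that a server which simply ignores Content-Encoding may accept the garbled body and still answer with a success code, so this check is only as good as the validation on the server side.

```python
import gzip


def upload_with_fallback(send, body):
    """Try a gzip-compressed upload first, then fall back to plain.

    send(payload, headers) is a hypothetical callable that performs the
    HTTP request and returns the numeric status code.
    Returns (status, used_gzip).
    """
    status = send(gzip.compress(body), {"Content-Encoding": "gzip"})
    if status < 400:
        return status, True
    # The server rejected (or could not parse) the compressed body.
    return send(body, {}), False
```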

So how do you do it? Here is a blurb, again from the mod_deflate source code: only work on main request/no subrequests. This means that the whole body of the request must be gzip compressed if we choose to use this; it is not possible to compress only the part containing the file, for example in a multipart request. Below you can see some perl code I hacked together to use this feature:


#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw/tempfile/;
use Compress::Zlib;
use HTTP::Request::Common;
use LWP;

$HTTP::Request::Common::DYNAMIC_FILE_UPLOAD = 1;

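my $request = POST 'http://10.0.0.12/test.php',
    [
        'test1'  => 'test1',
        'test2'  => 'test2',
        'a_file' => ['somedata.dat']
    ],
    'Content_Type'     => 'form-data',
    'Content_Encoding' => 'gzip';

# Gzip the whole (multipart) request body into a temporary file and
# replace the request content with a reader that streams from that file.
sub transform_upload {
    my $request = shift;

    my ($fh, $filename) = tempfile();
    my $cs = gzopen($fh, "wb") or die "Cannot open gzip stream: $gzerrno";

    # With DYNAMIC_FILE_UPLOAD set, content() returns a code reference
    # that yields the body chunk by chunk.
    my $request_c = $request->content();
    while (my $data = $request_c->()) { $cs->gzwrite($data); }
    $cs->gzclose();
    close $fh;

    open $fh, '<', $filename or die "Cannot reopen $filename: $!";
    binmode $fh;
    $request->content(sub {
        my $buffer;
        if (0 < read $fh, $buffer, 4096) {
            return $buffer;
        } else {
            close $fh;
            return undef;
        }
    });
    # The Content-Length must describe the compressed body.
    $request->content_length(-s $filename);
}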

transform_upload($request);

my $browser = LWP::UserAgent->new();
my $response = $browser->request($request);

print $response->content();

This code is optimized for big files, meaning that it won’t read the whole request in the memory at one time. Hope somebody finds it useful.

Implementing Web Services with Open Source Software

gpanther — Fri, 12 Jan 2007 10:51:00 +0000

Today many services are available (both internal and external to a company) as Web Services, more specifically as SOAP. Companies like Microsoft, IBM or Sun have heavily invested in this field and made many of their products compatible with it (as a client and/or as a server). In this article I will study the different possibilities of implementing a SOAP server with Open Source solutions.

The specific requirements are:

It should use the HTTP transport layer (the most commonly used in SOAP)
It should either have an embedded HTTP server or be usable with Apache
It should be platform independent

But let me step back for a moment and ask: why would you want to go this route? Why not use the product of well known companies which offer integration with developer tools and in some cases are available for free? While those products are certainly more mature and easier to use, when going the OSS route you have:

more flexibility (because you have the full source code available – and even if you don’t want to actively participate in the development process, it helps a lot for debugging),
more deployment options (just think how many webhosts offer Apache / MySQL / PHP / Perl as opposed to IIS, WebSphere or Java)
when extending the possible interfacing options of a product written for this platform (adding a SOAP API for a wiki for example) it is easier to use something like this rather than requiring the installation of a whole new framework
and finally the issue of the cost: while not a big problem because (a) academic institutions already have or can get free licenses for many of the products and (b) the companies themselves distribute their products (or at least some versions) for free, it may still be an argument.

By doing research following these guidelines the following three possibilities emerged:

The SOAP::Lite library for Perl
Advantages:
- Very easy to use
- Available across platforms (both from the CPAN and PPM repositories)
- Has an extensive “cookbook” (set of short HOWTOs): http://cookbook.soaplite.com/
- Runs in Apache (either as CGI or with mod_perl – in the latter case you may need to replace SOAP::Transport::HTTP::CGI with SOAP::Transport::HTTP::Apache in the examples)
- Has tracing functionality (to enable it at the client side, use the following way to include the library: use SOAP::Lite +trace => 'all'; and then redirect the stderr output (where the tracing info is dumped) to a file like this: soap-client.pl 2>debug-info.txt)
Disadvantages:
- Does not support automatic generation of the WSDL file
- Sometimes it insists on sending the variables as a certain type (integer) even though I would like to send them as string


The SOAP library included with PHP
Advantages:

Usually readily included with PHP
Cross platform
Under active development

Disadvantages:

Does not support automatic generation of the WSDL file
There are few tutorials for it
Very basic debugging support. PHP has a weak debugging support out of the box, but the fact that the majority of the functions are implemented in a binary library makes things even worse (because you would need a hybrid PHP/binary debugger for proper debugging)


The NuSOAP project for PHP
Advantages:

Cross platform (written in PHP)
Automatic WSDL generation
When accessed with a browser, it presents a friendly HTML interface which lists all the published methods / objects and their parameters
Distributed as PHP source files which can be easily installed to most hosts (the user doesn't need to ask the server administrator to load extra binary modules)
While PHP has no integrated debugging support, the library itself tries to output debugging information. To activate this mode set the debug variable to 1 (like this: $debug = 1;). The debugging information will be appended to the response XML as a comment.

Disadvantages:

Not very well maintained (probably because of the existence of the "official" PHP SOAP module)
Few examples, and many of the examples don't work. For working examples go to the author's website
Conflicts with the official PHP SOAP module. If you get an error saying something along the lines of can not redefine class soap_client, you have to unload the PHP SOAP module. Another option would be to go through the source and rename this class.
No real debugging support (because PHP doesn't have one).


The final choice was NuSOAP. The deciding factor was the (semi)automatic - because you have to give it hints about the parameter types - WSDL generation. This is essential if you wish to make your service available to the largest possible audience, especially those using statically typed languages. A perfect example is the .NET / Visual Studio environment, which needs the WSDL file to automatically generate the stub for the web service.
A little side note: if your web service is accessed through SSL / HTTPS and the certificate authority which signed the certificate of the server is not trusted (ie. it's not Verisign), you get some warnings while generating the stub in Visual Studio and the final program will halt with an exception saying something like Could not establish a trusted connection over this SSL/TLS connection. The most common cause of this is the fact that the developer uses a self-signed certificate for the server. As far as I know there is no way to stop this from happening from inside the framework. However, because the framework shares its network access architecture with Internet Explorer, you can correct it from there. First you will need the certificate from the server (a .crt file, server.crt). Then go to Tools->Internet Options and select the Content tab. Click on Certificates, go to the Trusted Root Certification Authorities and select Import. Point it to the server certificate and answer affirmatively to the confirmation dialog. From now on that certificate will also be considered a trusted root certificate, you won't get warnings while browsing sites with it (and those sites might even have elevated privileges - depending on your Internet Explorer configuration), but most importantly your .NET client side code will work just fine.
The test project was to implement a web service which simulated some simple state machine(s). The project was implemented on the following platform:

Apache 2.2.2
PHP 5.1.4
MySQL 5.1
NuSOAP 0.7.1

The test was done on a Windows XP Pro machine using XAMPP to quickly install all the required components, however there is nothing platform specific in the code or the components, so it should be easy to replicate it on a different platform (Linux for example). The PHP code for the server side can be seen in Appendix A and an example for a state machine definition file can be found in Appendix B. An example client program written in C# can be seen in Appendix C.
The structure of a state machine definition file is as follows:

The root node is stateMachine. It has one mandatory attribute: initialState, which specifies the state the machine starts in.
The messages section defines all possible messages (identified by name) which can be sent to this machine. This enumeration is needed to check the validity of the message names used later on, as a guard against mistyping.
The list of states, identified by name. The states can be of two types: auto and message. Those of type auto automatically advance from the current state to the next state depending on the contained action elements. Those of type message wait for a message to advance.
The action elements contain the following attributes:

nextState - mandatory, the name of the next state if this action is chosen
waitBefore and waitAfter - optional, the amount of time to pause before and after executing this action, in milliseconds. If omitted, zero is assumed. It must be a non-negative integer.
probability - a number greater than 0 and less than or equal to 1.0. Determines the probability of this action being chosen; 1 means always. The sum of probabilities for a group (state element for auto states and message element for message states) must be 1.0. If some probabilities are omitted, the remaining probability is distributed evenly amongst them (so if we have 5 action items and the first has a probability of 0.1, the second one of 0.3 and the rest are omitted, the last three will each have a probability of 0.2)
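The probability rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not the server's code; the function name distribute_probabilities and the use of None for an omitted attribute are assumptions made for the example.

```python
def distribute_probabilities(probabilities, tolerance=0.001):
    """Fill in omitted probabilities (None) and check that the group sums to 1.0."""
    given = [p for p in probabilities if p is not None]
    if any(p <= 0 or p > 1 for p in given):
        raise ValueError("each probability must be greater than 0 and at most 1.0")
    missing = len(probabilities) - len(given)
    # Share whatever is left equally among the actions that omitted the attribute.
    share = (1.0 - sum(given)) / missing if missing else 0.0
    result = [share if p is None else p for p in probabilities]
    # Every action must end up with a positive probability and the group must sum to 1.0.
    if any(p <= 0 for p in result) or abs(1.0 - sum(result)) > tolerance:
        raise ValueError("the probabilities of a group must sum to 1.0")
    return result


# Five actions: the first two specify a probability, the last three omit it.
print(distribute_probabilities([0.1, 0.3, None, None, None]))
```

The small tolerance is there because sums of floating point numbers rarely come out as exactly 1.0.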



The functions exported by the server are:


postMessage(stateMachineName: string, message: string): string
Posts a message to the state machine identified by stateMachineName. The definition for this state machine must be stored on the server in the file stateMachineName.xml (the server code looks for the machine name with the .xml extension in its data directory). It is assumed that there is only one "running" instance of each state machine. This is guaranteed by the fact that their state is stored in a database table protected by write locks during the transitions. The method is synchronous and returns the name of the state that results from processing the message and any automatic steps (states of type auto) which followed. One thing to keep in mind is that if you specify a state of type auto as the initial state, it will also be evaluated when the first message is posted. On error it returns an empty string and you can use the getErrorMessage function to get the error message.


getMachineState(stateMachineName: string): string
Gets the current state of the given state machine. It does not take the lock used by postMessage, so it is asynchronous with respect to the state machine (not the caller). If an error has occurred in the state machine it returns the empty string. You can use the getErrorMessage function to get the error message.


getErrorMessage(stateMachineName: string): string
Returns the error message for the given state machine, or an empty string if no error exists.


resetMachineState(stateMachineName: string): void
Resets the machine to its initial state (as specified by the initialState attribute of the corresponding definition file).


resetAllMachines(): void
Resets all the state machines to their initial state.
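To make the semantics of these functions easier to follow, here is a minimal in-memory sketch in Python. It is not the server implementation: the machine definition, the class and all names are made up for illustration, every transition has probability 1, and there is no database or locking. It only shows how posting a message first runs any automatic states, then applies the message, then runs automatic states again.

```python
# Hypothetical definition: an automatic initial state followed by two message states.
MACHINE = {
    "initial": "boot",
    "states": {
        "boot": {"type": "auto", "next": "heads"},
        "heads": {"type": "message", "on": {"Flip": "tails"}},
        "tails": {"type": "message", "on": {"Flip": "heads"}},
    },
}


class StateMachine:
    def __init__(self, definition):
        self.definition = definition
        self.reset()

    def reset(self):
        # Counterpart of resetMachineState: back to the initial state, no error.
        self.state = self.definition["initial"]
        self.error = ""

    def _run_auto_states(self):
        # Keep advancing while the current state does not need a message.
        states = self.definition["states"]
        while self.state and states[self.state]["type"] == "auto":
            self.state = states[self.state]["next"]

    def post_message(self, message):
        self._run_auto_states()  # an automatic initial state is evaluated here
        if not self.state:
            return ""  # the machine is already in an error state
        transitions = self.definition["states"][self.state]["on"]
        if message not in transitions:
            self.error = "message %r is not accepted in state %r" % (message, self.state)
            self.state = ""
            return ""
        self.state = transitions[message]
        self._run_auto_states()
        return self.state

    def get_machine_state(self):
        return self.state

    def get_error_message(self):
        return self.error


machine = StateMachine(MACHINE)
print(machine.get_machine_state())
print(machine.post_message("Flip"))
```

Note that, as in the description above, the state is still the automatic initial one until the first message arrives.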


Appendix A - PHP Server side code
To install it you would need the following items:

The NuSOAP library in the lib subdirectory (or anywhere else, just be sure to adjust the include directive accordingly)
The PEAR DB package for database access
Adjust the $data_directory variable so that it points to the directory where the XML files describing the state machines are located. Important: include the trailing slash or backslash depending on the platform
Create two database tables to store the current state of the automata (the second table is for locking purposes only, because MySQL doesn't support writing while in a read lock). The SQL statements to create these tables are (you might need to tweak these a little bit to get them to work if you are using something other than MySQL or a different version of it):


CREATE TABLE `state_machines`.`state_machines` (
`machine_name` VARCHAR(255) NOT NULL DEFAULT '',
`machine_state` VARCHAR(255) NOT NULL DEFAULT '',
`error_message` VARCHAR(255) NOT NULL DEFAULT '',
PRIMARY KEY(`machine_name`)
)
ENGINE = MEMORY;

CREATE TABLE `state_machines`.`lock_table` (
 `dummy_column` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
 PRIMARY KEY(`dummy_column`)
)
ENGINE = MEMORY;


 
Adjust the connection string accordingly in the DB::connect call


<?php
  require_once 'lib/nusoap.php';
  require_once 'DB.php';
  
  $server = new soap_server();
  $server->configureWSDL('statemachine', 'urn:statemachine');
  
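  //single quotes and doubled backslashes, so that the trailing backslash does not escape the closing quote
  $data_directory = 'C:\\xampp\\htdocs\\webservice\\data\\';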
  $db_connection = DB::connect("mysql://state_machine:password@localhost/state_machines");
  if (DB::isError($db_connection)) 
    stopWithErrorMessage("failed to connect to the database - " . $db_connection->getMessage());
  
  // Register the methods to expose
  $server->register('postMessage',    
      array('stateMachineName' => 'xsd:string', 'message' => 'xsd:string'),
      array('return' => 'xsd:string'),
      'urn:statemachine',             
      'urn:statemachine#postMessage', 
      'rpc',                          
      'encoded',                      
      'send a message to a given state machine. returns the new state'      
  );
  $server->register('getMachineState',    
      array('stateMachineName' => 'xsd:string'),
      array('return' => 'xsd:string'),
      'urn:statemachine',             
      'urn:statemachine#getMachineState', 
      'rpc',                          
      'encoded',                      
      'returns the current state of the automaton'      
  );
  $server->register('getErrorMessage',    
      array('stateMachineName' => 'xsd:string'),
      array('return' => 'xsd:string'),
      'urn:statemachine',             
      'urn:statemachine#getErrorMessage', 
      'rpc',                          
      'encoded',                      
      'returns the error message for a given state machine ("" if no error exists)'      
  );  
  $server->register('resetMachineState',    
      array('stateMachineName' => 'xsd:string'),
      array(),
      'urn:statemachine',             
      'urn:statemachine#resetMachineState', 
      'rpc',                          
      'encoded',                      
      'resets the given state machine'      
  );  
  $server->register('resetAllMachines',    
      array(),
      array(),
      'urn:statemachine',             
      'urn:statemachine#resetAllMachines', 
      'rpc',                          
      'encoded',                      
      'resets all the state machines'      
  );    

  $server->service($HTTP_RAW_POST_DATA);

  //send a message to a given state machine. returns the new state
  function postMessage($stateMachineName, $message) {
    //only word characters, whitespace and dashes are allowed in the name
    if (!preg_match('/^[\w\s-]+$/', $stateMachineName)) 
      stopWithErrorMessage('state machine name contains illegal characters');
    global $data_directory;
    if (!is_file($data_directory . $stateMachineName . ".xml")) 
      stopWithErrorMessage('specified state machine does not exist');
      
    //load up the state machine
    $stateMachine = loadStateMachineFromFile($data_directory . $stateMachineName . ".xml");
    global $db_connection;
    //from now on we need to be synchronized with other threads - lock the database table
    $db_connection->query('LOCK TABLE lock_table WRITE');
    //synchronize it with the database
    $stateMachine = synchronizeStateMachine($stateMachine, $stateMachineName);
    //now post the message to the state machine
    $stateMachine = postMessageWhilePossible($stateMachine, $message);
    if (is_string($stateMachine)) {
      //an error has occurred! store the error message and return the empty string
      //(REPLACE takes no WHERE clause, so the machine name is one of the stored values)
      $db_connection->query('REPLACE INTO state_machines (machine_name, machine_state, error_message) VALUES ("' .
        addslashes($stateMachineName) . '", "", "' . addslashes($stateMachine) . '")');
      $db_connection->query('UNLOCK TABLES');
      return '';
    } else {
      //everything went ok, store the new state and return it
      $db_connection->query('REPLACE INTO state_machines (machine_name, machine_state, error_message) VALUES ("' .
        addslashes($stateMachineName) . '", "' . addslashes($stateMachine['currentState']) . '", "")');
      $db_connection->query('UNLOCK TABLES');
      return $stateMachine['currentState'];
    }     
  }
  
  //returns the current state of the automaton
  function getMachineState($stateMachineName) {
    if (!preg_match('/^[\w\s-]+$/', $stateMachineName)) 
      stopWithErrorMessage('state machine name contains illegal characters');
    global $data_directory;
    if (!is_file($data_directory . $stateMachineName . ".xml")) 
      stopWithErrorMessage('specified state machine does not exist');
      
    //load up the state machine
    $stateMachine = loadStateMachineFromFile($data_directory . $stateMachineName . ".xml");
    //synchronize it with the database
    $stateMachine = synchronizeStateMachine($stateMachine, $stateMachineName);
    //return the current state
    return $stateMachine['currentState'];
  }
  
  //returns the error message for a given state machine ('' if no error exists)
  function getErrorMessage($stateMachineName) {
    if (!preg_match('/^[\w\s-]+$/', $stateMachineName)) 
      stopWithErrorMessage('state machine name contains illegal characters');
    global $db_connection;
    return $db_connection->getOne('SELECT error_message FROM state_machines WHERE machine_name="' . addslashes($stateMachineName) . '"');
  }
  
  //resets the given state machine
  function resetMachineState($stateMachineName) {
    if (!preg_match('/^[\w\s-]+$/', $stateMachineName)) 
      stopWithErrorMessage('state machine name contains illegal characters');
    global $db_connection;
    $db_connection->query('DELETE FROM state_machines WHERE machine_name="' . addslashes($stateMachineName) . '"');
  }
  
  //resets all the state machines
  function resetAllMachines() {
    global $db_connection;
    $db_connection->query('DELETE FROM state_machines');
  }
  
  //internal helper function which outputs the error message to the header and then exits
  function stopWithErrorMessage($error_message) {
    header("HTTP/1.1 500 Internal Server Error: " . $error_message, true, 500);
    exit;     
  }
  
  //internal helper function - synchronizes the state of an automaton with the one stored in the database
  //(if it's stored there)
  function synchronizeStateMachine($state_machine, $stateMachineName) {
    global $db_connection;
    //fetch the row as an associative array, because the columns are accessed by name below
    $machine_state = $db_connection->getRow('SELECT * FROM state_machines WHERE machine_name="' . addslashes($stateMachineName) . '"',
      array(), DB_FETCHMODE_ASSOC);
    if (is_array($machine_state)) {
      //it is present in the database, try to synchronize with it
      if ( ('' == $machine_state['machine_state']) || array_key_exists($machine_state['machine_state'], $state_machine['states'])) {
        $state_machine['currentState'] = $machine_state['machine_state'];
        return $state_machine;
      } else {
        stopWithErrorMessage("Erroneous state in the database: " . $machine_state['machine_state']);        
        exit;
      }
    }
    //it's not present in the database, leave it as it is
    return $state_machine;
  }
  
  //internal helper function. Checks if the given node has the specified attribute
  //if not, returns null, if it does, it returns it
  function getAttributeOrNull($node, $attr_name) {
    if ($node->hasAttribute($attr_name))
      return $node->getAttribute($attr_name);
    return null;      
  }
  
  //internal helper function. Extract and validate the action elements from
  //a node (message or state). Return an array structure on success,
  //an error message (string) on failure
  function extractActions($parent_node, $states_list, $state_name, $message_name) {
    //process the probable actions - make sure that the sum of the probabilities is 1.0
    //when no probability is specified, the remaining probability is distributed between them
    $error_message_suffix = ('' == $message_name) ? '' : " at message '$message_name'";
    $actions = $parent_node->getElementsByTagName('action');
    $result = array();
    $probability_sum = 0.0; $actions_with_no_probability = 0;
    foreach ($actions as $action) {
      $new_action = array();
      if (null !== ($action_probability = getAttributeOrNull($action, 'probability'))) {
        if ($action_probability <= 0)
          return "Non-positive probability of action in state '$state_name'$error_message_suffix";
        $probability_sum += 0.0 + $action_probability;
        $new_action['probability'] = 0.0 + $action_probability;
      } else {
        ++$actions_with_no_probability;
      }
      if (null === ($action_wait_before = getAttributeOrNull($action, 'waitBefore'))) {
        $new_action['waitBefore'] = 0;
      } else {
        if ($action_wait_before < 0)
          return "Negative 'waitBefore' of action in state '$state_name'$error_message_suffix";
        $new_action['waitBefore'] = intval($action_wait_before);
      }
      if (null === ($action_wait_after = getAttributeOrNull($action, 'waitAfter'))) {
        $new_action['waitAfter'] = 0;
      } else {
        if ($action_wait_after < 0)
          return "Negative 'waitAfter' of action in state '$state_name'$error_message_suffix";
        $new_action['waitAfter'] = intval($action_wait_after);
      }     
      if (null === ($action_next_state = getAttributeOrNull($action, 'nextState')))
        return "Unspecified nextState in state '$state_name'$error_message_suffix";
      if (!array_key_exists($action_next_state, $states_list))
        return "Invalid nextState specified in action at state '$state_name'$error_message_suffix: '$action_next_state'";
      $new_action['nextState'] = $action_next_state;
                  
      $result[] = $new_action;
    }
    //now redistribute the remaining probability :)
    if ($actions_with_no_probability > 0) {
      foreach (array_keys($result) as $action_key) {
        if (!array_key_exists('probability', $result[$action_key])) {
          $result[$action_key]['probability'] = (1.0 - $probability_sum) / $actions_with_no_probability;
        }
      }
    }
    //finally sum up the probability and check it (must be 1.0)
    $probability_sum = 0.0;
    foreach ($result as $action)
      $probability_sum += $action['probability'];
    if (abs(1.0 - $probability_sum) > 0.001)
      return "The sum of probabilities for state '$state_name'$error_message_suffix is way off from 1.0"; 
    
    return $result;
  }

  //returns a structure which completely describes the state machine
  //returns a string with an error message if the XML failed to follow
  //the rules
  function loadStateMachine($state_machine_xml) {
    $doc = new DOMDocument();
    $doc->loadXML($state_machine_xml);    
    
    //this will be the result if all goes well
    $result = array();
    
    //find all the valid message
    $xpath = new DOMXPath($doc);
    $valid_messages = array();
    foreach ($xpath->query('//stateMachine/messages/message') as $valid_message) {
      if (null === getAttributeOrNull($valid_message, 'name'))
        return "Found message element which doesn't have the 'name' attribute!";
      $valid_messages[getAttributeOrNull($valid_message, 'name')] = 1;
    } 
    if (0 >= count($valid_messages))
      return 'No valid message names found!';   
          
    //now parse states
    $result['states'] = array();
    $states = $doc->getElementsByTagName('state');
    if (0 >= $states->length)
      return 'No state elements found!';
    
    //first store the state names so that we can validate them later on
    foreach ($states as $state) {
      if (null === ($state_name = getAttributeOrNull($state, 'name')))
        return 'Found state with no name!';
      if (array_key_exists($state_name, $result['states']))
        return "Found state with duplicate name: '$state_name'";
      $result['states'][$state_name] = array();
    }
    
    foreach ($states as $state) {
      //validate the basic parameters for the state
      $state_name = getAttributeOrNull($state, 'name');
      if (null === ($state_type = getAttributeOrNull($state, 'type')))
        return 'Found state with no type!';
      if ( ('message' != $state_type) && ('auto' != $state_type) )
        return "Found state with invalid type: '$state_type'";
        
      //save the validated stuff
      $result['states'][$state_name]['type'] = $state_type;
      
      //process the available state transitions
      if ('message' == $state_type) {
        $messages = $state->getElementsByTagName('message');
        $result['states'][$state_name]['messages'] = array();
        foreach ($messages as $message) {
          //message name: - it should exist, - it should be valid and - it shouldn't be used before (in this state)
          if (null === ($message_name = getAttributeOrNull($message, 'name')))
            return "Found message with no name in state '$state_name'";
          if (!array_key_exists($message_name, $valid_messages))
            return "Found invalid message name '$message_name' in state '$state_name'";
          if (array_key_exists($message_name, $result['states'][$state_name]['messages']))          
            return "Found duplicate message name '$message_name' in state '$state_name'";
          $result['states'][$state_name]['messages'][$message_name] = array();
          
          $result['states'][$state_name]['messages'][$message_name]['actions'] = 
            extractActions($message, $result['states'], $state_name, $message_name);
          if (is_string($result['states'][$state_name]['messages'][$message_name]['actions']))
            //an error has occurred
            return $result['states'][$state_name]['messages'][$message_name]['actions'];
        }       
      } else {
        //load the actions we can choose from
        $result['states'][$state_name]['actions'] = 
          extractActions($state, $result['states'], $state_name, '');
        if (is_string($result['states'][$state_name]['actions']))
          //an error has occurred
          return $result['states'][$state_name]['actions'];
      }
    }
      
    //get the starting state and make sure that it's a valid state
    if (null === ($initial_state = getAttributeOrNull($doc->documentElement, 'initialState')))
      return 'Document has no initial state!';      
    if (!array_key_exists($initial_state, $result['states']))
      return 'Initial state is invalid!';   
    $result['currentState'] = $initial_state;   
    
    return $result;
  }
  
  function loadStateMachineFromFile($file_machine) {
    $xml_file_contents = file_get_contents($file_machine);
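    //magic_quotes_runtime (not magic_quotes_gpc) is the setting which affects data read from files
    if (get_magic_quotes_runtime()) $xml_file_contents = stripslashes($xml_file_contents);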
    return loadStateMachine($xml_file_contents);
  }
  
  //applies a given message to a given state machine. it executes the specified waits
  //returns the modified state machine. If an error occurred, the currentState will be set to ''
  function internalPostMessage($state_machine, $message = '') {
    //print "Applying message: '$message'\n";
    //print "Current state: $state_machine[currentState]\n";
    
    //we are in an invalid state - we can't do anything
    if (!array_key_exists($state_machine['currentState'], $state_machine['states']))
      return $state_machine;
        
    if ('message' == $state_machine['states'][$state_machine['currentState']]['type']) {
      if (!array_key_exists($message, $state_machine['states'][$state_machine['currentState']]['messages'])) {
        //this message can not be applied now
        $state_machine['currentState'] = '';
        return $state_machine;
      }
      $actions = $state_machine['states'][$state_machine['currentState']]['messages'][$message]['actions'];
    } else {
      $actions = $state_machine['states'][$state_machine['currentState']]['actions'];
    }
        
    //now choose an action, by "throwing a dice"
    $rand_value = rand(0, 32768) / 32768;
    $action_to_execute = null;
    foreach ($actions as $action) {
      $action_to_execute = $action;
      if ($rand_value <= $action['probability'])        
        break;      
      $rand_value -= $action['probability'];
    }   
    
    //print "Going to state $action_to_execute[nextState]\n";
    //now execute the action (the waits are given in milliseconds, sleep() takes whole seconds)
    sleep(intval($action_to_execute['waitBefore'] / 1000 + 0.5));
    $state_machine['currentState'] = $action_to_execute['nextState'];    
    sleep(intval($action_to_execute['waitAfter'] / 1000 + 0.5));
    //print "Gone to state $action_to_execute[nextState]\n";   
    
    return $state_machine;
  }
  
  //the same as above, however it continues while possible after the first move
  //(while the current state is an automatic one)
  function postMessageWhilePossible($state_machine, $message) {
    //process any automatic states BEFORE
    while ( ('' != $state_machine['currentState']) &&
      ('auto' == $state_machine['states'][$state_machine['currentState']]['type']) ) {
      $state_machine = internalPostMessage($state_machine);
    }
    if ('' != $state_machine['currentState'])
      $state_machine = internalPostMessage($state_machine, $message);
    //process any automatic states AFTER
    while ( ('' != $state_machine['currentState']) &&
      ('auto' == $state_machine['states'][$state_machine['currentState']]['type']) ) {
      $state_machine = internalPostMessage($state_machine);
    }
    return $state_machine;
  } 
?>
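
The "throwing a dice" step in internalPostMessage above walks the list of actions and subtracts each probability from a random value until the value fits. A minimal Python sketch of the same idea follows; choose_action and its optional rand_value parameter (there only to make the example deterministic) are names invented for this illustration.

```python
import random


def choose_action(actions, rand_value=None):
    """Pick one action according to its 'probability' (the probabilities sum to 1.0)."""
    if rand_value is None:
        rand_value = random.random()
    chosen = None
    for action in actions:
        chosen = action
        # The value falls inside this action's slice of the [0, 1] interval.
        if rand_value <= action["probability"]:
            break
        # Otherwise skip this slice and look at the next one.
        rand_value -= action["probability"]
    return chosen


actions = [
    {"nextState": "heads", "probability": 0.25},
    {"nextState": "tails", "probability": 0.75},
]
print(choose_action(actions)["nextState"])
```

Keeping the last examined action as the fallback means that small rounding errors in the probabilities never leave the machine without a choice. In Python one could also let random.choices do the work by passing the probabilities as weights.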


Appendix B - Example state machine file




Appendix C - Example client program in .NET (C#, VB .NET, Delphi and Managed C++)
Before you can use these examples, you have to add a Web Reference to your project. You can do this by right-clicking the References folder of your project in Visual Studio and selecting Add Web Reference. You should put in the link to the service with ?wsdl appended (to get the WSDL file). For example, if you are hosting the service locally, you would put in http://localhost/webservice/index.php?wsdl
C#
private static void Main(string[] args)
{
      Console.WriteLine("Starting up...");
      statemachine statemachine1 = new statemachine();
      Console.WriteLine("Startup done...");
      for (int num1 = 0; num1 < 10; num1++)
      {
            Console.WriteLine("The current state is: " + statemachine1.getMachineState("testAutomata"));
            Console.WriteLine("Passing message: Flip");
            Console.WriteLine("Test automata returned: " + statemachine1.postMessage("testAutomata", "Flip"));
            Console.WriteLine("---");
      }
      Console.WriteLine("Press any key to exit...");
      Console.ReadKey();
}

VB .NET
Private Shared Sub Main(ByVal args As String())
      Console.WriteLine("Starting up...")
      Dim statemachine1 As New statemachine
      Console.WriteLine("Startup done...")
      Dim num1 As Integer = 0
      Do While (num1 < 10)
            Console.WriteLine(("The current state is: " & statemachine1.getMachineState("testAutomata")))
            Console.WriteLine("Passing message: Flip")
            Console.WriteLine(("Test automata returned: " & statemachine1.postMessage("testAutomata", "Flip")))
            Console.WriteLine("---")
            num1 += 1
      Loop
      Console.WriteLine("Press any key to exit...")
      Console.ReadKey
End Sub

Delphi
procedure Program.Main(args: string[]);
begin
      Console.WriteLine('Starting up...');
      statemachine1 := statemachine.Create;
      Console.WriteLine('Startup done...');
      num1 := 0;
      while ((num1 < 10)) do
      begin
            Console.WriteLine(string.Concat('The current state is: ', statemachine1.getMachineState('testAutomata')));
            Console.WriteLine('Passing message: Flip');
            Console.WriteLine(string.Concat('Test automata returned: ', statemachine1.postMessage('testAutomata', 'Flip')));
            Console.WriteLine('---');
            inc(num1)
      end;
      Console.WriteLine('Press any key to exit...');
      Console.ReadKey
end;

Managed C++
private: static void __gc* Main(System::String __gc* args __gc [])
{
      System::Console::WriteLine(S"Starting up...");
      ContactWebservice::stateMachine::statemachine __gc* statemachine1 = __gc new ContactWebservice::stateMachine::statemachine();
      System::Console::WriteLine(S"Startup done...");
      for (System::Int32 __gc* num1 = 0; (num1 < 10); num1++)
      {
            System::Console::WriteLine(System::String::Concat(S"The current state is: ", statemachine1->getMachineState(S"testAutomata")));
            System::Console::WriteLine(S"Passing message: Flip");
            System::Console::WriteLine(System::String::Concat(S"Test automata returned: ", statemachine1->postMessage(S"testAutomata", S"Flip")));
            System::Console::WriteLine(S"---");
      }
      System::Console::WriteLine(S"Press any key to exit...");
      System::Console::ReadKey();
}

Appendix D - Example client written in Perl
use warnings;
use diagnostics;
use SOAP::Lite +trace => 'all';

print ">>" . SOAP::Lite
  -> uri('urn:statemachine')
  -> proxy('http://localhost/webservice/index.php')
  -> getMachineState("testAutomata")
  -> result;



Including mixed (SSL and non-SSL) content on your secure site
gpanther — Mon, 01 Jan 2007 19:56:00 +0000
Disclaimer: while I dabble with Apache from time to time, I’m not a professional SysAdmin or Apache guru. The things described below are my own experience and should not be considered expert advice, just a starting point. Another way to say it: if you know better, please leave a comment :).
AskApache (a great blog BTW for technical network related stuff – the only negative thing being that sometimes it is too technical :)) has an article about mixing secure (fetched through HTTPS) and non-secure (fetched through HTTP) elements on a page. Usually the result of doing something like this is that the browser displays a warning and/or a broken lock instead of a normal lock. This can scare away security conscious users. Two things you can do to remedy this:
If you host the resources the link goes to, use the HTTPS protocol to link to them. Most of the time people use plain HTTP to link to static elements (like images, style-sheets and so on) because the encryption in the HTTPS protocol creates an overhead and we want to keep CPU utilization low on our servers. Here are my counter-arguments: modern servers have plenty of CPU power. Also, most (read 99.9%) modern web browsers make multiple requests over the same connection, so the encryption key is negotiated only once every N minutes (where N is around 15 if I remember right). Another argument (if you are using a hosting company): I have never seen hosting companies charging by the number of HTTPS connections made. Finally the big argument: are you ready to lose visitors / sales / whatever your site is about, because users mistrust your site (because of the warnings), just to get some little speed and scalability gain?
If the given resources are not hosted by you and are not accessible through a secure connection, you could use mod_proxy to create a virtual proxy to make it seem as if the response comes from your server. (You could also simply copy the page / image in question to your local server and serve it up from there, but that raises all kinds of copyright problems). Some advantages and disadvantages:

Advantage: you eliminate the mixed content warning
Disadvantage: You are using double the bandwidth (because your server first fetches the given resource – thus using downstream bandwidth – then sends it to the client – using upstream bandwidth)
Advantage: it is seamless for the client
Disadvantage: you have to have mod_proxy installed. It is not enabled in the default Apache installation and SysAdmins are not very happy to enable it, because it can very well be a security risk if not configured properly
Advantage: it works with dynamic resources (for which the make-a-copy and serve-it-from-the-local-server approach wouldn’t work even if you somehow resolved the copyright issues)

One final note: the AskApache article talks about hosting videos (Google and YouTube) on a secure page. The interesting thing is that the browser only cares about the fact that the player is loaded through a secure connection, not that the video (loaded by the player) loads through a secure connection. This is because the browser has no control over the behavior of plugins (in this case the Flash player). The good news, however, is that because of this, if you choose the proxy solution, you don’t have to proxy the entire video, just the player (which is obviously much smaller).
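
Before choosing between the two remedies it helps to know which elements of a page are fetched over plain HTTP. Here is a rough Python sketch that lists them. It is only an illustration under a few assumptions: the class and function names are made up, the list of tag / attribute pairs is a guess at the common cases rather than a complete one, and it only looks at the HTML itself (not at resources loaded by scripts or style-sheets).

```python
from html.parser import HTMLParser


class InsecureResourceFinder(HTMLParser):
    # Tag / attribute pairs which make the browser fetch something while rendering.
    # Plain links (a href) are not included, because following them is a navigation.
    FETCHING = {("img", "src"), ("script", "src"), ("iframe", "src"),
                ("embed", "src"), ("link", "href")}

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in self.FETCHING and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def find_insecure_resources(html):
    """Return (tag, url) pairs for the resources referenced through plain HTTP."""
    finder = InsecureResourceFinder()
    finder.feed(html)
    return finder.insecure


page = '<img src="http://example.com/logo.png"><img src="https://example.com/ok.png">'
print(find_insecure_resources(page))
```

Every entry it reports is a candidate for either switching the link to HTTPS or putting it behind the proxy.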



20 ways to Secure your Apache Configuration
gpanther — Sun, 03 Dec 2006 09:12:00 +0000
A nice writeup about securing your Apache installation:
20 ways to Secure your Apache Configuration



Cookie viruses? Me thinks not
gpanther — Thu, 09 Nov 2006 16:02:00 +0000
The only reader of mine had a question: what is my opinion about cookie viruses? (If you also read this blog, I apologize, and I’m very happy in this case that I have more than one reader. If you have questions or topics you would like me to discuss, please post them in the comments)
Getting back to the topic here: I don’t have an opinion, since there is no such thing as a cookie virus. By definition a virus is (quote from Wikipedia):
A computer virus is a self-replicating computer program written to alter the way a computer operates, without the permission or knowledge of the user.
There is no such thing because cookies should be (and usually are) treated by browsers as opaque tokens (that is they are not interpreted in any way and are sent back exactly as received to the server). Now one could imagine the following really far-fetched scenario which would be something similar to viruses:
A given site uses cookies to return some javascript which is evaluated at the client side by some javascript embedded in the page (that is, the code embedded in the page looks at document.cookie and does an eval on it). Now in this case we could make the client side javascript do whatever we want, however:


If we can modify the client side headers, we probably have such access to either the client or the server that there are much more malicious things we can do.
The javascript will be executed in the very limited context of the web page, so we could only infect other cookies that go to the same site (but we already had access to it when modifying the first cookie, so there is no reason for using such a convoluted method).

Now many sensationalist sources use the word virus to refer to all kinds of malicious actions to drive up hype (and we all know what my opinion is about that). There are however some real possibilities of doing harm, most of them in the area of information theft and input validation.

The first one (which doesn’t fit in either of the two mentioned categories) is the possibility that there is a buffer overflow exploit in the cookie handling code at the server. In the official standard it is stated that a cookie should be no larger than 4096 bytes and we all know that when something like this is in the spec many coders automatically write char buffer[4096];. However, before you think that this is a 0-day against Apache or something, let me say the following: I threw together some quick code and ran it against an Apache server (2.2.something) and it very nicely refused to accept the headers. It also generated a return message which was properly escaped, so there is no possible XSS vulnerability there. I’m also sure that IIS has no such problem, but maybe some home-brew custom http servers might have it.
A scenario on which some papers are focusing is the following: the cookie contains some text which is relayed back to the server, which in turn embeds it in the HTML output without proper sanitization. This can result in the attacker embedding code of their choice in the page, including javascript. However, such an attack has no real-life benefit, since if the attacker can access the client’s cookies, s/he probably has write access to the file system and can do much more nefarious things with much less complication.
A third possibility would be that the server relies on data contained in the cookie for authentication or for some other action. In the first case there are two vulnerabilities: cookie theft and creating a custom cookie to gain access (if the server relies for example on some value to be present in the cookie to indicate that the user authenticated successfully). The second case would be when there are parameters in the cookie for a server-side process (shopping cart information). If the server has no way of validating the information upon receiving it or doesn’t do so, one could manipulate this information to gain advantages (to buy things at a 0 price for example). Ideally this information should be kept in a server-side session storage or if you don’t want to break the REST model, encode it in the URL, but make sure that you provide a way for the server to verify the posted back information, by for example encrypting it and then appending a salted hash (where the salt is only known to the server) to it and verifying this when receiving a new request.
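The verification step from the third point can be sketched in Python roughly as follows. This is only a minimal illustration under some assumptions: it uses an HMAC (a keyed hash) in place of a hand-rolled salted hash, it covers only the integrity check and not the encryption, and the names SERVER_SECRET, sign_payload and verify_payload are made up for this example.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical server-side secret (the "salt" known only to the server).
# In practice it would be loaded from configuration, not hard-coded.
SERVER_SECRET = b"replace-with-a-long-random-secret"


def sign_payload(data, secret=SERVER_SECRET):
    """Serialize data and append an HMAC-SHA256 tag, separated by a dot."""
    serialized = json.dumps(data, sort_keys=True, separators=(",", ":"))
    body = base64.urlsafe_b64encode(serialized.encode("utf-8"))
    tag = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body.decode("ascii") + "." + tag


def verify_payload(token, secret=SERVER_SECRET):
    """Return the original data if the tag matches, otherwise None."""
    raw = token.encode("utf-8")
    body, sep, tag = raw.rpartition(b".")
    if not sep:
        return None  # no tag present at all
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest().encode("ascii")
    # Constant-time comparison, so the check does not leak how many
    # characters of a forged tag were correct
    if not hmac.compare_digest(expected, tag):
        return None
    return json.loads(base64.urlsafe_b64decode(body))
```

The server would call sign_payload when handing the cart data to the client (in a cookie or in the URL) and verify_payload on every request that brings it back. A client who changes the price cannot produce a matching tag without the secret, so the tampered value is rejected. Note that the data is still readable by the client here; if it also has to be hidden, it would need to be encrypted as well.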

In conclusion: Developers – validate your input! validate your input! validate your input! (at every step)



Things you (probably) didn’t know about your webserver
gpanther — Wed, 04 Oct 2006 08:41:00 +0000
Today’s webservers are incredibly complex beasts. I don’t know how many of the people operating Apache have read the full specifications. I sure didn’t. So it should come as no surprise that there are hidden features in our servers (and some of them turned on by default), which can weaken our defenses. There are two that I want to talk about today, both turned on by default:

The first (and the more important one, although in security every item is important) was only recently publicized and involves sending an invalid header to Apache, which responds with an error page. I got this one from the SecuriTeam blog. If the default error pages were not changed, they will include the invalid header, so a cross-site scripting attack is possible. To test if your site is vulnerable, you can use curl like this: curl http://localhost/asdf -H "Expect: " -v -i. If the output contains the alert, your server is vulnerable. To worsen the situation, you can use Flash or XMLHttpRequest to create these types of requests (although not with Firefox, which disallows the transmission of this header). Now don’t start filtering on Mozilla browsers, because user agents can also be spoofed. The two possible workarounds are: create custom error pages (harder if you host multiple sites) or enable mod_headers and use the following global rule: RequestHeader unset Expect early (tested with Apache 2.2.3 on WinXP). This might slow your webserver down a little, as described in the documentation, but at least you’re not vulnerable until you update Apache.
The second is a lesser problem, and involves the possibility of stealing cookies if the site has an XSS vulnerability, even if the cookies are marked HttpOnly. It involves sending a TRACE request to the webserver. This request is usually used for debugging, and echoes everything back, including the cookie headers. Again Flash or XMLHttpRequest can be used to craft these special queries. A more detailed description of them can be found here: http://www.cgisecurity.com/whitehat-mirror/WhitePaper_screen.pdf. To test if you’re vulnerable, telnet to your webserver and enter the following commands:
TRACE / HTTP/1.1
Host: localhost (replace it with your host)
X-Header: test

(two enters)

 and you should see everything echoed back to you. As described here, you can use mod_rewrite to filter this attack, by adding the following rules:
RewriteEngine On
RewriteCond %{REQUEST_METHOD} ^TRACE
RewriteRule .* - [F]
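If you would rather not type the TRACE request into telnet by hand, the same check can be scripted. The following is a rough Python sketch using only the standard library; the function name trace_is_echoed and the marker value are invented for this example, and the host and port are assumptions you need to adjust.

```python
import http.client


def trace_is_echoed(host, port=80, marker="trace-test-12345", timeout=5.0):
    """Send a TRACE request and report whether our test header comes back.

    Returns True if the server answers 200 and the response body contains
    the marker we sent in X-Header, i.e. TRACE is enabled and echoing.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # http.client adds the Host header itself
        conn.request("TRACE", "/", headers={"X-Header": marker})
        response = conn.getresponse()
        body = response.read().decode("latin-1")
    finally:
        conn.close()
    return response.status == 200 and marker in body


if __name__ == "__main__":
    # Assumes a webserver listening on localhost:80
    print("TRACE echoed:", trace_is_echoed("localhost"))
```

Before adding the rewrite rules this should report True on a server with TRACE enabled. After adding them the server should answer with 403 Forbidden, so the function returns False.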

And it is also a good idea to make sure that your sites are not vulnerable to XSS.