Configure Apache as a caching proxy fronting these services. This means that you can tolerate downtime for the services and you get quicker builds (since you don’t need to contact remote servers). It also has a security benefit (you can firewall off your build server so that it can’t make any outgoing connections), and it’s nice to avoid consuming the bandwidth of those registries (especially since they are provided for free).
Without further ado, here are the config bits for Apache 2.4.
/etc/apache2/force_cache_proxy.conf – the general configuration file for caching:
# Security - we don't want to act as a proxy to arbitrary hosts
ProxyRequests Off
SSLProxyEngine On

# Cache files to disk
CacheEnable disk /
CacheMinFileSize 0
# cache up to 100MB
CacheMaxFileSize 104857600
# Expire cache in one day
CacheMinExpire 86400
CacheDefaultExpire 86400

# Try really hard to cache requests
CacheIgnoreCacheControl On
CacheIgnoreNoLastMod On
CacheStoreExpired On
CacheStoreNoStore On
CacheStorePrivate On
# If remote can't be reached, reply from cache
CacheStaleOnError On
# Provide information about cache in reply headers
CacheDetailHeader On
CacheHeader On

# Only allow requests from localhost
<Location />
  Order Deny,Allow
  Deny from all
  Allow from 127.0.0.1
</Location>

<Proxy *>
  # Don't send X-Forwarded-* headers - don't leak local hosts
  # And some servers get confused by them
  ProxyAddHeaders Off
</Proxy>

# Small timeout to avoid blocking the build too long
ProxyTimeout 5
Now with this prepared we can create the individual configurations for the services we wish to proxy:
For pypi:
# pypi mirror
Listen 127.1.1.1:8001
<VirtualHost 127.1.1.1:8001>
  Include force_cache_proxy.conf
  ProxyPass / https://pypi.python.org/ status=I
  ProxyPassReverse / https://pypi.python.org/
</VirtualHost>
For npm:
# npm mirror
Listen 127.1.1.1:8000
<VirtualHost 127.1.1.1:8000>
  Include force_cache_proxy.conf
  ProxyPass / https://registry.npmjs.org/ status=I
  ProxyPassReverse / https://registry.npmjs.org/
</VirtualHost>
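After configuration you need to enable the site (a2ensite) as well as the needed modules (a2enmod – ssl, cache, cache_disk, proxy, proxy_http). Note that the disk cache module is called cache_disk in Apache 2.4; it was named disk_cache in Apache 2.2.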
Finally you need to configure your package manager clients to use these endpoints:
For npm you need to edit ~/.npmrc (or use npm config set) and add registry = http://127.1.1.1:8000/
For Python / pip you need to edit ~/.pip/pip.conf (I recommend having download-cache as per Stavros’s post):
[global]
download-cache = ~/.cache/pip/
index-url = http://127.1.1.1:8001/simple/
If you use setuptools (why!? just stop and use pip :-)), your config is ~/.pydistutils.cfg:
[easy_install]
index_url = http://127.1.1.1:8001/simple/
Also, if you use buildout, the needed config adjustment in buildout.cfg is:
[buildout]
index = http://127.1.1.1:8001/simple/
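Because the shared configuration turns on CacheHeader, Apache adds an X-Cache response header (for example "HIT from localhost" or "MISS from localhost"), which gives a quick way to confirm that the clients really go through the cache. The following is a minimal sketch, not part of the original setup; the helper names cache_status and fetch_cache_status are made up for illustration, and the URL in the usage note is the pypi endpoint configured above.

```python
import urllib.request


def cache_status(headers):
    """Return the first word of the X-Cache header (HIT, MISS, REVALIDATE) or None.

    `headers` is any mapping of header names to values; names are compared
    case-insensitively.
    """
    for name, value in headers.items():
        if name.lower() == "x-cache":
            parts = value.split()
            return parts[0].upper() if parts else None
    return None


def fetch_cache_status(url, timeout=10):
    """Request `url` through the local proxy and report its cache status.

    Hypothetical helper: it needs the Apache proxy described above to be running.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return cache_status(dict(response.headers.items()))
```

Calling fetch_cache_status("http://127.1.1.1:8001/simple/") twice in a row should, if everything is wired up, report a miss followed by a hit.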
This is mostly it. If your client is using any kind of local caching, you should clear your cache and reinstall all the dependencies to ensure that Apache has them cached on the disk. There are also dedicated solutions for caching the repositories (for example devpi for Python and npm-lazy-mirror for Node), however I found them somewhat unreliable, and with Apache you have a uniform solution which already has things like startup / supervision implemented and which is familiar to most sysadmins.
However this wasn’t possible with SSL, because the certificate was sent before the headers and a certificate is specific to a site (at least the run-of-the-mill ones), so the webserver didn’t know which certificate to pick. When I heard on the SANS Daily Stormcast that the newest version of Apache included a way to do this, I was enthusiastic and intrigued at the same time, so I went looking and found the following thing:
Bonus article: The First Few Milliseconds of an HTTPS Connection
Picture taken from AMagill’s photostream with permission.
I want to use a PHP script as a kind of transparent proxy (that is, when I request a file, it downloads it from another URL and serves it up to me), but mod_deflate keeps eating my Content-Length header.
My advice is of course to selectively disable mod_deflate for the given PHP script. This would mean putting something like the following in your .htaccess file:
SetEnvIfNoCase Request_URI get_file\.php$ no-gzip dont-vary
Where get_file.php is the name of the script. Some things to remember here:
The dot character needs to be escaped, because it is a meta-character for regular expressions (meaning "any character"), so you must prefix it with a backslash to mean "the dot character".
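The effect of forgetting the backslash is easy to see in any regular expression engine. Here is a small Python sketch of it (Python's re module stands in for Apache's regex matching, and the sample URIs are made up):

```python
import re

# Case-insensitive, like SetEnvIfNoCase
unescaped = re.compile(r"get_file.php$", re.IGNORECASE)
escaped = re.compile(r"get_file\.php$", re.IGNORECASE)


def matches(pattern, uri):
    """True if the pattern is found anywhere in the request URI."""
    return pattern.search(uri) is not None


# The unescaped dot accepts any character in that position,
# so a URI such as /get_fileXphp would also have compression disabled.
print(matches(unescaped, "/get_fileXphp"))
print(matches(escaped, "/get_fileXphp"))
print(matches(escaped, "/proxy/get_file.php"))
```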
A final word of warning: if your host allows you to open files via URLs (with readfile for example), run, run far away, because this is a very insecure configuration for PHP and chances are that the server (especially if it is shared between multiple users) will be pwned quickly.
The advantages of static files are: cacheability out of the box (with a dynamically generated result this is very hard to get right) and less overhead when serving up (even more so if something specialized is used like tiny httpd). However you might feel the need to apply authentication to the static files also (that is, only users with proper privileges can have access to them). Of course you want to retain the advantages of caching and low overhead as much as possible.
One option (and probably the one with less overhead and ultimately simpler to implement) is to use mod_auth_mysql on the directory hosting the static files and generate a random long (!) username and password for each user session, insert them to the authentication table, and modify the links to the resources to include these credentials. For example, a link in this case might look like this:
http://w7PLTHUDxK:[email protected]/static/image.jpg
The advantage of this approach is that we get all those wonderful things like content type or cache headers (or even zlib compression if we configured it) for free. The main pitfall is choosing where to do the cleanup (removing this temporary user from the table). The session destroy handler is not good enough, since it won’t be called if the user doesn’t properly log out. One solution would be to do repeated “garbage collections” on the tables (in this case care must be taken to set the garbage collection interval to be the same as or larger than the session timeout interval, since otherwise the access might “go away” from under the users’ feet while they are still logged on). Another option would be to add a user id column to the table and use the “REPLACE INTO” SQL command (which is AFAIK a MySQL extension, not standard SQL) to ensure that the temporary user table has at most as many users as the main user table.
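Both cleanup strategies can be sketched in a few lines. The example below is only an illustration under several assumptions: it uses Python with an in-memory SQLite database as a stand-in for the MySQL table read by mod_auth_mysql (SQLite happens to accept the same REPLACE INTO statement), and the table name temp_credentials, its columns and the function names are all invented for the sketch.

```python
import secrets
import sqlite3
import time


def open_store():
    """Create the hypothetical temporary credential table, keyed by user id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE temp_credentials ("
        " user_id INTEGER PRIMARY KEY,"
        " username TEXT NOT NULL,"
        " password TEXT NOT NULL,"
        " created REAL NOT NULL)"
    )
    return conn


def issue_credentials(conn, user_id, now=None):
    """Generate long random credentials; REPLACE keeps at most one row per user."""
    now = time.time() if now is None else now
    username = secrets.token_urlsafe(16)
    password = secrets.token_urlsafe(32)
    conn.execute(
        "REPLACE INTO temp_credentials (user_id, username, password, created)"
        " VALUES (?, ?, ?, ?)",
        (user_id, username, password, now),
    )
    return username, password


def purge_expired(conn, session_timeout, now=None):
    """Garbage collection: drop rows older than the session timeout (seconds)."""
    now = time.time() if now is None else now
    cursor = conn.execute(
        "DELETE FROM temp_credentials WHERE created < ?",
        (now - session_timeout,),
    )
    return cursor.rowcount
```

The purge interval is passed in explicitly to make the point from the text visible: it should never be shorter than the session timeout.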
A quick note: all the above can of course be done with static authentication also (that is, a hardcoded username and password in the .htaccess file). This is a very simple solution (and easier to apply, since mod_auth_mysql might not be installed/enabled on all the webservers, but mod_auth is on most of them), but it is insecure: it cannot be used to separate users (i.e. to have files which only certain users can access) and, because it does not expire automatically, one link is enough for search engines / other crawlers to find it.
This is all well and good, but what if you don’t have control over the server configuration? While I strongly recommend against using shared PHP hosting, some people might be in this situation. The solution is to recreate (at least some of) Apache’s functionality.
The first step is to put the actual static files outside your web root (preferably) or to deny access to the folder where the files are placed with .htaccess (less preferable). If the files were to reside in a public folder, this system would provide obfuscation at best and would be equivalent to a 302 or 301 redirect at worst.
The next step is to decide on the method of referencing your static file. You have three options:
get_static.php?fn=image.jpg (the filename is passed as a query parameter)
static/image.jpg (which will be rewritten by a rule into the form shown at the previous point)
get_static.php/image.jpg (the filename is passed as path info)
The second and third options are the ones I recommend. The reason behind this is that it gives the browser the illusion that you are dealing with different files which can help it do proper caching without relying on the ETag mechanism discussed later.
I would like to pause for a moment and remind everybody that security is a big concern in the web world, since you are practically putting your code out for everybody, meaning that anybody can come and try to break it. One particular concern with these types of scripts (which read and echo back to you arbitrary files) is path traversal. This attack is easy to demonstrate with the following example:
Let’s say that the script works by taking the filename given, concatenating it with the directory (which for this example is /home/abcd/static/) and echoing back the given file. Now if I supply in the filename something like ../../../etc/passwd, the resulting path will be /home/abcd/static/../../../etc/passwd, meaning that I can read any file the web server has access to. And before the Windows guys start jumping up and down saying that this is a *nix problem, the example is very easy to translate to Windows.
Now your first reaction would be to disallow (blacklist) the usage of the . character in the path, but don’t go this way. Rather, define the rule which your files will follow and verify that the supplied parameters follow that rule. For example: "the filenames will contain one or more alphanumeric, underscore or dash characters and will have a png, jpg, css or js extension". This translates into the regular expression ^[a-z0-9_-]+\.(png|jpg|css|js)$. Be sure to include the start and end anchors (otherwise the string only has to contain a substring matching the rule; the whole string doesn’t have to match it) and watch out for other regular expression gotchas. As an added security measure use the realpath function (which resolves things like symbolic links or .. sequences) before performing any further verification.
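The whitelist rule plus the realpath check could look roughly like the sketch below. It is written in Python rather than PHP purely for illustration; os.path.realpath plays the role of PHP's realpath, the function name resolve_static is made up, and the default directory is the one from the example above.

```python
import os
import re

STATIC_ROOT = "/home/abcd/static"  # example directory from the text
NAME_RULE = re.compile(r"^[a-z0-9_-]+\.(png|jpg|css|js)$")


def resolve_static(filename, root=STATIC_ROOT):
    """Return the absolute path for an allowed filename, or None if rejected."""
    # Whitelist: the whole string must follow the naming rule.
    if not NAME_RULE.fullmatch(filename):
        return None
    # Second line of defence: resolve symlinks and ".." sequences, then make
    # sure the result is still inside the static directory.
    root_real = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root_real, filename))
    if os.path.commonpath([root_real, candidate]) != root_real:
        return None
    return candidate
```

Using fullmatch in addition to the anchors is a deliberate choice: in Python a bare $ also matches just before a trailing newline, which is exactly the kind of regular expression gotcha mentioned above.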
Now we have the file, and need to generate the headers. The important headers are:
Content-Type (for example, a CSS file may be detected as text/plain, but it must be served as text/css to work in all the browsers)
http://example.com/image.php?id=1 and to the second one http://example.com/image.php?id=2, without an ETag these will represent the same cache entry, meaning that you can have situations where the second image is displayed instead of the first or vice-versa, because the browser operates under the assumption that they are the same and pulls one out of the cache when the other should be used instead. ETags can be an arbitrary alphanumeric string, so for example you could use the MD5 hash of the file (and no, there is no information disclosure vulnerability here which would warrant the usage of salted hashes, for example, because the user is already getting the file! S/he can recalculate the MD5 of it if s/he wishes!)
The script also needs to take into account the request headers:
After I’ve written this all up, I’ve found that there is a PHP extension which provides most of the functions for this: HTTP. Use it. It’s much easier than rolling your own and you have less chance to miss some corner cases (like the fact that as per HTTP/1.1 request headers are case-insensitive, meaning that If-Modified-Since and iF-mOdIfIeD-sInCe are the same thing and should be treated the same).
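To make the ETag and request header handling concrete, here is a minimal sketch of the idea in Python (not the PHP HTTP extension's API). The function names are invented, and the comparison is simplified: a real If-None-Match header can carry several comma-separated tags or "*", which this sketch ignores.

```python
import hashlib


def get_header(headers, name):
    """Look up a request header by name, ignoring case."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


def respond(body, request_headers):
    """Return (status, response_headers, payload) for a static file body."""
    # MD5 of the content as the ETag; the client already has the file,
    # so nothing is disclosed by it.
    etag = '"%s"' % hashlib.md5(body).hexdigest()
    if get_header(request_headers, "If-None-Match") == etag:
        # The client's cached copy is current: send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Length": str(len(body))}, body
```

A first request gets a 200 with the ETag; repeating it with that value in If-None-Match (in any capitalisation of the header name) gets a 304.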
PS. I didn’t mention it, but this mechanism can also be used to hide the real file names. This might be needed when for whatever reason you don’t want to divulge them (because file names can provide additional information which you might not want your users to have). This can be achieved by using an additional step and giving the user a token which is translated into a file name at the server. These tokens can be:
For maximum security I recommend to go with arbitrarily chosen random tokens for each file (otherwise an attacker might break the security by trying other IDs – for example if the IDs are numeric, s/he can try other numbers – or by guessing the file names and applying the generator function on it and checking the existence of the file).
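A random token scheme of this kind can be sketched as a simple two-way lookup. The Python class below is only an illustration (the name TokenMap and its methods are invented, and a real deployment would keep the mapping in a database rather than in memory):

```python
import secrets


class TokenMap:
    """Maps unguessable random tokens to real file names and back."""

    def __init__(self):
        self._by_token = {}
        self._by_name = {}

    def token_for(self, filename):
        """Return the token for a file, creating a random one on first use."""
        if filename not in self._by_name:
            token = secrets.token_urlsafe(16)  # 128 bits of randomness
            self._by_name[filename] = token
            self._by_token[token] = filename
        return self._by_name[filename]

    def filename_for(self, token):
        """Translate a token back to the file name, or None if unknown."""
        return self._by_token.get(token)
```

Because the tokens are drawn at random rather than derived from an ID or from the file name, neither counting upwards nor guessing names helps an attacker.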
Update: I’ve looked at using mod_xsendfile with PHP, however it seems to be a dormant project (the latest posted version is for Apache 2.0, nothing there for 2.2 :-(). Another option which may be worth exploring is the following (if you are using PHP as a loadable module rather than CGI): use virtual to redirect the request to the static files. You can even find a good example in the comments.
encoded in different ways, one of the encodings being compression). Under Apache there are two simple ways to do this:
I won’t spend much detail on the configuration options, however I want to describe one little quirk, which is logical in hindsight but which I struggled with a little: you can lose the Content-Length header on files which don’t fit in your compression buffer from the start. This is of course logical because:
Update: if you want to selectively disable mod_deflate for certain files because of this (or other reasons), check out this post about it.
You can observe this effect when downloading (large) files especially, since the absence of a Content-Length header means that the client can’t show a progress bar indicating the percentage you downloaded (this is what I observed at first and then went on to investigate the causes).
One more remark regarding the "getting the Content-Length wrong" part. One (fairly) common case where this can be an issue is with PHP scripts which output Content-Length headers while the compression is done via zlib.output_compression. The problem is that mod_php doesn’t remove the Content-Length header, which almost certainly has a larger value than the size of the compressed data. This causes the "hanging, incomplete downloads" symptom. To be even more confusing:
The possible solutions would be:
apr_table_unset(r->headers_out, "Content-Length");). This will kill the progress bar (for the reasons discussed before). To get back the progress bar, you could increase the DeflateBufferSize configuration parameter (which is by default set to 8k) to be larger than the largest file you wish to serve, or deactivate compression for the files which will be downloaded (rather than displayed).
A final remark: the HTTP protocol also supports the uploaded data being compressed (this can be useful for example when uploading larger files), as shown by the following blurb in the mod_deflate documentation:
The mod_deflate module also provides a filter for decompressing a gzip compressed request body. In order to activate this feature you have to insert the DEFLATE filter into the input filter chain using SetInputFilter or AddInputFilter.
…
Now if a request contains a Content-Encoding: gzip header, the body will be automatically decompressed. Few browsers have the ability to gzip request bodies. However, some special applications actually do support request compression, for instance some WebDAV clients.
When I saw this, I was ecstatic, since I was searching for something like this for some of my projects. If this works, it means that I can:
Obviously there must be some drawbacks:
Accept-Encoding: gzip header along with the request
really support the compression)
However, the fact that the client is the first to send, means that there is no way for the server to signal its (in)capability to accept gzip encoding. Even the fact that it’s Apache and previously served up compressed content doesn’t guarantee the fact that it can handle it, since the input and output filters are two separate things. So the options available are:
So how do you do it? Here is a blurb, again from the mod_deflate source code: "only work on main request/no subrequests". This means that the whole body of the request must be gzip compressed if we choose to use this; it is not possible to compress only the part containing the file, for example, in a multipart request. Below you can see some Perl code I hacked together to use this feature:
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw/tempfile/;
use Compress::Zlib;
use HTTP::Request::Common;
use LWP;

# Stream file uploads instead of reading them into memory
$HTTP::Request::Common::DYNAMIC_FILE_UPLOAD = 1;

my $request = POST 'http://10.0.0.12/test.php',
    [
        'test1'  => 'test1',
        'test2'  => 'test2',
        'a_file' => ['somedata.dat']
    ],
    'Content_Type'     => 'form-data',
    'Content_Encoding' => 'gzip';

# Gzip the whole request body into a temporary file,
# then make the request stream its content from that file
sub transform_upload {
    my $request = shift;

    my ($fh, $filename) = tempfile();
    my $cs = gzopen($fh, "wb");

    my $request_c = $request->content();
    while (my $data = $request_c->()) {
        $cs->gzwrite($data);
    }
    $cs->gzclose();
    close $fh;

    open $fh, '<', $filename or die "Cannot reopen $filename: $!";
    binmode $fh;
    $request->content(sub {
        my $buffer;
        if (0 < read $fh, $buffer, 4096) {
            return $buffer;
        } else {
            close $fh;
            return undef;
        }
    });
    $request->content_length(-s $filename);
}

transform_upload($request);

my $browser = LWP::UserAgent->new();
my $response = $browser->request($request);

print $response->content();
This code is "optimized" for big files, meaning that it won’t read the whole request into memory at one time. Hope somebody finds it useful.
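For comparison, the same idea (compress the whole request body and label it with Content-Encoding: gzip) can be sketched in Python with only the standard library. This is a simplified illustration rather than a port of the Perl script: the function name build_gzip_request is made up, the body is held in memory instead of being streamed from a temporary file, and the request is only built, not sent.

```python
import gzip
import urllib.request


def build_gzip_request(url, body, content_type="application/octet-stream"):
    """Build a POST request whose entire body is gzip compressed."""
    compressed = gzip.compress(body)
    request = urllib.request.Request(url, data=compressed, method="POST")
    request.add_header("Content-Type", content_type)
    # Tells the server-side DEFLATE input filter to decompress the body.
    request.add_header("Content-Encoding", "gzip")
    # The length must describe the compressed bytes actually sent.
    request.add_header("Content-Length", str(len(compressed)))
    return request
```

Sending it would be a matter of passing the result to urllib.request.urlopen, assuming the server has the DEFLATE input filter enabled as described in the quoted documentation.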
The specific requirements are:
But let me step back for a moment and ask: why would you want to go this route? Why not use the product of well known companies which offer integration with developer tools and in some cases are available for free? While those products are certainly more mature and easier to use, when going the OSS route you have:
By doing research following these guidelines the following three possibilities emerged:
Advantages:
SOAP::Transport::HTTP::CGI with SOAP::Transport::HTTP::Apache in the examples)
use SOAP::Lite +trace => 'all'; and then redirect the stderr output (where the tracing info is dumped) to a file like this: soap-client.pl 2>debug-info.txt
Disadvantages:
Advantages:
Disadvantages:
Advantages:
$debug = 1;). The debugging information will be appended to the response XML as a comment.
Disadvantages:
can not redefine class soap_client, you have to unload the PHP SOAP module. Another option would be to go through the source and rename this class.
The final choice was NuSOAP. The deciding factor was the (semi)automatic - because you have to give it hints about the parameter types - WSDL generation. This is essential if you wish to make your service available to the largest possible audience, especially those using statically typed languages. A perfect example is the .NET / Visual Studio environment, which needs the WSDL file to automatically generate the stub for the web service.
A little side note: if your web service is accessed through SSL / HTTPS and the certificate authority who signed the certificate of the server is not trusted (i.e. it's not Verisign), you get some warnings while generating the stub in Visual Studio and the final program will halt with an exception saying something like "Could not establish a trusted connection over this SSL/TLS connection". The most common cause of this is the fact that the developer uses a self-signed certificate for the server. As far as I know there is no way to stop this from happening from inside the framework. However, because the framework shares its network access architecture with Internet Explorer, you can correct it from there. First you will need the certificate from the server (a .crt file, server.crt). Then go to Tools->Internet Options and select the Content tab. Click on Certificates, go to the Trusted Root Certification Authorities tab and select Import. Point it to the server certificate and answer affirmatively to the confirmation dialog. From now on that certificate will also be considered a trusted root certificate: you won't get warnings while browsing sites with it (and those sites might even have elevated privileges, depending on your Internet Explorer configuration), but most importantly your .NET client side code will work just fine.
The test project was to implement a web service which simulated some simple state machine(s). The project was implemented on the following platform:
The test was done on a Windows XP Pro machine using XAMPP to quickly install all the required components, however there is nothing platform specific in the code or the components, so it should be easy to replicate it on a different platform (Linux for example). The PHP code for the server side can be seen in Appendix A and an example for a state machine definition file can be found in Appendix B. An example client program written in C# can be seen in Appendix C.
The structure of a state machine definition file is as follows:
stateMachine. It has one mandatory parameter: initialState, which specifies the initial state it is in
nextState - mandatory, the name of the next state if this action is chosen
waitBefore and waitAfter - optional, the amount of time to pause before and after executing this action, in milliseconds. If omitted, zero is assumed. It must be a non-negative integer.
probability - a number greater than 0 but less than or equal to 1.0. Determines the probability of this action being chosen. 0 means never and 1 means always. The sum of probabilities for a group (state element for auto states and message element for message states) must be 1.0. If some probabilities are omitted, the remaining probability is distributed evenly amongst them (so if we have 5 action items and the first has a probability of 0.1, the second one of 0.3 and the rest are omitted, the last three will each have a probability of 0.2)
The functions exported by the server are:
postMessage(stateMachineName: string, message: string): string
Posts a message to the state machine identified by the stateMachineName. The definition for this state machine must be stored on the server in the file <stateMachineName>.xml. It is assumed that there is only one instance "running" of each server. This is guaranteed by the fact that the state of them is stored in a database table protected by write locks during the transitions. The method is synchronous and returns the name of the current state resulting from processing the message and any other automatic steps (states of type auto) which followed. One thing to keep in mind is that if you specify a state of type auto for the initial state, this will also be evaluated at the first message posted. On error it returns an empty string and you can use the getErrorMessage function to get the error message.
getMachineState(stateMachineName: string): string
Gets the current state of the given state machine. It's asynchronous (with respect to the state machine, not the caller). If an error has occurred in the state machine it returns the empty string. You can use the getErrorMessage function to get the error message.
getErrorMessage(stateMachineName: string): string
Returns the error message for the given state machine or empty if no error exists
resetMachineState(stateMachineName: string): void
Resets the machine to its initial state (as specified by the initialState attribute of the corresponding definition file)
resetAllMachines: void
Resets all the state machines to their initial state
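The probability rule for action elements described earlier (explicit values must be positive, omitted ones share the remainder equally, and the total must come to 1.0) can be checked with a small sketch. It is written in Python for brevity and is not the server's code; the function name fill_probabilities is invented, None stands for an omitted probability attribute, and the 0.001 tolerance is an assumption of the sketch.

```python
def fill_probabilities(probabilities, tolerance=0.001):
    """Fill in omitted (None) probabilities and validate the group.

    Returns the completed list, or raises ValueError if the rule is violated.
    """
    given = [p for p in probabilities if p is not None]
    if any(p <= 0 for p in given):
        raise ValueError("probabilities must be greater than 0")
    missing = len(probabilities) - len(given)
    # The remaining probability is split equally between the omitted entries.
    share = (1.0 - sum(given)) / missing if missing else 0.0
    if missing and share <= 0:
        raise ValueError("no probability left for the omitted entries")
    filled = [share if p is None else p for p in probabilities]
    if abs(1.0 - sum(filled)) > tolerance:
        raise ValueError("probabilities must sum to 1.0")
    return filled
```

With the example from the text, fill_probabilities([0.1, 0.3, None, None, None]) leaves 0.6 to be shared, so each of the last three actions ends up with a probability of roughly 0.2.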
To install it you would need the following items:
$data_directory variable so that it points to the directory where the XML files describing the state machines are located. Important: include the trailing slash or backslash depending on the platform
CREATE TABLE `state_machines`.`state_machines` (
`machine_name` VARCHAR(255) NOT NULL DEFAULT '',
`machine_state` VARCHAR(255) NOT NULL DEFAULT '',
`error_message` VARCHAR(255) NOT NULL DEFAULT '',
PRIMARY KEY(`machine_name`)
)
ENGINE = MEMORY;
CREATE TABLE `state_machines`.`lock_table` (
`dummy_column` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY(`dummy_column`)
)
ENGINE = MEMORY;
DB::connect call
<?php
require_once 'lib/nusoap.php';
require_once 'DB.php';

$server = new soap_server();
$server->configureWSDL('statemachine', 'urn:statemachine');

//backslashes must be doubled inside a double-quoted PHP string
$data_directory = "C:\\xampp\\htdocs\\webservice\\data\\";
$db_connection = DB::connect("mysql://state_machine:password@localhost/state_machines");
if (DB::isError($db_connection))
    stopWithErrorMessage("failed to connect to the database - " . $db_connection->getMessage());

// Register the methods to expose
$server->register('postMessage',
    array('stateMachineName' => 'xsd:string', 'message' => 'xsd:string'),
    array('return' => 'xsd:string'),
    'urn:statemachine',
    'urn:statemachine#postMessage',
    'rpc',
    'encoded',
    'send a message to a given state machine. returns the new state'
);
$server->register('getMachineState',
    array('stateMachineName' => 'xsd:string'),
    array('return' => 'xsd:string'),
    'urn:statemachine',
    'urn:statemachine#getMachineState',
    'rpc',
    'encoded',
    'returns the current state of the automaton'
);
$server->register('getErrorMessage',
    array('stateMachineName' => 'xsd:string'),
    array('return' => 'xsd:string'),
    'urn:statemachine',
    'urn:statemachine#getErrorMessage',
    'rpc',
    'encoded',
    'returns the error message for a given state machine ("" if no error exists)'
);
$server->register('resetMachineState',
    array('stateMachineName' => 'xsd:string'),
    array(),
    'urn:statemachine',
    'urn:statemachine#resetMachineState',
    'rpc',
    'encoded',
    'resets the given state machine'
);
$server->register('resetAllMachines',
    array(),
    array(),
    'urn:statemachine',
    'urn:statemachine#resetAllMachines',
    'rpc',
    'encoded',
    'resets all the state machines'
);

$server->service($HTTP_RAW_POST_DATA);

//send a message to a given state machine. returns the new state
function postMessage($stateMachineName, $message) {
    if (!preg_match('/^[\w\-\s]+$/', $stateMachineName))
        stopWithErrorMessage('state machine name contains illegal characters');
    global $data_directory;
    if (!is_file($data_directory . $stateMachineName . ".xml"))
        stopWithErrorMessage('specified state machine does not exist');

    //load up the state machine
    $stateMachine = loadStateMachineFromFile($data_directory . $stateMachineName . ".xml");

    global $db_connection;
    //from now on we need to be synchronized with other threads - lock the database table
    $db_connection->query('LOCK TABLE lock_table WRITE');
    //synchronize it with the database
    $stateMachine = synchronizeStateMachine($stateMachine, $stateMachineName);
    //now post the message to the state machine
    $stateMachine = postMessageWhilePossible($stateMachine, $message);

    if (is_string($stateMachine)) {
        //an error has occurred! store the error message and return the empty string
        //(REPLACE INTO has no WHERE clause, the row is selected by the primary key value)
        $db_connection->query('REPLACE INTO state_machines (machine_name, machine_state, error_message) VALUES ("'
            . addslashes($stateMachineName) . '", "", "' . addslashes($stateMachine) . '")');
        $db_connection->query('UNLOCK TABLES');
        return '';
    } else {
        //everything went ok, store the new state and return it
        $db_connection->query('REPLACE INTO state_machines (machine_name, machine_state, error_message) VALUES ("'
            . addslashes($stateMachineName) . '", "' . addslashes($stateMachine['currentState']) . '", "")');
        $db_connection->query('UNLOCK TABLES');
        return $stateMachine['currentState'];
    }
}

//returns the current state of the automaton
function getMachineState($stateMachineName) {
    if (!preg_match('/^[\w\-\s]+$/', $stateMachineName))
        stopWithErrorMessage('state machine name contains illegal characters');
    global $data_directory;
    if (!is_file($data_directory . $stateMachineName . ".xml"))
        stopWithErrorMessage('specified state machine does not exist');

    //load up the state machine
    $stateMachine = loadStateMachineFromFile($data_directory . $stateMachineName . ".xml");
    //synchronize it with the database
    $stateMachine = synchronizeStateMachine($stateMachine, $stateMachineName);
    //return the current state
    return $stateMachine['currentState'];
}

//returns the error message for a given state machine ('' if no error exists)
function getErrorMessage($stateMachineName) {
    if (!preg_match('/^[\w\-\s]+$/', $stateMachineName))
        stopWithErrorMessage('state machine name contains illegal characters');
    global $db_connection;
    return $db_connection->getOne("SELECT error_message FROM state_machines WHERE machine_name=\""
        . addslashes($stateMachineName) . "\"");
}

//resets the given state machine
function resetMachineState($stateMachineName) {
    if (!preg_match('/^[\w\-\s]+$/', $stateMachineName))
        stopWithErrorMessage('state machine name contains illegal characters');
    global $db_connection;
    $db_connection->query('DELETE FROM state_machines WHERE machine_name="' . addslashes($stateMachineName) . '"');
}

//resets all the state machines
function resetAllMachines() {
    global $db_connection;
    $db_connection->query('DELETE FROM state_machines');
}

//internal helper function which outputs the error message to the header and then exits
function stopWithErrorMessage($error_message) {
    header("HTTP/1.1 500 Internal Server Error: " . $error_message, true, 500);
    exit;
}

//internal helper function - synchronizes the state of an automaton with the one stored in the database
//(if it's stored there)
function synchronizeStateMachine($state_machine, $stateMachineName) {
    global $db_connection;
    $machine_state = $db_connection->getRow("SELECT * FROM state_machines WHERE machine_name=\""
        . addslashes($stateMachineName) . "\"");
    if (is_array($machine_state)) {
        //it is present in the database, try to synchronize with it
        if ( ('' == $machine_state['machine_state']) ||
             array_key_exists($machine_state['machine_state'], $state_machine['states'])) {
            $state_machine['currentState'] = $machine_state['machine_state'];
            return $state_machine;
        } else {
            stopWithErrorMessage("Erroneous state in the database: " . $machine_state['machine_state']);
            exit;
        }
    }
    //it's not present in the database, leave it as it is
    return $state_machine;
}

//internal helper function. Checks if the given node has the specified attribute
//if not, returns null, if it does, it returns it
function getAttributeOrNull($node, $attr_name) {
    if ($node->hasAttribute($attr_name))
        return $node->getAttribute($attr_name);
    return null;
}

//internal helper function. Extract and validate the action elements from
//a node (message or state). Return an array structure on success,
//an error message (string) on failure
function extractActions($parent_node, $states_list, $state_name, $message_name) {
    //process the probable actions - make sure that the sum of the probabilities is 1.0
    //when no probability is specified, the remaining probability is distributed between them
    $error_message_suffix = ('' == $message_name) ? '' : " at message '$message_name'";
    $actions = $parent_node->getElementsByTagName('action');
    $result = array();
    $probability_sum = 0.0;
    $actions_with_no_probability = 0;
    foreach ($actions as $action) {
        $new_action = array();
        if (null !== ($action_probability = getAttributeOrNull($action, 'probability'))) {
            if ($action_probability <= 0)
                return "Negative probability of action in state '$state_name'$error_message_suffix";
            $probability_sum += 0.0 + $action_probability;
            $new_action['probability'] = 0.0 + $action_probability;
        } else {
            ++$actions_with_no_probability;
        }

        if (null === ($action_wait_before = getAttributeOrNull($action, 'waitBefore'))) {
            $new_action['waitBefore'] = 0;
        } else {
            if ($action_wait_before < 0)
                return "Negative 'waitBefore' of action in state '$state_name'$error_message_suffix";
            $new_action['waitBefore'] = intval($action_wait_before);
        }

        if (null === ($action_wait_after = getAttributeOrNull($action, 'waitAfter'))) {
            $new_action['waitAfter'] = 0;
        } else {
            if ($action_wait_after < 0)
                return "Negative 'waitAfter' of action in state '$state_name'$error_message_suffix";
            $new_action['waitAfter'] = intval($action_wait_after);
        }

        if (null === ($action_next_state = getAttributeOrNull($action, 'nextState')))
            return "Unspecified nextState in state '$state_name'$error_message_suffix";
        if (!array_key_exists($action_next_state, $states_list))
            return "Invalid nextState specified in action at state '$state_name'$error_message_suffix: '$action_next_state'";
        $new_action['nextState'] = $action_next_state;

        $result[] = $new_action;
    }

    //now redistribute the remaining probability :)
    if ($actions_with_no_probability > 0) {
        foreach (array_keys($result) as $action_key) {
            if (!array_key_exists('probability', $result[$action_key])) {
                $result[$action_key]['probability'] = (1.0 - $probability_sum) / $actions_with_no_probability;
            }
        }
    }

    //finally sum up the probability and check it (must be 1.0)
    $probability_sum = 0.0;
    foreach ($result as $action)
        $probability_sum += $action['probability'];
    if (abs(1.0 - $probability_sum) > 0.001)
        return "The sum of probabilities for state '$state_name'$error_message_suffix is way off from 1.0";

    return $result;
}

//returns a structure which completely describes the state machine
//returns a string with an error message if the XML failed to follow
//the rules
function loadStateMachine($state_machine_xml) {
    $doc = new DOMDocument();
    $doc->loadXML($state_machine_xml);

    //this will be the result if all goes well
    $result = array();

    //find all the valid messages
    $xpath = new DOMXPath($doc);
    $valid_messages = array();
    foreach ($xpath->query('//stateMachine/messages/message') as $valid_message) {
        if (null === getAttributeOrNull($valid_message, 'name'))
            return "Found message element which doesn't have the 'name' attribute!";
        $valid_messages[getAttributeOrNull($valid_message, 'name')] = 1;
    }
    if (0 >= count($valid_messages))
        return 'No valid message names found!';

    //now parse states
    $result['states'] = array();
    $states = $doc->getElementsByTagName('state');
    if (0 >= $states->length)
        return 'No state elements found!';

    //first store the state names so that we can validate them later on
    foreach ($states as $state) {
        if (null === ($state_name = getAttributeOrNull($state, 'name')))
            return 'Found state with no name!';
        if (array_key_exists($state_name, $result['states']))
            return "Found state with duplicate name: '$state_name'";
        $result['states'][$state_name] = array();
    }

    foreach ($states as $state) {
        //validate the basic parameters for the state
        $state_name = getAttributeOrNull($state, 'name');
        if (null === ($state_type = getAttributeOrNull($state, 'type')))
            return 'Found state with no type!';
        if ( ('message' != $state_type) && ('auto' != $state_type) )
            return "Found state with invalid type: '$state_type'";

        //save the validated stuff
        $result['states'][$state_name]['type'] = $state_type;

        //process the available state transitions
        if ('message' == $state_type) {
            $messages = $state->getElementsByTagName('message');
            $result['states'][$state_name]['messages'] = array();
            foreach ($messages as $message) {
                //message name: - it should exist, - it should be valid and - it shouldn't be used before (in this state)
                if (null === ($message_name = getAttributeOrNull($message, 'name')))
                    return "Found message with no name in state '$state_name'";
                if (!array_key_exists($message_name, $valid_messages))
                    return "Found invalid message name '$message_name' in state '$state_name'";
                if (array_key_exists($message_name, $result['states'][$state_name]['messages']))
                    return "Found duplicate message name '$message_name' in state '$state_name'";

                $result['states'][$state_name]['messages'][$message_name] = array();
                $result['states'][$state_name]['messages'][$message_name]['actions'] =
                    extractActions($message, $result['states'], $state_name, $message_name);
                if (is_string($result['states'][$state_name]['messages'][$message_name]['actions']))
                    //an error has occurred
                    return $result['states'][$state_name]['messages'][$message_name]['actions'];
            }
        } else {
            //load the actions we can choose from
            $result['states'][$state_name]['actions'] = extractActions($state, $result['states'], $state_name, '');
            if (is_string($result['states'][$state_name]['actions']))
                //an error has occurred
                return $result['states'][$state_name]['actions'];
        }
    }

    //get the starting state and make sure that it's a valid state
    if (null === ($initial_state = getAttributeOrNull($doc->documentElement, 'initialState')))
        return 'Document has no initial state!';
    if (!array_key_exists($initial_state, $result['states']))
        return 'Initial state is invalid!';
    $result['currentState'] = $initial_state;

    return $result;
}

function loadStateMachineFromFile($file_machine) {
    $xml_file_contents = file_get_contents($file_machine);
    if (get_magic_quotes_gpc())
        $xml_file_contents = stripslashes($xml_file_contents);
    return loadStateMachine($xml_file_contents);
}

//applies a given message to a given state machine.
it executes the specified waits //returns the modified state machine. If an error occured, the currentState will be set to '' function internalPostMessage($state_machine, $message = '') { //print "Applying message: '$message'n"; //print "Current state: $state_machine[currentState]n"; //we are in an invalid state - we can't do anything if (!array_key_exists($state_machine['currentState'], $state_machine['states'])) return $state_machine; if ('message' == $state_machine['states'][$state_machine['currentState']]['type']) { if (!array_key_exists($message, $state_machine['states'][$state_machine['currentState']]['messages'])) { //this message can not be applied now $state_machine['currentState'] = ''; return $state_machine; } $actions = $state_machine['states'][$state_machine['currentState']]['messages'][$message]['actions']; } else { $actions = $state_machine['states'][$state_machine['currentState']]['actions']; } //now chose an action, by "throwing a dice" $rand_value = rand(0, 32768) / 32768; $action_to_execute = null; foreach ($actions as $action) { $action_to_execute = $action; if ($rand_value <= $action['probability']) break; $rand_value -= $action['probability']; } //print "Going to state $action[nextState]n"; //now execute the action sleep(intval($action['waitBefore'] / 1000 + 0.5)); $state_machine['currentState'] = $action['nextState']; sleep(intval($action['waitAfter'] / 1000 + 0.5)); //print "Gone to state $action[nextState]n"; return $state_machine; } //the same as above, however it continues while possible after the first move //(while the current state is an automatic one) function postMessageWhilePossible($state_machine, $message) { //process any automatic statest BEFORE while ( ('' != $state_machine['currentState']) && ('auto' == $state_machine['states'][$state_machine['currentState']]['type']) ) { $state_machine = internalPostMessage($state_machine); } if ('' != $state_machine['currentState']) $state_machine = internalPostMessage($state_machine, 
$message); //process any automatic statest AFTER while ( ('' != $state_machine['currentState']) && ('auto' == $state_machine['states'][$state_machine['currentState']]['type']) ) { $state_machine = internalPostMessage($state_machine); } return $state_machine; } ?>
<stateMachine initialState="Off">
  <messages>
    <message name="Flip" />
  </messages>
  <state name="Off" type="message">
    <message name="Flip">
      <action probability="0.5" nextState="Transient" />
      <action probability="0.5" nextState="Off" />
    </message>
  </state>
  <state name="Transient" type="auto">
    <action probability="0.9" nextState="On" waitBefore="1000" />
    <action probability="0.1" nextState="Off" waitBefore="1000" />
  </state>
  <state name="On" type="message">
    <message name="Flip">
      <action nextState="Off" />
    </message>
  </state>
</stateMachine>
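To make the semantics of the light switch definition above easier to follow, here is a minimal Python sketch of how such a definition could be interpreted. It is not a port of the PHP service: the function names (read_actions, load_machine, pick_action, post_message) are made up for this illustration, the waits are ignored, and most of the validation is left out. It only shows the two core ideas: actions without a probability share whatever probability is left, and the action is picked by rolling a random number against the cumulative probabilities.

```python
import random
import xml.etree.ElementTree as ET

# The light switch definition from the post, inlined so the sketch is self-contained.
LIGHT_SWITCH_XML = """
<stateMachine initialState="Off">
  <messages><message name="Flip" /></messages>
  <state name="Off" type="message">
    <message name="Flip">
      <action probability="0.5" nextState="Transient" />
      <action probability="0.5" nextState="Off" />
    </message>
  </state>
  <state name="Transient" type="auto">
    <action probability="0.9" nextState="On" waitBefore="1000" />
    <action probability="0.1" nextState="Off" waitBefore="1000" />
  </state>
  <state name="On" type="message">
    <message name="Flip"><action nextState="Off" /></message>
  </state>
</stateMachine>
"""


def read_actions(parent):
    """Collect the <action> children and spread the unassigned probability evenly."""
    actions = []
    for node in parent.findall("action"):
        prob = node.get("probability")
        actions.append({
            "probability": float(prob) if prob is not None else None,
            "nextState": node.get("nextState"),
        })
    assigned = sum(a["probability"] for a in actions if a["probability"] is not None)
    missing = [a for a in actions if a["probability"] is None]
    for action in missing:
        action["probability"] = (1.0 - assigned) / len(missing)
    if abs(1.0 - sum(a["probability"] for a in actions)) > 0.001:
        raise ValueError("probabilities must sum to 1.0")
    return actions


def load_machine(xml_text):
    """Build a dictionary describing the states and remember the initial state."""
    root = ET.fromstring(xml_text)
    states = {}
    for state in root.findall("state"):
        if state.get("type") == "message":
            transitions = {m.get("name"): read_actions(m) for m in state.findall("message")}
        else:
            # automatic states need no message; store their actions under the empty name
            transitions = {"": read_actions(state)}
        states[state.get("name")] = {"type": state.get("type"), "transitions": transitions}
    return {"states": states, "currentState": root.get("initialState")}


def pick_action(actions, rng):
    """Roll once and walk the cumulative probabilities until the roll is covered."""
    roll = rng.random()
    for action in actions:
        if roll < action["probability"]:
            return action
        roll -= action["probability"]
    return actions[-1]  # fallback in case rounding left a sliver uncovered


def post_message(machine, message, rng):
    """Apply one message (or one automatic step) and return the new state name."""
    state = machine["states"][machine["currentState"]]
    key = message if state["type"] == "message" else ""
    if key not in state["transitions"]:
        raise KeyError("message %r is not accepted in state %r" % (message, machine["currentState"]))
    machine["currentState"] = pick_action(state["transitions"][key], rng)["nextState"]
    return machine["currentState"]
```

Passing the random generator in as a parameter (random.Random(seed)) keeps runs repeatable, which is handy when experimenting with the probabilities.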
Before you can use these examples, you have to add a Web Reference
to your project. You can do this by right-clicking on the References
folder of your project in Visual Studio and selecting Add Web Reference. Enter the link to the service with ?wsdl
appended (to get the WSDL file). For example, if you are hosting the service locally, you would enter http://localhost/webservice/index.php?wsdl
private static void Main(string[] args)
{
    Console.WriteLine("Startin up...");
    statemachine statemachine1 = new statemachine();
    Console.WriteLine("Startup done...");
    for (int num1 = 0; num1 < 10; num1++)
    {
        Console.WriteLine("The current state is: " + statemachine1.getMachineState("testAutomata"));
        Console.WriteLine("Passing message: Flip");
        Console.WriteLine("Test automata returned: " + statemachine1.postMessage("testAutomata", "Flip"));
        Console.WriteLine("---");
    }
    Console.WriteLine("Press any key to exit...");
    Console.ReadKey();
}
Private Shared Sub Main(ByVal args As String())
    Console.WriteLine("Startin up...")
    Dim statemachine1 As New statemachine
    Console.WriteLine("Startup done...")
    Dim num1 As Integer = 0
    Do While (num1 < 10)
        Console.WriteLine(("The current state is: " & statemachine1.getMachineState("testAutomata")))
        Console.WriteLine("Passing message: Flip")
        Console.WriteLine(("Test automata returned: " & statemachine1.postMessage("testAutomata", "Flip")))
        Console.WriteLine("---")
        num1 += 1
    Loop
    Console.WriteLine("Press any key to exit...")
    Console.ReadKey
End Sub
procedure Program.Main(args: string[]);
begin
  Console.WriteLine('Startin up...');
  statemachine1 := statemachine.Create;
  Console.WriteLine('Startup done...');
  num1 := 0;
  while ((num1 < 10)) do
  begin
    Console.WriteLine(string.Concat('The current state is: ', statemachine1.getMachineState('testAutomata')));
    Console.WriteLine('Passing message: Flip');
    Console.WriteLine(string.Concat('Test automata returned: ', statemachine1.postMessage('testAutomata', 'Flip')));
    Console.WriteLine('---');
    inc(num1)
  end;
  Console.WriteLine('Press any key to exit...');
  Console.ReadKey
end;
private: static void __gc* Main(System::String __gc* args __gc [])
{
    System::Console::WriteLine(S"Startin up...");
    ContactWebservice::stateMachine::statemachine __gc* statemachine1 = __gc new ContactWebservice::stateMachine::statemachine();
    System::Console::WriteLine(S"Startup done...");
    for (System::Int32 __gc* num1 = 0; (num1 < 10); num1++)
    {
        System::Console::WriteLine(System::String::Concat(S"The current state is: ", statemachine1->getMachineState(S"testAutomata")));
        System::Console::WriteLine(S"Passing message: Flip");
        System::Console::WriteLine(System::String::Concat(S"Test automata returned: ", statemachine1->postMessage(S"testAutomata", S"Flip")));
        System::Console::WriteLine(S"---");
    }
    System::Console::WriteLine(S"Press any key to exit...");
    System::Console::ReadKey();
}
use warnings;
use diagnostics;
use SOAP::Lite +trace => 'all';

print ">>" . SOAP::Lite
    -> uri('http://www.soaplite.com/Demo')
    -> proxy('http://localhost/webservice/index.php')
    -> getMachineState("testAutomata")
    -> result;
AskApache (a great blog BTW for technical network related stuff – the only negative thing being that sometimes it is too technical :)) has an article about mixing secure (fetched through HTTPS) and non-secure (fetched through HTTP) elements on a page. Usually the result of doing something like this is that the browser displays a warning and/or a broken lock instead of a normal lock. This can scare away security conscious users. Two things you can do to remedy this:
If you host the resources the link goes to, use the HTTPS protocol to link to them. Most of the time people use plain HTTP to link to static elements (like images, style sheets and so on) because the encryption in the HTTPS protocol creates an overhead and we want to keep CPU utilization low for our servers. Here are my counter-arguments: modern servers have plenty of CPU power. Also, most (read: 99.9%) modern web browsers send multiple requests over the same connection, so the encryption key is negotiated only once every N minutes (where N is around 15, if I remember right). Another argument (if you are using a hosting company): I have never seen a hosting company charge by the number of HTTPS connections made. Finally, the big argument: are you ready to lose visitors / sales / whatever your site is about because users mistrust your site (because of the warnings), just to get a small speed and scalability gain?
If the given resources are not hosted by you and are not accessible through a secure connection, you could use mod_proxy to create a virtual proxy, so that the response seems to come from your server. (You could also simply copy the page / image in question to your local server and serve it from there, but that involves all kinds of copyright problems.) Some advantages and disadvantages:
One final note: the AskApache article talks about hosting videos (Google and YouTube) on a secure page. The interesting thing is that the browser only cares about the fact that the player is loaded through a secure connection, not whether the video (loaded by the player) is loaded through a secure connection. This is because the browser has no control over the plugin's (in this case the Flash player's) behavior. The good news, however, is that because of this, if you choose the proxy solution, you don't have to proxy the entire video, just the player (which is obviously much smaller).
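Before choosing between the two remedies, it helps to know which elements of a page are actually fetched over plain HTTP. The following Python sketch is one possible way to list them using only the standard library. The class and function names, and the set of tags that are checked, are choices made for this illustration and are not taken from the AskApache article; a real page may pull in resources in other ways (CSS url() references, scripts that build URLs at runtime) that this does not catch.

```python
from html.parser import HTMLParser


class MixedContentFinder(HTMLParser):
    """Collects embedded resources that are referenced through plain http://."""

    # tag -> attribute holding the resource URL (an illustrative, incomplete list)
    RESOURCE_ATTRS = {
        "img": "src",
        "script": "src",
        "iframe": "src",
        "embed": "src",
        "link": "href",
    }

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        wanted = self.RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return  # ordinary links (<a href>) do not trigger the browser warning
        for name, value in attrs:
            if name == wanted and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def find_insecure_resources(html_text):
    """Return a list of (tag, url) pairs for resources loaded over plain HTTP."""
    finder = MixedContentFinder()
    finder.feed(html_text)
    finder.close()
    return finder.insecure
```

Running find_insecure_resources over the HTML of a page served through HTTPS gives the list of elements that would either have to be linked through HTTPS or be put behind the proxy.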
cookie viruses? (If you also read this blog, I apologize, and I'm also very happy in this case that I have more than one reader. If you have questions or topics you would like me to discuss, please post them in the comments.)
Getting back to the topic here: I don't have an opinion, since there is no such thing as a cookie virus. By definition, a virus is (quote from Wikipedia):
A computer virus is a self-replicating computer program written to alter the way a computer operates, without the permission or knowledge of the user.
There is no such thing because cookies should be (and usually are) treated by browsers as opaque tokens (that is they are not interpreted in any way and are sent back exactly as received to the server). Now one could imagine the following really far-fetched scenario which would be something similar to viruses:
A given site uses cookies to return some JavaScript which is evaluated on the client side by some JavaScript embedded in the page (that is, the code embedded in the page is looking at document.cookie and doing an eval on it). Now in this case we could make the client-side JavaScript do whatever we want, however:
infect other cookies that go to the same site (but we already had access to it when modifying the first cookie, so there is no reason for using such a convoluted method).
Now many sensationalist sources use the word virus
to refer to all kinds of malicious actions in order to drive up hype (and we all know what my opinion is about that). There are, however, some real possibilities of doing harm, most of them in the areas of information theft and input validation.
char buffer[4096];
. However, before you think that this is a 0-day against Apache or something, let me say the following: I threw together some quick code and ran it against an Apache server (2.2.something) and it very nicely refused to accept the headers. It also generated a return message which was properly escaped, so there is no possible XSS vulnerability there. I'm also sure that IIS has no such problem, but maybe some home-brew custom HTTP servers might have this problem.
In conclusion: Developers – validate your input! validate your input! validate your input! (at every step)
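As a rough illustration of what "validate your input" can mean for a header or cookie value, here is a small Python sketch. Everything in it is an assumption made for the example: the 4096 limit simply mirrors the buffer size mentioned above, and the function names are invented. It shows the two steps that Apache performed in my test: refuse values that are too long or contain control characters, and escape anything that gets echoed back into an HTML page.

```python
import html

# Hypothetical limit, chosen only to mirror the 4096-byte buffer mentioned above.
MAX_HEADER_LENGTH = 4096


def validate_header_value(value, max_length=MAX_HEADER_LENGTH):
    """Return the value unchanged if it looks sane, raise ValueError otherwise."""
    if len(value) > max_length:
        raise ValueError("header value too long")
    # control characters (apart from tab) have no business in a header value
    if any(ch != "\t" and (ord(ch) < 32 or ord(ch) == 127) for ch in value):
        raise ValueError("header value contains control characters")
    return value


def render_rejection_page(value):
    """Build an error message that echoes the offending value safely."""
    # truncate first, then escape, so the markup characters never reach the page raw
    shown = html.escape(value[:200], quote=True)
    return "<p>Rejected header value: " + shown + "</p>"
```

The length check protects the fixed-size buffer, while the escaping protects the browser of whoever ends up looking at the error page; neither check replaces the other.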
curl http://localhost/asdf -H "Expect: <script>alert('Vulnerable');</script>" -v -i
. If the output contains the alert, your server is vulnerable. To worsen the situation, you can use Flash or XMLHttpRequest to create these types of requests (although not with Firefox, which disallows the transmission of this header). Now don’t start filtering on Mozilla browsers, because user agents can also be spoofed. The two possible workarounds are: create custom error pages (harder if you host multiple sites) or enable mod_headers and use the following global rule: RequestHeader unset Expect early
(tested with Apache 2.2.3 on WinXP). This might slow your webserver down a little, as described in the documentation, but at least you're not vulnerable until you update Apache.
TRACE / HTTP/1.1
Host: localhost (replace it with your host)
X-Header: test
(two enters)
and you should see everything echoed back to you. As described here, you can use mod_rewrite to filter this attack, by adding the following rules:
RewriteEngine On
RewriteCond %{REQUEST_METHOD} ^TRACE
RewriteRule .* - [F]
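To check whether the rules above took effect without typing the request by hand, something along the lines of the following Python sketch could be used. The function name and the marker value are invented for this example; the sketch sends a TRACE request with a custom X-Header, as in the manual test, and reports whether the server echoed it back.

```python
import http.client


def trace_is_echoed(host, port=80, timeout=5.0):
    """Send a TRACE request and report whether our custom header comes back."""
    marker = "trace-probe-7f3a"  # arbitrary value we look for in the reply
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("TRACE", "/", headers={"X-Header": marker})
        response = conn.getresponse()
        body = response.read().decode("latin-1", errors="replace")
        # a server with TRACE enabled answers 200 and repeats the request it received
        return response.status == 200 and marker in body
    finally:
        conn.close()
```

Calling trace_is_echoed("localhost") before and after adding the rewrite rules should show the result change from True to False, since the [F] flag makes the server refuse the request.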
And it is also a good idea to make sure that your sites are not vulnerable to XSS