Disabling mod_deflate for certain files

I received a question regarding my compressed HTTP post. It goes something like this:

I want to use a PHP script as a kind of transparent proxy (that is, when I request a file, it downloads it from another URL and serves it up to me), but mod_deflate keeps eating my Content-Length header.

My advice is of course to selectively disable mod_deflate for the given PHP script. This would mean putting something like the following in your .htaccess file:

SetEnvIfNoCase Request_URI get_file\.php$ no-gzip dont-vary

Where get_file.php is the name of the script. Some things to remember here:

  • the dot character needs to be escaped, because it is a meta-character in regular expressions (it matches any character); prefix it with a backslash to match a literal dot
  • Request_URI contains the path part of the requested URL, without the query string. For example, for the URL http://example.com/get_file.php?file=talk.mp3, Request_URI will contain /get_file.php. However, be aware of a nice Apache trick which can be used to generate nice URLs but which can affect this: if the given file/directory is not found, Apache walks up the path until it finds one. For example, I could have written http://example.com/get_file.php/talk.mp3, and if the script contains the logic to serve up talk.mp3, we get a nice URL. The side effect, however, is that Request_URI is now /get_file.php/talk.mp3, and the regular expression must be adjusted accordingly (into something like \.mp3$)
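As a concrete adjustment for that example, the rule might match on the extension instead of the script name (a sketch, using the .mp3 extension from above):

SetEnvIfNoCase Request_URI \.mp3$ no-gzip dont-vary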

A final word of warning: if your host allows you to open files via URLs (with readfile for example), run, run far away, because this is a very insecure configuration for PHP and chances are that the server (especially if it is shared between multiple users) will be pwned quickly.
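For reference, the ability to open URLs with the file functions is controlled by the allow_url_fopen directive in php.ini; a locked-down host would have (a sketch of the relevant line only):

; php.ini – forbid readfile/fopen/file_get_contents on http:// and ftp:// URLs
allow_url_fopen = Off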

Compressed HTTP

The HTTP standard allows for the delivered content to be compressed (to be more precise, it allows for the content to be encoded in different ways, one of the encodings being compression). Under Apache there are two simple ways to do this:

  • mod_deflate, an output filter which compresses responses on the fly
  • PHP's zlib.output_compression setting, when the content is generated by mod_php
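For completeness, switching each of these on takes one line apiece (a sketch; the MIME types are illustrative):

# httpd.conf / .htaccess – mod_deflate
AddOutputFilterByType DEFLATE text/html text/plain text/css

; php.ini – mod_php
zlib.output_compression = On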

I won’t spend much detail on the configuration options; however, I want to describe one little quirk, which is logical in hindsight but which I struggled with a little: you can lose the Content-Length header on files which don’t fit in your compression buffer from the start. This is of course logical because:

  • Headers must be sent before the content
  • If the server must do several read-from-file / compress / output cycles to compress the whole file, it can’t possibly predict accurately (to the byte) how large or small the compressed version of the file will be. Getting it wrong is risky, because client software might rely on this value and could lock up in a wait cycle or display incomplete data.

Update: if you want to selectively disable mod_deflate for certain files because of this (or other reasons), check out this post about it.

You can observe this effect especially when downloading large files, since the absence of a Content-Length header means that the client can’t show a progress bar indicating the percentage downloaded (this is what I observed at first, and then went on to investigate the causes).

One more remark regarding getting the Content-Length wrong. One fairly common case where this can be an issue is a PHP script which outputs a Content-Length header while the compression is done via zlib.output_compression. The problem is that mod_php doesn’t remove the Content-Length header, which almost certainly has a larger value than the size of the compressed data. This causes the hanging, incomplete downloads symptom (the pattern is sketched after the list below). To be even more confusing:

  • When using HTTP/1.1 and keep-alive, this problem manifests itself.
  • When keep-alive is inactive, the problem disappears (sort of). What actually happens is that the Content-Length is still wrong, but the connection is reset by the server after sending all the data (since no keep-alive means one request per connection). This usually works with clients (both curl and Firefox interpreted it as a complete download), but other client software might choose to interpret the condition as a failed/corrupted download.
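As an illustration, the problematic pattern looks something like this (a sketch; the file name is hypothetical, and zlib.output_compression = On is assumed in php.ini):

<?php
// The script reports the *uncompressed* size, but zlib.output_compression
// shrinks the body afterwards, so the client waits for bytes that never come.
$file = 'somedata.dat';
header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($file)); // wrong once the output is gzipped
readfile($file);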

The possible solutions would be:

  • Perform the compression inside your PHP script (possibly caching the compressed version on disk if it makes sense) and output the correct Content-Length header (i.e. the one corresponding to the compressed data). This is more work, but you retain the progress bar when downloading files (see the sketch after this list)
  • Use mod_deflate to perform the compression, which removes the Content-Length header if it can’t compress the whole data at once (this is not specified in the documentation, but – the beauty of open source – you can take a peek at the source code, the ultimate documentation; just search for apr_table_unset(r->headers_out, "Content-Length"); ). This will kill the progress bar (for the reasons discussed before). To get the progress bar back, you could increase the DeflateBufferSize configuration parameter (which is by default set to 8k) to be larger than the largest file you wish to serve, or deactivate compression for the files which will be downloaded (rather than displayed).
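A sketch of the first option (PHP 5-era code; the file name and the on-disk cache path are my assumptions, and it assumes neither zlib.output_compression nor mod_deflate is also compressing this response):

<?php
// Serve $file gzip-compressed, with a Content-Length that matches the
// compressed data, so the client keeps its progress bar.
$file  = 'somedata.dat';
$cache = $file . '.gz';

// Compress once and cache the result on disk.
if (!file_exists($cache) || filemtime($cache) < filemtime($file)) {
    // Note: this reads the whole file into memory; fine for a sketch.
    file_put_contents($cache, gzencode(file_get_contents($file), 9));
}

$ae = isset($_SERVER['HTTP_ACCEPT_ENCODING']) ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '';
header('Content-Type: application/octet-stream');
header('Vary: Accept-Encoding');
if (strpos($ae, 'gzip') !== false) {
    header('Content-Encoding: gzip');
    header('Content-Length: ' . filesize($cache)); // compressed size
    readfile($cache);
} else {
    header('Content-Length: ' . filesize($file)); // uncompressed size
    readfile($file);
}

For the second option, the directive goes into the server configuration, e.g. DeflateBufferSize 1048576 (note that it cannot be set from .htaccess).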

A final remark: the HTTP protocol also supports compressing the uploaded data (this can be useful, for example, when uploading larger files), as shown by the following blurb in the mod_deflate documentation:

The mod_deflate module also provides a filter for decompressing a gzip compressed request body. In order to activate this feature you have to insert the DEFLATE filter into the input filter chain using SetInputFilter or AddInputFilter.

Now if a request contains a Content-Encoding: gzip header, the body will be automatically decompressed. Few browsers have the ability to gzip request bodies. However, some special applications actually do support request compression, for instance some WebDAV clients.

When I saw this, I was ecstatic, since I was searching for something like this for some of my projects. If this works, it means that I can:

  • Use a protocol (HTTP) for file upload which has libraries in many programming languages
  • Use a protocol which needs only one port (as opposed to FTP) and can be made secure if necessary (with SSL/TLS)
  • Use compression, just like rsync can (and although it can’t create binary diffs on its own, this is not an issue when the uploaded files are not used for synchronization)

Obviously there must be some drawbacks 🙂

  • It seems to be an Apache-only feature (I didn’t find anything indicating support in IIS, or even a clear RFC documenting how this should work)
  • It can’t be negotiated! This is a huge drawback. When server-side compression is used, the process is the following:
    • The client sends an Accept-Encoding: gzip header along with the request
    • The server checks for this header and, if present, compresses the content (except for the cases when the client doesn’t really support the compression despite sending the header)

    However, the fact that the client is the first to send means that there is no way for the server to signal its (in)capability to accept gzip encoding. Even the fact that it’s Apache and has previously served up compressed content doesn’t guarantee that it can handle compressed requests, since the input and output filters are two separate things. So the options available are:

    • Use gzip (possibly preceded by a heuristic detection like the one described before – is it Apache and does it serve up gzip-compressed content?), and if the server returns an error code, retry without gzip
    • The option which I will take – use this only with your own private servers, which you have configured properly.

So how do you do it? Here is a blurb, again from the mod_deflate source code: only work on main request/no subrequests. This means that the whole body of the request must be gzip-compressed if we choose to use this; it is not possible, for example, to compress only the part containing the file in a multipart request. A minimal server-side configuration sketch is shown below, followed by some Perl code I hacked together to use this feature on the client side:
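On the server, per the documentation quoted earlier (the /upload location is my assumption):

<Location /upload>
    SetInputFilter DEFLATE
</Location>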

#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw/tempfile/;
use Compress::Zlib;
use HTTP::Request::Common;
use LWP;

# Make HTTP::Request::Common generate the request body lazily,
# as a code reference, instead of building it in memory.
$HTTP::Request::Common::DYNAMIC_FILE_UPLOAD = 1;

my $request = POST 'http://10.0.0.12/test.php',
    [
        'test1'  => 'test1',
        'test2'  => 'test2',
        'a_file' => ['somedata.dat'],
    ],
    'Content_Type'     => 'form-data',
    'Content_Encoding' => 'gzip';

# Rewrite the request so that its body is the gzip-compressed version
# of the original multipart body, streamed through a temporary file.
sub transform_upload {
    my $request = shift;

    my ($fh, $filename) = tempfile();
    my $cs = gzopen($fh, "wb");

    # Drain the original (dynamic) body chunk by chunk and compress it.
    my $request_c = $request->content();
    while (defined(my $data = $request_c->())) {
        last unless length $data;
        $cs->gzwrite($data);
    }
    $cs->gzclose();
    close $fh;

    # Replace the body with a reader over the compressed temp file.
    open $fh, '<', $filename or die "cannot open $filename: $!";
    binmode $fh;
    $request->content(sub {
        my $buffer;
        if (0 < read $fh, $buffer, 4096) {
            return $buffer;
        } else {
            close $fh;
            return undef;
        }
    });
    # The Content-Length must describe the compressed body.
    $request->content_length(-s $filename);
}

transform_upload($request);

my $browser = LWP::UserAgent->new();
my $response = $browser->request($request);

print $response->content();

This code is optimized for big files, meaning that it won’t read the whole request into memory at once. Hope somebody finds it useful.
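For testing at the other end, a minimal counterpart of test.php (hypothetical; it merely dumps what PHP parsed out of the request) could be:

<?php
// With SetInputFilter DEFLATE active for this URL, PHP sees the form fields
// and the uploaded file exactly as if the request had never been compressed.
print_r($_POST);
print_r($_FILES);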
