python – Grey Panthers Savannah – https://grey-panther.net

Proxying pypi / npm / etc for fun and profit!
https://grey-panther.net/2014/02/proxying-pypi-npm-etc-for-fun-and-profit.html – Wed, 05 Feb 2014

Package managers for source code (like pypi, npm, nuget, maven, gems, etc.) are great! We should all use them. But what happens if the central repository goes down? Suddenly all your continuous builds / deploys fail for no reason. Here is a way to prevent that:

Configure Apache as a caching proxy fronting these services. This means that you can tolerate downtime of the services and you get quicker builds (since you don’t need to contact remote servers). It also has a security benefit (you can firewall off your build server so that it can’t make any outgoing connections) and it’s nice to avoid consuming the bandwidth of those registries (especially since they are provided for free).

Without further ado, here are the config bits for Apache 2.4.

/etc/apache2/force_cache_proxy.conf – the general configuration file for caching:

# Security - we don't want to act as a proxy to arbitrary hosts
ProxyRequests Off
SSLProxyEngine On
 
# Cache files to disk
CacheEnable disk /
CacheMinFileSize 0
# cache up to 100MB
CacheMaxFileSize 104857600
# Expire cache in one day
CacheMinExpire 86400
CacheDefaultExpire 86400
# Try really hard to cache requests
CacheIgnoreCacheControl On
CacheIgnoreNoLastMod On
CacheStoreExpired On
CacheStoreNoStore On
CacheStorePrivate On
# If remote can't be reached, reply from cache
CacheStaleOnError On
# Provide information about cache in reply headers
CacheDetailHeader On
CacheHeader On
 
# Only allow requests from localhost
<Location />
        Order Deny,Allow
        Deny from all
        Allow from 127.0.0.1
</Location>
 
<Proxy *>
        # Don't send X-Forwarded-* headers - don't leak local hosts
        # And some servers get confused by them
        ProxyAddHeaders Off
</Proxy>

# Small timeout to avoid blocking the build too long
ProxyTimeout    5

Now with this prepared we can create the individual configurations for the services we wish to proxy:

For pypi:

# pypi mirror
Listen 127.1.1.1:8001

<VirtualHost 127.1.1.1:8001>
        Include force_cache_proxy.conf

        ProxyPass         /  https://pypi.python.org/ status=I
        ProxyPassReverse  /  https://pypi.python.org/
</VirtualHost>

For npm:

# npm mirror
Listen 127.1.1.1:8000

<VirtualHost 127.1.1.1:8000>
        Include force_cache_proxy.conf

        ProxyPass         /  https://registry.npmjs.org/ status=I
        ProxyPassReverse  /  https://registry.npmjs.org/
</VirtualHost>

After configuration you need to enable the site (a2ensite) as well as the needed modules (a2enmod – ssl, cache, cache_disk, proxy, proxy_http; note that on Apache 2.4 the disk cache module is named cache_disk, not disk_cache as in 2.2).

Finally you need to configure your package manager clients to use these endpoints:

For npm you need to edit ~/.npmrc (or use npm config set) and add registry = http://127.1.1.1:8000/

For Python / pip you need to edit ~/.pip/pip.conf (I recommend having download-cache as per Stavros’s post):

[global]
download-cache = ~/.cache/pip/
index-url = http://127.1.1.1:8001/simple/

If you use setuptools (why!? just stop and use pip :-)), your config is ~/.pydistutils.cfg:

[easy_install]
index_url = http://127.1.1.1:8001/simple/

Also, if you use buildout, the needed config adjustment in buildout.cfg is:

[buildout]
index = http://127.1.1.1:8001/simple/

This is mostly it. If your client uses any kind of local caching, you should clear that cache and reinstall all the dependencies to ensure that Apache has them cached on disk. There are also dedicated solutions for caching these repositories (for example devpi for Python and npm-lazy-mirror for Node); however, I found them somewhat unreliable, and with Apache you have a uniform solution which already has things like startup / supervision implemented and which is familiar to most sysadmins.

Passing UTF-8 through HTTP
https://grey-panther.net/2013/05/passing-utf-8-trough-http.html – Wed, 29 May 2013

These days we should write all code as if it will be used by international people with a wide variety of personal information (just look at Falsehoods Programmers Believe About Names for some headscratchers). I would like to add my small contribution by showing how UTF-8 encoded strings can be passed in GET/POST parameters.

For this I’ll be using the following small PHP script, which can be quickly run by the command line PHP webserver added in PHP 5.4:

<?php header('Content-Type: text/html; charset=utf-8'); ?>
<code><pre>
GETs: <?php print_r($_GET); ?>
POSTs: <?php print_r($_POST); ?>
</pre></code>

We’ll test this with the following Python script:

#!/usr/bin/python
# vim: set fileencoding=utf-8 :
import urllib
import urllib2

params = {'name': u'東京'}
params = { k: v.encode('utf-8') for k, v in params.iteritems() }
data = urllib.urlencode(params)

url = 'http://localhost:8000/?' + data
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)

print response.read()

This all works well and nicely, so here are some conclusions:

  • GET and POST variables need to be UTF-8 encoded, after which they need to be urlencoded (“% encoded”). See this StackOverflow answer.
  • Based on the same answer: hostnames use Punycode instead (but we are not concerned with hostnames here).
  • You might need to add the following header for POST requests to work: “Content-Type: application/x-www-form-urlencoded; charset=UTF-8”.
  • Failing to observe this sequence leads to a UnicodeEncodeError in urllib.urlencode.
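The script above is Python 2; on Python 3 the explicit encode step disappears, since urllib.parse.urlencode percent-encodes str values as UTF-8 by default. A quick sketch:

```python
from urllib.parse import urlencode, parse_qs

# urlencode percent-encodes non-ASCII str values as UTF-8 by default
data = urlencode({'name': '東京'})
print(data)  # name=%E6%9D%B1%E4%BA%AC

# parse_qs reverses the process, decoding the bytes back to str
decoded = parse_qs(data)['name'][0]
print(decoded)  # 東京
```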

Converting datetime to UTC in Python
https://grey-panther.net/2013/02/converting-datetime-to-utc-in-python.html – Thu, 07 Feb 2013

So you need to convert a Python datetime object which has a timezone set (“aware” in the Python nomenclature) to a UTC one with no timezone set (“naive”), for example because NDB on GAE can’t store anything else. The solution will look something like this:

date = date.astimezone(tz.tzutc()).replace(tzinfo=None)

For searchability: the exception thrown by NDB if you fail to do this is “NotImplementedError: DatetimeProperty updated_at can only support UTC. Please derive a new Property to support alternative timezones.”
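The one-liner above assumes dateutil’s tz module; the same conversion works with only the standard library (Python 3’s datetime.timezone), as this sketch shows:

```python
from datetime import datetime, timedelta, timezone

# an "aware" datetime: 18:06 at UTC+1
aware = datetime(2013, 2, 7, 18, 6, tzinfo=timezone(timedelta(hours=1)))

# convert to UTC, then drop the tzinfo to get a "naive" datetime
naive = aware.astimezone(timezone.utc).replace(tzinfo=None)
print(naive)  # 2013-02-07 17:06:00
```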

GeekMeet talk about Google App Engine
https://grey-panther.net/2012/10/geekmeet-talk-about-google-app-engine.html – Sat, 27 Oct 2012

The GAE presentation I’ve given at the 12th edition of Cluj Geek Meet can be found here (created using reveal.js).

You can find the source code here.

Running pep8 and pylint programmatically
https://grey-panther.net/2012/08/running-pep8-and-pylint-programatically.html – Sun, 19 Aug 2012

Tools like pep8 and pylint are great, especially given the huge amount of dynamism in Python – which results in many opportunities to shoot yourself in the foot. Sometimes, however, you want to invoke these tools in more specialized ways, for example only on the files which changed since the last commit. Here is how you can do this from a Python script and capture their output for later post-processing (maybe you want to merge the output from both tools, or maybe you want to show only the lines which changed since the last commit, etc.):

import sys
from StringIO import StringIO

import pep8

# config holds the path to your pep8 configuration file
try:
  sys.stdout = StringIO()
  pep8_checker = pep8.StyleGuide(config_file=config, format='pylint')
  pep8_checker.check_files(paths=[ ...path to files/dirs to check... ])
  output = sys.stdout.getvalue()
finally:
  sys.stdout = sys.__stdout__

from StringIO import StringIO

from pylint.lint import Run
from pylint.reporters.text import ParseableTextReporter

reporter = ParseableTextReporter()
result = StringIO()
reporter.set_output(result)
Run(['--rcfile=pylint.config'] + [ ...files... ], reporter=reporter, exit=False)
output = result.getvalue()
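On Python 3 the stdout swap in the pep8 snippet can be written more safely with contextlib.redirect_stdout, which restores stdout even if an exception escapes. A generic sketch, not tied to pep8’s API:

```python
import io
from contextlib import redirect_stdout

def capture_stdout(func, *args, **kwargs):
    # run func while redirecting everything it prints into a buffer
    buf = io.StringIO()
    with redirect_stdout(buf):
        func(*args, **kwargs)
    return buf.getvalue()

output = capture_stdout(print, "checker output")
print(repr(output))  # 'checker output\n'
```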

It is recommended to use pylint/pep8 installed through pip/easy_install rather than from the Linux distribution repositories, since the latter are known to ship outdated versions. You can check for this with code like the following:

import logging
import sys

import pkg_resources
from pkg_resources import parse_version

if pkg_resources.get_distribution('pep8').parsed_version < parse_version('1.3.3'):
    logging.error('pep8 too old. At least version 1.3.3 is required')
    sys.exit(1)
if pkg_resources.get_distribution('pylint').parsed_version < parse_version('0.25.1'):
    logging.error('pylint too old. At least version 0.25.1 is required')
    sys.exit(1)
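If you want to avoid the pkg_resources dependency, a naive comparison over dotted version strings works for simple cases (a sketch only – it does not handle rc/dev suffixes, for which pkg_resources or the packaging library remains the right tool):

```python
def version_tuple(v):
    # naive parse: handles plain dotted versions like "1.3.3" only;
    # a suffix like "1.3.3rc1" would raise ValueError here
    return tuple(int(part) for part in v.split('.'))

print(version_tuple('1.3.3'))                           # (1, 3, 3)
print(version_tuple('0.25.1') < version_tuple('1.3.3')) # True
```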

Finally, if you have to use an old version of pep8, the code needs to be modified as follows (however, this older version probably won't be of much use and will most likely annoy you – you should really try to use an up-to-date version, for example by isolating it in a virtualenv):

result = []
import pep8
pep8.message = lambda msg: result.append(msg)
pep8.process_options(own_code)
for code_dir in [ ...files or dirs... ]:
    pep8.input_dir(code_dir)

Clearing your Google App Engine datastore
https://grey-panther.net/2012/08/clearing-your-google-app-engine-datastore.html – Wed, 15 Aug 2012

Warning! This is a method to erase the data from your Google App Engine datastore. There is no way to recover your data after you go through with this! Only use this if you’re absolutely certain!

If you have a GAE account used for experimentation, you might like to clean it up sometimes (erase the contents of the datastore and blobstore associated with the application). Doing this through the admin interface can become very tedious, so here is an alternative method:

  1. Start your Remote API shell
  2. Use the following code to delete all datastore entities:
    while True: keys=db.Query(keys_only=True).fetch(500); db.delete(keys); print "Deleted 500 entries, the last of which was %s" % keys[-1].to_path()

  3. Use the following code to delete all blobstore entities:
    from google.appengine.ext.blobstore import *
    while True: list=BlobInfo.all().fetch(500); delete([b.key() for b in list]);  print "Deleted elements, the last of which was %s" % list[-1].filename
    

The above method is inspired by this StackOverflow answer, but has the advantage that it does the deletion in smaller steps, so the risk of the entire operation being aborted by deadline-exceeded or over-quota errors is removed.
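The batching pattern itself is independent of GAE and can be sketched generically – fetch_batch and delete below are hypothetical stand-ins for the datastore calls used above:

```python
def delete_in_batches(fetch_batch, delete, batch_size=500):
    """Repeatedly fetch up to batch_size keys and delete them, so no
    single operation can blow a request deadline or quota limit."""
    total = 0
    while True:
        keys = fetch_batch(batch_size)
        if not keys:
            break
        delete(keys)
        total += len(keys)
    return total

# simulate a store with 1234 entries
store = set(range(1234))
fetch = lambda n: list(store)[:n]
wipe = lambda keys: store.difference_update(keys)
print(delete_in_batches(fetch, wipe))  # 1234
```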

Final caveats:

  • This can be slow
  • This consumes your quota, so you might have to do it over several days or raise your quota
  • The code is written in a very non-pythonic way (multiple statements on one line) for the ease of copy-pasting
Using Jython from Maven
https://grey-panther.net/2011/10/using-jython-from-maven.html – Thu, 13 Oct 2011

This blog post was originally published on the Transylvania JUG blog.

On the surface it looks simple: just add the dependency and you can run the example code.

However, what the jython artifact doesn’t get you is the standard Python libraries, like re. This means that as soon as you try to do something like the code below, it will error out:

PythonInterpreter interp = new PythonInterpreter();
try {
  interp.exec("import re");
} 
catch (PyException ex) {
  ex.printStackTrace();
}

The solution? Use the jython-standalone artifact, which includes the standard libraries. Another advantage is that it has the latest release (2.5.2), while jython lags two minor revisions behind (2.5.0) in Maven Central. A possible downside is the larger size of the jar.

<dependency>
    <groupId>org.python</groupId>
    <artifactId>jython-standalone</artifactId>
    <version>2.5.2</version>
</dependency>
Detecting the Metasploit encryptors in one hour and 49 lines of Python
https://grey-panther.net/2009/07/detecting-the-metasploit-encryptors-in-one-hour-and-49-lines-of-python.html – Thu, 30 Jul 2009

I’ve seen a lot of blog posts lately which proclaim that Metasploit payloads encrypted with one of the available encryptors and written into an executable file are somehow “magically” capable of bypassing AV software (these posts usually contain a couple of VirusTotal links to demonstrate the point). The main scenario considered (from what I gather) is the following: you prepare a connect-back shell, then you convince the target of your pentest to run it (you email it to them, you put it on a USB stick, etc.) and you get access to their machine. The AV aspect comes into the picture when you consider that the target has such software running on their system.

So I said: detecting it can’t be that hard! I generated all the combinations of payloads and encoders (plus some triple-encoded ones – since this also seems to be considered “a better way” to hide the payloads) and wrote the following Python script using pefile and pydasm:

import pefile, pydasm
import sys, glob, operator, re

def countFInstr(buffer):
  # count disassembled instructions which contain a hardcoded constant
  # (an immediate operand) - typical of decryptor stubs
  offset = 0
  fpoint = 0
  rx = re.compile("0x[0-9a-f]+")
  while offset < len(buffer): 
    i = pydasm.get_instruction(buffer[offset:], pydasm.MODE_32) 
    instr = pydasm.get_instruction_string(i, pydasm.FORMAT_INTEL, 0)
    if (instr and rx.search(instr)): fpoint += 1
    if not i:
      offset += 1
    else:
      offset += i.length
  return fpoint

def scan(filename):
  try:
    pe = pefile.PE(filename, fast_load=True)
  except pefile.PEFormatError:
    return False
  execSectionSize = 0
  foundRWXSection = False
  rw = 0x40000000 | 0x80000000L  # readable | writable section flags
  for section in pe.sections:
    if (0 == section.Characteristics & 0x20000000): continue
    execSectionSize += section.SizeOfRawData
    if (rw == section.Characteristics & rw):
      # print section.Name
      buffer = section.get_data(section.VirtualAddress, 128) 
      # for c in buffer: print "%#x" % ord(c),
      # print ""
      # print countFInstr(buffer)
      if (countFInstr(buffer) < 16): return False
      if (len(buffer) < 128): return False
      foundRWXSection = True
  if (not foundRWXSection): return False
  if (execSectionSize > 4096): return False
  return True

sys.argv = reduce(operator.add, map(glob.glob, sys.argv))

for filename in sys.argv[1:]:
  print filename, " ",
  if scan(filename):
    print "Metasploit!"
  else:
    print "-"

It has a detection rate of 100% and a false positive rate of 0% (although I didn’t have access to executables packed with more “exotic” packers, which would have given me a more accurate FP rate – even so, I consider that the detection method is not really prone to false positives).

So how does it work? What does it take for it to say “Metasploit”?

  • The executable must have at least one section marked Read/Write/Execute (typical for packers)
  • The beginning of that section (the first 128 bytes) must contain at least 16 instructions with hardcoded constants (immediate operands)
  • The total amount of raw data loaded into executable sections must be less than 4 KB
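The flag tests boil down to simple bitmask checks on the PE section characteristics, using the same constants as the script above (0x20000000 executable, 0x40000000 readable, 0x80000000 writable):

```python
# PE section characteristic flags (from the PE/COFF specification)
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000
RWX = IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE

def is_rwx(characteristics):
    # a section is suspicious only if all three flags are set at once
    return characteristics & RWX == RWX

print(is_rwx(0xE0000020))  # True  - typical read/write/execute section
print(is_rwx(0x60000020))  # False - read/execute only
```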

But wait! – you might say – you are not detecting the actual payload! You are detecting some particular characteristics of the file which are relatively easy to change! And my reply is: correct. But the discussion about the “correct” way of doing things is a philosophical one as long as the presented solution has a low FN/FP rate and is efficient. You might argue about how future-proof it is, but then again, most AV products are black boxes and it wouldn’t be so straightforward to find the particular detection algorithm and then circumvent it.

Another thing I remarked is that the given code doesn’t try to defend against emulators (for example by doing multiple loops, calling different Windows APIs, etc.). While the code is sufficiently complicated to create a problem for IDSs, AV software with emulation capability (and almost all of the “big guys”, and even many of the smaller ones, have it) will go through the decryptor like a hot knife through butter.

So why then doesn’t AV detect these executables? Because they occur in very low numbers, and unfortunately today AV is a numbers game.

Please, the next time you p0wn a client with a metasploit-payload executable, don’t say “AV is worthless”. Rather say: “this demonstrates what undetected malware can do, so you should use multiple layers of defense”.

Picture taken from fazen’s photostream with permission.

Negative zero – what is it?
https://grey-panther.net/2008/12/negative-zero-what-is-it.html – Fri, 19 Dec 2008

Computers have two ways of representing numbers:

  • One is called sign and magnitude – usually one bit specifies the sign (most of the time 0 for positive and 1 for negative) and the rest of the bits specify the absolute value (“magnitude”) of the number.
  • The other is ordering the numbers from lowest to highest (or the other way around) and specifying an index into this ordering – two’s complement is an example of this system, although it also has some nifty properties with regard to arithmetic operations.

In the first case we can have a “+0” and a “-0” value. Now, I’m no mathematician, so I checked the sources of knowledge :-). From the MathWorld article on Zero:

It is the only integer (and, in fact, the only real number) that is neither negative nor positive.

Furthermore, we have the following definition for the sign function:

The sign of a real number, also called sgn or signum, is -1 for a negative number (i.e., one with a minus sign “-“), 0 for the number zero, or +1 for a positive number (i.e., one with a plus sign “+”). In other words, for real x: sgn(x) = -1 if x < 0, 0 if x = 0, and +1 if x > 0.

These lead me to believe that -0 and +0 are just an artifact of how we represent numbers in computers, and that in fact they are one and the same entity. Additional evidence is that IEEE 754 (the standard defining floating-point representations – the most widely used sign-and-magnitude method of representing numbers) says:

5.11 Details of comparison predicates

Comparisons shall ignore the sign of zero (so +0 = −0)
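Both points can be checked from Python: struct exposes the sign-and-magnitude IEEE 754 bit pattern, while the comparison ignores the sign, exactly as the standard requires. A quick sketch:

```python
import math
import struct

# big-endian doubles: the very first bit is the sign bit
bits_pos = struct.pack('>d', 0.0)
bits_neg = struct.pack('>d', -0.0)
print(bits_pos.hex())  # 0000000000000000
print(bits_neg.hex())  # 8000000000000000

# the representations differ in the sign bit only...
assert bits_pos != bits_neg
# ...yet IEEE 754 comparison treats the two values as equal
assert -0.0 == 0.0
# copysign is one way to tell them apart anyway
assert math.copysign(1.0, -0.0) == -1.0
```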

So far, so good, right? Java, however, has a small catch:

Even though -0.0 == 0.0, Double.valueOf(-0.0).compareTo(Double.valueOf(0.0)) is not zero (i.e., the two objects are not equal)! This has wide-ranging implications, one of the biggest being that if you use hashmaps or similar structures with a Double key (given that you can’t use double, because it isn’t an object), the two zeros will show up as distinct entries! This may or may not be what you want! One must mention that this behavior is clearly documented in the Java docs:

0.0d is considered by this method to be greater than -0.0d.

Then again, one must wonder how many people have read this document before running into the problem 🙂

Contrasting with a few other programming languages:

  • From the few tests I’ve done, it seems that .NET implements Double more intuitively (i.e. 0 == Double.Parse("0.0").CompareTo(Double.Parse("-0.0"))). This behavior is also consistent in collections (i.e. they map to the same key in dictionaries), even though, when printed, the two objects display their original signs. There also seems to be a (somewhat) complicated way to determine whether a given zero is negative.
  • PHP (even though it doesn’t have the same boxing / unboxing features) is consistent with how .NET handles the situation: it prints -0 / 0 respectively, but the values compare as equal and are considered the same key in associative arrays.
  • In Perl we have behavior closer to Java’s: the values compare as equal (again, no autoboxing), but in hashes they act as different keys.
  • Python is again closer to .NET (the values compare as equal and are considered the same key in dictionaries).
  • JavaScript also behaves the way .NET does (although there might be differences between the JS engines of different browsers – I only tested in FF3).
  • Ruby and Smalltalk are left as exercises for the reader 🙂 (they should be interesting, since both treat numbers as first-class objects, meaning that – in a way – they are closer to Java or .NET than the other languages mentioned)
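Python’s behavior from the list above is easy to verify: -0.0 and 0.0 hash identically, so they collide to a single dictionary key. A short sketch:

```python
d = {}
d[0.0] = 'positive'
d[-0.0] = 'negative'

# both literals map to the same key; the second insert only
# overwrote the value, the original key object 0.0 was kept
print(len(d))  # 1
print(d)       # {0.0: 'negative'}
```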

There are justifications for both approaches. On one side, it is intuitive that -0 == +0, and breaking this expectation can introduce subtle errors into programs. On the other side, the two objects are different (for example, if you print them, one will display -0.0 and the other 0.0), so from this point of view it is justified that they are not equal. Just make sure you take this into account.

The hard edges of Python
https://grey-panther.net/2008/06/the-hard-edges-of-python.html – Thu, 05 Jun 2008

I’ve been playing around with Python (mostly because pefile is written in it) and got very annoyed with the whole whitespace-as-a-control-structure idea. In theory it all sounds great: you write beautiful code and it just works. However, in practice I find this approach lacking in at least two ways:

  • When moving code around (inside the same file or between files), many times I got into a situation where the indentation looked alright visually, but the interpreter complained. (Then again, I guess it could have been worse: the interpreter could have silently accepted the line(s) but attached them to the wrong block.)
  • When I need to step back (decrease the indentation instead of increasing it), it seems much harder to identify the level of indentation needed. My theory is that while you usually increase the indentation level by one (so it is easy to follow), it is not uncommon to decrease it by several levels at once (which is much harder to follow). In C-like languages I find that the following method works reasonably well: when I start a block, I immediately place the ending marker (bracket) at the corresponding position. Then the bracket, combined with editor support for highlighting bracket pairs, gives a very good indication of my position.

Some other (mild) annoyances:

Python seems to have this idea of interpreting everything as late as possible. Again, this sounds nice, but when I wait 30 minutes just to find out that I forgot to import some module and a function is missing, it makes me ask: where is my Perl strict mode?

This issue also seems related to the dynamism: you can’t use a function until you have defined it (which sounds OK, but here is the kicker) even if it is in the same file! OK, Pascal was great and I loved Delphi, but grow up already. The parser went through the file; it knows that the function exists, now let me use it!

The Python debugger doesn’t have a command to inspect the contents of a class (something like ‘x’ in the Perl debugger). You have p (for print) and pp (for pretty-print), but both of those print out something along the lines of class F at 0xblahblah. So here are some things I found useful:

  • Most of the sites have a very weird attitude: they suppose that you would want to add code to your source to debug it (!??). If I have to insert code into my file, I’ll just do a bunch of print statements and be done with it. Fortunately the documentation mentions (although very briefly) that you can debug a script by running it as follows:
    python -m pdb myscript.py
    

  • You can find the debugger commands in the documentation as well. The one I found very useful was the alias example:
    alias pi for k in %1.__dict__.keys(): print "%1.",k,"=",%1.__dict__[k]
    

    which creates a new command named pi which you can use to inspect the elements of a class instance (see my earlier complaint with regards to p and pp)

  • Although the pi command/alias is very useful, it can screw up the terminal badly if the class contains variables with binary data. In this case you are better off printing only the key names.
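A simpler alternative to the alias is printing obj.__dict__ directly at the pdb prompt – or vars(obj), which is equivalent for plain instances. A sketch with a hypothetical class F:

```python
class F(object):
    def __init__(self):
        self.name = 'example'
        self.count = 42

f = F()
# vars(f) returns the instance's attribute dictionary -
# the same thing the pdb alias above iterates over
print(vars(f))  # {'name': 'example', 'count': 42}
```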

Update: before I forget – the implementation of pack/unpack was also very annoying. Why reinvent the wheel when there are very good implementations already? And why not include an “arbitrarily many” modifier? Why do I have to use a custom function (which must be declared beforehand)?
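For reference, the usual workaround for the missing “arbitrarily many” modifier in struct is to build the format string at runtime – an idiom, not a library feature. A sketch:

```python
import struct

values = (1, 2, 3, 70000)
# '<%dI' % n expands to e.g. '<4I': four little-endian unsigned ints
packed = struct.pack('<%dI' % len(values), *values)

# unpacking needs the count too, recovered here from the byte length
count = len(packed) // struct.calcsize('<I')
unpacked = struct.unpack('<%dI' % count, packed)
print(unpacked)  # (1, 2, 3, 70000)
```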
