Is hand-writing assembly still necessary these days?
https://grey-panther.net/2011/02/is-hand-writing-assembly-still-necessary-these-days.html – Sun, 06 Feb 2011

Some time ago I came across the following article: Fast CRC32 in Assembly. It claimed that the assembly implementation was faster than the one implemented in C. Performance is something I have always been interested in, so I repeated and extended the experiment.

Here are the numbers I got, on a Core 2 Duo T5500 @ 1.66 GHz processor. The numbers express Mbits/sec processed:

  • The assembly version from the blogpost (table taken from here): ~1700
  • Optimized C implementation (taken from the same source), compiled with Microsoft Visual C++ Express 2010: ~1500
  • Unoptimized C implementation (i.e. Debug build): ~900
  • Java implementation using polynomials: ~100 (using JRE 1.6.0_23)
  • Java implementation using table: ~1900
  • Built-in Java implementation: ~1700
  • Javascript implementation (for the fun of it), using the code from here with one optimization – storing the table as numbers rather than strings – on Firefox 4.0 Beta 10: ~80
  • Javascript on Chrome 10.0.648.18: ~40
  • (No IE9 test – they don’t offer it for Windows XP)

Final thoughts:

  • Hand-coding assembly is not necessary in 99.999% of cases (then again, 80% of all statistics are made up :-p). Using better tools or better algorithms (see “Java table based” vs. “Java polynomial”) can give just as good a performance improvement. Maintainability and portability (almost always) trump performance
  • Be pragmatic. Are you sure that your performance is CPU bound? If you are calculating a CRC32 of disk files, a gigabit per second is more than enough
  • Revisit your assumptions periodically (especially if you are dealing with legacy code). The performance characteristics of modern CPUs differ enormously from older ones. I would wager that on an old CPU with little cache the polynomial version would have performed much better, but now that CPU caches are measured in MB rather than KB, the table-based version wins (a minimal sketch of the table-based approach follows this list)
  • Javascript engines are getting better and better.
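
For reference, here is a minimal sketch of the table-based approach in Java. This is not the code from my repo – the class name, the 16 MB random test buffer and the timing harness are just illustrative – but it shows the technique, and it cross-checks the result against the built-in java.util.zip.CRC32:


import java.util.Random;
import java.util.zip.CRC32;

public final class Crc32Table {
    // Precomputed table for the reflected CRC-32 polynomial 0xEDB88320.
    private static final int[] TABLE = new int[256];
    static {
        for (int i = 0; i < 256; i++) {
            int c = i;
            for (int k = 0; k < 8; k++) {
                c = ((c & 1) != 0) ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            TABLE[i] = c;
        }
    }

    // One table lookup and one XOR per input byte.
    public static int crc32(byte[] data) {
        int crc = 0xFFFFFFFF;
        for (byte b : data) {
            crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
        }
        return ~crc;
    }

    public static void main(String[] args) {
        byte[] data = new byte[16 * 1024 * 1024];
        new Random(42).nextBytes(data);
        crc32(data); // warm up the JIT a little before timing

        long start = System.nanoTime();
        int crc = crc32(data);
        double seconds = (System.nanoTime() - start) / 1e9;

        CRC32 builtin = new CRC32(); // sanity check against the JDK implementation
        builtin.update(data, 0, data.length);
        System.out.printf("crc32=%08x (builtin=%08x), ~%.0f Mbit/s%n",
                crc, (int) builtin.getValue(), data.length * 8 / seconds / 1e6);
    }
}

The whole point of the table is that the inner loop does a single lookup and XOR per input byte instead of looping over the 8 bits of the polynomial division, trading a 1 KB lookup table (which fits comfortably in today’s caches) for much less work per byte.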

Some other interesting remarks:

  • The source code can be found in my repo. Unfortunately I can’t include the C version since I managed to delete it by mistake 🙁
  • The file used to benchmark the different implementations was a PDF copy of the Producing Open Source Software book
  • The HTML5 File API implementation is surprisingly inconsistent between Firefox and Chrome, so I needed to add the following line to keep them both happy: var blob = file.slice ? file.slice(start, len) : file;
  • The Javascript code doesn’t work unless it is loaded via the http(s) protocol. Loading it from a local file gives “Error no. 4”, so I used a small python webserver
  • Javascript timing has some issues, but my task took longer than 15ms, so I got stable measurements
  • The original post mentions a variation of the algorithm which can take 16 bits at a time (rather than 8), which could result in a speed improvement (and maybe it can be extended to 32 bits)
  • Beware of the “free” tools from Microsoft! This article would have been published sooner if it weren’t for the fact that MSVC++ 2010 Express requires an online registration, and when I had time, I had no Internet access!
  • Update: If you want to run the experiment with GCC, you might find the following post useful: Intel syntax on GCC

Picture taken from TheGiantVermin’s photostream with permission.

Two new challenges
https://grey-panther.net/2009/10/two-new-challenges.html – Fri, 02 Oct 2009

Well, new for me at least…

The first one is 0x41414141.com. Just go to the site and you can start directly. As far as I know, this is not time-bound.

The second one is spargecoduasta.com (“break this code”). It is put up by BitDefender and I don’t know if it has a time limit. The levels I’ve seen seem to focus on C/C++. It is available in both Romanian and English.

Finally, a little off-topic, but still a challenge: The Science Knowledge Quiz – with the tagline “Are you more science-savvy than the average American?”. Via Pat’s Daily Grind (I got 11 out of 12).

Have fun!

Is Java slower than C? (and does it matter?)
https://grey-panther.net/2009/05/is-java-slower-than-c-and-does-it-matter.html – Tue, 05 May 2009

Via daniel’s blog (the creator of curl) I arrived at this page: why the Java implementation (JGit) doesn’t run nearly as fast as the C implementation. The short version: even after a lot of tuning, JGit is twice as slow as the C implementation. One of the problems that got my attention was the different ways a SHA-1 sum gets sliced and diced. So I did a microbenchmark, and here are my (not very scientific) results:

  • The fastest way to compare two SHA-1 sums in Java (that I found) was to use their string representation. I also tried cramming the hash into Unicode characters (two bytes per character) and into byte arrays. The first was only slightly slower, while the second was much slower (~15x) – a rough sketch of this comparison follows the list
  • Compared to the naive C implementation (using strcmp over the string representation), the Java solution was 100 times (!) slower
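
To give an idea of what was measured, here is a rough sketch of this kind of microbenchmark. It is not the original code – the input, the 10 million iteration count and the toHex helper are made up for illustration – and the usual caveats about JIT warm-up in microbenchmarks apply:


import java.security.MessageDigest;
import java.util.Arrays;

public final class Sha1CompareBench {
    public static void main(String[] args) throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] a = sha1.digest("hello".getBytes("UTF-8"));
        byte[] b = a.clone();              // equal hashes: the worst case for a comparison
        String sa = toHex(a), sb = toHex(b);

        int rounds = 10000000;
        boolean sink = false;              // keep the JIT from optimizing the loops away

        long t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) sink ^= sa.equals(sb);
        long t1 = System.nanoTime();
        for (int i = 0; i < rounds; i++) sink ^= Arrays.equals(a, b);
        long t2 = System.nanoTime();

        System.out.printf("String.equals: %d ms, Arrays.equals: %d ms (%b)%n",
                (t1 - t0) / 1000000, (t2 - t1) / 1000000, sink);
    }

    // Hex representation of the digest, two characters per byte.
    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte x : bytes) sb.append(String.format("%02x", x));
        return sb.toString();
    }
}

Note that with identical hashes every comparison has to look at all the bytes, which is the worst case; in practice differing hashes usually diverge within the first byte or two.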

So what is the conclusion? Yes, Java is slower. This is an extreme case of course (amongst other problems, the test ran for a very short period of time, so the JIT possibly didn’t kick in) and in real life the performance loss is much smaller. In fact the email linked above talks about a 2x performance loss and 2x bigger memory consumption. What it doesn’t talk about, however, is the number of bugs (of the “walk all over your memory and scratch your head” kind) in the C implementation versus the Java implementation. In my opinion:

  • The speed of Java is “good enough”. In fact it is (orders of magnitude) better than many other high-level languages which are widely used (like PHP, Perl, Python, Ruby).
  • Yes, you can implement things in C, but you will do it in 10x the time with 10x the bugs and probably go mad (unless your aim is job security rather than getting work done)
  • There is an incredible amount of work going into improving the performance of the JVM. Check out this episode from the Java Posse (great podcast btw!) if you are interested in the subject
  • Always profile before deciding that you need to optimize a certain part of your code. Humans are notoriously bad at guessing the bottlenecks
  • “Good enough” means “good enough”. Ok, so the Java implementation was 100 times slower. Still, it managed to compare over 10 million (that is 10^7) hashes in one second! I find it hard to believe that the main bottleneck in a source-code versioning system is the comparison of hashes (or the CPU more generally). Even my crappy CVS saturates the disk I/O over a high-latency VPN.
  • Related to the above point: set (realistic) goals and don’t obsess about the fact that you could be “doing better”. For example: the HTML page needs to render in less than 100 ms in 95% of the cases. Could you do it in less than 50 ms? Maybe, but if 100 ms is good enough, it is good enough.
  • Finally, after you profiled, you always have the option of reimplementing problematic parts in C if you think that it’s worth your time

Picture taken from Tahmid Munaz’s photostream with permission.

Compiling software for OpenWrt (and creating packages)
https://grey-panther.net/2009/04/compiling-software-for-openwrt-and-creating-packages.html – Wed, 08 Apr 2009

From my experience, compiling software for OpenWrt is not especially hard, but most of the tutorials out there are somewhat dated (as this one will be in 6-7 months). But at least until then it can be useful, and hopefully I will find the time to update it later on. I’m using the trunk version of OpenWrt, which is a little more up-to-date than 8.09 (the latest release), but most probably everything described here also works with 8.09.

I’ve taken inspiration from the following sources:

The main ideas would be:

  • The easiest way to start is by copying an existing makefile and editing it to fit your needs
  • OpenWrt has an advanced build system which does all the following things:
    • Download the application from its original source (and verify the integrity of the archive using md5sum)
    • Apply some local patches to it
    • Configure / build it

    However, for local development you most likely won’t need this. An alternative solution (which will be used in the tutorial later on) is to copy the source into the build directory in the preparation phase.

  • Makefiles are very sensitive to tabs (so you have to have tabs, and not 4 or 8 spaces, in certain locations) and errors in them are very cryptic (for example "missing separator"). If your build fails, the first thing you should check is that you have your tabs in order. Also verify that your editor doesn’t have some kind of “transform tabs to spaces” option active. For example, if you are using mcedit with the default color-scheme, it will highlight correct tabs in red, as can be seen in the screenshot below.
    [screenshot: mcedit_openwrt_package] Also, you might have observed that not all the sections use tabs; some are fine with spaces. However, the rules for which section should use what are not clear to me, so my recommendation is to stick with tabs everywhere. For a quick make tutorial, you can check out this site. A last word of warning on this matter: copy-pasting from this blogpost will almost certainly mess things up (convert tabs to spaces, etc.), so please double check the source after copying it.

Our goal (taken from the first linked tutorial) is to get the following little C program to compile and run:


/****************
* Helloworld.c
* The most simplistic C program ever written.
* An epileptic monkey on crack could write this code.
*****************/
#include <stdio.h>

int main(void)
{
 printf("Hell! O' world, why won't my code compile?nn");
 return 0;
}

The first step is to create a Makefile for it:


# build helloworld executable when user executes "make"
helloworld: helloworld.o
 $(CC) $(LDFLAGS) helloworld.o -o helloworld
helloworld.o: helloworld.c
 $(CC) $(CFLAGS) -c helloworld.c

# remove object files and executable when user executes "make clean"
clean:
 rm *.o helloworld

Place the files in the packages/helloworld/src directory of either the checked-out OpenWrt source or the OpenWrt SDK. Now change into this directory and make sure that everything builds on your local system (without the cross-compiling magic):


make
./helloworld <-- this should output the message
make clean   <-- clean up after ourselves

Now to create the OpenWrt makefile. This will be located one level up (i.e. packages/helloworld/Makefile):


#
# Copyright (C) 2008 OpenWrt.org
#
# This is free software, licensed under the GNU General Public License v2.
# See /LICENSE for more information.
#
# $Id$

include $(TOPDIR)/rules.mk

PKG_NAME:=helloworld
PKG_RELEASE:=1

include $(INCLUDE_DIR)/package.mk

define Package/helloworld
 SECTION:=utils
 CATEGORY:=Utilities
 TITLE:=Helloworld -- prints a snarky message  
endef

define Build/Prepare
 mkdir -p $(PKG_BUILD_DIR)
 $(CP) ./src/* $(PKG_BUILD_DIR)/
endef

define Build/Configure
endef
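
# $(TARGET_CONFIGURE_OPTS) below is what makes the package build with the
# cross-toolchain instead of the host compiler – see the second note after
# this Makefile.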

define Build/Compile
 $(MAKE) -C $(PKG_BUILD_DIR) $(TARGET_CONFIGURE_OPTS)
endef

define Package/helloworld/install
 $(INSTALL_DIR) $(1)/bin
 $(INSTALL_BIN) $(PKG_BUILD_DIR)/helloworld $(1)/bin/
endef

$(eval $(call BuildPackage,helloworld))

The makefile should be pretty self-explanatory. A couple of things I would like to highlight:

  • Build/Prepare is the step where we copy our source code to the build directory. This is a hack to circumvent the need to download a tgz file, but it is a hack which works well (you might want to add the -r switch to cp if you have nested directories in src – this isn’t the case for this simple example)
  • In the Build/Compile step it is very important to include the $(TARGET_CONFIGURE_OPTS) part. Without it, the thing will build, but it will link with the standard libc, rather than the uClibc available on the OpenWrt router. Tracking down this error is made harder by the unintuitive error messages. Specifically, you will see something like this on your router: “/bin/ash: The command /bin/helloworld can not be found”, even though you can see that the file exists and has execute permissions! To verify that your issues are caused by this problem, simply do a “less /bin/helloworld” on your router and check whether you see strings indicating glibc (instead of uClibc).

Now you are ready to compile:

  • If you are using the SDK, simply go to its root directory and issue the make V=99 command
  • If you are building the complete tree, you have to first do “make menuconfig”, make sure that your package is checked for build (you should see the letter M near it) and then issue make V=99. Be aware that compiling the full tree can take a considerable time (more than an hour in some cases).

Your package should now be ready. Copy the package (you will find it in the bin/packages/target-... subfolder) to your router (or better yet, a Qemu VM running OpenWrt – for safety) and test that everything works:


scp helloworld.ipkg root@router:/root
[root@router]# opkg install helloworld.ipkg
[root@router]# helloworld <-- the message should be printed

This would be all :-). For simplicity, this tutorial doesn’t cover the calling of configuration scripts. Also, as far as I’ve seen, there is no easy way to include parts of other projects. For example, if I wished to create a package for LuaFileSystem, I would need lua.h (and some other related files). However, I haven’t found an easy way to reference it from the lua package, and have opted for putting a local copy in the src/lua subdirectory.

Picture taken from cantrell.david’s buddy icon with permission.

The Problem with Programming
https://grey-panther.net/2006/11/the-problem-with-programming.html – Wed, 29 Nov 2006

Via Raganworld comes the following interview with Bjarne Stroustrup (you know, the C++ guy :)):

The Problem with Programming

My favorite quote:

There are just two kinds of languages: the ones everybody complains about and the ones nobody uses.

(The article also has a nice threaded discussion feature – almost as nice as the one at CodeProject.)
