programming languages – Grey Panthers Savannah https://grey-panther.net Just another WordPress site Thu, 27 Aug 2009 15:04:00 +0000 en-US hourly 1 https://wordpress.org/?v=6.7.1 206299117 Perl is everywhere! https://grey-panther.net/2009/08/perl-is-everywhere.html https://grey-panther.net/2009/08/perl-is-everywhere.html#respond Thu, 27 Aug 2009 15:04:00 +0000 https://grey-panther.net/?p=219

Something which is not appreciated enough IMHO is just how much of the interwebs runs on Perl: for example Frozen Bubble is written in Perl. Also, from some error messages I’ve got the impression Yahoo Pipes uses (is written in?) Perl.

And just before you accuse me of being a Perl fanboy (which I am BTW :-p), here is a fascinating document: Far More Than Everything You’ve Ever Wanted to Know about.

]]>
https://grey-panther.net/2009/08/perl-is-everywhere.html/feed 0 219
Patching lcc-win32 so that it runs under Windows 2000 https://grey-panther.net/2009/07/patching-lcc-win32-so-that-it-runs-under-windows-2000.html https://grey-panther.net/2009/07/patching-lcc-win32-so-that-it-runs-under-windows-2000.html#respond Fri, 31 Jul 2009 10:37:00 +0000 https://grey-panther.net/?p=243 lcc-win32 is a small C (not C++!) compiler for Windows, which comes with a simple editor/IDE. It is free for non-commercial use and is small and quick to install. Unfortunately it wouldn’t start on a fully patched Windows 2000 SP4 box, even though the homepage explicitly mentions Windows 2000 as supported. The problem was that my system, for whatever reason, had an older version of SHELL32.DLL, which didn’t export a required function. So I patched the executable: I redirected the missing import to another import (i.e. the loader resolves a different function, so the executable loads) and NOP-ed out the code which used the original import (fortunately it was used only in a single location, which wasn’t critical). Below you can see a video of the process:

The error message:

wedit.exe - Entry Point Not Found

The procedure entry point SHGetFolderPathAndSubDirW could not be located in the dynamic link library SHELL32.DLL

Tools used:

Here is the script which is shown in the background:

  • Patching lcc-win32 so that it runs under Windows 2000
  • We try to run the editor and we see that it (in fact the windows loader) errors out saying that it can’t find a given export in SHELL32.DLL
  • Bonus tip: you can copy the contents of a message box by pressing Ctrl+C while it has focus.
  • Ok, we open up the executable in IDA to assess the situation (we already generated the idb file to speed up the demo)
  • Using cross-references we see that it is only used in one place, and even that doesn’t seem crucial.
  • So we edit the IAT of wedit.exe so that it imports another function instead of the original one (so that it loads).
  • For safety we NOP out the call code. We must NOP out the pushing of the parameters and the call to keep the stack in sync.
  • Finally we test that everything works.
  • Thank you for your attention!
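
The NOP-ing step from the script above can be sketched in a few lines of Java. This is only an illustration and not the tool used in the video: the class name, the method name and the byte sequence are made up for the example, and in a real executable you would first have to translate the address shown by the disassembler into a file offset. The only hard fact it relies on is that 0x90 is the single-byte x86 NOP.

```java
import java.util.Arrays;

/** Sketch: overwrite a run of machine-code bytes with x86 NOP instructions. */
final class NopPatch {
    /** The single-byte x86 NOP opcode. */
    static final byte NOP = (byte) 0x90;

    /** Returns a copy of image in which length bytes, starting at offset, are replaced by NOPs. */
    static byte[] nopOut(byte[] image, int offset, int length) {
        if (offset < 0 || length < 0 || offset > image.length - length) {
            throw new IllegalArgumentException("patch range is outside of the image");
        }
        byte[] patched = image.clone();   // leave the caller's buffer untouched
        Arrays.fill(patched, offset, offset + length, NOP);
        return patched;
    }

    public static void main(String[] args) {
        // Hypothetical code fragment (not taken from wedit.exe):
        //   push 0 ; push 0 ; call dword ptr [0x00401000] ; ret
        byte[] code = {
            0x6A, 0x00,
            0x6A, 0x00,
            (byte) 0xFF, 0x15, 0x00, 0x10, 0x40, 0x00,
            (byte) 0xC3
        };
        // Remove the two pushes together with the call (10 bytes), so that
        // nothing is left on the stack for a call which no longer happens.
        byte[] patched = nopOut(code, 0, 10);
        for (byte b : patched) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```

To patch a file on disk you could read it with Files.readAllBytes, apply nopOut and write the result to a copy, keeping the original as a backup.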

So you see, things can be fixed, even when you don’t have access to the source code, but it is nicer (and less complicated) when you do. Hopefully this will help somebody out 🙂

]]>
https://grey-panther.net/2009/07/patching-lcc-win32-so-that-it-runs-under-windows-2000.html/feed 0 243
Is Java slower than C? (and does it matter?) https://grey-panther.net/2009/05/is-java-slower-than-c-and-does-it-matter.html https://grey-panther.net/2009/05/is-java-slower-than-c-and-does-it-matter.html#comments Tue, 05 May 2009 14:35:00 +0000 https://grey-panther.net/?p=304 Via daniel’s blog (the creator of curl) I arrived at this page: why the Java implementation (JGit) doesn’t run nearly as fast as the C implementation. The short version of it is: even after many tunings JGit is twice as slow as the C implementation. One of the problems which got my attention was the different ways a SHA-1 sum got sliced and diced. So I’ve done a microbenchmark and here are my (not very scientific) results:

  • The fastest way to compare two SHA-1 sums in Java (that I found) was to use their string representation. I’ve also tried cramming the hash into Unicode characters (two bytes per character) and into byte arrays. The first was only slightly slower, while the second was about an order of magnitude slower (~15x)
  • Compared to the naive C implementation (using strcmp over the string representation) the Java solution was 100 times (!) slower
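
For reference, here is a rough sketch of what such a comparison could look like. It is not the original benchmark code (which isn’t shown here): the class and method names are invented, and the timing loop is deliberately naive, so the numbers it prints are only indicative. A harness like JMH would give more trustworthy results, since the JIT is free to optimize loops like these.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Sketch: three representations of a SHA-1 sum and a naive comparison loop. */
final class Sha1Compare {
    /** Raw 20-byte SHA-1 digest of the UTF-8 bytes of text. */
    static byte[] sha1(String text) {
        try {
            return MessageDigest.getInstance("SHA-1")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    /** Lowercase hex form of the digest (40 characters for SHA-1). */
    static String toHex(byte[] hash) {
        StringBuilder sb = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    /** Packs two bytes into each char (10 chars for SHA-1); assumes an even length. */
    static String toPackedChars(byte[] hash) {
        char[] chars = new char[hash.length / 2];
        for (int i = 0; i < chars.length; i++) {
            chars[i] = (char) (((hash[2 * i] & 0xFF) << 8) | (hash[2 * i + 1] & 0xFF));
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        byte[] a = sha1("some object contents");
        byte[] b = sha1("some object contents");
        String hexA = toHex(a), hexB = toHex(b);
        String packedA = toPackedChars(a), packedB = toPackedChars(b);

        int rounds = 10_000_000;
        int equalCount = 0;   // printed at the end so the loops have a visible result

        long t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) if (hexA.equals(hexB)) equalCount++;
        long t1 = System.nanoTime();
        for (int i = 0; i < rounds; i++) if (packedA.equals(packedB)) equalCount++;
        long t2 = System.nanoTime();
        for (int i = 0; i < rounds; i++) if (Arrays.equals(a, b)) equalCount++;
        long t3 = System.nanoTime();

        System.out.printf("hex string:   %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("packed chars: %d ms%n", (t2 - t1) / 1_000_000);
        System.out.printf("byte array:   %d ms%n", (t3 - t2) / 1_000_000);
        System.out.println("comparisons that matched: " + equalCount);
    }
}
```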

What is the conclusion? Yes, Java is slower. This is an extreme case of course (amongst other problems, the test ran for very short periods of time and possibly the JIT didn’t kick in) and in real life the performance loss is much smaller. In fact the email linked above talks about a 2x performance loss and 2x bigger memory consumption. What it doesn’t talk about however, is the number of bugs (of the “walk all over your memory and you are scratching your head” kind) in the C implementation versus the Java implementation. In my opinion:

  • The speed of Java is “good enough”. In fact it is (orders of magnitude) better than many other high-level languages which are widely used (like PHP, Perl, Python, Ruby).
  • Yes, you can implement things in C, but you will do it in 10x the time with 10x the bugs and probably go mad (unless your aim is job security rather than getting work done)
  • There is an incredible amount of work going into improving the performance of the JVM. Check out this episode from the Java Posse (great podcast btw!) if you are interested in the subject
  • Always profile before deciding that you need to optimize a certain part of your code. Humans are notoriously bad at guessing the bottlenecks
  • “Good enough” means “good enough”. Ok, so the Java implementation was 100 times slower. Still, it managed to compare over 10 million (that is 10^7) hashes in one second! I find it hard to believe that the main bottleneck in a source-code versioning system is the comparing of hashes (or the CPU more generally). Even my crappy CVS saturates the disk I/O over a high latency VPN.
  • Related to the above point: set (realistic) goals and don’t obsess about the fact that you could be “doing better”. For example: it needs to render the HTML page in less than 100 ms in 95% of the cases. Could you do it in less than 50 ms? Maybe, but if 100 ms is good enough, it is good enough.
  • Finally, after you profiled, you always have the option of reimplementing problematic parts in C if you think that it’s worth your time
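
A goal like “less than 100 ms in 95% of the cases” is easy to turn into an automated check, which makes it harder to move the goalposts later. Below is a minimal sketch; the class and method names are hypothetical and the sample latencies are made up, in practice they would come from your own measurements.

```java
/** Sketch: check measured latencies against a "P% of requests under N ms" goal. */
final class LatencyGoal {
    /**
     * Returns true if at least requiredPercent percent of the samples are
     * strictly below budgetMs. Integer arithmetic avoids rounding surprises.
     */
    static boolean meetsGoal(long[] latenciesMs, long budgetMs, int requiredPercent) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples to judge");
        }
        long withinBudget = 0;
        for (long latency : latenciesMs) {
            if (latency < budgetMs) {
                withinBudget++;
            }
        }
        return withinBudget * 100 >= (long) requiredPercent * latenciesMs.length;
    }

    public static void main(String[] args) {
        // Made-up render times in milliseconds: 19 fast requests and one slow one.
        long[] samples = new long[20];
        java.util.Arrays.fill(samples, 40);
        samples[19] = 350;
        System.out.println(meetsGoal(samples, 100, 95));
    }
}
```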

Picture taken from Tahmid Munaz’s photostream with permission.

]]>
https://grey-panther.net/2009/05/is-java-slower-than-c-and-does-it-matter.html/feed 3 304
Alternative regular expression syntax https://grey-panther.net/2009/03/alternative-regular-expression-syntax.html https://grey-panther.net/2009/03/alternative-regular-expression-syntax.html#respond Fri, 27 Mar 2009 12:29:00 +0000 https://grey-panther.net/?p=335 For a long time I was a believer in the “Perl way” of doing regular expressions and an avid reader of perlre. All other implementations I viewed as a “poor man’s copy” of the one true idea.

However, after reading the Lua Patterns Tutorial, I found it quite enlightening. Even though it is called “patterns” and not “regular expressions”, it is a very similar concept. The very nice touch is that it uses % as the escape character rather than \ (like in PCRE). For example, to represent a digit you would say %d instead of \d, a syntax which I suppose is familiar to a larger audience of programmers (everybody who has used the printf / scanf family of functions). An excellent idea!

Check out the complete reference (or the wiki) for more details.

Picture taken from Uqbar is back’s photostream with permission.

]]>
https://grey-panther.net/2009/03/alternative-regular-expression-syntax.html/feed 0 335
Negative zero – what is it? https://grey-panther.net/2008/12/negative-zero-what-is-it.html https://grey-panther.net/2008/12/negative-zero-what-is-it.html#respond Fri, 19 Dec 2008 15:30:00 +0000 https://grey-panther.net/?p=522 Computers have two ways of representing numbers:

  • One is called sign and magnitude – usually you have one bit specifying the sign (again, most of the time you have 0 for positive and 1 for negative) and the rest of the bits specify the absolute value (“magnitude”) of the number.
  • The other is ordering the numbers from the lowest to the highest (or the other way around) and specifying an index in this ordering – two’s complement is an example of this system, although it also has some nifty properties with regards to the arithmetic operations.
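
A consequence of having a dedicated sign bit is that zero gets two bit patterns, while two’s complement integers have only one. The sketch below makes this visible from Java; the class and helper names are my own, but Double.doubleToLongBits and Integer.toBinaryString are standard library calls.

```java
/** Sketch: zero has two bit patterns as a double, but only one as an int. */
final class ZeroBits {
    /** The 64 bits of a double, as 16 hex digits. */
    static String bits(double d) {
        return String.format("%016x", Double.doubleToLongBits(d));
    }

    /** True only for -0.0: compares bit patterns, because -0.0 == 0.0 is true. */
    static boolean isNegativeZero(double d) {
        return Double.doubleToLongBits(d) == Double.doubleToLongBits(-0.0);
    }

    public static void main(String[] args) {
        // Floating point (sign and magnitude): only the top (sign) bit differs.
        System.out.println(bits(0.0));
        System.out.println(bits(-0.0));

        // Two's complement: negating zero gives back the very same bits.
        int zero = 0;
        System.out.println(Integer.toBinaryString(zero));
        System.out.println(Integer.toBinaryString(-zero));

        System.out.println(isNegativeZero(0.0 * -1.0));
    }
}
```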

In the first case we can have a “+0” and a “-0” value. Now I’m no mathematician, so I checked the sources of knowledge :-). From the Mathworld article on Zero:

It is the only integer (and, in fact, the only real number) that is neither negative nor positive.

Furthermore, we have the following definition for the sign function:

The sign of a real number, also called sgn or signum, is -1 for a negative number (i.e., one with a minus sign “-”), 0 for the number zero, or +1 for a positive number (i.e., one with a plus sign “+”). In other words, for real x, sgn(x) = -1 if x < 0, 0 if x = 0, and +1 if x > 0.

These lead me to believe that -0 and +0 are just an artifact of how we represent numbers in computers, and in fact they are one and the same entity. Further evidence is that IEEE 754 (the standard defining floating point representations – the most widely used sign and magnitude method to represent numbers) states:

5.11 Details of comparison predicates

Comparisons shall ignore the sign of zero (so +0 = −0)

So far, so good, right? Java has a small catch however:

Even though -0.0 == 0.0, Double.valueOf(-0.0).compareTo(Double.valueOf(0.0)) is not zero (i.e. the two objects are not equal)! This has wide-ranging implications, one of the biggest being that if you use hashmaps or similar structures with a Double key (given that you can’t use double, because it isn’t an object), -0.0 and 0.0 will show up as distinct entries! This may or may not be what you want! One must mention that this behavior is clearly documented in the Java docs:

0.0d is considered by this method to be greater than -0.0d.

Then again, one must wonder how many people have read this document before running into the problem 🙂
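
Here is a small sketch which demonstrates the catch. The class and method names are invented for the example; normalizeZero is just one possible workaround (my assumption, not something the Java docs prescribe) for when you want both zeros to land on the same key.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: -0.0 and 0.0 are equal as primitives but distinct as Double keys. */
final class NegativeZeroKeys {
    /** Maps -0.0 to 0.0 and leaves every other value (including NaN) as it is. */
    static double normalizeZero(double d) {
        return d == 0.0 ? 0.0 : d;   // -0.0 == 0.0 is true, so both zeros become +0.0
    }

    /** Number of distinct HashMap keys the given values produce. */
    static int countKeys(double[] values, boolean normalize) {
        Map<Double, Boolean> seen = new HashMap<>();
        for (double v : values) {
            seen.put(normalize ? normalizeZero(v) : v, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        double neg = -0.0, pos = 0.0;

        // Primitive comparison follows IEEE 754: the sign of zero is ignored.
        System.out.println(neg == pos);

        // The boxed versions disagree: compareTo is negative, equals is false.
        System.out.println(Double.valueOf(neg).compareTo(Double.valueOf(pos)));
        System.out.println(Double.valueOf(neg).equals(Double.valueOf(pos)));

        // Hence two entries in a map, unless the keys are normalized first.
        double[] zeros = {neg, pos};
        System.out.println(countKeys(zeros, false));
        System.out.println(countKeys(zeros, true));
    }
}
```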

Contrasting with a few other programming languages:

  • From the few tests I’ve done, it seems that .NET implements Double more intuitively (i.e. 0 == Double.Parse("0.0").CompareTo(Double.Parse("-0.0"))). This behavior is also consistent in collections (i.e. they map to the same key in dictionaries), even though, when printed out, the two objects display the original signs. There also seems to be a (somewhat) complicated way to determine whether the given 0 is or is not negative zero.
  • PHP (even though it doesn’t have the same boxing / unboxing features) is consistent with the way .NET handles the situation: it prints -0 / 0 respectively, but they compare as equal and are considered the same key in associative arrays.
  • In Perl, we have a behavior closer to Java: they compare as equal (again, no autoboxing), but in hashes they act as different keys.
  • Python is again closer to .NET (they compare as equal and are considered the same key in associative arrays).
  • Javascript also behaves the way .NET does (although there might be differences between the JS engine implementations of different browsers – I only tested it in FF3).
  • Ruby and Smalltalk are left as exercises to the reader 🙂 (they should be interesting, since they both treat numbers as first class objects, meaning – that in a way – they are closer to Java or .NET than the other languages mentioned)

There are justifications for both approaches. On the one side, it is intuitive that -0 == +0, and breaking this expectation can introduce subtle errors in programs. On the other side, the two objects are different (for example if you print them out, one will display -0.0 and the other 0.0) so (from this point of view) it is justified that they are not equal. Just make sure that you take this into account.

Some further reading:

]]>
https://grey-panther.net/2008/12/negative-zero-what-is-it.html/feed 0 522
What am I reading? https://grey-panther.net/2006/11/what-am-i-reading.html https://grey-panther.net/2006/11/what-am-i-reading.html#respond Fri, 24 Nov 2006 08:01:00 +0000 https://grey-panther.net/?p=1003 I’ve read two and a half 🙂 really interesting posts today (warning, they are pretty long) about computer languages:

And I thought I’d share some of my opinions about languages. I’m a polyglot myself and have some experience with many types of languages.

  • GWBasic / QBasic – this was the first language I’ve learnt. It was a lot of fun even when I was typing programs which I didn’t understand from books and magazines 🙂
  • Logo – this was the second language (if we don’t count BAT files) which I’ve used. I found it interesting but I went back to QBasic rather quickly because it lacked functions for interacting with the outside world.
  • Pascal – This was my next step and for DOS Turbo Pascal (6.0 and later 7.0) was awesome. For Windows not so much. I’ve looked at some source code for Windows and immediately got a headache (this was before I understood event driven programs and the theory behind Windows programs).
  • Visual Basic 3.0 – my first experience with visual IDEs under Windows
  • Delphi – this seemed to combine the best of both: visual IDE from VB and Pascal. My only complaint was the file sizes of the generated executables. Then I discovered KOL, but it wasn’t as easy to use and didn’t have all the third party components.
  • C / C++ – I played around a little with them in Visual Studio and DevCPP, but I never fully mastered them. My suggestion to anyone looking to do this professionally would be: read the Thinking in C and Thinking in C++ series from Bruce Eckel and only touch code after you have understood everything that is there, because these languages are very powerful but also very dangerous, like a sharp knife.
  • PHP – with it I discovered the joy of scripting languages and web programming. However over the years I found that I was writing the same code over and over again. I’m planning on trying out something like CakePHP or Symfony.
  • Haskell / Lisp – We briefly touched upon these languages in our university curriculum and they both seemed interesting, however my impression was that they didn’t have a big enough library behind them to do actual work.
  • Smalltalk – again, I was introduced to it during university and it was really interesting (we were using Squeak), but the image concept seemed too radical.
  • Javascript – this is a language which I used for a long time (because I too wanted to add interesting effects to my webpage), but only recently did I learn the more advanced functionality (like prototypes and anonymous functions) and best practices (like unobtrusive javascript)
  • Java – like many others I got introduced to Java through applets. It is a fine language (garbage collection rocks!) and has a very extensive set of libraries, however I don’t really use it these days because scripting languages are quicker for getting work done and C / Delphi are better suited for low level OS stuff (like directly calling APIs)
  • VB .NET / C# – Very nice languages and it’s good to see that the size of executables is back again to normal, however this comes at the expense of the fact that everybody has to have the framework installed, and not everybody does. And it really doesn’t look good when you say to people: you want to try this? Go download and install a 20+ MB framework!
  • Python – It is a fine language, however it doesn’t have the set of libraries Perl has (with CPAN). It is also a little more verbose than Perl.
  • Perl – This is my current scripting language of choice. It has an extensive set of libraries (see CPAN), it’s relatively cross-platform and has a very compact (but from time to time very cryptic) syntax. As I learn more and more of Perl I get a stronger feeling that PHP is a stripped down Perl and I really don’t want to go back to PHP.
  • Ruby – I didn’t actually use Ruby, however I saw a couple of tutorials about Ruby on Rails and I got really interested. Two things that concern me are efficiency (if Rails always looks up the structure of the database, doesn’t each request take longer?) and security (the blending of the development and deployment environment concerns me). I’m sure that there are perfectly good answers to both of my concerns, I just don’t know them yet :).
]]>
https://grey-panther.net/2006/11/what-am-i-reading.html/feed 0 1003