Uncategorized – Grey Panthers Savannah https://grey-panther.net Just another WordPress site Sat, 07 Sep 2024 13:39:29 +0000 en-US hourly 1 https://wordpress.org/?v=6.7.1 206299117

kurzgesagt – trust but verify https://grey-panther.net/2024/09/kurzgesagt-trust-but-verify.html https://grey-panther.net/2024/09/kurzgesagt-trust-but-verify.html#respond Sat, 07 Sep 2024 13:39:26 +0000 https://grey-panther.net/?p=1340

Back in January Kurzgesagt announced that they’re doing a sale because they’ll be moving warehouses (see the announcement on Archive.org). This led me to ask “is that really happening or is this just a marketing ploy to drive sales”?

So, I did a small experiment by purchasing something from the shop back in January and then again in August, to see if the orders would be shipped from different addresses. And I’m happy to report that, indeed, they shipped from different addresses, making it highly likely that “we’re changing warehouses” is the truth.

  • shipper address in January:
  • shipper address in August:

So, that’s nice. They also seem to be working on a world building game (link, Steam link), which is planned to come out next year (in ’25). Sounds interesting!

An interesting proof for Pythagoras’s theorem https://grey-panther.net/2017/01/an-interesting-proof-for-pythagorass-theorem.html https://grey-panther.net/2017/01/an-interesting-proof-for-pythagorass-theorem.html#respond Thu, 05 Jan 2017 07:06:00 +0000 https://grey-panther.net/?p=1116

I recently saw an interesting proof for Pythagoras’s theorem in the MathHistory series which I wanted to share with y’all 🙂

So a quick reminder, Pythagoras’s theorem says that if we have a right-angle (90 degree) triangle, then there is the following relation between the length of the sides:

a = sqrt(b^2 + c^2) (where a is the length of the longest side) – and vice-versa.

The proof goes like this: let’s rewrite the formula as a^2 = b^2 + c^2. We can interpret this geometrically as: (for a right-angled triangle) the area of the square constructed on the longer side is equal to the sum of the areas of the two squares constructed on the shorter sides.

And now the proof goes as follows:

  • consider a right angled triangle
  • "clone" it 4 times and put it together such that the longer sides form a square. Now the area of the inner square is a^2 while the area of the big square is a^2 + 4*At (At is the area of a triangle)
  • rearrange the triangles as shown. The outer square is still of the same size (the length of its side – b+c – is the same) but now its area can be written as b^2 + c^2 + 4*At. Hence a^2 + 4*At = b^2 + c^2 + 4*At which can be simplified to a^2 = b^2 + c^2, or if you prefer to a = sqrt(b^2 + c^2).
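The bookkeeping of the two arrangements can be sanity-checked numerically. The sketch below is not part of the proof (it uses math.hypot, which already assumes the theorem). It only confirms that both ways of writing the big square's area agree with (b + c)^2 for a sample triangle. The function name check_rearrangement is made up for this illustration.

```python
import math

def check_rearrangement(b, c):
    """Return the big square's area computed in three ways."""
    a = math.hypot(b, c)           # length of the longest side
    triangle_area = b * c / 2      # "At" in the text
    big_square = (b + c) ** 2      # the outer square has side b + c
    first_layout = a ** 2 + 4 * triangle_area            # inner square + 4 triangles
    second_layout = b ** 2 + c ** 2 + 4 * triangle_area  # two squares + 4 triangles
    return big_square, first_layout, second_layout

# All three numbers should agree (up to floating point rounding).
print(check_rearrangement(3, 4))
```

Note that 4*At = 2*b*c, so the second layout is just the expansion (b + c)^2 = b^2 + 2*b*c + c^2.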

I only had one nagging feeling after seeing this proof – how do we know that the first big square constructed is actually a square? Can’t it be that its "edges" are not lines, but slightly crooked like below?

Fortunately we can use the fact that the angles in a triangle add up to 180 degrees (ie. a straight line) and show that the sides of the external square are indeed straight lines:

Finding the N-th word in a complete dictionary https://grey-panther.net/2017/01/finding-the-n-th-word-in-a-complete-dictionary.html https://grey-panther.net/2017/01/finding-the-n-th-word-in-a-complete-dictionary.html#respond Mon, 02 Jan 2017 04:49:00 +0000 https://grey-panther.net/?p=1114

Problem statement

Find the N-th word in a dictionary which contains all the words of length at most M that can be generated from a given alphabet (sorted by the conventional dictionary sorting rule / lexicographical order).

As a short detour: why did I become interested in it? It was during my investigation of the upper limit for the number of strings formed from a given alphabet that can be encoded in a given number of bits. Even more concretely: what is the upper limit for the length of a DNA/RNA string formed from nucleotides (ie. a string with alphabet [A,C,G,T]) that can be encoded on 64 bits. Note: this problem statement implies that we need a codec (ie. both enCOding and DECoding), so we’ll solve a slightly more generic problem than just the one-way one described in the title.

The first solution which came to mind was to use some bits for the length and the remaining bits to encode the nucleotides (2 bit / nucleotide) however the question remained: how many bits for the length? And is the solution optimal?

So finally I came up with the following formulation: consider that we have a dictionary of all the possible nucleotide strings for length at most M. Now let the 64 bit value just be an index in this dictionary. This is guaranteed to be the optimal solution (if we assume that the probability of occurrence for every string is the same). Now we need three things:

  1. what is the largest value of M for which the index can be stored on 64 bits?
  2. a time and space efficient way (ie. not generating the entire dictionary and keeping it in memory for lookup) to get the index of a given string (the enCOde step)
  3. the same to get the word at a given index (the DECode step)

There is also a somewhat related problem on Project Euler (24: Lexicographic permutations) – that wasn’t the inspiration though, I found out about it later.

Some initial observations

Just by writing out the complete set of words of length at most M formed from a given alphabet we can make some observations. For example consider the alphabet [A,B] and write out:

  • the words of length 0: '' (the empty string)
  • the words of length 1: A and B
  • the words of length 2: AA, AB, BA and BB

So pretty quickly we can see that for a given alphabet and a given length we have exactly len(alphabet) ** length possible words (where ** is the exponentiation operator – ie. a ** b is the b-th power of a), since: we have length positions, at each position we can have one of the len(alphabet) characters, thus the total possibilities are len(alphabet) * len(alphabet) * ... length times which is len(alphabet) to power length.

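After this we can ask "how many strings of length less than or equal to M are there"? (question 1 from the initial problem statement). This is simply sum(len(alphabet) ** i for i in [0, M]), also known as a geometric series: (1 - La ** (M + 1)) / (1 - La) where La = len(alphabet).

So for example if we have the alphabet [A, C, G, T] and 64 bits available we can encode at most 31 characters: the dictionary for M=31 has (4 ** 32 - 1) / 3 entries, which fits into 2 ** 64 values, while the words of length exactly 32 alone (4 ** 32 == 2 ** 64 of them) would already use up every available value (this can be checked with Wolfram Alpha).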
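The largest usable M can also be found by brute force. This is a minimal sketch, assuming an alphabet of at least 2 letters; the helper names dictionary_size and largest_max_len are invented for this example.

```python
def dictionary_size(alphabet_len, max_len):
    """Number of words of length 0..max_len over an alphabet of the given size."""
    return sum(alphabet_len ** i for i in range(max_len + 1))

def largest_max_len(alphabet_len, bits):
    """Largest M such that every word of length <= M gets its own index in `bits` bits."""
    limit = 2 ** bits  # number of distinct values representable
    max_len = 0
    while dictionary_size(alphabet_len, max_len + 1) <= limit:
        max_len += 1
    return max_len

print(largest_max_len(4, 64))  # → 31
```

The direct sum agrees with the closed form: dictionary_size(4, 31) equals (4 ** 32 - 1) // 3.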

Finding the index of a string

To find this we just need to count how many strings there are in the dictionary before our string (remember the dictionary is in lexicographical order).

A concrete example: our dictionary contains all the words of length at most 3 (M=3) formed from the alphabet [A, B]. What is the index of the word BA? (we consider that index 0 is '' – the empty string, index 1 is A, index 2 is AA and so on).

What is the position of BA in our dictionary?

If we only had words of length exactly K we could compute this by considering BA a number in base 2 (binary) where A=0 and B=1, transform it to base 10 and have our answer (ie. BA -> 10b -> 2 -> BA is at position 2 – or is the 3rd word – in the dictionary AA, AB, BA, BB).

However our dictionary contains all words of length exactly 0, 1, 2 and 3. So just consider each in turn!

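For each length K, count the words of exactly that length (from the alphabet [A, B]) which come before BA in our dictionary:

  • K=0: 1 word (the empty string)
  • K=1: 2 words (A and B), which is the same as indexOf(B) + 1 among the words of length 1
  • K=2: 2 words (AA and AB), which is the index of BA among the words of length 2
  • K=3: 4 words (AAA, AAB, ABA and ABB), which is the same as indexOf(BAA) among the words of length 3

Adding these up gives 1 + 2 + 2 + 4 = 9, so BA is at index 9 in the complete dictionary.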

So, to find the index of a string:

  • For each length K from 0 to M (the maximum length allowed for words in our dictionary):
  • Generate a word of length K from our word by either (assuming our strings are zero indexed):
    • Taking the characters 0 to K (exclusive) if K < len(word)
    • Padding the word with the first character of the alphabet up to length K otherwise
  • Find the index of this (sub)word in a dictionary that contains words of length exactly K by considering the (sub)word as a value written in base La (La == length(alphabet)). Add 1 if we’re in the first case since the longer word would come after the shorter ones.
  • Sum up all the values

Or in Python 3 code:

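def indexOf(self, word):
    assert len(word) <= self.__max_len
    result = 0
    for i in range(0, self.__max_len + 1):
        if i < len(word):
            # prefix of length i: the prefix itself also comes before our word, hence the + 1
            subword = word[:i]
            result += self.__valueInBaseN(subword) + 1
        else:
            # pad with the first letter: only words smaller than the padded word come before ours
            subword = word + (i - len(word)) * self.__alphabet[0]
            result += self.__valueInBaseN(subword)
    return result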
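The method above is an excerpt from a class and relies on helpers that are not shown here. A self-contained sketch of the same steps could look like the following. The function names and the choice to pass the alphabet as a string are assumptions of this example, not the original implementation. The brute-force comparison at the end is only practical for tiny dictionaries.

```python
from itertools import product

def value_in_base_n(word, alphabet):
    """Interpret the word as a number whose digits are letter positions in the alphabet."""
    value = 0
    for ch in word:
        value = value * len(alphabet) + alphabet.index(ch)
    return value

def index_of(word, alphabet, max_len):
    """Index of `word` among all words of length <= max_len in lexicographical order."""
    assert len(word) <= max_len
    result = 0
    for k in range(max_len + 1):
        if k < len(word):
            # the prefix of length k and every word of length k before it precede `word`
            result += value_in_base_n(word[:k], alphabet) + 1
        else:
            # words of length k smaller than the padded word precede `word`
            padded = word + alphabet[0] * (k - len(word))
            result += value_in_base_n(padded, alphabet)
    return result

# Cross-check against an explicitly generated and sorted dictionary.
alphabet, max_len = 'AB', 3
all_words = sorted(''.join(letters)
                   for length in range(max_len + 1)
                   for letters in product(alphabet, repeat=length))
assert all(index_of(w, alphabet, max_len) == i for i, w in enumerate(all_words))
print(index_of('BA', alphabet, max_len))  # → 9
```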

Finding the N-th string

Finally getting at the problem stated in the title. For this I noted how the dictionary can be constructed for length M:

  • the dictionary for M=0 is just '' (the empty string) and for M=1 the empty string plus the alphabet itself.
  • for M>1 take the dictionary for M-1 and prefix it with each of the characters from the alphabet. Finally add the empty string as the first element.

So for example if we have [A, B] as the alphabet then:

  • the dictionary for M=1 is 0: '', 1: A, 2: B
  • to construct the dictionary for M=2 we replicate the above dictionary 2 times, first prefixing it with A, then with B and finally we add the empty string in front:
0: ''  1: A    4: B
       2: AA   5: BA
       3: AB   6: BB
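The construction rule above translates almost literally into a recursive function. This is only an illustrative sketch (the name build_dictionary is made up) and it materializes the whole dictionary, which is exactly what the final algorithm avoids doing.

```python
def build_dictionary(alphabet, max_len):
    """All words of length <= max_len over `alphabet`, in lexicographical order."""
    if max_len == 0:
        return ['']
    shorter = build_dictionary(alphabet, max_len - 1)
    words = ['']  # the empty string always comes first
    for letter in alphabet:
        # one "column" per letter: the smaller dictionary prefixed with that letter
        words.extend(letter + word for word in shorter)
    return words

for index, word in enumerate(build_dictionary('AB', 2)):
    print(index, repr(word))
```

Each column has exactly len(build_dictionary(alphabet, max_len - 1)) entries, which is what makes it possible to jump to the right column by division.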

This suggests an algorithm for finding the solution:

  • take the value. Decide in which "column" it would be.
    • you know the number of words in each column: (len(dictionary) - 1) / len(alphabet)
    • len(dictionary) is sum(len(alphabet) ** i for i in [0, K]) (see the initial observations)
    • this can also be precomputed for efficiency
  • the column index gives you the index of the letter in the alphabet
  • now subtract from the value the index of the first word in the given column. If you get 0, stop.
  • Otherwise make K one less and look up the new value in the dictionary of length at most K.

A small worked example:

  • let’s say we have [A, B] as the alphabet and M=2. We want to find the word at 5 (which is BA if you take a peek at the table above). So:
  • in each column we have 3 words, so 5 is in the 2nd column (the column with index 1) which gives us "B" as the first letter
  • now subtract 4 (the index of the first word in the 2nd column – B) from 5 which leaves us with 1
  • now find the word with index 1 in a dictionary with M=1 which is "A"
  • thus the final word is "BA"

Or in Python 3 code:

def wordAt(self, index):
    assert 0 <= index <= self.__lastIndex
    result, current_len = '', self.__max_len
    while index > 0:
        # number of words starting with any given letter at this depth
        words_per_column = self.__wordsPerLetterForLen[current_len]
        # index 0 is the empty string, so the columns start at index 1
        column_idx = (index - 1) // words_per_column
        result += self.__alphabet[column_idx]
        index_of_first_word_in_col = 1 + column_idx * words_per_column
        # continue with the position relative to the start of the column
        index -= index_of_first_word_in_col
        current_len -= 1
    return result
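As with indexOf, the method above depends on class state that is not shown (the precomputed words-per-column table). A standalone sketch under the same assumptions as before (alphabet passed as a string, invented function name) might be:

```python
from itertools import product

def word_at(index, alphabet, max_len):
    """Word at `index` among all words of length <= max_len in lexicographical order."""
    base = len(alphabet)
    # sizes[k] = number of words of length at most k; each step adds one level of columns
    sizes = [1]
    for _ in range(max_len):
        sizes.append(sizes[-1] * base + 1)
    assert 0 <= index < sizes[max_len]
    result, current_len = '', max_len
    while index > 0:
        # every column is a copy of the dictionary that is one letter shorter
        words_per_column = sizes[current_len - 1]
        column_idx = (index - 1) // words_per_column
        result += alphabet[column_idx]
        index -= 1 + column_idx * words_per_column
        current_len -= 1
    return result

# Cross-check against an explicitly generated and sorted dictionary.
alphabet, max_len = 'AB', 3
all_words = sorted(''.join(letters)
                   for length in range(max_len + 1)
                   for letters in product(alphabet, repeat=length))
assert [word_at(i, alphabet, max_len) for i in range(len(all_words))] == all_words
print(word_at(5, 'AB', 2))  # → BA
```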

Note: you can find a different algorithm to do the same on math.stackexchange.com, however I found the above to be visually more intuitive.

Can we do it simpler?

So we solved the initial problem (both the one stated in the title and the one which motivated this journey) however it took over a thousand words to describe and justify it. Can we do it simpler? Turns out yes! We just need to abandon our attachment to the lexicographical order and say that as long as we have a bijective pair of encoding and decoding functions with the property decode(encode(word)) == word we are satisfied.

A simple and efficient function is the transformation of the word from base La (length of alphabet) to base 10 and vice-versa. For example if we have [A, C, G, T] as the alphabet and GAT as the word we can do:

  • encode: 2*(4**2) + 0*(4**1) + 3*(4**0) which is 35
  • decode: 35 is written as powers of 4 as above and 2, 0, 3 corresponds to GAT

Again, the ordering will not be lexicographical (A, AA, AB, ...) but rather a numerical-order kind-of (A, B, AA, AB, ...) but the algorithm is much simpler and in the case that La is a power of two, very efficient to implement on current CPUs since division / remainder can be done using bit-shifts / masking.
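A minimal sketch of this idea follows. One detail needs care: with a plain base conversion the first letter acts as the digit 0, so leading copies of it are lost (AAT and T both give 3). To keep the mapping bijective for words of different lengths, the sketch uses digits 1..La instead of 0..La-1 (sometimes called bijective numeration), which yields exactly the (A, B, AA, AB, ...) ordering. This variant and all function names are assumptions of this example and not necessarily what the linked implementation does.

```python
def plain_value(word, alphabet):
    """Plain base conversion: fine for fixed-length words, ambiguous otherwise."""
    value = 0
    for ch in word:
        value = value * len(alphabet) + alphabet.index(ch)
    return value

def encode(word, alphabet):
    """Index of the word in the order '', A, B, ..., AA, AB, ... (digits 1..base)."""
    value = 0
    for ch in word:
        value = value * len(alphabet) + alphabet.index(ch) + 1
    return value

def decode(value, alphabet):
    """Inverse of encode."""
    letters = []
    while value > 0:
        value, digit = divmod(value - 1, len(alphabet))
        letters.append(alphabet[digit])
    return ''.join(reversed(letters))

print(plain_value('GAT', 'ACGT'))                      # → 35
print(plain_value('AAT', 'ACGT'), plain_value('T', 'ACGT'))  # → 3 3
print([decode(i, 'AB') for i in range(7)])
assert decode(encode('GAT', 'ACGT'), 'ACGT') == 'GAT'
```

When the alphabet length is a power of two the multiplications and divisions can still be replaced by shifts, so the efficiency argument carries over.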

More speculation

I didn’t actually want to encode DNA/RNA sequences, but rather mutations/variations which are pairs of sequences (something like G -> TC or GT -> ''). Now I could just divide the 64 bits into two 32 bit chunks but the same initial question would arise: is this the optimal way of encoding?

So we go for the same solution: what if we had a dictionary of all the variants ('' -> A, '' -> AA, ...) and just indexed into it? How would we construct such a dictionary and how would we order it?

Turns out there is an algorithm inspired by the proof that there are the same number of natural numbers as there are rational ones. However that doesn’t give us a way to find the N-th element in the sequence but the Calkin–Wilf sequence does.

So we can have the following algorithm:

  • represent the pair to -> from as two numbers A and B (refer to the discussion until now how we can do that)
  • use the Calkin-Wilf sequence (combined with the continued fraction formula) to find the index of A/B
  • or conversely use the sequence to transform the index into the A/B fraction and then transform the numerator and denominator into the original sequences
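To make the speculation slightly more concrete, here is a sketch of the index <-> fraction mapping. It does not use the continued fraction formula mentioned above. Instead it walks the Calkin–Wilf tree, whose breadth-first order is the Calkin–Wilf sequence: the binary digits of the index, after the leading 1, say whether to go to the left child a/(a+b) or the right child (a+b)/b. The function names are made up. Note the assumption baked in: only pairs of positive numbers with no common divisor appear in the sequence, so the mapping from sequences to the numbers A and B would have to guarantee that.

```python
from math import gcd

def fraction_at(n):
    """n-th element (1-based) of the Calkin-Wilf sequence as a pair (a, b)."""
    assert n >= 1
    a, b = 1, 1
    for bit in bin(n)[3:]:   # binary digits of n after the leading 1
        if bit == '0':
            b = a + b        # left child: a / (a + b)
        else:
            a = a + b        # right child: (a + b) / b
    return a, b

def index_of_fraction(a, b):
    """Inverse of fraction_at; requires positive a and b without a common divisor."""
    assert a >= 1 and b >= 1 and gcd(a, b) == 1
    bits = ''
    while (a, b) != (1, 1):
        if a < b:            # fractions below 1 are left children
            bits = '0' + bits
            b -= a
        else:                # fractions above 1 are right children
            bits = '1' + bits
            a -= b
    return int('1' + bits, 2)

print([fraction_at(n) for n in range(1, 8)])
assert all(index_of_fraction(*fraction_at(n)) == n for n in range(1, 1000))
```

The repeated subtraction is the slow form of Euclid's algorithm, so for large A and B a real implementation would want to process runs of equal bits at once (which is where continued fractions come in).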

This is just speculation but it should work in theory. Also, it is fairly complicated so perhaps there is a better way to do it by making some simplifying assumptions? (like us eliminating the lexicographic ordering requirement).

Source code

A complete implementation of the above algorithms (with tests!) in Python 3 can be found on GitHub.

The limits of science https://grey-panther.net/2016/09/the-limits-of-science.html https://grey-panther.net/2016/09/the-limits-of-science.html#respond Thu, 08 Sep 2016 04:44:00 +0000 https://grey-panther.net/?p=1110

In a lot of ways Science has become the religion of the day. We can’t go more than a day without hearing / seeing / reading a "news" story about "scientists" saying something about something important. We can’t help but feel dazzled, confused, perplexed, overwhelmed by these announcements. And we have discussions with others:

  • I heard that scientists say that X cures cancer.
  • Didn’t you hear? Scientists announced that X causes cancer.

How is this really different from saying "my shaman said"?

The core principles of science

I’m going to argue that scientific thinking is really in a different league – but it also has many limitations. The core ideas of scientific thinking are:

  • We can observe things and those observations have some relation to the objective reality
  • This objective reality is probabilistically deterministic (ie. if we flip a coin it will land either heads or tails but it generally won’t turn into a unicorn and fly away)
  • Based on our observations we can construct models / hypotheses about cause and effect. While there can be an infinite number of hypotheses, we assume that reality has a simple and elegant structure and thus prefer the hypothesis which explains the largest category of events with the fewest assumptions
  • Scientific hypotheses need to be testable (also called "falsifiable") – ie. they need to be forward looking / have predictive power. For example if I say "objects always fall downwards" then you or I can take a ball, a key, a rock, etc and verify that indeed, when we let go of it, it falls towards the ground. However such positive outcomes have little value (they increase the likelihood that the hypothesis / theory is true only by a small amount) and the most valuable things are falsifications – when somebody makes a prediction based on the theory but then a different outcome is observed – which most likely means that the theory is false (unless there was an error in the experiment).

This looks an awful lot like "the ten commandments of the science religion", doesn’t it? Because there is no reason to believe them intrinsically. There are only two things in favor of this mindset:

  • This is how our world seems to work – and thus these principles seem intrinsically true to a lot of us. Then again this "gut feeling" is no different from being convinced that a given kind of deity governs our lives.
  • What is different from every other religion/philosophy however is that it contains the framework to extend it. You just observe, theorize and then try to falsify your theory. Voilà! You’re adding to the scientific knowledge. All other religions are closed while the religion of science is open.

Science is not "truth"

You might have observed that the principles – as described in the previous section – have somewhat of a wishy-washy nature. I keep using words like "likely" and "usually".

For example I said "if we flip a coin it will land either heads or tails but it generally won’t turn into a unicorn and fly away". I can’t say for certain that it won’t turn into a unicorn, but so far nobody reported a case of this happening, but we observed the coin landing heads or tails a lot of times so we’ll assign a very small probability to something else happening.

Scientific results are always probabilistic, but that is just life: if I go out tomorrow I might get hit by a car, but I most probably won’t. Note that other religions generally avoid saying anything about "this life/world" and reserve their proclamations for "the other/after life". At least science is trying to make some predictions, even though they sometimes turn out to be false.

This bears repeating: Science is not "truth". There is no such thing as a "scientific fact". Our scientific knowledge is simply a collection of theories which we failed to falsify – as of yet. Now, some of those theories have been around for a long time and have been found true in many situations and it’s prudent to act as if they were the absolute truth, but we must accept the fact that there is always the small possibility that they might turn out to be false.

To hammer this idea home: all the above is true even if we would "do science perfectly". However we are humans: subject to our biases, feelings and other motivations.

Science is not math

(a couple of words about induction)

Related to "there is no scientific truth" is the mirage of mathematics in science, which goes something like this: "we all know that 2+2 is 4, this scientist is using a mathematical formula, so whatever comes out of the formula must be true".

No!

Mathematics and science use very different ways of reasoning:

In math we use deduction: we state some ground rules ("axioms") and from those we deduce (prove) all the other statements. All the deduced statements are "true" (if we didn’t make any mistakes) since they are derived from the axioms which we assume to be true. This doesn’t mean however that it necessarily has any relation to the objective reality. (And just a side-note: even if we stay within the bounds of such imaginary systems – ie. don’t try to apply it to the "real world" – we will hit some fundamental limitations ^1)

Science however uses induction: if I see that both an apple and a rock are falling down, I guess that both "falling down" events happen because of the same cause. Such "rules of thumb" ^2 work generally but can give the "wrong" results sometimes (as opposed to the rules of deduction which are always correct if we accept the initial axioms). Of course for a scientist these cases of "unexpected outcome" are the most interesting ones since they signal an opportunity to learn something new, but they can be most distressing if we think of science as "the source of truth".

So how does science use mathematics? It is just a precise way to describe hypotheses / theories and to manipulate those. Thus we might say that the maximum distance of a thrown ball is described by the equation v^2 / g and we can make predictions about the distance a ball will travel before throwing it (and verify after the fact that the prediction was – mostly – correct) but the fact that we formulated the theory in mathematical terms does not make it fundamentally more true than any other scientific theory. It still is "the best theory we have for now which isn’t refuted by evidence".
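To illustrate what "making a prediction" looks like here, below is a small sketch of the textbook model behind v^2 / g: a ball thrown from ground level, with no air resistance, travels v^2 * sin(2 * angle) / g, which is largest at 45 degrees. The function names and the value used for g are choices made for this example. A real throw will deviate from the prediction, which is exactly the point of the paragraph above.

```python
import math

G = 9.81  # m/s^2, approximate gravitational acceleration near the Earth's surface

def throw_distance(speed, angle_deg, g=G):
    """Predicted distance (m) for a launch speed (m/s) and angle, ignoring air resistance."""
    return speed ** 2 * math.sin(2 * math.radians(angle_deg)) / g

def max_throw_distance(speed, g=G):
    """The v^2 / g formula from the text: the prediction for a 45 degree throw."""
    return speed ** 2 / g

for angle in (30, 45, 60):
    print(angle, round(throw_distance(10, angle), 2))
print('max', round(max_throw_distance(10), 2))
```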

Other limitations of science

Coming back to the idea that science is not the absolute truth even if we would "do science perfectly":

We don’t do science perfectly:

  • We have biases and when we have a theory we might not look "hard enough" for evidence to refute it
  • The academic structure is not set up to encourage verification of results: publishing replication results or negative results is discouraged (many scientific journals don’t even accept them for publication)
  • Science is entirely probabilistic (it only tells you what probably is true), however statistics (the branch of mathematics which deals with probabilities) is complicated and it’s easy to make mistakes when judging what constitutes probably true (Is Most Published Research Wrong?)
  • In some very interesting fields it is hard to conduct experiments (more on this in the next section)

And as if the above wasn’t enough, there is the problem of communicating the results (the step where "there is a 60% probability that eating a bit of chocolate improves bone density in white women over 50" turns into "Science Fact! Women should eat chocolate daily!").

To give some uplifting news: these issues are known and people are trying to work on them. There have been scientific journals set up to publish replicating studies and/or studies with negative results. There is a movement to encourage researchers to pre-register their studies (to state what data they’ll collect and how they will analyze it) and publish the results even if they are negative. Finally some journals require scientists to give a "simplified abstract" when publishing the research which can be adapted more easily by journalists.

However these are hard issues and we can help out by not having unrealistic expectations.

Words about human-centric fields

A couple of final words about the science in fields like health (mental and physical) or economics: these are the fields which are the most important to us but where the difficulties presented multiply:

  • Generally we can’t do randomized double-blind trials where we select a group of people and infect them with AIDS or make them live on less than $1 a day (you know, because we value human life)
  • This means that we can only study people who already are in this situation but that makes it very likely that we confuse cause and effect
  • Even if the experiment is non-intrusive (or we’re doing an observational study) it is very hard to get a diverse set of participants (eg. most psychology experiments are done on young white males in the US – no wonder that they don’t replicate across the world)

Again, people are working on addressing these issues, but it is just one more reason not to believe the "fad" articles published daily and shared virally on social media.

Science is not perfect, but it’s the best that we have.

A fresh start with Pelican https://grey-panther.net/2016/01/a-fresh-start-with-pelican.html https://grey-panther.net/2016/01/a-fresh-start-with-pelican.html#respond Mon, 04 Jan 2016 04:15:56 +0000 https://grey-panther.net/?p=1107

Here we are in 2016, trying to start blogging again. Using Pelican is more complicated than it needs to be :-(.

Nested fluent builders https://grey-panther.net/2013/06/nested-fluent-builders.html https://grey-panther.net/2013/06/nested-fluent-builders.html#respond Sun, 23 Jun 2013 12:26:00 +0000

Crossposted from the Transylvania JUG website.

Builders have become commonplace in current Java code. They have the effect of transforming the following code:

new Foo(1, 5, "abc", false);

Into something like

Foo.builder()
  .count(1)
  .priority(5)
  .name("abc")
  .canonical(false)
  .build();

This has the advantage of being much easier to understand (as a downside we can mention the fact that – depending on the implementation – it can result in the creation of an additional object). The implementation of such builders is very simple – they are a list of “setters” which return the current object:

public final class FooBuilder {
  private int count = 1;
  // ...

  public FooBuilder count(int count) {
    this.count = count;
    return this; 
  }

  public Foo build() {
    return new Foo(count, //...
  }
}

Of course writing even this code can become repetitive and annoying, in which case we can use Lombok or other code generation tools. Another possible improvement – which makes builders more useful for testing – is to add methods like random as suggested in this Java Advent Calendar article. We can subclass the builder (into FooTestBuilder for example) and only use the “extended” version in testing.

What can we do, however, if our objects are more complex (they have non-primitive fields)? One approach may look like this:

Foo.builder()
  .a(1)
  .b(2)
  .bar(Bar.builder().c(1).build())
  .buzz(Buzz.builder().build())
  .build();

We can make this a little nicer by overloading the bar / buzz methods to accept instances of BarBuilder / BuzzBuilder, in which case we can omit two build calls. Still, I longed for something like the following:

Foo.builder()
  .a(1)
  .b(2)
  .bar()
     .c(1).build()
  .buzz()
     .build()
  .build();

The idea is that the bar / buzz calls start a new “context” where we initialize the Bar/Buzz classes. “build” calls end the innermost context, with the last build returning the initialized Foo object itself. How can this be written in a typesafe / compiler verifiable way?

My solution is the following:

  • Each builder is parameterized to return an arbitrary type T from its build method
  • The actual return value is generated from a Sink of T
  • When using the builder at the top level, we use an IdentitySink which just returns the passed in value.
  • When using the builder in a nested context, we use a Sink which stores the value and returns the builder from “one level up”.

Some example code to clarify the explanation from above can be found below. Note that this code has been written as an example and could be optimized (like using a single instance of the IdentitySink, having FooBuilder itself implement the sink methods, etc).

Implementation of a leaf-level builder:

interface Sink<T> {
  T setBar(Bar bar);
}

final class Bar {
  // ...
  public static BarBuilder<Bar> builder() {
    return new BarBuilder<Bar>(new Sink<Bar>() {
      @Override
      public Bar setBar(Bar bar) { return bar; }
    });
  }
}

class BarBuilder<T> {
  // ...

  protected BarBuilder(Sink<T> sink) {
    this.sink = sink;
  }

  // ...

  public T build() {
    return sink.setBar(new Bar(c, d, fizz));
  }
}

Implementation of the root level builder:

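class FooBuilder {
  // ...
  // named bar() to match the usage example above (.bar().c(1).build())
  public BarBuilder<FooBuilder> bar() {
    return new BarBuilder<FooBuilder>(new Sink<FooBuilder>() {
      @Override
      public FooBuilder setBar(Bar bar) {
        // store the finished Bar and hand control back to the outer builder
        FooBuilder.this.bar = bar;
        return FooBuilder.this;
      }
    });
  }

  // ...
}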

Conclusion: Java has some missing features (like named parameters or the ease of reuse provided by duck-typing). However, we can work around them nicely with some carefully crafted code (and we can put repeating code into code generators to avoid having to write it over and over again). In exchange we get a very versatile and well-performing cross-platform runtime.

Recovering your RoTLD password for domains registered through Gandi.NET https://grey-panther.net/2013/02/recovering-your-rotld-password-for-domains-registered-trough-gandi-net.html https://grey-panther.net/2013/02/recovering-your-rotld-password-for-domains-registered-trough-gandi-net.html#respond Tue, 05 Feb 2013 14:08:00 +0000

If you need to “recover” your RoTLD password when the .RO domain is registered through Gandi.NET (I say “recover” because you didn’t set it in the first place :-)) – do this:

  • In the Gandi interface go to Account Management -> Update account information and set “Anti-spam system” to No
  • Go to the RoTLD password recovery page and reset your password. The password recovery email should now arrive in your inbox.
  • (Optional) go back to the Gandi interface and re-enable the anti-spam system

I needed to do this to enable CloudFlare on IT-Events.RO because RoTLD allows the changing of nameservers only through their website.

Free software for Windows https://grey-panther.net/2013/01/free-software-for-windows.html https://grey-panther.net/2013/01/free-software-for-windows.html#respond Thu, 03 Jan 2013 23:36:00 +0000

Inspired by this post I decided to list the software I use/recommend under Windows.

Free/Libre Open Source Software:

  • LibreOffice (fork of OpenOffice) – a very capable office solution. Most people don’t need anything more than this. If you are installing it for a non-technical user, make sure to set the default file formats to the Microsoft ones (.doc, .xls, …)
  • Far Manager – a very cool two-panel file manager, for those of us who like text-based things (don’t let the looks fool you – it is very modern and very capable). You could also take a look at NDN or DNOSP, but FAR is my personal favorite.
  • VLC – THE video player. It can play 99.999% of the media out there and it won’t pollute your system with all kinds of DLLs.
  • ffdshow-tryouts – DirectShow / VFW codecs for all the formats VLC can play (in fact they are both based on the ffmpeg project). Use this to play back videos in programs which are DirectShow based (like WMP or Winamp).
  • 7-zip – for all your zipping and unzipping needs. It supports a lot of other formats too (mainly for extracting) so no need to install the shareware version of WinRar
  • Firefox – ’nuff said. There are also a lot of plugins which one might find useful like Firebug, NoScript, RequestPolicy, etc. As a download manager I would recommend DownThemAll, but I found that with recent increases in Internet access speeds I don’t need a download manager.
  • Notepad2 for your small (text) editing needs. There is also Notepad++ and notepad2-mod.
  • PDFForge to create PDFs from any program which can print (it acts as a virtual printer which outputs its results to a PDF file). Sidenote: LibreOffice can natively save to PDF, no need for this if you’re using it only for that.
  • Pidgin – you might know it as GAIM, a multi protocol instant messenger. And it is very multi protocol. The only downside is that some advanced protocol features are not always functional (*cough* file-transfer, *cough*), but I find that I rarely use those anyways. If you are installing it for someone else however, make sure to ask them what features they consider essential (like custom background/emoticons) and act accordingly.
  • WinSCP to copy files through SSH/SCP, and the FileZilla client to do the same over FTP.
  • Various specialized programs: GIMP for photo-editing, Inkscape for vector-based graphics, VirtualDub and Avidemux for (linear) video editing, Wireshark for network analysis, Audacity for audio editing, PuTTY as an SSH client, VirtualBox for running virtual machines and Eclipse for development (it can do more than Java!). There is also the IntelliJ Community Edition for development which is Open Source Software, but be aware of the limitations.
  • XAMPP for quickly setting up a LAMP environment under Windows

Free (as in beer) – these may try to trick you into installing toolbars / changing your homepage / your search engine so watch out:

  • TeamViewer – a nice remote control solution (also cross platform – although it doesn’t run perfectly on other OSs). I especially like the fact that it can run as a service and that it takes care of the NAT traversal problem.
  • CDBurnerXP – for burning optical media. Nothing special, but it works.
  • IrfanView – a very capable image viewer / converter. Don’t forget to install the plugins to take full advantage of its features! It is also so lightweight that you won’t believe how quickly the installation finishes. Watch out though, it tries to install sponsored programs 🙁
  • FreeCommander – a two panel graphic file manager. Recommended if you’re a Total/Windows Commander fan rather than a Norton Commander one 🙂
  • The MS Visual Studio Express series – a good way to get your feet wet with MS specific development (also good for university projects), but be aware that you’ll quickly hit a wall with it on professional projects.
  • uTorrent for my downloading needs.
  • Daemon Tools Free for my ISO (CD/DVD/BR) mounting needs. Pay attention during install! It will try to “upgrade” you several times and will also try to install additional software / change your home page / search provider if you just click through “next”. Don’t let it; from a technical point of view it is a great product. There is also the unsupported Virtual CD product from Microsoft, and Windows can mount ISOs natively starting with Windows 8.
  • The FoxIt PDF Reader – a lightweight PDF reader, although Adobe Reader X caught up nicely I feel (and they also auto-update to eliminate security vulnerabilities), so you could give it a second go.
  • BB FlashBack Express for screen recording. A little annoying to install (you need to give them an email address, they try to upgrade you to the paid version and you need to “register” afterwards), but after installing it is all good and I found it to be a very capable product (even at the free version level).
  • Chrome and Opera as alternative browsers (and no, Chrome is not Open Source, Chromium is).
  • Paint.NET for advanced but “not photoshop level” image editing.
  • foobar2000 or Winamp (with the classic skin :-)) for music. foobar is very lightweight and quick, but it might lack some features. Winamp is very complete, but tries to make all kinds of changes to your system. You also probably don’t need the Winamp Agent to run in the background all the time 🙂
  • Dropbox for file synchronization / small-scale file serving. Alternatively there is SkyDrive, but it isn’t very Linux friendly.
  • Windows Live Writer – the best blog publishing software I could find. Unfortunately Microsoft ruined it with the Office 2007 look and now seems to want to abandon it completely
  • Skype for video-call / teleconference, although lately I’ve been dropping it in favor of Google Hangouts

Update: this software looks very promising: Ninite. It purports to auto-install and auto-update a lot of common Windows software. Will take it for a spin the next time I install Windows.

]]>
https://grey-panther.net/2013/01/free-software-for-windows.html/feed 0 19
What every programmer should know about X https://grey-panther.net/2012/12/what-every-programmer-should-know-about-x.html https://grey-panther.net/2012/12/what-every-programmer-should-know-about-x.html#respond Mon, 31 Dec 2012 09:24:00 +0000 Piggybacking on some memes floating around on the internet I would like to publish my list of “what every programmer should know”.

A couple of introductory words: in my opinion the two most important things for new programmers to learn are terminology (knowing what things / ideas / algorithms / concepts are called, so that they can search for them on the internet and discuss their ideas) and humility (if something doesn’t exist or doesn’t work the way we expected, the first thing we should ask ourselves is: “what am I missing?” instead of proclaiming the predecessors to be idiots). Moving along to the list:

Happy holiday reading/watching to all!

]]>
https://grey-panther.net/2012/12/what-every-programmer-should-know-about-x.html/feed 0 20
Writing beautiful code – not just for the aesthetic value https://grey-panther.net/2012/11/writing-beautiful-code-not-just-for-the-aesthetic-value.html https://grey-panther.net/2012/11/writing-beautiful-code-not-just-for-the-aesthetic-value.html#respond Tue, 06 Nov 2012 12:33:00 +0000 https://grey-panther.net/?p=28 This article was originally published in the 6th edition of TodaySoftMag in Romanian and on the Transylvania JUG blog in English. Reprinted here with the permission of the author / magazine.

Most mainstream programming languages contain a large set of features and diverse standard libraries. Because of this it becomes important to know not only “how” you can achieve something (to which there are usually several answers) but also “what is the recommended way”.

In this article I will argue that knowing and following the recommended ways of coding yields code that is not only shorter (easier to write) and easier to read, understand and maintain, but also keeps programmers from introducing a lot of bugs.

This particular article needs a drop of Java language knowledge to savour, but the fundamental idea can be generalized to any programming language: there is more to using a language efficiently than just knowing the syntax.

Example 1: Double Trouble

Let’s start with a snippet of code: what does it print out?

Double d1 = (5.0d - 5.0d) *  1.0d;
Double d2 = (5.0d - 5.0d) * -1.0d;
System.out.println(d1.equals(d2));

What about the following one?

double d1 = (5.0d - 5.0d) *  1.0d;
double d2 = (5.0d - 5.0d) * -1.0d;
System.out.println(d1 == d2);

The answer seems to be clear: in both cases we multiply zero with different values (plus and minus one respectively), thus the result should be zero which should compare as equal regardless of the comparison method used (calling the equals method on objects or using the equality operator on the primitive values).

If we run the code, the result might surprise us: the first one displays false while the second one displays true. What’s going on? On one level we can talk about the technical reasons behind this result: floating point values are represented in Java (and many other programming languages) using the sign-and-magnitude notation defined in the IEEE Standard 754. Because of this technical detail both “plus zero” and “minus zero” can be represented by variables of this type. And the “equals” method on Double (and Float) objects in Java considers these values to be distinct.
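To make the two zeros visible, here is a minimal sketch that prints their bit patterns and runs both comparisons side by side. The class and method names (SignedZeroDemo, primitiveEquals, boxedEquals) are made up for this illustration and are not from the original article.

```java
// Illustrative sketch: the class and method names are hypothetical.
final class SignedZeroDemo {
    /** Compares the way the == operator does on primitive doubles. */
    static boolean primitiveEquals(double a, double b) {
        return a == b;
    }

    /** Compares the way Double.equals does, which looks at the bit patterns. */
    static boolean boxedEquals(double a, double b) {
        return Double.valueOf(a).equals(Double.valueOf(b));
    }

    public static void main(String[] args) {
        double plusZero = (5.0d - 5.0d) * 1.0d;
        double minusZero = (5.0d - 5.0d) * -1.0d;

        // Only the sign bit differs between the two values.
        System.out.println(Long.toHexString(Double.doubleToLongBits(plusZero)));  // 0
        System.out.println(Long.toHexString(Double.doubleToLongBits(minusZero))); // 8000000000000000

        System.out.println(primitiveEquals(plusZero, minusZero)); // true
        System.out.println(boxedEquals(plusZero, minusZero));     // false
    }
}
```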

On another level however we could have avoided this problem entirely by using the primitive values as shown in the second code snippet and as suggested by Item 49 in the Effective Java book[1]: Prefer primitive types to boxed primitives. Using primitive types is also more memory efficient and saves us from having to create special cases for the null value.

Sidenote: we have a similar situation with the BigDecimal class[2] where values scaled differently don’t compare as equal. For example the following snippet also prints false:

BigDecimal d1 = new BigDecimal("1.2");
BigDecimal d2 = new BigDecimal("1.20");
System.out.println(d1.equals(d2));

The answer in this case (given that there is no primitive equivalent for this class) would be to use the compareTo method and assert that it returns zero instead of using the equals method. (Note that this trick does not carry over to the Double/Float case: Double.compareTo also considers 0.0d to be greater than -0.0d, so there the comparison of primitive values remains the way to go.)
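As a rough sketch of the compareTo approach for BigDecimal (the helper name sameNumericValue is invented for this example):

```java
import java.math.BigDecimal;

// Illustrative sketch: the class and helper names are hypothetical.
final class BigDecimalCompareDemo {
    /** True when both values represent the same number, regardless of scale. */
    static boolean sameNumericValue(BigDecimal a, BigDecimal b) {
        return a.compareTo(b) == 0;
    }

    public static void main(String[] args) {
        BigDecimal d1 = new BigDecimal("1.2");
        BigDecimal d2 = new BigDecimal("1.20");

        System.out.println(d1.equals(d2));            // false: the scales differ (1 vs 2)
        System.out.println(sameNumericValue(d1, d2)); // true: same numeric value
    }
}
```

One thing to keep in mind: hash-based collections (HashSet, HashMap) rely on equals, so "1.2" and "1.20" still count as two different keys there.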

Example 2: Where is my null at?

What does the following snippet of code print out?

Double v = null;
Double d = true ? v : 0.0d;
System.out.println(d);

At first glance we would say: null, since the condition is true and v is null (and null can be assigned to a reference of any type, so we are allowed to use it). The actual result is however a NullPointerException at the second line. This is because the type of the conditional expression on the right-hand side is actually double (the primitive type) and not Double (as we would expect), and its result is then silently boxed into a Double. The generated code looks like this:

Double d = Double.valueOf(true ? v.doubleValue() : 0.0d);

This behavior is described in the Java Language Specification[3]:

“If one of the second and third operands is of primitive type T, and the type of the other is the result of applying boxing conversion (§5.1.7) to T, then the type of the conditional expression is T.”

I would venture to guess that not many of us have read the JLS in its entirety, and even if we had read it, we might not have realized the implications of each phrase. The recommendation from EJ2nd mentioned in the previous example saves us again: we should use primitive types. We can also draw a parallel with Item 43: Return empty arrays or collections, not nulls. Had we used a “neutral element”, which is analogous to using empty arrays/collections, the problem would not have appeared. (The neutral element would be 0.0d if we use the value later in summation or 1.0d if we use it in multiplication.)
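A small sketch of the “neutral element” idea, next to the pitfall it avoids. The names orNeutral and mixedTernary are assumptions made for this example, not an established API.

```java
// Illustrative sketch: the class and method names are hypothetical.
final class NullSafeTernaryDemo {
    /** Returns the unboxed value, or the caller's neutral element when value is null. */
    static double orNeutral(Double value, double neutral) {
        return value == null ? neutral : value.doubleValue();
    }

    /**
     * Reproduces the pitfall: one operand is Double and the other double,
     * so the conditional has type double and 'value' is unboxed first.
     */
    static Double mixedTernary(boolean useValue, Double value) {
        return useValue ? value : 0.0d;
    }

    public static void main(String[] args) {
        // 0.0d is neutral for a sum, 1.0d for a product.
        System.out.println(10.0d + orNeutral(null, 0.0d)); // 10.0
        System.out.println(10.0d * orNeutral(null, 1.0d)); // 10.0

        try {
            mixedTernary(true, null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from the hidden unboxing");
        }
    }
}
```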

Example 3: We come up empty

What is the difference between the following two conditions?

Collection<V> items;
if (items.size() == 0) { ... }
if (items.isEmpty()) { ... }

One could argue that they do exactly the same thing as being empty is equivalent to having zero items. Still, the second condition is easier to understand (we can almost read it out loud: “if items is empty then …”). But there is more: in some cases it can be much, much faster. Two examples from the Java standard libraries where the time needed to execute “size” grows linearly with the number of elements in the collection while “isEmpty” returns in constant time: ConcurrentLinkedQueue[4] and the view sets returned by TreeSet’s[5] headSet/tailSet methods. And while the documentation for the first mentions this fact, it doesn’t for the second.

This is yet another example of how nicer code is also faster.
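To illustrate why “size” can be linear while “isEmpty” stays constant, here is a toy linked collection that (by assumption) keeps no element counter and counts how many nodes each call touches. CountingLinkedBag is a hypothetical class written for this example; it is not how the JDK classes mentioned above are actually implemented.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical linked collection without an element counter. */
final class CountingLinkedBag<E> extends AbstractCollection<E> {
    private static final class Node<E> {
        final E value;
        final Node<E> next;
        Node(E value, Node<E> next) { this.value = value; this.next = next; }
    }

    private Node<E> head;
    long nodesVisited; // instrumentation only: nodes touched by size()

    @Override
    public boolean add(E value) {
        head = new Node<>(value, head); // prepend in constant time
        return true;
    }

    @Override
    public int size() {
        int n = 0;
        for (Node<E> cur = head; cur != null; cur = cur.next) {
            nodesVisited++; // has to walk every node
            n++;
        }
        return n;
    }

    @Override
    public boolean isEmpty() {
        return head == null; // a single reference check, no traversal
    }

    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {
            private Node<E> cur = head;
            @Override public boolean hasNext() { return cur != null; }
            @Override public E next() {
                if (cur == null) throw new NoSuchElementException();
                E v = cur.value;
                cur = cur.next;
                return v;
            }
        };
    }

    public static void main(String[] args) {
        CountingLinkedBag<Integer> bag = new CountingLinkedBag<>();
        for (int i = 0; i < 1000; i++) bag.add(i);

        bag.isEmpty();
        System.out.println(bag.nodesVisited); // 0
        bag.size();
        System.out.println(bag.nodesVisited); // 1000
    }
}
```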

Example 4: Careful with that static, Eugene!

What will the following snippet of code print out?

public final class Test {
        private static final class Foo {
                static final Foo INSTANCE = new Foo(); // 2
                static final String NAME = Foo.class.getName(); // 3
                Foo() {
                        System.err.println("Hello, my name is " + NAME);
                }
        }
        public static void main(String[] args) {
                System.err.println("Your name is what?\nYour name is who?\n");
                new Foo(); // 1
        }
}

It will be

Your name is what?
Your name is who?

Hello, my name is null
Hello, my name is Test$Foo

The (probably) unexpected null value appears because the constructor runs while the class Foo is only partially initialized:

  • We start to create an instance of Foo at point 1
  • This being the first reference to Foo, the JVM loads it and starts to initialize it
  • Initializing Foo involves initializing all its static fields
  • The initialization of the first static field contains a call to the constructor at point 2 which is dutifully executed
  • At this point the NAME static field is not yet initialized, so the constructor will print out null

This code demonstrates that static fields can be confusing and that we shouldn’t use them for anything other than constants (and even then we should consider whether the constant would be better declared as an enum). By the same token we should also avoid singletons, which make our code harder to test.
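Since static initializers run in textual order, one possible fix (if the static instance really is needed) is simply to declare the field the constructor depends on first. The sketch below contrasts the two orderings; the class names are invented for this example.

```java
// Illustrative sketch: the class names are hypothetical.
final class StaticOrderDemo {
    static final class Broken {
        static final Broken INSTANCE = new Broken();             // runs first
        static final String NAME = Broken.class.getSimpleName(); // assigned afterwards
        final String seenName;

        Broken() {
            seenName = NAME; // NAME is still null when INSTANCE is created
        }
    }

    static final class Fixed {
        static final String NAME = Fixed.class.getSimpleName();  // assigned first
        static final Fixed INSTANCE = new Fixed();               // now sees NAME
        final String seenName;

        Fixed() {
            seenName = NAME;
        }
    }

    public static void main(String[] args) {
        System.out.println(Broken.INSTANCE.seenName); // null
        System.out.println(Fixed.INSTANCE.seenName);  // Fixed
    }
}
```

Relying on declaration order is fragile though (a later reordering of the fields silently brings the bug back), which is one more reason to keep static state to a minimum.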

We should however favor static member classes over non-static ones (Item 22 in EJ2nd). Static classes in Java are entirely distinct conceptually from static fields and it is unfortunate that the same word was used to describe them both.

We should also run static analysis tools on our code and verify their output frequently (ideally at every commit). For example the bug presented is caught by Findbugs[6] and tools incorporating Findbugs.

Example 5: Remove old cruft

Name four things wrong with the following snippet:

// WRONG! DON’T DO THIS!
Vector v1;
...
if (!v1.contains(s)) { v1.add(s); }

They would be:

  • The wrong container type is used. We clearly want to have each string present at most once which suggests using a Set<> which has the benefits of shorter and faster code (the above method gets linearly slower with the number of elements)
  • Doesn’t use generics
  • It unnecessarily synchronizes access to the structure if it is only used from a single thread
  • If the structure is actually used from multiple threads, the code is not thread safe, only “exception safe” (as in: no exceptions will be raised, but the data structure can be silently corrupted possibly creating a lot of headache downstream)

All of these can be avoided by dropping Vector and its siblings (Hashtable, StringBuffer) and using the Java Collection Framework (available for 14 years[7]) with generics (available for 8 years[8]).
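A sketch of what the replacement could look like, assuming the elements are strings and that insertion order should be kept (hence LinkedHashSet; a plain HashSet works if the order does not matter). The names are made up for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative sketch: the class and method names are hypothetical.
final class UniqueStringsDemo {
    /** Collects each string at most once, keeping first-seen order. */
    static Set<String> unique(Iterable<String> input) {
        Set<String> seen = new LinkedHashSet<>();
        for (String s : input) {
            seen.add(s); // no contains() check needed: add ignores duplicates
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(unique(Arrays.asList("a", "b", "a", "c", "b"))); // [a, b, c]
    }
}
```

If the set is shared between threads, wrapping it with Collections.synchronizedSet makes each individual add call atomic, which removes the check-then-act race of the contains/add pair.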

Conclusion

There are many more examples one could give, but I think the point is well made that knowing a programming language means more than just knowing the syntax at a basic level. I’m urging you, if you are using Java: get yourself a copy of “Effective Java, 2nd edition” and of “Java™ Puzzlers: Traps, Pitfalls, and Corner Cases” and read through them if you haven’t done so already. Also, use static analysis on your code (Sonar[9] is a good choice in this domain) and consider fixing the issues signaled by it, or at least read up on them.

Again, the conclusion is similar for other languages:

  • Try reading up on best practices/idiomatic ways to write code in the given language. For example for Perl the best book currently is “Modern Perl[10]” by chromatic
  • Look to see if there is a good quality static analysis / lint program for your language. For Perl there is Perl::Critic[11], for Python there is pep8[12] and pylint[13], all of which are free and open source

Being a good programmer (or an architect, or a business analyst, etc.) is a process of lifelong learning, and these are the tools which can help us truly learn a programming language.


[1]Joshua Bloch: Effective Java, Second Edition. ISBN: 0321356683

]]>
https://grey-panther.net/2012/11/writing-beautiful-code-not-just-for-the-aesthetic-value.html/feed 0 28