Which leads me to my current setup: use simplelogin.io from Proton with a fallback to Cloudflare Email Routing.
The description of the setup is shorter than the list of advantages, which is probably a good thing.
v=spf1 include:simplelogin.co include:_spf.mx.cloudflare.net -all
That’s it! Here are, again, the relevant DNS records for grey-panther.net:
;; CNAME Records
dkim02._domainkey.grey-panther.net. 1 IN CNAME dkim02._domainkey.simplelogin.co.
dkim03._domainkey.grey-panther.net. 1 IN CNAME dkim03._domainkey.simplelogin.co.
dkim._domainkey.grey-panther.net. 1 IN CNAME dkim._domainkey.simplelogin.co.
;; MX Records
grey-panther.net. 1 IN MX 20 mx2.simplelogin.co.
grey-panther.net. 1 IN MX 10 mx1.simplelogin.co.
grey-panther.net. 1 IN MX 147 amir.mx.cloudflare.net.
grey-panther.net. 1 IN MX 119 linda.mx.cloudflare.net.
grey-panther.net. 1 IN MX 163 isaac.mx.cloudflare.net.
;; TXT Records
_dmarc.grey-panther.net. 1 IN TXT "v=DMARC1;p=reject;rua=mailto:[email protected];ruf=mailto:[email protected];fo=1;"
grey-panther.net. 1 IN TXT "v=spf1 include:simplelogin.co include:_spf.mx.cloudflare.net include:sites.nearlyfreespeech.net -all"
grey-panther.net. 1 IN TXT "sl-verification=xznetmbmfgmkinlnopzlakneigjhzk"
Nothing is perfect, and in the worst case I’m enabling quite a few people to spy on me:
* Yes, unencrypted email can be considered mostly public anyway – still, basic security precautions, like making sure that your email server speaks SSL/TLS for incoming and outgoing emails, are useful.
** So, if I sign up with [email protected] for two different sites, it’s easy to conclude that one person owns both accounts. However, if I use [email protected] for one site and [email protected] for the other, it’s much less clear that the same person is behind them.
Sadly it seems like the maintainer might have passed away sometime last year (or is at least gravely ill). From the page:
Folks … sorry for the delay (again) in getting out an update … just got out of the Hospital … I now have some severe health issues to deal with (complete Kidney failure … need a Kidney transplant) plus another operation … large needles inserted into my spine …however I will try to better maintain the MVPS HOSTS file. Well just got back from Hospital again (excessive water in lungs)
If you could … please consider a donation. Thanks to all that contributed … every little bit helps.
https://winhelp2002.mvps.org/hosts.htm (archive.org link)
So, I donated – may it be of some use to them / their family! And I encourage you to do the same if you benefited from this great file!
As for alternatives, there are several good ones:
For the last decade it has been the case – and continues to be the case, in my opinion – that ad/tracker blocking is the single most effective way to keep devices from being infected with all kinds of malware (and it generally makes web browsing faster too!).
They have a generous amount of free credit, but I wouldn’t recommend them for production use.
Trying to set up CloudFlare Access, it seems that some information is hard to find: for example, the authenticated user’s email is passed to the origin in the Cf-Access-Authenticated-User-Email header. Other useful headers can be Cf-Connecting-Ip or Cf-Ipcountry.
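To make this concrete, here’s a minimal sketch (my own illustration, not from Cloudflare’s docs – the function name and example values are made up) of how an origin application could pick these headers out of a request:

```python
def access_user_info(headers):
    """Extract Cloudflare Access / proxy metadata from a request's headers.

    `headers` is any dict-like mapping of header name -> value; real web
    frameworks usually expose a case-insensitive version of this mapping.
    """
    return {
        # set by Cloudflare Access after it has authenticated the user
        "email": headers.get("Cf-Access-Authenticated-User-Email"),
        # the visitor's real IP address, as seen by Cloudflare
        "ip": headers.get("Cf-Connecting-Ip"),
        # two-letter country code of the visitor
        "country": headers.get("Cf-Ipcountry"),
    }

# hypothetical request headers, for illustration only
info = access_user_info({
    "Cf-Access-Authenticated-User-Email": "user@example.com",
    "Cf-Connecting-Ip": "203.0.113.7",
    "Cf-Ipcountry": "HU",
})
```

Note that these headers are only trustworthy if the origin accepts traffic exclusively from Cloudflare; otherwise anyone can forge them.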
In 2016 I wrote A fresh start with Pelican. And now, 6 years later, I’m writing this. Lots has changed since then and lots has stayed the same. It still fills me with joy to write texts that may be useful to somebody.
So, what’s to like about WordPress? For one, it can do blogs (and websites in general – so I don’t have to keep up with the latest (micro)formats and trust that it handles them reasonably well) and for most usual things (like code highlighting), there are well supported plugins. It’s also F/LOSS software and portable – I must say I quite liked the interview with Matt on FLOSS Weekly.
Another big thing is that it supports comments – something which static websites generally don’t – and the alternatives (like Disqus) don’t respect users’ privacy at the level I would like them to.
So type away your comments! (also, if you’re on the feedburner feed, please switch over to https://grey-panther.net/feed, because who knows how long the former will be around!).
But there are also a couple of things not to like about WordPress – for one, using it, I’m painting a big target on my back (lots of WordPress sites are getting hacked every day). I do believe that I’ve taken reasonable precautions against this (stay tuned for a description on how this is set up!), but it’s a risk.
Also, running dynamic websites is not free (though not astronomically expensive either). My main worry around this is that if I become incapacitated for a longer time, this content will disappear (and one big reason for me starting up writing the blog again is to have a documentation for my family for such cases – so that they can get technical help to access – and maintain if they wish to – all the digital trinkets I’m creating). Also stay tuned about my plans around this problem, but the short version is that I’m planning to mirror the content periodically to several “free” providers and hope that at least one of the mirrors will be around long enough.
Until the next time!
Image credits to rawpixel.com through PxHere.
I recently saw an interesting proof of Pythagoras’s theorem in the MathHistory series which I wanted to share with y’all.
So a quick reminder: Pythagoras’s theorem says that if we have a right-angled (90 degree) triangle, then the following relation holds between the lengths of the sides:
a = sqrt(b^2 + c^2)
(where a is the length of the longest side) – and vice-versa.
The proof goes like this: let’s rewrite the formula as a^2 = b^2 + c^2. We can interpret this geometrically: (for a right-angled triangle) the area of the square constructed on the longest side is equal to the sum of the areas of the two squares constructed on the shorter sides.
And now the proof goes as follows: take four copies of the triangle and arrange them inside a square of side b + c so that their hypotenuses form an inner, tilted square of side a. The area of that inner square is a^2, while the area of the big square is a^2 + 4*At (At is the area of one triangle). Then rearrange the four triangles inside the same big square (its side, b + c, is the same) so that they pair up into two rectangles; the remaining area now consists of a square of side b and a square of side c, so the big square’s area can also be written as b^2 + c^2 + 4*At. Hence a^2 + 4*At = b^2 + c^2 + 4*At, which can be simplified to a^2 = b^2 + c^2, or if you prefer to a = sqrt(b^2 + c^2).

I only had one nagging feeling after seeing this proof: how do we know that the first big square constructed is actually a square? Can’t it be that its "edges" are not lines, but slightly crooked like below?
Fortunately we can use the fact that the angles in a triangle add up to 180 degrees (ie. a straight line) and show that the sides of the external triangle are indeed straight lines:
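As a side note, the area bookkeeping in the proof can also be checked numerically. A small sketch (my own, not from the original post, assuming the usual area formulas for squares and right triangles): for any right triangle with legs b and c, the square of side b + c has the same area whether we count it as a^2 + 4*At or as b^2 + c^2 + 4*At.

```python
import math

def areas_match(b, c):
    a = math.sqrt(b ** 2 + c ** 2)   # hypotenuse of the right triangle
    at = b * c / 2                   # area of one right triangle
    big_square = (b + c) ** 2        # the square that contains everything
    # first arrangement: inner tilted square of side a plus 4 triangles
    first = a ** 2 + 4 * at
    # second arrangement: squares of sides b and c plus the same 4 triangles
    second = b ** 2 + c ** 2 + 4 * at
    return math.isclose(big_square, first) and math.isclose(first, second)
```

Of course a numeric check is not a proof – it just confirms the arithmetic of the two arrangements.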
Find the N-th word in a dictionary which contains all the words of length at most M that can be generated from a given alphabet (sorted by the conventional dictionary sorting rule / lexicographical order).

As a short detour: why did I become interested in this? It was during my investigation of the upper limit for the number of strings formed from a given alphabet that can be encoded in a given number of bits. Even more concretely: what is the upper limit for the length of a DNA/RNA string formed from nucleotides (ie. a string with the alphabet [A,C,G,T]) that can be encoded on 64 bits? Note: the problem statement says that we need a codec (ie. both enCOding and DECoding), so we’ll solve a slightly more generic problem than just the one-way one described in the title.
The first solution which came to mind was to use some bits for the length and the remaining bits to encode the nucleotides (2 bits / nucleotide); however the question remained: how many bits for the length? And is the solution optimal?
So finally I came up with the following formulation: consider that we have a dictionary of all the possible nucleotide strings of length at most M. Now let the 64 bit value just be an index into this dictionary. This is guaranteed to be the optimal solution (if we assume that the probability of occurrence for every string is the same). Now we need three things:

1. to know how many strings of length at most M there are (and, in particular, the largest M for which the index can be stored on 64 bits)
2. a way to compute the index of a given string (encoding)
3. a way to find the string at a given index (decoding)

There is also a somewhat related problem on Project Euler (24: Lexicographic permutations) – that wasn’t the inspiration though, I found out about it later.
Just by writing out the complete set of words of length at most M formed from a given alphabet we can make some observations. For example, consider the alphabet [A,B] and write out the words of:

- length 0: '' (the empty string)
- length 1: A and B
- length 2: AA, AB, BA and BB
So pretty quickly we can see that for a given alphabet and a given length we have exactly len(alphabet) ** length possible words (where ** is the exponentiation operator – ie. a ** b is the b-th power of a), since we have length positions and at each position we can have one of the len(alphabet) characters, thus the total number of possibilities is len(alphabet) * len(alphabet) * ... length times, which is len(alphabet) to the power length.
After this we can ask: "how many strings of length less than or equal to M are there?" (question 1 from the initial problem statement). This is simply sum(len(alphabet) ** i for i in [0, M]), also known as the sum of a geometric progression: (La ** (M + 1) - 1) / (La - 1) where La = len(alphabet).
So for example if we have the alphabet [A, C, G, T] and 64 bits available, we can encode strings of at most 31 characters – 31 being the largest M for which (4 ** (M + 1) - 1) / 3 still fits on 64 bits (easily checked with Wolfram Alpha).
To find this we just need to count how many strings there are in the dictionary before our string (remember the dictionary is in lexicographical order).
A concrete example: our dictionary contains all the words of length at most 3 (M=3) formed from the alphabet [A, B]. What is the index of the word BA? (we consider that index 0 is '' – the empty string, index 1 is A, index 2 is AA and so on).
What is the position of BA in our dictionary? If we only had words of length exactly K, we could compute this by considering BA a number in base 2 (binary) where A=0 and B=1, and transforming it to base 10 to get our answer (ie. BA -> 10b -> 2, so BA is at position 2 – ie. it is the 3rd word – in the dictionary AA, AB, BA, BB).
However our dictionary contains all words of length exactly 0, 1, 2 and 3. So just consider each in turn! In a dictionary containing the words from the alphabet [A, B] of exactly length:

- K=0: BA would have (insertion) index 1
- K=1: BA would have index 2, which is the same as indexOf(B) + 1
- K=2: BA would have index 2
- K=3: BA would have index 4, which is the same as indexOf(BAA) (BAA -> 100b -> 4)

Summing these up (1 + 2 + 2 + 4 = 9) gives the index of BA in the full dictionary.
So, to find the index of a string:

- for each length K from 0 to M (the maximum length allowed for words in our dictionary), obtain a word of length K from our word by either (assuming our strings are zero indexed):
  - taking the characters from 0 to K (exclusive) if K < len(word), or
  - padding the word with the first letter of the alphabet up to length K otherwise
- compute the index of this word of length K by considering the (sub)word as a value written in base La (La == len(alphabet)); add 1 if we’re in the first case, since the longer word comes after the shorter prefix
- sum up all these partial indexes

Or in Python 3 code:
def indexOf(self, word):
    assert len(word) <= self.__max_len
    result = 0
    for i in range(0, self.__max_len + 1):
        if i < len(word):
            # count the length-i prefix itself plus everything before it
            subword = word[:i]
            result += self.__valueInBaseN(subword) + 1
        else:
            # pad with the first (smallest) letter of the alphabet
            subword = word + (i - len(word)) * self.__alphabet[0]
            result += self.__valueInBaseN(subword)
    return result
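Since the snippet above is a method of a larger class (the complete version is in the GitHub repository linked at the end), here is a self-contained sketch of the same algorithm, with the base-La helper spelled out (the standalone names are mine):

```python
def value_in_base_n(word, alphabet):
    # interpret `word` as a number written in base len(alphabet),
    # where the i-th letter of the alphabet is the digit i
    value = 0
    for ch in word:
        value = value * len(alphabet) + alphabet.index(ch)
    return value

def index_of(word, alphabet, max_len):
    # index of `word` in the lexicographically sorted dictionary of all
    # words over `alphabet` of length at most `max_len`
    assert len(word) <= max_len
    result = 0
    for i in range(0, max_len + 1):
        if i < len(word):
            # count the length-i prefix itself plus everything before it
            result += value_in_base_n(word[:i], alphabet) + 1
        else:
            # pad with the first (smallest) letter up to length i
            padded = word + (i - len(word)) * alphabet[0]
            result += value_in_base_n(padded, alphabet)
    return result
```

Running it on the worked example reproduces the partial sums from above (1 + 2 + 2 + 4 = 9 for BA with alphabet [A, B] and M=3).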
Finally, getting at the problem stated in the title. For this I noted how the dictionary can be constructed for a maximum length M:

- for M=0 it is just '' (the empty string), and for M=1 the empty string plus the alphabet itself
- for M>1, take the dictionary for M-1 and prefix it with each of the characters from the alphabet; finally add the empty string as the first element

So for example if we have [A, B] as the alphabet then the dictionary for M=1 is 0: '', 1: A, 2: B, and for M=2 we replicate the above dictionary 2 times, first prefixing it with A, then with B, and finally we add the empty string in front:

0: ''   1: A    4: B
        2: AA   5: BA
        3: AB   6: BB
This suggests an algorithm for finding the word at a given index:

- if the index is 0, the word is '' (the empty string)
- otherwise, determine which letter’s "column" the index falls into; each column contains (len(dictionary) - 1) / len(alphabet) words, where len(dictionary) is sum(len(alphabet) ** i for i in [0, K]) (see the initial observations)
- output that letter, subtract the index of the first word in that column, make K one less and look up the new value in the dictionary of length at most K

A small worked example: take [A, B] as the alphabet and M=2. We want to find the word at index 5 (which is BA if you take a peek at the table above). So:

- 5 falls into the second column (each column contains (7 - 1) / 2 = 3 words), so the first letter is B
- subtract the index of the first word of that column (4, the index of B itself) from 5, which leaves us with 1
- index 1 in the dictionary for M=1 is "A", so the word at index 5 is BA

Or in Python 3 code:
def wordAt(self, index):
    assert 0 <= index <= self.__lastIndex
    result, current_len = '', self.__max_len
    while index > 0:
        # how many words fall into each letter's "column" at this level
        words_per_column = self.__wordsPerLetterForLen[current_len]
        column_idx = (index - 1) // words_per_column
        result += self.__alphabet[column_idx]
        # recurse into the column: skip its first word (the bare letter)
        # and everything in the earlier columns
        index_of_first_word_in_col = 1 + column_idx * words_per_column
        index -= index_of_first_word_in_col
        current_len -= 1
    return result
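Again, as a self-contained sketch (my own; here the self.__wordsPerLetterForLen table is computed explicitly – words_per_letter[l] is just the size of the whole dictionary of length at most l - 1):

```python
def word_at(index, alphabet, max_len):
    # the word at `index` in the dictionary of all words over `alphabet`
    # of length at most `max_len` (inverse of the indexing function)
    la = len(alphabet)
    # words_per_letter[l] = words in one letter's "column" when the
    # remaining maximum length is l, i.e. sum(la ** i for i in [0, l - 1])
    words_per_letter = [sum(la ** i for i in range(l))
                        for l in range(max_len + 1)]
    result, current_len = '', max_len
    while index > 0:
        per_column = words_per_letter[current_len]
        column_idx = (index - 1) // per_column
        result += alphabet[column_idx]
        # skip the column's first word (the bare letter) and earlier columns
        index -= 1 + column_idx * per_column
        current_len -= 1
    return result
```

Walking through index 5 with alphabet [A, B] and max_len 2 reproduces the worked example: first letter B, remainder 1, then letter A.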
Note: you can find a different algorithm to do the same on math.stackexchange.com, however I found the above to be visually more intuitive.
So we solved the initial problem (both the one stated in the title and the one which motivated this journey), however it took over a thousand words to describe and justify it. Can we do it simpler? Turns out yes! We just need to abandon our attachment to the lexicographical order and say that as long as we have a bijective pair of encoding and decoding functions with the property decode(encode(word)) == word, we are satisfied.
A simple and efficient function is the transformation of the word from base La (the length of the alphabet) to base 10 and vice-versa. For example if we have [A, C, G, T] as the alphabet (with A=0, C=1, G=2, T=3) and GAT as the word we can do: GAT -> 2*(4**2) + 0*(4**1) + 3*(4**0), which is 35, and conversely 35 -> GAT.
Again, the ordering will not be lexicographical (A, AA, AB, ...) but rather a numerical-order kind of thing (A, B, AA, AB, ...), but the algorithm is much simpler and, in the case that La is a power of two, very efficient to implement on current CPUs since division / remainder can be done using bit-shifts / masking.
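One caveat with the plain base-La conversion: a leading A (the zero digit) is silently dropped when decoding, so decode(encode(word)) == word only holds if the length is also stored somewhere. A variant that avoids this – and produces exactly the A, B, AA, AB, ... ordering mentioned above – is bijective base-La, where the digits run from 1 to La. A sketch (my own; the implementation in the GitHub repository may differ):

```python
def encode(word, alphabet):
    # bijective base-La: letter i is digit i + 1, so there is no "zero"
    # digit and leading letters are never lost when decoding
    value = 0
    for ch in word:
        value = value * len(alphabet) + alphabet.index(ch) + 1
    return value

def decode(value, alphabet):
    # invert the bijective representation digit by digit
    chars = []
    while value > 0:
        value, r = divmod(value - 1, len(alphabet))
        chars.append(alphabet[r])
    return ''.join(reversed(chars))
```

Note that with these digit values the number for a given word differs from the plain base-4 worked example above; the trade-off is that the round-trip works without storing the length.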
I didn’t actually want to encode DNA/RNA sequences, but rather mutations/variations, which are pairs of sequences (something like G -> TC or GT -> ''). Now I could just divide the 64 bits into two 32 bit chunks, but the same initial question would arise: is this the most optimal way of encoding?

So we use the same solution: what if we had a dictionary of all the variants ('' -> A, '' -> AA, ...) and just indexed into it? How would we construct such a dictionary and how would we order it?
Turns out there is an algorithm inspired by the proof that there are the same number of natural numbers as there are rational ones. However, that proof doesn’t give us a way to find the N-th element in the sequence – but the Calkin–Wilf sequence does.
So we can have the following algorithm:

- encode the two sides of the to -> from pair as two numbers A and B (refer to the discussion until now on how we can do that)
- find the position of the fraction A/B in the Calkin–Wilf sequence – that position is our encoded value
- to decode, find the A/B fraction at the given position in the sequence and then transform the numerator and denominator back into the original sequences

This is just speculation but it should work in theory. Also, it is fairly complicated, so perhaps there is a better way to do it by making some simplifying assumptions? (like eliminating the lexicographic ordering requirement)
A complete implementation of the above algorithms (with tests!) in Python 3 can be found on GitHub.
In a lot of ways Science has become the religion of the day. We can’t go more than a day without hearing / seeing / reading a "news" story about "scientists" saying something about something important. We can’t help but feel dazzled, confused, perplexed, overwhelmed by these announcements. And we have discussions with others:
I’m going to argue that scientific thinking is really in a different league – but it also has many limitations. The core ideas of scientific thinking are:
This looks an awful lot like "the ten commandments of the science religion", doesn’t it? Because there is no reason to believe them intrinsically. There are only two things in favor of this mindset:
This is how our world seems to work – and thus these principles seem intrinsically true to a lot of us. Then again this "gut feeling" is no different from being convinced that a given kind of deity governs our lives
What is different from every other religion/philosophy however is that it contains the framework to extend it. You just observe, theorize and then try to falsify your theory. Voilà! You’re adding to the scientific knowledge. All other religions are closed, while the religion of science is open.
You might have observed that the principles – as described in the previous section – have somewhat of a wishy-washy nature. I keep using words like "likely" and "usually".
For example I said "if we flip a coin it will land either heads or tails but it generally won’t turn into a unicorn and fly away". I can’t say for certain that it won’t turn into a unicorn, but so far nobody reported a case of this happening, but we observed the coin landing heads or tails a lot of times so we’ll assign a very small probability to something else happening.
Scientific results are always probabilistic, but that is just life: if I go out tomorrow I might get hit by a car, but I most probably won’t. Note that other religions generally avoid saying anything about "this life/world" and reserve their proclamations for "the other/after life". At least science is trying to make some predictions, even though they sometimes turn out to be false.
This bears repeating: Science is not "truth". There is no such thing as a "scientific fact". Our scientific knowledge is simply a collection of theories which we have failed to falsify – as of yet. Now, some of those theories have been around for a long time and have been found true in many situations, and it’s prudent to act as if they were the absolute truth, but we must accept that there is always the small possibility that they might turn out to be false.
To hammer on this idea some more: all the above is true even if we "did science perfectly". However we are humans: subject to our biases, feelings and other motivations.
(a couple of words about induction)
Related to "there is no scientific truth" is the mirage of mathematics in science, which goes something like this: "we all know that 2+2 is 4, this scientist is using a mathematical formula, so whatever comes out of the formula must be true".
No!
Mathematics and science use very different ways of reasoning:
In math we use deduction: we state some ground rules ("axioms") and from those we deduce (prove) all the other statements. All the deduced statements are "true" (if we didn’t make any mistakes) since they are derived from the axioms, which we assume to be true. This doesn’t mean however that any of it necessarily has a relation to objective reality. (And just a side-note: even if we stay within the bounds of such imaginary systems – ie. don’t try to apply them to the "real world" – we will hit some fundamental limitations ^1)
Science however uses induction: if I see that both an apple and a rock are falling down, I guess that both "falling down" events happen because of the same cause. Such "rules of thumb" ^2 work generally but can give the "wrong" results sometimes (as opposed to the rules of deduction, which are always correct if we accept the initial axioms). Of course for a scientist these cases of "unexpected outcome" are the most interesting ones since they signal an opportunity to learn something new, but they can be most distressing if we think of science as "the source of truth".
So how does science use mathematics? It is just a precise way to describe hypotheses / theories and to manipulate them. Thus we might say that the maximum distance of a thrown ball is described by the equation v^2 / g, and we can make predictions about the distance a ball will travel before throwing it (and verify after the fact that the prediction was – mostly – correct), but the fact that we formulated the theory in mathematical terms does not make it fundamentally more true than any other scientific theory. It still is "the best theory we have for now which isn’t refuted by evidence".
Coming back to the idea "science is not the absolute truth even if we would "do science perfectly":
We don’t do science perfectly:
And as if the above wasn’t enough, there is the problem of communicating the results (the step where "there is a 60% probability that eating a bit of chocolate improves bone density in white women over 50" turns into "Science Fact! Women should eat chocolate daily!").
To give some uplifting news: these issues are known and people are trying to work on it. There have been scientific journals set up to publish replicating studies and/or studies with negative results. There is a movement to encourage researchers to pre-register their studies (to state what data they’ll collect and how they will analyze it) and publish the results even if they are negative. Finally some journals require scientists to give a "simplified abstract" when publishing the research which can be adapted by journalists easier.
However these are hard issues and we can help out by not having unrealistic expectations.
A couple of final words about the science in fields like health (mental and physical) or economics: these are the fields which are the most important to us but where the difficulties presented multiply:
Again, people are working on addressing these issues, but it is just one more reason not to believe the "fad" articles published daily and shared virally on social media.
Science is not perfect, but it’s the best that we have.
Disclaimer: I don’t mean to be picking on the particular organizations / projects / people who I’ll mention below. They are just examples of a larger trend I observed.
Sometimes (most of the times?) we forget just how powerful the machines in our pockets / bags / desks are and accept the inefficiencies of the software running on them. When we start to celebrate those inefficiencies, a line has to be drawn though. Two examples:
In 2013 Twitter claimed a record Tweets Per Second (TPS – cute :-)) of ~143k. Let’s round that up to 150k and do some back-of-the-envelope calculations:
So why do they have 20 to 40 times that many servers? This means that less than 10% (!) of their server capacity is actually used for business functions.
Second example: Google together with DataStax came out with a blogpost about benchmarking a 300 node Cassandra cluster on Google Compute Engine. They claim a peak of 1.2M messages per second. Again, let’s do some calculations:
This means that per server we use 7.3 MB/s of network traffic and 6 MB/s of disk traffic, or 6% of a Gigabit connection and about 50% of the throughput of a medium quality spinning rust HDD.
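Spelling out the arithmetic behind those percentages (my own rephrasing of the numbers above; 1 Gbit/s is taken as 125 MB/s):

```python
messages_per_sec = 1_200_000   # claimed cluster-wide peak
nodes = 300

# per-node message rate: 1.2M / 300 = 4000 messages/s
msgs_per_node = messages_per_sec / nodes

network_mb_per_node = 7.3      # MB/s per server, from the figures above
gigabit_mb = 125               # 1 Gbit/s expressed in MB/s
# fraction of a Gigabit link actually used: roughly 6%
network_utilization = network_mb_per_node / gigabit_mb
```

This is exactly the kind of two-line division that exposes how little of the hardware such "record" benchmarks actually exercise.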
My challenge to you is: next time you see such benchmarks do a quick back-of-the envelope calculation and if it uses less than 60% of the available throughput, call the people on it!