Grey Panthers Savannah https://grey-panther.net Just another WordPress site Sun, 07 Apr 2024 12:43:09 +0000 en-US hourly 1 https://wordpress.org/?v=6.5.2 206299117 Introducing Quantum Theory: A Graphic Guide to Science’s Most Puzzling Discovery https://grey-panther.net/2024/04/introducing-quantum-theory-a-graphic-guide-to-sciences-most-puzzling-discovery.html https://grey-panther.net/2024/04/introducing-quantum-theory-a-graphic-guide-to-sciences-most-puzzling-discovery.html#respond Sun, 07 Apr 2024 12:43:06 +0000 https://grey-panther.net/?p=1309

I found the “graphic guide” book concept intriguing, yet after reading several of them I felt that they are more like directories (pointers for further reading) than something I could use as an introduction to or overview of a subject I know little about.

This impression was reinforced by Introducing Quantum Theory: A Graphic Guide to Science’s Most Puzzling Discovery. I found this one rather enjoyable, and I think it gives quantum theory a bit more of a “human” face – but then, I already had a basic grounding in physics.

Email setup https://grey-panther.net/2023/07/email-setup.html https://grey-panther.net/2023/07/email-setup.html#respond Sun, 16 Jul 2023 18:33:07 +0000 https://grey-panther.net/?p=1262 I have a couple of goals for my email setup:

  • It should be reliable
  • It should help protect my privacy by:
    • not unnecessarily exposing the contents of my discussions*
    • allowing aliases to prevent easy cross-correlation between different sites**
  • Managing aliases should be easy
    • It should be easy to set up new aliases (possibly with a “catch-all” address, where all emails for the domain go)
    • Replying from an alias should be easy (or at least possible)

Which leads me to my current setup: use simplelogin.io from Proton with a fallback to Cloudflare Email Routing.

Advantages of this setup

  • Both Proton and Cloudflare are trusted companies (though this can be subjective, but I certainly rank them higher than the FAANGs)
  • The simplelogin software stack is open source, which means that it can be independently audited and, in theory, I could run it on my own if that ever makes sense
  • Both providers promise to only forward, never store your email
  • Simplelogin also provides some generic domains, which means that I can hide even more “in the crowd” by using those generic domains when creating low-value accounts
  • Replying through a simplelogin account is simple (you “just reply” to the email), though it has some funkiness to it (simplelogin rewrites the email address to “man in the middle” the communication to achieve this – then again, it also includes the original email address in a custom email header)
  • Simplelogin has some advanced features (like “send email from this address to multiple recipients”) that can be useful for families, for example (where both parents want to get the communication from the school)
  • Simplelogin also has Bitwarden integration

Details of the setup

The description of the setup is probably shorter than the list of advantages, which is probably a good thing 🙂

  • Get a domain and “link it” to Cloudflare (aka. point the nameservers to the Cloudflare ones)
    • I’m assuming here that you already have a Cloudflare account
    • I’m also assuming here that you want to have a custom domain. If not, and you just want to use the domains provided by Simplelogin, just create an account with them and you’re done
    • Since I would like to separate my (little bit public) persona from my private persona (ie. why should Amazon know that the person ordering a book from them also runs a blog?), I also have a secondary, more private domain set up this way, in addition to grey-panther.net.
  • Enable Cloudflare “Email Routing” for your domain
  • Enable “Catch-all” for Cloudflare Email Routing and configure it to send to the preferred email address
    • Remember that this is just a fallback / backup solution, normally emails wouldn’t be routed here
  • Enable DMARC in Cloudflare to get some reports about bouncing emails. Alternatively you can use a third-party DMARC service like easydmarc.com to get periodic reports about potential email problems
  • Now go to your Simplelogin account and start setting up the domain
  • To set the MX records for the domain, you’ll need to go to Email > Email Routing > Settings in Cloudflare and click on “Start disabling”
    • Click “Unlock and keep DNS records”! This will allow us to use the Cloudflare email servers as backups later
  • Now continue with the Simplelogin DNS setup
    • Since the Simplelogin MX servers are added with priority “10” and “20” respectively (lower values are tried first), senders will generally prefer them over the Cloudflare servers, which have much higher values, and only fall back to Cloudflare if the simplelogin servers are not available
    • After you finish the setup of the domain in Simplelogin, you probably want to go to said domain > settings in Simplelogin and enable “Auto create/on the fly alias” (Catch-all)
  • Now we want to do a bit more tweaking to the DNS entries in Cloudflare:
    • We should update the SPF record to: v=spf1 include:simplelogin.co include:_spf.mx.cloudflare.net -all
    • (this allows Cloudflare to also forward emails when it acts as a fallback email server. Also, this says that emails for the domain not coming from the enumerated set of servers should be dropped. If you want to be less strict, you can use “~all” instead of “-all”. You can use tools like the SPF Record analyzer to double check that the SPF record is well formed)
    • Update the _dmarc record if you want to use EasyDMARC.com as instructed by the site. You probably want to set “p=reject” here.
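
To sanity-check the SPF record before publishing it, a few lines of Python are enough to split it into its parts. This is only a minimal sketch: `parse_spf` is a hypothetical helper (not part of Simplelogin or Cloudflare), and it only looks at `include:` mechanisms and the qualifier on `all`; it does no DNS lookups.

```python
def parse_spf(record):
    """Split an SPF TXT record into its include targets and the 'all' qualifier.

    Hypothetical helper for illustration; real SPF evaluation also has to
    resolve the includes over DNS, which is not attempted here.
    """
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        raise ValueError("not an SPF record: it must start with v=spf1")

    includes = []
    all_qualifier = None
    for term in terms[1:]:
        if term.startswith("include:"):
            includes.append(term.split(":", 1)[1])
        elif term.lstrip("+-~?") == "all":
            # "-" is a hard fail, "~" a soft fail; no qualifier means "+" (pass)
            all_qualifier = term[0] if term[0] in "+-~?" else "+"
    return {"includes": includes, "all": all_qualifier}


spf = "v=spf1 include:simplelogin.co include:_spf.mx.cloudflare.net -all"
print(parse_spf(spf))
```

If the printed `all` qualifier is `-`, receivers are told to reject mail from servers outside the listed includes; with `~` they are only told to treat it as suspicious.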

That’s it! Here are the relevant DNS records for grey-panther.net:

;; CNAME Records
dkim02._domainkey.grey-panther.net. 1 IN CNAME dkim02._domainkey.simplelogin.co.
dkim03._domainkey.grey-panther.net. 1 IN CNAME dkim03._domainkey.simplelogin.co.
dkim._domainkey.grey-panther.net. 1 IN CNAME dkim._domainkey.simplelogin.co.

;; MX Records
grey-panther.net. 1 IN MX 20 mx2.simplelogin.co.
grey-panther.net. 1 IN MX 10 mx1.simplelogin.co.
grey-panther.net. 1 IN MX 147 amir.mx.cloudflare.net.
grey-panther.net. 1 IN MX 119 linda.mx.cloudflare.net.
grey-panther.net. 1 IN MX 163 isaac.mx.cloudflare.net.

;; TXT Records
_dmarc.grey-panther.net. 1 IN TXT "v=DMARC1;p=reject;rua=mailto:[email protected];ruf=mailto:[email protected];fo=1;"
grey-panther.net. 1 IN TXT "v=spf1 include:simplelogin.co include:_spf.mx.cloudflare.net include:sites.nearlyfreespeech.net -all"
grey-panther.net. 1 IN TXT "sl-verification=xznetmbmfgmkinlnopzlakneigjhzk"
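
To double check which server a sender would try first, the MX records above can be sorted by their preference value (lowest first). The sketch below just parses the zone-file lines as text; `delivery_order` is a made-up helper name and no DNS query is performed.

```python
MX_RECORDS = """\
grey-panther.net. 1 IN MX 20 mx2.simplelogin.co.
grey-panther.net. 1 IN MX 10 mx1.simplelogin.co.
grey-panther.net. 1 IN MX 147 amir.mx.cloudflare.net.
grey-panther.net. 1 IN MX 119 linda.mx.cloudflare.net.
grey-panther.net. 1 IN MX 163 isaac.mx.cloudflare.net.
"""


def delivery_order(zone_text):
    """Return the MX hosts in the order a sender should try them."""
    records = []
    for line in zone_text.splitlines():
        fields = line.split()
        # Expected layout: name, TTL, class, type, preference, host
        if len(fields) == 6 and fields[3] == "MX":
            records.append((int(fields[4]), fields[5]))
    # Lower preference values are tried first
    return [host for _, host in sorted(records)]


for host in delivery_order(MX_RECORDS):
    print(host)
```

The two simplelogin hosts come out first and the Cloudflare hosts last, which is the "fallback" behaviour described earlier.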

Who can spy on me? (aka. threat model)

Nothing is perfect, and in the worst case I’m enabling quite a few people to spy on me:

  • Both Proton and Cloudflare can decide to log my emails
    • Although Cloudflare is only a “low priority backup server” in this setup, if we assume that they are acting maliciously (or somebody took control of my Cloudflare account), they can remove the Simplelogin MX records and force email to be forwarded to whatever system they control.
  • If the hardware that runs Proton / Cloudflare services is compromised, I have the same problem
    • Although, hopefully, I’m too small of a fish for somebody who pulls that off to target me specifically (this goes back to “hiding in the crowd”)
  • My domain registrar (or somebody who gets access to my account there) can decide to repoint my domain to different nameservers that serve different MX records
    • Not too much to do – just have complex passwords, 2FA and hope that the security of the registry / registrar is good enough
  • The final destination of the emails
    • I host the final address everything is forwarded to in the cloud, so that means that the specific cloud provider also has access to everything. I could use a different solution, but for now the syncing between devices is just too convenient…

Alternatives considered

  • Self hosting email infrastructure
    • This would have given me the ultimate flexibility, but it would have also tasked me with monitoring and updating the service
  • Using a “catch all” email address with Google Workspace / Google Apps / whatever it’s called this week
    • It’s not all that difficult to set up
    • However, it requires a separate Workspace account that doesn’t work well with many other Google products
  • Migadu
    • Run out of Switzerland, just like Simplelogin/Proton
    • Can pay for it, just like Proton, to hopefully ensure that they’re around longer
    • However less well known, so I don’t feel like I have a good insight into “how they tick”
    • They’re more a “let’s make email hosting simple” kind of company, rather than focusing on privacy, which means they don’t provide additional “generally used” domains (which could be used to better hide in the crowd)

* Yes, unencrypted email can be considered mostly public anyway – still, basic security precautions like making sure that your email server speaks SSL/TLS for incoming and outgoing emails are useful.

** So, if I sign up with [email protected] for two different sites, it’s easy to conclude that it’s one person who owns both accounts. However, if I use [email protected] for one site and [email protected] for the other, it’s much less clear that the same person is behind them.

Remembering the OG ad/malware blocking hosts file https://grey-panther.net/2022/09/remembering-the-og-ad-malware-blocking-hosts-file.html https://grey-panther.net/2022/09/remembering-the-og-ad-malware-blocking-hosts-file.html#respond Tue, 06 Sep 2022 12:53:44 +0000 https://grey-panther.net/?p=1201 For the longest time the first thing I installed on new computers / computers I was asked to “help with” was the MVPS hosts file (archive.org link). I credit this file with keeping many, many computers safe and running the way their owners intended for almost two decades now.

Sadly it seems like the maintainer might have passed sometime last year (or is at least gravely ill). From the page:

Folks … sorry for the delay (again) in getting out an update … just got out of the Hospital … I now have some severe health issues to deal with (complete Kidney failure … need a Kidney transplant) plus another operation … large needles inserted into my spine …however I will try to better maintain the MVPS HOSTS file. Well just got back from Hospital again (excessive water in lungs)

If you could … please consider a donation. Thanks to all that contributed … every little bit helps.

https://winhelp2002.mvps.org/hosts.htm (archive.org link)

So, I donated – may it be of some use to them / their family! And I encourage you to do the same if you benefited from this great file!

As for alternatives, there are several good ones:

  • I now use nextdns.io on the machines/mobile devices I maintain
  • pi-hole is also an alternative
  • Specifically for Windows, HostsMan is a good tool to manage/update hosts files
  • Browser plugins like uBlockOrigin are also very useful

For the last decade it has been the case – and continues to be the case in my opinion – that ad/tracker blocking is the single most effective way to keep devices from being infected with all kinds of malware (and, it generally makes web browsing faster!)

Oracle cloud https://grey-panther.net/2022/07/oracle-cloud.html https://grey-panther.net/2022/07/oracle-cloud.html#respond Sun, 24 Jul 2022 17:12:54 +0000 https://grey-panther.net/?p=1172 As they say – people don’t use Oracle because the IT department chose it :). This is probably also true for their cloud offering :). Just off the top of my head:

  • Arcane login procedure
    • that doesn’t support 2FA
    • that prompts you to change your (randomly generated, high entropy, kept in a password manager) password, even though NIST has recommended against this practice for many years
    • which fails to actually log you out (!) – I discovered this when I was trying to verify that my updated password worked
  • Machine console sometimes works and sometimes doesn’t
  • Arcane procedure to attach disks to VMs (to be fair: they show the commands in a popup window)
    • And even with these commands one can’t switch the boot disk of a given VM

They have a generous amount of free credit, but I wouldn’t recommend them for production use.

Useful Cloudflare infos https://grey-panther.net/2022/07/useful-cloudflare-infos.html https://grey-panther.net/2022/07/useful-cloudflare-infos.html#respond Sun, 24 Jul 2022 13:30:35 +0000 https://grey-panther.net/?p=1170

While trying to set up Cloudflare Access I found that some information is hard to find:

  • The tunnel communicates over 7844/udp (important in case you want/have a restrictive firewall and/or your cloud provider requires you to configure a node-independent firewall)
  • The authenticated user is specified by the Cf-Access-Authenticated-User-Email header. Other useful headers can be Cf-Connecting-Ip or Cf-Ipcountry.
  • To link the authentication with the tunnel you desire, simply configure the “Self-hosted application” on the same (sub)domain as the tunnel.
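
As a rough illustration of how an application behind the tunnel might read these headers, here is a sketch for a plain WSGI application (WSGI exposes request headers as `HTTP_` plus the upper-cased name with dashes turned into underscores). `access_identity` is a hypothetical helper, and the sketch assumes the origin is reachable only through the tunnel, since otherwise anybody could send these headers themselves.

```python
def access_identity(environ):
    """Collect the Cloudflare-provided identity headers from a WSGI environ.

    Missing headers come back as None, so the caller can refuse the request.
    """
    return {
        "email": environ.get("HTTP_CF_ACCESS_AUTHENTICATED_USER_EMAIL"),
        "ip": environ.get("HTTP_CF_CONNECTING_IP"),
        "country": environ.get("HTTP_CF_IPCOUNTRY"),
    }


def app(environ, start_response):
    """Tiny WSGI app that greets the authenticated user or answers 403."""
    identity = access_identity(environ)
    if identity["email"] is None:
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"no authenticated user header\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("hello " + identity["email"] + "\n").encode("utf-8")]
```

The app can be served with any WSGI server (for example `wsgiref.simple_server` from the standard library) listening on the port the tunnel points at.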
A fresh start with… WordPress :) https://grey-panther.net/2022/05/a-fresh-start-with-wordpress.html https://grey-panther.net/2022/05/a-fresh-start-with-wordpress.html#comments Sun, 15 May 2022 13:41:32 +0000 https://grey-panther.net/?p=1122

In 2016 I wrote A fresh start with Pelican. And now, 6 years later I’m writing this. Lots has changed since then and lots has stayed the same. It still fills me with joy writing texts that may be useful to somebody.

So, what’s to like about WordPress? For one, it can do blogs (and websites in general – so I don’t have to keep up with the latest (micro)formats and trust that it handles them reasonably well) and for most usual things (like code highlighting), there are well supported plugins. It’s also F/LOSS software and portable – I must say I quite liked the interview with Matt on FLOSS Weekly.

Another big thing is that it supports comments – something which static websites generally don’t, and the alternatives (like Disqus) don’t respect users’ privacy at the level I would like them to.

So type away your comments! (also, if you’re on the feedburner feed, please switch over to https://grey-panther.net/feed, because who knows how long the former will be around!).

But there are also a couple of things not to like about WordPress – for one, using it, I’m painting a big target on my back (lots of WordPress sites are getting hacked every day). I do believe that I’ve taken reasonable precautions against this (stay tuned for a description on how this is set up!), but it’s a risk.

Also, running dynamic websites is not free (though not astronomically expensive either). My main worry around this is that if I become incapacitated for a longer time, this content will disappear (and one big reason for me starting up writing the blog again is to have a documentation for my family for such cases – so that they can get technical help to access – and maintain if they wish to – all the digital trinkets I’m creating). Also stay tuned about my plans around this problem, but the short version is that I’m planning to mirror the content periodically to several “free” providers and hope that at least one of the mirrors will be around long enough.

Until the next time!

Image credits to rawpixel.com through PxHere.

An interesting proof for Pythagoras’s theorem https://grey-panther.net/2017/01/an-interesting-proof-for-pythagorass-theorem.html https://grey-panther.net/2017/01/an-interesting-proof-for-pythagorass-theorem.html#respond Thu, 05 Jan 2017 07:06:00 +0000 https://grey-panther.net/?p=1116

I recently saw an interesting proof for Pythagoras’s theorem in the MathHistory series which I wanted to share with y’all 🙂

So a quick reminder, Pythagoras’s theorem says that if we have a right-angle (90 degree) triangle, then there is the following relation between the length of the sides:

a = sqrt(b^2 + c^2) (where a is the length of the longest side) – and vice-versa.

The proof goes like this: let’s rewrite the formula as a^2 = b^2 + c^2. We can interpret this geometrically as: (for a right-angled triangle) the area of the square constructed on the longest side is equal to the sum of the areas of the two squares constructed on the shorter sides.

And now the proof goes as follows:

  • consider a right angled triangle
  • "clone" it 4 times and put it together such that the longer sides form a square. Now the area of the inner square is a^2 while the area of the big square is a^2 + 4*At (At is the area of a triangle)
  • rearrange the triangles as shown. The outer square is still of the same size (the length of its side – b+c – is the same) but now its area can be written as b^2 + c^2 + 4*At. Hence a^2 + 4*At = b^2 + c^2 + 4*At which can be simplified to a^2 = b^2 + c^2, or if you prefer to a = sqrt(b^2 + c^2).
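
The same conclusion can be cross-checked algebraically, without the rearrangement step. This is only a sketch under the assumption that the big square has side b + c (the two shorter sides laid end to end) and that each triangle has area At = bc/2:

```latex
\begin{align*}
(b + c)^2 &= a^2 + 4 \cdot \tfrac{1}{2} b c \\
b^2 + 2bc + c^2 &= a^2 + 2bc \\
b^2 + c^2 &= a^2
\end{align*}
```

The left-hand side is the area of the big square computed from its side, the right-hand side is the same area computed as the inner square plus the four triangles.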

I only had one nagging feeling after seeing this proof – how do we know that the first big square constructed is actually a square? Can’t it be that its "edges" are not lines, but slightly crooked like below?

Fortunately we can use the fact that the angles in a triangle add up to 180 degrees (ie. a straight line) and show that the sides of the external square are indeed straight lines:

Finding the N-th word in a complete dictionary https://grey-panther.net/2017/01/finding-the-n-th-word-in-a-complete-dictionary.html https://grey-panther.net/2017/01/finding-the-n-th-word-in-a-complete-dictionary.html#respond Mon, 02 Jan 2017 04:49:00 +0000 https://grey-panther.net/?p=1114

Problem statement

Find the N-th word in a dictionary which contains all the words of length at most M that can be generated from a given alphabet (sorted by the conventional dictionary sorting rule / lexicographical order).

As a short detour: why did I become interested in it? It was during my investigation of the upper limit for the number of strings formed from a given alphabet that can be encoded in a given number of bits. Even more concretely: what is the upper limit for the length of a DNA/RNA string formed from nucleotides (ie. a string with alphabet [A,C,G,T]) that can be encoded on 64 bits? Note: this problem statement means that we need a codec (ie. both enCOding and DECoding), so we’ll solve a slightly more general problem than just the one-way one described in the title.

The first solution which came to mind was to use some bits for the length and the remaining bits to encode the nucleotides (2 bit / nucleotide) however the question remained: how many bits for the length? And is the solution optimal?

So finally I came up with the following formulation: consider that we have a dictionary of all the possible nucleotide strings for length at most M. Now let the 64 bit value just be an index in this dictionary. This is guaranteed to be the optimal solution (if we assume that the probability of occurrence for every string is the same). Now we need three things:

  1. what is the largest value of M for which the index can be stored on 64 bits?
  2. a time and space efficient way (ie. not generating the entire dictionary and keeping it in memory for lookup) to get the index of a given string (the enCOde step)
  3. the same to get the word at a given index (the DECode step)

There is also a somewhat related problem on Project Euler (24: Lexicographic permutations) – that wasn’t the inspiration though, I found out about it later.

Some initial observations

Just by writing out the complete set of words of length at most M formed from a given alphabet we can make some observations. For example consider the alphabet [A,B] and write out:

  • the words of length 0: '' (the empty string)
  • the words of length 1: A and B
  • the words of length 2: AA, AB, BA and BB

So pretty quickly we can see that for a given alphabet and a given length we have exactly len(alphabet) ** length possible words (where ** is the exponentiation operator – ie. a ** b is the b-th power of a), since: we have length positions, at each position we can have one of the len(alphabet) characters, thus the total possibilities are len(alphabet) * len(alphabet) * ... length times which is len(alphabet) to power length.

After this we can ask "how many strings of length less than or equal to M are there"? (question 1 from the initial problem statement). This is simply sum(len(alphabet) ** i for i in [0, M]) (both ends inclusive), also known as the sum of a geometric progression: (1 - La ** (M + 1)) / (1 - La) where La = len(alphabet).

So for example if we have the alphabet [A, C, G, T] and 64 bits available we can encode strings of at most 31 characters: (4 ** 32 - 1) / 3 is just below 2 ** 64, while (4 ** 33 - 1) / 3 is already above it.
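
The counting argument is easy to check directly. The sketch below uses two made-up helper names, `count_words` and `largest_max_len`, and assumes that a value stored on `bits` bits can take 2 ** bits different index values (0 up to 2 ** bits - 1).

```python
def count_words(alphabet_size, max_len):
    """Number of words of length 0..max_len over an alphabet of the given size."""
    return sum(alphabet_size ** i for i in range(max_len + 1))


def largest_max_len(alphabet_size, bits):
    """Largest M such that every word of length <= M gets its own index."""
    max_len = 0
    while count_words(alphabet_size, max_len + 1) <= 2 ** bits:
        max_len += 1
    return max_len


print(count_words(2, 2))       # '', A, B, AA, AB, BA, BB -> 7
print(largest_max_len(4, 64))  # nucleotides on 64 bits -> 31
```

Python integers have arbitrary precision, so the sums can be computed exactly without worrying about overflow.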

Finding the index of a string

To find this we just need to count how many strings there are in the dictionary before our string (remember the dictionary is in lexicographical order).

A concrete example: our dictionary contains all the words of length at most 3 (M=3) formed from the alphabet [A, B]. What is the index of the word BA? (we consider that index 0 is '' – the empty string, index 1 is A, index 2 is AA and so on).

What is the position of BA in our dictionary?

If we would only have words of length exactly K we could compute this by considering BA a number in base 2 (binary) where A=0 and B=1, transform it to base 10 and have our answer (ie BA -> 10b -> 2 -> BA is at position 2 – or is the 3rd word – in the dictionary AA, AB, BA, BB).

However our dictionary contains all words of length exactly 0, 1, 2 and 3. So just consider each in turn!

In a dictionary containing the words from the alphabet [A, B] of exactly length:

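  • K=0: BA would have index 1 (it comes after '', the only word of length 0)
  • K=1: BA would have index 2 which is the same as indexOf(B) + 1
  • K=2: BA would have index 2
  • K=3: BA would have index 4, which is the same as indexOf(BAA)

Summing these up gives 1 + 2 + 2 + 4 = 9, which is the index of BA in the complete dictionary.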

So, to find the index of a string:

  • Go from 0 to M (the maximum length allowed for words in our dictionary)
  • Generate a word of length K from our word by either (assuming our strings are zero indexed):
    • Taking the characters 0 to K (exclusive) if K < len(word)
    • Padding the word with the first character of the alphabet up to length K
  • Finding the index of this (sub)word in a dictionary that contains words of length exactly K by considering the (sub)word as a value written in base La (La == length(alphabet)). Add 1 if we’re in the first case since the longer word would come after the shorter ones.
  • Sum up all the values

Or in Python 3 code:

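def indexOf(self, word):
    assert len(word) <= self.__max_len
    result = 0
    # For every length i, count the words of exactly that length sorted before `word`
    for i in range(0, self.__max_len + 1):
        if i < len(word):
            # A proper prefix and everything before it come before `word`, hence the + 1
            subword = word[:i]
            result += self.__valueInBaseN(subword) + 1
        else:
            # Pad with the first letter: only words strictly before the padded word count
            subword = word + (i - len(word)) * self.__alphabet[0]
            result += self.__valueInBaseN(subword)
    return result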
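
The method above relies on class state and on a helper that is not shown here. For experimenting, a standalone sketch of the same idea could look like the following; `value_in_base_n` is my guess at what the helper does (read the word as a number in base La, first letter of the alphabet being 0), and the alphabet is passed as a plain string.

```python
def value_in_base_n(word, alphabet):
    """Read `word` as a number written in base len(alphabet)."""
    value = 0
    for letter in word:
        value = value * len(alphabet) + alphabet.index(letter)
    return value


def index_of(word, alphabet, max_len):
    """Index of `word` among all words of length <= max_len in dictionary order."""
    assert len(word) <= max_len
    result = 0
    for k in range(max_len + 1):
        if k < len(word):
            # Words of length k up to and including the prefix come first
            result += value_in_base_n(word[:k], alphabet) + 1
        else:
            # Words of length k strictly before the padded word come first
            padded = word + alphabet[0] * (k - len(word))
            result += value_in_base_n(padded, alphabet)
    return result


print(index_of("BA", "AB", 3))  # 9
```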

Finding the N-th string

Finally getting at the problem stated in the title. For this I noted how the dictionary can be constructed for length M:

  • the dictionary for M=0 is just '' (the empty string) and for M=1 the empty string plus the alphabet itself.
  • for M>1 take the dictionary for M-1 and prefix it with each of the characters from the alphabet. Finally add the empty string as the first element.

So for example if we have [A, B] as the alphabet then:

  • the dictionary for M=1 is 0: '', 1: A, 2: B
  • to construct the dictionary for M=2 we replicate the above dictionary 2 times, first prefixing it with A, then with B and finally we add the empty string in front:
0: ''  1: A    4: B
       2: AA   5: BA
       3: AB   6: BB
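
The construction can be written down almost literally as a recursive function. This brute-force sketch (`build_dictionary` is a made-up name) is only meant for small M, as a reference to test the faster algorithms against; it assumes the alphabet is given in sorted order.

```python
def build_dictionary(alphabet, max_len):
    """All words of length <= max_len, in dictionary order."""
    if max_len == 0:
        return [""]
    shorter = build_dictionary(alphabet, max_len - 1)
    # One "column" per letter: the shorter dictionary prefixed with that letter
    return [""] + [letter + word for letter in alphabet for word in shorter]


for index, word in enumerate(build_dictionary("AB", 2)):
    print(index, repr(word))
```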

This suggests an algorithm for finding the solution:

  • take the value. Decide in which "column" it would be.
    • you know the number of words in each column: (len(dictionary) - 1) / len(alphabet)
    • len(dictionary) is sum(len(alphabet) ** i for i in [0, K]) (see the initial observations)
    • this can also be precomputed for efficiency
  • the column index gives you the letter index in the alphabet
  • now subtract from the value the index of the first word in the given column. If you get 0, stop.
  • Otherwise make K one less and look up the new value in the dictionary of length at most K.

A small worked example:

  • let’s say we have [A, B] as the alphabet and M=2. We want to find the word at 5 (which is BA if you take a peek at the table above). So:
  • in each column we have 3 words, so 5 is in the 2nd column (the column with index 1) which gives us "B" as the first letter
  • now subtract 4 (the index of the first word in the 2nd column – B) from 5 which leaves us with 1
  • now find the word with index 1 in a dictionary with M=1 which is "A"
  • thus the final word is "BA"

Or in Python 3 code:

def wordAt(self, index):
    assert 0 <= index <= self.__lastIndex
    result, current_len = '', self.__max_len
    while index > 0:
        words_per_column = self.__wordsPerLetterForLen[current_len]
        column_idx = (index - 1) // words_per_column
        result += self.__alphabet[column_idx]
        index_of_first_word_in_col = 1 + column_idx * words_per_column
        index -= index_of_first_word_in_col
        current_len -= 1
    return result
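
As with the encoding step, here is a standalone sketch of the same loop that does not depend on the class. The table of dictionary sizes is recomputed on every call for simplicity (the method above reads it from a precomputed attribute), and the alphabet is assumed to be a plain string.

```python
def word_at(index, alphabet, max_len):
    """Word at position `index` among all words of length <= max_len."""
    # sizes[k] is the number of words of length <= k
    sizes = [sum(len(alphabet) ** i for i in range(k + 1))
             for k in range(max_len + 1)]
    assert 0 <= index < sizes[max_len]

    result, current_len = "", max_len
    while index > 0:
        # Every letter heads a column holding a full dictionary of shorter words
        words_per_column = (sizes[current_len] - 1) // len(alphabet)
        column_idx = (index - 1) // words_per_column
        result += alphabet[column_idx]
        # Re-base the index relative to the first word of that column
        index -= 1 + column_idx * words_per_column
        current_len -= 1
    return result


print(word_at(5, "AB", 2))  # BA
```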

Note: you can find a different algorithm to do the same on math.stackexchange.com, however I found the above to be visually more intuitive.

Can we do it simpler?

So we solved the initial problem (both the one stated in the title and the one which motivated this journey) however it took over a thousand words to describe and justify it. Can we do it simpler? Turns out yes! We just need to abandon our attachment to the lexicographical order and say that as long as we have a bijective encoding and decoding function with the property decode(encode(word)) == word we are satisfied.

A simple and efficient function is the transformation of the word from base La (length of alphabet) to base 10 and vice-versa. For example if we have [A, C, G, T] as the alphabet and GAT as the word we can do:

  • encode: 2*(4**2) + 0*(4**1) + 3*(4**0) which is 35
  • decode: 35 is written as a sum of powers of 4 as above, and the digits 2, 0, 3 correspond to GAT

Again, the ordering will not be lexicographical (A, AA, AB, ...) but rather a numerical-order kind-of (A, B, AA, AB, ...) but the algorithm is much simpler and in the case that La is a power of two, very efficient to implement on current CPUs since division / remainder can be done using bit-shifts / masking.
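
One detail to watch out for: with plain base conversion the first letter of the alphabet acts as a leading zero, so A and AA would both map to 0. A possible way around this, sketched below, is to number the letters from 1 instead of 0 (so-called bijective numeration). This is a swapped-in variant rather than the exact scheme described above, which is why GAT comes out as a different number; the function names and the string alphabet are illustrative.

```python
def encode(word, alphabet):
    """Map a word to a non-negative integer; the empty word maps to 0."""
    number = 0
    for letter in word:
        # Digits run from 1 to len(alphabet), so there is no "leading zero"
        number = number * len(alphabet) + alphabet.index(letter) + 1
    return number


def decode(number, alphabet):
    """Inverse of encode."""
    letters = []
    while number > 0:
        number, digit = divmod(number - 1, len(alphabet))
        letters.append(alphabet[digit])
    return "".join(reversed(letters))


print([decode(i, "AB") for i in range(7)])  # ['', 'A', 'B', 'AA', 'AB', 'BA', 'BB']
print(encode("GAT", "ACGT"))                # 56
```

The resulting order is exactly the "shorter words first" order, and every non-negative integer decodes to a distinct word.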

More speculation

I didn’t actually want to encode DNA/RNA sequences, but rather mutations/variations which are pairs of sequences (something like G -> TC or GT -> ''). Now I could just divide the 64 bits into two 32 bit chunks but the same initial question would arise: is this the optimal way of encoding?

So we go for the same solution: what if we had a dictionary of all the variants ('' -> A, '' -> AA, ...) and just indexed into it? How would we construct such a dictionary and how would we order it?

Turns out there is an algorithm inspired by the proof that there are the same number of natural numbers as there are rational ones. However, that doesn’t give us a way to find the N-th element in the sequence, but the Calkin–Wilf sequence does.

So we can have the following algorithm:

  • represent the pair from -> to as two numbers A and B (refer to the discussion so far for how we can do that)
  • use the Calkin-Wilf sequence (combined with the continued fraction formula) to find the index of A/B
  • or conversely use the sequence to transform the index into the A/B fraction and then transform the numerator and denominator into the original sequences

This is just speculation but it should work in theory. Also, it is fairly complicated so perhaps there is a better way to do it by making some simplifying assumptions? (like us eliminating the lexicographic ordering requirement).
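
To make the speculation a bit more concrete, here is a sketch of the two directions using the Calkin–Wilf tree: the root is 1/1, the left child of a/b is a/(a+b), the right child is (a+b)/b, and reading the tree level by level gives the sequence. It walks the tree one step at a time instead of using the continued fraction formula, so it is simple rather than fast. It also only covers positive fractions in lowest terms, so pairs where A and B share a factor (or where one of them is 0) would need extra handling that is not attempted here. The function names are made up.

```python
from math import gcd


def fraction_at(index):
    """(numerator, denominator) at 1-based position `index` of the sequence."""
    assert index >= 1
    a, b = 1, 1
    # The bits after the leading 1 describe the path from the root
    for bit in bin(index)[3:]:
        if bit == "0":
            b = a + b  # left child: a / (a + b)
        else:
            a = a + b  # right child: (a + b) / b
    return a, b


def index_of_fraction(a, b):
    """1-based position of a/b (positive, in lowest terms) in the sequence."""
    assert a >= 1 and b >= 1 and gcd(a, b) == 1
    bits = []
    while (a, b) != (1, 1):
        if a > b:
            bits.append("1")  # we are a right child, the parent is (a - b) / b
            a -= b
        else:
            bits.append("0")  # we are a left child, the parent is a / (b - a)
            b -= a
    return int("1" + "".join(reversed(bits)), 2)


print([fraction_at(i) for i in range(1, 8)])
print(index_of_fraction(3, 2))  # 5
```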

Source code

A complete implementation of the above algorithms (with tests!) in Python 3 can be found on GitHub.

The limits of science https://grey-panther.net/2016/09/the-limits-of-science.html https://grey-panther.net/2016/09/the-limits-of-science.html#respond Thu, 08 Sep 2016 04:44:00 +0000 https://grey-panther.net/?p=1110

In a lot of ways Science has become the religion of the day. We can’t go more than a day without hearing / seeing / reading a "news" story about "scientists" saying something about something important. We can’t help but feel dazzled, confused, perplexed, overwhelmed by these announcements. And we have discussions with others:

  • I heard that scientists say that X cures cancer.
  • Didn’t you hear? Scientists announced that X causes cancer.

How is this really different from saying "my shaman said"?

The core principles of science

I’m going to argue that scientific thinking is really in a different league – but it also has many limitations. The core ideas of scientific thinking are:

  • We can observe things and those observations have some relation to the objective reality
  • This objective reality is probabilistically deterministic (ie. if we flip a coin it will land either heads or tails but it generally won’t turn into a unicorn and fly away)
  • Based on our observations we can construct models / hypotheses about cause and effect. While there can be an infinite number of hypotheses, we assume that reality has a simple and elegant structure and thus prefer the hypothesis which explains the largest category of events with the fewest assumptions
  • Scientific hypotheses need to be testable (also called "falsifiable") – ie. they need to be forward looking / have predictive power. For example if I say "objects always fall downwards" then you or I can take a ball, a key, a rock, etc and verify that indeed, when we let go of it, it falls towards the ground. However such positive outcomes have little value (they increase the likelihood that the hypothesis / theory is true only by a small amount) and the most valuable things are falsifications – when somebody makes a prediction based on the theory but then a different outcome is observed – which most likely means that the theory is false (unless there was an error in the experiment).

This looks an awful lot like "the ten commandments of the science religion", doesn’t it? Because there is no reason to believe them intrinsically. There are only two things in favor of this mindset:

  • This is how our world seems to work – and thus these principles seem intrinsically true to a lot of us. Then again this "gut feeling" is no different from being convinced that a given kind of deity governs our lives
  • What is different from every other religion/philosophy however is that it contains the framework to extend it. You just observe, theorize and then try to falsify your theory. Voilà! You’re adding to the scientific knowledge. All other religions are closed while the religion of science is open.

Science is not "truth"

You might have observed that the principles – as described in the previous section – have somewhat of a wishy-washy nature. I keep using words like "likely" and "usually".

For example I said "if we flip a coin it will land either heads or tails but it generally won’t turn into a unicorn and fly away". I can’t say for certain that it won’t turn into a unicorn, but so far nobody reported a case of this happening, but we observed the coin landing heads or tails a lot of times so we’ll assign a very small probability to something else happening.

Scientific results are always probabilistic, but that is just life: if I go out tomorrow I might get hit by a car, but I most probably won’t. Note that other religions generally avoid saying anything about "this life/world" and reserve their proclamations for "the other/after life". At least science is trying to make some predictions, even though they sometimes turn out to be false.

This bears repeating: Science is not "truth". There is no such thing as a "scientific fact". Our scientific knowledge is simply a collection of theories which we failed to falsify – as of yet. Now, some of those theories have been around for a long time and have been found true in many situations and it’s prudent to act as if they were the absolute truth, but we must accept the fact that there is always the small possibility that they might turn out to be false.

To hammer this idea home some more: all the above is true even if we "did science perfectly". However we are humans: subject to our biases, feelings and other motivations.

Science is not math

(a couple of words about induction)

Related to "there is no scientific truth" is the mirage of mathematics in science, which goes something like this: "we all know that 2+2 is 4, this scientist is using a mathematical formula, so whatever comes out of the formula must be true".

No!

Mathematics and science use very different ways of reasoning:

In math we use deduction: we state some ground rules ("axioms") and from those we deduce (prove) all the other statements. All the deduced statements are "true" (if we didn’t make any mistakes) since they are derived from the axioms which we assume to be true. This doesn’t mean however that they necessarily have any relation to objective reality. (And just a side-note: even if we stay within the bounds of such imaginary systems – i.e. don’t try to apply them to the "real world" – we will hit some fundamental limitations ^1)

Science however uses induction: if I see that both an apple and a rock are falling down, I guess that both "falling down" events happen because of the same cause. Such "rules of thumb" ^2 generally work but can sometimes give the "wrong" results (as opposed to the rules of deduction which are always correct if we accept the initial axioms). Of course for a scientist these cases of "unexpected outcome" are the most interesting ones since they signal an opportunity to learn something new, but they can be most distressing if we think of science as "the source of truth".

So how does science use mathematics? It is just a precise way to describe hypotheses / theories and to manipulate them. Thus we might say that the maximum distance of a thrown ball is described by the equation v^2 / g (where v is the speed of the throw and g is the acceleration due to gravity) and we can make predictions about the distance a ball will travel before throwing it (and verify after the fact that the prediction was – mostly – correct), but the fact that we formulated the theory in mathematical terms does not make it fundamentally more true than any other scientific theory. It still is "the best theory we have for now which isn’t refuted by evidence".
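
As a small sketch of "making a prediction before throwing": under the textbook idealisation (no air resistance, flat ground, ball landing at the height it was thrown from) the distance for a throw at angle θ is v^2 · sin(2θ) / g, which peaks at v^2 / g for a 45 degree throw. The speed of 20 m/s and the function name are illustrative assumptions.

```python
import math

G = 9.81  # acceleration due to gravity near the Earth's surface, in m/s^2


def throw_distance(speed, angle_deg, g=G):
    """Predicted distance (m) of a ball thrown at `speed` (m/s) and `angle_deg`.

    Idealised model: no air resistance, flat ground, same launch/landing height.
    """
    return speed ** 2 * math.sin(2 * math.radians(angle_deg)) / g


speed = 20.0  # illustrative throwing speed, in m/s

# Try every whole-degree angle and keep the one with the longest predicted throw.
best_angle = max(range(1, 90), key=lambda angle: throw_distance(speed, angle))

print(f"best angle: {best_angle} degrees")
print(f"predicted maximum distance: {throw_distance(speed, best_angle):.1f} m")
print(f"v^2 / g:                    {speed ** 2 / G:.1f} m")
```

A real throw will fall short of this figure because of air resistance, which is exactly the "mostly correct" caveat: the formula is a model, not a guarantee.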

Other limitations of science

Coming back to the idea that "science is not the absolute truth" even if we would "do science perfectly":

We don’t do science perfectly:

  • We have biases and when we have a theory we might not look "hard enough" for evidence to refute it
  • The academic structure is not set up to encourage verification of results: publishing replication results or negative results is discouraged (many scientific journals don’t even accept them for publication)
  • Science is entirely probabilistic (it only tells you what probably is true), however statistics (the branch of mathematics which deals with probabilities) is complicated and it’s easy to make mistakes when judging what constitutes probably true (Is Most Published Research Wrong?)
  • In some very interesting fields it is hard to conduct experiments (more on this in the next section)
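
To make the statistics point concrete, here is a minimal sketch of how "probably true" results appear even when there is nothing to find. Each simulated study flips a perfectly fair coin 100 times and reports a finding if the number of heads falls outside 40..60; the thresholds, the number of studies and the seed are arbitrary choices for illustration.

```python
import random
from math import comb

FLIPS = 100          # coin flips per simulated study
LOW, HIGH = 40, 60   # a study reports a "finding" if heads falls outside this range


def exact_false_alarm_rate():
    """Exact chance that a fair coin lands outside LOW..HIGH heads in FLIPS flips."""
    inside = sum(comb(FLIPS, k) for k in range(LOW, HIGH + 1))
    return 1 - inside / 2 ** FLIPS


def simulated_false_alarms(studies, seed=0):
    """Count simulated studies of a fair coin that still report a "finding"."""
    rng = random.Random(seed)
    alarms = 0
    for _ in range(studies):
        heads = sum(rng.random() < 0.5 for _ in range(FLIPS))
        if heads < LOW or heads > HIGH:
            alarms += 1
    return alarms


rate = exact_false_alarm_rate()
alarms = simulated_false_alarms(1000)

print(f"exact false alarm rate per study: {rate:.3f}")
print(f"'findings' in 1000 studies of a fair coin: {alarms}")
```

If only the studies with a "finding" get published, the literature fills up with effects that do not exist, which is why replication and negative results matter.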

And as if the above wasn’t enough, there is the problem of communicating the results (the step where "there is a 60% probability that eating a bit of chocolate improves bone density in white women over 50" turns into "Science Fact! Women should eat chocolate daily!").

To give some uplifting news: these issues are known and people are working on them. There have been scientific journals set up to publish replicating studies and/or studies with negative results. There is a movement to encourage researchers to pre-register their studies (to state what data they’ll collect and how they will analyze it) and publish the results even if they are negative. Finally some journals require scientists to give a "simplified abstract" when publishing the research, which journalists can adapt more easily.

However these are hard issues and we can help out by not having unrealistic expectations.

Words about human-centric fields

A couple of final words about the science in fields like health (mental and physical) or economics: these are the fields which are the most important to us but where the difficulties presented multiply:

  • Generally we can’t do randomized double-blind trials where we select a group of people and infect them with AIDS or make them live on less than $1 a day (you know, because we value human life)
  • This means that we can only study people who already are in this situation but that makes it very likely that we confuse cause and effect
  • Even if the experiment is non-intrusive (or we’re doing an observational study) it is very hard to get a diverse set of participants (e.g. most psychology experiments are done on young white males in the US – no wonder that they don’t replicate across the world)

Again, people are working on addressing these issues, but it is just one more reason not to believe the "fad" articles published daily and shared virally on social media.

Science is not perfect, but it’s the best that we have.

]]>
https://grey-panther.net/2016/09/the-limits-of-science.html/feed 0 1110
A fresh start with Pelican https://grey-panther.net/2016/01/a-fresh-start-with-pelican.html https://grey-panther.net/2016/01/a-fresh-start-with-pelican.html#respond Mon, 04 Jan 2016 04:15:56 +0000 https://grey-panther.net/?p=1107 Here we are in 2016, trying to start blogging again. Using Pelican is more complicated than it needs to be :-(.

]]>
https://grey-panther.net/2016/01/a-fresh-start-with-pelican.html/feed 0 1107