Mersenne Twister, Blum Blum Shub and other stories: H2G2 Mark II - the multi-dimensional guide with a secret purpose

Why the title?

The Hitchhiker's Guide to the Galaxy is frequently almost nonsensical, and that is what makes it appealing. With linear thought rendered impossible, the reader jumps through hyperspace on the Infinite Improbability Drive to understand the Universe as seen by Douglas Adams, and emerges a little wiser, if also more confused.

An example that rings a bell in academia is that of the mice, who ask the all-knowing computer Deep Thought for the answer to life, the universe and everything, only to be told that it is 42. They then realize that they have blown taxpayer dollars: because they never framed a proper question, the answer fails to make sense. So they commission another experiment, building a computer to find the Ultimate Question whose answer is 42, because obviously "What do you get when you multiply 7 and 6?" lacks a ring to it. It sort of reminds you of labs that churn out data first and worry later about the analysis and the questions or hypotheses that lead to the story. Written between 1979 and 1992, the books still ring quite a bell in 2015.

This article introduces nothing new that you wouldn't have already realized if you are in the middle of your Ph.D. However, just as the Mark II had a purpose, so does this piece, so read at your own risk.

Long Story:

Once upon a time, a faculty member told me that the format of a Ph.D had survived all the way from the Renaissance to the modern day. According to him, students start as apprentices, fulfilling the whims and fancies of their masters or supervisors. The grind, or tough love, or whatever he thought of it, was in fact necessary. Another faculty member happened to tell me the same thing three years later, rephrased as "these kids need to go through the grind because that is the only way the proper work-culture can be established".

Work-culture here implies a hierarchical structure in which the kids are constantly in fear or awe of the faculty and are always corrected by them; it rarely happens the other way round. Questions are expected from the students, but only rhetorically. The student has the moral responsibility to present his work as the next big thing in the world, no matter how visibly mediocre it may be. A student cannot criticize his own work or methodology, or the existing frameworks of scientific discovery. The unspoken pressure on the student is to show his work in a positive light and defend it to the death, no matter how he feels about it in his own head.

Nonsense. This is killing them inside: their self-confidence sinks to an all-time low, and they suffer from feelings of inadequacy and uselessness which can be debilitating. Being under constant pressure to show their work in a good light takes a toll on their scientific temper. It leads them to develop confirmation bias, to become religious about their theories and hypotheses, and to argue about them irrationally. Doesn't that defeat the whole point of doing a Ph.D in the first place?

What is missing here? Proponents of the current system argue that only through such a system do students eventually become good scientists, "eventually" being the key word here. The system also ensures that the hierarchy is not destroyed by the chaos of arguments between faculty and students. Why there are, or should be, arguments in the first place is a deeper question that needs to be addressed.

It is undeniable that everything comes with an expiration date, and that applies to knowledge as well. Faculty who do not revise their basics, who can't tell that a flat line along the x-axis signifies no correlation, or who tend to run a battery of statistical tests until the data give the answer they are looking for, are the dangerous people here. The picture of Science they give to their students is almost formulaic and unimaginative. The protocol is simple enough. Take two conditions, one control and one treated. Apply statistical tests (T-tests, other multivariate analyses) until some difference is found between the two, and then do an enrichment analysis: find which entities (genes, proteins or metabolites) are differentially expressed and the pathways corresponding to them, then find which of those pathways are enriched using a frequency difference test like the Chi-square or Fisher exact test. You can increase the samples and conditions to make it high-throughput and then, looking at the gene ontology lists, try to guess at what is happening within the system.
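For readers who have only ever seen the enrichment step as a button in a tool, here is a minimal sketch of what a one-sided Fisher exact test computes, written with the Python standard library only. The function name `enrichment_p_value` and all the counts are hypothetical and chosen purely for illustration; real pipelines also correct for testing many pathways at once, which this sketch does not do.

```python
from math import comb


def enrichment_p_value(total, in_pathway, selected, overlap):
    """One-sided Fisher exact test for over-representation.

    Returns P(X >= overlap), where X is the number of pathway members
    found when `selected` entities are drawn at random, without
    replacement, from `total` entities of which `in_pathway` belong
    to the pathway (a hypergeometric tail probability).
    """
    upper = min(in_pathway, selected)
    denominator = comb(total, selected)
    tail = sum(
        comb(in_pathway, k) * comb(total - in_pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denominator


# Hypothetical counts: 1000 genes measured, 50 of them in the pathway,
# 100 called differentially expressed, 12 of those in the pathway.
# By chance alone we would expect about 100 * 50 / 1000 = 5.
p = enrichment_p_value(total=1000, in_pathway=50, selected=100, overlap=12)
print(f"enrichment p-value: {p:.4g}")
```

The point of writing it out is that the test only counts memberships. It says nothing about whether the list of "differentially expressed" entities was obtained honestly in the first place.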

They are too comfortable in their positions and while that is not necessarily a bad thing for them, it is a bad thing for their students. I’m not going to claim that it is their moral responsibility to take care of the intellectual growth of their Ph.D students but to passively damage their learning is also something that is entirely undesirable.

However, I believe that if students are given all the relevant information from the beginning, without being overloaded with minutiae, and if we assume they are fairly systematic and sensible individuals, then it isn't impossible to believe that they can execute the tasks considered the mainstay of scientific research. Frankly, in the biological sciences those tasks are not arcane or inaccessible to a person with school-level knowledge of calculus and algebra, unlike in some of the other sciences.

To presume a priori that they are dumb and need to be spoon-fed is a disservice to them. You should probe and test them and see what they can do; if they can't do it, then there is no point in force-feeding them. If, at this stage in life, they can't be interested in something that they chose to do, it is probably healthier for them to find an alternative career that they would love and enjoy. Encourage such students to leave, because pushing them towards something that they clearly don't like is a waste of time and energy for you and for them. However, if you insist that they need to be molded into replicas of you, then you should go see a shrink about this control-freak, micro-managing condition that you seem to be developing.

What do I think we should do about it?

Encourage chaos in discussions, and encourage students to question everything from the experimental design to the statistical tests and their violated assumptions. Experimental designs should be criticized by proposing alternative designs and comparing them to see which experiment will answer the question posed while minimizing the confounding factors. When the merits and demerits of each method are discussed, an emergent thought process occurs, and students absorb a lot more when they are forced to think something through to the end rather than receive it passively as an instruction.

Statistical tests have been the same for quite some time now and their limitations are known: for instance, the inflation of the T-statistic when variance is low, and the need to accompany a correlation with a P-value. What is important here is that students realize that the statistical test is not just a means to an end. You don't just acquire data using experiments and then sit down to apply statistical tests one by one until you get the answer you want. Rather (in another school of thought), the choice of statistical test can more or less dictate the kind of experiment that should be performed, and specify the number of replicates, among other things.

For instance, suppose you wish to find an association between two variables, one of which you can vary (like the amount of a compound added) and one of which you can observe (like optical density). In this case correlation is the way to test for an association, and regression will give you a model that allows you to predict the response of the system (the output) given a certain quantified input of the drug.
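As a rough sketch of that dose-and-response case, the snippet below computes a Pearson correlation and an ordinary least-squares line by hand, using only the Python standard library. The helper names `pearson_r` and `fit_line` and the dose and optical density values are made up for illustration, and the P-value that should accompany the correlation is left out to keep the example short.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical experiment: the input we control is the dose of a compound,
# the output we observe is the optical density of the culture.
dose = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
optical_density = [0.90, 0.81, 0.69, 0.62, 0.48, 0.41]

r = pearson_r(dose, optical_density)
slope, intercept = fit_line(dose, optical_density)

# The fitted line lets us predict the response at a dose we did not measure.
predicted_od = intercept + slope * 2.5
print(f"r = {r:.3f}, slope = {slope:.3f}, predicted OD at dose 2.5 = {predicted_od:.3f}")
```

Notice that the design follows from the test: to estimate a slope you need several distinct input levels, not just "control" and "treated".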

However, if you want to find out whether the difference in a particular parameter between two groups is significant, then you go for something like the T-test. The fact that the T-test involves a variance term means that you must have replicate observations in each of the groups you are comparing.
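To make the replicate requirement concrete, here is a small sketch that computes a T-statistic by hand. It uses Welch's unequal-variance form, which is my choice for the illustration and not something the argument depends on; the function name `welch_t` and the measurements are hypothetical. The last few lines show what happens when each group has a single observation: the variance cannot be estimated, so the test cannot be run at all.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t-statistic and its approximate degrees of freedom.

    Each group needs at least two observations, because
    statistics.variance raises StatisticsError otherwise.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean.
    se2_a = statistics.variance(group_a) / n_a
    se2_b = statistics.variance(group_b) / n_b
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical replicate measurements of the same parameter in two groups.
control = [0.82, 0.79, 0.85, 0.80]
treated = [0.61, 0.66, 0.58, 0.63]

t_stat, dof = welch_t(control, treated)
print(f"t = {t_stat:.2f} with about {dof:.1f} degrees of freedom")

# Without replicates there is no variance term, hence no test.
try:
    welch_t([0.82], [0.61])
    single_observation_works = True
except statistics.StatisticsError:
    single_observation_works = False
print("test possible with one observation per group:", single_observation_works)
```

Turning the statistic into a P-value needs the t distribution, which the standard library does not provide, so that step is left to a statistics package.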

The above is a small example of how the choice of your statistical test dictates the experiment that must be performed. To a small extent, this thought process prevents you from gathering data without systematic planning and then hunting for patterns, patterns that could exist in random data as well. P-values are an additional checkpoint, but if you are trying to game the P-values too, then we are talking about serious ethical issues.
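If "patterns in random data" sounds abstract, the simulation below may help. It compares two groups of pure noise, a thousand times over, and counts how often the difference looks significant at P < 0.05. To stay within the Python standard library it uses a two-sided z-test with a known standard deviation of 1 instead of a T-test; that simplification, the function names and every parameter value are assumptions made for the sketch.

```python
import math
import random


def z_test_p_value(group_a, group_b, sigma=1.0):
    """Two-sided P-value for a difference in means when sigma is known."""
    n_a, n_b = len(group_a), len(group_b)
    diff = sum(group_a) / n_a - sum(group_b) / n_b
    z = diff / (sigma * math.sqrt(1 / n_a + 1 / n_b))
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def count_false_positives(n_variables=1000, n_replicates=5, alpha=0.05, seed=0):
    """Count 'significant' differences when both groups are pure noise."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_variables):
        group_a = [rng.gauss(0, 1) for _ in range(n_replicates)]
        group_b = [rng.gauss(0, 1) for _ in range(n_replicates)]
        if z_test_p_value(group_a, group_b) < alpha:
            hits += 1
    return hits


hits = count_false_positives()
print(f"{hits} of 1000 pure-noise variables look 'significant' at P < 0.05")
```

Roughly one variable in twenty should pass the threshold even though there is nothing to find, which is exactly what you will "discover" if you measure everything first and look for differences afterwards.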

The standard classical hypothesis testing format is designed to give you, the scientist, some leeway for making mistakes in the process of scientific discovery, a process which is itself a little unstable and prone to error. But trying to subvert the procedure just to find something, any association or differential expression at all, is not a healthy career move.

Apart from all of this, learn to ask questions and formulate experiments to answer them. Questions do not come out of thin air. In fact they do, but when they come out of thin air they aren't the cleverest of questions, and most of the time they have already been worked to death by someone, somewhere in the world. To begin addressing questions of importance, one must first know what is already known. Only once you know the limits of knowledge can you start asking questions that nobody else has asked before, and the journey to the answers to those questions is the learning process that makes scientific research worthwhile.

However, it is also possible that reading too much can confuse you. When you read too much, there is an overload of information and no chance to chew on it and digest it. So space out your reading and integrate it with your work. Don't read at a stretch and then work at a stretch, because you could get stuck in a rut that way; use one to freshen up your perspective on the other. Write down the interesting things that you read in your lab notebook, whether they be clever methods or new ways of doing statistical analysis.

Sometimes knowing too much can also be paralyzing, leaving you unable to work under the sheer weight of all that knowledge inside your head. In that case, stop reading and start working. The hope is that there is something truly unique about your perspective on the solution to your Ph.D problem, something that the rest of the world doesn't share, and that this is the fresh perspective your work needs. It is unique because it is this conscious thing in you that has absorbed everything you ever read about the things you liked: your hobbies, your interests, the games you play and the puzzles you've solved, your abilities at any of the physical sciences, music, craft or engineering. They all contribute to your unique perspective, which should advance the understanding of your Ph.D problem, if not help you solve it. Above all, have fun and appreciate the good things in life.

Saturday, 7 March 2015
