Saturday, 7 March 2015

Chicken Soup for the Ph.D Soul or the Hitchhiker's Guide to the Ph.D


The title is taken from a popular series of books that help different kinds of people cope with the sad things in their lives. So you have Chicken Soup for the Christian Soul, Chicken Soup for the Teenage Soul and over 103 other titles if you are interested. Obviously, it is much more enriching to save your money, skip these titles and buy yourself a copy of the ultimate Hitchhiker's Guide to the Galaxy by Douglas Adams. And if you aren't too puritanical and hated what happened to Arthur, buy Part 6 of 3 of H2G2, "And Another Thing", by Eoin Colfer, who continues where Douglas left off without ruining the effect and pulls off a near-perfect ghost-write.

However, this post isn't really about the books, which will probably merit a post of their own, but rather a survey of my mostly incomplete surmise of what it means to do a Ph.D and how it should ideally have been (although every Ph.D may have a different opinion on this). Of course, I write all of this out of personal experience, with nary an exaggeration. I'm at the stage in a Ph.D where I know that I enjoy thinking unfettered, but I'm dismayed by how reality fails to keep pace with imagination where I work. Working in computational biology allows me to simulate things that I couldn't with pen and paper or my imagination alone. I've learned a lot more about philosophy and the empirical nature of cutting-edge science from this field.

The rant that follows came out of my experience as a teaching assistant in a class that taught kids how to use the Arduino to create novel diagnostic devices, which is the mandate of the institute I work in. We had one batch that was more or less successful in coming up with ideas which, while not exactly novel, achieved quite a lot for the price, given that commercial devices cost as much as 100 times more. However, pedagogy, of which I have no professional experience, suggests to me that restricting the class of devices that can be made can unnecessarily constrain the kids' imagination.

They end up having to think of something new in an almost saturated field, which naturally leads to derivative work with incremental improvements. It is quite similar to being apprenticed to a carpenter and making a cupboard as the final test of your ability (cf. Ph.D supervisor, graduate student and thesis). The debate over whether the system is broken could go on forever, so I'll stop there.

It is true that science proceeds incrementally most of the time, with only a few jumps, few and far between, that open up a new field. That field eventually saturates too, and the cycle repeats.

There is a huge emphasis on not redoing work that has already been done unless there is some novelty. While that perspective makes sense, I feel that we are robbing the kids of the rich experience of thinking through the whole process of scientific discovery afresh. We create pipelines and simple push-button interfaces, the dirty details of statistical analysis and data processing are black-boxed, and the output is pretty graphs and plots, which make for great publications and funding, and that is altogether necessary.

This creates the Nintendo generation of Ph.Ds: the ones who, probably rightly, presuppose that science is about putting samples in, pressing buttons, feeding data into pipelines and getting plots, differential expression analyses, P-values, and gene/protein enrichment and ontology analyses, and then making up a story about the list of differentially expressed genes/proteins that were found.

I keep hearing statements like, "How does GeneSpring calculate the P-values for differential expression? Oh, I don't care about the details of how it happens; a statement of the factoid would be enough for my presentation." Such remarks make me wonder whether my insistence on knowing the details of what I am doing and plumbing the depths of statistical analysis is really worth it. Am I wasting my time trying to find out how statistical tests work and where they might not, and trying to understand them from an algebraic and geometric perspective? Should I flush myself into the Pipeline as well?

Currently, no. I've found a strange love for statistics that I never had before. There is some comfort in knowing that there are formulae which take numbers and bring a certain predictability to their outputs, and that linear algebra, calculus, coordinate geometry, probability, permutations and combinations, and polynomial algebra come together to make a subject that lets us estimate uncertainty in this random, unpredictable world. The fact that these methods can still lead to the wrong decision makes the whole process very organic. I don't mind being wrong about something I learned, because eventually I will find out that I was wrong and learn something less wrong than what I thought I knew. The world just keeps getting better every day, and it never ends.

Of course, hypocritical human that I am, you might one day find yours truly dumping all the statistics and advertising his publications on this blog. However, not for some time, so don't worry.

So, coming back to the point: is it reasonable to restrict kids to making a particular class of device, or should we let them think of something utterly fantastic and unimaginable and then explore the currently feasible technology to see how much of their idea can be implemented? The resulting device might be far from diagnostic, but working on it would be a good learning experience. It would teach them to think outside the conventional rules, innovate and come up with alternatives instead of the "standard way of doing things".

Most of the things designed to make life in the laboratory easier have already been invented. They are expensive, yes, but they are available in a place like the one where I work. So there is no point in building a device that converts a manual pipette into an electronic one, because in the eyes of the faculty that is reinventing the wheel.

However, how creative is designing a sensor that measures some biological parameter? How about making an EKG for zebrafish? It is quite expensive to buy one, so the point is to make a cheaper alternative. The question then is whether the faculty would trust such a device's quantitative measurements enough to send them for publication. In my experience there is a mentality, especially in India, of believing that nothing good can come out of your own house on an off-beat, out-of-course topic. The compartmentalization of the sciences ensures that a biologist would never dare make the kind of approximation a physicist might make with his system to gain a broader perspective. Since this is hammered into their heads when the subjects are divided into Science, Arts and Humanities at the 10th grade, these kids lock their perspectives into something resembling a fundamentalist outlook, one that defines everything in terms of their specialization and refuses to borrow ideologies from other fields.

So while we swoon at 10th-grade kids abroad at the Google Science Fair who have built PCR cyclers using the Arduino and a simple PWM program, we tell ourselves that our kids could never do that, conveniently overlooking the fact that the West does not compartmentalize knowledge and lets kids do whatever they want. We worship the phoren baby geniuses and moan about how our kids are really no good at all. Truth be told, we are largely responsible for the present state of affairs, and, at the risk of sounding nationalistic, I think our kids could do a lot more, if only we let them explore instead of stunting them in regimented engineering courses that are supposed to make them employable in completely unrelated fields, fields that just need cheap programmers to churn out code for software that operates and makes money overseas. What our kids need are shoulders to stand upon. We should let them see beyond what we already know so that they can do better. There is no point in hammering them into the ground and then letting them rise only to a lower level you are comfortable with, because then they don't threaten your position in the hierarchy.

We as students are made to go through a culture of shaming: the student is given incomplete information and made to work on the problem, and when they fail because of that incomplete information, the supervisor/senior/post-doc tells them what a bunch of miserable morons they are for not having made sure of all the details, how they are going to need all the help they can get because they are truly useless when it comes to scientific methodology, and how that methodology is attained only through endless hours of toil and sacrifice at the Altar of Science.

Here's looking at you, kids: the system sucks, don't get sucked into it. Rebel in your hearts and seek out what you would love to learn. The internet has broken down all barriers to knowledge except for the most specialized kind with commercial, legal or national-security interests. Explore as much as you can, because there is something in this world that might just excite that truly unique brain of yours and lead you to come up with something new. There is no shame in quitting something you can't really be bothered to be interested in, because, frankly, being mediocre at something when everyone around you is much better can damage your self-esteem, unless you are the kind of person who turns that into a personal challenge to learn something new. If you aren't that fired up, then quit; it would be better for everyone, including you. Find something new to love and do.

H2G2 Mark II - the multi-dimensional guide with a secret purpose


Why the title?

The Hitchhiker's Guide to the Galaxy is frequently almost nonsensical, and that is what makes it appealing. With linear thought rendered impossible, the reader jumps through hyperspace and the Infinite Improbability Drive to understand the Universe as seen by Douglas Adams, and emerges a little wiser, if not more confused.

An example that rings a bell in academia is the mice, who ask the all-knowing computer Deep Thought for the answer to life, the universe and everything, only to be told that it is 42. Having blown up taxpayer dollars, they realize that the answer makes no sense because they never framed a proper question. So they commission another experiment, a computer built to find the Ultimate Question, the answer to which is 42, because obviously "What do you get when you multiply 7 and 6?" lacks a ring to it. Sort of reminds you of labs that churn out data and worry about the analysis, and the questions/hypotheses that lead to the story, later. Written between 1979 and 1992, the books ring quite a bell in 2015.

This article introduces nothing you wouldn't already have realized if you are in the middle of your Ph.D. However, just as Mark II had a purpose, so does this piece, so read at your own risk.

Long Story:
Once upon a time, a faculty member told me that the format of the Ph.D had survived all the way from the Renaissance to the modern day. According to him, students start as apprentices, fulfilling the whims and fancies of their masters/supervisors. The grind, or tough love, or whatever he thought of it, was in fact necessary. Three years later, another faculty member told me the same thing, rephrased as "these kids need to go through the grind because that is the only way the proper work-culture can be established".

Work-culture here implies a hierarchical structure in which the kids are constantly in fear/awe of the faculty and are always corrected by them, and rarely the other way round. Questions are expected from the students, but only rhetorically; the student has the moral responsibility to present his work as the next big thing in the world, no matter how visibly mediocre it may be. A student cannot criticize his own work/methodology or the existing frameworks of scientific discovery. The unspoken pressure on the student is to show his work in a positive light and defend it to his death, no matter how he feels about it in his own head.


Nonsense. This is killing them inside: their self-confidence hits an all-time low, and they suffer from feelings of inadequacy and uselessness that can be debilitating. Being constantly pressured to show their work in a good light takes a toll on their scientific temper; it makes them develop confirmation bias, become religious about their theories and hypotheses, and argue about them irrationally. Doesn’t that defeat the whole point of doing a Ph.D in the first place?

What is missing here? Proponents of the current system argue that only through such a system do students eventually become good scientists, "eventually" being the key word here. This system also ensures that the hierarchy is not destroyed by the chaos of arguments between faculty and students. Why there are, or should be, arguments in the first place is a deeper question that needs to be addressed.

It is undeniable that everything comes with an expiration date, and that applies to knowledge as well. The dangerous people here are faculty who do not revise their basics: who can’t tell that a flat line along the x-axis signifies no correlation, or who tend to use a battery of statistical tests until the data give the answer they are looking for. The picture of Science they give their students is almost formulaic and unimaginative. The protocol is simple enough. Take two conditions, one control and one treated. Apply statistical tests (T-tests, other multivariate analyses) until some difference is found between the two, list the entities (genes, proteins or metabolites) that are differentially expressed, and find the pathways corresponding to them. In this list of pathways, find the ones that are enriched using a frequency-difference test such as the Chi-square or Fisher exact test. You can increase the samples and conditions to make it high-throughput, and then, looking at the gene ontology lists, try to guess at what is happening within the system.
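To make the enrichment step concrete, here is a minimal Python sketch of the one-sided Fisher exact test as it is commonly used for over-representation analysis, computed directly from the hypergeometric upper tail. The function name `enrichment_p` and the gene counts in the example are hypothetical, chosen only for illustration; real tools differ in how they define the background gene universe and in which multiple-testing correction they apply.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher exact test for over-representation (hypothetical helper).

    N: genes in the background universe
    K: background genes annotated to the pathway
    n: genes in the differentially expressed (hit) list
    k: hit-list genes annotated to the pathway
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n), i.e. the chance of
    seeing at least k pathway genes in a random hit list of size n.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    # Ways to draw i pathway genes and n - i other genes, summed over the upper tail.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Made-up counts: 20,000 genes, a 100-gene pathway, 500 DE genes, 10 in the pathway.
p = enrichment_p(k=10, n=500, K=100, N=20_000)
print(f"enrichment P-value: {p:.2e}")
```

With those made-up counts, only 500 × 100 / 20,000 = 2.5 pathway genes would be expected in the hit list by chance, so 10 hits gives a small P-value. Every pathway checked is another test, though, so with hundreds of pathways some multiple-testing correction (Benjamini-Hochberg, for example) is needed.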

They are too comfortable in their positions, and while that is not necessarily bad for them, it is bad for their students. I’m not going to claim that it is their moral responsibility to take care of the intellectual growth of their Ph.D students, but passively damaging their learning is entirely undesirable.

However, I believe that students can be given all the relevant information from the beginning, without being overloaded with minutiae, on the assumption that they are fairly systematic and sensible individuals. It isn’t unreasonable to believe that they can then execute the tasks considered the mainstay of scientific research, which, frankly, in the biological sciences are not arcane or inaccessible to a person with school-level knowledge of calculus and algebra, unlike some of the other sciences.

To presume a priori that they are dumb and need to be spoon-fed does them a disservice. You should probe and test them and see what they can do; if they can’t do it, there is no point force-feeding them. At this stage in life, if they can’t be interested in something they chose to do, it is probably healthier for them to find an alternative career that they would love and enjoy. Encourage such students to leave, because pushing them towards something they clearly don’t like is a waste of time and energy for both you and them. However, if you insist that they need to be molded into replicas of you, then you should go see a shrink about the control-freak, micro-managing condition you seem to be developing.

What do I think we should do about it?

Encourage chaos in discussions, and encourage students to question everything from the experimental design to the statistical tests and their violated assumptions. Experimental designs should be criticized by proposing alternative designs and comparing them to see which experiment will answer the question posed while minimizing confounding factors. When the merits and demerits of each method are discussed, an emergent thought process occurs, and there is a lot more absorption when students are forced to think something through to the end rather than receive it passively as an instruction.

Statistical tests have been the same for quite some time now and their limitations are known: for instance, the T-statistic becomes inflated when the variance is low, and a correlation coefficient should be accompanied by a P-value. What is important here is that students realize that the statistical test is not a means to a predetermined end. You don’t just acquire data from experiments and then sit down to apply statistical tests one by one until you get the answer you want. Rather (in another school of thought), the choice of statistical test more or less dictates the kind of experiment that should be performed and specifies the number of replicates, among other things.
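A minimal sketch of the low-variance problem, in Python using only the standard library: Welch's t-statistic computed for the same small difference in means, once with tightly clustered replicates and once with noisier ones. The numbers are invented. The optional `s0` term is an illustrative assumption that mimics the "fudge factor" SAM adds to the denominator so that genes with tiny variances do not dominate; limma's moderated t addresses the same issue differently, by shrinking the per-gene variances.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b, s0=0.0):
    """Welch's two-sample t-statistic.

    s0 is a SAM-style fudge factor added to the standard error (its value
    here is arbitrary); s0 = 0 gives the plain Welch statistic.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / (se + s0)


# Invented replicates: both pairs differ in mean by the same 0.03 units.
tight_ctrl, tight_trt = [1.00, 1.01, 1.02], [1.03, 1.04, 1.05]
noisy_ctrl, noisy_trt = [0.91, 1.01, 1.11], [0.94, 1.04, 1.14]

print(welch_t(tight_ctrl, tight_trt))           # large |t|: the tiny spread inflates it
print(welch_t(noisy_ctrl, noisy_trt))           # small |t| for the same shift
print(welch_t(tight_ctrl, tight_trt, s0=0.05))  # the fudge factor damps the inflation
```

The shift in the means is identical in both cases; only the spread changes, and |t| changes by roughly a factor of ten.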

For instance, suppose you wish to find an association between two variables, one of which you can vary (like the amount of a compound added) and one which you can observe (optical density). In this case, correlation is the way to test for an association, and regression will give you a model that allows you to predict the response of the system (output) for a given quantified input of the compound.
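The dose/optical-density case can be sketched in a few lines of Python: a Pearson correlation, a P-value to go with it, and a least-squares line for prediction. The readings are hypothetical. For the P-value I have swapped in a permutation test (shuffle the responses many times and see how often a correlation at least as strong turns up by chance) rather than the t-distribution formula most software reports, because it needs nothing beyond the standard library.

```python
import random
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient (undefined if either variable is constant)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / (sxx * syy) ** 0.5


def permutation_p(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation P-value for r: shuffling y breaks any real association."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # +1 counts the observed arrangement itself


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


# Hypothetical dose (arbitrary units) and optical density readings.
dose = [0, 1, 2, 3, 4, 5]
od = [0.10, 0.21, 0.29, 0.42, 0.50, 0.61]

r = pearson_r(dose, od)
p = permutation_p(dose, od)
slope, intercept = fit_line(dose, od)
print(f"r = {r:.3f}, permutation P = {p:.4f}")
print(f"predicted OD at dose 2.5: {intercept + slope * 2.5:.3f}")
```

The correlation answers "is there an association?"; the fitted slope and intercept answer "how much response per unit of input?", which is what you need for prediction.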

However, if you want to find out whether a parameter you are observing differs significantly between two groups, you go for something like the T-test. Because the T-test involves a variance term, you need replicate observations in each of the groups you are comparing.
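Here is a minimal Python sketch of the classic two-sample Student's T-test statistic with pooled variance, showing where the replicates come in: the variance of a group with a single observation is undefined, so the function refuses to run without at least two per group. The function name and the triplicate values are illustrative assumptions, not anything from a specific package.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t-statistic with pooled variance (illustrative helper).

    Returns (t, degrees_of_freedom). Each group needs at least two
    replicates; with a single observation its sample variance, and
    therefore t, is undefined.
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least 2 replicates")
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


t, df = student_t([1, 2, 3], [4, 5, 6])  # hypothetical triplicates
print(t, df)
```

For these triplicates t is about -3.67 with 4 degrees of freedom, beyond the two-sided 5% critical value of about 2.776 that a t-table gives for 4 degrees of freedom. Note that more replicates raise the degrees of freedom and lower that critical value, which is one way the test dictates the design.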

The above is a small example of how the choice of statistical test dictates the experiment that must be performed. To a small extent, this thought process prevents you from gathering data without systematic planning and then trying to find patterns, patterns that could exist in random data as well. The P-values are an additional checkpoint, but if you are trying to game the P-values as well, then we are talking about serious ethical issues.
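How easily patterns appear in random data can be checked with a small simulation, sketched here in Python under stated assumptions: both groups are drawn from the same normal distribution, so every "significant" result is a false positive. The group size (6 per group, hence 10 degrees of freedom) and the seed are arbitrary choices; 2.228 is the standard two-sided 5% critical value of Student's t at 10 degrees of freedom.

```python
import random
from math import sqrt
from statistics import fmean, variance

N_PER_GROUP = 6   # replicates per group, so df = 6 + 6 - 2 = 10
T_CRIT = 2.228    # two-sided 5% critical value of Student's t at df = 10


def student_t(a, b):
    """Pooled-variance two-sample t-statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(a) - fmean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def false_positive_rate(n_tests=2000, seed=42):
    """Fraction of t-tests called 'significant' when there is no real difference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        a = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
        b = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
        if abs(student_t(a, b)) > T_CRIT:
            hits += 1
    return hits / n_tests


rate = false_positive_rate()
print(f"{rate:.1%} of comparisons on pure noise look 'significant'")
```

Roughly 5% of the comparisons come out "significant", which is exactly what a 0.05 threshold promises. Run 20 independent tests on pure noise and the chance that at least one crosses the line is 1 - 0.95^20, about 64%, which is why trying test after test until something sticks is so dangerous.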

The standard classical hypothesis-testing format is designed to give you, the scientist, some leeway for making mistakes during the process of scientific discovery, a process that is itself a little labile and prone to error. But subverting the procedure just to find anything, any association or differential expression, is not a healthy career move.

Apart from all of this, learn to ask questions and formulate experiments to answer them. Questions do not come out of thin air; or rather, they do, but those that come out of thin air aren’t the cleverest of questions and most of the time have been worked to death by someone, somewhere in the world. To begin addressing questions of importance, one must first know everything that is known. Only once you know the limits of knowledge can you start asking questions that nobody else has asked before, and the journey to the answers to those questions will lead to the learning process that makes scientific research worthwhile.



However, it is also possible that reading too much can confuse you. When you read too much, you are overloaded with information and lose the ability to chew on it and digest it. So space out your reading and integrate it with your work. Don’t read at a stretch and then work at a stretch, because you could get stuck in a rut that way; use one to freshen up your perspective on the other. Write down the interesting things you read in your lab notebook, whether they are clever methods or new ways of doing statistical analysis.


Sometimes, knowing too much can also be paralyzing and leave you unable to work under the sheer weight of all that knowledge in your head. In that case, stop reading and start working. The hope here is that there is something truly unique about your perspective on the solution to your Ph.D problem that the rest of the world doesn’t share, and that is the fresh perspective your work needs. It’s unique because it is this conscious thing in you that has absorbed everything you ever read about the things you liked, your hobbies, your interests, the games you play, the puzzles you’ve solved, and your abilities in the physical sciences, music, craft or engineering. They all contribute to your unique perspective, which should advance the understanding of your Ph.D problem, if not help you solve it. Above all, have fun and appreciate the good things in life.