Humans are excellent at finding patterns in the world around them. This applies to cavemen understanding seasons and inventing agriculture, and today it applies to you and me, in our everyday experience of the world; but especially so to scientists whose job it is to find and unravel patterns in nature.
A typical scientific paper describes a concept (hypothesis) to be tested, how the data were collected and analysed, the findings, and a discussion. Somewhere between the data and the discussion lie the statistics (no pun intended). This tricky maths can ascend to a form of art, and descend into despicable deceit. We must accept that, even with the best of intentions, scientists suffer from human failings: hopefully not outright dishonesty, but more likely bias, which may be known or unknown to the individual. The scientific method provides for this by requiring scientific claims to be repeatable, so if you do not believe someone’s word, you can follow their method and verify it yourself.
One way in which scientists can succumb is to over-analyse their data. This is a paradox: surely, to find patterns in data, it must be subjected to the most detailed examination? The trouble is that our yearning for patterns means we find them where nothing real exists. When the scientific data are complicated and interrelated – for example, looking into race, diet and disease in a population – many links can be found. And if we compare enough results, chances are that we’ll find at least one connection: at the conventional 5% significance threshold, roughly one in every twenty tests of a non-existent effect will appear “significant” purely by chance. Statistical tools should tell us which findings are real, but we need to set very high thresholds to weed out the false ones.
Data are often hard-won, expensive and time-consuming to collect, and ever so precious to their owners. Imagine having followed a dozen HIV patients for ten years, taking blood samples every month. Would you and the consenting patients not wish the most to be made of this great contribution, and every possible test to be done on the blood? Unfortunately, the more hard-won the raw data, the more likely they are to be assaulted by an army of lab technicians and statisticians. The most common trick is to apply many tests to your data but report only the ones which show positive results. For example, you look at all the people in Oxford to see if their hair colour is related to their educational achievements. You look at men and women; children, adolescents and pensioners; foreign students; English, Scottish and Welsh subgroups; you move on to look at car owners, shopkeepers, left-handed people, and on and on, and lo! A clear relationship between hair colour and educational achievement is found in Chinese mathematics professors.
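This trick is easy to demonstrate with a short simulation. The sketch below (not from the original article; the numbers are illustrative assumptions) relies on a standard statistical fact: when there is genuinely no effect, a test’s p-value is uniformly distributed between 0 and 1. So a “study” that mines twenty subgroups where nothing real exists can be simulated with twenty random draws, counting how often at least one draw falls below the 5% threshold. It also shows one standard remedy, the Bonferroni correction, which divides the threshold by the number of tests.

```python
import random

random.seed(42)

ALPHA = 0.05       # conventional significance threshold
N_TESTS = 20       # subgroups mined from one data set
N_TRIALS = 10_000  # simulated studies, all with NO real effect

# Under a true null hypothesis, a p-value is uniform on [0, 1],
# so each test can be simulated by a single random draw.
def study_finds_something(n_tests, alpha):
    return any(random.random() < alpha for _ in range(n_tests))

# How often does a study with no real effect still "find" one?
naive_rate = sum(
    study_finds_something(N_TESTS, ALPHA) for _ in range(N_TRIALS)
) / N_TRIALS

# Bonferroni correction: demand p < alpha / n_tests for each test.
corrected_rate = sum(
    study_finds_something(N_TESTS, ALPHA / N_TESTS) for _ in range(N_TRIALS)
) / N_TRIALS

print(f"Naive:      {naive_rate:.2f}  (theory: {1 - 0.95**N_TESTS:.2f})")
print(f"Bonferroni: {corrected_rate:.2f}  (held near {ALPHA})")
```

With twenty tests, the chance of at least one spurious hit is 1 − 0.95²⁰ ≈ 64%: roughly two studies in three would report a “finding” from pure noise, while the corrected threshold keeps the rate near the intended 5%.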
To avoid this problem, all major trials of new medicines must now state in advance the hypotheses they will test, perverse though it may seem to deny oneself the opportunity to mine the data afterwards. However, other scientific research is not subject to the rules of clinical trials (chemistry, environmental studies, physics, biology, geography and engineering, to name a few). When life and health are at stake, we rightly demand the highest standards of evidence, so why not apply them to other scientific fields? Apply them to your own life: raise your standards and question everything you’re told.