Topic 1: Statistical analysis

Assessment statement | Obj | Teacher’s notes
1.1.1 State that error bars are a graphical representation of the variability of data. | 1 | Error bars can be used to show either the range of the data or the standard deviation.

Further reading: “Error bars in experimental biology” by Geoff Cumming, Fiona Fidler, and David L. Vaux (Journal of Cell Biology).
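As a rough illustration of the two options named in 1.1.1, the sketch below works out where the ends of an error bar would fall when it shows the range and when it shows one standard deviation either side of the mean. The measurements are hypothetical values chosen for easy arithmetic, and the variable names are illustrative only.

```python
from statistics import mean, stdev

# Hypothetical measurements (e.g. leaf lengths in mm); not taken from the notes.
measurements = [10, 12, 14, 16, 18]

centre = mean(measurements)

# Option 1: the error bar spans the full range of the data.
range_bar = (min(measurements), max(measurements))

# Option 2: the error bar spans one sample standard deviation
# either side of the mean.
sd = stdev(measurements)
sd_bar = (centre - sd, centre + sd)

print(f"mean = {centre}")
print(f"range error bar: {range_bar[0]} to {range_bar[1]}")
print(f"SD error bar: {sd_bar[0]:.2f} to {sd_bar[1]:.2f}")
```

Whichever option is used, the figure legend should say which one the bars show, since the two can differ considerably for the same data.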

Assessment statement | Obj | Teacher’s notes
1.1.2 Calculate the mean and standard deviation of a set of values. | 2 | Students should specify the sample standard deviation (s), not the population standard deviation. Students will not be expected to know the formulas for calculating these statistics. They will be expected to use the standard deviation function of a graphic display calculator or scientific calculator. Aim 7: Students could also be taught how to calculate standard deviation using a spreadsheet computer program.

Calculate mean and Standard Deviation with Excel here
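For readers who would rather check a calculator or spreadsheet result in code, here is a minimal sketch using Python's standard `statistics` module. The data values are hypothetical. `stdev` gives the sample standard deviation (s) that 1.1.2 asks for, while `pstdev` gives the population standard deviation; in Excel the corresponding functions are `STDEV.S` and `STDEV.P`.

```python
from statistics import mean, pstdev, stdev

# Hypothetical data set; any list of numbers works.
values = [4, 8, 6, 5, 7]

average = mean(values)

# Sample standard deviation (s): divides by n - 1.
s = stdev(values)

# Population standard deviation (sigma): divides by n.
sigma = pstdev(values)

print(f"mean = {average}")
print(f"sample SD (s) = {s:.3f}")
print(f"population SD (sigma) = {sigma:.3f}")
```

The sample value is always slightly larger than the population value for the same data, and the gap shrinks as the number of measurements grows.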

Assessment statement | Obj | Teacher’s notes
1.1.3 State that the term standard deviation is used to summarize the spread of values around the mean, and that 68% of the values fall within one standard deviation of the mean. | 1 | For normally distributed data, about 68% of all values lie within ±1 standard deviation (s or σ) of the mean. This rises to about 95% for ±2 standard deviations.
1.1.4 Explain how the standard deviation is useful for comparing the means and the spread of data between two or more samples. | 3 | A small standard deviation indicates that the data is clustered closely around the mean value. Conversely, a large standard deviation indicates a wider spread around the mean.
Standard Deviation

In statistics and probability theory, standard deviation (represented by the symbol σ) shows how much variation or “dispersion” exists from the average (mean, or expected value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas a high standard deviation indicates that the data points are spread out over a large range of values. (From Wikipedia.)
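The 68% and 95% figures in 1.1.3 and the comparison described in 1.1.4 can both be checked with a short sketch. It assumes a perfectly normal distribution for the first part and uses two hypothetical samples with the same mean for the second; neither data set comes from the notes.

```python
from statistics import NormalDist, mean, stdev

# Part 1: fraction of a normal distribution within 1 and 2 SD of the mean.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)
print(f"within 1 SD: {within_1_sd:.1%}")
print(f"within 2 SD: {within_2_sd:.1%}")

# Part 2: two hypothetical samples with the same mean but different spread.
clustered = [49, 50, 50, 51, 50]
spread_out = [40, 45, 50, 55, 60]

for name, sample in (("clustered", clustered), ("spread out", spread_out)):
    print(f"{name}: mean = {mean(sample)}, SD = {stdev(sample):.2f}")
```

Both samples have a mean of 50, so the mean alone cannot tell them apart; the standard deviation is what shows that the second sample is far more variable.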

Assessment statement | Obj | Teacher’s notes
1.1.5 Deduce the significance of the difference between two sets of data using calculated values for t and the appropriate tables. | 3 | For the t-test to be applied, the data must have a normal distribution and a sample size of at least 10. The t-test can be used to compare two sets of data and measure the amount of overlap. Students will not be expected to calculate values of t. Only a two-tailed, unpaired t-test is expected. Aim 7: While students are not expected to calculate a value for the t-test, students could be shown how to calculate such values using a spreadsheet program or the graphic display calculator. TOK: The scientific community defines an objective standard by which claims about data can be made.

Are two sets of data really different?

Click here to perform Student’s t-test

Click here to perform Student’s t-test via copy and paste

If we have two collections of maple leaves (i.e., two samples), it is quite likely that the collections differ in detail: different highs, lows, and average leaf sizes. Is the measured difference in average leaf size large enough that we should reject the null hypothesis that such differences are due to “chance”? Given the above sort of information on the likely range for the actual mean of each sample, the question basically reduces to whether the likely ranges overlap (in which case the means could be the same, lying in the overlap of the intervals, and we may not reject the null hypothesis) or do not overlap (in which case we reject the null hypothesis: the difference is most likely not due to chance). To cover the variety of possible outcomes, from means not “significantly” different to means “significantly” different, the probability P of seeing a difference at least this large by chance alone is reported. Reject the null hypothesis if P is “small” (conventionally, below 0.05). from here

or read this from the Handbook of Biological Statistics or some worked examples here.
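Although students are not expected to calculate t by hand, the following sketch shows roughly what a spreadsheet or calculator does for a two-tailed, unpaired t-test. It assumes the equal-variance (pooled) form of Student's t-test, which may differ from the method used by the tools linked above. The leaf measurements and the function name `unpaired_t` are hypothetical; the critical value 2.101 is the standard table entry for 18 degrees of freedom at P = 0.05 (two-tailed).

```python
from math import sqrt
from statistics import mean, variance

def unpaired_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for a pooled-variance unpaired t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_error, df

# Hypothetical leaf lengths (mm) from two trees, 10 leaves each.
tree_a = [52, 48, 50, 51, 49, 53, 47, 50, 52, 48]
tree_b = [58, 62, 60, 61, 59, 63, 57, 60, 62, 58]

t_value, df = unpaired_t(tree_a, tree_b)

# Critical value from a t-table: df = 18, P = 0.05, two-tailed.
CRITICAL_T = 2.101
significant = abs(t_value) > CRITICAL_T

print(f"t = {t_value:.2f}, df = {df}")
print("Reject the null hypothesis" if significant else "Do not reject the null hypothesis")
```

Because the test is two-tailed, only the size of t matters, not its sign. If the calculated value exceeds the table value, the difference between the means is significant at the 5% level.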

Assessment statement | Obj | Teacher’s notes
1.1.6 Explain that the existence of a correlation does not establish that there is a causal relationship between two variables. | 3 | Aim 7: While calculations of such values are not expected, students who want to use r and r² values in their practical work could be shown how to determine such values using a spreadsheet program.

What is the difference between causation and correlation?

One of the most common errors we find in the press is the confusion between correlation and causation in scientific and health-related studies. In theory, these are easy to distinguish — an action or occurrence can cause another (such as smoking causes lung cancer), or it can correlate with another (such as smoking is correlated with alcoholism). If one action causes another, then they are most certainly correlated. But just because two things occur together does not mean that one caused the other, even if it seems to make sense.

Unfortunately, our intuition can lead us astray when it comes to distinguishing between causality and correlation. For example, eating breakfast has long been correlated with success in school for elementary school children. It would be easy to conclude that eating breakfast causes students to be better learners. It turns out, however, that those who don’t eat breakfast are also more likely to be absent or tardy — and it is absenteeism that is playing a significant role in their poor performance. When researchers retested the breakfast theory, they found that, independent of other factors, breakfast only helps undernourished children perform better.

Many studies are designed to test a correlation but are suggestive of “reasons” for the correlation. People learn of a study showing that “girls who watch soap operas are more likely to have eating disorders” (a correlation between soap opera watching and eating disorders), but then they incorrectly conclude that watching soap operas gives girls eating disorders.

In general, it is extremely difficult to establish causality between two correlated events or observations. In contrast, there are many statistical tools to establish a statistically significant correlation. Read more here

or read an article about Cause and correlation by wisegeek here
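For students who want r and r² values in their practical work, a minimal sketch of the Pearson correlation coefficient follows. The paired data and the function name `pearson_r` are hypothetical. Note that the calculation only measures how closely two variables vary together; nothing in it can show that one causes the other.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    mean_x, mean_y = mean(xs), mean(ys)
    # Sum of products of deviations from each mean.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)

# Hypothetical paired measurements (e.g. light intensity vs. growth).
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

r = pearson_r(x_values, y_values)
r_squared = r ** 2

print(f"r = {r:.3f}")
print(f"r squared = {r_squared:.3f}")
```

An r close to +1 or -1 indicates a strong linear correlation and an r close to 0 a weak one; r² is often read as the proportion of variation in one variable that is accounted for by the other.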

Additional Resources

Hart online IB Biology Topic Notes and links

Choosing a statistical test: Merlin software.

Excel charts that you can draw by using Merlin free software

Merlin is our statistics software for biology students. We use it to make A-level biology statistics as easy as possible. There are no calculations and no look-up tables, so students can concentrate on choosing the right statistical test and drawing the correct conclusion. By using Merlin, students also improve their ICT skills and learn to use Excel for drawing charts and presenting data.

Merlin is an add-in for Microsoft Excel, so it is easy to use.

  • Merlin is completely free of charge. Copy it to as many computers as you like.
  • Merlin adds over 30 new functions and 6 charts to Excel’s existing capabilities.
  • Merlin’s functions can be typed into a cell like any other function, or inserted using Excel’s function wizard.
  • The statistical functions return P-values directly from the raw data in one simple step. No need for look-up tables, intermediate test statistics or degrees of freedom.
  • All Merlin’s functions and graphs are dynamic, which means that the output is updated automatically whenever the source cells are changed.
  • Merlin includes detailed help for all the functions and graphs. Just go to the Help menu.
  • An example Excel spreadsheet demonstrates the use of each function and graph.
  • Merlin also includes a basic introduction to statistics for biology students. This includes examples of the main tests and problems for students to solve. The introduction is aimed at A-level students, but may also be useful to undergraduates and teachers, as well as students studying social sciences or geography.

Hans Rosling: Let my dataset change your mindset TEDtalk

Flash card Questions about Statistical Analysis here.
