In the real world, very few things have absolute, fixed probabilities. Many of the aspects of the world we are familiar with are not truly random. Take, for instance, the probability of developing schizophrenia. Say that the prevalence of schizophrenia in a population is 1%. If we know nothing else about an individual, we would say that the probability of this individual developing schizophrenia is 0.01. In mathematical notation,
P(Sz) = 0.01
We know from empirical research, however, that certain people are more likely to develop schizophrenia than others. For example, having a schizophrenic first-degree relative greatly increases the risk of becoming schizophrenic. The probability above is essentially an average probability, taken across all individuals both with and without schizophrenic first-degree relatives.
The notion of conditional probability allows us to incorporate other potentially important variables, such as the presence of familial schizophrenia, into statements about the probability of an individual developing schizophrenia. Mathematically, we write
P(X | Y)
meaning the probability of X conditional on Y or given Y. In our example, we could write
P(Sz | first-degree relative has Sz)
and
P(Sz | first-degree relative does not have Sz)
Whether or not these two values differ is an indication of the influence of familial schizophrenia upon an individual's chances of developing schizophrenia.
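To make this concrete, here is a minimal Python sketch estimating the two conditional probabilities from a hypothetical sample. The counts are made up purely for illustration (chosen only so the overall prevalence works out to roughly 1%); they are not real epidemiological figures.

```python
# Hypothetical counts, for illustration only (not real epidemiological data).
n_with_relative_sz = 1_000        # people with a schizophrenic first-degree relative
n_with_relative_and_sz = 100      # of those, how many develop schizophrenia

n_without_relative_sz = 99_000    # people without such a relative
n_without_relative_and_sz = 890   # of those, how many develop schizophrenia

# A conditional probability is estimated as
#   (count of individuals with both features) / (count with the conditioning feature)
p_sz_given_relative = n_with_relative_and_sz / n_with_relative_sz            # 0.10
p_sz_given_no_relative = n_without_relative_and_sz / n_without_relative_sz   # ~0.009

# Overall prevalence across everyone: (100 + 890) / 100_000 ≈ 0.01, matching P(Sz) = 0.01.
print(p_sz_given_relative, p_sz_given_no_relative)
```

Here the two conditional probabilities differ by an order of magnitude, which is exactly the kind of comparison the notation above is designed to express.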
---
Previously, we mentioned that all probability statements depend, in one way or another, on a model. The probability of an outcome is conditional upon the parameter values of this model. In the case of the coin toss,
P(H | p = 0.5)
where H is the event of obtaining a head and p is the model parameter, set at 0.5.
Let's think a little more carefully about what the full model would be for tossing a coin, if p is the parameter. What do we know about coin tossing?
- Each toss has a discrete, binary outcome: it lands either heads or tails.
- We assume that the probability of either outcome does not change over time.
- We assume that the outcome of each toss of a coin can be regarded as independent from all other outcomes. That is, getting five heads in a row does not make it any more likely to get a tail on the next trial.
- In the case of a 'fair' coin, we assume a 50:50 chance of getting either heads or tails - that is, p = 0.5 (see the short simulation sketch after this list).
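As a quick illustration of these assumptions, the following Python sketch simulates independent tosses with a fixed probability p of heads. The function name and the values used are arbitrary choices for this example, not anything prescribed by the model itself.

```python
import random

def toss_coin(n, p=0.5):
    """Simulate n independent tosses of a coin with a constant P(heads) = p."""
    # Each toss ignores all previous tosses, and p never changes between tosses.
    return ["H" if random.random() < p else "T" for _ in range(n)]

tosses = toss_coin(9)
print(tosses, "-> number of heads:", tosses.count("H"))
```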
Say we toss a coin a number of times and record the number of times it lands on heads. The probability distribution that describes just this kind of scenario is called the binomial probability distribution. It is written as follows:

P(h heads | n, p) = [n! / (h!(n - h)!)] × p^h × (1 - p)^(n - h)
Let's take a moment to work through this. The notation is as follows:
- n = total number of coin tosses
- h = number of heads obtained
- p = probability of obtaining a head on any one toss
(The ! symbol denotes the factorial: 5! = 1 × 2 × 3 × 4 × 5 = 120.)
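A direct Python translation of this formula might look like the sketch below; the function name binomial_pmf is my own label for it, not something from the text.

```python
from math import factorial

def binomial_pmf(h, n, p):
    """Probability of exactly h heads in n tosses, with P(heads) = p on each toss."""
    n_permutations = factorial(n) // (factorial(h) * factorial(n - h))  # first part
    joint_probability = p**h * (1 - p)**(n - h)                         # second part
    return n_permutations * joint_probability

print(binomial_pmf(4, 9, 0.5))   # approximately 0.246
```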
We can think of this equation in two parts. The second part involves the joint probability of obtaining h heads (and therefore n - h tails) if a coin is tossed n times and has probability p of landing heads on any one toss (and therefore probability 1 - p of landing tails). Because we have assumed that each of the n trials is independent and has constant probability, the joint probability of obtaining h heads and n - h tails is simply the product of all the individual probabilities. Imagine we obtained 4 heads and 5 tails in 9 coin tosses. Then

p^4 × (1 - p)^5

is simply convenient notation for

p × p × p × p × (1 - p) × (1 - p) × (1 - p) × (1 - p) × (1 - p)
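For a fair coin (p = 0.5), this product is easy to check in Python:

```python
p = 0.5
# Probability of one *specific* ordered sequence containing 4 heads and 5 tails.
joint = p**4 * (1 - p)**5
print(joint)   # 0.001953125, i.e. 1/512
```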
The first half of the binomial distribution function is concerned with the fact that there is more than one way to get, say, 4 heads and 5 tails if a coin is tossed 9 times. We might observe
H, T, H, H, T, T, H, T, T.
or
T, H, H, T, H, T, T, H, T.
or even
H, H, H, H, T, T, T, T, T.
Every one of these permutations is assumed to have equal probability of occurring - the coefficient

n! / (h!(n - h)!) = 9! / (4! × 5!) = 126

represents the total number of permutations that would give 4 heads and 5 tails.
So, the probability of obtaining 4 heads and 5 tails for a fair coin is

P(4 heads | n = 9, p = 0.5) = 126 × 0.5^4 × 0.5^5 = 126/512 ≈ 0.246
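This arithmetic can be checked in a couple of lines of Python (math.comb returns the binomial coefficient, here the 126 permutations):

```python
from math import comb

probability = comb(9, 4) * 0.5**4 * 0.5**5
print(probability)   # 0.24609375
```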