Suppose we consider three tosses of a coin, associating a 1 to heads and a 0 to tails each time, and call Xi the random variable that records trial i. We can then consider the random vector (X1, X2, X3) that describes the three tosses. The state space for (X1, X2, X3) is {0,1}^3, and we can compute the probability distribution on this space as products of the individual coordinates' distributions, because the random variables Xi are independent.
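The product construction above can be sketched in a few lines of Python; a fair coin is assumed here, and all variable names are illustrative:

```python
from itertools import product

# Three independent coin tosses: Xi = 1 for heads, 0 for tails.
# A fair coin is assumed: P(Xi = 1) = P(Xi = 0) = 1/2.
p_marginal = {0: 0.5, 1: 0.5}

# Because the Xi are independent, the joint probability of a triple
# (x1, x2, x3) is the product of the three marginal probabilities.
joint = {}
for triple in product([0, 1], repeat=3):
    prob = 1.0
    for x in triple:
        prob *= p_marginal[x]
    joint[triple] = prob

print(len(joint))           # 8 points in the state space {0,1}^3
print(joint[(1, 0, 1)])     # each triple has probability 1/8 = 0.125
print(sum(joint.values()))  # probabilities sum to 1
```

For a fair coin every one of the 8 triples gets the same probability 1/8; with a biased coin the same product rule would give unequal probabilities.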
Here is an example based on the same experiment, but with random variables that are different and not independent. Let Yi be the number of heads up to and including the ith toss: Yi = X1 + ... + Xi. The underlying sample space for (Y1, Y2, Y3) is the same as that of (X1, X2, X3), namely the eight possible toss sequences; however, some triplets of values are impossible. For instance P(Y2=0|Y1=1)=0. The coordinate random variables are not independent, and we have to give the distribution of the vectors one by one, because we cannot build it up from the marginals.
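Enumerating the eight toss sequences makes both claims concrete: only certain triplets (y1, y2, y3) occur, and the coordinates fail the independence check. A minimal sketch, again assuming a fair coin:

```python
from itertools import product

# Enumerate the 8 equally likely toss sequences and record the running
# head counts Yi = X1 + ... + Xi (fair coin assumed, each sequence has
# probability 1/8).
joint_Y = {}
for tosses in product([0, 1], repeat=3):
    y = (tosses[0], tosses[0] + tosses[1], sum(tosses))
    joint_Y[y] = joint_Y.get(y, 0.0) + 1.0 / 8.0

# Triplets like (1, 0, 0) never occur: the head count cannot decrease.
print((1, 0, 0) in joint_Y)  # False, i.e. P(Y1=1, Y2=0) = 0

# Independence would require P(Y1=1, Y2=0) = P(Y1=1) * P(Y2=0).
p_y1_1 = sum(p for (a, b, c), p in joint_Y.items() if a == 1)  # 0.5
p_y2_0 = sum(p for (a, b, c), p in joint_Y.items() if b == 0)  # 0.25
print(p_y1_1 * p_y2_0)  # 0.125, not 0 -- so Y1 and Y2 are not independent
```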
In the example on color blindness, suppose we consider the binary random variables associated to color blindness and gender (associating 0 to male and 1 to female); such 0/1 variables are called indicator variables. We can tabulate the probabilities of all 4 possible pairs of outcomes in a two-way table, and from this table of the joint distribution we can read off the probability of each (gender, color blindness) pair.
In general, when we build the joint distribution of two random variables we can make such a two-way table; for more than two variables a flat table is no longer possible, since we would need a higher-dimensional array.
In the case of two random variables X and Y, we define the joint probability mass function of X and Y as P(x,y) = P(X = x, Y = y).
The row sums and column sums of the table produce the complete distributions of the coordinate random variables. In general, given the joint distribution P(x,y) on the pairs (x,y) for two random variables X and Y, we have the marginal distributions

P_X(x) = Σ_y P(x,y)   and   P_Y(y) = Σ_x P(x,y).
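The row-sum and column-sum recipe can be sketched directly; the 2x2 joint table below is made up for illustration and is not the color-blindness table from the text:

```python
# A hypothetical joint distribution for two binary variables
# X (rows) and Y (columns); the numbers are illustrative only.
joint = {
    (0, 0): 0.40, (0, 1): 0.10,
    (1, 0): 0.20, (1, 1): 0.30,
}

# Marginal P_X(x) = sum over y of P(x, y)  (row sums)
p_X = {x: sum(p for (a, b), p in joint.items() if a == x) for x in (0, 1)}
# Marginal P_Y(y) = sum over x of P(x, y)  (column sums)
p_Y = {y: sum(p for (a, b), p in joint.items() if b == y) for y in (0, 1)}

print(p_X)  # row sums: 0.40 + 0.10 and 0.20 + 0.30
print(p_Y)  # column sums: 0.40 + 0.20 and 0.10 + 0.30
```

Note that the marginals alone do not determine the joint table: many different joint distributions share these same row and column sums, which is why dependent variables must be specified jointly.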