Examples:

Suppose we consider three tosses
of a coin, associating a 1 to heads and a 0 to tails each time,
and call *X*_{i} the random variable that results from trial *i*.
Then we can consider the random vector (*X*_{1}, *X*_{2}, *X*_{3})
that describes the three tosses.
The state space for (*X*_{1}, *X*_{2}, *X*_{3}) is
{0,1}^{3} = {(0,0,0), (0,0,1), …, (1,1,1)},
and we can compute the probability distribution on
this space as products of the individual
coordinates' distributions because the
random variables *X*_{i} are independent.
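The product construction can be sketched in a few lines of Python; here we assume a fair coin (probability 1/2 of heads), so every triplet gets probability 1/8:

```python
from itertools import product

# Marginal distribution of a single toss, assuming a fair coin:
# 1 = heads, 0 = tails, each with probability 1/2.
marginal = {0: 0.5, 1: 0.5}

# Because the tosses are independent, the joint probability of a
# triplet (x1, x2, x3) is the product of the marginal probabilities.
joint = {
    (x1, x2, x3): marginal[x1] * marginal[x2] * marginal[x3]
    for x1, x2, x3 in product([0, 1], repeat=3)
}

print(len(joint))        # 8 points in the state space {0,1}^3
print(joint[(1, 0, 1)])  # 0.125, i.e. 1/8 for each triplet
```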

Here is an example based on the same
experiment, but the random variables
are different and are not
independent.
Let *Y*_{i} be the number of heads up to and
including the *i*th toss:
*Y*_{i} = *X*_{1} + *X*_{2} + … + *X*_{i}.
The sample space for (*Y*_{1}, *Y*_{2}, *Y*_{3})
is the same as that of (*X*_{1}, *X*_{2}, *X*_{3}),
the eight possible sequences of three tosses;
however, some triplets of values are impossible.
For instance,
*P*(*Y*_{2}=0 | *Y*_{1}=1) = 0.
The coordinate random
variables are not independent, and
we have to give the distribution of all the vectors
one by one because we cannot build them up from the marginals.
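Assuming again a fair coin, the distribution of (*Y*_{1}, *Y*_{2}, *Y*_{3}) can be tabulated by enumerating the eight equally likely toss sequences, which also exhibits the impossible triplets:

```python
from itertools import product
from collections import defaultdict

# Enumerate the 8 equally likely toss sequences (fair coin assumed)
# and record the resulting vector (Y1, Y2, Y3) of running head counts.
dist = defaultdict(float)
for x1, x2, x3 in product([0, 1], repeat=3):
    y = (x1, x1 + x2, x1 + x2 + x3)
    dist[y] += 1 / 8

# Only 8 triplets have positive probability; a triplet starting
# (1, 0, ...) never occurs, so P(Y2 = 0 | Y1 = 1) = 0.
print(sorted(dist))
print(dist.get((1, 0, 1), 0.0))  # 0.0, an impossible triplet
```

Each toss sequence produces a distinct triplet of running counts, so the 8 triplets each carry probability 1/8 and everything else carries probability 0.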

Example:

In the example on color blindness, suppose I consider the binary
random variables associated to color blindness and gender (associate
0 if male, 1 if female); these are called indicator variables.
We can tabulate the probabilities of all 4 possible pairs of outcomes
as:

From this table of the joint distribution we read:

In general, when we build the joint distribution of two random variables we can make such a two-way table; for more than two variables, of course, this is impossible.

Definition:

In the case of two
random variables *X* and *Y* we define
the joint probability mass function of *X* and
*Y* as:

*p*(*x*, *y*) = *P*(*X* = *x*, *Y* = *y*).

The row sums and column sums produce
the complete distribution functions for
the coordinate random variables; they are
called the
marginal probabilities.
Here, for instance, we have:

In general, given the joint distribution *P*(*x*, *y*) on the pairs (*x*, *y*)
for two random variables *X* and *Y*, we have
the marginal distributions

*P*_{X}(*x*) = Σ_{y} *P*(*x*, *y*)  and  *P*_{Y}(*y*) = Σ_{x} *P*(*x*, *y*).
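As a sketch, the row and column sums can be computed directly from a joint table; the numbers below are made up for illustration (they are not the color-blindness data):

```python
# Hypothetical joint distribution of two binary random variables X and Y;
# the probabilities are illustrative only.
joint = {
    (0, 0): 0.4, (0, 1): 0.1,
    (1, 0): 0.2, (1, 1): 0.3,
}

# Marginal of X: sum the joint over all values of y (row sums).
p_x = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (0, 1)}
# Marginal of Y: sum the joint over all values of x (column sums).
p_y = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (0, 1)}

print(p_x)  # {0: 0.5, 1: 0.5}
print(p_y)  # roughly {0: 0.6, 1: 0.4}, up to float rounding
```

Each marginal is itself a probability distribution: its values sum to 1.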