
Section 7.2 Probability Theory

Objectives
  1. Learn the basic framework for working with probabilities over finitely many outcomes.
  2. Recipe: calculate probabilities for events of interest.
  3. Bayes' Rule for conditional probability.
  4. Vocabulary: sample space, outcome, event, false positive/negative, independence, random variable, expectation.

Subsection 7.2.1 Introduction: Monty Hall Problem

Lots of problems in computer science involve probabilities:

  • Randomized (Monte Carlo) algorithms (for example in computer graphics)
  • Machine learning and statistical modeling
  • Cryptography and secure communication
  • Network reliability and performance analysis

Most of these can be modeled using only finite sample spaces, i.e., there are only finitely many possible outcomes. To get started, we will look at a simple problem that is often misunderstood: the Monty Hall problem.

Suppose you are on a game show and are given the choice of three doors. Behind one door is a car (electric, of course!); behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door that has a goat, say No. 3. He then says to you, “Do you want to pick door No. 2 instead?” What is the probability of getting the car if you stay and if you switch?

Intuitively, it seems like the probability of getting the car is 1/2 if you stay and 1/2 if you switch. But this is wrong! The probability of getting the car if you stay is 1/3 and the probability of getting the car if you switch is 2/3. The reason is that the host's action of opening a door gives you more information about the location of the car. We can model this problem using a tree diagram.

[Tree diagram: the first level branches on the location of the car (Door 1, 2, or 3, each with probability 1/3); the second level branches on which door the host opens (Door 2 or Door 3 with probability 1/2 each if the car is behind Door 1, and the single goat door with probability 1 otherwise). The four leaves have probabilities 1/3, 1/3, 1/6, and 1/6.]

The tree diagram shows all possible outcomes of the game. The first level of the tree shows the three possibilities for the car's location. The second level shows the host's action of opening a door; the probabilities of this action depend on the car's location, since the host will always open a door with a goat behind it. These probabilities are shown on the edges of the tree. There are four possible outcomes of the game, listed as the leaves of the tree. If you stay at door No. 1, the probability of getting the car is 1/3 (the two leaves where the car is behind door No. 1, with probability 1/6 each), while if you switch to the remaining closed door, the probability of getting the car is 2/3 (the two leaves where the car is behind door No. 2 or No. 3, with probability 1/3 each). This is the correct answer to the Monty Hall problem.
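To check the tree-diagram analysis empirically, here is a minimal Monte Carlo sketch in Python (the function and variable names, such as monty_hall_trial, are illustrative, not from the text). It simulates many games under the stay and switch strategies.

import random

def monty_hall_trial(switch):
    """Play one round; return True if the contestant wins the car."""
    doors = [1, 2, 3]
    car = random.choice(doors)       # door hiding the car
    pick = random.choice(doors)      # contestant's initial pick
    # The host opens a door that is neither the pick nor the car.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

trials = 100_000
stay = sum(monty_hall_trial(switch=False) for _ in range(trials)) / trials
swap = sum(monty_hall_trial(switch=True) for _ in range(trials)) / trials
print(f"stay:   {stay:.3f}")   # close to 1/3
print(f"switch: {swap:.3f}")   # close to 2/3

Running the simulation gives winning frequencies near 1/3 for staying and 2/3 for switching, matching the tree diagram.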

With this example in mind, we can now introduce the basic concepts of probability theory.

Subsection 7.2.2 Basic concepts

We will work with finite sample spaces, i.e., there are only finitely many possible outcomes. The sample space \(S\) is the (finite) set of all possible outcomes of an experiment. An event is a subset of the sample space. We get a probability space by assigning a probability to each outcome in the sample space, i.e., a function \(P \colon S \to [0,1]\) such that \(\sum_{s \in S} P(s) = 1\). The probability of an event \(A \subseteq S\) is given by \(P(A) = \sum_{s \in A} P(s)\). By assuming a total ordering on the sample space, we can also represent the probabilities as a vector \(p = (P(s_1), P(s_2), \dots, P(s_n))\).
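To make these definitions concrete, here is a minimal Python sketch (the dictionary P and the helper prob are illustrative names, not from the text): a finite probability space is just a map from outcomes to probabilities, and the probability of an event is a sum over its outcomes.

# Sample space for two fair coin flips, with P(s) for each outcome s.
P = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}
assert abs(sum(P.values()) - 1.0) < 1e-12   # probabilities must sum to 1

def prob(event, P):
    """P(A) = sum of P(s) over the outcomes s in the event A (a subset of the sample space)."""
    return sum(P[s] for s in event)

A = {"HH", "HT"}        # event: the first flip is heads
print(prob(A, P))       # 0.5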

Subsection 7.2.3 Conditional probability

In the Monty Hall problem, we saw the utility of conditional probabilities. Let us now define this more formally: the conditional probability of an event \(A\) given another event \(B\) is defined as \(P(A \mid B) = \frac{P(A \cap B)}{P(B)}\), provided \(P(B) > 0\). This is the probability of \(A\) given that \(B\) has occurred. Another way to think about this is that we are restricting our attention to the sample space \(B\), with the probability of \(B\) normalized to 1.

Note that the rule also makes sense if \(P(A) = 0\): then \(P(A \cap B) = 0\) as well, so the right-hand side is 0.
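Using the same two-coin-flip space as before, the definition translates directly into code (again a sketch with illustrative names):

# Two fair coin flips, as in the sketch above.
P = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}

def prob(event, P):
    return sum(P[s] for s in event)

def cond_prob(A, B, P):
    """P(A | B) = P(A ∩ B) / P(B), assuming P(B) > 0."""
    return prob(A & B, P) / prob(B, P)

A = {"HH", "HT"}            # first flip is heads
B = {"HH", "TH"}            # second flip is heads
print(cond_prob(A, B, P))   # 0.5: the second flip tells us nothing about the first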

Definition

Two events \(A\) and \(B\) are said to be independent if the occurrence of one does not affect the probability of the other. Formally, this means \(P(A \mid B) = P(A)\) or, equivalently, \(P(A \cap B) = P(A)\,P(B)\).
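The product-rule form of the definition is easy to check in code; here is a short self-contained sketch (the helper names are ours):

P = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}   # two fair coin flips, as above

def prob(event, P):
    return sum(P[s] for s in event)

def independent(A, B, P, tol=1e-12):
    """Return True if P(A ∩ B) = P(A) * P(B), up to floating-point tolerance."""
    return abs(prob(A & B, P) - prob(A, P) * prob(B, P)) < tol

print(independent({"HH", "HT"}, {"HH", "TH"}, P))   # True: the two flips are independent
print(independent({"HH", "HT"}, {"HH"}, P))         # False: these events are not independent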

Subsection 7.2.4 Random variables and expectation

A random variable (RV) is a function that assigns a real number to each outcome in the sample space. Formally, a random variable is a function \(X \colon S \to \mathbb{R}\) that maps outcomes in the sample space \(S\) to real numbers. Random variables are often denoted by uppercase letters (e.g., \(X\), \(Y\)) and their values by lowercase letters (e.g., \(x\), \(y\)).

Probability mass functions (PMFs) are fundamental tools for describing random variables (RVs). The PMF of a random variable assigns a probability to each possible value of the RV: \(P(X = x) = P(A_x)\), where \(A_x = \{s \in S \mid X(s) = x\}\) is the set of outcomes (i.e., the event) where \(X\) takes the value \(x\).

For example, the Bernoulli distribution models a single trial with two outcomes (success has value 1 with probability \(p\), and failure has value 0 with probability \(1-p\)). This has PMF \(P(X = x) = p^x (1-p)^{1-x}\) for \(x \in \{0, 1\}\). The binomial distribution generalizes this to the number of successes in \(n\) independent trials, with PMF

\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\]

for \(k \in \{0, 1, \dots, n\}\). The uniform distribution, on the other hand, assigns equal probability to each of the \(n\) values in the range of the random variable, with PMF \(P(X = x) = \frac{1}{n}\) for every \(x\) in that range.
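These PMFs are easy to write down in Python; the sketch below uses math.comb for the binomial coefficient (the function names are ours, chosen for illustration).

from math import comb

def bernoulli_pmf(x, p):
    """P(X = x) = p^x (1 - p)^(1 - x) for x in {0, 1}."""
    return p**x * (1 - p)**(1 - x)

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k) for k in {0, ..., n}."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(bernoulli_pmf(1, 0.3))                            # 0.3
print(binomial_pmf(2, 4, 0.5))                          # 0.375
print(sum(binomial_pmf(k, 4, 0.5) for k in range(5)))   # 1.0, as a PMF must sum to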

Definition

The expectation (or expected value) of a random variable \(X\), denoted \(E[X]\), is a measure of the “average” value of \(X\). For a discrete random variable on the sample space \(S\) with probability function \(P\), the expectation is defined as:

\[E[X] = \sum_{s \in S} X(s)\,P(s).\]

For example, if X is a random variable representing the outcome of rolling a fair six-sided die, then the expectation is:

\[E[X] = \sum_{i=1}^{6} i \cdot P(X = i) = \sum_{i=1}^{6} i \cdot \frac{1}{6} = \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = \frac{21}{6} = 3.5.\]

This means that the average outcome of rolling the die is 3.5.

It's often useful to rewrite the expectation in terms of the probability mass function (PMF) of the random variable. The PMF gives the probability of each possible value, and we can express the expectation as:

\[E[X] = \sum_{x} x \cdot P(X = x).\]
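For the die example, both formulas can be checked with a short sketch (exact fractions avoid rounding; the variable names are illustrative):

from fractions import Fraction

# Fair six-sided die: outcomes 1..6, each with probability 1/6; here X(s) = s.
P_die = {s: Fraction(1, 6) for s in range(1, 7)}

# Outcome by outcome: E[X] = sum over s of X(s) * P(s).
E_outcomes = sum(s * p for s, p in P_die.items())

# Via the PMF: E[X] = sum over x of x * P(X = x); here each value x
# corresponds to exactly one outcome, so the two sums coincide.
E_pmf = sum(x * P_die[x] for x in P_die)

print(E_outcomes, E_pmf, float(E_pmf))   # 7/2 7/2 3.5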
Definition

Two random variables \(X\) and \(Y\) are said to be independent if the joint probability distribution of \(X\) and \(Y\) factors into the product of their marginal distributions. Formally, this means: \(P(X = x, Y = y) = P(X = x)\,P(Y = y)\) for all values \(x\) and \(y\).

In other words, two random variables are independent if the events of them taking on specific values are independent.

There are two kinds of conditional expectation: \(E[Y \mid A]\) is the expected value of \(Y\) restricted to the probability space determined by an event \(A\) with \(P(A) > 0\), i.e.,

\[E[Y \mid A] = \frac{\sum_{s \in A} Y(s)\,P(s)}{P(A)} = \sum_{y} y\,P(Y = y \mid A),\]

while \(E[Y \mid X]\) is the expected value of \(Y\) given the value of \(X\). It is defined as:

\[E[Y \mid X] = \sum_{s \in S} Y(s)\,P(s \mid X) = \sum_{y} y\,P(Y = y \mid X).\]

It is a random variable since it depends on \(X\). More formally, we ought to write

\[E[Y \mid X = x] = \sum_{y} y\,P(Y = y \mid X = x).\]

Note this also means that if \(X\) and \(Y\) are independent, then this further simplifies to

\[E[Y \mid X = x] = \sum_{y} y\,P(Y = y \mid X = x) = \sum_{y} y\,P(Y = y) = E[Y].\]

Thus, when \(X\) and \(Y\) are independent, we have \(E[Y \mid X] = E[Y]\).
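As a concrete illustration (a sketch with names of our own choosing): roll a fair die, let Y be the value shown, and let X indicate whether the value is even. Then E[Y | X] is the random variable that equals E[Y | X = 1] = 4 when the roll is even and E[Y | X = 0] = 3 when it is odd.

from fractions import Fraction

P = {s: Fraction(1, 6) for s in range(1, 7)}   # fair six-sided die

def Y(s):
    """The value shown on the die."""
    return s

def cond_expectation(Y, A, P):
    """E[Y | A] = (sum over s in A of Y(s) * P(s)) / P(A), assuming P(A) > 0."""
    pA = sum(P[s] for s in A)
    return sum(Y(s) * P[s] for s in A) / pA

evens = {s for s in P if s % 2 == 0}   # the event X = 1 (even value)
odds = {s for s in P if s % 2 == 1}    # the event X = 0 (odd value)
print(cond_expectation(Y, evens, P))   # 4, i.e. E[Y | X = 1]
print(cond_expectation(Y, odds, P))    # 3, i.e. E[Y | X = 0]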