
Section 7.2 Probability Theory

Objectives
  1. Learn the basic framework for working with probabilities over finitely many outcomes.
  2. Recipe: calculate probabilities for events of interest.
  3. Bayes' Rule for conditional probability.
  4. Vocabulary: sample space, outcome, event, false positive/negative, independence, random variable, expectation.

Subsection 7.2.1 Introduction: Monty Hall Problem

Lots of problems in computer science involve probabilities:

  • Randomized (Monte Carlo) algorithms (for example in computer graphics)
  • Machine learning and statistical modeling
  • Cryptography and secure communication
  • Network reliability and performance analysis

Most of these can be modeled using only finite sample spaces, i.e., there are only finitely many possible outcomes. To get started, we will look at a simple problem that is often misunderstood: the Monty Hall problem.

Suppose you are on a game show and are given the choice of three doors. Behind one door is a car (electric, of course!); behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door that has a goat, say No. 3. He then says to you, “Do you want to pick door No. 2 instead?” What is the probability of getting the car if you stay and if you switch?

Intuitively, it seems like the probability of getting the car is 1/2 if you stay and 1/2 if you switch. But this is wrong! The probability of getting the car if you stay is 1/3 and the probability of getting the car if you switch is 2/3. The reason is that the host's action of opening a door gives you more information about the location of the car. We can model this problem using a tree diagram.

[Tree diagram: the first level branches on the location of the car (Door 1, 2, or 3, each with probability 1/3); the second level branches on which door the host opens (Door 2 or Door 3 with probability 1/2 each if the car is behind Door 1, and the single goat door with probability 1 otherwise). The four leaves have probabilities 1/3, 1/3, 1/6, and 1/6.]

The tree diagram shows all possible outcomes of the game. The first level of the tree shows the three possibilities for the car's location. The second level shows the host's action of opening a door; the probabilities of this action depend on the car's location, since the host will always open a door with a goat behind it. These probabilities are shown on the edges of the tree. There are four possible outcomes of the game, listed as the leaves of the tree. If you stay at door No. 1, the probability of getting the car is 1/3 (the two leaves where the car is behind door No. 1, with probability 1/6 each), while if you switch to the remaining closed door, the probability of getting the car is 2/3 (the two leaves where the car is behind door No. 2 or No. 3, with probability 1/3 each). This is the correct answer to the Monty Hall problem.
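To check the tree-diagram analysis empirically, here is a minimal Monte Carlo sketch in Python (the function and variable names, such as monty_hall_trial, are illustrative, not from the text). It simulates many games under the stay and switch strategies.

import random

def monty_hall_trial(switch):
    """Play one round; return True if the contestant wins the car."""
    doors = [1, 2, 3]
    car = random.choice(doors)       # door hiding the car
    pick = random.choice(doors)      # contestant's initial pick
    # The host opens a door that is neither the pick nor the car.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

trials = 100_000
stay = sum(monty_hall_trial(switch=False) for _ in range(trials)) / trials
swap = sum(monty_hall_trial(switch=True) for _ in range(trials)) / trials
print(f"stay:   {stay:.3f}")   # close to 1/3
print(f"switch: {swap:.3f}")   # close to 2/3

Running the simulation gives winning frequencies near 1/3 for staying and 2/3 for switching, matching the tree diagram.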

With this example in mind, we can now introduce the basic concepts of probability theory.

Subsection 7.2.2 Basic concepts

We will work with finite sample spaces, i.e., there are only finitely many possible outcomes. The sample space \(S\) is the (finite) set of all possible outcomes of an experiment. An event is a subset of the sample space. We get a probability space by assigning a probability to each outcome in the sample space, i.e., a function \(P \colon S \to [0,1]\) such that \(\sum_{s \in S} P(s) = 1\). The probability of an event \(A \subseteq S\) is given by \(P(A) = \sum_{s \in A} P(s)\). By assuming a total ordering on the sample space, we can also represent the probabilities as a vector \(p = (P(s_1), P(s_2), \dots, P(s_n))\).
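To make these definitions concrete, here is a minimal Python sketch (the dictionary P and the helper prob are illustrative names, not from the text): a finite probability space is just a map from outcomes to probabilities, and the probability of an event is a sum over its outcomes.

# Sample space for two fair coin flips, with P(s) for each outcome s.
P = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}
assert abs(sum(P.values()) - 1.0) < 1e-12   # probabilities must sum to 1

def prob(event, P):
    """P(A) = sum of P(s) over the outcomes s in the event A (a subset of the sample space)."""
    return sum(P[s] for s in event)

A = {"HH", "HT"}        # event: the first flip is heads
print(prob(A, P))       # 0.5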

Subsection 7.2.3 Conditional probability

In the Monty Hall problem, we saw the utility of conditional probabilities. Let us now define this more formally: the conditional probability of an event \(A\) given another event \(B\) is defined as \(P(A \mid B) = \frac{P(A \cap B)}{P(B)}\), provided \(P(B) > 0\). This is the probability of \(A\) given that \(B\) has occurred. Another way to think about this is that we are restricting our attention to the sample space \(B\), with the probability of \(B\) normalized to 1.

Note that the rule also makes sense if \(P(A) = 0\): then \(P(A \cap B) = 0\) as well, so the right-hand side is 0.
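Using the same two-coin-flip space as before, the definition translates directly into code (again a sketch with illustrative names):

# Two fair coin flips, as in the sketch above.
P = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}

def prob(event, P):
    return sum(P[s] for s in event)

def cond_prob(A, B, P):
    """P(A | B) = P(A ∩ B) / P(B), assuming P(B) > 0."""
    return prob(A & B, P) / prob(B, P)

A = {"HH", "HT"}            # first flip is heads
B = {"HH", "TH"}            # second flip is heads
print(cond_prob(A, B, P))   # 0.5: the second flip tells us nothing about the first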

Definition

Two events \(A\) and \(B\) are said to be independent if the occurrence of one does not affect the probability of the other. Formally, this means \(P(A \mid B) = P(A)\) or, equivalently, \(P(A \cap B) = P(A)\,P(B)\).
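The product-rule form of the definition is easy to check in code; here is a short self-contained sketch (the helper names are ours):

P = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}   # two fair coin flips, as above

def prob(event, P):
    return sum(P[s] for s in event)

def independent(A, B, P, tol=1e-12):
    """Return True if P(A ∩ B) = P(A) * P(B), up to floating-point tolerance."""
    return abs(prob(A & B, P) - prob(A, P) * prob(B, P)) < tol

print(independent({"HH", "HT"}, {"HH", "TH"}, P))   # True: the two flips are independent
print(independent({"HH", "HT"}, {"HH"}, P))         # False: these events are not independent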

Subsection 7.2.4 Random variables and expectation

A random variable (RV) is a function that assigns a real number to each outcome in the sample space. Formally, a random variable is a function \(X \colon S \to \mathbb{R}\) that maps outcomes in the sample space \(S\) to real numbers. Random variables are often denoted by uppercase letters (e.g., \(X\), \(Y\)) and their values by lowercase letters (e.g., \(x\), \(y\)).

Probability mass functions (PMFs) are fundamental tools for describing random variables (RVs). The PMF of a random variable assigns a probability to each possible value of the RV: \(P(X = x) = P(A_x)\), where \(A_x = \{s \in S \mid X(s) = x\}\) is the set of outcomes (i.e., the event) where \(X\) takes the value \(x\).

For example, the Bernoulli distribution models a single trial with two outcomes (success has value 1 with probability \(p\), and failure has value 0 with probability \(1-p\)). This has PMF \(P(X = x) = p^x (1-p)^{1-x}\) for \(x \in \{0, 1\}\). The binomial distribution generalizes this to the number of successes in \(n\) independent trials, with PMF

\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\]

for \(k \in \{0, 1, \dots, n\}\). The uniform distribution, on the other hand, assigns equal probability to each of the \(n\) values in the range of the random variable, with PMF \(P(X = x) = \frac{1}{n}\) for every \(x\) in that range.
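These PMFs are easy to write down in Python; the sketch below uses math.comb for the binomial coefficient (the function names are ours, chosen for illustration).

from math import comb

def bernoulli_pmf(x, p):
    """P(X = x) = p^x (1 - p)^(1 - x) for x in {0, 1}."""
    return p**x * (1 - p)**(1 - x)

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k) for k in {0, ..., n}."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(bernoulli_pmf(1, 0.3))                            # 0.3
print(binomial_pmf(2, 4, 0.5))                          # 0.375
print(sum(binomial_pmf(k, 4, 0.5) for k in range(5)))   # 1.0, as a PMF must sum to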

Definition

The expectation (or expected value) of a random variable \(X\), denoted \(E[X]\), is a measure of the “average” value of \(X\). For a discrete random variable on the sample space \(S\) with probability function \(P\), the expectation is defined as:

\[E[X] = \sum_{s \in S} X(s)\,P(s).\]

For example, if X is a random variable representing the outcome of rolling a fair six-sided die, then the expectation is:

\[E[X] = \sum_{i=1}^{6} i \cdot P(X = i) = \sum_{i=1}^{6} i \cdot \frac{1}{6} = \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = \frac{21}{6} = 3.5.\]

This means that the average outcome of rolling the die is 3.5.

It's often useful to rewrite the expectation in terms of the probability mass function (PMF) of the random variable. The PMF gives the probability of each possible value, and we can express the expectation as:

\[E[X] = \sum_{x} x \cdot P(X = x).\]
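For the die example, both formulas can be checked with a short sketch (exact fractions avoid rounding; the variable names are illustrative):

from fractions import Fraction

# Fair six-sided die: outcomes 1..6, each with probability 1/6; here X(s) = s.
P_die = {s: Fraction(1, 6) for s in range(1, 7)}

# Outcome by outcome: E[X] = sum over s of X(s) * P(s).
E_outcomes = sum(s * p for s, p in P_die.items())

# Via the PMF: E[X] = sum over x of x * P(X = x); here each value x
# corresponds to exactly one outcome, so the two sums coincide.
E_pmf = sum(x * P_die[x] for x in P_die)

print(E_outcomes, E_pmf, float(E_pmf))   # 7/2 7/2 3.5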
Definition

Two random variables \(X\) and \(Y\) are said to be independent if the joint probability distribution of \(X\) and \(Y\) factors into the product of their marginal distributions. Formally, this means: \(P(X = x, Y = y) = P(X = x)\,P(Y = y)\) for all values \(x\) and \(y\).

In other words, two random variables are independent if the events of them taking on specific values are independent.

There are two kinds of conditional expectation: \(E[Y \mid A]\) is the expected value of \(Y\) restricted to the probability space determined by an event \(A\) with \(P(A) > 0\), i.e.,

\[E[Y \mid A] = \frac{\sum_{s \in A} Y(s)\,P(s)}{P(A)} = \sum_{y} y\,P(Y = y \mid A),\]

while \(E[Y \mid X]\) is the expected value of \(Y\) given the value of \(X\). It is defined as:

\[E[Y \mid X] = \sum_{s \in S} Y(s)\,P(s \mid X) = \sum_{y} y\,P(Y = y \mid X).\]

It is a random variable since it depends on \(X\). More formally, we ought to write

\[E[Y \mid X = x] = \sum_{y} y\,P(Y = y \mid X = x).\]

Note this also means that if \(X\) and \(Y\) are independent, then this further simplifies to

\[E[Y \mid X = x] = \sum_{y} y\,P(Y = y \mid X = x) = \sum_{y} y\,P(Y = y) = E[Y].\]

Thus, when \(X\) and \(Y\) are independent, we have \(E[Y \mid X] = E[Y]\).
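As a concrete illustration (a sketch with names of our own choosing): roll a fair die, let Y be the value shown, and let X indicate whether the value is even. Then E[Y | X] is the random variable that equals E[Y | X = 1] = 4 when the roll is even and E[Y | X = 0] = 3 when it is odd.

from fractions import Fraction

P = {s: Fraction(1, 6) for s in range(1, 7)}   # fair six-sided die

def Y(s):
    """The value shown on the die."""
    return s

def cond_expectation(Y, A, P):
    """E[Y | A] = (sum over s in A of Y(s) * P(s)) / P(A), assuming P(A) > 0."""
    pA = sum(P[s] for s in A)
    return sum(Y(s) * P[s] for s in A) / pA

evens = {s for s in P if s % 2 == 0}   # the event X = 1 (even value)
odds = {s for s in P if s % 2 == 1}    # the event X = 0 (odd value)
print(cond_expectation(Y, evens, P))   # 4, i.e. E[Y | X = 1]
print(cond_expectation(Y, odds, P))    # 3, i.e. E[Y | X = 0]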