Learn the basic framework for working with probabilities with finitely many outcomes.
Recipe: calculate probabilities for events of interest.
Bayes' Rule for conditional probability.
Vocabulary: sample space, outcome, event, false positive/negative, independence, random variable, expectation
Subsection 7.2.1 Introduction: Monty Hall Problem
Lots of problems in computer science involve probabilities:
Randomized (Monte Carlo) algorithms (for example in computer graphics)
Machine learning and statistical modeling
Cryptography and secure communication
Network reliability and performance analysis
Most of these can be modeled using only finite sample spaces, i.e., there are only finitely many possible outcomes. To get started, we will look at a simple problem that is often misunderstood: the Monty Hall problem.
Suppose you are on a game show and are given the choice of three doors. Behind one door is a car (electric, of course!); behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door that has a goat, say No. 3. He then says to you, “Do you want to pick door No. 2 instead?” What is the probability of getting the car if you stay and if you switch?
Intuitively, it seems like the probability of getting the car is $1/2$ if you stay and $1/2$ if you switch. But this is wrong! The probability of getting the car if you stay is $1/3$, and the probability of getting the car if you switch is $2/3$. The reason is that the host's action of opening a door gives you more information about the location of the car. We can model this problem using a tree diagram.
The tree diagram shows all possible outcomes of the game. The first level of the tree shows the three possibilities for the car. The second level shows the host's action of opening a door. The probabilities of the host's action depend on the car's location. The host will always open a door with a goat behind it. The probabilities of the host's action are shown on the edges of the tree. There are four possible outcomes of the game, listed by the leaves of the tree on the right. If you stay at door No. 1, the probability of getting the car is $1/3$ (the first two leaves), while if you switch to the remaining closed door, the probability of getting the car is $2/3$ (the last two leaves). This is the correct answer to the Monty Hall problem.
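We can also check the answer empirically. The following sketch simulates many rounds of the game for each strategy; the function name and trial count are our own choices, not part of the problem statement.

```python
import random

def monty_hall(switch, trials=100_000, seed=0):
    """Estimate the win probability of the stay/switch strategy by simulation."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)   # door hiding the car
        pick = rng.randrange(3)  # contestant's initial pick
        # The host opens a goat door: neither the picked door nor the car door.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Switch to the one remaining closed door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(monty_hall(switch=False))  # close to 1/3
print(monty_hall(switch=True))   # close to 2/3
```

With 100,000 trials the estimates land within about a percentage point of $1/3$ and $2/3$, matching the tree-diagram analysis.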
With this example in mind, we can now introduce the basic concepts of probability theory.
Subsection 7.2.2 Basic concepts
We will work with finite sample spaces, i.e., there are only finitely many possible outcomes. The sample space $\Omega$ is the (finite) set of all possible outcomes of an experiment. An event is a subset $A \subseteq \Omega$. We get a probability space by assigning a probability to each outcome in the sample space, i.e., a function $P : \Omega \to [0,1]$ such that $\sum_{\omega \in \Omega} P(\omega) = 1$. The probability of an event $A$ is given by $P(A) = \sum_{\omega \in A} P(\omega)$. By assuming a total ordering $\omega_1, \dots, \omega_n$ on the sample space, we can also represent the probabilities as a vector $p = (P(\omega_1), \dots, P(\omega_n))$.
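A finite probability space is easy to represent directly. Here is a minimal sketch using a loaded die of our own invention as the sample space; the mapping from outcomes to probabilities plays the role of $P$, and an event is just a set of outcomes.

```python
from fractions import Fraction

# A loaded die as a finite probability space: outcome -> probability.
P = {1: Fraction(1, 12), 2: Fraction(1, 12), 3: Fraction(1, 12),
     4: Fraction(1, 12), 5: Fraction(1, 12), 6: Fraction(7, 12)}
assert sum(P.values()) == 1  # normalization: the probabilities sum to 1

def prob(event):
    """P(A) = sum of P(omega) over the outcomes omega in the event A."""
    return sum(P[w] for w in event)

print(prob({2, 4, 6}))  # P(even) = 1/12 + 1/12 + 7/12 = 3/4
```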
Theorem
The following are the basic laws of probability:
Non-negativity: For any event $A$, $P(A) \ge 0$.
Normalization: The probability of the entire sample space is 1, i.e., $P(\Omega) = 1$.
Additivity: For any two mutually exclusive events $A$ and $B$, i.e., $A \cap B = \emptyset$, we have $P(A \cup B) = P(A) + P(B)$.
Complement Rule: For any event $A$, $P(A^c) = 1 - P(A)$, where $A^c = \Omega \setminus A$ is the complement of $A$.
Inclusion-Exclusion Principle: For any two events $A$ and $B$, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
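These laws can be checked mechanically on any small probability space. The sketch below uses a fair six-sided die with two events of our own choosing, and verifies the complement rule and inclusion-exclusion with exact arithmetic.

```python
from fractions import Fraction

omega = set(range(1, 7))  # fair six-sided die

def P(A):
    """Uniform probability: each of the 6 outcomes has probability 1/6."""
    return Fraction(len(A), 6)

A = {2, 4, 6}  # "roll is even"
B = {4, 5, 6}  # "roll is at least 4"

print(P(omega - A) == 1 - P(A))            # complement rule: True
print(P(A | B) == P(A) + P(B) - P(A & B))  # inclusion-exclusion: True
```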
In the Monty Hall problem, we saw the utility of conditional probabilities. Let us now define this more formally: The conditional probability of an event $A$ given another event $B$ is defined as $$P(A \mid B) = \frac{P(A \cap B)}{P(B)},$$ provided $P(B) > 0$. This is the probability of $A$ given that $B$ has occurred. Another way to think about this is that we are restricting our attention to the sample space $B$, with the probability of $B$ normalized to 1.
Theorem
Bayes' Rule relates the conditional probability of two events $A$ and $B$: $$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)},$$ provided $P(B) > 0$.
Note that the rule makes sense even if $P(A) = 0$, since the right-hand side is then 0, which agrees with $P(A \mid B) = 0$.
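Bayes' Rule is the standard tool for reasoning about false positives and false negatives. The sketch below works through a hypothetical screening test; all three numbers (prevalence, sensitivity, false-positive rate) are illustrative assumptions, chosen only to make the arithmetic clean.

```python
from fractions import Fraction

# Hypothetical screening test (all numbers are illustrative assumptions):
p_disease = Fraction(1, 100)          # P(D): prevalence
p_pos_given_d = Fraction(99, 100)     # P(+ | D): sensitivity
p_pos_given_not_d = Fraction(5, 100)  # P(+ | not D): false-positive rate

# P(+) by splitting over D and its complement (total probability)
p_pos = p_pos_given_d * p_disease + p_pos_given_not_d * (1 - p_disease)

# Bayes' Rule: P(D | +) = P(+ | D) P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_disease / p_pos
print(p_d_given_pos)  # 1/6
```

Even with a 99% sensitive test, a positive result here only means a $1/6$ chance of disease, because the disease is rare and most positives are false positives.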
A random variable (RV) is a function that assigns a real number to each outcome in the sample space. Formally, a random variable is a function $X : \Omega \to \mathbb{R}$ that maps outcomes in the sample space to real numbers. Random variables are often denoted by uppercase letters (e.g., $X$, $Y$) and their values by lowercase letters (e.g., $x$, $y$).
Probability mass functions (PMFs) are fundamental tools for describing random variables (RVs). The PMF of a random variable $X$ assigns a probability to each possible value of the RV: $$p_X(x) = P(X = x),$$ where $\{X = x\} = \{\omega \in \Omega : X(\omega) = x\}$ is the set of outcomes (i.e., the event) where $X$ has the value $x$.
For example, the Bernoulli distribution models a single trial with two outcomes (success has value $1$ with probability $p$, and failure has value $0$ with probability $1-p$). This has PMF $p_X(1) = p$ and $p_X(0) = 1-p$ for $p \in [0,1]$. The binomial distribution generalizes this to $n$ independent trials, with a PMF $$p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}$$
for $k = 0, 1, \dots, n$. The uniform distribution, on the other hand, assigns equal probability to all outcomes in a finite sample space, with a PMF $p_X(x) = \frac{1}{n}$ for each of the $n$ values $x$ in the range of the random variable.
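The binomial PMF is short to code directly from its formula. The sketch below evaluates it for $n = 4$ fair-coin trials and checks that the probabilities sum to 1, as any PMF must.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): choose which k of the n trials succeed."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

pmf = [binomial_pmf(k, 4, 0.5) for k in range(5)]
print(pmf)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
assert abs(sum(pmf) - 1) < 1e-12  # a PMF sums to 1
```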
Definition
The expectation (or expected value) of a random variable $X$, denoted $E[X]$, is a measure of the “average” value of $X$. For a discrete random variable $X$ on the sample space $\Omega$ with probability function $P$, the expectation is defined as: $$E[X] = \sum_{\omega \in \Omega} X(\omega)\,P(\omega).$$
For example, if $X$ is a random variable representing the outcome of rolling a fair six-sided die, then the expectation is: $$E[X] = 1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{6} + 3 \cdot \tfrac{1}{6} + 4 \cdot \tfrac{1}{6} + 5 \cdot \tfrac{1}{6} + 6 \cdot \tfrac{1}{6} = \tfrac{21}{6} = 3.5.$$
This means that the average outcome of rolling the die is 3.5.
It's often useful to rewrite the expectation in terms of the probability mass function (PMF) of the random variable. The PMF gives the probability of each possible value, and we can express the expectation as: $$E[X] = \sum_{x} x\,p_X(x),$$ where the sum runs over the range of $X$.
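The PMF form of the expectation translates directly into code. Here is a minimal sketch for the fair-die example, using exact fractions so the result is $7/2$ rather than a rounded float.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair six-sided die

# E[X] = sum over values x of x * p_X(x)
expectation = sum(x * p for x, p in pmf.items())
print(expectation)  # 7/2
```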
Definition
Two random variables $X$ and $Y$ are said to be independent if the joint probability distribution of $X$ and $Y$ factors into the product of their marginal distributions. Formally, this means: $$P(X = x, Y = y) = P(X = x)\,P(Y = y)$$ for all values $x$ and $y$.
In other words, two random variables are independent if the events of them taking on specific values are independent.
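This factorization can be verified exhaustively on a finite sample space. The sketch below models two fair dice as ordered pairs and checks the defining equation for every pair of values; the choice of dice as the example is ours.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of two fair die rolls, each with probability 1/36.
omega = list(product(range(1, 7), repeat=2))

def P(A):
    """Uniform probability of an event (a list of outcomes)."""
    return Fraction(len(A), 36)

X = lambda w: w[0]  # value of the first die
Y = lambda w: w[1]  # value of the second die

# Check P(X = x, Y = y) = P(X = x) P(Y = y) for all values x, y.
independent = all(
    P([w for w in omega if X(w) == x and Y(w) == y])
    == P([w for w in omega if X(w) == x]) * P([w for w in omega if Y(w) == y])
    for x in range(1, 7) for y in range(1, 7)
)
print(independent)  # True
```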
Theorem
The following properties hold for expectations:
Linearity of Expectation: For any random variables $X$ and $Y$, $E[X + Y] = E[X] + E[Y]$. This holds regardless of whether $X$ and $Y$ are independent.
Law of Total Expectation: If $X$ is a random variable and we have a partition $B_1, \dots, B_k$ of the sample space, then $E[X] = \sum_{i=1}^{k} P(B_i)\,E[X \mid B_i]$. (An average can be calculated as an average of averages.)
Law of Iterated Expectation: If $X, Y$ are random variables, then $E[X] = E[E[X \mid Y]]$.
Product of Independent Random Variables: If $X$ and $Y$ are independent random variables, then $E[XY] = E[X]\,E[Y]$.
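Linearity and the product rule can both be checked exactly on a small space. This sketch reuses the two-dice sample space and computes expectations straight from the definition $E[X] = \sum_{\omega} X(\omega) P(\omega)$.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two fair dice
p = Fraction(1, 36)                           # probability of each outcome

def E(f):
    """E[f] = sum of f(omega) * P(omega) over the sample space."""
    return sum(f(w) * p for w in omega)

X = lambda w: w[0]  # first die
Y = lambda w: w[1]  # second die

assert E(lambda w: X(w) + Y(w)) == E(X) + E(Y)  # linearity
assert E(lambda w: X(w) * Y(w)) == E(X) * E(Y)  # product (X, Y independent)
print(E(X), E(lambda w: X(w) + Y(w)))  # 7/2 7
```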
Here there are two kinds of conditional expectation: $E[X \mid B]$ is the expected value of $X$ restricted to the probability space determined by the event $B$ with $P(B) > 0$, i.e., $$E[X \mid B] = \frac{1}{P(B)} \sum_{\omega \in B} X(\omega)\,P(\omega),$$
while $E[X \mid Y]$ is the expected value of $X$ given the value of $Y$. It is defined as: $$E[X \mid Y] = E[X \mid Y = y] \quad \text{on the event } \{Y = y\}.$$
It is a random variable since it depends on the value of $Y$. More formally, we ought to write $E[X \mid Y](\omega) = E[X \mid Y = Y(\omega)]$.
Note this also means that if $X$ and $Y$ are independent, then $E[X \mid Y] = E[X]$: the conditional expectation does not depend on the value of $Y$ at all, so the law of iterated expectation further simplifies as $E[X] = E[E[X \mid Y]] = E[E[X]] = E[X]$.
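The law of iterated expectation is easiest to believe after computing both sides on a concrete example. The sketch below takes $S$ to be the sum of two fair dice and conditions on the first die $Y$; both the direct expectation and the iterated one come out to 7.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two fair dice
p = Fraction(1, 36)                           # probability of each outcome

S = lambda w: w[0] + w[1]  # sum of the two dice
Y = lambda w: w[0]         # value of the first die

def E_given(f, B):
    """E[f | B] = (1 / P(B)) * sum of f(omega) P(omega) over omega in B."""
    return sum(f(w) * p for w in B) / (len(B) * p)

E_S = sum(S(w) * p for w in omega)  # E[S] directly from the definition
# E[E[S | Y]]: average E[S | Y = y] over y, each value of Y having prob 1/6.
iterated = sum(Fraction(1, 6) * E_given(S, [w for w in omega if Y(w) == y])
               for y in range(1, 7))
print(E_S, iterated)  # 7 7
```

Conditioning on $Y = y$ gives $E[S \mid Y = y] = y + 7/2$, and averaging that over $y$ recovers $E[S] = 7$.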
1. Linearity of Expectation: Let $X$ and $Y$ be random variables on the sample space $\Omega$. By the definition of expectation, we have: $$E[X + Y] = \sum_{\omega \in \Omega} (X(\omega) + Y(\omega))\,P(\omega) = \sum_{\omega \in \Omega} X(\omega)\,P(\omega) + \sum_{\omega \in \Omega} Y(\omega)\,P(\omega) = E[X] + E[Y].$$
2. Law of Total Expectation: Indeed, this boils down to the following calculation: $$\sum_{i=1}^{k} P(B_i)\,E[X \mid B_i] = \sum_{i=1}^{k} P(B_i) \cdot \frac{1}{P(B_i)} \sum_{\omega \in B_i} X(\omega)\,P(\omega) = \sum_{i=1}^{k} \sum_{\omega \in B_i} X(\omega)\,P(\omega) = \sum_{\omega \in \Omega} X(\omega)\,P(\omega) = E[X].$$
3. Law of Iterated Expectation: Now let $X, Y$ be random variables. Suppose that the range of $Y$ is $\{y_1, \dots, y_k\}$, giving a partition of the sample space into the events $B_i = \{Y = y_i\}$, where $P(B_i) = P(Y = y_i)$. Then by the above we can write the expectation of $X$ as $$E[X] = \sum_{i=1}^{k} P(B_i)\,E[X \mid B_i] = \sum_{i=1}^{k} P(Y = y_i)\,E[X \mid Y = y_i] = E[E[X \mid Y]],$$
proving the law of iterated expectation.
4. Product of Independent Random Variables: Let $X$ and $Y$ be independent random variables. Let's use the law of iterated expectation: $$E[XY] = E[E[XY \mid Y]].$$ The inner expectation (which, recall, is a random variable depending on the value of $Y$) simplifies by our independence assumption: $$E[XY \mid Y = y] = y\,E[X \mid Y = y] = y\,E[X].$$
Now we can substitute this back and take the expectation over $Y$: $$E[XY] = E[Y\,E[X]] = E[X]\,E[Y],$$
thus proving the product rule for independent random variables.