Unit: Probability, Random Variables & Probability Distributions
Chapter: Binomial & Geometric Distributions
Reference: Random Variables & Their Types, Discrete Probability Distributions, Continuous Probability Distributions, Expected Value & Variance, Law of Large Numbers & Central Limit Theorem, Sampling Distributions, Transformations, Joint Distributions, Independent Random Variables, Bivariate Data, Probability Models, Probability Mass Function, Mean & Variance, Binomial Coefficient, Cumulative Probability, Statistical Technology, Conditions for Binomial Distributions, Relationship to Exponential Distributions, Memoryless Property, Geometric Distributions.
After studying this chapter, you should be able to explain:
- Random Variables & Their Types.
- Discrete & Continuous Distributions.
- Sampling Distributions & Independent Random Variables.
- Law of Large Numbers & Central Limit Theorem.
- Bivariate Data & Probability Models.
- Probability Mass Function, Mean & Variance.
- Binomial Coefficient & Cumulative Probability.
- Binomial & Exponential Distributions.
- Geometric Distributions.
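Random Variables & Their Types
Definition of Random Variable: A random variable is a function that assigns a numerical value to each possible outcome of a random experiment or process. For example, when a coin is flipped three times, X = "number of heads" assigns one of the values 0, 1, 2, or 3 to each sequence of flips.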
Discrete Random Variable: A discrete random variable can take on a countable number of distinct values. Examples include the number of heads in coin flips or the number of cars passing by in an hour.
Continuous Random Variable: A continuous random variable can take any value within a certain range. Examples include height, weight, or time. Continuous random variables are described by probability density functions.
Probability Distribution: The probability distribution of a random variable describes the likelihood of each possible value occurring. It can be represented through a table, graph, or equation.
Probability Mass Function (PMF): For discrete random variables, the PMF gives the probability of each individual value. It's often denoted as P(X = x).
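Probability Density Function (PDF): For continuous random variables, the PDF describes the relative likelihood of values near each point. The probability that the variable falls within a range of values is the area under the PDF over that range; the probability of any single exact value is 0.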
Expected Value (Mean): The expected value of a random variable is its long-term average over repeated trials. It's denoted as E(X) and represents the center of the distribution.
Variance and Standard Deviation: Variance measures the spread of a distribution, and the standard deviation is its square root. They quantify the variability of the random variable's values around the mean.
Binomial Random Variable: A type of discrete random variable that represents the number of successes in a fixed number of independent Bernoulli trials (experiments with two possible outcomes).
Poisson Random Variable: Another discrete random variable that models the number of events occurring in a fixed interval of time or space, given a certain average rate.
Normal Random Variable: A continuous random variable that follows the bell-shaped normal distribution. It's characterized by its mean and standard deviation and is central to many statistical analyses.
Exponential Random Variable: A continuous random variable often used to model the time between events in a Poisson process, such as the time between arrivals at a service point.
Joint Random Variables: When dealing with multiple random variables, joint distributions describe the probabilities associated with various combinations of values from these variables.
Independent Random Variables: Two random variables are independent if the occurrence or value of one does not affect the occurrence or value of the other. Independence is a fundamental concept in probability.
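Transformations of Random Variables: Applying a function g to a random variable X produces a new random variable Y = g(X). For a linear transformation Y = aX + b, E(Y) = aE(X) + b and Var(Y) = a²Var(X). For nonlinear functions, E(g(X)) generally differs from g(E(X)), so the distribution of X must be used directly.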
Discrete & Continuous Distributions
Discrete Distributions:
Definition: Discrete distributions describe random variables that take on a countable set of distinct values. These values are often integers.
Probability Mass Function (PMF): Discrete distributions are described by their PMFs, which give the probability of each individual value.
Examples: Common discrete distributions include the binomial, Poisson, and geometric distributions.
Binomial Distribution: Models the number of successes in a fixed number of independent trials, each with the same probability of success.
Poisson Distribution: Models the number of rare events occurring in a fixed interval of time or space, given a certain average rate.
Geometric Distribution: Models the number of trials needed for the first success in a sequence of independent trials with a fixed probability of success.
Expected Value and Variance: Discrete distributions have expected values and variances that can be calculated using the PMF.
Probability and Cumulative Probability: The probability of a specific outcome and cumulative probabilities can be calculated from the PMF.
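As a minimal sketch of the calculations described above, the Python below stores a PMF as a dictionary from values to exact fractions and computes E(X), Var(X) and a cumulative probability. The example PMF (number of heads in two fair coin flips) and the helper names `expected_value`, `variance` and `cdf` are illustrative choices, not part of any library.

```python
from fractions import Fraction

# Illustrative PMF: number of heads in two fair coin flips.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expected_value(pmf):
    """E(X) = sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of (x - E(X))^2 * P(X = x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def cdf(pmf, k):
    """P(X <= k): add the PMF over all values up to and including k."""
    return sum(p for x, p in pmf.items() if x <= k)

assert sum(pmf.values()) == 1  # a valid PMF sums to 1
print(expected_value(pmf))  # 1
print(variance(pmf))        # 1/2
print(cdf(pmf, 1))          # 3/4
```

Using `Fraction` keeps the arithmetic exact, so the results can be compared directly with hand calculations.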
Continuous Distributions:
Definition: Continuous distributions describe random variables that can take on any value within a certain range. These distributions are characterized by their probability density functions (PDFs).
Probability Density Function (PDF): The PDF describes how probability is spread over the range of the variable; the probability of falling within a specific range of values is the area under the PDF over that range.
Examples: Common continuous distributions include the normal, exponential, and uniform distributions.
Normal Distribution: Often referred to as the "bell curve," it is widely used due to its prevalence in natural phenomena and the Central Limit Theorem.
Exponential Distribution: Models the time between events in a Poisson process, such as the time between arrivals at a service point.
Uniform Distribution: Represents outcomes that are equally likely within a certain range. The PDF is constant over this range.
Expected Value and Variance: Continuous distributions also have expected values and variances, calculated using the PDF. Integration is often required for these calculations.
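For a continuous distribution the sums are replaced by integrals. A rough sketch, assuming a uniform distribution on a hypothetical interval [2, 8] and a simple midpoint-rule integrator (`integrate` is an illustrative helper, not a library function), recovers the known results E(X) = (a + b)/2 and Var(X) = (b – a)²/12:

```python
def uniform_pdf(x, a, b):
    """PDF of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def integrate(f, lo, hi, n=100_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

a, b = 2.0, 8.0  # hypothetical interval chosen for illustration

# E(X) = integral of x * f(x) dx; Var(X) = integral of (x - E(X))^2 * f(x) dx.
mean = integrate(lambda x: x * uniform_pdf(x, a, b), a, b)
var = integrate(lambda x: (x - mean) ** 2 * uniform_pdf(x, a, b), a, b)

print(round(mean, 6))  # 5.0, matching (a + b) / 2
print(round(var, 6))   # 3.0, matching (b - a)**2 / 12
```

In practice these integrals are usually done by hand or with a library; the numerical version only makes the "integration replaces summation" idea concrete.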
Sampling Distributions & Independent Random Variables
Sampling Distribution:
Definition: A sampling distribution is the probability distribution of a statistic (such as the mean or proportion) calculated from a sample. It shows how the statistic varies across different samples of the same size from the same population.
Central Limit Theorem (CLT): The CLT states that for a large enough sample size, the sampling distribution of the sample mean (or sum) of independent observations will be approximately normally distributed, regardless of the shape of the population distribution, provided the population has a finite variance.
Sampling Distribution of the Sample Mean: For independent observations and any sample size n, the mean of the sampling distribution of the sample mean equals the population mean μ, and its standard deviation equals σ/√n, where σ is the population standard deviation. For a sufficiently large n, the CLT adds that this distribution is approximately normal.
Sampling Distribution of the Sample Proportion: The sampling distribution of the sample proportion approaches a normal distribution as the sample size increases. Its mean is the population proportion p, and its standard deviation is √(p(1 – p)/n), where n is the sample size.
Sampling Distribution of a Difference in Means or Proportions: When comparing two independent samples, the mean of the sampling distribution of the difference is the difference of the population parameters (for means, μ₁ – μ₂), and its variance is the sum of the two individual variances (for means, σ₁²/n₁ + σ₂²/n₂).
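The sample-mean properties can be checked by simulation. This sketch assumes the population is a single roll of a fair die (μ = 3.5, σ² = 35/12) and uses an illustrative sample size of 30; it draws many samples and compares the spread of their means with σ/√n. The simulated values vary slightly with the random seed.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# Population: one roll of a fair die, with mu = 3.5 and sigma^2 = 35/12.
population_mean = 3.5
population_sd = (35 / 12) ** 0.5

n = 30               # sample size (illustrative choice)
num_samples = 10_000 # number of repeated samples

# Draw many samples of size n and record each sample mean.
sample_means = [
    statistics.fmean(random.randint(1, 6) for _ in range(n))
    for _ in range(num_samples)
]

print(statistics.fmean(sample_means))  # close to the population mean 3.5
print(statistics.stdev(sample_means))  # close to sigma / sqrt(n)
print(population_sd / n ** 0.5)        # theoretical sigma / sqrt(n)
```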
Independent Random Variables:
Definition: Two random variables are considered independent if the occurrence or value of one does not affect the occurrence or value of the other. Independence is a crucial concept in probability and statistics.
Joint Probability Distribution: For two or more random variables, the joint probability distribution describes the probabilities associated with various combinations of values from these variables.
Covariance and Correlation: Covariance measures the degree to which two random variables vary together. Correlation standardizes this measure to fall between -1 and 1, indicating the strength and direction of the linear relationship.
Properties of Independent Variables: If X and Y are independent, then E(XY) = E(X)E(Y) and Var(X + Y) = Var(X) + Var(Y). In general, Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y); independence makes the covariance term zero.
Application in Probability and Statistics: Independence simplifies the analysis of complex systems, allowing for easier calculations and predictions. It's often assumed in various statistical methods and models, such as in the calculation of probabilities involving multiple events.
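A small exact check of the two independence identities, assuming X and Y are the outcomes of two independent fair dice. The joint PMF is built as the product of the marginal PMFs, which is what independence means for discrete variables:

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)  # probability of each face of a fair die

# Joint PMF of two independent dice: P(X = x, Y = y) = P(X = x) * P(Y = y).
joint = {(x, y): p * p for x, y in product(faces, faces)}

E_X = sum(x * p for x in faces)                 # 7/2 (E(Y) is the same)
Var_X = sum((x - E_X) ** 2 * p for x in faces)  # 35/12 (Var(Y) is the same)

E_XY = sum(x * y * q for (x, y), q in joint.items())
E_sum = sum((x + y) * q for (x, y), q in joint.items())
Var_sum = sum((x + y - E_sum) ** 2 * q for (x, y), q in joint.items())

print(E_XY == E_X * E_X)         # True: E(XY) = E(X)E(Y)
print(Var_sum == Var_X + Var_X)  # True: Var(X + Y) = Var(X) + Var(Y)
```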
Example: Rolling a Fair Die
Suppose you are rolling a fair six-sided die. Let's define a random variable X as the outcome of the roll (the number that appears face-up on the die).
Probability Mass Function, Mean & Variance
Probability Mass Function (PMF):
The Probability Mass Function (PMF) describes the probabilities of different outcomes in a discrete random variable.
For the Binomial distribution, the PMF gives the probability of getting exactly "k" successes in "n" independent trials with a fixed probability of success.
For the Geometric distribution, the PMF gives the probability of needing exactly "k" trials to achieve the first success.
The PMF is non-negative for all possible values of the random variable.
The sum of all probabilities in the PMF equals 1, representing the certainty of an outcome occurring.
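A minimal sketch of both PMFs in Python, using the standard `math.comb` for C(n, k); the parameters n = 10 and p = 0.3 are arbitrary illustrations. Because the geometric distribution has infinitely many possible values, its check sums only a long finite prefix.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """P(X = k): probability that the first success occurs on trial k (k >= 1)."""
    return (1 - p) ** (k - 1) * p

n, p = 10, 0.3  # illustrative parameters

binom = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(all(q >= 0 for q in binom))  # True: the PMF is non-negative
print(round(sum(binom), 12))       # 1.0: the probabilities sum to 1

# The geometric support is infinite, so sum a long prefix; the tail is negligible.
geom = [geometric_pmf(k, p) for k in range(1, 200)]
print(round(sum(geom), 12))        # 1.0
```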
Mean (Expected Value):
The mean (expected value) of a random variable measures its average value over many trials.
For the Binomial distribution, the mean is given by the product of the number of trials "n" and the probability of success "p": μ = np.
For the Geometric distribution, the mean is the reciprocal of the probability of success: μ = 1/p.
The mean represents the center of the distribution and provides a measure of the "typical" value.
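The "average over many trials" interpretation can be illustrated by simulation. This sketch uses illustrative parameters n = 10 and p = 0.2 and hypothetical helper functions; the averages should land near np and 1/p, although the exact values depend on the random seed.

```python
import random

random.seed(1)  # fixed seed for a reproducible run

def trials_until_first_success(p):
    """Count Bernoulli(p) trials up to and including the first success."""
    k = 1
    while random.random() >= p:  # random() < p is treated as a success
        k += 1
    return k

def successes_in_n_trials(n, p):
    """Count successes in n independent Bernoulli(p) trials."""
    return sum(random.random() < p for _ in range(n))

n, p, runs = 10, 0.2, 50_000  # illustrative parameters

avg_geometric = sum(trials_until_first_success(p) for _ in range(runs)) / runs
avg_binomial = sum(successes_in_n_trials(n, p) for _ in range(runs)) / runs

print(avg_geometric)  # close to 1/p = 5
print(avg_binomial)   # close to np = 2
```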
Variance:
Variance measures the spread or dispersion of the random variable's values around the mean.
For the Binomial distribution, the variance is given by the product of the number of trials "n," the probability of success "p," and the probability of failure "q" (1 – p): σ² = npq.
For the Geometric distribution, the variance is calculated as (1 – p) / p².
Standard Deviation: The square root of the variance gives the standard deviation, which is another measure of the distribution's spread.
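As a sketch, the variance formulas can be compared with a direct calculation from each PMF using Var(X) = E(X²) – [E(X)]². The parameters are illustrative, and the geometric PMF is truncated where the remaining probability is negligible, so the results agree only up to floating-point rounding.

```python
from math import comb

def mean_and_variance(pmf):
    """Return (E(X), Var(X)) for a {value: probability} dict, using E(X^2) - E(X)^2."""
    mean = sum(x * q for x, q in pmf.items())
    second_moment = sum(x * x * q for x, q in pmf.items())
    return mean, second_moment - mean ** 2

n, p = 12, 0.25  # illustrative parameters

binom = {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}
geom = {k: (1 - p) ** (k - 1) * p for k in range(1, 400)}  # truncated; tail is negligible

print(mean_and_variance(binom))  # approximately (3.0, 2.25): np and npq
print(mean_and_variance(geom))   # approximately (4.0, 12.0): 1/p and (1 - p)/p^2
```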
Interpretation and Application:
Mean and variance provide insights into the central tendency and variability of data, respectively, in both Binomial and Geometric distributions.
In real-world applications, understanding the mean and variance helps in making predictions, comparing different scenarios, and making informed decisions based on probability distributions.
Binomial Coefficient & Cumulative Probability
Binomial Coefficient:
The binomial coefficient, often denoted as "n choose k" or C(n, k), represents the number of ways to choose "k" items from a set of "n" items without regard to order. It is computed as C(n, k) = n! / (k! * (n – k)!), where "n!" denotes the factorial of "n."
In the context of the binomial distribution, the binomial coefficient C(n, k) is used to calculate the probability of getting exactly "k" successes in "n" independent trials.
Binomial coefficients have combinatorial significance, as they count the number of possible combinations in various scenarios, such as forming committees or selecting items from a set.
Binomial coefficients form Pascal's Triangle, in which each entry is the sum of the two entries directly above it: C(n, k) = C(n – 1, k – 1) + C(n – 1, k).
Binomial coefficients are essential for calculating probabilities and understanding the distribution of outcomes in binomial experiments.
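A short sketch computing C(n, k) two ways, once from the factorial formula and once by building a row of Pascal's Triangle, and cross-checking both against Python's standard `math.comb`. The helper names are illustrative.

```python
from math import comb, factorial

def choose_factorial(n, k):
    """C(n, k) = n! / (k! * (n - k)!), using exact integer arithmetic."""
    return factorial(n) // (factorial(k) * factorial(n - k))

def pascal_row(n):
    """Build row n of Pascal's Triangle: each inner entry is the sum of the two above it."""
    row = [1]
    for _ in range(n):
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
    return row

print(choose_factorial(5, 3))  # 10
print(pascal_row(5))           # [1, 5, 10, 10, 5, 1]

row10 = pascal_row(10)
print(all(row10[k] == comb(10, k) == choose_factorial(10, k) for k in range(11)))  # True
```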
Cumulative Probability:
Cumulative probability is described by the cumulative distribution function (CDF), which gives the probability that a random variable is less than or equal to a specific value.
In the context of the binomial distribution, the CDF gives the probability of having "k" or fewer successes in "n" trials, denoted P(X ≤ k), where X is the random variable representing the number of successes. It is found by adding the PMF values for 0, 1, ..., k successes.
Cumulative probabilities let you consider a range of outcomes, providing insights into the overall likelihood of different levels of success.
They describe the behavior of the distribution as a whole, rather than focusing only on individual outcomes.
Cumulative probabilities are especially useful for making decisions based on probabilities, such as determining a cutoff point for acceptable performance in quality control.
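A minimal sketch of P(X ≤ k) for the binomial distribution, formed by summing the PMF from 0 to k. The quality-control numbers (a batch of 20 items, each defective with probability 0.05) are hypothetical, chosen only to illustrate a cutoff-style question.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k): add the binomial PMF for 0, 1, ..., k successes."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

# Fair coin: probability of at most 2 heads in 5 flips.
print(binomial_cdf(2, 5, 0.5))        # 0.5

# Hypothetical quality-control check: at most 1 defective in a batch of 20,
# where each item is defective with probability 0.05.
print(binomial_cdf(1, 20, 0.05))      # approximately 0.736

# Complement: probability of 2 or more defectives.
print(1 - binomial_cdf(1, 20, 0.05))  # approximately 0.264
```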
Interpretation and Application:
Binomial coefficients and cumulative probabilities are crucial tools for calculating probabilities and understanding the behavior of random variables in binomial experiments.
They help in making informed decisions by quantifying the likelihood of different outcomes and ranges of success.
These concepts are widely applicable in fields such as statistics, probability theory, engineering, quality control, and biology.
Cumulative probabilities allow a comprehensive analysis of the distribution, enabling you to answer questions about the probability of achieving specific levels of success.
Together, binomial coefficients and cumulative probabilities provide a solid foundation for working with the binomial distribution.
Binomial & Exponential Distributions
Binomial Distribution:
Definition: The binomial distribution models the number of successes X in a fixed number "n" of independent trials, each with the same probability of success "p".
Conditions for a Binomial Distribution: (1) the number of trials "n" is fixed in advance; (2) each trial has only two outcomes, success or failure; (3) the trials are independent; (4) the probability of success "p" is the same on every trial.
Probability Mass Function: P(X = k) = C(n, k) * p^k * (1 – p)^(n – k), for k = 0, 1, ..., n.
Mean and Variance: μ = np and σ² = np(1 – p).
Exponential Distributions:
Definition: The exponential distribution is a continuous distribution that models the waiting time between events in a Poisson process that occurs at a constant average rate λ.
Probability Density Function: f(x) = λe^(–λx) for x ≥ 0, so the probability of waiting longer than x is P(X > x) = e^(–λx).
Mean and Variance: μ = 1/λ and σ² = 1/λ².
Memoryless Property: P(X > s + t | X > s) = P(X > t); having already waited "s" units of time does not change the distribution of the remaining waiting time.
Relationship to the Geometric Distribution: The exponential distribution is the continuous counterpart of the geometric distribution, and it is the only continuous distribution with the memoryless property.
Geometric Distributions
Geometric Distribution Overview:
The geometric distribution models the number of independent trials needed to achieve the first success in a sequence of trials, where each trial has a fixed probability of success.
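It is a discrete probability distribution in which every trial has only two outcomes (success or failure), and the random variable counts the trials up to and including the first success.
The geometric distribution is memoryless: if the first s trials have all been failures, the number of additional trials needed for the first success has the same geometric distribution as at the start. Formally, P(X > s + t | X > s) = P(X > t).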
It is commonly used to model scenarios such as the number of coin flips needed to get the first "heads" or the number of attempts to make a successful sale.
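Memorylessness can be checked directly from P(X > k) = (1 – p)^k, the probability that the first k trials all fail. A sketch with exact fractions, using an illustrative p = 1/4 and illustrative values s = 3 and t = 2:

```python
from fractions import Fraction

def geometric_survival(k, p):
    """P(X > k): the first k trials are all failures."""
    return (1 - p) ** k

p = Fraction(1, 4)  # illustrative success probability
s, t = 3, 2         # already failed s times; ask about t more trials

# P(X > s + t | X > s) = P(X > s + t) / P(X > s), since X > s + t implies X > s.
conditional = geometric_survival(s + t, p) / geometric_survival(s, p)
unconditional = geometric_survival(t, p)  # P(X > t)

print(conditional, unconditional)    # 9/16 9/16
print(conditional == unconditional)  # True
```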
Probability Mass Function (PMF):
The Probability Mass Function (PMF) of the geometric distribution gives the probability of requiring exactly "k" trials to achieve the first success.
The PMF formula is P(X = k) = (1 – p)^(k – 1) * p, where "p" is the probability of success on each trial.
The first k – 1 trials are failures and the k-th trial is the first success, which is why the factor (1 – p) appears k – 1 times and the factor p appears once.
Mean and Variance:
The mean (expected value) of the geometric distribution is μ = 1/p, where "p" is the probability of success.
The variance of the geometric distribution is σ² = (1 – p) / p².
Applications:
Geometric distributions are used in various real-world scenarios, such as modelling the number of attempts until a computer system fails or the number of phone calls until reaching a customer who agrees to a survey.
They are particularly applicable in situations where each trial is independent and has the same probability of success.
Interpretation:
The mean of the geometric distribution represents the average number of trials required to achieve the first success.
The variance indicates the spread or variability of the number of trials needed to succeed.
Use of Technology:
Calculators or statistical software can be used to calculate probabilities, mean, and variance in geometric distributions.
Comparisons:
Geometric distributions are related to exponential distributions, where the exponential distribution models the continuous time between events, while the geometric distribution models the discrete trials until the first success.
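One way to see the connection, sketched here under illustrative assumptions (rate λ = 0.5, time horizon t = 3): divide time into short slots of width dt and treat each slot as a trial with success probability λ·dt. The geometric probability of no success in the first t/dt slots then approaches the exponential probability e^(–λt) of waiting longer than t as dt shrinks.

```python
import math

lam = 0.5  # hypothetical event rate (events per unit time)
t = 3.0    # time horizon

# Exponential waiting time: P(T > t) = exp(-lam * t).
exact = math.exp(-lam * t)

# Each slot of width dt is one trial with success probability p = lam * dt.
# No success by time t means the first t/dt trials all fail: (1 - p) ** (t / dt).
for dt in (0.5, 0.1, 0.01, 0.001):
    p = lam * dt
    k = round(t / dt)
    print(dt, (1 - p) ** k, exact)
```

The geometric column moves closer to the exponential value with each smaller dt.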
Example: Suppose you are flipping a fair coin (50% chance of heads and 50% chance of tails) repeatedly until you get the first heads. Let's explore the Binomial and Geometric distributions associated with this scenario.
Binomial Distribution:
Question: What is the probability of getting exactly 3 heads in 5 coin flips?
Solution: In this case, we have "n" trials (coin flips), and each trial has a probability of success ("p") of getting heads. The probability of getting exactly "k" heads in "n" trials is given by the binomial distribution formula:
P(X = k) = C(n, k) * p^k * (1 – p)^(n – k)
For our scenario:
n = 5 (coin flips)
k = 3 (heads)
p = 0.5 (probability of heads)
Using the formula:
P(X = 3) = C(5, 3) * (0.5)^3 * (0.5)^2
= 10 * 0.125 * 0.25
= 0.3125
So, the probability of getting exactly 3 heads in 5 coin flips is 0.3125, or 31.25%.
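The scenario above also mentions the geometric distribution, which the worked solution does not cover. This sketch reproduces the binomial answer and, as an illustrative extension, applies the geometric PMF P(X = k) = (1 – p)^(k – 1) * p to the same fair coin:

```python
from math import comb

p = 0.5  # fair coin: probability of heads

# Binomial: exactly 3 heads in 5 flips.
binomial = comb(5, 3) * p ** 3 * (1 - p) ** 2
print(binomial)   # 0.3125

# Geometric (illustrative extension): first head on exactly the 3rd flip,
# i.e. tail, tail, head.
geometric = (1 - p) ** (3 - 1) * p
print(geometric)  # 0.125

# Expected number of flips until the first head: 1/p.
print(1 / p)      # 2.0
```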
Question (die example): What is the probability distribution of the random variable X, the outcome of one roll of a fair six-sided die?
Solution: Since the die is fair, each face has an equal probability of 1/6 of appearing.
The probability distribution of X is as follows:
x:         1    2    3    4    5    6
P(X = x):  1/6  1/6  1/6  1/6  1/6  1/6
In this case, X is a discrete random variable, and its probability distribution is a discrete uniform distribution, where each outcome has an equal probability of 1/6.
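The mean and variance of this distribution follow from the PMF as probability-weighted sums. A sketch with exact fractions, which also computes one cumulative probability for illustration:

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair six-sided die

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
at_most_4 = sum(p for x, p in pmf.items() if x <= 4)  # P(X <= 4)

print(mean)       # 7/2
print(variance)   # 35/12
print(at_most_4)  # 2/3
```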
Key Points
Definition: A discrete random variable is a variable that can take on a finite or countable set of distinct values, usually integers.
Examples: The number of heads in a series of coin flips, the number of cars passing by in an hour, the number of books sold in a day.
Probability Mass Function (PMF): The PMF of a discrete random variable gives the probability of each possible value. It's denoted as P(X = x), where X is the random variable and x is a specific value.
Range: The range of a discrete random variable is the set of all possible values it can take.
Probability Rules: The sum of probabilities for all possible values of a discrete random variable is equal to 1.
Probability Distribution of Discrete Random Variables:
Probability Distribution: The probability distribution of a discrete random variable lists the possible values and their corresponding probabilities.
Expected Value (Mean): The expected value of a discrete random variable E(X) is the weighted average of its possible values, where the weights are the probabilities.
Variance and Standard Deviation: The variance of a discrete random variable measures the spread of its values around the mean. The standard deviation is the square root of the variance.
Binomial Distribution: A common discrete distribution that models the number of successes in a fixed number of independent trials, each with the same probability of success.
Poisson Distribution: Another discrete distribution that models the number of rare events occurring in a fixed interval of time or space, given a certain average rate.
Geometric Distribution: Models the number of trials needed for the first success in a sequence of independent trials with a fixed probability of success.
Hypergeometric Distribution: Models the number of successes in a sample drawn without replacement from a finite population in which each item is classified as a success or a failure.
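As an illustrative sketch of the hypergeometric case, assuming the standard PMF P(X = k) = C(K, k) * C(N – K, n – k) / C(N, n) for a population of N items containing K successes and a sample of size n: the probabilities sum to 1 and the mean equals nK/N. The parameter values are arbitrary.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(k, N, K, n):
    """P(X = k): k successes in a sample of n drawn without replacement
    from N items, of which K are successes."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

N, K, n = 20, 7, 5  # illustrative population size, successes in population, sample size
pmf = {k: hypergeometric_pmf(k, N, K, n) for k in range(0, min(K, n) + 1)}

print(sum(pmf.values()))                   # 1
print(sum(k * p for k, p in pmf.items()))  # 7/4, which equals n * K / N
```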