Maths @ CHARUSAT: April 2015

Wednesday, 29 April 2015

Introduction to Biostatistics

Dear Students, this page presents brief notes compiled from the books prescribed in the syllabus, for your information only.

Text Books

Robert R. Sokal and F. James Rohlf: Introduction to Biostatistics, Dover Publications.
Olive Jean Dunn and Virginia A. Clark: Basic Statistics, A Primer for the Biomedical Sciences, Fourth Edition, John Wiley & Sons.
Wayne W. Daniel: Biostatistics, A Foundation for Analysis in the Health Sciences, Eighth Edition, John Wiley & Sons.
Bernard Rosner: Fundamentals of Biostatistics, Fifth Edition, Duxbury, Thomson Learning.

Some important concepts that we have discussed in lectures are^[1]:

Biostatistics: We may define biostatistics as the application of statistical methods to the solution of biological problems, arising for example in the health-related and agricultural sciences. Biostatistics is also called biological statistics or biometry. In the modern sense, statistics may be defined as the scientific study of numerical data based on natural phenomena, for example the number of peas in a pod, the heartbeats of rats in response to adrenalin, the mutation rate in maize after irradiation, or the incidence of morbidity in patients treated with a vaccine. In the last three examples, however, scientists have interfered with the phenomenon through their intervention.

Population: The biological definition of this term refers to all the individuals of a given species (perhaps of a given life-history stage or sex) found in a circumscribed area at a given time. In statistics, population always means the totality of individual observations about which inferences are to be made, existing anywhere in the world or at least within a definitely specified sampling area limited in space and time. For example,

If we carry out six replicate determinations of sodium in a certain material, then the six individual observations constitute a sample from a population of all determinations of sodium that could have been made with that measurement technique.
A population may represent the outcomes of experiments, such as all the heartbeat frequencies produced in an animal by injections of adrenalin. A given experiment, such as the administration of adrenalin to an animal, could be repeated as long as the experimenter could obtain material.
Suppose that, in an experiment to study the number of leukocytes in the peripheral blood of five male patients, we draw conclusions about all men from this group of five. Then the population from which these five males are selected consists of all extant males of the species Homo sapiens.

Populations can be thought of as existing or conceptual:

A conceptual population cannot be visualized directly, but one can think of it as a set of measurements, such as the characteristics of all persons who have a disease at present or will have it in the near future, or the effect of a treatment on a large number of individuals even though it is not possible to administer the treatment to all of them.

Sample:
We shall first discuss the term variable. The characteristic or property measured on an item (an object, an individual, or more generally a sampling unit under study) is referred to as a variable, and the corresponding value is called an observation. For example, if we measure weight (and perhaps also blood pH and red cell count) in 100 rats, then the weight of each rat is an individual observation, and the hundred rat weights together represent the sample of observations. Each rat (a biological individual) is the smallest sampling unit. If we study weight in a single rat over a period of time, the sample of individual observations would be the weights recorded on that rat at successive times. If we consider an estimate of the DNA content of a single mammalian sperm cell to be an individual observation, the sample of observations may be the estimates of DNA content of all sperm cells studied in one individual mammal. Thus a sample may be defined as a collection of individual observations selected by a specified procedure. Suppose that in a group of 25 mice, measurements are obtained on blood pH and erythrocyte count, so that two variables are studied. Then the pH readings and cell counts are the individual observations, and we obtain either two samples of 25 observations each or one bivariate sample of 25 observations, each consisting of a pH reading paired with an erythrocyte count.

Variables in Biostatistics:
A variable is a characteristic with respect to which individuals in a sample differ from each other. If the property does not differ within a sample, it cannot be of statistical interest. Length, weight, height, number of teeth, vitamin C content, and genotype are examples of variables in ordinary, genetically and phenotypically diverse groups of organisms. Warm-bloodedness in a group of mammals is not a variable, since mammals are all alike in this regard, but the body temperature of individual mammals is a variable.
We can classify variables as follows:

Variables
    Measurement variables
        Continuous variables
        Discontinuous variables
    Ranked variables
    Attributes

Measurement variables are those measurements or counts that are expressed numerically. Continuous variables can take any value corresponding to a point on a line segment. Examples are lengths, areas, volumes, weights, angles, temperatures, periods of time, percentages, concentrations, and rates. Discontinuous (or discrete) variables are those that can take on only a finite (or countably infinite) number of values. Examples are numbers of a given structure (such as segments, bristles, teeth, or glands), number of offspring, number of colonies of microorganisms or animals, or number of plants in a given quadrat.

Some variables cannot be measured but can at least be ordered or ranked by their magnitude. For example, in an experiment one might record the rank order of emergence of ten pupae without specifying the exact time at which each pupa emerged. Examples in the medical field typically relate to degrees of change in patients after some treatment (such as vast improvement, moderate improvement, no change, moderate degradation, vast degradation/death), or to a level of intensity or growth.

Variables that cannot be measured but must be expressed qualitatively are called attributes, or nominal variables. Nominal variables have distinct levels that have no inherent ordering. Hair color, sex, death (no or yes) of an experimental animal in an antibiotic study, and growth or no growth of an organism in a culture medium investigation are examples of variables that would be described as nominal.

Descriptive Statistics: The sample observations are summarized so that the summary describes characteristics of the sample that correspond to those of the population of interest (under the assumption that the sample is representative of that population).

Two types of descriptive statistics are Statistics of location and Statistics of dispersion.

Statistics of location (Measures of Central Tendency)

Arithmetic Mean

The arithmetic mean of a variable is obtained by dividing the sum of its given values by their number. If the variable is denoted by $x$ and if $n$ values of $x$ are given: $x_{1},x_{2},\ldots,x_{n}$, then arithmetic mean of $x$ is $\bar{x}=\dfrac{\sum\limits_{i=1}^{n}x_{i}}{n}$
Properties of Arithmetic Mean

The sum of the deviations of the given values of variable from its mean is necessarily zero. If $x_{1},x_{2},\ldots x_{n}$ are $n$ values of variable $x$ and $\bar{x}$ denotes mean of $x$, then $\sum\limits_{i=1}^{n}{(x_{i}-\bar{x})}=0$.
If variables $x$ and $y$ are related as $y=a+bx$, where $a,b$ are constants, then corresponding to the $n$ values of $x$, $x_{1},x_{2},\ldots,x_{n}$, there are $n$ values of $y$: $y_{1}=a+bx_{1},y_{2}=a+bx_{2},\ldots,y_{n}=a+bx_{n}$. The mean of $y$ is then given by $\bar{y}=a+b\bar{x}$.
If the given values of variable $x$ are all equal to a constant $a$, $x_{1}=a,x_{2}=a,\ldots x_{n}=a$ then mean of variable is same as the common value. That is $\bar{x}=a$.
Let there be two sets of values of variable $x$, the number of values in two sets being $n_{1}$ and $n_{2}$ and means being $\bar{x}_{1}$ and $\bar{x}_{2}$, then mean of variable $x$ when values in two sets are taken together is given by $\bar{x}=\dfrac{n_{1}\bar{x}_{1}+n_{2}\bar{x}_{2}}{n_{1}+n_{2}}$
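
The properties above can be checked numerically. The following is a minimal Python sketch; the data values, the constants a and b, and the helper name combined_mean are made up for illustration and are not part of the lecture notes.

```python
from statistics import mean

def combined_mean(n1, mean1, n2, mean2):
    """Mean of two sets taken together, from their sizes and means."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

x = [2.0, 4.0, 6.0, 8.0]           # hypothetical observations
x_bar = mean(x)

# Property 1: the deviations from the mean sum to zero.
deviation_sum = sum(xi - x_bar for xi in x)

# Property 2: if y = a + b*x, then y_bar = a + b*x_bar.
a, b = 3.0, -2.0                   # arbitrary constants
y = [a + b * xi for xi in x]
y_bar = mean(y)

# Property 4: mean of two sets taken together.
set1, set2 = [1.0, 2.0, 3.0], [10.0, 20.0]
pooled = combined_mean(len(set1), mean(set1), len(set2), mean(set2))

print(deviation_sum, y_bar, a + b * x_bar, pooled, mean(set1 + set2))
```

The combined mean is a weighted average of the two set means, with the set sizes as weights, so it agrees with the mean computed directly from the merged data.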

Median
If the given values of $x$ are arranged in increasing or decreasing order of magnitude, then the middle-most value in this arrangement is called the median of $x$. The median may alternatively be defined as a value of $x$ such that half of the given values of $x$ are smaller than or equal to it and half are greater than or equal to it.
When the number of values $n$ is odd, the middle-most value, that is the $\dfrac{\left(n+1\right)}{2}$th value in the arrangement, is the unique median of $x$.
When $n$ is even, there is no unique median: any number between the $\dfrac{n}{2}$th and $\left(\dfrac{n}{2}+1\right)$st values of $x$ in the arrangement can be regarded as middle-most. By convention, the arithmetic mean of the $\dfrac{n}{2}$th and $\left(\dfrac{n}{2}+1\right)$st values is accepted as the median of $x$.
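
The odd and even cases can be sketched in Python as follows. This is only an illustration of the rule stated above; the function name and the sample values are chosen here and do not come from the textbooks.

```python
def median(values):
    """Median by the rule above: middle value if n is odd,
    mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # 0-based index mid is the (n+1)/2-th value in 1-based counting
        return ordered[mid]
    # mean of the (n/2)-th and (n/2 + 1)-st values in 1-based counting
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]))      # odd number of values
print(median([4, 1, 3, 2]))   # even number of values
```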

Mode
The mode of a variable is the value of the variable having the highest frequency.
Geometric Mean
If a variable $x$ has $n$ given values $x_{1},x_{2},\ldots,x_{n}$, then its geometric mean is defined by $\text{GM} = \left(\prod\limits_{i=1}^{n}x_{i}\right)^{1/n}$
Also, $\log \text{GM}=\dfrac{1}{n}\sum\limits_{i=1}^{n}\log x_{i}$

Thus logarithm of the geometric mean of a variable is the arithmetic mean of its logarithm.
Harmonic Mean
The harmonic mean of a variable $x$, with the given values $x_{i},(i=1,2,\ldots,n)$ is defined by,
$\text{HM} = \dfrac{n}{\sum\limits_{i=1}^{n}\dfrac{1}{x_{i}}} $ or $\dfrac{1}{\text{HM}}=\dfrac{1}{n}\sum\limits_{i=1}^{n}\dfrac{1}{x_{i}}$

The second formula shows that the reciprocal of the harmonic mean of a variable is the arithmetic mean of its reciprocal.
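
As a rough illustration, both means can be computed in Python directly from the two identities above (logarithms for the geometric mean, reciprocals for the harmonic mean). The function names and test values are assumptions made for this sketch, and all values are assumed to be positive.

```python
import math

def geometric_mean(values):
    """exp of the arithmetic mean of the logarithms (values must be > 0)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals (values must be > 0)."""
    return len(values) / sum(1.0 / v for v in values)

data = [1.0, 2.0, 4.0]           # hypothetical positive observations
print(geometric_mean(data))      # cube root of 1*2*4 = 8
print(harmonic_mean(data))       # 3 / (1 + 1/2 + 1/4)
```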

Statistics of Dispersion

Range: The simplest measure of dispersion of a variable is its range, which is defined as the difference between its highest and lowest given values.

Mean Deviation
If $A$ is the chosen average value of the variable $x$, then $x_{i}-A$ is the deviation of the $i^{th}$ given value of $x$ from the average. Clearly the higher the deviations $x_{1}-A,x_{2}-A,\ldots,x_{n}-A$ in magnitude, the higher is the dispersion of $x$. The arithmetic mean of absolute deviations $|x_{1}-A|,|x_{2}-A|,\ldots,|x_{n}-A|$ may be taken as the measure of dispersion. It is referred to as the mean deviation of $x$ about $A$. Denoting this mean deviation by $\text{MD}_A$, we have $\text{MD}_A=\dfrac{\sum\limits_{i=1}^{n}|x_{i}-A|}{n}$. Note that the mean deviation is least when measured about median of variable.
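
The remark that the mean deviation is least about the median can be seen on a small example. The sketch below uses made-up data and a hypothetical helper mean_deviation; it compares the deviation about the median with the deviation about the mean.

```python
from statistics import mean, median

def mean_deviation(values, about):
    """Arithmetic mean of the absolute deviations from the value `about`."""
    return sum(abs(v - about) for v in values) / len(values)

x = [1.0, 2.0, 3.0, 4.0, 10.0]                # hypothetical observations
md_about_median = mean_deviation(x, median(x))   # median is 3
md_about_mean = mean_deviation(x, mean(x))       # mean is 4

print(md_about_median, md_about_mean)
```

Here the single large value 10 pulls the mean away from the bulk of the data, so the mean deviation about the mean comes out larger than the one about the median.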

Standard Deviation
If $A$ is the chosen average value of the variable $x$, then $x_{i}-A$ is the deviation of the $i^{th}$ given value of $x$ from the average. Clearly the higher the deviations $x_{1}-A,x_{2}-A,\ldots,x_{n}-A$ in magnitude, the higher is the dispersion of $x$. The positive square root of the arithmetic mean of the squared deviations $\left(x_{i}-A\right)^{2}$, i.e. $\sqrt{\dfrac{\sum\limits_{i=1}^{n}\left(x_{i}-A\right)^{2}}{n}}$, is called the root-mean-square deviation about $A$.

The measure of dispersion obtained by putting $\bar{x}$ for $A$ above is called the standard deviation of $x$ and is denoted by $s$ or $S_{x}$. We have therefore $s=\sqrt{\dfrac{\sum\limits_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}{n}}$

Mathematical Properties of Standard Deviation

If the given values of variable $x$ are all equal to a constant $a$, $x_{1}=a,x_{2}=a,\ldots x_{n}=a$ then $S_{x}= \sqrt{\dfrac{\sum\limits_{i=1}^{n}{(x_{i}-\bar{x})}^2}{n}}=0$ where $S_{x}$ denotes standard deviation of $x$
If variables $x$ and $y$ are related as $y=a+bx$, where $a,b$ are constants, then corresponding to the $n$ values of $x$, $x_{1},x_{2},\ldots,x_{n}$, there are $n$ values of $y$: $y_{1}=a+bx_{1},y_{2}=a+bx_{2},\ldots,y_{n}=a+bx_{n}$. Then $S_{y}=|b|S_{x}$, where $S_{x}$ and $S_{y}$ denote the standard deviations of $x$ and $y$ respectively.
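
A short Python sketch of the standard deviation as defined above (with divisor $n$, not $n-1$) and of its two properties. The data and the constants are invented for the example.

```python
import math

def std_dev(values):
    """Standard deviation with divisor n, as in the definition above."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((v - x_bar) ** 2 for v in values) / n)

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical observations
a, b = 1.0, -3.0                                # arbitrary constants
y = [a + b * xi for xi in x]

s_x = std_dev(x)
s_y = std_dev(y)
s_const = std_dev([6.0, 6.0, 6.0])              # all values equal

print(s_x, s_y, abs(b) * s_x, s_const)
```

Adding the constant a shifts every value and the mean by the same amount, so it drops out of the deviations; only the scale factor b survives, through its absolute value.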

Tuesday, 28 April 2015

Hypothesis Test Table

Dear Students, as discussed in class, the following tables summarize the statistical hypothesis tests. A PDF version of the same is available. Send your comments for any queries. Thank you.

Let $X_{1},X_{2},\ldots,X_{n}$ be independent and identically distributed random variables according to $N(\mu,\sigma^{2})$, where $\sigma^{2}$ is unknown. We wish to test a hypothesis of the type $\sigma^{2} \geq \sigma_{0}^{2}$, $\sigma^{2} \leq \sigma_{0}^{2}$, or $\sigma^{2} = \sigma_{0}^{2}$, where $\sigma_{0}^{2}$ is some given positive number. Let $\bar{X} = \dfrac{\sum\limits_{i=1}^{n}X_{i}}{n}$ and $S^{2} = \dfrac{\sum\limits_{i=1}^{n}X_{i}^{2}-\dfrac{\left(\sum\limits_{i=1}^{n}X_{i}\right)^{2}}{n}}{n-1}$
We summarize the tests in the following table:

			Reject $H_{0}$ at level $\alpha$ if
	$H_{0}$	$H_{1}$	$\mu$ known	$\mu$ unknown
1	$\sigma \geq \sigma_{0} $	$\sigma < \sigma_{0}$	$\sum\limits_{i=1}^{n}(x_{i}-\mu)^{2}\leq \chi_{n,1-\alpha}^{2}\sigma_{0}^{2} $	$s^{2} \leq \dfrac{\sigma_{0}^{2}}{n-1}\chi_{n-1,1-\alpha}^{2}$
2	$\sigma \leq \sigma_{0} $	$\sigma > \sigma_{0}$	$\sum\limits_{i=1}^{n}(x_{i}-\mu)^{2}\geq \chi_{n,\alpha}^{2}\sigma_{0}^{2} $	$s^{2} \geq \dfrac{\sigma_{0}^{2}}{n-1}\chi_{n-1,\alpha}^{2}$
3	$\sigma = \sigma_{0} $	$\sigma \neq \sigma_{0}$	$\sum\limits_{i=1}^{n}(x_{i}-\mu)^{2} \leq \chi_{n,1-\alpha/2}^{2}\sigma_{0}^{2}$	$s^{2} \leq \dfrac{\sigma_{0}^{2}}{n-1}\chi_{n-1,1-\alpha/2}^{2}$
			or	or
			$\sum\limits_{i=1}^{n}(x_{i}-\mu)^{2}\geq \chi_{n,\alpha/2}^{2}\sigma_{0}^{2} $	$s^{2} \geq \dfrac{\sigma_{0}^{2}}{n-1}\chi_{n-1,\alpha/2}^{2}$

This test is called the chi-square test.
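
As an illustration of row 2 of the table ($\mu$ unknown), here is a small Python sketch. It assumes the critical value $\chi_{n-1,\alpha}^{2}$ is read from a statistical table and passed in by hand; the data, the function names, and the choices of $\sigma_{0}^{2}$ are hypothetical.

```python
def sample_variance(values):
    """S^2 computed with the shortcut formula above (divisor n - 1)."""
    n = len(values)
    total = sum(values)
    return (sum(v * v for v in values) - total * total / n) / (n - 1)

def reject_sigma_at_most(values, sigma0_sq, chi2_upper):
    """Test H0: sigma <= sigma0 against H1: sigma > sigma0, mu unknown.

    chi2_upper is the tabulated upper-alpha point of chi-square
    with n - 1 degrees of freedom (supplied by the user).
    """
    n = len(values)
    return sample_variance(values) >= sigma0_sq / (n - 1) * chi2_upper

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical sample, n = 8
chi2_7_005 = 14.067                             # table value for 7 df, alpha = 0.05

print(sample_variance(x))
print(reject_sigma_at_most(x, 1.0, chi2_7_005))   # sigma0^2 = 1
print(reject_sigma_at_most(x, 4.0, chi2_7_005))   # sigma0^2 = 4
```

Passing the critical value as an argument keeps the sketch free of external libraries and mirrors the table-lookup procedure used in class.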

Let $X_{1},X_{2},\ldots,X_{n}$ be independent and identically distributed random variables according to $N(\mu,\sigma^{2})$. Let $\bar{X} = \dfrac{\sum\limits_{i=1}^{n}X_{i}}{n}$ and $S^{2} = \dfrac{\sum\limits_{i=1}^{n}X_{i}^{2}-\dfrac{\left(\sum\limits_{i=1}^{n}X_{i}\right)^{2}}{n}}{n-1}$
We summarize the tests in the following table:

			Reject H0 at level $\alpha$ if
	$H_{0}$	$H_{1}$	$\sigma$ known	$\sigma$ unknown
1	$\mu \leq \mu_{0} $	$\mu > \mu_{0}$	$\bar{x} \geq \mu_{0}+\dfrac{\sigma}{\sqrt{n}}z_{\alpha} $	$\bar{x} \geq \mu_{0}+\dfrac{s}{\sqrt{n}}t_{n-1,\alpha}$
2	$\mu \geq \mu_{0} $	$\mu < \mu_{0}$	$\bar{x} \leq \mu_{0}+\dfrac{\sigma}{\sqrt{n}}z_{1-\alpha} $	$\bar{x} \leq \mu_{0}+\dfrac{s}{\sqrt{n}}t_{n-1,1-\alpha}$
3	$\mu = \mu_{0} $	$\mu \neq \mu_{0}$	$\mid \bar{x}-\mu_{0} \mid \geq \dfrac{\sigma}{\sqrt{n}}z_{\alpha/2} $	$\mid \bar{x}-\mu_{0} \mid \geq \dfrac{s}{\sqrt{n}}t_{n-1,\alpha/2}$

The test is called the z-test when $\sigma$ is known and the t-test when $\sigma$ is unknown.
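
Row 3 of the table (two-sided alternative) might be coded as below. This is a sketch under stated assumptions: the normal quantile $z_{\alpha/2}$ comes from Python's standard library, while the $t$ critical value is taken from a table and passed in; the summary figures are invented.

```python
from math import sqrt
from statistics import NormalDist

def z_upper(alpha):
    """Upper-alpha point of the standard normal: P(Z >= z_upper) = alpha."""
    return NormalDist().inv_cdf(1 - alpha)

def reject_two_sided_z(x_bar, mu0, sigma, n, alpha):
    """H0: mu = mu0 against H1: mu != mu0, sigma known."""
    return abs(x_bar - mu0) >= sigma / sqrt(n) * z_upper(alpha / 2)

def reject_two_sided_t(x_bar, mu0, s, n, t_crit):
    """Same test with sigma unknown; t_crit is the tabulated t_{n-1, alpha/2}."""
    return abs(x_bar - mu0) >= s / sqrt(n) * t_crit

# Hypothetical summary data: n = 25, sample mean 52, hypothesised mean 50.
print(reject_two_sided_z(52.0, 50.0, 5.0, 25, 0.05))   # sigma = 5 known
print(reject_two_sided_t(52.0, 50.0, 5.0, 25, 2.064))  # s = 5, t_{24,0.025} from a table
```

With the same numbers, the $t$ critical value is larger than the normal one, so the test with estimated $s$ is the more conservative of the two.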

Let $X_{1},X_{2},\ldots,X_{m}$ and $Y_{1},Y_{2},\ldots,Y_{n}$ be independent random samples distributed according to $N(\mu_{1},\sigma_{1}^{2})$ and $N(\mu_{2},\sigma_{2}^{2})$ respectively. Also let
$\bar{X} = \dfrac{\sum\limits_{i=1}^{m}X_{i}}{m}$, $S_{1}^{2} = \dfrac{\sum\limits_{i=1}^{m}X_{i}^{2}-\dfrac{\left(\sum\limits_{i=1}^{m}X_{i}\right)^{2}}{m}}{m-1}$,
$\bar{Y} = \dfrac{\sum\limits_{i=1}^{n}Y_{i}}{n}$, $S_{2}^{2} = \dfrac{\sum\limits_{i=1}^{n}Y_{i}^{2}-\dfrac{\left(\sum\limits_{i=1}^{n}Y_{i}\right)^{2}}{n}}{n-1}$,
and $S_{p}^{2} = \dfrac{(m-1)S_{1}^{2}+(n-1)S_{2}^{2}}{m+n-2}$.
$S_{p}^{2}$ is sometimes called the pooled sample variance.
The following tables summarize the tests:

			Reject H0 at level $\alpha$ if
	$H_{0}$	$H_{1}$	$\sigma_{1}^{2},\sigma_{2}^{2}$ known
1	$\mu_{1}- \mu_{2} \leq \mu_{0} $	$\mu_{1}- \mu_{2} > \mu_{0}$	$\bar{x}-\bar{y} \geq \mu_{0}+z_{\alpha}\sqrt{\dfrac{\sigma_{1}^{2}}{m}+\dfrac{\sigma_{2}^{2}}{n}} $
2	$\mu_{1}- \mu_{2} \geq \mu_{0} $	$\mu_{1}- \mu_{2} < \mu_{0}$	$\bar{x}-\bar{y} \leq \mu_{0}-z_{\alpha}\sqrt{\dfrac{\sigma_{1}^{2}}{m}+\dfrac{\sigma_{2}^{2}}{n}} $
3	$\mu_{1}- \mu_{2} = \mu_{0}$	$\mu_{1}- \mu_{2} \neq \mu_{0}$	$\mid \bar{x}-\bar{y}-\mu_{0}\mid \geq z_{\alpha/2}\sqrt{\dfrac{\sigma_{1}^{2}}{m}+\dfrac{\sigma_{2}^{2}}{n}} $

			Reject H0 at level $\alpha$ if
	$H_{0}$	$H_{1}$	$\sigma_{1}^{2},\sigma_{2}^{2}$ unknown and $\sigma_{1} =\sigma_{2}$
1	$\mu_{1}- \mu_{2} \leq \mu_{0} $	$\mu_{1}- \mu_{2} > \mu_{0}$	$\bar{x}-\bar{y} \geq \mu_{0}+t_{m+n-2,\alpha}s_{p}\sqrt{\dfrac{1}{m}+\dfrac{1}{n}} $
2	$\mu_{1}- \mu_{2} \geq \mu_{0} $	$\mu_{1}- \mu_{2} < \mu_{0}$	$\bar{x}-\bar{y} \leq \mu_{0}-t_{m+n-2,\alpha}s_{p}\sqrt{\dfrac{1}{m}+\dfrac{1}{n}} $
3	$\mu_{1}- \mu_{2} = \mu_{0}$	$\mu_{1}- \mu_{2} \neq \mu_{0}$	$\mid \bar{x}-\bar{y} -\mu_{0}\mid \geq t_{m+n-2,\alpha/2}s_{p}\sqrt{\dfrac{1}{m}+\dfrac{1}{n}} $
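
The pooled-variance test in row 3 of the table above can be sketched in Python as follows. The two samples and the function names are hypothetical, and the critical value $t_{m+n-2,\alpha/2}$ is assumed to be looked up in a table.

```python
from math import sqrt
from statistics import mean, variance   # variance() uses the divisor n - 1

def pooled_variance(x, y):
    """S_p^2 = ((m-1) S_1^2 + (n-1) S_2^2) / (m + n - 2)."""
    m, n = len(x), len(y)
    return ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)

def reject_two_sided_pooled_t(x, y, mu0, t_crit):
    """H0: mu1 - mu2 = mu0 against H1: mu1 - mu2 != mu0, equal unknown variances."""
    m, n = len(x), len(y)
    s_p = sqrt(pooled_variance(x, y))
    return abs(mean(x) - mean(y) - mu0) >= t_crit * s_p * sqrt(1 / m + 1 / n)

x = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical sample 1, m = 5
y = [2.0, 4.0, 6.0, 8.0, 10.0]     # hypothetical sample 2, n = 5
t_8_0025 = 2.306                   # table value for 8 df, alpha/2 = 0.025

print(pooled_variance(x, y))
print(reject_two_sided_pooled_t(x, y, 0.0, t_8_0025))
```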

			Reject $H_{0}$ at level $\alpha$ if
	$H_{0}$	$H_{1}$	$\mu_{1},\mu_{2}$ known	$\mu_{1},\mu_{2}$ unknown
1	$\sigma_{1}^{2} \leq \sigma_{2}^{2}$	$\sigma_{1}^{2} > \sigma_{2}^{2}$	$\dfrac{\sum\limits_{i=1}^{m}(x_{i}-\mu_{1})^{2}}{\sum\limits_{i=1}^{n}(y_{i}-\mu_{2})^{2}}\geq \dfrac{m}{n}F_{m,n,\alpha} $	$\dfrac{s_{1}^{2}}{s_{2}^{2}} \geq F_{m-1,n-1,\alpha}$
2	$\sigma_{1}^{2} \geq \sigma_{2}^{2}$	$\sigma_{1}^{2} < \sigma_{2}^{2} $	$\dfrac{\sum\limits_{i=1}^{n}(y_{i}-\mu_{2})^{2}}{\sum\limits_{i=1}^{m}(x_{i}-\mu_{1})^{2}}\geq \dfrac{n}{m}F_{n,m,\alpha} $	$\dfrac{s_{2}^{2}}{s_{1}^{2}} \geq F_{n-1,m-1,\alpha}$
3	$\sigma_{1}^{2} = \sigma_{2}^{2} $	$\sigma_{1}^{2} \neq \sigma_{2}^{2} $	$\dfrac{\sum\limits_{i=1}^{m}(x_{i}-\mu_{1})^{2}}{\sum\limits_{i=1}^{n}(y_{i}-\mu_{2})^{2}}\geq \dfrac{m}{n}F_{m,n,\alpha/2} $	$\dfrac{s_{1}^{2}}{s_{2}^{2}} \geq F_{m-1,n-1,\alpha/2}$
			or	or
			$\dfrac{\sum\limits_{i=1}^{m}(x_{i}-\mu_{1})^{2}}{\sum\limits_{i=1}^{n}(y_{i}-\mu_{2})^{2}}\leq \dfrac{m}{n}F_{m,n,1-\alpha/2} $	$\dfrac{s_{1}^{2}}{s_{2}^{2}} \leq F_{m-1,n-1,1-\alpha/2}$

The test for comparing two variances is called the F-test.

One-Way ANOVA

Null Hypothesis: $H_{0}:\mu_{1} = \mu_{2} = \mu_{3}= \cdots =\mu_{k} = \mu$

Group 1	Group 2	…	Group j	…	Group k

$x_{11}$	$x_{12}$	…	$x_{1j}$	…	$x_{1k}$
$x_{21}$	$x_{22}$	…	$x_{2j}$	…	$x_{2k}$
⋮	⋮	…	⋮	…	⋮
$x_{i1}$	$x_{i2}$	…	$x_{ij}$	…	$x_{ik}$
⋮	⋮	…	⋮	…	⋮
$x_{n_{1}1}$	$x_{n_{2}2}$	…	$x_{n_{j}j}$	…	$x_{n_{k}k}$

$T_{1}=\sum\limits_{i=1}^{n_{1}}x_{i1}$	$T_{2}=\sum\limits_{i=1}^{n_{2}}x_{i2}$	…	$T_{j}=\sum\limits_{i=1}^{n_{j}}x_{ij}$	…	$T_{k}=\sum\limits_{i=1}^{n_{k}}x_{ik}$

$T_{1}^{2}$	$T_{2}^{2}$	…	$T_{j}^{2}$	…	$T_{k}^{2}$

Let $G = \sum\limits_{j=1}^{k}\sum\limits_{i=1}^{n_{j}}x_{ij}$ and $n = \sum\limits_{j=1}^{k}n_{j}$. Define the correction factor $\text{CF} = \dfrac{G^{2}}{n}$. Then the total, between-group and within-group sums of squares are $\text{TSS} = \sum\limits_{j=1}^{k}\sum\limits_{i=1}^{n_{j}}x_{ij}^{2}- \text{CF}$, $\text{BSS} = \sum\limits_{j=1}^{k}\left(\dfrac{T_{j}^{2}}{n_{j}}\right)-\text{CF}$, and $\text{WSS} = \text{TSS} - \text{BSS}$.

Source of Variation	SS	df	MS	Fratio	F table

Between Group	$\sum\limits_{j=1}^{k}\left(\dfrac{T_{j}^{2}}{n_{j}}\right)-CF$	$k-1$	BSS/df	MSB/MSW	$F_{\alpha,(k-1,n-k)}$
Within Groups	TSS-BSS	$n-k$	WSS/df

Total	$\sum\limits_{j=1}^{k}\sum\limits_{i=1}^{n_{j}}x_{ij}^{2}- CF$	$n-1$	MS

Reject $H_{0}$ if Fratio > Ftable
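
The ANOVA table can be computed step by step as in the following Python sketch. It follows the formulas above (G, CF, TSS, BSS, WSS); the three groups are made-up data, and the table value of F is assumed to be looked up separately.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a list of groups of observations."""
    k = len(groups)                         # number of groups
    n = sum(len(g) for g in groups)         # total number of observations
    grand_total = sum(sum(g) for g in groups)                 # G
    cf = grand_total ** 2 / n                                 # correction factor
    tss = sum(v * v for g in groups for v in g) - cf          # total SS
    bss = sum(sum(g) ** 2 / len(g) for g in groups) - cf      # between-group SS
    wss = tss - bss                                           # within-group SS
    msb = bss / (k - 1)
    msw = wss / (n - k)
    return {"TSS": tss, "BSS": bss, "WSS": wss,
            "df_between": k - 1, "df_within": n - k,
            "MSB": msb, "MSW": msw, "F": msb / msw}

groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]  # hypothetical data
result = one_way_anova(groups)
f_table = 5.14        # F_{0.05,(2,6)} read from a statistical table

print(result)
print("Reject H0" if result["F"] > f_table else "Do not reject H0")
```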

Let $X_{1},X_{2},\ldots,X_{n}$ denote a random sample of size $n$ from a normal population with mean $\mu$ and standard deviation $\sigma$.

Confidence Interval for population mean $\mu$ and Normal distribution

If the sample data conforms to the normal distribution, a $(1-\alpha)100 \%$-level two sided confidence interval for mean $\mu$ is given by $\left(\bar{x}-z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\bar{x}+z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right)$

If $\sigma$ is not known, then

$\left(\bar{x}-t_{n-1,\alpha/2}\dfrac{S}{\sqrt{n}},\bar{x}+t_{n-1,\alpha/2}\dfrac{S}{\sqrt{n}}\right)$

is a $(1-\alpha)100 \%$-level two sided confidence interval for mean $\mu$.
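
Both intervals for $\mu$ can be sketched in Python. The normal quantile is taken from the standard library; the $t$ quantile is assumed to be read from a table and passed in. All numbers and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_sigma(x_bar, sigma, n, alpha):
    """(1 - alpha)100% two-sided interval for mu when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return (x_bar - half_width, x_bar + half_width)

def mean_ci_unknown_sigma(x_bar, s, n, t_crit):
    """Same interval with sigma unknown; t_crit is the tabulated t_{n-1, alpha/2}."""
    half_width = t_crit * s / sqrt(n)
    return (x_bar - half_width, x_bar + half_width)

# Hypothetical summary data: n = 25, sample mean 50, standard deviation 5.
print(mean_ci_known_sigma(50.0, 5.0, 25, 0.05))
print(mean_ci_unknown_sigma(50.0, 5.0, 25, 2.064))   # t_{24,0.025} from a table
```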

Confidence Interval for population variance $\sigma^{2}$ and Normal distribution

If the sample data conforms to the normal distribution, a $(1-\alpha)100 \%$-level two sided confidence interval for variance $\sigma^{2}$ when mean $\mu$ is known is

$\left(\dfrac{\sum\limits_{i=1}^{n}(x_{i}-\mu)^{2}}{\chi_{n,\alpha/2}^{2}},\dfrac{\sum\limits_{i=1}^{n}(x_{i}-\mu)^{2}}{\chi_{n,1-\alpha/2}^{2}}\right)$

If $\mu$ is unknown then

$\left(\dfrac{(n-1)S^{2}}{\chi_{n-1,\alpha/2}^{2}},\dfrac{(n-1)S^{2}}{\chi_{n-1,1-\alpha/2}^{2}}\right)$

Let $X_{1},X_{2},\ldots,X_{n_{1}}$ and $Y_{1},Y_{2},\ldots,Y_{n_{2}}$ be independent random samples from two normal populations. The means and standard deviations of these populations are $\mu_{1},\mu_{2}$ and $\sigma_{1},\sigma_{2}$ respectively.

If the sample data conform to the normal distribution, a $(1-\alpha)100\%$-level two sided confidence interval for $(\mu_{1}-\mu_{2})$ when both $\sigma_{1}^{2},\sigma_{2}^{2}$ are known is given by

$\left((\bar{x}-\bar{y})-z_{\alpha/2}\sqrt{\dfrac{\sigma_{1}^{2}}{n_{1}}+\dfrac{\sigma_{2}^{2}}{n_{2}}},(\bar{x}-\bar{y})+z_{\alpha/2}\sqrt{\dfrac{\sigma_{1}^{2}}{n_{1}}+\dfrac{\sigma_{2}^{2}}{n_{2}}}\right)$

A $(1-\alpha)100\%$-level two sided confidence interval for $(\mu_{1}-\mu_{2})$ when both $\sigma_{1}^{2},\sigma_{2}^{2}$ are unknown but assumed equal is given by

$\left((\bar{x}-\bar{y})-t_{n_{1}+n_{2}-2,\alpha/2}S_{p}\sqrt{\dfrac{1}{n_{1}}+\dfrac{1}{n_{2}}},(\bar{x}-\bar{y})+t_{n_{1}+n_{2}-2,\alpha/2}S_{p}\sqrt{\dfrac{1}{n_{1}}+\dfrac{1}{n_{2}}}\right)$

Where $S_{p}^{2}=\dfrac{(n_{1}-1)S_{1}^{2}+(n_{2}-1)S_{2}^{2}}{n_{1}+n_{2}-2}$

Monday, 27 April 2015

Assignment (Sem4)

Dear Students, based on the discussion in the theory lecture, you were given an assignment on Probability.
Having checked the solutions you submitted, I suggest that you revise them as follows:

There are 10 computers in a store. Among them, 7 are brand new and 3 are refurbished. Four computers are purchased for a student lab. From the first look, they are indistinguishable, so the four computers are selected at random. Compute the probability that among the chosen computers,

exactly two are refurbished
exactly one is refurbished

Answer: Let S denote the set of all possible outcomes when four computers are selected at random.
The number of outcomes in S is $\binom{10}{4}=\dfrac{10!}{4! \times 6!}=210$. (These include the selections with 0, 1, 2, or 3 refurbished computers.)
Let A denote the set of all possible outcomes when four computers are selected at random and exactly two of them are refurbished.
The number of outcomes in A is $\binom{3}{2}\binom{7}{2}=\dfrac{3!}{2! \times 1!}\times \dfrac{7!}{2! \times 5!} = 63$
The required probability is therefore $\dfrac{63}{210}=\dfrac{3}{10}=0.3$

Let B denote the set of all possible outcomes when four computers are selected at random and exactly one of them is refurbished.
The number of outcomes in B is $\binom{3}{1}\binom{7}{3}=\dfrac{3!}{1! \times 2!}\times \dfrac{7!}{3! \times 4!} = 105$
The required probability is therefore $\dfrac{105}{210}=\dfrac{1}{2}=0.5$
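
The counting argument can be double-checked with a few lines of Python. The function name and its default arguments are chosen for this sketch only; the numbers 7, 3 and 4 are those of the problem.

```python
from math import comb

def prob_refurbished(k, new=7, refurbished=3, chosen=4):
    """P(exactly k refurbished among the chosen computers)."""
    favourable = comb(refurbished, k) * comb(new, chosen - k)
    return favourable / comb(new + refurbished, chosen)

print(prob_refurbished(2))   # exactly two refurbished
print(prob_refurbished(1))   # exactly one refurbished
print(sum(prob_refurbished(k) for k in range(4)))   # all cases together
```

Summing over k = 0, 1, 2, 3 covers every possible selection, so the probabilities add up to 1; this is a quick check that no case has been miscounted.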

Saturday, 25 April 2015

Normal Distribution

The normal distribution is expressed mathematically as $f(x)=\dfrac{1}{\sqrt{2\pi}\sigma}e^{-\dfrac{1}{2}\left( \dfrac{x-\mu}{\sigma}\right)^{2}}$, $-\infty < x < +\infty$, $-\infty < \mu < +\infty$, $\sigma > 0$. The function $f(x)$ is called the probability density function. By taking $Z = \dfrac{X-\mu}{\sigma}$ in the formula of the normal distribution, we have $f(z)=\dfrac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^{2}}$, $-\infty < z < +\infty$. The function $f(z)$ is called the standard normal distribution, and its graph is called the standard normal curve.

How to compute the probability of an event under the normal probability model is explained in the accompanying notes. The area under the standard normal curve over an interval represents the probability (the proportion or percentage, in the frequency approach) of observing the random variable in that interval. (Note: the PDF document contains embedded videos demonstrating the area properties and how to read a statistical table. You must have the latest version of a PDF reader to view these videos.)

For a normal distribution with mean $\mu$ and standard deviation $\sigma$, approximately

68.27$\%$ of the population values lie within one standard deviation ($\pm1\sigma$) of the mean,
95.45$\%$ of the population values lie within two standard deviations ($\pm2\sigma$) of the mean, and
99.73$\%$ of the population values lie within three standard deviations ($\pm3\sigma$) of the mean.
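
These three percentages can be reproduced from the standard normal cumulative distribution function. The sketch below uses Python's standard library; the helper name area_within is an assumption of this example.

```python
from statistics import NormalDist

def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    z = NormalDist()             # mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 2))
```

Because $Z = \dfrac{X-\mu}{\sigma}$ is standard normal, the same areas apply to any normal distribution, whatever its mean and standard deviation.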

The following application illustrates the area properties of the normal distribution. You can select the markers on the $x$-axis and move them right or left; the change in area is shown numerically. First select the icon with the + sign (see the right corner) and move the graph to the center region, then select the arrow pointer icon (see the upper left corner) and go to the $x$-axis. Try this! (If you cannot see it in the mobile version, send your comments. Thank you.)
