Maths @ CHARUSAT: Introduction to Biostatistics

Dear students, this page presents brief notes compiled from the books prescribed in the syllabus, for your information only.

Textbooks

Robert R. Sokal and F. James Rohlf: Introduction to Biostatistics, Dover Publications.
Olive Jean Dunn and Virginia A. Clark: Basic Statistics: A Primer for the Biomedical Sciences, Fourth Edition, John Wiley & Sons.
Wayne W. Daniel: Biostatistics: A Foundation for Analysis in the Health Sciences, Eighth Edition, John Wiley & Sons.
Bernard Rosner: Fundamentals of Biostatistics, Fifth Edition, Duxbury, Thomson Learning.

Some important concepts that we have discussed in lectures are^[1]:

Biostatistics: We may define biostatistics as the application of statistical methods to the solution of biological problems, such as those arising in the health-related and agricultural sciences. Biostatistics is also called biological statistics or biometry. In the modern sense, statistics may be defined as the scientific study of numerical data based on natural phenomena. Examples are the number of peas in a pod, the heartbeats of rats in response to adrenalin, the mutation rate in maize after irradiation, or the incidence or morbidity in patients treated with a vaccine. The last three of these may still be regarded as natural phenomena, even though scientists have interfered with them through their intervention.

Population: The biological definition of this term refers to all the individuals of a given species (perhaps of a given life-history stage or sex) found in a circumscribed area at a given time. In statistics, population always means the totality of individual observations about which inferences are to be made, existing anywhere in the world or at least within a definitely specified sampling area limited in space and time. For example,

If we carry out six replicate determinations of sodium in a certain material, then the six individual observations constitute a sample from a population of all determinations of sodium that could have been made with that measurement technique.
A population may represent the outcomes of experiments, such as all the heartbeat frequencies produced in an animal by injections of adrenalin. A given experiment, such as the administration of adrenalin to an animal, could be repeated as long as the experimenter could obtain material.
Suppose an experiment studies the number of leukocytes in the peripheral blood of five male patients. If we draw conclusions about all men from this group of five, then the population from which these five males were selected consists of all extant males of the species Homo sapiens.

Populations can be thought of as existing or conceptual:

A conceptual population cannot be observed in its entirety, but one can think of it as a set of measurements, such as the characteristics of all persons with a given disease at present and in the near future, or the effect of a treatment on a large number of individuals, even though it is not possible to administer the treatment to all of them.

Sample:
We first define a variable. The characteristic or property measured on an item (or object, individual, or more general sampling unit under study) is referred to as a variable, and its recorded value is called an observation. For example, if we measure weight (and, say, blood pH and red cell count) in 100 rats, then the weight of each rat is an individual observation, and the hundred rat weights together represent a sample of observations. Each rat (a biological individual) is the smallest sampling unit.

If we study the weight of a single rat over a period of time, the sample of individual observations consists of the weights recorded on that one rat at successive times. Similarly, if we take the estimated DNA content of a single mammalian sperm cell as an individual observation, the sample of observations may be the estimates of DNA content of all sperm cells studied in one individual mammal. Thus a sample may be defined as a collection of individual observations selected by a specific procedure.

Suppose that in a group of 25 mice, blood pH and erythrocyte count are measured (two variables are studied). The pH readings and cell counts are individual observations, and together they form either two samples of 25 observations each, or a bivariate sample of 25 observations, each consisting of a pH reading paired with an erythrocyte count.
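To make the idea of a bivariate sample concrete, here is a minimal Python sketch of how such data might be stored: one pair per sampling unit, which can also be split into two univariate samples. The readings below are made-up values for five mice (not data from the textbooks), and the variable names are illustrative only.

```python
# Hypothetical readings for 5 mice (illustrative values only, not from the text).
# Each tuple is one sampling unit: (blood pH, erythrocyte count).
bivariate_sample = [
    (7.35, 8.1),
    (7.41, 7.9),
    (7.38, 8.4),
    (7.44, 7.6),
    (7.36, 8.0),
]

# The same data viewed as two univariate samples, one per variable.
ph_sample = [ph for ph, _ in bivariate_sample]
count_sample = [count for _, count in bivariate_sample]

print(len(bivariate_sample), len(ph_sample), len(count_sample))
```

Keeping the pairs together preserves which pH reading belongs with which cell count; splitting them loses that pairing, which matters later when relationships between the two variables are studied.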

Variables in Biostatistics:
A variable is a characteristic with respect to which individuals in a sample differ from each other. If a property does not differ within a sample, it cannot be the subject of statistical study. Length, weight, height, number of teeth, vitamin C content, and genotype are examples of variables in ordinary genetically and phenotypically diverse groups of organisms. Warm-bloodedness in a group of mammals is not a variable, since all mammals are alike in this regard, but the body temperature of individual mammals is a variable.
Variables can be classified as follows:

Variables
    1. Measurement variables
        (a) Continuous variables
        (b) Discontinuous (discrete) variables
    2. Ranked variables
    3. Attributes (nominal variables)

Measurement variables are measurements or counts that are expressed numerically. Continuous variables can, at least in theory, take any value in an interval, corresponding to the points on a line segment. Examples are lengths, areas, volumes, weights, angles, temperatures, periods of time, percentages, concentrations, and rates. Discontinuous (or discrete) variables can take on only a finite (or countably infinite) number of values. Examples are the number of a given structure (such as segments, bristles, teeth, or glands), the number of offspring, the number of colonies of microorganisms or animals, or the number of plants in a given quadrat.

Some variables cannot be measured but can at least be ordered or ranked by their magnitude; these are ranked variables. For example, in an experiment one might record the rank order of emergence of ten pupae without specifying the exact time at which each pupa emerged. Examples in the medical field typically relate to the degree of change in patients after some treatment (such as vast improvement, moderate improvement, no change, moderate deterioration, vast deterioration/death), or to a level of intensity or growth.

Variables that cannot be measured but must be expressed qualitatively are called attributes, or nominal variables. Nominal variables have distinct levels with no inherent ordering. Hair color, sex, death (no or yes) of an experimental animal in an antibiotic study, and growth or no growth of an organism in a culture medium are examples of nominal variables.

Descriptive Statistics: The sample observations are summarized so as to describe certain characteristics of the sample that correspond to those of the population of interest (assuming that the sample is representative of that population).

The two types of descriptive statistics considered here are statistics of location and statistics of dispersion.

Statistics of location (Measures of Central Tendency)

Arithmetic Mean

The arithmetic mean of a variable is obtained by dividing the sum of its given values by their number. If the variable is denoted by $x$ and $n$ values of $x$ are given, $x_{1},x_{2},\ldots,x_{n}$, then the arithmetic mean of $x$ is $\bar{x}=\dfrac{\sum\limits_{i=1}^{n}x_{i}}{n}$.
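The formula translates directly into code. The following is a minimal Python sketch; the function name `arithmetic_mean` and the sample values are illustrative and not taken from the prescribed books. Python's standard `statistics.mean` computes the same quantity.

```python
import statistics

def arithmetic_mean(values):
    """Sum of the n given values divided by n."""
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)

x = [4, 8, 6, 2]
print(arithmetic_mean(x))   # 20 / 4 = 5.0
print(statistics.mean(x))   # standard-library equivalent
```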
Properties of Arithmetic Mean

The sum of the deviations of the given values of a variable from its mean is always zero: if $x_{1},x_{2},\ldots,x_{n}$ are $n$ values of the variable $x$ and $\bar{x}$ denotes the mean of $x$, then $\sum\limits_{i=1}^{n}{(x_{i}-\bar{x})}=0$.
If the variables $x$ and $y$ are related as $y=a+bx$, where $a$ and $b$ are constants, then corresponding to the $n$ values $x_{1},x_{2},\ldots,x_{n}$ of $x$ there are $n$ values of $y$, namely $y_{1}=a+bx_{1},y_{2}=a+bx_{2},\ldots,y_{n}=a+bx_{n}$, and the mean of $y$ is given by $\bar{y}=a+b\bar{x}$.
If the given values of the variable $x$ are all equal to a constant $a$, that is, $x_{1}=a,x_{2}=a,\ldots,x_{n}=a$, then the mean of the variable equals that common value: $\bar{x}=a$.
Let there be two sets of values of the variable $x$, with $n_{1}$ and $n_{2}$ values and means $\bar{x}_{1}$ and $\bar{x}_{2}$ respectively. Then the mean of $x$ when the values of the two sets are taken together is $\bar{x}=\dfrac{n_{1}\bar{x}_{1}+n_{2}\bar{x}_{2}}{n_{1}+n_{2}}$.
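The four properties above can be checked numerically. This Python sketch uses hypothetical values and `fractions.Fraction`, so that each equality holds exactly rather than only up to floating-point rounding; the helper name `mean` is illustrative.

```python
from fractions import Fraction

def mean(values):
    """Arithmetic mean: sum of the values divided by their number."""
    return sum(values) / len(values)

x = [Fraction(v) for v in (3, 7, 8, 10, 12)]
x_bar = mean(x)  # 40 / 5 = 8

# Property 1: the deviations from the mean sum to zero.
assert sum(xi - x_bar for xi in x) == 0

# Property 2: if y = a + b*x, then y_bar = a + b*x_bar.
a, b = Fraction(5), Fraction(-2)
y = [a + b * xi for xi in x]
assert mean(y) == a + b * x_bar

# Property 3: if every value equals a constant, the mean is that constant.
assert mean([Fraction(4)] * 6) == 4

# Property 4: the combined mean is the weighted mean of the two set means.
set1, set2 = x, [Fraction(v) for v in (1, 2, 6)]
combined = mean(set1 + set2)
weighted = (len(set1) * mean(set1) + len(set2) * mean(set2)) / (len(set1) + len(set2))
assert combined == weighted

print(x_bar, mean(y), combined)  # 8 -11 49/8
```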

Median
If the given values of $x$ are arranged in increasing or decreasing order of magnitude, then the middle-most value in this arrangement is called the median of $x$. The median may alternatively be defined as a value of $x$ such that half of the given values of $x$ are smaller than or equal to it and half are greater than or equal to it.
When the number of values $n$ is odd, the middle-most value, that is, the $\dfrac{n+1}{2}$th value in the arrangement, is the unique median of $x$.
When $n$ is even, there is no unique median: any number between the $\dfrac{n}{2}$th and $\left(\dfrac{n}{2}+1\right)$st values of $x$ in the arrangement may be regarded as middle-most. By convention, the arithmetic mean of the $\dfrac{n}{2}$th and $\left(\dfrac{n}{2}+1\right)$st values is taken as the median of $x$.
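The odd and even cases can be sketched in Python as follows. The function name `median` is illustrative; the positions in the text are 1-based while Python lists are 0-based, hence the `- 1` adjustments. Python's standard `statistics.median` follows the same convention for even $n$.

```python
def median(values):
    """Middle value for odd n; mean of the (n/2)th and (n/2 + 1)st values for even n."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]             # the ((n+1)/2)th value
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2  # mean of the two middle values

print(median([7, 1, 5]))     # sorted 1, 5, 7 -> 5
print(median([7, 1, 5, 3]))  # sorted 1, 3, 5, 7 -> (3 + 5) / 2 = 4.0
```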

Mode
The mode of a variable is the value of the variable having the highest frequency.
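A set of values may have several values tied for the highest frequency. The following Python sketch counts frequencies with `collections.Counter` and returns every value that reaches the highest count; returning all tied values is a design choice of this sketch, not part of the definition above. The function name `modes` is illustrative; Python 3.8+ also provides `statistics.multimode` with similar behaviour.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    if not values:
        raise ValueError("need at least one value")
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes([2, 3, 3, 5, 3, 2]))  # [3]
print(modes([1, 1, 2, 2, 4]))     # [1, 2]
```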
Geometric Mean
If a variable $x$ has $n$ given positive values $x_{1},x_{2},\ldots,x_{n}$, then its geometric mean is defined by $\text{GM} = \left(\prod\limits_{i=1}^{n}x_{i}\right)^{1/n}$.
Taking logarithms, $\log \text{GM}=\dfrac{1}{n}\sum\limits_{i=1}^{n}\log x_{i}$.

Thus the logarithm of the geometric mean of a variable is the arithmetic mean of its logarithms.
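The logarithm identity also gives a convenient way to compute the geometric mean: average the logarithms, then exponentiate. This is a minimal Python sketch with an illustrative function name; it rejects non-positive values because their logarithms are undefined. Results are floating-point, so they may differ from the exact value in the last digit.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    log_mean = sum(math.log(v) for v in values) / len(values)  # log GM = mean of log x_i
    return math.exp(log_mean)

print(geometric_mean([2, 8]))        # approximately 4.0
print(math.prod([2, 8]) ** (1 / 2))  # direct product formula, approximately 4.0
```

Working with logarithms avoids forming a very large (or very small) product when $n$ is large.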
Harmonic Mean
The harmonic mean of a variable $x$, with the given values $x_{i},(i=1,2,\ldots,n)$ is defined by,
$\text{HM} = \dfrac{n}{\sum\limits_{i=1}^{n}\dfrac{1}{x_{i}}} $ or $\dfrac{1}{\text{HM}}=\dfrac{1}{n}\sum\limits_{i=1}^{n}\dfrac{1}{x_{i}}$

The second formula shows that the reciprocal of the harmonic mean of a variable is the arithmetic mean of its reciprocals.
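Both forms of the harmonic-mean formula can be seen in a short Python sketch. The function name is illustrative, and restricting the input to positive values is an assumption made here to avoid division by zero; the text itself does not state a restriction.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals 1/x_i (positive values only in this sketch)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("this sketch needs positive values")
    return len(values) / sum(1 / v for v in values)

x = [1, 4, 4]
hm = harmonic_mean(x)
print(hm)                                       # 3 / (1 + 0.25 + 0.25) = 2.0
print(1 / hm, sum(1 / v for v in x) / len(x))   # both 0.5: 1/HM is the mean of the reciprocals
```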

Statistics of Dispersion

Range: The simplest measure of dispersion of a variable is its range, which is defined as the difference between its highest and lowest given values.
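As a quick illustration, the range is just the highest value minus the lowest. The function is named `value_range` (an illustrative name) to avoid shadowing Python's built-in `range`.

```python
def value_range(values):
    """Difference between the highest and lowest given values."""
    if not values:
        raise ValueError("need at least one value")
    return max(values) - min(values)

print(value_range([12, 7, 19, 3, 10]))  # 19 - 3 = 16
```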

Mean Deviation
If $A$ is the chosen average value of the variable $x$, then $x_{i}-A$ is the deviation of the $i^{th}$ given value of $x$ from the average. Clearly, the larger the deviations $x_{1}-A,x_{2}-A,\ldots,x_{n}-A$ are in magnitude, the greater the dispersion of $x$. The arithmetic mean of the absolute deviations $|x_{1}-A|,|x_{2}-A|,\ldots,|x_{n}-A|$ may be taken as a measure of dispersion; it is called the mean deviation of $x$ about $A$. Denoting this mean deviation by $\text{MD}_A$, we have $\text{MD}_A=\dfrac{\sum\limits_{i=1}^{n}|x_{i}-A|}{n}$. The mean deviation is least when it is measured about the median of the variable.
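The closing remark, that the mean deviation is smallest about the median, can be checked numerically. This Python sketch (hypothetical values; `mean_deviation` is an illustrative name) computes $\text{MD}_A$ about the median and about the mean, then scans a grid of candidate centres $A = 0.0, 0.1, \ldots, 12.0$. A grid scan only illustrates the property; it is not a proof.

```python
from statistics import mean, median

def mean_deviation(values, a):
    """MD_A: arithmetic mean of the absolute deviations |x_i - a|."""
    return sum(abs(x - a) for x in values) / len(values)

x = [1, 2, 3, 4, 10]
print(mean_deviation(x, median(x)))  # about the median 3: 11 / 5 = 2.2
print(mean_deviation(x, mean(x)))    # about the mean 4: 12 / 5 = 2.4

# Scan candidate centres 0.0, 0.1, ..., 12.0; the smallest MD occurs at the median.
candidates = [c / 10 for c in range(121)]
best = min(candidates, key=lambda c: mean_deviation(x, c))
print(best)  # 3.0
```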

Standard Deviation
As with the mean deviation, let $x_{i}-A$ denote the deviation of the $i^{th}$ given value of $x$ from a chosen average value $A$. The positive square root of the arithmetic mean of the squared deviations $\left(x_{i}-A\right)^{2}$, namely $\sqrt{\dfrac{\sum\limits_{i=1}^{n}\left(x_{i}-A\right)^{2}}{n}}$, is called the root-mean-square deviation about $A$.

The measure of dispersion obtained by putting $\bar{x}$ for $A$ above is called the standard deviation of $x$ and is denoted by $s$ or $S_{x}$. We have therefore $s=\sqrt{\dfrac{\sum\limits_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}{n}}$
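The formula above uses the divisor $n$. A minimal Python sketch follows; the function name `standard_deviation` and the data are illustrative. In Python's standard library, `statistics.pstdev` uses the same divisor $n$, whereas `statistics.stdev` uses $n-1$, the form many texts adopt for the sample standard deviation, so the two library functions give different answers on the same data.

```python
import math
import statistics

def standard_deviation(values):
    """Root-mean-square deviation about the mean, with divisor n as in the formula above."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / n)

x = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(x))  # mean 5, squared deviations sum to 32, sqrt(32 / 8) = 2.0
print(statistics.pstdev(x))   # divisor n: same value
print(statistics.stdev(x))    # divisor n - 1: slightly larger
```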

Mathematical Properties of Standard Deviation

If the given values of the variable $x$ are all equal to a constant $a$, that is, $x_{1}=a,x_{2}=a,\ldots,x_{n}=a$, then $S_{x}= \sqrt{\dfrac{\sum\limits_{i=1}^{n}{(x_{i}-\bar{x})}^2}{n}}=0$, where $S_{x}$ denotes the standard deviation of $x$.
If the variables $x$ and $y$ are related as $y=a+bx$, where $a$ and $b$ are constants, then corresponding to the $n$ values $x_{1},x_{2},\ldots,x_{n}$ of $x$ there are $n$ values of $y$, namely $y_{1}=a+bx_{1},y_{2}=a+bx_{2},\ldots,y_{n}=a+bx_{n}$, and $S_{y}=|b|S_{x}$, where $S_{x}$ and $S_{y}$ denote the standard deviations of $x$ and $y$ respectively.
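Both properties can be checked numerically with `statistics.pstdev`, which uses the divisor $n$ as in these notes. The values of $x$, $a$, and $b$ below are hypothetical. Note that the shift $a$ drops out entirely, and a negative $b$ still gives a non-negative standard deviation because of the absolute value.

```python
import math
from statistics import pstdev

x = [3.0, 7.0, 8.0, 10.0, 12.0]
a, b = 5.0, -2.0
y = [a + b * xi for xi in x]

# Property 1: a variable whose values are all equal has zero standard deviation.
assert pstdev([4.0] * 6) == 0.0

# Property 2: S_y = |b| * S_x (the shift a has no effect).
assert math.isclose(pstdev(y), abs(b) * pstdev(x))

print(pstdev(x), pstdev(y))
```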

Wednesday, 29 April 2015
