
Essentials of Biostatistics

Indian Pediatrics 1999; 36:691-696 

2. Designs of Medical Studies

A. Indrayan
L. Satyanarayana*

From the Division of Biostatistics and Medical Informatics, University College of Medical Sciences, Dilshad Garden, Delhi 110095, India and *Institute of Cytology and Preventive Oncology, Maulana Azad Medical College Campus, New Delhi 110 002, India.

Reprint requests: Dr. A. Indrayan, Professor of Biostatistics, Division of Biostatistics and Medical Informatics, University College of Medical Sciences, Dilshad Garden, Delhi 110095, India.

It may have been realized from the previous article(1) that empiricism is the backbone of medical knowledge. Studies in various forms are constantly carried out to acquire better and wider knowledge. To detect a trend effectively, a series of subjects is generally studied. Empiricism requires that the steps to study such a series essentially consist of preparation of a protocol, collection of observations, and their collation and analysis. The tool that really architects the study is the protocol. It delineates the procedures to be followed from beginning to end: the inclusion and exclusion criteria for the subjects, the method of selection, the number of subjects of various types to be studied, the methods to be adopted for collection of observations including modes of measurement, and the plan of analysis. Proper justification, including the ethics, is given for the procedures to be followed at each step. A protocol is considered complete when the background of the study explaining the need is also described, along with a short review of the literature appraising the existing status. It also contains a clear statement of the broad and specific objectives of the study. Much of the planning depends on these objectives. We give a brief overview of various types of designs in Section 2.1 and details of each in the subsequent sections.

2.1 Types of Study Design

Studies where the primary objective is to evaluate the relationship between a cause and an effect, an exposure (a risk factor or a protective factor) and a disease, or an antecedent and an outcome, are called analytical studies. These have further subdivisions (Fig. 1). The other class of studies is descriptive. Such studies seek to delineate the magnitude of the problem in different segments of the population, say, in terms of prevalence and incidence, or to establish normal and abnormal levels of measurements. Descriptive studies can also provide the spectrum of clinical findings in a group of patients. Most descriptive studies are cross-sectional in nature and are generally called surveys. A descriptive study provides the distribution of the disease or the health condition by person, place or time. An analytical study, on the other hand, is designed to answer the question why some subjects are more commonly affected than others. The designs discussed in this article relate to the analytical type rather than to the descriptive type.


The term observational study is used for an analytical study that investigates the natural course of events. Opposed to these are experiments, which necessarily entail an intervention. We discuss them in the following sections.

2.2 Observational Studies

A study is observational when the natural course of events is monitored without any intervention. In this context, it is helpful to appreciate the statistical distinction between a factor and a response. The term factor is generally used for any attribute or measurement of consequence but, statistically speaking, a factor is what is already known about the subjects. The response is elicited during the course of the investigation. In a study on birth weight and maternal anemia, the latter is the antecedent and the former is the outcome. There are three ways in which this study can be carried out. These are shown in Table I. In the setup shown in Table I, study design A, known anemic and non-anemic women are observed, and the number of subjects in the various birth weight categories becomes known only after the data are collected. Note that in this design the antecedent characteristic (anemia) is a factor and the outcome (birth weight) is a response. This is the natural course since the outcome in any case occurs afterwards. A study following such a design is called a prospective study or a cohort study. The second method to investigate this relationship is to first enlist babies of different birth weights and then elicit the anemia status of their mothers through records or otherwise. It is necessary in this setup that the birth weight is known. Note that the antecedent is the response and the outcome is the factor in this case. Such a design moves in the reverse direction, from the known outcome to the "exposure" (Table I; study design B). A study using this design is called a retrospective study or a case-control study.

Table I

Three Designs (A, B and C) for Studying the Relationship Between Maternal Anemia and Birth Weight

                A (Prospective)      B (Retrospective)    C (Cross-sectional)
Birth           Maternal anemia      Maternal anemia      Maternal anemia
weight (g)      Yes   No    Total    Yes   No    Total    Yes   No    Total
<2500                       ?                    n1                   ?
≥2500                       ?                    n2                   ?
Total           n1    n2    n        ?     ?     n        ?     ?     n

The third method is to take a sample of women without any consideration of either anemia status or birth weight. Maternal anemia and birth weight are both elicited. In this design, the number of women with and without anemia and with babies of different birth weights would be known only after the survey is completed (Table I; study design C). Neither the antecedent nor the outcome is a factor in this setup; both are responses. Such a study is called cross-sectional. There could be setups where the antecedent and the outcome are not distinct. Gender and blood group are genetically determined simultaneously and neither is antecedent to the other. Then cross-sectional is the design of choice. The details of these three designs are as follows.
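The difference between the three designs lies in which totals of Table I are fixed in advance. The following Python sketch illustrates this on a simulated population. The function names, the population size and all the probabilities are hypothetical choices made only for illustration; they are not data from any study.

```python
import random

def simulate_population(size, p_anemia, p_low_if_anemic, p_low_if_not, seed=0):
    """Return (anemic, low_birth_weight) pairs drawn with invented probabilities."""
    rng = random.Random(seed)
    population = []
    for _ in range(size):
        anemic = rng.random() < p_anemia
        p_low = p_low_if_anemic if anemic else p_low_if_not
        population.append((anemic, rng.random() < p_low))
    return population

def sample_prospective(population, n1, n2, rng):
    """Design A: fix the numbers of anemic (n1) and non-anemic (n2) women."""
    anemic = [s for s in population if s[0]]
    non_anemic = [s for s in population if not s[0]]
    return rng.sample(anemic, n1) + rng.sample(non_anemic, n2)

def sample_retrospective(population, n1, n2, rng):
    """Design B: fix the numbers of low (n1) and normal (n2) birth weight babies."""
    low = [s for s in population if s[1]]
    normal = [s for s in population if not s[1]]
    return rng.sample(low, n1) + rng.sample(normal, n2)

def sample_cross_sectional(population, n, rng):
    """Design C: fix only the total n; both margins are responses."""
    return rng.sample(population, n)

def margins(sample):
    """Count the row and column totals of the 2x2 table for a sample."""
    anemic = sum(1 for a, _ in sample if a)
    low = sum(1 for _, b in sample if b)
    return {"anemic": anemic, "non_anemic": len(sample) - anemic,
            "low_bw": low, "normal_bw": len(sample) - low}

# Hypothetical population: 40% anemic, low birth weight in 30% vs 15%.
population = simulate_population(5000, 0.4, 0.3, 0.15)
rng = random.Random(1)
design_a = margins(sample_prospective(population, 100, 100, rng))
design_b = margins(sample_retrospective(population, 100, 100, rng))
design_c = margins(sample_cross_sectional(population, 200, rng))
print(design_a, design_b, design_c, sep="\n")
```

In design A the anemia totals come out exactly as chosen, in design B the birth weight totals do, and in design C only the grand total is under the investigator's control.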

Cohort or Prospective Studies

A cohort is a group of subjects that shares a common base and is followed forward in time. They can, for example, be male and female newborns followed up to study their physical growth up to adolescence, or a group of asphyxiated and non-asphyxiated newborns followed up for a year or more to look at their cognitive abilities. The baseline generally is exposure or non-exposure to a "risk factor". They are followed for an adequate length of time so that the outcome, such as occurrence or non-occurrence of disease, pregnancy, or improvement in health, can be noticed. It is not necessary that the follow-up is in the future. Records of births occurring in the year 1970 can be examined (if available) for happenings till the year 1990. This is then called a retrospective cohort.

A basic feature of a prospective study is that the incidence of an outcome such as disease or recovery is evaluated in those who are exposed. It is often helpful to study a parallel group of non-exposed subjects for comparison. Such a parallel group is called a control group. This group should ideally be exactly similar to the exposed group except for the exposure. This matching is especially required for those factors which could influence the outcome. For details of planning and analysis of cohort studies, consult Diggle et al.(2).
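The comparison of incidence between the exposed and the control group can be sketched in a few lines of Python. The article does not name a summary measure; the ratio of the two incidences, commonly called the relative risk, is used here as one standard choice. The counts are hypothetical and the function names are illustrative only.

```python
def incidence(events, total):
    """Proportion of subjects in a group who developed the outcome."""
    if total <= 0:
        raise ValueError("group size must be positive")
    return events / total

def relative_risk(exposed_events, exposed_total, control_events, control_total):
    """Ratio of the incidence in the exposed group to that in the control group."""
    risk_exposed = incidence(exposed_events, exposed_total)
    risk_control = incidence(control_events, control_total)
    if risk_control == 0:
        raise ValueError("relative risk is undefined when control incidence is zero")
    return risk_exposed / risk_control

# Hypothetical cohort: 30 low birth weight babies among 100 anemic women,
# 15 among 100 non-anemic women.
exposed_incidence = incidence(30, 100)
control_incidence = incidence(15, 100)
rr = relative_risk(30, 100, 15, 100)
print(f"incidence exposed={exposed_incidence:.2f}, "
      f"control={control_incidence:.2f}, relative risk={rr:.2f}")
```

With these invented counts the incidence in the exposed group is twice that in the control group.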


A cohort is generally a sample from a larger group of subjects, the population, who are the target for extrapolation of the results. Selection bias is said to have occurred when the study group has a different composition from that of the target population. A major task in cohort studies is to accomplish successful follow-up of all the subjects. Bias can occur due to a change in status (e.g., from underweight to normal) during the follow-up period. Confounding bias occurs when two or more factors move together, such as malnutrition and infections. The tools used to assess the subjects can introduce bias if they are not sufficiently valid. The identification and resolution of various sources of bias is primarily a matter of epidemiological judgement. The success of a cohort study often depends on the care taken by the investigator in recognizing and correcting these biases.

Case-Control or Retrospective Studies

A study which moves from outcome to the antecedents may seem unnatural but is generally considered more efficient. In this setup, frequency of presence of antecedents in those with one outcome (disease) is compared with the frequency in those with another outcome (no disease) to come to a conclusion. A good reference on analysis of case-control studies is Breslow and Day(3).
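The comparison described above can be sketched in Python as follows. The exposure frequencies are computed separately in cases and controls. The odds ratio is added as a commonly used summary for case-control data, although the article itself does not name it. All counts are hypothetical and the function names are illustrative only.

```python
def exposure_frequency(exposed, total):
    """Proportion of a group (cases or controls) in whom the antecedent is present."""
    if total <= 0:
        raise ValueError("group size must be positive")
    return exposed / total

def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Cross-product ratio of the 2x2 table of outcome by exposure."""
    if cases_unexposed == 0 or controls_exposed == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)

# Hypothetical study: 100 low birth weight babies (cases), 40 with anemic mothers;
# 100 normal birth weight babies (controls), 20 with anemic mothers.
freq_cases = exposure_frequency(40, 100)
freq_controls = exposure_frequency(20, 100)
or_estimate = odds_ratio(40, 60, 20, 80)
print(f"exposure in cases={freq_cases:.2f}, controls={freq_controls:.2f}, "
      f"odds ratio={or_estimate:.2f}")
```

Note that the numbers of cases and controls are fixed by the investigator in this design, so the incidence of the outcome cannot be read off such a table.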

The basic advantage of a case-control design is that the long period of follow-up is avoided. This can drastically reduce the cost.

A case could be either a newly diagnosed (incident) or an already existing (prevalent) subject. Inclusion of prevalent cases, particularly for chronic diseases, can easily increase the sample size, but care is required at the time of interpretation of results since the factors determining the duration of disease could be important. Newly diagnosed cases do not have any such problem and thus are preferable. Cases included should ideally be representative of all persons with the outcome (or disease), but most case-control studies are carried out on a sample that is not necessarily randomly drawn. Random selection is especially important in the case of descriptive studies but probably not so much in the case of analytical studies. Experience suggests that the relationship between antecedent and outcome can be adequately assessed despite a non-random sample in some cases, so long as the bias is kept in check. The method of control selection is the same as explained in the previous section.

Cross-Sectional Studies

An analytical study in which the antecedent and the outcome are simultaneously investigated in a group of subjects is called a cross-sectional study. This implies that the outcome must have occurred in at least some subjects. A cross-sectional design may turn out to be a poor choice in situations where either the antecedent or the outcome or both are rare.

The cross-sectional design is particularly well suited to acute conditions with a short latent period, such as typhoid and measles, or to those chronic diseases that are not fatal (for example, congenital malformations). This type of study is generally considered a rapid and inexpensive way to provide clues for further and more valid investigations. It seldom provides explicit answers.

2.3 Experimental Studies

An experiment is an investigation of the effect of a deliberate intervention or stimulus intended to change the course of events. The investigator keeps control over the allocation of experimental units to different types of interventions, and the conditions of the experiment can be mostly standardized. Thus a cause-effect type of relationship can be easily inferred. An experiment can be carried out in a laboratory, clinic or field. The subjects for experiments in the clinic or field are human beings, and the experiment is generally termed a trial. Laboratory experiments, on the other hand, may involve inanimates such as physical forces or chemicals but, in the context of medicine, these are generally conducted on animals. These experiments prepare a valid base for clinical trials.

Two cardinal principles of experimentation are randomization and replication. The first envisages random allocation of subjects to different stimuli, one of which could be null: no stimulus, or an inert procedure such as an ineffective pill or saline injection, also called a placebo. If the subjects are heterogeneous with respect to the prognostic factors, then stratification is done. The subjects in each stratum across groups should be similar to begin with, yet randomization is advocated so that the residual or unforeseen heterogeneity also gets a chance to be equally divided among the various groups. Randomization also tends to remove the possible bias of the investigator in allocating and observing subjects with different stimuli. In addition, randomization is a necessary ingredient for the validity of the statistical methods that are used to analyze the data. This could be achieved either by draw of lots or by use of random numbers. The second principle, namely replication, is the process of repeating the same treatment on more than one experimental unit. It provides data to quantify the experimental error and helps to reduce it. It increases the confidence in the results and helps in bringing the clear signals to the fore.
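Random allocation within strata by use of random numbers can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a prescribed procedure: the subject identifiers, the "mild"/"severe" strata, the arm names and the seed are all hypothetical, and the balancing rule (shuffle each stratum, then deal subjects to arms in turn) is just one simple way to obtain equal group sizes.

```python
import random
from collections import Counter, defaultdict

def stratified_randomize(subjects, stratum_of, arms=("treatment", "placebo"), seed=None):
    """Randomly allocate subjects to arms separately within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)
    allocation = {}
    for stratum in sorted(strata):
        members = list(strata[stratum])
        rng.shuffle(members)  # random order within the stratum
        for position, subject in enumerate(members):
            # Deal shuffled subjects to the arms in turn to keep the arms balanced.
            allocation[subject] = arms[position % len(arms)]
    return allocation

def severity(subject_id):
    """Hypothetical prognostic factor used only to form two strata."""
    return "severe" if subject_id % 3 == 0 else "mild"

subject_ids = list(range(1, 13))  # twelve hypothetical subjects
allocation = stratified_randomize(subject_ids, severity, seed=42)
counts = Counter((severity(s), arm) for s, arm in allocation.items())
for (stratum, arm), n in sorted(counts.items()):
    print(stratum, arm, n)
```

Each arm receives several subjects in every stratum, which is the replication needed to quantify the experimental error.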

Clinical Trials

Clinical trials are experiments on human beings and therefore require extreme care. The intervention generally is a therapeutic agent or a modality but could also be a preventive or rehabilitative procedure. The primary objective of clinical trials usually is to evaluate the safety and efficacy of the treatment in individuals with different severity of disease and of various backgrounds. An important consideration in a trial is the ethics of conducting it on subjects, some of whom could be sick. That the treatment is under trial is itself an indication that its utility is doubtful. Some such important considerations are: (i) have the subjects been informed about the potential benefits and possible side-effects and their consent obtained? (ii) is it ethical to use a placebo on some subjects who are sick? and (iii) is the treatment regimen under test reasonably safe?

Clinical trials are precarious and need to be pursued in phases. Phase I is a trial on volunteers, primarily to assess side effects. Phase II is a larger trial for efficacy in the target patients. Phase III is a randomized controlled trial (RCT) on a large number of subjects and controls to delineate the doses, the type of beneficiaries, rare side-effects, etc.

There is a tendency in the subjects to respond differently depending upon whether they are in the treatment group or in the placebo group. To control this bias, two precautions are taken. First, the placebo is designed to look like the treatment so that there is no apparent difference for the subjects or for any nursing staff to detect. Secondly, the subjects are not told whether they are getting the treatment or the placebo, though they know that they are participating in a trial. This is called blinding and is a very potent tool in clinical trials. Similar biases can occur at the observer's end. If the observer knows that a particular subject is a case or a control, this may well affect the way questions are asked, investigations done or interpretations made. When either the subject or the observer is blind about the treatment received by particular subjects, this is called single blinding, and when both are blind it is called double blinding. Sometimes even the data analyst becomes interested in particular findings and can gear the analysis and interpretation accordingly. To avoid this, the analyst also is kept blind about the codes. The codes are broken only after the data analysis is complete. This makes the trial triple blind.
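Keeping the analyst blind about the codes can be sketched in Python as below. This is a hedged illustration only: the helper names, the neutral code letters and the tiny allocation list are hypothetical, and a real trial would keep the key with an independent party rather than in the same program.

```python
import random

def blind_allocation(allocation, seed=None):
    """Replace arm names with neutral codes; return (blinded allocation, sealed key)."""
    rng = random.Random(seed)
    arms = sorted(set(allocation.values()))
    codes = [chr(ord("A") + i) for i in range(len(arms))]
    rng.shuffle(codes)  # which letter stands for which arm is decided at random
    key = dict(zip(codes, arms))  # code -> arm, to be kept away from the analyst
    arm_to_code = {arm: code for code, arm in key.items()}
    blinded = {subject: arm_to_code[arm] for subject, arm in allocation.items()}
    return blinded, key

def unblind(blinded, key):
    """Break the codes; meant to be done only after the analysis is complete."""
    return {subject: key[code] for subject, code in blinded.items()}

# Hypothetical allocation of four subjects.
true_allocation = {1: "treatment", 2: "placebo", 3: "treatment", 4: "placebo"}
blinded, sealed_key = blind_allocation(true_allocation, seed=7)
print(blinded)                       # what the analyst sees: only neutral codes
print(unblind(blinded, sealed_key))  # after the codes are broken
```

The analyst works only with the coded groups, so the analysis cannot be geared toward a particular treatment.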

In a clinical setup, placebo and blinding are more easily said than implemented. Ethics might contraindicate a placebo. The existing treatment regimen is then taken as the control. Medicinal rituals such as the type and frequency of examination, or the transfer of patients from one unit to another, can break the code. An extremely careful strategy may have to be developed in some trials so as to reduce the effect of bias, if not to eliminate it. For further details of clinical trials, see Spilker(4).


References

1. Indrayan A, Satyanarayana L. Essentials of Biostatistics: 1. Medical uncertainties. Indian Pediatr 1999; 36: 476-483.

2. Diggle PJ, Liang K, Zeger SL. The Analysis of Longitudinal Data. Oxford Statistical Sciences Series, No. 13. Oxford, Clarendon Press, 1994.

3. Breslow NE, Day NE. Statistical Methods in Cancer Research: Vol. I - The Analysis of Case-Control Studies. Lyon, International Agency for Research on Cancer, 1980.

4. Spilker B. Guide to Clinical Trials. New York, Raven Press, 1991.


