Introduction to Statistics
The word statistics is derived from the Latin word ‘status’, the Italian word ‘statista’, the German word ‘statistik’ or the French word ‘statistique’, each of which means a political state. In general usage, statistics means the collection of numerical data for further use. In other words, statistics means numerical data.
Definition:-
(i) This definition is due to Bowley. He defined statistics as numerical statements of facts in any department of enquiry, placed in relation to each other.
(ii) This definition is due to Wallis and Roberts. They defined statistics as a body of methods for making wise decisions in the face of uncertainty.
Functions of statistics:-
Following are the main functions of statistics.
(i) It presents data in a standard form.
(ii) It provides easy comparison.
(iii) It simplifies mass data into small figures.
(iv) It helps to formulate and examine hypotheses.
(v) It helps in decision making.
(vi) It simplifies the process of future planning and forecasting.
(vii) It helps in budgeting.
Scope of statistics:
Following are the important areas in which statistics is applied.
(i) Statistics and the state:-
Statistics is essential for a country. It supplies the information needed to run a government, and the various policies of a government are based on statistics. Data collected periodically on population, national wealth, agriculture, exports, imports, education, crime etc. are the main guidelines for good administration. Moreover, all the ministries and departments of the state, such as finance, transport, defense, railways etc., depend on statistics for their efficient functioning.
(ii) Statistics and commerce:-
Commerce is the process of collecting, manufacturing, supplying and distributing goods and services for profit. Statistics helps in formulating policy and in making future decisions.
(iii) Statistics and economics:-
The scope of statistics in the field of economics is remarkable. Economics is concerned with the production and distribution of wealth as well as with the consumption, saving and investment of income. Statistics is widely useful in all of these areas.
(iv) Statistics and natural science:-
Statistical techniques have proved extremely useful in the study of all natural sciences such as biology, medicine, astronomy, meteorology etc.
(v) Statistics and accounts:-
Statistics is very useful in the field of accountancy, for example in the presentation of accounts and the determination of profit and loss.
Limitations of statistics:-
In spite of its many uses, statistics has certain limitations. These are as follows.
(i) Statistics does not deal with individuals.
(ii) Statistics deals only with quantitative data, not with qualitative data.
(iii) Statistical results are true only on an average.
(iv) Statistical data are liable to be misused.
(v) Statistical data should be uniform and homogeneous at the time of comparison.
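The limitation that statistical results are true only on an average can be illustrated with a small sketch. The incomes below are hypothetical figures chosen only for illustration; the point is that the average describes the group as a whole and need not match any individual in it.

```python
from statistics import mean

# Hypothetical monthly incomes (in rupees) of five persons; illustrative only.
incomes = [8000, 9000, 10000, 11000, 62000]

# The average describes the group as a whole.
average_income = mean(incomes)

# Find the individuals, if any, who actually earn the average amount.
matches = [x for x in incomes if x == average_income]

print("Average income:", average_income)
print("Individuals earning exactly the average:", len(matches))
```

In this made-up group nobody earns the average income, and four of the five persons earn far less than it.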
COLLECTION OF DATA
What is meant by data?
What are the main sources of collecting data?
To a layman, data means information. In statistics, data means a mass of information collected from different sources.
The collection of data is an important task in a statistical enquiry. One should take care while collecting data; otherwise it leads to wrong conclusions and faulty decisions.
According to their source, data may be classified into two types.
i) Internal and external data.
ii) Primary and secondary data.
Methods of collecting primary data or sources of primary data:
Primary data:- Data that are collected by the investigator in the course of the investigation are called primary data.
For the collection of primary data the investigator may choose any one of the following methods.
i) Direct personal observation.
ii) Indirect oral interviews.
iii) Mailed questionnaire method.
iv) Schedules sent through enumerators.
i) Direct personal observation:
In this method the person who collects the data visits the field of survey and collects the information through discussions with persons who are directly or indirectly in touch with the facts under study. Sometimes the investigator can also get information through personal observation. In this method the investigator should be very careful about his public behavior and should mix with the people of the locality where he is collecting data.
Merits:
1) Original data are collected.
2) True and reliable data are collected.
3) Response will be encouraging because of the personal approach.
Demerits:
1) It is not suitable where the area under survey is very large.
2) It is expensive and time consuming.
3) Untrained investigators will not bring good results.
ii) Indirect oral interviews:
Under this method the investigator contacts a third person who is capable of supplying the necessary information. This method is generally adopted in those cases where the informants are not willing to respond if the investigator approaches them directly. For example, if we want to interview a drug addict, he may not be willing to supply information about his own habits. In this case we get the necessary information from those persons who know him well.
Merits:
1) It is simple and convenient.
2) It saves time, money and labor.
3) It can be used in the investigation of a large area.
Demerits:
1) An interview with an improper person will spoil the results.
2) In order to get the real position, a sufficient number of persons must be interviewed.
(iii) Mailed questionnaire method:
Under this method a questionnaire (a list of objective type questions) is sent to the informants by post. The questionnaire should be attractive, and the questions should be arranged in such a way that they take minimum time to answer. The questionnaire should contain a request note and guidelines on how to answer the questions. It should be inexpensive and should be accompanied by a self-addressed envelope so that it can be sent back within the specified time.
Merits:
1) Of all the methods this method is the most economical.
2) This method is suitable when the area of investigation is very large.
3) It saves time, money and labor.
Demerits:
1) In this method there is no direct contact between the investigator and the informant, therefore we cannot be sure about the accuracy of the data.
2) This method is suitable only when the informants are literate.
3) There is a chance of delay in receiving answers from the informants.
(iv) Schedules sent through enumerators:
It is the most widely used method of collecting primary data. In this method a number of enumerators are selected and trained. They are provided with a standardized questionnaire (the schedule), and specific training and instructions are given to them for filling up the schedules. Each enumerator is in charge of a certain area. The enumerator goes to the informants with the schedule, explains the object and purpose of the enquiry, puts the questions to them and records their answers in the schedule.
Merits:
1) This method is very useful where the informants are illiterate.
2) Because of the direct contact between the enumerators and the informants, accurate information can be obtained.
Demerits:
1) This method is more expensive and time consuming.
2) The success of this method depends upon trained, intelligent and qualified enumerators.
Sources of secondary data:
The various sources of secondary data can be divided into two categories.
1) Published sources
2) Unpublished sources
1) Published sources:
Here the data have previously been collected and published.
Ex: government publications, statistical reports, journals and newspapers, census reports, reports of national sample surveys conducted in India etc.
2) Unpublished sources:
Here the data are not published, or are kept for personal or departmental use.
Ex: books of accounts, files and records of government and private offices, research work of various institutions, universities or big organizations.
Difference between primary and secondary data:
Primary data | Secondary data
1. Primary data are those data which are collected from primary sources. | 1. Secondary data are those data which are collected from secondary sources.
2. Primary data are known as basic data. | 2. Secondary data are known as subsidiary data.
3. The collection of primary data is more expensive. | 3. The collection of secondary data is comparatively less expensive.
4. It takes more time to collect the data. | 4. It takes less time to collect the data.
5. Primary data are more accurate. | 5. Secondary data are less accurate than primary data.
6. Primary data are known as first hand data. | 6. Secondary data are known as second hand data.
7. Primary data are not readily available. | 7. Secondary data are readily available.
8. Much care is required at the time of collecting the data. | 8. Much care is not required at the time of collecting the data.
Data collected either from primary sources or from secondary sources are called raw data. Classification means the arrangement or grouping of data according to their behavior, nature and characteristics.
Types of classification:
Statistical data may be classified according to different characteristics. There are four important types of classification.
i) Geographical (spatial) classification.
ii) Chronological classification.
iii) Quantitative classification.
iv) Qualitative classification.
i) Geographical (spatial) classification:
Classification of data according to geographical areas is called geographical (spatial) classification.
Ex: Statewise classification of production of food grains in India:
State | Production of food grains (in tons)
Orissa | 3,00,000
A.P | 2,50,000
U.P | 22,00,000
 | 1,00,00,000
ii) Chronological classification:
In this type of classification the data are classified according to different time periods.
Ex: Population of India for different time periods; profits of a business establishment over different years.
Year | Population (in crores)
1921 | 24.8
1931 | 27.3
1941 | 31.8
1951 | 35.6
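As a minimal sketch of chronological classification, the population figures from the example can be arranged by year in a few lines of Python. The records are deliberately listed out of order here, and the variable names are illustrative only.

```python
# Population figures (in crores) from the example, listed out of order.
records = [(1941, 31.8), (1921, 24.8), (1951, 35.6), (1931, 27.3)]

# Chronological classification: arrange the records by time period.
chronological = sorted(records, key=lambda record: record[0])

# Change in population from each period to the next (in crores).
changes = [
    (later[0], round(later[1] - earlier[1], 1))
    for earlier, later in zip(chronological, chronological[1:])
]

for year, population in chronological:
    print(year, population)
print("Changes:", changes)
```

Once the data are in time order, period-to-period comparisons such as the changes computed above become straightforward.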
(iii) Quantitative classification:
In quantitative classification the data are classified according to some characteristic that can be measured numerically, such as height, weight, production, income, marks secured by students etc.
Ex: Students of a college may be classified according to their weights as given in the table.
Weight (in kg) | No. of students
30-40 | 20
40-50 | 25
50-60 | 40
60-70 | 45
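The grouping of raw measurements into class intervals can be sketched as follows. The ten weights are hypothetical, and the convention that the lower limit is included and the upper limit excluded is an assumption, since the text does not say in which class a boundary value such as 50 kg belongs.

```python
# Hypothetical raw weights (in kg) of ten students; illustrative only.
weights = [32, 38, 41, 45, 49, 50, 55, 58, 63, 69]

# Class intervals as in the example table. Assumed convention: the lower
# limit is included and the upper limit excluded, so 50 falls in 50-60.
classes = [(30, 40), (40, 50), (50, 60), (60, 70)]

# Start every class with a frequency of zero, then count each weight once.
frequency = {f"{low}-{high}": 0 for low, high in classes}
for weight in weights:
    for low, high in classes:
        if low <= weight < high:
            frequency[f"{low}-{high}"] += 1
            break

for label, count in frequency.items():
    print(label, count)
```

Each raw value is counted in exactly one class, so the class frequencies add up to the number of observations.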
(iv) Qualitative classification:
In qualitative classification the data are classified on the basis of attributes or qualities such as sex, colour of hair, literacy, religion etc.
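A short sketch of qualitative classification: the records below are hypothetical, and the attribute used is literacy. Because an attribute cannot be measured numerically, the classification simply counts how many persons possess each quality.

```python
from collections import Counter

# Hypothetical records of (person, literacy status); illustrative only.
persons = [
    ("A", "literate"),
    ("B", "illiterate"),
    ("C", "literate"),
    ("D", "literate"),
    ("E", "illiterate"),
]

# Qualitative classification: count the persons under each attribute value.
by_literacy = Counter(status for _, status in persons)

for status, count in by_literacy.items():
    print(status, count)
```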
Tabulation of data
Types of tables:
Tables may broadly be classified into two categories.
1) Simple and complex tables
2) General purpose and special purpose (or) summary tables.
1) Simple and complex tables:-
The distinction between simple and complex tables is based on the number of characteristics studied.
In a simple table only one characteristic is shown. Hence this type of table is also known as a one-way table. In a complex table, on the other hand, two or more characteristics are shown. When two characteristics are shown, the table is known as a two-way table or double tabulation.
Example of one-way table:
Number of employees in state bank according to age group
Age (in years) | No. of employees
Below 25 | 3
25-35 | 2
35-45 | 2
Above 45 | 3
Example of two-way table:
Number of employees of state bank in different age groups according to sex
Age (in years) | Male employees | Female employees | Total
Below 25 | 3 | 2 | 5
25-35 | 4 | 5 | 9
35-45 | 5 | 6 | 11
Above 45 | 2 | 3 | 5
Total | 14 | 16 | 30
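The row and column totals of the two-way table can be checked with a short sketch. The cell frequencies are those of the example; the variable names are illustrative only.

```python
# Cell frequencies from the example: age group -> (males, females).
employees = {
    "Below 25": (3, 2),
    "25-35": (4, 5),
    "35-45": (5, 6),
    "Above 45": (2, 3),
}

# Row totals: employees in each age group, both sexes together.
row_totals = {age: males + females for age, (males, females) in employees.items()}

# Column totals: all males and all females across the age groups.
total_males = sum(males for males, _ in employees.values())
total_females = sum(females for _, females in employees.values())

# The grand total must be the same whether rows or columns are added.
grand_total = total_males + total_females
assert grand_total == sum(row_totals.values())

print(row_totals)
print(total_males, total_females, grand_total)
```

The agreement between the sum of the row totals and the sum of the column totals is a useful check on any two-way table.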
When three or more characteristics are represented in the same table, it is called three-way (or higher-order) tabulation. As the number of characteristics increases, the tabulation becomes complicated and confusing.
2) General and special purpose tables:
General purpose tables are sometimes termed reference tables or information tables. These tables provide information for general use or reference. They usually contain detailed information and are not constructed for a specific discussion. These tables are also termed master tables.
Ex: The detailed tables prepared in census reports belong to this class.
Special purpose tables, also known as summary tables, provide information for a particular discussion. These tables are constructed or derived from the general purpose tables. They are useful for analytical and comparative studies involving the study of relationships among variables.
Ex: Analytical statistics such as ratios, percentages, index numbers etc. are incorporated in these tables.
General format of a table or parts of a table:
The main parts of a table in general are the following.
1) Table number
2) Title of the table
3) Caption
4) Stub
5) Body of the table
6) Head note
7) Foot note
8) Source note
1) Table number:
Each table should be numbered. The table number may be given either in the center at the top above the title or at the bottom of the table on the left hand side.
2) Title of the table:
Every table must be given a suitable title. The title is a description of the contents of the table. It should be clear, brief and self explanatory, and as short as possible. Its lettering should be the most prominent of any lettering on the table.
3) Caption:
Caption refers to the column headings. It explains what each column represents. It may consist of one or more column headings. The caption should be clearly defined and placed at the middle of the column. As compared with the main part of the table, the caption should be shown in smaller letters. This helps in saving space.
4) Stub:
As distinguished from captions, stubs are the designations of the rows, or row headings. They are at the extreme left and perform the same function for the horizontal rows of numbers in the table as the column headings do for the vertical columns of numbers.
5) Body of the table:
The body of the table contains the numerical information. This is the most vital part of the table. Data presented in the body are arranged according to the description or classification of the captions and stubs.
6) Head note:
It is used to explain certain points relating to the table that have not been included in the title, the caption or the stubs. For example, the unit of measurement is frequently written as a head note, such as ‘in thousands’, ‘in million tonnes’ or ‘in crores’.
7) Foot note:
Anything in the table which the reader may find difficult to understand from the title, captions and stubs should be explained in the foot notes. If foot notes are needed, they are placed directly below the body of the table. Ex: M for male, F for female etc.
8) Source note:
A source note is used where the data have been collected by an agency or person other than the one presenting them.
Format of a table
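The arrangement of the eight parts can be sketched as a small Python function that assembles a plain-text table. The function name render_table, its parameters and the sample values are hypothetical and chosen only for illustration; the table number is placed at the top, which is one of the two positions described above.

```python
def render_table(number, title, head_note, caption, stubs, body, foot_note, source_note):
    """Return a plain-text table assembled from the eight parts."""
    lines = [f"Table {number}", title, f"({head_note})"]
    # Caption: the column headings; the first one heads the stub column.
    lines.append(" | ".join(caption))
    # Each stub (row heading) is followed by its row of the body.
    for stub, row in zip(stubs, body):
        lines.append(" | ".join([stub] + [str(value) for value in row]))
    # Foot note and source note are placed below the body.
    lines.append(f"Foot note: {foot_note}")
    lines.append(f"Source: {source_note}")
    return "\n".join(lines)


table_text = render_table(
    number=1,
    title="Number of employees of a bank by age group and sex",
    head_note="number of persons",
    caption=["Age (in years)", "M", "F"],
    stubs=["Below 25", "25-35", "35-45", "Above 45"],
    body=[(3, 2), (4, 5), (5, 6), (2, 3)],
    foot_note="M for male, F for female",
    source_note="hypothetical, for illustration only",
)
print(table_text)
```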
Difference between classification and tabulation:
Classification is the process of grouping raw data according to their object, behavior, purpose and usage. Tabulation means a logical arrangement of data into rows and columns.
Classification is the first step in arranging the data, whereas tabulation is the second step.
The main object of classification is to condense the mass of data in such a way that similarities and dissimilarities can be readily found out, whereas the main object of tabulation is to simplify complex data for the purpose of better comparison.
Objects (or) merits (or) advantages (or) role of tabulation:
Following are the main objects of tabulation, or the tabular presentation of statistical data:
i) It simplifies the complex data.
ii) It facilitates comparison.
iii) It helps to give better identity to the data.
iv) It provides a good means of arrangement.
Recommended book for this Paper