
Posted by emerjoytvale in Education on August 12th, 2010

THE STUDY OF STATISTICS

WHY STATISTICS?

·         important in empirical studies

·         aids in decision making

·         helps to forecast or predict future outcomes

·         estimates unknown values

·         aids in making inferences and comparisons or in establishing relationships

·         summarizes data

DEFINITIONS

·         (plural sense) – a set of numerical data or observations

Ex.

o   Vital statistics in a beauty pageant

o   Yearly income

o   Monthly expenses

·         (singular sense) – a branch of science which deals with the collection, presentation, analysis and interpretation of data.

MAIN DIVISION OF STATISTICS

·         Descriptive Statistics – pertains to the methods dealing with the collection, organization and analysis of a set of data without making conclusions, predictions or inferences about a larger set.

o   the main goal is simply to provide a description of a particular data set

o   the conclusions or the important characteristics apply only to the data on hand

·         Inferential Statistics – pertains to the methods dealing with making inferences, estimates or predictions about a larger set of data using the information gathered from a subset of this larger set

o   the main goal is not merely to provide a description of a particular data set but also to make predictions and inferences based on the available information gathered

o   the conclusions or the important characteristics apply to a larger set from which the data on hand is only a subset

Collection of Data – refers to the method of data gathering

Presentation of Data – refers to the process of organizing data, such as tabulation or presentation through charts, graphs or paragraphs

Analysis of Data – refers to the methods of obtaining necessary, relevant and noteworthy information from the given data

Interpretation of Data – refers to the drawing of conclusions or inferences from the analyzed data

Examples:

Descriptive Statistics

·         A college dean wants to determine the average semestral enrolment in the past 5 school years.

·         An instructor wants to know the exact number of students who passed his subject.

·         A school president wants to determine the number of student-dropouts for this current school year

Inferential Statistics

·         A college dean wants to forecast the average semestral enrolment based on the enrolment for the last 5 school years

·         An instructor would like to predict the number of students who will pass in his subject based on the number of failures last year

·         A school president would like to estimate the number of student dropouts next school year based on the current data.

DESCRIPTIVE STATISTICS

POPULATION AND SAMPLE

·         Population – collection of all cases in which the researcher is interested in a statistical study

-          the entity that the researcher wishes to understand

Examples:

·         All students of University of Bohol

·         All department heads in UB

·         Sample – a portion or a subset of the population from which the information is gathered

-          a representation of the population

Examples:

·         All students of University of Bohol  coming from the rural areas

·         All department heads in UB who have finished Ph.D. Degree

·         Parameter – a numerical characteristic of a population

-          denoted by small Greek letters

Examples:

·         µ - population mean

·         σ – population standard deviation

·         Statistic – a numerical characteristic of a sample

-          Denoted by lower case letters of the English alphabet

Examples:

·         x̄ – sample mean

·         s (also written SD) – sample standard deviation

TYPES OF DATA

·         Variable – a characteristic or attribute of persons or objects which assumes different values or labels

·         Measurement – process of assigning the value or label of a particular variable for a particular experimental unit

·         Experimental unit – the person or the object on which a variable is measured

·         Classification of Variables

Ø  Qualitative Variable – yields categorical or qualitative responses

Examples:       Civil Status (Single, Married, Widow, etc.)

Religious Affiliation (Catholic, Protestant, etc.)

Ø  Quantitative Variable – yields numerical responses representing an amount or quantity

Examples:       Height, Weight, no. of children

Types of Quantitative Variable

Ø  Discrete Variable – assumes a finite or countably infinite number of values such as 1,2,3,…

Example:         no. of children (0,1,2,3,…)

no. of student-dropouts

Ø  Continuous Variable – cannot be counted; its values correspond to points on an interval of the real line, so it can take any value within that interval

Examples:       Height (5’4”)

Weight (130.42 kilos)

Temperature (32.5°C)

Levels of Measurement

Ø  NOMINAL LEVEL

-          Crudest form of measurement

-          Numbers or symbols are used for the purpose of categorizing subjects into groups

-          The categories are mutually exclusive, that is being in one category automatically excludes inclusion in another

-          The categories are exhaustive, that is all possible categories of a variable should be included

Examples:       Sex:                             1 – Male                      0 – Female

Faculty Tenure:           1 – Tenured                 0 – Non-Tenured

Ø  ORDINAL LEVEL

-          Improvement of nominal level

-          Order/rank the data in a somewhat “bottom to top” or “low to high” manner

Examples:       Class Standing (Excellent, Good, Poor)

Teacher Evaluation     1 – Poor

2 – Fair

3 – Good

4 – Very Good

Ø  INTERVAL LEVEL

-          Possesses the properties of the nominal and ordinal levels

-          Distances between any two numbers on the scale are known

-          Does not have a stable starting point (an absolute zero)

Example: Consider the IQ scores of four students

70, 140, 75 and 145

Here, we can say that the difference between 70 and 140 is the same as the difference between 75 and 145, but we cannot claim that the second student is twice as intelligent as the first.

Ø  RATIO LEVEL

-          Possesses all the properties of the nominal, ordinal and interval levels and, in addition, has a true or absolute zero point

-          We can classify it, place it in proper order

-          We can also compare magnitudes

Examples:             Age, Income, Exam Scores

SUMMARY CHARACTERISTICS OF LEVELS OF MEASUREMENT

Levels of Measurement    Classify    Order    Equal Intervals    Absolute Zero
NOMINAL                  Yes         No       No                 No
ORDINAL                  Yes         Yes      No                 No
INTERVAL                 Yes         Yes      Yes                No
RATIO                    Yes         Yes      Yes                Yes

·         Other Classification of Data

Ø  Raw Data                    - data in their original form and structure

Ø  Grouped Data             - data placed in tabular form

Ø  Primary Data              - measured and gathered by the researcher that published it

Ø  Secondary Data          - any republication of data by another researcher or agency

METHODS OF DATA COLLECTION

·         Observation Method

-          Data can be obtained by observing the behavior of persons or objects but only at a particular time of occurrence

-          The data obtained are called observational data

·         Experimental Method

-          Especially useful when one wants to collect data for cause-and-effect studies

-          There is actual human interference with the conditions or situations that can affect the variable under study

-          Prevalent in scientific research

·         Use of Existing Studies

-          CHED or DECS enrollment data

-          Census Data

·         Registration Method

-          Respondents provide the necessary information in observance and compliance with existing laws

-          Examples: car registration, birth registration, student registration, voter’s registration

·         Survey Method

-          The desired information is obtained through asking questions

Common Forms of Survey Method

Ø  Personal Interview Method

-          There is a person-to-person contact between the interviewer and the interviewee

-          Considered as one of the most effective methods of data collection because accurate and precise information can be directly obtained and verified from the respondents

-          Higher response rate

-          Can be administered to the respondents one at a time

Ø  Questionnaire Method

-          Considered the easiest method of data collection

-          Utilizes an instrument which is the questionnaire as a tool

-          Lower response rate

-          Can be administered to a large number of respondents simultaneously

General Classification of Data Collection

Ø  Census or Complete Enumeration

-          Method of gathering data from every unit in the population

-          Not always possible because of the money, time and effort required

Ø  Survey Sampling

-          Method of gathering data from every unit in the selected sample

-          Reduces cost and allows greater speed, scope and accuracy

PROBABILITY AND NON-PROBABILITY SAMPLING

·         Probability Sampling – sampling procedure in which every element in the population has a known non-zero chance of being included in the sample

Common Methods of Probability Sampling

·         Simple Random Sampling (SRS)

-          Sampling procedure in which every element in the population has an equal chance of being included in the sample

-          Select n units out of N units in the population

-          The selection is through lottery or the use of the table of random numbers

·         Stratified Random Sampling

-          The population of N units is divided into subpopulations called strata, each of which consists of more or less homogeneous units

-          Perform simple random sampling within each stratum; the selection in one stratum is independent of the selection in the others

-          Requires a so-called “stratification variable”

Example: Consider a population consisting of all UB students

§  Stratify the population by colleges (stratification variable)

o   Commerce
o   Education
o   Liberal Arts
o   Nursing
o   Eng’g

§  Stratify the population by year level (stratification variable).

o   I
o   II
o   III
o   IV
o   V

·         Systematic Sampling

-          Done with a random start

-          Select the sample by taking every kth unit from an ordered population

-          k is called the sampling interval and 1/k is the sampling fraction

Example: Suppose we select n=12 students from a population of N=50.

To employ systematic sampling, divide N by n to get the sampling interval k, that is

k = N/n = 50/12 = 4.167, which is rounded up here to k = 5

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50

We choose every 5th unit. Thus, if the random start is r = 9 (the 9th unit), then the sample comprises students number

9,         14,       19,       24,       29,       34,       39,       44,       49 and 4.

Note that with k = 5 the procedure yields only 50/5 = 10 students rather than the intended 12.

·         Cluster Sampling

-          A variation of stratified sampling where the strata correspond to clusters, however with heterogeneous characteristics or attributes.

Example: A college unit may be considered as one cluster.

·         Multi-stage Sampling

-          A sampling procedure wherein the population is divided into a sequence of sampling units corresponding to the different sampling stages.
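The selection rules described above for simple random, systematic and stratified sampling can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `seed` parameter and the wrap-around rule for systematic sampling are choices made here, not part of the original notes.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(units, n)

def systematic_sample(units, k, start):
    """Take every k-th unit from an ordered list, beginning at the 1-based
    position `start` and wrapping around the end of the list (assumed rule)."""
    total = len(units)
    picked = []
    position = start
    for _ in range(total // k):      # total // k units fit in one full pass
        picked.append(units[(position - 1) % total])
        position += k
    return picked

def stratified_sample(strata, n_per_stratum, seed=None):
    """Run an independent simple random sample inside each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

students = list(range(1, 51))        # students numbered 1 to 50
print(systematic_sample(students, k=5, start=9))
```

With k = 5 and a random start at the 9th unit, the systematic sample reproduces the ten student numbers of the worked example (9, 14, ..., 49 and 4).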

Common Methods of Non-Probability Sampling

·         Purposive Sampling

·         Quota Sampling

·         Convenience Sampling

PRESENTATION OF DATA

Data can be represented in various modes such as the textual, tabular and graphical displays.

·         Textual

-          Here, the data are presented by use of texts, phrases or paragraphs

-          Very common among newspaper stories, depicting only the salient findings

·         Tabular

-          This is a more reliable and effective way of showing relationships or comparisons of data through the use of tables

-          In many cases, the tables are accompanied by a short narrative explanation to make the facts clearer and more understandable

·         Graphical

-          Most effective way of presenting statistical findings by use of statistical graphs

-          Attracts the attention of the reader

-          Sets the tone for clearer interpretation of findings especially in making comparisons

Common Form of Graphs

·         Bar Graph – uses rectangular bars, the length of which represents the quantity or frequency for each type or category

Example: We are given the following enrolment data of XYZ College for the Academic Year 1995-1996

Year Level     Males   Females   Total   Percentage
First Year       682       435    1117        30.17
Second Year      496       536    1032        27.88
Third Year       314       419     733        19.80
Fourth Year      435       385     820        22.15
Total           1927      1775    3702       100.00

We can draw the bar graph by placing the year levels on the horizontal axis and the frequency on the vertical axis as follows:

[Figure: bar graph of enrolment (vertical axis) by year level (horizontal axis)]

·         Multiple Bar Graph     – useful when the researcher wants to compare figures

– uses legend to guide the viewer in analyzing the data

Example: For the same data set, the multiple bar graph is shown below.

·         Pie Chart         - used to present quantities that make up a whole

- the slices of the pie are drawn in proportion to the values of each class, item, group or category

- the total area of the pie is 100%

·         Line Chart – especially useful in showing trends over a period of time

Example: The following data show the number of student-dropouts of UB from 1995-1999

Year    Number of Student-Dropouts (Male)    Number of Student-Dropouts (Female)
1995    40                                   26
1996    38                                   43
1997    53                                   55
1998    48                                   32
1999    62                                   49

The line chart of the above data set is shown below:

THE FREQUENCY DISTRIBUTION TABLE

Data in its original form and structure are called raw data.

Example: Consider the following final exam scores of 40 students

79   83   84   62   62   43   72   48   46   59
93   64   59   32   54   45   55   45   76   72
40   51   72   83   49   62   85   74   40   74
49   65   38   85   77   63   38   43   63   69

When these scores are arranged either in ascending or descending magnitude, then such an arrangement is called an array. It is usually helpful to put the raw data in an array because it is easy to identify the extreme values or the values where the scores most cluster.

When these data are organized into groups, they take on the nature of grouped data. The table that results from organizing the data into groups is called a frequency distribution table (FDT).

Example: The following presents a frequency distribution table of the scores of seventy-five Statistics students.

Scores     Frequency
10-19          5
20-29         14
30-39         23
40-49         22
50-59         11
Total         75

Some Basic Terms

·         Class Interval              - the numbers defining the class

- consists of the end numbers called the class limits namely the upper limit and the lower limit

·         Class Frequency          - shows the number of observation falling in the class

·         Class Boundaries        - these are true class limits

- LOWER CLASS BOUNDARY (LCB) is defined as the middle value between the lower class limit of the class and the upper class limit of the preceding class

- UPPER CLASS BOUNDARY (UCB) is the middle value between the upper class limits of the class and the lower limit of the next class

·         Class size                     - the difference between the upper limits of the class and the preceding class

·         Class Mark                  - midpoint of a class interval

·         Open-ended class        - one which has no lower limit or upper limit

·         Cumulative frequency - shows the accumulated frequencies of successive classes

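- GREATER THAN CF (>CF): the total frequency of the class and all higher classes

- LESS THAN CF (<CF): the total frequency of the class and all lower classes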

Example:

Class Interval   Freq   LCB    UCB    CM     <CF   >CF
10-19              5     9.5   19.5   14.5     5    75
20-29             14    19.5   29.5   24.5    19    70
30-39             23    29.5   39.5   34.5    42    56
40-49             22    39.5   49.5   44.5    64    33
50-59             11    49.5   59.5   54.5    75    11
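The columns of the table above follow mechanically from the class limits and frequencies. The sketch below derives the class boundaries, class marks and both cumulative frequencies; the function name `describe_classes` and the tuple layout are illustrative assumptions.

```python
def describe_classes(limits, freqs):
    """Return rows of (lower, upper, f, LCB, UCB, class mark, <CF, >CF)."""
    # Gap between the upper limit of one class and the lower limit of the next
    gap = limits[1][0] - limits[0][1]
    total = sum(freqs)
    rows = []
    less_cf = 0
    for (lo, hi), f in zip(limits, freqs):
        greater_cf = total - less_cf     # this class and all higher classes
        less_cf += f                     # this class and all lower classes
        rows.append((lo, hi, f,
                     lo - gap / 2,       # lower class boundary
                     hi + gap / 2,       # upper class boundary
                     (lo + hi) / 2,      # class mark (midpoint)
                     less_cf, greater_cf))
    return rows

limits = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]
freqs = [5, 14, 23, 22, 11]
for row in describe_classes(limits, freqs):
    print(row)
```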

Steps in Constructing a Frequency Distribution Table (FDT)

Step 1: Determine the number of classes. For first approximation, it is suggested to use the STURGES APPROXIMATION FORMULA.

K = 1 + 3.322 log n

where

K = approximate number of classes

n = number of cases

Step 2: Determine the range R.

R = maximum value – minimum value

Step 3: Determine the approximate class size i using the formula

i = R/K

It is usually convenient to round off i to a convenient number.

Step 4: Determine the lowest class limit (or the first class). This class should include the minimum value in the data set.

Step 5: Determine all class limits by adding the class size i to the limits of the previous class.

Step 6: Tally the scores/observations falling in each class.

Example: Constructing the FDT of the final exam scores of 40 students.

Step 1:             K        =           1 + 3.322 log n

=           1 + 3.322 log (40)

=           1 + 3.322 (1.60205)

=           1 + 5.32204

=           6.3220

K       =           6

Step 2:             R        =           max – min

=           93 – 32

R        =           61

Step 3:             i          =           R/K

=           61/6

=           10.167

i          =           10

Step 4: Let us decide to start at the minimum value. Thus, the lowest class is the class 32 – 41

Step 5: The classes are constructed by adding 10 to each class limit

32 – 41

42 – 51

52 – 61

62 – 71

72 – 81

82 – 91

92 – 101

Step 6: Determine now the frequency of each class by tallying the scores falling in each class.

Classes      Tally          Frequency
32 – 41      |||||              5
42 – 51      ||||| ||||         9
52 – 61      ||||               4
62 – 71      ||||| |||          8
72 – 81      ||||| |||          8
82 – 91      |||||              5
92 – 101     |                  1
                            n = 40

We now proceed constructing the complete frequency distribution table.

Classes      Frequency   LCB     UCB     CM     <CF   >CF
32 – 41          5       31.5    41.5    36.5     5    40
42 – 51          9       41.5    51.5    46.5    14    35
52 – 61          4       51.5    61.5    56.5    18    26
62 – 71          8       61.5    71.5    66.5    26    22
72 – 81          8       71.5    81.5    76.5    34    14
82 – 91          5       81.5    91.5    86.5    39     6
92 – 101         1       91.5   101.5    96.5    40     1
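The six construction steps can be traced in code. This is a minimal sketch, assuming that K and i are rounded to the nearest whole number and that the first class starts at the minimum score, as in the worked example; `build_fdt` is a hypothetical helper name.

```python
import math

scores = [79, 83, 84, 62, 62, 43, 72, 48, 46, 59,
          93, 64, 59, 32, 54, 45, 55, 45, 76, 72,
          40, 51, 72, 83, 49, 62, 85, 74, 40, 74,
          49, 65, 38, 85, 77, 63, 38, 43, 63, 69]

def build_fdt(data):
    n = len(data)
    k = round(1 + 3.322 * math.log10(n))   # Step 1: Sturges approximation
    r = max(data) - min(data)              # Step 2: range
    size = round(r / k)                    # Step 3: class size, rounded
    lower = min(data)                      # Step 4: start at the minimum value
    classes = []
    while lower <= max(data):              # Step 5: add classes until the maximum is covered
        classes.append((lower, lower + size - 1))
        lower += size
    # Step 6: count the observations falling in each class
    freqs = [sum(lo <= x <= hi for x in data) for lo, hi in classes]
    return k, r, size, classes, freqs

k, r, size, classes, freqs = build_fdt(scores)
for (lo, hi), f in zip(classes, freqs):
    print(f"{lo} - {hi}: {f}")
```

Because the class size is rounded down from 10.167 to 10, seven classes are needed to cover the maximum score of 93 even though Sturges' formula suggests about six.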

Graphs Associated with the Frequency Distribution Table

·                     Histogram – a bar graph in which the class boundaries are plotted (on the horizontal axis) against the class frequencies (on the vertical axis)

The following depicts the histogram of the frequency distribution table of the final exam scores of 40 students.

[Figure: histogram; horizontal axis: class boundaries, vertical axis: frequency]

·                     Frequency Polygon     – a line chart that is constructed by plotting the class marks against the class frequencies.

– The graph is obtained by connecting the consecutive points by use of straight lines.

– The polygon is closed by adding additional classmarks at each end with a frequency of zero.

[Figure: frequency polygon; horizontal axis: class marks (scores), vertical axis: frequency]

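·                     Ogives – graphs associated with cumulative frequencies

·                     < Ogive – graph where the < CF is plotted against the UCB

[Figure: less-than ogive; horizontal axis: class boundaries (class scores), vertical axis: cumulative frequency]

·                     > Ogive – graph where the > CF is plotted against the LCB

[Figure: greater-than ogive; horizontal axis: class boundaries (class scores), vertical axis: cumulative frequency]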

MEASURES OF CENTRAL TENDENCY

Measures of Central Tendency

·                     Show the centrality of the data

·                     Measures of the average

·                     Common measures are the mean, the median and the mode

THE MEAN

Ø   For Ungrouped Data

·                  Population Mean (µ)

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

·                  Sample Mean (x̄)

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Example: The following are scores of 10 sample students:

43,      32,       72,       31,       28

25,      45,       38,       42,       38

The sample mean x̄ is computed as:

$$\bar{x} = \frac{43 + 32 + 72 + 31 + 28 + 25 + 45 + 38 + 42 + 38}{10} = \frac{394}{10} = 39.4$$

Ø  Approximating the mean for Grouped Data

$$\bar{x} = \frac{\sum f_i x_i}{n}$$

Where:             $\bar{x}$         =          sample mean

$f_i$             =          frequency of the ith class

$x_i$         =          midpoint of the ith class

$n$          =          number of cases

Example:       Consider the FDT of scores of 75 Statistics students

Classes      Frequency
10 – 19          5
20 – 29         14
30 – 39         23
40 – 49         22
50 – 59         11

Then to compute x̄, we construct the following table

Classes      Frequency (fi)   Classmark (xi)   fixi
10 – 19           5               14.5           72.5
20 – 29          14               24.5          343
30 – 39          23               34.5          793.5
40 – 49          22               44.5          979
50 – 59          11               54.5          599.5
              n = 75                    ∑fixi = 2787.5

Thus,

$$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{2787.5}{75} = 37.17$$
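Both mean computations can be checked with a short script. The sketch below uses the ten sample scores and the 75-student FDT from the examples; the variable names are illustrative only.

```python
# Ungrouped data: sum of the observations divided by their number
scores = [43, 32, 72, 31, 28, 25, 45, 38, 42, 38]
sample_mean = sum(scores) / len(scores)

# Grouped data: each class mark is weighted by its class frequency
class_marks = [14.5, 24.5, 34.5, 44.5, 54.5]
freqs = [5, 14, 23, 22, 11]
n = sum(freqs)
grouped_mean = sum(f * x for f, x in zip(freqs, class_marks)) / n

print(sample_mean)
print(round(grouped_mean, 2))
```

The grouped result is only an approximation, since every score in a class is treated as if it were equal to the class mark.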

Properties of the Mean

·         The mean is the most widely used measure, which applies only to interval/ratio data.

·         It is affected by extreme values.

·         Since it is a calculated average whose value is determined by every observation, the mean may not be an actual value in the data set.

·         The sum of the deviations about the mean is zero.

·         If a constant k is added (or subtracted) to every observation in the data set, then the mean of the new data set increases (or decreases) by the same constant k.

·         If every observation in the data set is multiplied by a constant k, the mean of the new data set is k times the original mean.
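The properties listed above can be verified numerically. This is a small sketch using the ten sample scores from the earlier example; the constant k = 5 and the outlier value 500 are arbitrary choices made for illustration.

```python
def mean(values):
    return sum(values) / len(values)

data = [43, 32, 72, 31, 28, 25, 45, 38, 42, 38]
m = mean(data)
k = 5  # arbitrary constant chosen for the demonstration

deviation_sum = sum(x - m for x in data)      # deviations about the mean cancel out
shifted_mean = mean([x + k for x in data])    # adding k shifts the mean by k
scaled_mean = mean([x * k for x in data])     # multiplying by k scales the mean by k
with_outlier = mean(data + [500])             # one extreme value pulls the mean upward

print(m, shifted_mean, scaled_mean)
print(round(deviation_sum, 10), round(with_outlier, 2))
```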

THE MEDIAN

Ø  For Ungrouped Data

-          Denoted by Me

-          The middlemost value when the observations are arranged either in ascending or descending order

-          If a data set has an even number of observations, then the median is the average (the Mean) of the two most middle values.

Examples:

a.)    Consider the ff. set of scores

12, 34, 45, 72, 38, 49, 65

Putting the scores in an array;

12, 34, 38, 45, 49, 65, 72

Observe that Me = 45

b.)    Consider the next set of scores

8, 16, 24, 7, 21, 17, 19, 18, 5, 26

Putting the scores in an array;

5, 7, 8, 16, 17, 18, 19, 21, 24, 26

Observe that the two middlemost values are 17 and 18. Thus, the median is

$$Me = \frac{17 + 18}{2} = \frac{35}{2} = 17.5$$

Ø  Approximating the median for Grouped Data

$$Me = LCB_{me} + i\left(\frac{\frac{n}{2} - {<}CF_{me-1}}{f_{me}}\right)$$

where:

$LCB_{me}$             =          lower class boundary of the median class

$i$                      =          class size

$n$                      =          number of cases

${<}CF_{me-1}$            =          less than cumulative frequency of the class preceding the median class

$f_{me}$                    =          frequency of the median class

median class    =          the first class whose <CF is greater than or equal to n/2

Example:         Consider again the following FDT:

Classes      Frequency
10 – 19          5
20 – 29         14
30 – 39         23
40 – 49         22
50 – 59         11
              n = 75

To approximate the median, we first construct the <CF column:

Classes      Frequency   <CF
10 – 19          5         5
20 – 29         14        19
30 – 39         23        42     Median Class
40 – 49         22        64
50 – 59         11        75
              n = 75

Then we locate the median class. As defined, the median class is the first class whose <CF is greater than or equal to n/2 = 75/2 = 37.5, which is the class 30 – 39.

Thus,

$LCB_{me}$            =           29.5

$i$                      =           10

$n/2$                  =           37.5

${<}CF_{me-1}$              =           19

$f_{me}$                        =           23

Substituting these values into the formula, we obtain:

$$Me = 29.5 + 10\left(\frac{37.5 - 19}{23}\right) = 29.5 + 8.04 = 37.54$$
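Both median procedures can be written as short functions. The sketch below assumes the usual interpolation formula for grouped data, Me = LCB + i(n/2 − <CF of the preceding class)/f, with a gap of 1 between consecutive class limits; the function names are illustrative.

```python
def median_ungrouped(values):
    """Middle value of the array, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def median_grouped(lower_limits, freqs, size, gap=1):
    """Interpolate within the first class whose <CF reaches n/2."""
    n = sum(freqs)
    cum_before = 0                       # <CF of the preceding class
    for lo, f in zip(lower_limits, freqs):
        if cum_before + f >= n / 2:      # this is the median class
            lcb = lo - gap / 2
            return lcb + size * (n / 2 - cum_before) / f
        cum_before += f

print(median_ungrouped([12, 34, 45, 72, 38, 49, 65]))
print(median_ungrouped([8, 16, 24, 7, 21, 17, 19, 18, 5, 26]))
print(round(median_grouped([10, 20, 30, 40, 50], [5, 14, 23, 22, 11], 10), 2))
```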

Properties of the Median

• The median is an ordinal and positional measure
• Less affected by extreme values than the mean

THE MODE

Ø  For Ungrouped Data

·         Denoted by Mo

·         The value in the data set that has the highest frequency

Example: Consider the ff. data set whose values are IQ scores

Data Set A      :           77, 83, 91, 85, 83, 100

Data Set B      :           88, 92, 71, 88, 71, 36

Data Set C      :           96, 43, 79, 68, 83, 110

Notice that the mode for Data Set A is 83; the modes for Data Set B are 88 and 71 and we say that this data set is bi-modal. Data Set C does not have a mode.

Ø  Approximating the Mode for Grouped Data

$$Mo = LCB_{mo} + i\left(\frac{f_{mo} - f_1}{2f_{mo} - f_1 - f_2}\right)$$

where:

$LCB_{mo}$             =          lower class boundary of the modal class

$i$                       =          class size

$f_{mo}$                    =          frequency of the modal class

$f_1$                      =          frequency of the class preceding the modal class

$f_2$                            =          frequency of the class following the modal class

Example: Consider again the following FDT:

Classes      Frequency
10 – 19          5
20 – 29         14
30 – 39         23     Modal Class
40 – 49         22
50 – 59         11
              n = 75

To approximate the mode, let us first locate the modal class, the class which has the highest frequency. Thus, 30 – 39 is the modal class and

$LCB_{mo}$             =          29.5

$i$                       =          10

$f_{mo}$                    =          23

$f_1$                      =          14

$f_2$                            =          22

Substituting the values we obtain:

$$Mo = 29.5 + 10\left(\frac{23 - 14}{2(23) - 14 - 22}\right) = 29.5 + 10\left(\frac{9}{10}\right) = 38.5$$

1. Crude Mode – a rough estimate of the mode

2. Refined Mode

For larger cases, n ≥ 100

a.       Mo = 3Me − 2x̄

b.      Mo = x̄ − 3(x̄ − Me)
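The mode computations can be sketched as follows. Two assumptions are made here: a data set in which every value occurs only once is reported as having no mode, and the grouped mode uses the common interpolation formula Mo = LCB + i(f_mo − f_1)/(2f_mo − f_1 − f_2) with a gap of 1 between class limits. The function names are illustrative.

```python
from collections import Counter

def modes(values):
    """Return all values sharing the highest frequency; [] if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                        # every value occurs once: no mode
    return [v for v, c in counts.items() if c == top]

def mode_grouped(lower_limits, freqs, size, gap=1):
    """Interpolate within the modal class (the class with the highest frequency)."""
    m = freqs.index(max(freqs))
    f_mo = freqs[m]
    f_1 = freqs[m - 1] if m > 0 else 0               # class before the modal class
    f_2 = freqs[m + 1] if m + 1 < len(freqs) else 0  # class after the modal class
    lcb = lower_limits[m] - gap / 2
    return lcb + size * (f_mo - f_1) / (2 * f_mo - f_1 - f_2)

print(modes([77, 83, 91, 85, 83, 100]))    # Data Set A
print(modes([88, 92, 71, 88, 71, 36]))     # Data Set B (bi-modal)
print(modes([96, 43, 79, 68, 83, 110]))    # Data Set C (no mode)
print(mode_grouped([10, 20, 30, 40, 50], [5, 14, 23, 22, 11], 10))
```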

Other Kinds of Mean

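I.                   Weighted Mean (WM)

$$WM = \frac{\sum x w}{\sum w}$$

where w is the weight attached to each value x and the denominator N = ∑w is the total weight

II.                Geometric Mean, (GM)

·         Used to derive the average of indexes, relative values and percentages

·         nth root of the product of n values

·         For grouped data, the antilog of the sum of the logarithms of the middle values (class marks) multiplied by their frequencies, divided by the total frequency

$$GM = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n} = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n}$$

$$GM = \operatorname{antilog}\left(\frac{\sum f_i \log x_i}{n}\right)$$

III.             Harmonic Mean, (HM)

·         Used for spatial measurements, lengths, areas and volumes.

·         The reciprocal of the arithmetic mean of the reciprocals of the values.

$$HM = \frac{n}{\sum \frac{1}{x_i}}$$

$$HM = \frac{n}{\sum \frac{f_i}{x_i}} \quad \text{(grouped data)}$$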

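Example: 10, 15, 8, 10, 13, 12, 10, 14

Solve the following:

1.      Geometric mean

$$GM = (10 \cdot 15 \cdot 8 \cdot 10 \cdot 13 \cdot 12 \cdot 10 \cdot 14)^{1/8} = (262080000)^{1/8} = 11.28$$

$$GM = \operatorname{antilog}\left(\frac{\log 262080000}{8}\right) = \operatorname{antilog}(1.0523) = 11.28$$

2.      Harmonic mean

$$HM = \frac{8}{\frac{1}{10} + \frac{1}{15} + \frac{1}{8} + \frac{1}{10} + \frac{1}{13} + \frac{1}{12} + \frac{1}{10} + \frac{1}{14}} = \frac{8}{0.7234} = 11.06$$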
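The geometric and harmonic means of the eight values can be checked directly. This sketch computes the geometric mean in two ways, as the nth root of the product and through base-10 logarithms; the variable names are illustrative.

```python
import math

values = [10, 15, 8, 10, 13, 12, 10, 14]
n = len(values)

# Geometric mean: nth root of the product of the n values
product = math.prod(values)
gm = product ** (1 / n)

# Same quantity through logarithms: antilog of the mean of the logs
gm_logs = 10 ** (sum(math.log10(x) for x in values) / n)

# Harmonic mean: reciprocal of the arithmetic mean of the reciprocals
hm = n / sum(1 / x for x in values)

print(product, round(gm, 2), round(gm_logs, 2), round(hm, 2))
```

For positive values that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean.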

MEASURES OF DISPERSION OR VARIABILITY

·         Measures of Dispersion – indicate how the data are dispersed or scattered about the average

·         Classifications:

o   Measures of Absolute Dispersion

-          Expressed in the original units of the original observations

THE RANGE

-          Difference between the largest and the smallest values

-          Maximum value minus minimum value

-          For grouped data, the range is defined as the difference between the upper class limit of the highest class and the lower class limit of the lowest class

-          Simplest/roughest measure

Example:                     Consider the FDT

Classes      Frequency
10 – 19          5
20 – 29         14
30 – 39         23
40 – 49         22
50 – 59         11
              n = 75

The Range                   R = 59 – 10 = 49         (for discrete)

R = 59.5 – 9.5 = 50     (for continuous)

Properties of the Range

·         It is a weak measure because it tells only the extreme values and does not provide information on the values between

·         It is greatly affected by outliers

·         For open-ended frequency distributions, the range cannot be computed

THE STANDARD DEVIATION

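Ø  For Population

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

Ø  For Sample

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Example: Consider the following scores of 5 students taken as samples:

8,         6,         3,         4,         4

Then

$$\bar{x} = \frac{8 + 6 + 3 + 4 + 4}{5} = \frac{25}{5} = 5$$

Thus,

xi     x̄     (xi − x̄)     (xi − x̄)²
8      5        3            9
6      5        1            1
3      5       -2            4
4      5       -1            1
4      5       -1            1
                 ∑(xi − x̄)² = 16

The standard deviation is therefore:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{16}{5 - 1}} = \sqrt{\frac{16}{4}} = \sqrt{4} = 2$$

Computational Formula for the Sample Standard Deviation for Ungrouped Data

$$s = \sqrt{\frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}}$$

Example: To verify the computed standard deviation in the previous sample, we have

xi          xi²
8           64
6           36
3            9
4           16
4           16
∑xi = 25    ∑xi² = 141

Thus,

$$s = \sqrt{\frac{5(141) - (25)^2}{5(4)}} = \sqrt{\frac{705 - 625}{20}} = \sqrt{4} = 2$$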
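The two routes to the sample standard deviation can be compared in code. This sketch assumes the n − 1 divisor, which is the one consistent with the result s = 2 for these five scores; the variable names are illustrative.

```python
import math

scores = [8, 6, 3, 4, 4]
n = len(scores)
mean = sum(scores) / n

# Definitional formula: squared deviations about the mean, divided by n - 1
sum_sq_dev = sum((x - mean) ** 2 for x in scores)
s_definitional = math.sqrt(sum_sq_dev / (n - 1))

# Computational formula: uses only the sum of x and the sum of x squared
sum_x = sum(scores)
sum_x2 = sum(x * x for x in scores)
s_computational = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

print(sum_sq_dev, s_definitional, s_computational)
```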

Ø  Approximating the Standard Deviation for Grouped Data

$$s = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n - 1}}$$

where

$f_i$ = frequency of the ith class

$x_i$ = classmark of the ith class

$\bar{x}$ = mean of the frequency distribution

$n$ = number of cases

Example: Consider again the FDT:

Classes      Frequency
10 – 19          5
20 – 29         14
30 – 39         23
40 – 49         22
50 – 59         11
              n = 75

The mean of this FDT was computed to be x̄ = 37.17. The standard deviation is computed as follows:

Classes     fi    xi      x̄       (xi − x̄)   (xi − x̄)²   fi(xi − x̄)²
10 – 19      5   14.5   37.17    -22.67     513.9289    2569.6445
20 – 29     14   24.5   37.17    -12.67     160.5289    2247.4046
30 – 39     23   34.5   37.17     -2.67       7.1289     163.9647
40 – 49     22   44.5   37.17      7.33      53.7289    1182.0358
50 – 59     11   54.5   37.17     17.33     300.3289    3303.6179

∑fi(xi − x̄)² = 9466.6675

Hence,

$$s = \sqrt{\frac{9466.6675}{75 - 1}} = \sqrt{127.9279} = 11.31$$

Ø  Computational Formula for Approximating the Standard Deviation for Grouped Data

$$s = \sqrt{\frac{n\sum f_i x_i^2 - \left(\sum f_i x_i\right)^2}{n(n - 1)}}$$
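The grouped-data standard deviation can be checked numerically. The sketch below assumes the n − 1 divisor and uses the unrounded mean, so the sum of squares differs slightly from the hand computation that uses x̄ = 37.17; the variable names are illustrative.

```python
import math

class_marks = [14.5, 24.5, 34.5, 44.5, 54.5]
freqs = [5, 14, 23, 22, 11]
n = sum(freqs)

mean = sum(f * x for f, x in zip(freqs, class_marks)) / n

# Definitional form: frequency-weighted squared deviations of the class marks
sum_sq_dev = sum(f * (x - mean) ** 2 for f, x in zip(freqs, class_marks))
s = math.sqrt(sum_sq_dev / (n - 1))

# Computational form: needs only the sums of f*x and f*x^2
sum_fx = sum(f * x for f, x in zip(freqs, class_marks))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, class_marks))
s_computational = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

print(round(sum_sq_dev, 4), round(s, 2), round(s_computational, 2))
```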

Properties of Standard Deviation

·         Since it is a function of the mean, the standard deviation is affected by every value of the data set. Thus, it is sensitive to the presence of a few extreme values.

·         If the same amount k is added to or subtracted from each observation of a data set, then the standard deviation of the new data set is the same as the standard deviation of the original data set.

·         If each observation of the data set is multiplied by a constant k, then the standard deviation of the new data set is equal to |k| times the standard deviation of the original data set.

OTHER MEASURES OF DISPERSION

·         Coefficient of Variation (CV)

-          Ratio of the standard deviation to the mean

-          Especially useful when one compares the variability of one data set with another data set having different units

CV = (s / x̄) × 100%

·         Index of Qualitative Variation (IQV)

-          Dispersion measure for qualitative (nominal or ordinal) variables

-          If all the values of a variable are in one category, then there is no variation and IQV = 0

-          If all the values are distributed evenly across the categories, IQV = 1, maximum value

$$IQV = \frac{k\left(N^2 - \sum f^2\right)}{N^2 (k - 1)}$$

where

k = number of categories

N = number of observations

f = frequency of each category
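Both measures are simple to compute. In the sketch below, the CV uses the mean and standard deviation of the 75-student FDT from earlier in the notes, while the category frequencies passed to `iqv` are hypothetical values chosen only to show the two extreme cases.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

def iqv(freqs):
    """Index of Qualitative Variation: k(N^2 - sum f^2) / (N^2 (k - 1))."""
    k = len(freqs)                      # number of categories
    total = sum(freqs)                  # number of observations N
    sum_f2 = sum(f * f for f in freqs)
    return k * (total ** 2 - sum_f2) / (total ** 2 * (k - 1))

print(round(coefficient_of_variation(11.31, 37.17), 2))
print(iqv([10, 0, 0]))      # hypothetical: all cases in one category
print(iqv([5, 5, 5, 5]))    # hypothetical: cases spread evenly
```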

MEASURES OF SKEWNESS

-          Shows the degree of asymmetry or departure from symmetry of a distribution

-          Indicates also the direction of skewness

§  Types of Skewness

o   Positively Skewed

§  Longer tail to the right

§  More concentration of values below than above the mean

§  This happens when x̄ > Me > Mo

[Figure: positively skewed curve with Mo, Me and x̄ marked from left to right]

Example: If a teacher gives a very hard exam, then one can expect that the distribution of scores will be positively skewed.

o   Negatively Skewed

§ Longer tail to the left

§ More concentration of values above than below the mean

§ This happens when x̄ < Me < Mo

[Figure: negatively skewed curve with x̄, Me and Mo marked from left to right]

Example: A very easy exam will result in a distribution of scores which is negatively skewed.

Ø  Pearson’s Coefficients of Skewness

1.      $sk = \dfrac{\bar{x} - Mo}{SD}$

2.      $sk = \dfrac{3(\bar{x} - Me)}{SD}$

where

x̄           = mean

Mo       = mode

Me       = median

SD       = standard deviation

We prefer to use Formula 2 over Formula 1 inasmuch as the mode does not always exist.

Example: Recall the statistics obtained from the FDT of scores of seventy-five Statistics students.

x̄           = 37.17

Me       = 37.54

s           = 11.31

$$sk = \frac{3(\bar{x} - Me)}{s} = \frac{3(37.17 - 37.54)}{11.31} = \frac{3(-0.37)}{11.31} = \frac{-1.11}{11.31} = -0.0981$$

Thus, we say that the distribution of scores is skewed to the left.
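Pearson's two coefficients are easy to evaluate in code. This sketch applies the median-based formula to the values from the example; the function names and the wording returned by `direction` are illustrative assumptions.

```python
def pearson_skew_mode(mean, mode, sd):
    """Formula 1: (mean - mode) / SD."""
    return (mean - mode) / sd

def pearson_skew_median(mean, median, sd):
    """Formula 2: 3(mean - median) / SD."""
    return 3 * (mean - median) / sd

def direction(sk):
    """Name the direction of skewness from the sign of the coefficient."""
    if sk > 0:
        return "skewed to the right (positively skewed)"
    if sk < 0:
        return "skewed to the left (negatively skewed)"
    return "symmetric"

sk = pearson_skew_median(37.17, 37.54, 11.31)
print(round(sk, 4), direction(sk))
```

The coefficient is very close to zero, so the departure from symmetry in this example is slight.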

MEASURES OF KURTOSIS

Ø  Describes the relative flatness or peakedness of a distribution

·         Platykurtic – relatively flat

·         Leptokurtic – relatively peaked

·         Mesokurtic – is between the platykurtic and leptokurtic curves; approaches or looks like the “normal curve”

Ø  Pearson’s Coefficient of Kurtosis

$$Ku = \frac{\sum (x_i - \bar{x})^4}{n \cdot SD^4}$$

where

$x_i$         =          observations in a data set

$\bar{x}$          =          the sample mean

n          =          no. of cases

SD       =          standard deviation

Observe that if:

Ku < 3, then the distribution is platykurtic

Ku = 3, then the distribution is mesokurtic

Ku > 3, then the distribution is leptokurtic
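The coefficient and its classification can be sketched as follows. The code assumes that SD is the sample standard deviation with the n − 1 divisor, as defined earlier in these notes, and reuses the five scores from the standard deviation example; the function names are illustrative.

```python
import math

def pearson_kurtosis(values):
    """Ku = sum((x - mean)^4) / (n * SD^4), with SD computed using n - 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return sum((x - mean) ** 4 for x in values) / (n * sd ** 4)

def classify(ku, tol=1e-9):
    """Compare Ku with 3, the reference value for the normal curve."""
    if abs(ku - 3) <= tol:
        return "mesokurtic"
    return "platykurtic" if ku < 3 else "leptokurtic"

ku = pearson_kurtosis([8, 6, 3, 4, 4])
print(ku, classify(ku))
```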

compilation of Dr. Libot
