STATISTICS MADE EASY

Posted by emerjoytvale in Society on August 12th, 2010

THE STUDY OF STATISTICS

 

WHY STATISTICS?

·        important in empirical studies

·        aids in decision making

·        helps to forecast or predict future outcomes

·        estimates unknown values

·        aids in making inferences, comparisons or establishing relationships

·        summarizes data

 

DEFINITIONS

·        (plural sense) – a set of numerical data or observations

Ex.

o  Vital statistics in a beauty pageant

o  Yearly income

o  Monthly expenses

o  First semester grades

·        (singular sense) – a branch of science which deals with the collection, presentation, analysis and interpretation of data.

 

MAIN DIVISIONS OF STATISTICS

·        Descriptive Statistics – pertains to the methods dealing with the collection, organization and analysis of a set of data without making conclusions, predictions or inferences about a larger set.

o  the main goal is simply to provide a description of a particular data set

o  the conclusions or the important characteristics apply only to the data on hand

·        Inferential Statistics – pertains to the methods dealing with making inferences, estimates or predictions about a larger set of data using the information gathered from a subset of this larger set

o  the main goal is not merely to provide a description of a particular data set but also to make predictions and inferences based on the available information gathered

o  the conclusions or the important characteristics apply to a larger set of which the data on hand is only a subset

 

Collection of Data – refers to the method of data gathering

Presentation of Data – refers to the process of organizing data, such as tabulation or presentation through the use of charts, graphs or paragraphs

Analysis of Data – refers to the methods of obtaining necessary, relevant and noteworthy information from the given data

Interpretation of Data – refers to the task of drawing conclusions or inferences from the analyzed data

 

 

 

Examples:


Descriptive Statistics

·        A college dean wants to determine the average semestral enrolment in the past 5 school years.

·        An instructor wants to know the exact number of students who passed his subject.

·        A school president wants to determine the number of student dropouts for the current school year.

 

Inferential Statistics

·        A college dean wants to forecast the average semestral enrolment based on the enrolment for the last 5 school years.

·        An instructor would like to predict the number of students who will pass his subject based on the number of failures last year.

·        A school president would like to estimate the number of student dropouts next school year based on the current data.


 

 

 

DESCRIPTIVE STATISTICS

 

POPULATION AND SAMPLE

 

·        Population – collection of all cases in which the researcher is interested in a statistical study

-         the entity that the researcher wishes to understand

 

Examples:

·        All students of University of Bohol

·        All department heads in UB

 

·        Sample – a portion or a subset of the population from which the information is gathered

-         a representation of the population

 

Examples:

·        All students of University of Bohol coming from the rural areas

·        All department heads in UB who have finished a Ph.D. degree

 

·        Parameter – a numerical characteristic of a population

-         denoted by lowercase Greek letters

 

Examples:

·        μ – population mean

·        σ – population standard deviation

 

·        Statistic – a numerical characteristic of a sample

-         denoted by letters of the English alphabet

Examples:

·        x̄ – sample mean

·        SD – sample standard deviation

 

 

TYPES OF DATA

 

·        Variable – a characteristic or attribute of persons or objects which assumes different values or labels

·        Measurement – process of assigning the value or label of a particular variable for a particular experimental unit

·        Experimental unit – the person or the object on which a variable is measured

 

·        Classification of Variables

Ψ Qualitative Variable – yields categorical or qualitative responses

Examples:       Civil Status (Single, Married, Widowed, etc.)

                        Religious Affiliation (Catholic, Protestant, etc.)

 

Ψ Quantitative Variable – yields numerical responses representing an amount or quantity

Examples:       Height, Weight, no. of children

 

Types of Quantitative Variables

Ψ Discrete Variable – assumes a finite or countably infinite number of values such as 1, 2, 3, …

Examples:       no. of children (0, 1, 2, 3, …)

                        no. of student dropouts

 

Ψ Continuous Variable – cannot be restricted to a finite or countable set of values; its values correspond to points on an interval of the real line

Examples:       Height (5’4”)

                        Weight (130.42 kilos)

                        Temperature (32.5°C)

 

Levels of Measurement

 

Ψ  NOMINAL LEVEL

-         Crudest form of measurement

-         Numbers or symbols are used for the purpose of categorizing subjects into groups

-         The categories are mutually exclusive, that is, being in one category automatically excludes inclusion in another

-         The categories are exhaustive, that is, all possible categories of a variable should be included

Examples:       Sex:                             1 – Male                      0 – Female

                        Faculty Tenure:           1 – Tenured                 0 – Non-Tenured

 

Ψ  ORDINAL LEVEL

-         Improvement of nominal level

-         Orders or ranks the data in a “bottom to top” or “low to high” manner

Examples:       Class Standing (Excellent, Good, Poor)

                        Teacher Evaluation     1 – Poor

                                                            2 – Fair

                                                            3 – Good

                                                            4 – Very Good

 

Ψ  INTERVAL LEVEL

-         Possesses the properties of the nominal and ordinal levels

-         Distances between any two numbers on the scale are known

-         Does not have a stable starting point (an absolute zero)

Example: Consider the IQ scores of four students

                  70, 140, 75 and 145

 

Here, we can say that the difference between 70 and 140 is the same as the difference between 75 and 145, but we cannot claim that the second student is twice as intelligent as the first.

 

Ψ  RATIO LEVEL

-         Possesses all the properties of the nominal, ordinal and interval levels and, in addition, has an absolute zero point

-         We can classify it and place it in proper order

-         We can also compare magnitudes

Examples:             Age, Income, Exam Scores

 

 

SUMMARY CHARACTERISTICS OF LEVELS OF MEASUREMENT

            Levels of Measurement     Classify     Order     Equal Limits     Absolute Zero

            NOMINAL                   Yes          No        No               No
            ORDINAL                   Yes          Yes       No               No
            INTERVAL                  Yes          Yes       Yes              No
            RATIO                     Yes          Yes       Yes              Yes

 

 

·        Other Classification of Data

Ψ  Raw Data                    - data in their original form and structure

Ψ  Grouped Data             - data placed in tabular form

Ψ  Primary Data              - data measured and gathered by the researcher who published them

Ψ  Secondary Data          - any republication of data by another researcher or agency

 

 

METHODS OF DATA COLLECTION

 

·        Observation Method

-         Data can be obtained by observing the behavior of persons or objects, but only at the particular time of occurrence

-         The data obtained are called observational data

 

·        Experimental Method

-         Especially useful when one wants to collect data for cause-and-effect studies

-         There is actual human interference with the conditions or situations that can affect the variable under study

-         Prevalent in scientific research

 

·        Use of Existing Studies

-         CHED or DECS enrollment data

-         Census Data

 

·        Registration Method

-         Respondents provide the necessary information in observance of and compliance with existing laws

-         Car registration, birth registration, student registration, voter’s registration

 

·        Survey Method

-         The desired information is obtained by asking questions

 

 

Common Forms of Survey Method

 

Ψ  Personal Interview Method

-         There is person-to-person contact between the interviewer and the interviewee

-         Considered one of the most effective methods of data collection because accurate and precise information can be directly obtained and verified from the respondents

-         Higher response rate

-         Can be administered to the respondents only one at a time

Ψ  Questionnaire Method

-         Considered the easiest method of data collection

-         Uses a questionnaire as the data-gathering instrument

-         Lower response rate

-         Can be administered to a large number of respondents simultaneously

 

General Classification of Data Collection

Ψ  Census or Complete Enumeration

-         Method of gathering data from every unit in the population

-         Not always possible because of the money, time and effort required

Ψ  Survey Sampling

-         Method of gathering data from every unit in the selected sample

-         Reduces cost and offers greater speed, scope and accuracy

 

 

PROBABILITY AND NON-PROBABILITY SAMPLING

·        Probability Sampling – sampling procedure in which every element in the population has a known non-zero chance of being included in the sample

 

Common Methods of Probability Sampling

·        Simple Random Sampling (SRS)

-         Sampling procedure in which every element in the population has an equal chance of being included in the sample

-         Select n units out of N units in the population

-         The selection is through a lottery or the use of a table of random numbers
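
As a rough illustration of simple random sampling, the sketch below uses Python's pseudo-random generator in place of a lottery or a table of random numbers. The function name `simple_random_sample`, the seed value and the sizes N = 50 and n = 12 are assumptions chosen for illustration only.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Draw n distinct unit numbers from 1..N, each unit equally likely."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return sorted(rng.sample(range(1, N + 1), n))

# Hypothetical population of 50 numbered units, sample of 12
sample = simple_random_sample(N=50, n=12, seed=2010)
print(sample)
```

Sampling is done without replacement, so no unit can appear twice in the sample.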

 

·        Stratified Random Sampling

-         The population of N units is divided into subpopulations called strata, each consisting of more or less homogeneous units

-         Perform simple random sampling within each stratum; the selections in different strata are independent

-         Requires a so-called “stratification variable”

 

Example: Consider a population consisting of all UB students

§  Stratify the population by colleges (stratification variable)

 

Commerce | Education | Liberal Arts | Nursing | Eng’g

 

 

§  Stratify the population by year level (stratification variable).

 

I | II | III | IV | V

 

 

 

·        Systematic Sampling

-         Done with a random start

-         Select the sample by taking every kth unit from an ordered population

-         k is called the sampling interval and 1/k is the sampling fraction

 

Example: Suppose we select n = 12 students from a population of N = 50.

            To employ systematic sampling, divide N by n to get k, that is

 

k = N/n = 50/12 ≈ 4.17, which this example rounds up to k = 5

 1    2    3    4    5    6    7    8    9   10
11   12   13   14   15   16   17   18   19   20
21   22   23   24   25   26   27   28   29   30
31   32   33   34   35   36   37   38   39   40
41   42   43   44   45   46   47   48   49   50

 

 

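We choose every 5th unit. Thus, if the random start is r = 9 (the 9th unit), then the sample comprises students number

9,         14,       19,       24,       29,       34,       39,       44,       49 and 4 (the count wraps around after 50). Note that with k = 5 only 50/5 = 10 units are selected, fewer than the intended n = 12.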
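
The selection above can be reproduced with a short sketch. The function name `systematic_sample` and the circular wrap-around rule are assumptions made to match the worked example; they are not a prescribed procedure.

```python
def systematic_sample(N, k, start):
    """Take every k-th unit from an ordered population numbered 1..N.

    Counting wraps around past N (circular systematic sampling),
    and N // k units are returned.
    """
    return [(start - 1 + j * k) % N + 1 for j in range(N // k)]

print(systematic_sample(N=50, k=5, start=9))
# [9, 14, 19, 24, 29, 34, 39, 44, 49, 4]
```

With k = 4 instead of 5, the same function would return 12 units, which matches the intended sample size more closely.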

 

·        Cluster Sampling

-         A variation of stratified sampling in which the strata are clusters whose members have heterogeneous characteristics or attributes.

 

Example: A college unit may be considered as one cluster.

 

·        Multi-stage Sampling

-         A sampling procedure wherein the population is divided into a sequence of sampling units corresponding to the different sampling stages.

 

Common Methods of Non-Probability Sampling

 

·        Purposive Sampling

·        Quota Sampling

·        Convenience Sampling

 

 

PRESENTATION OF DATA

 

            Data can be presented in various modes such as textual, tabular and graphical displays.

 

·        Textual

-         Here, the data are presented by use of text, phrases or paragraphs

-         Very common in newspaper stories, which depict only the salient findings

·        Tabular

-         This is a more reliable and effective way of showing relationships or comparisons of data through the use of tables

-         In many cases, the tables are accompanied by a short narrative explanation to make the facts clearer and more understandable

·        Graphical

-         Most effective way of presenting statistical findings, by use of statistical graphs

-         Attracts the attention of the reader

-         Sets the tone for clearer interpretation of findings, especially in making comparisons

 

Common Forms of Graphs

·        Bar Graph – uses rectangular bars whose lengths represent the quantity or frequency for each category

Example: We are given the following enrolment data of XYZ College for the Academic Year 1995-1996

 

 

 

 

 

 

Year Level       Males     Females     Total     Percentage

First Year        682        435       1117        30.17
Second Year       496        536       1032        27.88
Third Year        314        419        733        19.80
Fourth Year       435        385        820        22.15

Total            1927       1775       3702       100.00
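
The Total and Percentage columns can be checked with a few lines of code. This is only a sketch: the variable names are assumptions, and the figures are the Males and Females counts from the enrolment table. The First Year share works out to 1117/3702, or about 30.17%.

```python
# (males, females) per year level, copied from the enrolment table
enrolment = {
    "First Year": (682, 435),
    "Second Year": (496, 536),
    "Third Year": (314, 419),
    "Fourth Year": (435, 385),
}

totals = {level: m + f for level, (m, f) in enrolment.items()}
grand_total = sum(totals.values())
percentages = {level: round(100 * t / grand_total, 2) for level, t in totals.items()}

for level in enrolment:
    print(level, totals[level], percentages[level])
```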

 

            We can draw the bar graph by placing the year levels on the horizontal axis and the frequency on the vertical axis as follows:

[Figure: bar graph of enrolment (vertical axis) by year level (horizontal axis)]

·        Multiple Bar Graph     – useful when the researcher wants to compare figures

– uses a legend to guide the viewer in analyzing the data

            Example: For the same data set, the multiple bar graph is shown below.

 

 

·        Pie Chart         - used to present quantities that make up a whole

- the slices of the pie are drawn in proportion to the values of each class, item, group or category

- the total area of the pie is 100%

 

·        Line Chart – especially useful in showing trends over a period of time

Example: The following data show the number of student dropouts of UB from 1995 to 1999

Year     Number of Student-Dropouts (Male)     Number of Student-Dropouts (Female)
1995                    40                                    26
1996                    38                                    43
1997                    53                                    55
1998                    48                                    32
1999                    62                                    49

The line chart of the above data set is shown below:

 

 

THE FREQUENCY DISTRIBUTION TABLE

 

Data in their original form and structure are called raw data.

 

Example: Consider the following final exam scores of 40 students

 

79   83   84   62   62   43   72   48   46   59
93   64   59   32   54   45   55   45   76   72
40   51   72   83   49   62   85   74   40   74
49   65   38   85   77   63   38   43   63   69

            When these scores are arranged in either ascending or descending order of magnitude, the arrangement is called an array. It is usually helpful to put the raw data in an array because it makes it easy to identify the extreme values and the values around which the scores cluster.

            When these data are organized into classes, they become grouped data. The table that results from organizing the data into groups is called a frequency distribution table (FDT).

 

Example: The following presents a frequency distribution table of the scores of seventy-five Statistics students.

Scores      Frequency
10-19            5
20-29           14
30-39           23
40-49           22
50-59           11
Total           75

 

Some Basic Terms

 

·        Class Interval              - the numbers defining the class

- consists of the end numbers called the class limits, namely the upper limit and the lower limit

·        Class Frequency          - shows the number of observations falling in the class

·        Class Boundaries        - these are the true class limits

                                    - LOWER CLASS BOUNDARY (LCB) is the middle value between the lower class limit of the class and the upper class limit of the preceding class

                                    - UPPER CLASS BOUNDARY (UCB) is the middle value between the upper class limit of the class and the lower class limit of the next class

·        Class size                     - the difference between the upper limits of the class and the preceding class

·        Class Mark                  - midpoint of a class interval

·        Open-ended class        - one which has no lower limit or no upper limit

·        Cumulative frequency - shows the accumulated frequencies of successive classes

                                          - GREATER THAN CF (>CF)

                                          - LESS THAN CF (<CF)

 

 

 

 

 

 

 

Example:

 

Class Interval     Freq     LCB      UCB      CM       <CF     >CF
10-19                5       9.5     19.5     14.5       5      75
20-29               14      19.5     29.5     24.5      19      70
30-39               23      29.5     39.5     34.5      42      56
40-49               22      39.5     49.5     44.5      64      33
50-59               11      49.5     59.5     54.5      75      11

 

 

Steps in Constructing a Frequency Distribution Table (FDT)

 

Step 1: Determine the number of classes. As a first approximation, it is suggested to use the STURGES APPROXIMATION FORMULA.

 

                                   K = 1 + 3.322 log n

                        where

                                   K = approximate number of classes

                                   n = number of cases

                                   log = common (base-10) logarithm

 

Step 2: Determine the range R.

                                   R = maximum value – minimum value

 

Step 3: Determine the approximate class size i using the formula

                                   i = R/K

                        It is usually helpful to round off i to a convenient number.

 

Step 4: Determine the lowest class limit (or the first class). This class should include the minimum value in the data set.

 

Step 5: Determine all class limits by adding the class size i to the limits of the previous class.

 

Step 6: Tally the scores/observations falling in each class.
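
The six steps can be sketched in code. The function name `build_fdt` is an assumption, as is the rule `upper = lower + i - 1`, which presumes whole-number scores; the data are the 40 final exam scores listed earlier.

```python
import math

scores = [79, 83, 84, 62, 62, 43, 72, 48, 46, 59,
          93, 64, 59, 32, 54, 45, 55, 45, 76, 72,
          40, 51, 72, 83, 49, 62, 85, 74, 40, 74,
          49, 65, 38, 85, 77, 63, 38, 43, 63, 69]

def build_fdt(data):
    """Return a list of (lower limit, upper limit, frequency) tuples."""
    n = len(data)
    k = round(1 + 3.322 * math.log10(n))   # Step 1: Sturges approximation
    r = max(data) - min(data)              # Step 2: range
    i = round(r / k)                       # Step 3: class size, rounded
    lower = min(data)                      # Step 4: start at the minimum
    table = []
    while lower <= max(data):              # Step 5: add i to each limit
        upper = lower + i - 1              # assumes integer-valued scores
        freq = sum(lower <= x <= upper for x in data)   # Step 6: tally
        table.append((lower, upper, freq))
        lower += i
    return table

fdt = build_fdt(scores)
for lower, upper, freq in fdt:
    print(f"{lower} - {upper}: {freq}")
```

Because i is rounded, the number of classes produced can differ from K by one; here the loop simply continues until the maximum score is covered.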

 

 

 

 

 

 

 

 

 

 

 

 

 

Example: Constructing the FDT of the final exam scores of 40 students.

 

Step 1:             K        =           1 + 3.322 log n

                                   =           1 + 3.322 log (40)

                                   =           1 + 3.322 (1.60205)

                                   =           1 + 5.32204

                                   =           6.3220

                        K       ≈           6

 

Step 2:             R        =           max – min

                                   =           93 – 32

                        R        =           61

 

Step 3:             i          =           R/K

                                   =           61/6

                                   =           10.167

                        i          ≈           10

 

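Step 4: Let us decide to start at the minimum value. Thus, the lowest class is the class 32 – 41

 

Step 5: The classes are constructed by adding 10 to each class limit. Because i was rounded down from 10.167, seven classes (one more than K = 6) are needed to reach the maximum value of 93.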

32 – 41

42 – 51

52 – 61

62 – 71

72 – 81

82 – 91

92 – 101

 

Step 6: Determine now the frequency of each class by tallying the scores falling in each class.

Classes        Tally               Frequency
32 – 41        |||||               5
42 – 51        ||||| ||||          9
52 – 61        ||||                4
62 – 71        ||||| |||           8
72 – 81        ||||| |||           8
82 – 91        |||||               5
92 – 101       |                   1
                                   n = 40

 

 

 

 

 

 

We now proceed to construct the complete frequency distribution table.

Classes        Frequency     LCB       UCB       CM       <CF     >CF
32 – 41            5         31.5      41.5      36.5       5      40
42 – 51            9         41.5      51.5      46.5      14      35
52 – 61            4         51.5      61.5      56.5      18      26
62 – 71            8         61.5      71.5      66.5      26      22
72 – 81            8         71.5      81.5      76.5      34      14
82 – 91            5         81.5      91.5      86.5      39       6
92 – 101           1         91.5     101.5      96.5      40       1
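
The remaining columns follow mechanically from the class limits and frequencies. The sketch below is one way to derive them; the function name `complete_fdt` and the 0.5 adjustment (which assumes scores recorded to the nearest whole number) are assumptions.

```python
# (lower limit, upper limit, frequency) for the 40 final exam scores
classes = [(32, 41, 5), (42, 51, 9), (52, 61, 4), (62, 71, 8),
           (72, 81, 8), (82, 91, 5), (92, 101, 1)]

def complete_fdt(rows):
    """Add class boundaries, class marks and both cumulative frequencies."""
    out = []
    less_cf = 0
    greater_cf = sum(f for _, _, f in rows)
    for lower, upper, f in rows:
        less_cf += f                       # running total from the top
        out.append({
            "class": (lower, upper), "f": f,
            "LCB": lower - 0.5,            # halfway to the previous upper limit
            "UCB": upper + 0.5,            # halfway to the next lower limit
            "CM": (lower + upper) / 2,     # class mark (midpoint)
            "<CF": less_cf,
            ">CF": greater_cf,
        })
        greater_cf -= f                    # running total from the bottom
    return out

table = complete_fdt(classes)
for row in table:
    print(row)
```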

 

 

Graphs Associated with the Frequency Distribution Table

 

·                    Histogram – a bar graph in which the class boundaries are plotted (on the horizontal axis) against the class frequencies (on the vertical axis)

 

            The following depicts the histogram of the frequency distribution table of the final exam scores of 40 students.

[Figure: histogram; frequency (vertical axis) against class boundaries (horizontal axis)]

 

·                    Frequency Polygon     – a line chart that is constructed by plotting the class marks against the class frequencies.

– The graph is obtained by connecting the consecutive points with straight lines.

– The polygon is closed by adding an additional class mark at each end with a frequency of zero.

[Figure: frequency polygon; frequency (vertical axis) against class marks of the scores (horizontal axis)]

 

·                    Ogives – graphs associated with cumulative frequencies

[Figure: ogive; cumulative frequency (vertical axis) against class boundaries of the scores (horizontal axis)]

 

·                    > Ogive – graph where the >CF is plotted against the LCB

[Figure: > ogive; cumulative frequency (vertical axis) against class boundaries of the scores (horizontal axis)]

 

 

MEASURES OF CENTRAL TENDENCY

 

Measures of Central Tendency

·                    Show the centrality of the data

·                    Measures of the average

·                    Common measures are the mean, the median and the mode

 

THE MEAN

 

Ψ  For Ungrouped Data

·                 Population Mean (μ)

$\mu = \dfrac{\sum_{i=1}^{N} X_i}{N}$

 

·                 Sample Mean (x̄)

$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$

 

Example: The following are scores of 10 sample students:

43,      32,       72,       31,       28

25,      45,       38,       42,       38

 

The sample mean x̄ is computed as:

$\bar{x} = \dfrac{43 + 32 + 72 + 31 + 28 + 25 + 45 + 38 + 42 + 38}{10} = \dfrac{394}{10} = 39.4$

 

 

Ψ Approximating the Mean for Grouped Data

$\bar{x} = \dfrac{\sum f_i x_i}{n}$

 

Where:             x̄         =          sample mean

                        fi             =          frequency of the ith class

                        xi         =          midpoint (class mark) of the ith class

                        n          =          number of cases

 

Example:       Consider the FDT of the scores of 75 Statistics students

Classes       Frequency
10 – 19            5
20 – 29           14
30 – 39           23
40 – 49           22
50 – 59           11

 

Then to compute x̄, we construct the following table

Classes       Frequency (fi)     Class mark (xi)     fixi
10 – 19             5                 14.5             72.5
20 – 29            14                 24.5            343
30 – 39            23                 34.5            793.5
40 – 49            22                 44.5            979
50 – 59            11                 54.5            599.5
                 n = 75                           ∑fixi = 2787.5

Thus,

$\bar{x} = \dfrac{\sum f_i x_i}{n} = \dfrac{2787.5}{75} \approx 37.17$
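
The grouped-mean computation can be sketched as follows. The function name `grouped_mean` and the data layout are assumptions; the classes and frequencies are those of the 75 Statistics students.

```python
# ((lower limit, upper limit), frequency) for the 75 Statistics students
fdt = [((10, 19), 5), ((20, 29), 14), ((30, 39), 23),
       ((40, 49), 22), ((50, 59), 11)]

def grouped_mean(table):
    """Approximate the mean by weighting each class mark by its frequency."""
    n = sum(f for _, f in table)
    total = sum(f * (lo + hi) / 2 for (lo, hi), f in table)   # sum of f_i * x_i
    return total / n

mean = grouped_mean(fdt)
print(round(mean, 2))  # 37.17
```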

 

 

 

Properties of the Mean

·        The mean is the most widely used measure; it applies only to interval/ratio data.

·        It is affected by extreme values.

·        Since it is a calculated average whose value is determined by every observation, the mean may not be an actual value in the data set.

·        The sum of the deviations about the mean is zero.

·        If a constant k is added to (or subtracted from) every observation in the data set, then the mean of the new data set increases (or decreases) by the same constant k.

·        If every observation in the data set is multiplied by a constant k, the mean of the new data set is k times the original mean.

 

THE MEDIAN

Ψ For Ungrouped Data

-         Denoted by Me

-         The middlemost value when the observations are arranged in either ascending or descending order

-         If a data set has an even number of observations, then the median is the average (the mean) of the two middlemost values.

Examples:

a.)   Consider the following set of scores

            12, 34, 45, 72, 38, 49, 65

Putting the scores in an array:

            12, 34, 38, 45, 49, 65, 72

Observe that Me = 45

 

b.)   Consider the next set of scores

            8, 16, 24, 7, 21, 17, 19, 18, 5, 26

Putting the scores in an array:

            5, 7, 8, 16, 17, 18, 19, 21, 24, 26

Observe that the two middlemost values are 17 and 18. Thus, the median is

$Me = \dfrac{17 + 18}{2} = \dfrac{35}{2} = 17.5$

 

 

Ψ Approximating the Median for Grouped Data

$Me = LCB_{me} + i \left( \dfrac{\frac{n}{2} - {<}CF_{me-1}}{f_{me}} \right)$

where:

LCBme             =          lower class boundary of the median class

i                      =          class size

n                      =          number of cases

<CFme-1            =          less than cumulative frequency of the class preceding the median class

fme                    =          frequency of the median class

median class    =          class where the <CF first reaches or exceeds n/2

 

Example:         Consider again the following FDT:

Classes       Frequency
10 – 19            5
20 – 29           14
30 – 39           23
40 – 49           22
50 – 59           11
                 n = 75

To approximate the median, we first construct the <CF column:

Classes       Frequency     <CF
10 – 19            5           5
20 – 29           14          19
30 – 39           23          42     Median Class
40 – 49           22          64
50 – 59           11          75
                 n = 75

            Next we locate the median class. As defined, the median class is the class where the <CF first reaches or exceeds n/2 = 37.5, which here is the class 30 – 39.

Thus,

            LCBme            =           29.5

            i                      =           10

            n/2                  =           37.5

            <CFme-1              =           19

            fme                        =           23

 

Substituting these values into the formula, we obtain:

$Me = 29.5 + 10 \left( \dfrac{37.5 - 19}{23} \right) = 29.5 + 8.04 \approx 37.54$
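
The same substitution can be carried out in code. This is a sketch only: the function name `grouped_median` is an assumption, and the lower class boundary is taken as the lower limit minus 0.5, which presumes whole-number scores.

```python
# ((lower limit, upper limit), frequency) for the 75 Statistics students
fdt = [((10, 19), 5), ((20, 29), 14), ((30, 39), 23),
       ((40, 49), 22), ((50, 59), 11)]

def grouped_median(table, width):
    """Approximate the median from a frequency distribution table."""
    n = sum(f for _, f in table)
    less_cf = 0                            # <CF of the preceding class
    for (lower, _), f in table:
        if less_cf + f >= n / 2:           # first class whose <CF reaches n/2
            lcb = lower - 0.5              # assumes integer class limits
            return lcb + width * (n / 2 - less_cf) / f
        less_cf += f

median = grouped_median(fdt, width=10)
print(round(median, 2))  # 37.54
```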

 

 

 

 

 

 

                                                

 

 

Properties of the Median

  • The median is an ordinal and positional measure
  • It is less affected by extreme values than the mean

 

 

THE MODE

Ψ  For Ungrouped Data

·        Denoted by Mo

·        The value in the data set that has the highest frequency

Example: Consider the following data sets whose values are IQ scores

                        Data Set A      :           77, 83, 91, 85, 83, 100

                        Data Set B      :           88, 92, 71, 88, 71, 36

                        Data Set C      :           96, 43, 79, 68, 83, 110

 

            Notice that the mode for Data Set A is 83. The modes for Data Set B are 88 and 71, and we say that this data set is bimodal. Data Set C does not have a mode.

 

Ψ  Approximating the Mode for Grouped Data

 

a.   

 

 

b.  

 

where:

LCBmo             =          lower class boundary of the modal class

i                       =          class size

fmo                    =          frequency of the modal class

f1                      =          frequency of the class preceding the modal class

f2                            =          frequency of the class following the modal class

 

Example: Consider again the following FDT:

Classes       Frequency
10 – 19            5
20 – 29           14
30 – 39           23     Modal Class
40 – 49           22
50 – 59           11
                 n = 75

            To approximate the mode, let us first locate the modal class, the class which has the highest frequency. Thus, 30 – 39 is the modal class and

 

LCBmo             =          29.5

i                       =          10

fmo                    =          23

f1                      =          14

f2                            =          22

 

Substitutingthe values we obtain:

 

  1. Crude Mode – rough

 

a.

 

 

 

 

 

 

b.
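
The substituted formulas did not survive in this copy of the notes. As an assumption, the sketch below uses one widely taught grouped-mode formula that fits the listed symbols, Mo = LCB + i · d1 / (d1 + d2), where d1 and d2 are the differences between the modal frequency and the frequencies of the preceding and following classes. It may not be the exact formula the notes intended, and the function name `grouped_mode` is hypothetical.

```python
def grouped_mode(lcb, width, f_modal, f_before, f_after):
    """Approximate the mode with a common textbook formula (an assumption here)."""
    d1 = f_modal - f_before    # excess over the preceding class
    d2 = f_modal - f_after     # excess over the following class
    return lcb + width * d1 / (d1 + d2)

# Values taken from the example: modal class 30 - 39
mode = grouped_mode(lcb=29.5, width=10, f_modal=23, f_before=14, f_after=22)
print(mode)  # 38.5
```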

 

 

 

 

 

 

  2. Refined Mode

For larger cases, n ≥ 100

a.      Mo = 3Me − 2x̄

b.     Mo = x̄ − 3(x̄ − Me)

 

Other Kinds of Mean

 

I.                  Weighted Mean (WM) = ∑xw / N

II.               Geometric Mean (GM)

·        Used to derive the average of indexes, relative values and percentages

·        nth root of the product of n values

·        For grouped data, the antilog of the sum of the logarithms of the class marks multiplied by their frequencies, divided by the total frequency

$GM = \sqrt[n]{X_1 \cdot X_2 \cdot X_3 \cdots X_n} = (X_1 \cdot X_2 \cdot X_3 \cdots X_n)^{1/n}$

$GM = \operatorname{antilog}\left( \dfrac{\sum f \log X}{N} \right)$

 

III.            Harmonic Mean (HM)

·        Used for spatial measurements, lengths, areas and volumes.

·        The reciprocal of the arithmetic mean of the reciprocals of the values.

$HM = \dfrac{1}{\frac{1}{n} \sum \frac{1}{X_i}} = \dfrac{n}{\sum \frac{1}{X_i}}$

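Example: 10, 15, 8, 10, 13, 12, 10, 14

Solve the following:

1.     $GM = (10 \cdot 15 \cdot 8 \cdot 10 \cdot 13 \cdot 12 \cdot 10 \cdot 14)^{1/8} = (262080000)^{1/8} \approx 11.28$

2.     $HM = \dfrac{8}{\frac{1}{10} + \frac{1}{15} + \frac{1}{8} + \frac{1}{10} + \frac{1}{13} + \frac{1}{12} + \frac{1}{10} + \frac{1}{14}} = \dfrac{8}{0.7234} \approx 11.06$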
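
Both results can be checked numerically. The sketch below computes the geometric mean through logarithms (the antilog form) and the harmonic mean from the reciprocals; the function names are assumptions.

```python
import math

values = [10, 15, 8, 10, 13, 12, 10, 14]

def geometric_mean(xs):
    """Antilog of the average logarithm, i.e. the n-th root of the product."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean(xs):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(xs) / sum(1 / x for x in xs)

gm = geometric_mean(values)
hm = harmonic_mean(values)
print(round(gm, 2), round(hm, 2))  # 11.28 11.06
```

Working through logarithms avoids forming the very large product 262080000 directly, which matters more for longer data sets.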

 

MEASURES OF DISPERSION OR VARIABILITY

 

·        Measures of Dispersion – indicate how the data are dispersed or scattered about the average

·        Classifications:

o  Measures of Absolute Dispersion

-         Expressed in the units of the original observations

 

THE RANGE

-         Difference between the largest and the smallest values

-         Maximum value minus minimum value

-         For grouped data, the range is defined as the difference between the upper class limit of the highest class and the lower class limit of the lowest class

-         Simplest/roughest measure

 

Example:                     Consider the FDT

Classes       Frequency
10 – 19            5
20 – 29           14
30 – 39           23
40 – 49           22
50 – 59           11
                 n = 75

The Range                   R = 59 – 10 = 49         (for discrete)

                                    R = 59.5 – 9.5 = 50     (for continuous)

 

Properties of the Range

·        It is a weak measure because it tells only the extreme values and does not provide information on the values in between

·        It is greatly affected by outliers

·        For open-ended frequency distributions, the range cannot be computed

 

 

THE STANDARD DEVIATION

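Ψ  For Population

$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$

Ψ  For Sample

$s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Example: Consider the following scores of 5 students taken as a sample:

 

                                    8,        6,         3,         4,         4

 

Then $\bar{x} = \dfrac{8 + 6 + 3 + 4 + 4}{5} = \dfrac{25}{5} = 5$

Thus,

xi      x̄      (xi − x̄)     (xi − x̄)²
8       5          3             9
6       5          1             1
3       5         -2             4
4       5         -1             1
4       5         -1             1
                        ∑(xi − x̄)² = 16

The standard deviation is therefore:

$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{16}{5 - 1}} = \sqrt{4} = 2$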

 

 

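Computational Formula for the Sample Standard Deviation for Ungrouped Data

$s = \sqrt{\dfrac{n \sum x_i^2 - \left( \sum x_i \right)^2}{n(n - 1)}}$

Example: To verify the computed standard deviation in the previous example, we have

xi          xi²
8           64
6           36
3            9
4           16
4           16
∑xi = 25    ∑xi² = 141

Thus,

$s = \sqrt{\dfrac{5(141) - (25)^2}{5(5 - 1)}} = \sqrt{\dfrac{705 - 625}{20}} = \sqrt{4}$

s = 2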
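
The two routes to the sample standard deviation can be compared directly in code. The function names are assumptions; both use the n − 1 divisor, which is what reproduces s = 2 for the five scores.

```python
import math

def sample_sd(xs):
    """Definitional form: root of squared deviations over n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

def sample_sd_computational(xs):
    """Computational form: uses only the sum of x and the sum of x squared."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

scores = [8, 6, 3, 4, 4]
print(sample_sd(scores), sample_sd_computational(scores))  # 2.0 2.0
```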

 

 

Ψ Approximating the Standard Deviation for Grouped Data

$s = \sqrt{\dfrac{\sum f_i (x_i - \bar{x})^2}{n - 1}}$

                                    where

                                    fi = frequency of the ith class

                                    xi = class mark of the ith class

                                    x̄ = mean of the frequency distribution

                                    n = number of cases

 

 

 

Example: Consider again the FDT:

Classes       Frequency
10 – 19            5
20 – 29           14
30 – 39           23
40 – 49           22
50 – 59           11
                 n = 75

The mean of this FDT was computed to be x̄ = 37.17. The standard deviation is computed as follows:

Classes      fi      xi       x̄        (xi − x̄)     (xi − x̄)²     fi(xi − x̄)²
10 – 19       5     14.5     37.17     -22.67       513.9289      2569.6445
20 – 29      14     24.5     37.17     -12.67       160.5289      2247.4046
30 – 39      23     34.5     37.17      -2.67         7.1289       163.9647
40 – 49      22     44.5     37.17       7.33        53.7289      1182.0358
50 – 59      11     54.5     37.17      17.33       300.3289      3303.6179
                                                  ∑fi(xi − x̄)² = 9466.6675

 

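Hence, using the sample divisor n − 1 as in the ungrouped case,

$s = \sqrt{\dfrac{\sum f_i (x_i - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{9466.6675}{74}} \approx 11.31$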
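
A sketch of the grouped computation follows. The divisor n − 1 is an assumption carried over from the ungrouped sample formula, and the function name `grouped_sd` is hypothetical; dividing by n instead would give a slightly smaller value.

```python
import math

# ((lower limit, upper limit), frequency) for the 75 Statistics students
fdt = [((10, 19), 5), ((20, 29), 14), ((30, 39), 23),
       ((40, 49), 22), ((50, 59), 11)]

def grouped_sd(table):
    """Approximate the sample standard deviation from class marks."""
    n = sum(f for _, f in table)
    marks = [((lo + hi) / 2, f) for (lo, hi), f in table]
    mean = sum(x * f for x, f in marks) / n
    ss = sum(f * (x - mean) ** 2 for x, f in marks)   # weighted squared deviations
    return math.sqrt(ss / (n - 1))                    # assumed sample divisor

sd = grouped_sd(fdt)
print(round(sd, 2))  # 11.31
```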

 

 

 

 

 

 

Ψ Computational Formula for Approximating the Standard Deviation for Grouped Data

$s = \sqrt{\frac{n \sum f_i x_i^2 - (\sum f_i x_i)^2}{n(n - 1)}}$

 

Properties of Standard Deviation

 

·        Since it is a function of the mean, the standard deviation is affected by every value in the data set. Thus, it is sensitive to the presence of a few extreme values.

·        If the same amount k is added to or subtracted from each observation of a data set, then the standard deviation of the new data set is the same as the standard deviation of the original data set.

·        If each observation of a data set is multiplied by a constant k, then the standard deviation of the new data set is equal to |k| times the standard deviation of the original data set.
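The last two properties can be illustrated with a small Python sketch that uses the standard library's `statistics.stdev` (the sample standard deviation). The data set and the constant k = 3 are chosen only for illustration.

```python
import statistics

data = [8, 6, 3, 4, 4]
k = 3  # illustrative constant

base = statistics.stdev(data)                        # SD of the original data
shifted = statistics.stdev([x + k for x in data])    # add k to every value
scaled = statistics.stdev([k * x for x in data])     # multiply every value by k
negated = statistics.stdev([-k * x for x in data])   # multiply by a negative constant

print(base, shifted, scaled, negated)
```

Adding k leaves the standard deviation unchanged, while multiplying by k or by -k scales it by the absolute value of the constant.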

 

OTHER MEASURES OF DISPERSION

·        Coefficient of Variation (CV)

-         Ratio of the standard deviation to the mean, expressed as a percentage

-         Especially useful when comparing the variability of one data set with that of another data set measured in different units

 

$CV = \frac{s}{\bar{x}} \times 100\%$
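A minimal Python sketch of the coefficient of variation follows. The function name `coefficient_of_variation` is illustrative, and the sketch assumes a nonzero mean.

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the mean."""
    # Assumes the mean is not zero; the CV is undefined otherwise
    return 100 * statistics.stdev(data) / statistics.mean(data)

print(coefficient_of_variation([8, 6, 3, 4, 4]))  # 40.0
```

Because the result is a unitless percentage, it can be compared across data sets recorded in different units.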

 

·        Index of Qualitative Variation (IQV)

-         A measure of dispersion for qualitative (nominal or ordinal) variables

-         If all the values of a variable are in one category, then there is no variation and IQV = 0

-         If the values are distributed evenly across the categories, then IQV = 1, its maximum value

 

$IQV = \frac{k(N^2 - \sum f^2)}{N^2 (k - 1)}$

 

                        where

                                    k = number of categories

                                    N = number of observations

                                    f = frequency of each category
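The two boundary cases of the IQV can be checked with a short Python sketch. The category counts below are hypothetical, and the function name `iqv` is an illustrative choice.

```python
def iqv(freqs):
    """Index of Qualitative Variation from a list of category frequencies."""
    k = len(freqs)                       # number of categories (needs k >= 2)
    n = sum(freqs)                       # number of observations
    sum_f2 = sum(f * f for f in freqs)   # sum of squared frequencies
    return k * (n ** 2 - sum_f2) / (n ** 2 * (k - 1))

print(iqv([30, 0, 0]))    # all values in one category -> 0.0
print(iqv([10, 10, 10]))  # values spread evenly -> 1.0
```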

 

 

 

 

 

 

 

 

 

 

 

MEASURES OF SKEWNESS

 

-         Show the degree of asymmetry, or departure from symmetry, of a distribution

-         Also indicate the direction of skewness

 

§  Types of Skewness

o  Positively Skewed

§  Longer tail to the right

§  More concentration of values below than above the mean

§  This happens when $\bar{x}$ > Me > Mo

 

 

 

 

 

 

 

 

 


[Figure: positively skewed curve with the mode (Mo) and the median (Me) marked on the horizontal axis]

 

Example: If a teacher gives a very hard exam, then one can expect that the distribution of scores will be positively skewed.

 

o  Negatively Skewed

§  Longer tail to the left

§  More concentration of values above than below the mean

§  This happens when $\bar{x}$ < Me < Mo

 

 

 

 

 

 

 

 


[Figure: negatively skewed curve with the median (Me) and the mode (Mo) marked on the horizontal axis]

 

Example: A very easy exam will result in a distribution of scores that is negatively skewed.
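The ordering of the mean, median, and mode in the two types of skewness can be illustrated with a small Python sketch. Both score lists are made-up data chosen only to show the pattern; they are not from the original notes.

```python
import statistics

# Hypothetical scores: most students score low on a hard exam, a few score high
hard_exam = [20, 25, 25, 30, 35, 40, 60, 85]
# Hypothetical scores: most students score high on an easy exam, a few score low
easy_exam = [15, 40, 60, 65, 70, 75, 75, 80]

for name, scores in [("hard", hard_exam), ("easy", easy_exam)]:
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    mode = statistics.mode(scores)
    print(name, mean, median, mode)
```

For the hard exam the mean exceeds the median, which exceeds the mode (positively skewed); for the easy exam the order is reversed (negatively skewed).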

 

Ψ Pearson’s Coefficients of Skewness

 

1.      $sk = \frac{\bar{x} - Mo}{SD}$

2.      $sk = \frac{3(\bar{x} - Me)}{SD}$

 

where

            $\bar{x}$ = mean

            Mo = mode

            Me = median

            SD = standard deviation

 

We prefer Formula 2 over Formula 1 because the mode does not always exist.

 

            Example: Recall the statistics obtained from the FDT of scores of fifteen statistics students.

 

            $\bar{x}$ = 37.17

            Me = 37.54

            s = 11.31

 

$sk = \frac{3(\bar{x} - Me)}{s}$

 

$sk = \frac{3(37.17 - 37.54)}{11.31}$

 

$sk = \frac{3(-0.37)}{11.31}$

 

$sk = \frac{-1.11}{11.31}$

 

$sk = -0.0981$

 

Thus, we say that the distribution of scores is slightly skewed to the left.
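The arithmetic in this example can be reproduced with a minimal Python sketch of Formula 2. The function name `pearson_skewness` is illustrative.

```python
def pearson_skewness(mean, median, sd):
    """Pearson's second coefficient of skewness: 3(mean - median) / SD."""
    return 3 * (mean - median) / sd

sk = pearson_skewness(37.17, 37.54, 11.31)
print(round(sk, 4))  # -0.0981
```

A negative value indicates skewness to the left, and a value this close to zero indicates that the departure from symmetry is small.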

 

MEASURES OF KURTOSIS

Ψ  Describes the relative flatness or peakedness of a distribution

·        Platykurtic – relatively flat

 

 

 


·        Leptokurtic – relatively peaked

 

 

 

 

 

 

 

 

 

 

·        Mesokurtic – between the platykurtic and leptokurtic curves; approximates the “normal curve”

 

 

 


 

 

 

 

 

 

 

Ψ  Pearson’s Coefficient of Kurtosis

 

$Ku = \frac{\sum (x_i - \bar{x})^4}{n \cdot SD^4}$

 

where

      $x_i$ = observations in a data set

      $\bar{x}$ = the sample mean

      n = number of cases

      SD = standard deviation

 

Observe that if:

Ku < 3, then the distribution is platykurtic

Ku = 3, then the distribution is mesokurtic

Ku > 3, then the distribution is leptokurtic
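A minimal Python sketch of the kurtosis coefficient follows. It assumes that SD is the sample standard deviation (dividing by n - 1), as used elsewhere in these notes; a version based on the population standard deviation would give a different value. The function name `pearson_kurtosis` is illustrative.

```python
def pearson_kurtosis(data):
    """Sum of fourth-power deviations divided by n times SD to the fourth."""
    n = len(data)
    mean = sum(data) / n
    # Sample variance (assumption: n - 1 in the denominator); SD^4 = variance^2
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    fourth_power_sum = sum((x - mean) ** 4 for x in data)
    return fourth_power_sum / (n * variance ** 2)

ku = pearson_kurtosis([8, 6, 3, 4, 4])
print(ku)  # 1.25
```

For this small data set Ku is less than 3, so by the rule above it would be classified as platykurtic.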



compilation of Dr. Libot

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 



Copyright © 2010 - 2017 Uberant.com All Rights Reserved.