Statistical Geography

Quantitative methods, spatial statistics, and data analysis for geographic research.

Author

Geography Team

Official Syllabus

NEP-2020 Syllabus

Core I Paper IV — Cartography and Geo-Spatial Techniques

*(Note: Quantitative topics are distributed across the practical sections of various NEP papers.)*

**Cartography & Data:**
  • Scientific basis of Cartography, needs of map making, characteristics of maps
  • Geographical Coordinates, Graticules, Types of Scales (Plain, Diagonal)
  • Types of Map Projection, Transformation of area, Distance and Direction
  • Drawing of Choropleth and isopleth maps
  • Traffic flow diagram and Drawing of Isochrones, Isotims, Isodapanes
  • Slope Analysis (Wentworth’s method and Smith)
  • Determination of agricultural efficiency (Kendall and Bhatia method)
  • Delineation of crop combination regions (Weaver and Doi’s method)
  • Nearest Neighbour Analysis, Rank Size Rule

UGC NET Syllabus

Unit IX/X — Statistical Methods and Cartography

**I. Statistical Methods**
  • Sources of Geographic Information and Data (spatial and non-spatial)
  • Applications of Measures of Central Tendency, Dispersion and Inequalities
  • Sampling, Sampling Procedure and Hypothesis Testing (chi square, t-test, ANOVA)
  • Time Series Analysis, Correlation and Regression Analysis
  • Measurement of Indices, Making Indicators Scale Free, Computation of Composite Index
  • Principal Component Analysis and Cluster Analysis
  • Morphometric Analysis: Stream ordering, Bifurcation ratio, Drainage density, Slope Analysis

**II. Cartography**
  • Types of Maps, Techniques of Map Making
  • Data Representation on Maps (Pie diagrams, Bar diagrams, Line Graph)
  • Thematic maps: Choropleth, Isarithmic, Dasymetric, Chorochromatic, Flow Maps

NET Statistics for Geography — Detailed Syllabus (Pulakesh Pradhan)

Syllabus Topics
  • Data sources and types of data
  • Statistical diagrams
  • Study of frequency distribution and cumulative frequency
  • Measures of central tendency
  • Selection of class interval for mapping
  • Measures of dispersion and concentration
  • Standard deviation
  • Lorenz curve
  • Methods of measuring association among different attributes
  • Simple and multiple correlation
  • Regression

Exam: 5 Questions = 10 Marks

Important Topics for NET

High-Yield Topics
  • Source of data
  • Types of data
  • Statistical diagrams — Histogram, Ogive, Circle
  • Mean, Median, Mode
  • Mean Deviation, Standard Deviation
  • Coefficient of Variation
  • Quartile Deviation, Range
  • Nearest Neighbour Analysis: Nearest Neighbour Index (NNI) & Rn value
  • Lorenz Curve
  • Correlation Coefficient
  • Regression Analysis
  • Normal Distribution
  • Factor Analysis
  • Chi-square, Z-Test, T-Test, F-Test
  • Types of scale (Nominal, Ordinal, Interval, Ratio)

Welcome to the Statistical Geography module of Geography OpenCourseWare.


Part A: Common Topics (NEP-2020 & UGC NET)

These topics are covered in both the NEP-2020 undergraduate syllabus and the UGC NET syllabus.

Geographic Data Sources and Types

📘 Syllabus Coverage

| Syllabus | Topic Details |
|---|---|
| NEP-2020 | Geo-spatial techniques data handling |
| UGC NET | Sources of Geographic Information and Data (spatial and non-spatial) |

Key Concepts
  • Spatial Data: Information tied to a specific location on Earth’s surface (coordinates, addresses, administrative boundaries).
  • Non-Spatial (Attribute) Data: Characteristics or qualities of the spatial features (population count, soil type name, temperature reading).
  • Data Sources:
    • Primary: Field survey, interviews, observations, GPS data collection.
    • Secondary: Census reports, meteorological data, satellite imagery, topographical maps, statistical abstracts.
  • Measurement Scales: Nominal (categories), Ordinal (rank), Interval (no true zero, e.g., temperature in °C), Ratio (true zero, e.g., population, distance).

Data Collection (NET Notes — Pulakesh Pradhan)

Key Definitions

| Term | Definition |
|---|---|
| Investigator | Person who conducts the statistical inquiry |
| Respondent | Person from whom information is collected |
| Statistical Unit | Item on which measurements are taken |

Steps for Data Collection

  1. Define objectives and scope of enquiry
  2. Determine statistical units to be used
  3. Identify source of information (data)
  4. Choose methods of data collection
  5. Determine degree of accuracy required
  6. Decide type of enquiry

Types of Statistical Units

**Unit of Collection:**

  • Unit of enumeration — whether by sample or census method
  • Unit of recording — Kilogram, Quintal, Metre, km, etc.

**Unit of Analysis and Interpretation:**

  • Rates, ratios, percentages and coefficients
  • Examples: CBR (Crude Birth Rate), IMR (Infant Mortality Rate)

Sources of Data (NET Notes)

Primary Data
  • Collected by the investigator originally
  • More accurate but time-consuming and expensive

Secondary Data

  • Obtained from data already collected or published earlier by other agencies

Methods of Collecting Primary Data

  1. Direct personal investigation
  2. Indirect oral interviews
  3. Information received through local agencies
  4. Mailed questionnaire method
  5. Schedules sent through enumerators

Sources of Secondary Data

**Official Publications (Central Government):**

  • Office of the Registrar General and Census Commissioner of India
  • Directorate General of Commercial Intelligence and Statistics
  • Labour Bureau — Ministry of Labour
  • Directorate of Economics and Statistics
  • Indian Army Statistical Organization
  • Central Statistical Organization (CSO)

**Semi-Government Publications:**

  • Statistical Department of RBI (Mumbai)
  • Economic Department of RBI
  • Institute of Economic Growth
  • Gokhale Institute of Politics and Economics
  • Institute of Foreign Trade

**Research Institutes:**

  • Indian Statistical Institute, Kolkata
  • Indian Agricultural Statistics Research Institute (IASRI)
  • NCERT

**International Publications:**

  • UNO Statistical Yearbook
  • UN Statistical Abstract
  • Demographic Yearbook
  • IMF, ILO, World Bank publications

Types of Enquiry (NET Notes)

| Type | Description |
|---|---|
| Official / Semi-official / Unofficial | Collected through sponsoring agencies (e.g., ICAR, IASRI) |
| Initial / Repetitive | Initial = first time; Repetitive = continuation of previous enquiry |
| Confidential / Non-confidential | Confidential = not made public; Non-confidential = publicly available |
| Direct / Indirect | Direct = quantitative phenomena (age, weight, income); Indirect = qualitative phenomena (honesty, intelligence) |
| Regular / Ad-hoc | Regular = at fixed intervals; Ad-hoc = as and when required |

Quality of a Good Questionnaire

  1. Size should be as small as possible
  2. Questions should be clear, brief, unambiguous and non-offending
  3. Questions arranged in natural logical sequence
  4. Avoid vague and multiple-meaning words
  5. Questions should be readily comprehensible
  6. Avoid sensitive and personal questions
  7. Avoid leading questions
  8. Include internal checks on accuracy
  9. Use pre-tested questionnaires
  10. Include covering letter from organizers

Types of Scales (NET Notes)

| Scale | Nature | Examples |
|---|---|---|
| Nominal | Classification only; no order | Gender, religion, land use type |
| Ordinal | Order/rank exists; intervals not equal | Rank of cities, soil quality classes |
| Interval | Equal intervals; no true zero | Temperature in °C, dates |
| Ratio | Equal intervals + true zero | Height, weight, distance, income |

Types of Data / Variables (NET Notes)

Qualitative vs. Quantitative
  • Qualitative (categorical) — not measured numerically (religion, land use)
  • Quantitative (numerical) — measured numerically (height, temperature)

Discrete vs. Continuous

  • Discrete — whole numbers only (number of farms, population count)
  • Continuous — can take any value within a range (rainfall, temperature)

Measures of Central Tendency and Dispersion

📘 Syllabus Coverage

| Syllabus | Topic Details |
|---|---|
| NEP-2020 | Basic statistical analysis required for practical geography |
| UGC NET | Applications of Measures of Central Tendency, Dispersion and Inequalities |

Key Concepts
  • Central Tendency: Identifying the ‘center’ of a dataset.
    • Mean: Arithmetic average. Sensitive to outliers. Spatial mean (mean center) for spatial data.
    • Median: The score in the middle of a distribution. Robust to outliers.
    • Mode: Most frequent value.
  • Z-score: Also known as the Standard Score, it is used for data transformation and comparison. The formula is \(Z = (X - \bar{X}) / \sigma\), where \(X\) is the value, \(\bar{X}\) is the mean, and \(\sigma\) is the standard deviation.
  • Dispersion: How spread out the data is around the center.
    • Range, Quartile Deviation.
    • Standard Deviation (SD): Square root of the mean of the squared deviations from the mean (a typical distance of values from the mean). Its spatial counterpart is the standard distance.
    • Coefficient of Variation (CV): (SD/Mean)*100. Useful for comparing variability of datasets with different units.
  • Inequalities Measurement: Lorenz Curve (graphical), Gini Coefficient (numerical, 0 to 1). Gini’s coefficient is a widely used technique to show Income Inequality.
  • Shape of Distribution: Skewness and Kurtosis are used for determining the Shape of a frequency distribution.
  • Normal Distribution: In a normal distribution, the percentage of the area between the mean \(\pm 1 \sigma\) (Standard Deviation) is 68.27%.
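A minimal Python sketch of the Z-score formula above. It assumes σ is the population standard deviation (dividing by N, as in the σ formula given later in this module); the function name `z_scores` and the rainfall figures are hypothetical, for illustration only.

```python
import statistics

def z_scores(values):
    """Standard scores Z = (X - mean) / sigma, with sigma as the population SD."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # divides by N, matching sigma = sqrt(sum((x - mean)^2) / N)
    return [(x - mean) / sigma for x in values]

# Hypothetical annual rainfall (cm) at five stations
rainfall = [80, 95, 110, 125, 140]
print([round(z, 2) for z in z_scores(rainfall)])  # [-1.41, -0.71, 0.0, 0.71, 1.41]
```

Because Z-scores have mean 0 and SD 1 whatever the original units, they are one common way of making indicators scale free before combining them into a composite index.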

Measures of Central Tendency — Detailed (NET Notes — Pulakesh Pradhan)

Types: Arithmetic Mean, Median, Mode, Geometric Mean, Harmonic Mean

1. Arithmetic Mean (x̄)

Sum of all observations divided by the number of observations

**Direct Method:** x̄ = Σfx / Σf, where x = value (or class mid-point) and f = frequency

**Step Deviation Method:** x̄ = A + h(Σfd / N), where A = assumed mean, h = class interval, d = (x − A) / h and N = Σf

Merits
  • Easy to calculate
  • Based on all observations
  • Suitable for further mathematical treatment
  • Most stable average (least affected by sampling)
Demerits
  • Cannot be used for open-end classes
  • Cannot be located graphically
  • Very much affected by extreme observations
  • Not suitable for qualitative data

Weighted Arithmetic Mean

Used when all items are not of equal importance:
Xw = (w1x1 + w2x2 + w3x3 + …) / (w1 + w2 + w3 + …), where w1, w2, w3, … are the weights of the items
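The direct, step-deviation and weighted formulas can be checked side by side with a short Python sketch. The frequency table (classes 0-10, 10-20, …, 40-50 with frequencies 5, 8, 12, 10, 5), the assumed mean A = 25 and the weighted example are hypothetical, as are the function names.

```python
def mean_direct(midpoints, freqs):
    """Direct method: x̄ = Σfx / Σf."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)

def mean_step_deviation(midpoints, freqs, assumed_mean, h):
    """Step-deviation method: x̄ = A + h(Σfd / N), with d = (x − A) / h."""
    n = sum(freqs)
    sum_fd = sum(f * (x - assumed_mean) / h for x, f in zip(midpoints, freqs))
    return assumed_mean + h * sum_fd / n

def weighted_mean(values, weights):
    """Xw = Σwx / Σw."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

mids = [5, 15, 25, 35, 45]   # class mid-points of the hypothetical table
freqs = [5, 8, 12, 10, 5]
print(mean_direct(mids, freqs))                  # 25.5
print(mean_step_deviation(mids, freqs, 25, 10))  # 25.5, same result with smaller numbers
print(weighted_mean([60, 70, 80], [2, 3, 5]))    # 73.0
```

The step-deviation method is only a hand-calculation shortcut: any assumed mean A gives the same x̄.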

2. Median

L.R. Connor: *“The median is that value of the variable which divides the group into two equal parts”*

  • A positional average, not based on all items (unlike arithmetic mean)

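**Ungrouped Data:** Median = the middle term of the ordered data when n is odd, or the arithmetic mean of the two middle terms when n is even

**Grouped Data:** Median = L + [(N/2 − cf) / f] × h, where L = lower limit of the median class, N = total frequency, cf = cumulative frequency of the class preceding the median class, f = frequency of the median class and h = class width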
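A minimal sketch of the grouped-median formula, assuming a continuous frequency table with equal class widths. The table (classes 0-10 to 40-50 with frequencies 5, 8, 12, 10, 5) and the name `grouped_median` are hypothetical; the last line uses Python's `statistics.median` for the ungrouped case.

```python
import statistics

def grouped_median(lower_limits, freqs, h):
    """Median = L + [(N/2 − cf) / f] × h for a continuous table with class width h."""
    n = sum(freqs)
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= n / 2:  # first class whose cumulative frequency reaches N/2
            return lower + (n / 2 - cf) / f * h
        cf += f
    raise ValueError("empty frequency table")

lowers = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 10, 5]
print(round(grouped_median(lowers, freqs, 10), 2))  # 25.83

# Ungrouped data with an even count: mean of the two middle terms (3 and 4)
print(statistics.median([3, 1, 4, 1, 5, 9]))  # 3.5
```

Here N/2 = 20 first falls in the 20-30 class (cumulative frequencies 5, 13, 25, …), so L = 20, cf = 13 and f = 12.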

3. Mode

Value which occurs most frequently; point of highest concentration

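**Formula:** Mode = L + h × (f1 − f0) / [(f1 − f0) − (f2 − f1)] = L + h × (f1 − f0) / (2f1 − f0 − f2), where L = lower limit of the modal class, f1 = frequency of the modal class, f0 and f2 = frequencies of the classes preceding and succeeding it, and h = class width

**Relation (empirical, for moderately skewed distributions):** Mode = Mean − 3(Mean − Median), i.e. Mode = 3 Median − 2 Mean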
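The sketch below applies the grouped-mode formula to a hypothetical table (classes 0-10 to 40-50, frequencies 5, 8, 12, 10, 5) and compares it with the empirical estimate 3 Median − 2 Mean. It assumes equal class widths and a single modal class; `grouped_mode` is an illustrative name.

```python
def grouped_mode(lower_limits, freqs, h):
    """Mode = L + h × (f1 − f0) / (2f1 − f0 − f2); assumes equal class widths h."""
    i = freqs.index(max(freqs))                     # modal class: highest frequency
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # preceding class (0 if none)
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0  # succeeding class (0 if none)
    return lower_limits[i] + h * (f1 - f0) / (2 * f1 - f0 - f2)

lowers = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 10, 5]
mode = grouped_mode(lowers, freqs, 10)
mean = 25.5                        # Σfx / Σf for this table
median = 20 + (20 - 13) / 12 * 10  # L + [(N/2 − cf) / f] × h, about 25.83
print(round(mode, 2))                   # 26.67
print(round(3 * median - 2 * mean, 2))  # 26.5, the empirical estimate
```

The two values are close but not identical, which is why the relation is described as empirical rather than exact.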

4. Geometric Mean (GM)

nth root of the product of n observations

**Formula:** GM = (x1 × x2 × x3 × … × xn)^(1/n) = Antilog [(1/n) × Σ log x]

Merits
  • Rigidly defined
  • Based on all observations
  • Suitable for further mathematical treatment
  • Gives more weight to smaller observations than the AM does
Demerits
  • Not easy to calculate
  • If any observation = 0 → GM = 0
  • If any observation is negative → GM may become imaginary (and the log method cannot be used)
Uses
  • Useful for averaging ratios, percentages and rates of increase
  • Used in construction of Index Numbers
  • Useful for river data analysis
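A minimal sketch of the log (antilog) form of the GM, applied to a hypothetical set of yearly population growth factors, which is the kind of rate-of-increase averaging listed under Uses. The name `geometric_mean` is illustrative; `math.prod` is only used as a cross-check of the product form.

```python
import math

def geometric_mean(values):
    """GM = Antilog[(1/n) × Σ log x]; only defined here for positive observations."""
    if any(v <= 0 for v in values):
        raise ValueError("GM needs every observation > 0")
    return 10 ** (sum(math.log10(v) for v in values) / len(values))

# Hypothetical growth factors of a town's population over three years: +10%, +20%, -10%
factors = [1.10, 1.20, 0.90]
gm = geometric_mean(factors)
print(math.isclose(gm, math.prod(factors) ** (1 / 3)))  # True
print(f"average growth ≈ {100 * (gm - 1):.1f}% per year")  # average growth ≈ 5.9% per year
```

The AM of the three growth rates would be 6.67%, which overstates the compound growth actually achieved.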

5. Harmonic Mean (HM)

Reciprocal of the arithmetic mean of reciprocals of the given observations

**Formula:** HM = 1 / [(1/n) × Σ(1/x)] = n / Σ(1/x); for two terms a and b, HM = 2ab / (a + b)

Merits
  • Rigidly defined
  • Based on all observations
  • Gives greater weight to smaller observations
  • Not much affected by sampling fluctuations
Demerits
  • Not easy to understand or calculate
  • Cannot be obtained if any observation = 0
  • May not be representative unless smaller items need higher weightage
Uses
  • Specifically useful for averaging rates and ratios where the time factor is variable and the **distance is constant**

Relation Among AM, GM, HM

AM ≥ GM ≥ HM for any set of positive observations; the three are equal only when all observations are equal.

| Measure | Two terms (a, b) | Three terms (a, b, c) |
|---|---|---|
| AM | (a+b)/2 | (a+b+c)/3 |
| GM | √(ab) | ∛(abc) |
| HM | 2ab/(a+b) | 3abc/(ab+bc+ca) |
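A short Python check of why the HM suits rates over a constant distance, and of AM ≥ GM ≥ HM. The trip (60 km out at 30 km/h, 60 km back at 60 km/h) is a hypothetical example, and `harmonic_mean` is an illustrative name.

```python
import math

def harmonic_mean(values):
    """HM = n / Σ(1/x); undefined if any observation is 0."""
    return len(values) / sum(1 / v for v in values)

# Hypothetical trip: 60 km out at 30 km/h, the same 60 km back at 60 km/h
speeds = [30, 60]
am = sum(speeds) / len(speeds)
gm = math.prod(speeds) ** (1 / len(speeds))
hm = harmonic_mean(speeds)
true_avg = (60 + 60) / (60 / 30 + 60 / 60)  # total distance / total time

print(am, round(gm, 2), round(hm, 2), true_avg)  # 45.0 42.43 40.0 40.0
assert am >= gm >= hm  # AM ≥ GM ≥ HM for positive observations
```

Only the HM reproduces the true average speed, because equal distances are covered at each speed and the time spent varies.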

Measures of Dispersion — Detailed (NET Notes)

Dispersion measures the variation (scatter) of items in a distribution about a central value.

Characteristics of a Good Measure of Dispersion

  • Rigidly defined
  • Easy to calculate and understand
  • Based on all observations
  • Amenable to further mathematical treatment
  • Not much affected by sampling fluctuations
  • Not much affected by extreme observations

Types

  1. Range
  2. Quartile Deviation (Semi-Inter-Quartile Range)
  3. Mean Deviation
  4. Standard Deviation
  5. Lorenz Curve

1. Range

R = X(max) − X(min)

Coefficient of Range = [X(max) − X(min)] / [X(max) + X(min)]

Merits
  • Based on entire data
Demerits
  • Varies widely from sample to sample
  • Cannot be computed for open-end classes
  • Very sensitive to sample size
Uses
  • Stock market functions
  • Industrial statistical quality control
  • Day-to-day life measurements
  • Meteorological department

2. Quartile Deviation (QD)

Inter-Quartile Range = Q3 − Q1

QD = (Q3 − Q1) / 2

Coefficient of QD = (Q3 − Q1) / (Q3 + Q1)

Merits
  • Easy to understand and calculate
  • Uses the middle 50% of data
  • Not affected by extreme observations
  • Only measure of dispersion that can be used with **open-end classes**
Demerits
  • Ignores the lowest 25% and the highest 25% of data
  • Affected by sampling fluctuations
  • Not suitable for further mathematical treatment
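Range, quartile deviation and their coefficients can be computed as sketched below for a hypothetical set of nine values. The quartiles come from `statistics.quantiles` with its default "exclusive" method, which places Q1 and Q3 at positions (n + 1)/4 and 3(n + 1)/4 of the ordered data; other textbooks and packages use slightly different quartile conventions, so small differences are expected.

```python
import statistics

data = sorted([12, 15, 18, 20, 22, 25, 28, 30, 35])  # hypothetical values

# Range and coefficient of range
r = max(data) - min(data)
coeff_range = r / (max(data) + min(data))

# Quartiles (default 'exclusive' method), QD and coefficient of QD
q1, _, q3 = statistics.quantiles(data, n=4)
qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)

print(r, round(coeff_range, 3))        # 23 0.489
print(q1, q3, qd, round(coeff_qd, 3))  # 16.5 29.0 6.25 0.275
```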

3. Mean Deviation (MD)

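MD = (1/N) × Σ|x − A|, where A is the average (mean or median) about which deviations are taken; MD is least when taken about the median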

Coefficient of MD = MD / Average (about which it is calculated)

Merits
  • Based on all observations
  • More accurate than Range or QD
Demerits
  • Ignores signs of deviations (mathematically unsound)
  • Not satisfactory for skewed distributions
  • Rarely used in social studies
  • Cannot be computed for open-end classes
Uses
  • Used in economics and statistics for simplicity
  • Useful for computing distribution of personal wealth
  • National Bureau of Economic Research (USA) uses it
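A minimal sketch of mean deviation and its coefficient, taken once about the mean and once about the median, for a hypothetical dataset with one extreme value. The name `mean_deviation` is illustrative.

```python
import statistics

def mean_deviation(values, about):
    """MD = (1/N) × Σ|x − A|, where A is the chosen average (mean or median)."""
    return sum(abs(x - about) for x in values) / len(values)

data = [2, 4, 6, 8, 30]           # hypothetical values with one extreme observation
mean = statistics.fmean(data)     # 10.0
median = statistics.median(data)  # 6
md_mean = mean_deviation(data, mean)
md_median = mean_deviation(data, median)
print(md_mean, round(md_mean / mean, 2))        # 8.0 0.8
print(md_median, round(md_median / median, 2))  # 6.4 1.07
```

The MD about the median (6.4) is smaller than the MD about the mean (8.0), as expected.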

4. Standard Deviation (σ)

First suggested by **Karl Pearson (1893)**

Positive square root of the arithmetic mean of squared deviations from the arithmetic mean

σ = √[1/N × Σ(x − x̄)²]

C.V. = 100 × σ / x̄

Merits
  • Most important and widely used measure of dispersion
  • Based on all observations
  • Removes drawback of ignoring signs (uses squared deviations)
  • Suitable for further mathematical treatment
  • Least affected by sampling fluctuations
Demerits
  • Not easy to understand for non-mathematical persons
  • Gives greater importance to extreme values
Interpretation Methods
  1. Empirical rule — for bell-shaped (normal) distribution
  2. Chebyshev’s Theorem
  3. Z-score / standard scores
Coefficient of Variation (CV)

CV = (σ / x̄) × 100

  • Relative measure of dispersion
  • “Coefficient of variation is the percentage variation in mean, standard deviation being considered as the total variation in the mean” — Karl Pearson
  • Used to compare variability between two different groups
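The sketch below compares two hypothetical rainfall series to show why the CV, not σ alone, is used to compare variability between groups. It uses `statistics.pstdev` (divides by N, matching σ above); `statistics.stdev` would divide by n − 1 instead.

```python
import statistics

def coefficient_of_variation(values):
    """CV = (σ / x̄) × 100, with σ as the population SD (divide by N)."""
    return statistics.pstdev(values) / statistics.fmean(values) * 100

station_a = [100, 120, 80, 110, 90]  # hypothetical annual rainfall (cm)
station_b = [40, 55, 25, 50, 30]

for name, series in [("A", station_a), ("B", station_b)]:
    sd = statistics.pstdev(series)
    print(name, round(sd, 2), round(coefficient_of_variation(series), 1))
# A 14.14 14.1
# B 11.4 28.5
```

Station B has the smaller σ but twice the CV, so its rainfall is relatively more variable.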

5. Lorenz Curve

  • Graphic method of studying **dispersion in a distribution**
  • Both size of items (values of variable) and frequencies are cumulated
  • Provides relative idea of dispersion compared with the line of equal distribution
  • Cannot immediately show what % of persons corresponds to a given % of items
Gini Coefficient
  • Based on the coefficient of mean difference
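  • **G = Δ₁ / (2x̄)**, where Δ₁ = mean difference (average absolute difference between pairs of observations) and x̄ = mean
  • Varies from **0 to 1**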
  • G = 0 → perfect equality
  • G increases with increasing inequality
  • Gini = ratio of area of concentration to total area of lower triangle below line of equal distribution
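The sketch below computes the Gini coefficient two ways for a small, hypothetical set of landholdings: from the area under the Lorenz curve (trapezoid rule) and from G = Δ₁ / (2x̄). It assumes the mean difference Δ₁ is averaged over all n² ordered pairs, which makes the two results agree; some texts divide by n(n − 1) instead, which gives a slightly larger value. Function names are illustrative.

```python
def lorenz_points(values):
    """Cumulative (share of units, share of total) pairs, smallest values first, from (0, 0)."""
    xs = sorted(values)
    total, n = sum(xs), len(xs)
    points, running = [(0.0, 0.0)], 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points

def gini_from_lorenz(values):
    """G = area of concentration / 0.5 = 1 − 2 × (area under the Lorenz curve)."""
    pts = lorenz_points(values)
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1 - 2 * area

def gini_mean_difference(values):
    """G = Δ₁ / (2x̄), with Δ₁ averaged over all n² ordered pairs (including i = j)."""
    n = len(values)
    delta = sum(abs(a - b) for a in values for b in values) / n ** 2
    return delta / (2 * sum(values) / n)

holdings = [1, 2, 3, 4, 10]  # hypothetical landholdings (ha) of five households
print(round(gini_from_lorenz(holdings), 3), round(gini_mean_difference(holdings), 3))  # 0.4 0.4
```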

Normal Distribution — Detailed (NET Notes)

Also known as **Bell Curve**

In normal distribution: **Mean = Median = Mode**

Properties
  • Symmetric about the mean
  • Bell-shaped
  • Total area under curve = 1
  • 68% data within ±1σ
  • 95% data within ±2σ
  • 99.7% data within ±3σ (Empirical Rule)
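The empirical-rule percentages follow from the standard normal distribution: the share within mean ± kσ equals erf(k / √2). A quick check with Python's `math.erf` (the helper name `share_within` is illustrative):

```python
import math

def share_within(k):
    """Proportion of a normal distribution lying within mean ± kσ: erf(k / √2)."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
print(round(100 * share_within(1.96), 2))  # 95.0, hence 1.96 as the two-tailed 5% critical value
```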

Correlation and Regression Analysis

📘 Syllabus Coverage

| Syllabus | Topic Details |
|---|---|
| NEP-2020 | Applied in geographical data analysis |
| UGC NET | Correlation and Regression Analysis |

Key Concepts
  • Correlation: Measures the strength and direction of the linear relationship between two variables (x and y).
    • Pearson’s Product Moment (r): For interval/ratio data. Ranges from -1 to +1.
    • Spearman’s Rank Correlation (Rho): For ordinal (ranked) data.
  • Regression: Predictive modeling. Investigates the dependence of a dependent variable (Y) on one or more independent variables (X).
    • Linear Regression: \(Y = a + bX\). ‘a’ is intercept, ‘b’ is slope (regression coefficient).
    • Residuals: The difference between observed and predicted values. Mapping residuals (Spatial Regression) reveals spatial anomalies.

Correlation Analysis — Detailed (NET Notes — Pulakesh Pradhan)

Karl Pearson’s Correlation Coefficient (r)

Method for measuring intensity or magnitude of linear relationship between two variables

Also called **Product Moment Correlation Coefficient**

**Formula:** r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² × Σ(y − ȳ)²]

| Value of r | Interpretation |
|---|---|
| r = +1 | Perfect positive correlation |
| r = −1 | Perfect negative correlation |
| r = 0 | No linear correlation |
| 0 < r < 1 | Positive correlation |
| −1 < r < 0 | Negative correlation |
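A direct translation of the product-moment formula into Python, applied to hypothetical altitude and temperature readings. `pearson_r` is an illustrative name; on Python 3.10+ `statistics.correlation` returns the same Pearson r and can serve as a cross-check.

```python
import math

def pearson_r(x, y):
    """r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² × Σ(y − ȳ)²]."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

altitude = [100, 200, 300, 400, 500]  # hypothetical station heights (m)
temperature = [30, 29, 27, 26, 23]    # hypothetical mean temperatures (°C)
print(round(pearson_r(altitude, temperature), 2))  # -0.98, a strong negative correlation
```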

Spearman’s Rank Correlation

Used when data is in ranks (ordinal scale)

rs = 1 − [6ΣD² / n(n² − 1)], where D = difference between each pair of ranks and n = number of pairs
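A minimal sketch of Spearman's formula, assuming no tied values (ties need averaged ranks and a correction not shown here). The district data and function names are hypothetical.

```python
def ranks(values):
    """Rank 1 = smallest value; assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rs(x, y):
    """rs = 1 − 6ΣD² / [n(n² − 1)], with D = difference between paired ranks."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

urbanisation = [10, 20, 30, 40, 50, 60]  # hypothetical % urban, six districts
literacy = [55, 41, 48, 80, 72, 66]      # hypothetical literacy %, same districts
print(round(spearman_rs(urbanisation, literacy), 2))  # 0.6
```

Here ΣD² = 14 and n = 6, so rs = 1 − 84/210 = 0.6.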

Regression Analysis — Detailed (NET Notes)

*“Regression is stepping back or returning to the average value”*

  • Mathematical measure of average relationship between two or more variables in terms of original units of data

Simple Linear Regression

Y = a + bX, where Y = dependent variable, X = independent variable, b = regression coefficient (slope) and a = intercept

Types

  • Simple regression — one dependent, one independent variable
  • Multiple regression — one dependent, multiple independent variables
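The notes do not say how a and b are estimated; the sketch below assumes ordinary least squares, the usual way of fitting Y = a + bX (b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², a = ȳ − b x̄). The data and the name `fit_line` are hypothetical. It also computes residuals, the values that would be mapped in a residual analysis.

```python
def fit_line(x, y):
    """Ordinary least-squares estimates (a, b) for Y = a + bX."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

altitude = [100, 200, 300, 400, 500]  # X: hypothetical station heights (m)
temperature = [30, 29, 27, 26, 23]    # Y: hypothetical mean temperatures (°C)
a, b = fit_line(altitude, temperature)
print(round(a, 2), round(b, 4))  # 32.1 -0.017, i.e. about 1.7 °C cooler per 100 m
residuals = [yi - (a + b * xi) for xi, yi in zip(altitude, temperature)]
print([round(r, 2) for r in residuals])  # observed minus predicted; these sum to zero
```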

Map Projections and Scales

📘 Syllabus Coverage

| Syllabus | Topic Details |
|---|---|
| NEP-2020 | Types of Scales, Map Projection, Transformation |
| UGC NET | Techniques of Map Making |

Key Concepts
  • Map Scale: Ratio between map distance and ground distance. Types: Representative Fraction (RF), Statement, Graphical/Linear, Diagonal.
  • Map Projection: Mathematical transformation of the 3D Earth surface to a 2D plane. Always involves distortion.
  • Preserved Properties:
    • Conformal (Orthomorphic): Preserves exact local angles/shapes (e.g., Mercator). Used for navigation.
    • Equal-Area (Equivalent): Preserves area proportions (e.g., Mollweide, Peters). Used for thematic density distributions.
    • Equidistant: Preserves correct distance from a center point.
    • Azimuthal: Preserves correct direction from a center point.
  • Developable Surfaces: Cylinder, Cone, Plane (Zenithal).
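Converting between a Representative Fraction and a statement of scale is simple unit arithmetic (1 km = 100,000 cm). A minimal sketch with hypothetical helper names:

```python
CM_PER_KM = 100_000  # 1 km = 1,000 m = 100,000 cm

def rf_to_statement(denominator):
    """Express RF 1:denominator as a statement of scale in cm-to-km form."""
    return f"1 cm to {denominator / CM_PER_KM:g} km"

def ground_distance_km(map_cm, denominator):
    """Ground distance (km) represented by a map distance (cm) at RF 1:denominator."""
    return map_cm * denominator / CM_PER_KM

print(rf_to_statement(50_000))          # 1 cm to 0.5 km
print(ground_distance_km(7.2, 50_000))  # 3.6
```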

Thematic Mapping Techniques

📘 Syllabus Coverage

| Syllabus | Topic Details |
|---|---|
| NEP-2020 | Drawing of Choropleth, Isopleth, Traffic flow diagrams |
| UGC NET | Data Representation: Choropleth, Isarithmic, Dasymetric, Chorochromatic, Flow Maps |

Key Concepts
  • Choropleth Map: Uses shading/colors within predefined administrative boundaries (states, districts) to represent derived data (densities, rates, percentages). Not suitable for absolute numbers.
  • Isopleth (Isarithmic) Map: Lines connecting points of equal value (e.g., isotherms, contours, isobars). Assumes continuous data mapped over an isotropic surface. It is the best method to represent the rate of change (when data are given as ratios or percentages).
  • Chorochromatic Map: Color-patch map showing qualitative distribution without numerical value (e.g., soil types, land use zones).
  • Dasymetric Map: Advanced choropleth that ignores administrative boundaries in favor of actual geographical boundaries of the phenomenon (e.g., mapping population density excluding lakes and forests).
  • Flow Map: Lines of varying thickness show direction and volume of movement (trade, traffic, migration).
  • Diagrammatic Maps: Pie diagrams, bar graphs superimposed on maps to show multiple variables.

Spatial Statistics and Morphometry

📘 Syllabus Coverage

| Syllabus | Topic Details |
|---|---|
| NEP-2020 | Nearest Neighbour, Slope Analysis (Wentworth), Agricultural indices |
| UGC NET | Morphometric Analysis, Measurement of Indices |

Key Concepts
  • Nearest Neighbour Analysis (NNA): Measures spatial arrangement of points. Nearest Neighbour Index (R). R=0 (clustered), R=1 (random), R=2.15 (perfectly uniform/regular).
  • Slope Analysis: Wentworth’s method (based on contour crossings per grid cell) and Smith’s method (relative relief combined with drainage density).
  • Drainage Morphometry:
    • Stream Ordering: Strahler (only same orders combine to increase) vs. Horton.
    • Bifurcation Ratio: Ratio of number of streams of a given order to the number of next higher order.
    • Drainage Density: Total stream length / basin area.
  • Indices: Agricultural efficiency (Kendall’s ranking, Bhatia’s weighted output). Handling data of different scales requires normalization (Z-scores) to create composite indices.
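Bifurcation ratio and drainage density, as defined above, are sketched below in Python. The sketch assumes the streams have already been ordered (e.g. by Strahler's scheme); the stream counts, total length and basin area are hypothetical, as are the function names.

```python
def bifurcation_ratios(counts):
    """Rb = Nu / N(u+1) for each pair of successive stream orders u and u + 1."""
    return {u: counts[u] / counts[u + 1] for u in sorted(counts) if u + 1 in counts}

def drainage_density(total_length_km, area_km2):
    """Dd = total stream length / basin area (km per km²)."""
    return total_length_km / area_km2

counts = {1: 40, 2: 10, 3: 2, 4: 1}  # hypothetical number of streams of each order
rb = bifurcation_ratios(counts)
print(rb)                                    # {1: 4.0, 2: 5.0, 3: 2.0}
print(round(sum(rb.values()) / len(rb), 2))  # 3.67, simple mean Rb
print(drainage_density(120, 60))             # 2.0 km per km²
```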

Nearest Neighbour Analysis — Detailed (NET Notes — Pulakesh Pradhan)

Measures the pattern of point distribution — whether clustered, random or regular

**Formula:** Rn = 2D̄ × √(n/A), where D̄ = mean nearest neighbour distance, n = number of points and A = total area

| Rn Value | Pattern |
|---|---|
| Rn = 0 | Maximum clustering (all points coincide) |
| Rn = 1 | Random distribution |
| Rn = 2.149 | Maximum dispersion (perfectly regular hexagonal pattern) |
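A minimal sketch of the Rn formula for a small set of hypothetical settlement coordinates (km) in a 10 km × 10 km study area. It uses straight-line distances via `math.dist` and applies no edge correction, which is an assumption: points near the study-area boundary can bias Rn.

```python
import math

def nearest_neighbour_index(points, area):
    """Rn = 2D̄ × √(n / A); D̄ is the mean distance from each point to its nearest neighbour."""
    n = len(points)
    nearest = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
               for i, p in enumerate(points)]
    d_bar = sum(nearest) / n
    return 2 * d_bar * math.sqrt(n / area)

# Hypothetical villages forming two tight groups
villages = [(1, 1), (1.5, 1), (1, 1.5), (8, 8), (8.5, 8), (8, 8.5)]
print(round(nearest_neighbour_index(villages, 100), 3))  # 0.245, close to 0 = clustered
```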

Part B: NEP-2020 Specific Topics

These topics are part of the NEP-2020 undergraduate programme only.

Agricultural Delineation Methods

📘 Syllabus Coverage

| Syllabus | Topic Details |
|---|---|
| NEP-2020 | Delineation of crop combination regions (Weaver and Doi’s method) |

Key Concepts
  • Crop Combination Analysis: Identifying the dominant crop assemblages in a region to understand agricultural regionalization.
  • Weaver’s Method (1954): Calculates variance (\(\Sigma d^2 / n\)) between actual crop percentages and theoretical distributions (monoculture = 100%, 2-crop = 50% each, etc.). The combination with the lowest variance is chosen.
  • Doi’s Modification (1959): Simplified Weaver’s method by using \(\Sigma d^2\) without dividing by ‘n’ and providing a ready-to-use critical value table, making manual calculation much faster while yielding similar results.
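Weaver's procedure can be sketched as follows: rank crops by their share of cropped area, compare the top k shares with the theoretical 100/k each, and keep the k with the lowest variance Σd²/n. The district shares and the name `weaver_combination` are hypothetical, and Doi's critical-value table is not reproduced here.

```python
def weaver_combination(shares):
    """Weaver (1954): lowest Σd²/n between actual top-k shares and 100/k each wins."""
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    variances = {}
    for k in range(1, len(ranked) + 1):
        expected = 100 / k  # monoculture = 100%, 2 crops = 50% each, ...
        d2 = sum((pct - expected) ** 2 for _, pct in ranked[:k])
        variances[k] = d2 / k  # Weaver's Σd²/n
    best_k = min(variances, key=variances.get)
    return [crop for crop, _ in ranked[:best_k]], variances

# Hypothetical % of gross cropped area in one district
district = {"rice": 45, "wheat": 30, "pulses": 15, "oilseeds": 10}
combo, variances = weaver_combination(district)
print(combo)  # ['rice', 'wheat', 'pulses']
print({k: round(v, 2) for k, v in variances.items()})  # {1: 3025.0, 2: 212.5, 3: 161.11, 4: 187.5}
```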

Part C: UGC NET Specific Topics

These topics are part of the UGC NET syllabus only.

Sampling and Hypothesis Testing

📘 Syllabus Coverage

| Syllabus | Topic Details |
|---|---|
| UGC NET | Sampling, Sampling Procedure and Hypothesis Testing (chi square, t-test, ANOVA) |

Key Concepts
  • Sampling: Selecting a subset from a population. Probability (Random, Stratified, Systematic, Cluster) vs. Non-probability (Purposive, Snowball, Quota).
  • Hypothesis Testing: Formulating Null (\(H_0\)) and Alternative (\(H_1\)) hypotheses. Type I error (rejecting a true \(H_0\)) and Type II error (failing to reject, i.e. accepting, a false \(H_0\)). Set the significance level (\(\alpha\), usually 0.05).
  • Standard Error (SE): Considered as the measure of the sampling error. It indicates how much the sample mean is likely to differ from the actual population mean.
  • Parametric Tests (assume normal distribution):
    • t-test: Compares means of two groups.
    • ANOVA (Analysis of Variance): Compares means of three or more groups.
  • Non-Parametric Tests (distribution-free):
    • Chi-Square (\(\chi^2\)) Test: Tests association between categorical variables (observed vs. expected frequencies). A value of 0 means the observed and expected frequencies agree exactly, so the null hypothesis is not rejected.

Statistical Tests — Detailed (NET Notes — Pulakesh Pradhan)

Parametric Tests — Assumptions

  • Based on a known distribution (usually normal)
  • Population parameters are known or estimated
  • Data measured on interval or ratio scale

1. T-Test (Student’s t-Test)

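  • **Developed by William Gosset (1908)**, who published under the pen name “Student”
  • Also called Student’s t-test; **Welch’s t-test** is a variant for comparing two means when the population variances are unequal
  • Small sample test — when sample size < 30
  • Degree of freedom: v = n − 1 (one-sample test)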

**Formula:** t = Deviation from population parameter / Standard error of sample statistic

Uses
  • Test of significance of regression coefficient
  • In multiple regression with individual variables
  • When population variance is **unknown**
  • When the population is normally distributed
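A minimal one-sample t-test sketch, where t = (x̄ − μ) / SE and the standard error SE = s/√n uses the sample SD s (n − 1 divisor). The plot yields, the hypothesised mean μ = 30 and the function name are hypothetical; the critical value 2.262 (two-tailed, 5%, 9 df) is hard-coded from a standard t-table rather than computed.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """Return (t, df) with t = (x̄ − μ) / SE and SE = s / √n."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (statistics.fmean(sample) - mu) / se, n - 1

yields = [32, 34, 29, 31, 35, 33, 30, 36, 28, 32]  # hypothetical plot yields (q/ha)
t, df = one_sample_t(yields, mu=30)
T_CRIT = 2.262  # two-tailed 5% value for 9 df, read from a standard t-table
print(round(t, 3), df)  # 2.449 9
print("reject H0" if abs(t) > T_CRIT else "fail to reject H0")  # reject H0
```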

2. Z-Test

  • **Given by Fisher**
  • Large sample test — sample size n > 30
  • Used when population variance is **known**
  • Based on standard normal distribution
  • Critical value for 5% significance (two-tailed): **1.96**
Uses
  • To determine whether two population means are different
  • To test a sample correlation coefficient when the population correlation coefficient is not zero (via Fisher’s z-transformation)
  • Types: (1) One sample (2) Two sample (3) Location test (4) Maximum likelihood test
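A minimal one-sample Z-test sketch with a known population σ, compared against the 1.96 two-tailed critical value given above. The household figures and the function name are hypothetical.

```python
import math

def one_sample_z(sample_mean, mu, sigma, n):
    """z = (x̄ − μ) / (σ / √n), with σ the known population SD."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))

# Hypothetical: 64 households, sample mean family size 5.3, known σ = 1.6, H0: μ = 5.0
z = one_sample_z(sample_mean=5.3, mu=5.0, sigma=1.6, n=64)
print(round(z, 2))  # 1.5
print("reject H0" if abs(z) > 1.96 else "fail to reject H0")  # fail to reject H0
```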

3. F-Test (Variance Ratio Test)

  • **Given by Fisher**
  • Used to compare two independent estimates of population variance
  • Small sample test
  • Null hypothesis: H₀: σ₁² = σ₂² (the two population variances are equal)

**Formula:** F = Larger estimate of variance / Smaller estimate of variance

Degrees of Freedom
  • v1 = n1 − 1 and v2 = n2 − 1
Key Points
  • F-test value is **never negative**
  • Used for testing overall significance in regression models
  • Value lies from **zero to infinity**
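A minimal variance-ratio sketch: F = larger / smaller of the two unbiased variance estimates (`statistics.variance`, n − 1 divisor), with degrees of freedom listed in the same order. The two rainfall samples and the function name are hypothetical; 6.39 is the upper 5% point of F for (4, 4) df, hard-coded from a standard F-table.

```python
import statistics

def f_ratio(sample1, sample2):
    """Return (F, (v1, v2)) with the larger variance estimate in the numerator."""
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    if v1 >= v2:
        return v1 / v2, (len(sample1) - 1, len(sample2) - 1)
    return v2 / v1, (len(sample2) - 1, len(sample1) - 1)

a = [100, 120, 80, 110, 90]  # hypothetical annual rainfall (cm), station A
b = [40, 55, 25, 50, 30]     # hypothetical annual rainfall (cm), station B
f, dfs = f_ratio(a, b)
F_CRIT = 6.39  # upper 5% point of F for (4, 4) df, read from a standard F-table
print(round(f, 3), dfs)  # 1.538 (4, 4)
print("variances differ" if f > F_CRIT else "no significant difference")  # no significant difference
```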

4. Chi-Square Test — Detailed

  • **Introduced by Karl Pearson**
  • **Non-parametric test**
  • Chi-square values lie between **0 and infinity**
Conditions for Use
  • Total frequency and sample size must be large
  • Samples must be independent
  • Expected frequency should not be small — if < 5, use the **pooling technique** (combine adjacent classes)
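A goodness-of-fit sketch using the standard statistic χ² = Σ(O − E)²/E, with df = number of classes − 1. The land-use counts are hypothetical, the function refuses expected frequencies below 5 (the point at which pooling is advised above), and 7.815 is the 5% value for 3 df, hard-coded from a standard χ² table.

```python
def chi_square(observed, expected):
    """χ² = Σ (O − E)² / E; refuses expected frequencies below 5 (pool classes first)."""
    if any(e < 5 for e in expected):
        raise ValueError("expected frequency below 5: pool adjacent classes first")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [40, 25, 30, 25]  # hypothetical sample points per land-use class
expected = [30, 30, 30, 30]  # H0: the four classes are equally common
chi2 = chi_square(observed, expected)
df = len(observed) - 1
CHI_CRIT = 7.815  # 5% value for 3 df, read from a standard chi-square table
print(round(chi2, 3), df)  # 5.0 3
print("reject H0" if chi2 > CHI_CRIT else "fail to reject H0")  # fail to reject H0
```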

5. ANOVA — Detailed Assumptions

  1. Independence of samples
  2. Normal population
  3. Same population variance
  4. Dependent variable is quantitative (interval or ratio scale); the groups are defined by a qualitative (categorical) factor
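A minimal one-way ANOVA sketch: F = MS_between / MS_within, with k − 1 and N − k degrees of freedom. The yields under three irrigation sources are hypothetical, and 5.14 is the upper 5% point of F for (2, 6) df, hard-coded from a standard F-table.

```python
def one_way_anova(*groups):
    """Return (F, (df_between, df_within)) for k independent groups."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    k, n = len(groups), len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, (df_between, df_within)

# Hypothetical yields (q/ha) from plots under three irrigation sources
canal, well, tank = [20, 22, 24], [28, 30, 32], [24, 26, 28]
f, dfs = one_way_anova(canal, well, tank)
F_CRIT = 5.14  # upper 5% point of F for (2, 6) df, read from a standard F-table
print(f, dfs)  # 12.0 (2, 6)
print("means differ" if f > F_CRIT else "no significant difference")  # means differ
```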

Summary Table of Tests

| Test | Type | Sample Size | Key Use |
|---|---|---|---|
| T-test | Parametric | n < 30 | Means; unknown variance |
| Z-test | Parametric | n > 30 | Means; known variance |
| F-test | Parametric | Small | Variance comparison |
| Chi-square | Non-parametric | Large | Goodness of fit |
| ANOVA | Parametric | | Multiple group comparison |

Advanced Multivariate Analysis

📘 Syllabus Coverage

| Syllabus | Topic Details |
|---|---|
| UGC NET | Principal Component Analysis and Cluster Analysis |

Key Concepts
  • Principal Component Analysis (PCA): A statistical procedure that uses an orthogonal transformation to convert a set of correlated variables to a set of uncorrelated variables. PC1 accounts for the most variance. Used to create composite indices (e.g., developmental index from 15 variables).
  • ANOVA (Analysis of Variance): Preferred over the ‘t’ test when more than two population means are compared, because the probability of committing a Type I error accumulates as the number of pairwise ‘t’ tests increases.
  • Cluster Analysis: Grouping a set of objects such that objects in the same group (cluster) are more similar to each other than to those in other groups. Used in regionalization (e.g., grouping districts based on socio-economic similarity).
  • Time Series Analysis: Analyzing sequence of data points over time to extract meaningful statistics. Components: Trend (long term), Cyclical, Seasonal, Irregular/Random variations. Used for climate data or population forecasting.
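A minimal PCA sketch using only the standard library: standardise each variable, build the correlation matrix, then find PC1 by power iteration. Power iteration is a simple eigenvector method chosen here for brevity (an assumption; real analyses would normally use a linear-algebra library), and it assumes PC1 has a clearly dominant eigenvalue. The indicator scores and function names are hypothetical.

```python
import math
import statistics

def correlation_matrix(columns):
    """Pearson correlation matrix; each inner list holds one variable's values."""
    z = []
    for col in columns:
        m, s = statistics.fmean(col), statistics.pstdev(col)
        z.append([(v - m) / s for v in col])  # standardise so units no longer matter
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def first_component(r, iterations=200):
    """PC1 by power iteration: returns (largest eigenvalue, unit-length loadings)."""
    v = [1.0] * len(r)
    for _ in range(iterations):
        w = [sum(rij * vj for rij, vj in zip(row, v)) for row in r]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(vi * sum(rij * vj for rij, vj in zip(row, v)) for vi, row in zip(v, r))
    return eigenvalue, v

# Hypothetical scores of five districts on two development indicators
indicator_1 = [1, 2, 3, 4, 5]
indicator_2 = [2, 1, 4, 3, 5]
r = correlation_matrix([indicator_1, indicator_2])
eigenvalue, loadings = first_component(r)
share = eigenvalue / len(r) * 100  # total variance of standardised data = number of variables
print(round(eigenvalue, 3), [round(x, 3) for x in loadings], round(share, 1))
# 1.8 [0.707, 0.707] 90.0
```

With r = 0.8 between the two indicators, PC1 weights them equally and captures 90% of the total variance, so a single composite score summarises both.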

Factor Analysis — Detailed (NET Notes — Pulakesh Pradhan)

A process in which values of observed data are expressed as functions of a number of possible causes to find the most important ones

  • Reduces large number of variables to fewer underlying factors
  • Used to identify spatial patterns of development, soil properties, urban structure, etc.


Quick Reference

Statistical Geography Quick Reference

Key Books and Authors

| Book | Author |
|---|---|
| Statistical Geography | David Gregory |
| Quantitative Geography | J.P. Cole & C.A.M. King |
| Statistical Methods and the Geographer | S. Gregory |

Key Tests and Techniques

| Technique / Test | Application |
|---|---|
| Chi-Square (\(\chi^2\)) Test | Tests the goodness of fit or independence between categorical variables. |
| Student’s t-test | Compares means of two groups. |
| ANOVA (F-test) | Compares means of three or more groups. |
| Correlation (\(r\)) | Measures strength and direction of linear relationship (Pearson/Spearman). |
| Regression Analysis | Predicts the value of a dependent variable based on independent variables. |
| Standard Distance | Spatial equivalent of standard deviation. |
| Lorenz Curve | Graphical representation of inequality (e.g., land ownership). |
| Gini Coefficient | Numerical measure of inequality derived from Lorenz Curve. |

Descriptive Statistics

  • Mean: Average value.
  • Median: Middle value in a sorted list.
  • Mode: Most frequent value.
  • Standard Deviation: Measure of data dispersion around the mean.
  • Skewness: Measure of asymmetry in a distribution.
  • Kurtosis: Measure of the “peakedness” of a distribution.

Notes compiled by Geography Team