Statistical Geography

Quantitative methods, spatial statistics, and data analysis for geographic research.

Author

Geography Team

Official Syllabus

NEP-2020 Syllabus

Core I Paper IV — Cartography and Geo-Spatial Techniques

(Note: Quantitative topics are distributed across the practical sections of various NEP papers.)

Cartography & Data:
- Scientific basis of Cartography, needs of map making, characteristics of maps
- Geographical Coordinates, Graticules, Types of Scales (Plain, Diagonal)
- Types of Map Projection, Transformation of area, Distance and Direction
- Drawing of Choropleth and isopleth maps
- Traffic flow diagram and Drawing of Isochrones, Isotims, Isodapanes
- Slope Analysis (Wentworth’s method and Smith)
- Determination of agricultural efficiency (Kendall and Bhatia method)
- Delineation of crop combination regions (Weaver and Doi’s method)
- Nearest Neighbour Analysis, Rank Size Rule

UGC NET Syllabus

Unit IX/X — Statistical Methods and Cartography

I. Statistical Methods
- Sources of Geographic Information and Data (spatial and non-spatial)
- Applications of Measures of Central Tendency, Dispersion and Inequalities
- Sampling, Sampling Procedure and Hypothesis Testing (chi square, t-test, ANOVA)
- Time Series Analysis, Correlation and Regression Analysis
- Measurement of Indices, Making Indicators Scale Free, Computation of Composite Index
- Principal Component Analysis and Cluster Analysis
- Morphometric Analysis: Stream ordering, Bifurcation ratio, Drainage density, Slope Analysis

II. Cartography
- Types of Maps, Techniques of Map Making
- Data Representation on Maps (Pie diagrams, Bar diagrams, Line Graph)
- Thematic maps: Choropleth, Isarithmic, Dasymetric, Chorochromatic, Flow Maps

NET Statistics for Geography — Detailed Syllabus (Pulakesh Pradhan)

Syllabus Topics

Data sources and types of data
Statistical diagrams
Study of frequency distribution and cumulative frequency
Measures of central tendency
Selection of class interval for mapping
Measures of dispersion and concentration
Standard deviation
Lorenz curve
Methods of measuring association among different attributes
Simple and multiple correlation
Regression

Exam: 5 Questions = 10 Marks

Important Topics for NET

High-Yield Topics

Source of data
Types of data
Statistical diagrams — Histogram, Ogive, Circle
Mean, Median, Mode
Mean Deviation, Standard Deviation
Coefficient of Variation
Quartile Deviation, Range
Nearest Neighbour Analysis (NNI) & Rn value
Lorenz Curve
Correlation Coefficient
Regression Analysis
Normal Distribution
Factor Analysis
Chi-square, Z-Test, T-Test, F-Test
Types of scale (Nominal, Ordinal, Interval, Ratio)

Welcome to the Statistical Geography module of Geography OpenCourseWare.

Part A: Common Topics (NEP-2020 & UGC NET)

These topics are covered in both the NEP-2020 undergraduate syllabus and the UGC NET syllabus.

Geographic Data Sources and Types

📘 Syllabus Coverage

Syllabus	Topic Details
NEP-2020	Geo-spatial techniques data handling
UGC NET	Sources of Geographic Information and Data (spatial and non-spatial)

Key Concepts

Spatial Data: Information tied to a specific location on Earth’s surface (coordinates, addresses, administrative boundaries).
Non-Spatial (Attribute) Data: Characteristics or qualities of the spatial features (population count, soil type name, temperature reading).
Data Sources:
- Primary: Field survey, interviews, observations, GPS data collection.
- Secondary: Census reports, meteorological data, satellite imagery, topographical maps, statistical abstracts.
Measurement Scales: Nominal (categories), Ordinal (rank), Interval (equal intervals but no true zero, e.g., temperature in °C), Ratio (true zero, e.g., population, distance).

Data Collection (NET Notes — Pulakesh Pradhan)

Key Definitions

Term	Definition
Investigator	Person who conducts the statistical inquiry
Respondent	Person from whom information is collected
Statistical Unit	Item on which measurements are taken

Steps for Data Collection

Define objectives and scope of enquiry
Determine statistical units to be used
Identify source of information (data)
Choose methods of data collection
Determine degree of accuracy required
Decide type of enquiry

Types of Statistical Units

Unit of Collection:

Unit of enumeration — whether by sample or census method
Unit of recording — Kilogram, Quintal, Metre, km, etc.

Unit of Analysis and Interpretation:

Rates, ratios, percentages and coefficients
Examples: CBR (Crude Birth Rate), IMR (Infant Mortality Rate)

Sources of Data (NET Notes)

Primary Data

Collected by the investigator originally
More accurate but time-consuming and expensive

Secondary Data

Collected from earlier published or collected sources

Methods of Collecting Primary Data

Direct personal investigation
Indirect oral interviews
Information received through local agencies
Mailed questionnaire method
Schedules sent through enumerators

Sources of Secondary Data

Official Publications (Central Government):

Office of the Registrar General and Census Commissioner of India
Directorate General of Commercial Intelligence and Statistics
Labour Bureau — Ministry of Labour
Directorate of Economics and Statistics
Indian Army Statistical Organization
Central Statistical Organization (CSO)

Semi-Government Publications:

Statistical Department of RBI (Mumbai)
Economic Department of RBI
Institute of Economic Growth
Gokhale Institute of Politics and Economics
Institute of Foreign Trade

Research Institutes:

Indian Statistical Institute, Kolkata
Indian Agricultural Statistics Research Institute (IASRI)
NCERT

International Publications:

UNO Statistical Yearbook
UN Statistical Abstract
Demographic Yearbook
IMF, ILO, World Bank publications

Types of Enquiry (NET Notes)

Type	Description
Official / Semi-official / Unofficial	Collected through sponsoring agencies (e.g., ICAR, IASRI)
Initial / Repetitive	Initial = first time; Repetitive = continuation of previous enquiry
Confidential / Non-confidential	Confidential = not made public; Non-confidential = publicly available
Direct / Indirect	Direct = quantitative phenomena (age, weight, income); Indirect = qualitative phenomena (honesty, intelligence)
Regular / Ad-hoc	Regular = at fixed intervals; Ad-hoc = as and when required

Quality of a Good Questionnaire

Size should be as small as possible
Questions should be clear, brief, unambiguous and non-offending
Questions arranged in natural logical sequence
Avoid vague and multiple-meaning words
Questions should be readily comprehensible
Avoid sensitive and personal questions
Avoid leading questions
Include internal checks on accuracy
Use pre-tested questionnaires
Include covering letter from organizers

Types of Scales (NET Notes)

Scale	Nature	Examples
Nominal	Classification only; no order	Gender, religion, land use type
Ordinal	Order/rank exists; intervals not equal	Rank of cities, soil quality classes
Interval	Equal intervals; no true zero	Temperature in °C, dates
Ratio	Equal intervals + true zero	Height, weight, distance, income

Types of Data / Variables (NET Notes)

Qualitative vs. Quantitative

Qualitative (categorical) — not measured numerically (religion, land use)
Quantitative (numerical) — measured numerically (height, temperature)

Discrete vs. Continuous

Discrete — whole numbers only (number of farms, population count)
Continuous — can take any value within a range (rainfall, temperature)

Measures of Central Tendency and Dispersion

📘 Syllabus Coverage

Syllabus	Topic Details
NEP-2020	Basic statistical analysis required for practical geography
UGC NET	Applications of Measures of Central Tendency, Dispersion and Inequalities

Key Concepts

Central Tendency: Identifying the ‘center’ of a dataset.
- Mean: Arithmetic average. Sensitive to outliers. Spatial mean (mean center) for spatial data.
- Median: The score in the middle of a distribution. Robust to outliers.
- Mode: Most frequent value.
Z-score: Also known as the Standard Score, it is used for data transformation and comparison. The formula is \(Z = (X - \bar{X}) / \sigma\), where \(X\) is the value, \(\bar{X}\) is the mean, and \(\sigma\) is the standard deviation.
Dispersion: How spread out the data is around the center.
- Range, Quartile Deviation.
- Standard Deviation (SD): Root-mean-square deviation of the values from the mean. Its spatial counterpart is the standard distance.
- Coefficient of Variation (CV): (SD/Mean)*100. Useful for comparing variability of datasets with different units.
Inequalities Measurement: Lorenz Curve (graphical), Gini Coefficient (numerical, 0 to 1). Gini’s coefficient is a widely used technique to show Income Inequality.
Shape of Distribution: Skewness and Kurtosis are used for determining the Shape of a frequency distribution.
Normal Distribution: In a normal distribution, the percentage of the area between the mean \(\pm 1 \sigma\) (Standard Deviation) is 68.27%.

Measures of Central Tendency — Detailed (NET Notes — Pulakesh Pradhan)

Types: Arithmetic Mean, Median, Mode, Geometric Mean, Harmonic Mean

1. Arithmetic Mean (x̄)

Sum of all observations divided by the number of observations

Direct Method: x̄ = Σfx / Σf

Step Deviation Method: x̄ = A + h(Σfd / N)
where A = assumed mean, h = class interval, d = (x − A) / h, N = Σf
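As a quick check that the two methods agree, the sketch below computes the mean of a grouped distribution both ways. The class midpoints and frequencies are made-up illustrative values, and the function names are hypothetical rather than taken from the notes.

```python
def grouped_mean_direct(midpoints, freqs):
    """Direct method: mean = sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)


def grouped_mean_step(midpoints, freqs, assumed_mean, width):
    """Step-deviation method: mean = A + h * sum(f * d) / N, with d = (x - A) / h."""
    n = sum(freqs)
    sum_fd = sum(f * (x - assumed_mean) / width for x, f in zip(midpoints, freqs))
    return assumed_mean + width * sum_fd / n


# Hypothetical rainfall classes 0-10, 10-20, ..., 40-50 (cm), given by midpoint
midpoints = [5, 15, 25, 35, 45]
freqs = [3, 7, 10, 6, 4]

mean_direct = grouped_mean_direct(midpoints, freqs)
mean_step = grouped_mean_step(midpoints, freqs, assumed_mean=25, width=10)
print(round(mean_direct, 3), round(mean_step, 3))
```

Both calls return the same value, because the step-deviation method only rescales the deviations to simplify hand calculation.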

Merits

Easy to calculate
Based on all observations
Suitable for further mathematical treatment
Most stable average (least affected by sampling)

Demerits

Cannot be used for open-end classes
Cannot be located graphically
Very much affected by extreme observations
Not suitable for qualitative data

Weighted Arithmetic Mean

Used when all items are not of equal importance
Xw = (w1x1 + w2x2 + w3x3 + …) / (w1 + w2 + w3 + …)

2. Median

L.R. Connor: “The median is that value of the variable which divides the group into two equal parts”

A positional average, not based on all items (unlike arithmetic mean)

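Ungrouped Data: Arrange the values in order. For an odd number of observations, Median = the middle term; for an even number, Median = arithmetic mean of the two middle terms

Grouped Data: Median = L + [(N/2 − cf) / f] × h
where L = lower limit of the median class, N = total frequency, cf = cumulative frequency of the class preceding the median class, f = frequency of the median class, h = class interval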
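The grouped-data formula can be sketched as follows. This is a minimal illustration that assumes equal class widths; the class limits and frequencies are invented, and `grouped_median` is a hypothetical helper name.

```python
def grouped_median(lower_limits, freqs, width):
    """Median = L + ((N/2 - cf) / f) * h for a grouped frequency distribution."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= n / 2:
            # This is the median class: interpolate inside it
            return lower + (n / 2 - cf) / f * width
        cf += f
    raise ValueError("median class not found")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50
lower_limits = [0, 10, 20, 30, 40]
freqs = [3, 7, 10, 6, 4]

median = grouped_median(lower_limits, freqs, width=10)
print(median)
```

Here N = 30, so the median class is 20-30 (cumulative frequency reaches 20), with cf = 10 and f = 10.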

3. Mode

Value which occurs most frequently; point of highest concentration

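Formula: Mode = L + h × (f1 − f0) / [(f1 − f0) − (f2 − f1)]
where L = lower limit of the modal class, h = class interval, f1 = frequency of the modal class, f0 and f2 = frequencies of the preceding and succeeding classes

Relation (empirical, for moderately skewed distributions): Mode = Mean − 3(Mean − Median), i.e., Mode = 3 Median − 2 Mean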
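A small sketch of both routes to the mode, using an invented grouped distribution. The function names are hypothetical, and the code assumes equal class widths and a single modal class.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode = L + h * (f1 - f0) / (2*f1 - f0 - f2) for the modal class."""
    i = max(range(len(freqs)), key=freqs.__getitem__)  # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0
    # (f1 - f0) - (f2 - f1) is the same quantity as 2*f1 - f0 - f2
    return lower_limits[i] + width * (f1 - f0) / (2 * f1 - f0 - f2)


def empirical_mode(mean, median):
    """Empirical relation: Mode = 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean


# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50
lower_limits = [0, 10, 20, 30, 40]
freqs = [3, 7, 10, 6, 4]

mode_formula = grouped_mode(lower_limits, freqs, width=10)
mode_empirical = empirical_mode(mean=760 / 30, median=25.0)
print(round(mode_formula, 2), round(mode_empirical, 2))
```

The two values are close but not identical, which is expected: the empirical relation is only an approximation.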

4. Geometric Mean (GM)

nth root of the product of n observations

Formula: GM = (x1 × x2 × x3 × … × xn)^(1/n), or equivalently GM = Antilog [1/n × Σ log x]

Merits

Rigidly defined
Based on all observations
Suitable for further mathematical treatment
Has bias for smaller observations (unlike AM)

Demerits

Not easy to calculate
If any observation = 0 → GM = 0
If any observation is negative → GM may become imaginary or lose its meaning

Uses

Useful for averaging ratios, percentages and rates of increase
Used in construction of Index Numbers
Useful for river data analysis

5. Harmonic Mean (HM)

Reciprocal of the arithmetic mean of reciprocals of the given observations

Formula: HM = 1 / [1/n × Σ(1/x)]; for two observations a and b, HM = 2ab / (a + b)

Merits

Rigidly defined
Based on all observations
Gives greater weight to smaller observations
Not much affected by sampling fluctuations

Demerits

Not easy to understand or calculate
Cannot be obtained if any observation = 0
May not be representative unless smaller items need higher weightage

Uses

Specifically useful for averaging rates and ratios where the time factor is variable and the distance is constant (e.g., average speed over equal distances)

Relation Among AM, GM, HM

AM ≥ GM ≥ HM (for positive observations; equality holds only when all observations are equal)

Measure	Two terms (a, b)	Three terms (a, b, c)
AM	(a+b)/2	(a+b+c)/3
GM	√(ab)	∛(abc)
HM	2ab/(a+b)	3abc/(ab+bc+ca)
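The ordering of the three means can be checked numerically. The sketch below uses only the standard library; the values 4 and 9 and the speeds 40 and 60 km/h are illustrative assumptions, and the helper names are hypothetical.

```python
import math


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """nth root of the product, computed via logarithms (positive values only)."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(xs) / sum(1 / x for x in xs)


values = [4, 9]
am = arithmetic_mean(values)
gm = geometric_mean(values)
hm = harmonic_mean(values)
print(am, round(gm, 3), round(hm, 3))

# Equal distances covered at 40 km/h and 60 km/h: average speed is the HM
average_speed = harmonic_mean([40, 60])
print(round(average_speed, 1))
```

For two terms the results match the table: (a+b)/2 = 6.5, √(ab) = 6 and 2ab/(a+b) = 72/13.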

Measures of Dispersion — Detailed (NET Notes)

Dispersion = measures of variation of items in a distribution

Characteristics of a Good Measure of Dispersion

Rigidly defined
Easy to calculate and understand
Based on all observations
Amenable to further mathematical treatment
Not much affected by sampling fluctuations
Not much affected by extreme observations

Types

Range
Quartile Deviation (Semi-Inter-Quartile Range)
Mean Deviation
Standard Deviation
Lorenz Curve

1. Range

R = X(max) − X(min)

Coefficient of Range = [X(max) − X(min)] / [X(max) + X(min)]

Merits

Easy to calculate and understand (uses only the two extreme values, not the entire data)

Demerits

Varies widely from sample to sample
Not used for open-end classes
Very sensitive to sample size

Uses

Stock market functions
Industrial statistical quality control
Day-to-day life measurements
Meteorological department

2. Quartile Deviation (QD)

Inter-Quartile Range = Q3 − Q1

QD = (Q3 − Q1) / 2

Coefficient of QD = (Q3 − Q1) / (Q3 + Q1)

Merits

Easy to understand and calculate
Uses 50% of data
Not affected by extreme observations
Only measure of dispersion that can be used with open-end classes

Demerits

Ignores starting 25% and ending 25% of data
Affected by sampling fluctuations
Not suitable for further mathematical treatment

3. Mean Deviation (MD)

MD = (1/N) × Σ|x − A|, where A is the average (mean, median or mode) about which the deviations are taken

Coefficient of MD = MD / Average (about which it is calculated)

Merits

Based on all observations
More accurate than Range or QD

Demerits

Ignores signs of deviations (mathematically unsound)
Not satisfactory for skewed distributions
Rarely used in social studies
Cannot be computed for open-end classes

Uses

Used in economics and statistics for simplicity
Useful for computing distribution of personal wealth
National Bureau of Economic Research (USA) uses it

4. Standard Deviation (σ)

First suggested by Karl Pearson (1893)

Positive square root of the arithmetic mean of squared deviations from the arithmetic mean

σ = √[1/N × Σ(x − x̄)²]

C.V. = 100 × σ / x̄
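The two formulas above can be sketched in a few lines. The data values are an invented example chosen so the arithmetic is easy to follow by hand, and the function names are hypothetical.

```python
import math


def population_sd(xs):
    """sigma = sqrt( (1/N) * sum((x - mean)^2) )."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))


def coefficient_of_variation(xs):
    """C.V. = 100 * sigma / mean (the mean must be non-zero)."""
    return 100 * population_sd(xs) / (sum(xs) / len(xs))


# Hypothetical observations: mean = 5, sum of squared deviations = 32
data = [2, 4, 4, 4, 5, 5, 7, 9]
sd = population_sd(data)
cv = coefficient_of_variation(data)
print(sd, cv)
```

Note that this divides by N, as in the formula above; sample-based work often divides by N − 1 instead.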

Merits

Most important and widely used measure of dispersion
Based on all observations
Removes drawback of ignoring signs (uses squared deviations)
Suitable for further mathematical treatment
Least affected by sampling fluctuations

Demerits

Not easy to understand for non-mathematical persons
Gives greater importance to extreme values

Interpretation Methods

Empirical rule — for bell-shaped (normal) distribution
Chebyshev’s Theorem
Z-score / standard scores

Coefficient of Variation (CV)

CV = (σ / x̄) × 100

Relative measure of dispersion
“Coefficient of variation is the percentage variation in mean, standard deviation being considered as the total variation in the mean” — Karl Pearson
Used to compare variability between two different groups

5. Lorenz Curve

Graphic method of studying dispersion in a distribution
Both size of items (values of variable) and frequencies are cumulated
Provides relative idea of dispersion compared with the line of equal distribution
Cannot immediately show what % of persons corresponds to a given % of items

Gini Coefficient

Based on the coefficient of mean difference
G = Δ1 / (2X̄), where Δ1 is the mean difference and X̄ is the arithmetic mean
Varies from 0 to 1
G = 0 → perfect equality
G increases with increasing inequality
Gini = ratio of area of concentration to total area of lower triangle below line of equal distribution
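A minimal sketch of the mean-difference form of the Gini coefficient. It assumes the mean difference is taken over all n² ordered pairs (the "with repetition" convention), which is one of several conventions in use; the landholding values are invented and the function name is hypothetical.

```python
def gini_mean_difference(xs):
    """G = mean absolute difference / (2 * mean), using all n*n ordered pairs."""
    n = len(xs)
    mean = sum(xs) / n
    if mean == 0:
        raise ValueError("mean must be non-zero")
    mean_difference = sum(abs(a - b) for a in xs for b in xs) / (n * n)
    return mean_difference / (2 * mean)


# Hypothetical landholdings (hectares) for four households
equal_holdings = [5, 5, 5, 5]
unequal_holdings = [0, 0, 0, 10]

g_equal = gini_mean_difference(equal_holdings)
g_unequal = gini_mean_difference(unequal_holdings)
print(g_equal, g_unequal)
```

Under this convention, complete concentration in one of n units gives (n − 1)/n rather than exactly 1, so the upper limit of 1 is approached as n grows.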

Normal Distribution — Detailed (NET Notes)

Also known as the Bell Curve

In a normal distribution: Mean = Median = Mode

Properties

Symmetric about the mean
Bell-shaped
Total area under curve = 1
68% data within ±1σ
95% data within ±2σ
99.7% data within ±3σ (Empirical Rule)
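The percentages in the Empirical Rule can be reproduced from the standard normal distribution. The sketch below relies on the identity that the area within ±k standard deviations equals erf(k/√2); `area_within` is a hypothetical helper name.

```python
import math


def area_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 2))
```

The rounded figures 68%, 95% and 99.7% are approximations of these areas.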

Correlation and Regression Analysis

📘 Syllabus Coverage

Syllabus	Topic Details
NEP-2020	Applied in geographical data analysis
UGC NET	Correlation and Regression Analysis

Key Concepts

Correlation: Measures the strength and direction of the linear relationship between two variables (x and y).
- Pearson’s Product Moment (r): For interval/ratio data. Ranges from -1 to +1.
- Spearman’s Rank Correlation (Rho): For ordinal (ranked) data.
Regression: Predictive modeling. Investigates the dependence of a dependent variable (Y) on one or more independent variables (X).
- Linear Regression: \(Y = a + bX\). ‘a’ is intercept, ‘b’ is slope (regression coefficient).
- Residuals: The difference between observed and predicted values. Mapping residuals (Spatial Regression) reveals spatial anomalies.

Correlation Analysis — Detailed (NET Notes — Pulakesh Pradhan)

Karl Pearson’s Correlation Coefficient (r)

Method for measuring intensity or magnitude of linear relationship between two variables

Also called the Product Moment Correlation Coefficient

Formula: r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² × Σ(y − ȳ)²]

Value of r	Interpretation
r = +1	Perfect positive correlation
r = −1	Perfect negative correlation
r = 0	No linear correlation
0 < r < 1	Positive correlation
−1 < r < 0	Negative correlation
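The product-moment formula can be sketched directly from its definition. The paired values below are invented for illustration, and `pearson_r` is a hypothetical function name.

```python
import math


def pearson_r(xs, ys):
    """r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations (e.g., altitude class vs. rainfall class)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
r_perfect = pearson_r(x, [2, 4, 6, 8, 10])
r_inverse = pearson_r(x, [10, 8, 6, 4, 2])
print(round(r, 2), round(r_perfect, 2), round(r_inverse, 2))
```

The three results illustrate a positive, a perfect positive and a perfect negative correlation from the table above.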

Spearman’s Rank Correlation

Used when data is in ranks (ordinal scale)

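rs = 1 − 6ΣD² / [n(n² − 1)], where D = difference between the two ranks of an observation and n = number of paired observations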
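A short sketch of the rank formula, assuming the data are already ranked and contain no tied ranks (ties need a correction that is not shown here). The ranks are invented and the function name is hypothetical.

```python
def spearman_rs(ranks_x, ranks_y):
    """rs = 1 - 6 * sum(D^2) / (n * (n^2 - 1)), valid when there are no tied ranks."""
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks of five districts on two indicators
ranks_literacy = [1, 2, 3, 4, 5]
ranks_income = [2, 1, 4, 3, 5]

rs = spearman_rs(ranks_literacy, ranks_income)
print(round(rs, 2))
```

Here ΣD² = 4 and n = 5, so rs = 1 − 24/120.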

Regression Analysis — Detailed (NET Notes)

“Regression is stepping back or returning to the average value”

Mathematical measure of average relationship between two or more variables in terms of original units of data

Simple Linear Regression

Y = a + bX
Y = dependent variable; X = independent variable
b = regression coefficient (slope); a = intercept
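One common way to estimate a and b is ordinary least squares, sketched below. The notes do not name an estimation method, so treating least squares as the method is an assumption; the data are invented and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Least-squares estimates for Y = a + bX: b = Sxy / Sxx, a = mean(Y) - b * mean(X)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Hypothetical observations
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

a, b = fit_line(x, y)
# Residual = observed value minus value predicted by the fitted line
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(round(a, 2), round(b, 2), [round(e, 2) for e in residuals])
```

With an intercept in the model the residuals sum to zero, so positive and negative residuals mark places that lie above and below the general trend.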

Types

Simple regression — one dependent, one independent variable
Multiple regression — one dependent, multiple independent variables

Map Projections and Scales

📘 Syllabus Coverage

Syllabus	Topic Details
NEP-2020	Types of Scales, Map Projection, Transformation
UGC NET	Techniques of Map Making

Key Concepts

Map Scale: Ratio between map distance and ground distance. Types: Representative Fraction (RF), Statement, Graphical/Linear, Diagonal.
Map Projection: Mathematical transformation of the 3D Earth surface to a 2D plane. Always involves distortion.
Preserved Properties:
- Conformal (Orthomorphic): Preserves exact local angles/shapes (e.g., Mercator). Used for navigation.
- Equal-Area (Equivalent): Preserves area proportions (e.g., Mollweide, Peters). Used for thematic density distributions.
- Equidistant: Preserves correct distance from a center point.
- Azimuthal: Preserves correct direction from a center point.
Developable Surfaces: Cylinder, Cone, Plane (Zenithal).

Thematic Mapping Techniques

📘 Syllabus Coverage

Syllabus	Topic Details
NEP-2020	Drawing of Choropleth, Isopleth, Traffic flow diagrams
UGC NET	Data Representation: Choropleth, Isarithmic, Dasymetric, Chorochromatic, Flow Maps

Key Concepts

Choropleth Map: Uses shading/colors within predefined administrative boundaries (states, districts) to represent derived data (densities, rates, percentages). Not suitable for absolute numbers.
Isopleth (Isarithmic) Map: Lines connecting points of equal value (e.g., isotherms, contours, isobars). Assumes continuous data mapped over an isotropic surface. It is the best method to represent the Rate of change (when data is given in ratios or percentages).
Chorochromatic Map: Color-patch map showing qualitative distribution without numerical value (e.g., soil types, land use zones).
Dasymetric Map: Advanced choropleth that ignores administrative boundaries in favor of actual geographical boundaries of the phenomenon (e.g., mapping population density excluding lakes and forests).
Flow Map: Lines of varying thickness show direction and volume of movement (trade, traffic, migration).
Diagrammatic Maps: Pie diagrams, bar graphs superimposed on maps to show multiple variables.

Spatial Statistics and Morphometry

📘 Syllabus Coverage

Syllabus	Topic Details
NEP-2020	Nearest Neighbour, Slope Analysis (Wentworth), Agricultural indices
UGC NET	Morphometric Analysis, Measurement of Indices

Key Concepts

Nearest Neighbour Analysis (NNA): Measures spatial arrangement of points. Nearest Neighbour Index (R). R=0 (clustered), R=1 (random), R=2.15 (perfectly uniform/regular).
Slope Analysis: Wentworth’s method (based on contour crossings per grid cell) and Smith’s method (relative relief combined with drainage density).
Drainage Morphometry:
- Stream Ordering: Strahler (only same orders combine to increase) vs. Horton.
- Bifurcation Ratio: Ratio of number of streams of a given order to the number of next higher order.
- Drainage Density: Total stream length / basin area.
Indices: Agricultural efficiency (Kendall’s ranking, Bhatia’s weighted output). Handling data of different scales requires normalization (Z-scores) to create composite indices.
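The two drainage ratios defined above reduce to simple divisions. The sketch below uses invented stream counts and basin measurements, and the function names are hypothetical.

```python
def bifurcation_ratios(stream_counts):
    """stream_counts[i] is the number of streams of order i + 1.

    Returns the ratio of each order's count to the count of the next higher order.
    """
    return [
        stream_counts[i] / stream_counts[i + 1]
        for i in range(len(stream_counts) - 1)
    ]


def drainage_density(total_stream_length_km, basin_area_km2):
    """Drainage density = total stream length / basin area (km per square km)."""
    return total_stream_length_km / basin_area_km2


# Hypothetical basin: 48 first-order, 12 second-order, 3 third-order, 1 fourth-order stream
counts = [48, 12, 3, 1]
rb = bifurcation_ratios(counts)
dd = drainage_density(total_stream_length_km=150, basin_area_km2=60)
print(rb, dd)
```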

Nearest Neighbour Analysis — Detailed (NET Notes — Pulakesh Pradhan)

Measures the pattern of point distribution — whether clustered, random or regular

Formula: Rn = 2D̄ × √(n/A)
D̄ = mean nearest neighbour distance
n = number of points
A = total area

Rn Value	Pattern
Rn = 0	Maximum clustering (all points coincide)
Rn = 1	Random distribution
Rn = 2.149	Maximum dispersion / perfect hexagonal
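The Rn formula can be sketched with a brute-force nearest-neighbour search. The point coordinates and study area are invented, the function name is hypothetical, and no edge correction is applied.

```python
import math


def nearest_neighbour_index(points, area):
    """Rn = 2 * (mean nearest-neighbour distance) * sqrt(n / area)."""
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        # Distance from point i to its closest other point
        total += min(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points)
            if j != i
        )
    mean_distance = total / n
    return 2 * mean_distance * math.sqrt(n / area)


# Hypothetical settlements on a square grid inside a 2 x 2 km study area
grid = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
rn_grid = nearest_neighbour_index(grid, area=4.0)

# All settlements at the same location: maximum clustering
rn_clustered = nearest_neighbour_index([(1.0, 1.0)] * 4, area=4.0)
print(rn_grid, rn_clustered)
```

A square grid gives Rn = 2.0, slightly below the 2.149 of the hexagonal arrangement, while coincident points give Rn = 0.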

Part B: NEP-2020 Specific Topics

These topics are part of the NEP-2020 undergraduate programme only.

Agricultural Delineation Methods

📘 Syllabus Coverage

Syllabus	Topic Details
NEP-2020	Delineation of crop combination regions (Weaver and Doi’s method)

Key Concepts

Crop Combination Analysis: Identifying the dominant crop assemblages in a region to understand agricultural regionalization.
Weaver’s Method (1954): Calculates variance (\(\Sigma d^2 / n\)) between actual crop percentages and theoretical distributions (monoculture = 100%, 2-crop = 50% each, etc.). The combination with the lowest variance is chosen.
Doi’s Modification (1959): Simplified Weaver’s method by using \(\Sigma d^2\) without dividing by ‘n’ and providing a ready-to-use critical value table, making manual calculation much faster while yielding similar results.
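Weaver's minimum-variance procedure can be sketched as below. The crop percentages are invented, the function name is hypothetical, and the code covers only Weaver's variance step, not Doi's critical value table.

```python
def weaver_combination(crop_percentages):
    """Return (number of crops in the combination, minimum variance).

    For each k, the top k crops are compared with the theoretical share 100 / k,
    and the variance is sum(d^2) / k. The k with the lowest variance is chosen.
    """
    crops = sorted(crop_percentages, reverse=True)
    best_k, best_variance = None, float("inf")
    for k in range(1, len(crops) + 1):
        expected = 100 / k
        variance = sum((p - expected) ** 2 for p in crops[:k]) / k
        if variance < best_variance:
            best_k, best_variance = k, variance
    return best_k, best_variance


# Hypothetical shares of cropped area (%) in one district
shares = [54, 26, 10, 6, 4]
k, variance = weaver_combination(shares)
print(k, variance)
```

For these shares the monoculture variance is 2116 and the two-crop variance is 296, which is the lowest of all combinations, so the district is classed as a two-crop region.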

Part C: UGC NET Specific Topics

These topics are part of the UGC NET syllabus only.

Sampling and Hypothesis Testing

📘 Syllabus Coverage

Syllabus	Topic Details
UGC NET	Sampling, Sampling Procedure and Hypothesis Testing (chi square, t-test, ANOVA)

Key Concepts

Sampling: Selecting a subset from a population. Probability (Random, Stratified, Systematic, Cluster) vs. Non-probability (Purposive, Snowball, Quota).
Hypothesis Testing: Formulating Null (\(H_0\)) and Alternative (\(H_1\)) hypotheses. Type I error (rejecting a true \(H_0\)) and Type II error (failing to reject a false \(H_0\)). Set significance level (\(\alpha\), usually 0.05).
Standard Error (SE): Considered as the measure of the sampling error. It indicates how much the sample mean is likely to differ from the actual population mean.
Parametric Tests (assume normal distribution):
- t-test: Compares means of two groups.
- ANOVA (Analysis of Variance): Compares means of three or more groups.
Non-Parametric Tests (distribution-free):
- Chi-Square (\(\chi^2\)) Test: Tests association between categorical variables (observed vs. expected frequencies). A value of 0 means the observed and expected frequencies agree exactly, so the null hypothesis is not rejected.

Statistical Tests — Detailed (NET Notes — Pulakesh Pradhan)

Parametric Tests — Assumptions

Based on a known distribution (usually normal)
Population parameters are known or estimated
Data measured on interval or ratio scale

1. T-Test (Student’s t-Test)

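Developed by William Sealy Gosset (1908), who published under the pen name “Student”
Also called Student’s t-test (Welch’s t-test is a separate variant for samples with unequal variances)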
Small sample test — when sample size < 30
Degree of freedom: v = n − 1

Formula: t = Deviation from population parameter / Standard error of sample statistic
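For the one-sample case, the general formula becomes t = (sample mean − hypothesised mean) / (s / √n). The sketch below assumes that case; the sample values and the hypothesised mean are invented, and `one_sample_t` is a hypothetical helper.

```python
import math


def one_sample_t(sample, hypothesised_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample standard deviation uses n - 1 because the population variance is unknown
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    standard_error = s / math.sqrt(n)
    return (mean - hypothesised_mean) / standard_error, n - 1


# Hypothetical small sample (n < 30) tested against a population mean of 4
sample = [2, 4, 4, 4, 5, 5, 7, 9]
t_value, df = one_sample_t(sample, hypothesised_mean=4)
print(round(t_value, 3), df)
```

The computed t is then compared with the tabulated critical value for v = n − 1 degrees of freedom at the chosen significance level.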

Uses

Test of significance of regression coefficient
In multiple regression with individual variables
When population variance is unknown
When population parameters follow normal distribution

2. Z-Test

Given by Fisher
Large sample test — sample size n > 30
Used when population variance is known
Based on standard normal distribution
Critical value for 5% significance (two-tailed): 1.96

Uses

To determine whether two population means are different
When correlation coefficient of population is zero → use Z-test
Types: (1) One sample (2) Two sample (3) Location test (4) Maximum likelihood test

3. F-Test (Variance Ratio Test)

Given by Fisher
Used to compare two independent estimates of population variance
Small sample test
Null hypothesis: H₀: S₁² = S₂²

Formula: F = Larger estimate of variance / Smaller estimate of variance

Degrees of Freedom

v1 = n1 − 1 and v2 = n2 − 1
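The variance ratio and its degrees of freedom can be sketched as follows. The two samples are invented, and `f_ratio` is a hypothetical function name.

```python
def sample_variance(xs):
    """Unbiased estimate of population variance: sum((x - mean)^2) / (n - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def f_ratio(sample1, sample2):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    v1, v2 = sample_variance(sample1), sample_variance(sample2)
    df1, df2 = len(sample1) - 1, len(sample2) - 1
    if v1 >= v2:
        return v1 / v2, df1, df2
    return v2 / v1, df2, df1


# Hypothetical yields from two independent samples
sample_a = [2, 4, 4, 4, 5, 5, 7, 9]
sample_b = [1, 2, 3, 4, 5]

f_value, df_num, df_den = f_ratio(sample_a, sample_b)
print(round(f_value, 3), df_num, df_den)
```

Because the larger estimate is always placed in the numerator, the ratio computed this way is at least 1.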

Key Points

F-test value is never negative
Used for testing overall significance in regression models
Value lies from zero to infinity

4. Chi-Square Test — Detailed

Introduced by Karl Pearson
Non-parametric test
Chi-square values lie between 0 and infinity

Conditions for Use

Total frequency and sample size must be large
Samples must be independent
Expected frequency should not be small; if any expected frequency is < 5, use the pooling technique (merge adjacent classes)
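The statistic itself is the sum of (O − E)² / E over all classes. The sketch below computes it for a goodness-of-fit case; the observed and expected counts are invented, and `chi_square` is a hypothetical function name.

```python
def chi_square(observed, expected):
    """Chi-square = sum((O - E)^2 / E) over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts of settlements in four equal-area quadrats
observed = [18, 22, 20, 20]
expected = [20, 20, 20, 20]  # equal share expected under the null hypothesis

chi2 = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1
print(round(chi2, 2), degrees_of_freedom)
```

The result is compared with the tabulated critical value for the given degrees of freedom; identical observed and expected counts would give 0.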

5. ANOVA — Detailed Assumptions

Independence of samples
Normal population
Same population variance
Dependent variable measured on an interval or ratio scale; groups defined by a qualitative (categorical) factor

Summary Table of Tests

Test	Type	Sample Size	Key Use
T-test	Parametric	n < 30	Means; unknown variance
Z-test	Parametric	n > 30	Means; known variance
F-test	Parametric	Small	Variance comparison
Chi-square	Non-parametric	Large	Goodness of fit
ANOVA	Parametric	—	Multiple group comparison

Advanced Multivariate Analysis

📘 Syllabus Coverage

Syllabus	Topic Details
UGC NET	Principal Component Analysis and Cluster Analysis

Key Concepts

Principal Component Analysis (PCA): A statistical procedure that uses an orthogonal transformation to convert a set of correlated variables to a set of uncorrelated variables. PC1 accounts for the most variance. Used to create composite indices (e.g., developmental index from 15 variables).
ANOVA (Analysis of Variance): Preferred over the ‘t’ test when more than two population means are compared. This is because the probability of committing a Type I error accumulates as the number of pairwise ‘t’ tests increases.
Cluster Analysis: Grouping a set of objects such that objects in the same group (cluster) are more similar to each other than to those in other groups. Used in regionalization (e.g., grouping districts based on socio-economic similarity).
Time Series Analysis: Analyzing sequence of data points over time to extract meaningful statistics. Components: Trend (long term), Cyclical, Seasonal, Irregular/Random variations. Used for climate data or population forecasting.

Factor Analysis — Detailed (NET Notes — Pulakesh Pradhan)

A process in which values of observed data are expressed as functions of a number of possible causes to find the most important ones

Reduces large number of variables to fewer underlying factors
Used to identify spatial patterns of development, soil properties, urban structure, etc.

Quick Reference

Statistical Geography Quick Reference

Key Books and Authors

Book	Author
Statistical Geography	David Gregory
Quantitative Geography	J.P. Cole & C.A.M. King
Statistical Methods and the Geographer	S. Gregory

Key Tests and Techniques

Technique / Test	Application
Chi-Square (\(\chi^2\)) Test	Tests the goodness of fit or independence between categorical variables.
Student’s t-test	Compares means of two groups.
ANOVA (F-test)	Compares means of three or more groups.
Correlation (\(r\))	Measures strength and direction of linear relationship (Pearson/Spearman).
Regression Analysis	Predicts the value of a dependent variable based on independent variables.
Standard Distance	Spatial equivalent of standard deviation.
Lorenz Curve	Graphical representation of inequality (e.g., land ownership).
Gini Coefficient	Numerical measure of inequality derived from Lorenz Curve.

Descriptive Statistics

Mean: Average value.
Median: Middle value in a sorted list.
Mode: Most frequent value.
Standard Deviation: Measure of data dispersion around the mean.
Skewness: Measure of asymmetry in a distribution.
Kurtosis: Measure of the “peakedness” of a distribution.

Notes compiled by Geography Team