Exploratory Data Analysis

Understanding data from the Election Commission of India

The 2004 Lok Sabha Elections

The Lok Sabha is the lower house of the Parliament of India (the Rajya Sabha is the upper house). Its members are elected directly. Seats are apportioned among the states and territories in proportion to population. The Lok Sabha website states that "the total elective membership is distributed among the States in such a way that the ratio between the number of seats allotted to each State and the population of the State is, so far as practicable, the same for all States". This site also makes use of State Assembly election data and turnout data in order to better understand how women have fared in past elections.
fig. 1
fig. 2
fig. 3
fig. 4

Background - 2004 General Election Turnout Data

Finding shapefiles for India is a rough task. In the future, I plan on georeferencing a map of India's state boundaries in order to refine these maps. The shapefile used in this exercise can be found here, but as you can see, it is not ideal. The first figure shows just how badly state boundaries have been rendered in our shapefile.

The maps below use turnout data from the 2004 General Elections. The second figure illustrates the total number of female electors per state. The third shows the female turnout percentage by state. The last is an attempt to calculate female electors as a percentage of total electors.
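The percentage of female electors described above is a simple ratio, but the column names in the electors file (PC_NO, PC_NAME) suggest one row per parliamentary constituency rather than per state. As a minimal sketch in base R, assuming that layout, the constituency rows could be summed to state level before computing the percentage. The toy values below are made up for illustration and are not Election Commission figures.

```r
# Hypothetical toy data: two states, three constituencies (values are made up).
toy <- data.frame(
  STATE            = c("A", "A", "B"),
  PCElectorsFemale = c(400, 600, 450),
  ElectorsTotal    = c(1000, 1000, 1000)
)

# Sum the constituency rows so that each state has a single row.
by_state <- aggregate(cbind(PCElectorsFemale, ElectorsTotal) ~ STATE,
                      data = toy, FUN = sum)

# Female electors as a percentage of all electors in each state.
by_state$percentFemOfTot <- by_state$PCElectorsFemale / by_state$ElectorsTotal * 100

by_state
```

Summing before dividing weights each constituency by its size, which is usually what a state-level percentage is meant to express.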

R Code

#==================================================
# Mapping Female Electors: 2004 General Elections
#
# by Clint Newsom
# 4/20/2010
# hncewsom@gmail.com
#==================================================

# load ggplot2

library(ggplot2)

# load maptools

library(maptools)

# read in our shapefile containing state boundaries.

india <- readShapeSpatial("desktop/si618/indian_elections/regex_layers/shps/india_st.shp")

# read in electors data

electors <- read.delim("desktop/si618/data/nohead.txt", sep="\t", header=FALSE, col.names=c("STATE_CODE", "STATE", "PC_NO", "PC_NAME", "PCElectorsMale", "PCElectorsFemale", "ElectorsTotal", "VoterTurnoutMale.age", "VoterTurnoutFemale.age", "VoterTurnoutPC"))

head(electors)

# check all of our data.

head(india)

# create a new data.frame to manage x,y positions.
# Thanks again to Josh Steverman for doing some intense Googling to
# discover this method. Following code follows his tutorial.

lp <- data.frame(do.call(rbind, lapply(india@polygons, function(x) x@labpt)))

names(lp) <- c("x","y")

india@data <- data.frame(india@data, lp)

head(india@data)

# This is a source file provided by Hadley Wickham.

source("desktop/si618/exercises/wk6/fortify.r")

ind <- fortify(india)

# merge our two files together

states <- merge(ind, electors, by.x="id", by.y="STATE")

head(states)

# start mapping!

p <- ggplot(states)

# A quick test to see where we stand with our maps.

p + aes(long, lat, group=group, fill=states$PCElectorsFemale) + geom_polygon() + coord_map(projection="lagrange")

# Not incredibly satisfying. We obviously need a better shapefile. GIS data for India is surprisingly hard to come by. Fig. 1

# A constant outline colour belongs outside aes(), so it is set in geom_polygon().

p + aes(long, lat, group=group, fill=states$PCElectorsFemale) + geom_polygon(colour="black") + coord_map(projection="lagrange") + opts(title="Number of Female Electors Per State: 2004 Indian General Elections")

ggsave("desktop/si618/indian_elections/regex_layers/shps/images/first.png")

# Fig. 2.

p + aes(long, lat, group=group, fill=states$VoterTurnoutFemale.age) + geom_polygon(colour="black") + coord_map(projection="lagrange") + opts(title="Percent Female Voter Turnout: 2004 Indian General Elections")

ggsave("desktop/si618/indian_elections/regex_layers/shps/images/second.png")

# Fig. 3. Next, try calculating female electors as a percentage of total electors manually (Fig. 4).

states$percentFemOfTot <- states$PCElectorsFemale / states$ElectorsTotal * 100

p + aes(long, lat, group=group, fill=states$percentFemOfTot) + geom_polygon(colour="black") + coord_map(projection="lagrange") + opts(title="Percent Female Electors: 2004 Indian General Elections")

ggsave("desktop/si618/indian_elections/regex_layers/shps/images/third.png")
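The merge step in the script joins the polygon data to the electors table by state name. With merge()'s default settings, rows whose keys do not match on both sides are dropped, so a spelling or capitalization difference can remove a state from the map. A minimal sketch of a check that could be run before merging, using hypothetical name vectors rather than the real shapefile or electors values:

```r
# Hypothetical state names, standing in for the shapefile ids and the
# STATE column of the electors table (not the actual data).
shape_ids     <- c("Kerala", "Goa", "Tamil Nadu")
elector_names <- c("Kerala", "Goa", "TAMIL NADU")

# Names that appear on one side only would be lost in a default merge().
missing_in_electors <- setdiff(shape_ids, elector_names)
missing_in_shapes   <- setdiff(elector_names, shape_ids)

missing_in_electors
missing_in_shapes
```

If either vector is non-empty, the names could be normalized (for example with toupper() on both sides) before calling merge().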