

Outlier Detection in Election Data Using Geospatial Analysis - AKWA IBOM


Source: dev.to

Introduction

The aim of this project is to uncover potential election irregularities so that the electoral commission can ensure the transparency of election results. In this project, I identify outlier polling units whose voting results deviate significantly from those of neighbouring units.

Data Understanding

The dataset used in this analysis covers polling units in Akwa Ibom State only. The data can be found here. I conducted the analysis in Python as follows:

# Mount Google Drive (this notebook runs in Google Colab)
from google.colab import drive, files
drive.mount('/content/drive')

# Import libraries
import pandas as pd
from geopy.geocoders import OpenCage

# Folder that holds the dataset; adjust to your own Drive layout
path = '/content/drive/MyDrive/Colab Notebooks/Nigeria_Elections/'
data = pd.read_csv(path + "AKWA_IBOM_crosschecked.csv")

Here is a summary of the columns in the dataset:

  1. State: The name of the Nigerian state where the election took place (e.g., “AKWA IBOM”).
  2. LGA (Local Government Area): The specific local government area within the state (e.g., “ABAK”).
  3. Ward: The electoral ward within the local government area (e.g., “ABAK URBAN 1”).
  4. PU-Code (Polling Unit Code): A unique identifier for the polling unit (e.g., “3/1/2001 0:00”).
  5. PU-Name (Polling Unit Name): The name or location of the polling unit (e.g., “VILLAGE SQUARE, IKOT AKWA EBOM” or “PRY SCH, IKOT OKU UBARA”).
  6. Accredited Voters: The number of voters accredited to participate in the election at that polling unit.
  7. Registered Voters: The total number of registered voters in that polling unit.
  8. Results Found: Indicates whether results were found for this polling unit (usually TRUE or FALSE).
  9. Transcription Count: The count of how many times the results were transcribed (may be -1 if not applicable).
  10. Result Sheet Stamped: Indicates whether the result sheet was stamped (TRUE or FALSE).
  11. Result Sheet Corrected: Indicates whether any corrections were made to the result sheet (TRUE or FALSE).
  12. Result Sheet Invalid: Indicates whether the result sheet was deemed invalid (TRUE or FALSE).
  13. Result Sheet Unclear: Indicates whether the result sheet was unclear (TRUE or FALSE).
  14. Result Sheet Unsigned: Indicates whether the result sheet was unsigned (TRUE or FALSE).
  15. APC: The number of votes received by the All Progressives Congress (APC) party.
  16. LP: The number of votes received by the Labour Party (LP).
  17. PDP: The number of votes received by the People’s Democratic Party (PDP).
  18. NNPP: The number of votes received by the New Nigeria People’s Party (NNPP).

I then created the Address column by concatenating the polling unit name, ward, local government area, and state. This column is used later for geocoding:

data['Address'] = data['PU-Name'] + ',' + data['Ward'] + ',' + data['LGA'] + ',' + data['State']

To obtain the Latitude and Longitude columns, I used geocoding.
I generated an API key for the OpenCage Geocoding API and defined a function, geocode_address, that geocodes the new Address column to produce the Latitude and Longitude columns:

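# Create the geocoder with your own OpenCage API key (placeholder below)
geolocator = OpenCage('YOUR_OPENCAGE_API_KEY')

def geocode_address(address):
  """Return (latitude, longitude) for an address, or (None, None) on failure."""
  try:
    location = geolocator.geocode(address)
    if location is None:  # no match found for this address
      return None, None
    return location.latitude, location.longitude
  except Exception:  # e.g. network errors, timeouts, quota limits
    return None, None

data[['Latitude', 'Longitude']] = data['Address'].apply(lambda x: pd.Series(geocode_address(x)))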
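
Because apply calls the geocoder once per row, polling units that share the same Address string would be looked up repeatedly. A minimal sketch of caching the lookups, assuming a stand-in function fake_geocode in place of the real API call so that it runs offline (fake_geocode, geocode_cached, the addresses, and the coordinates are all made up for illustration):

```python
from functools import lru_cache

CALLS = []  # records every "API call" so we can see how many were made

def fake_geocode(address):
    """Hypothetical stand-in for a real geocoding request."""
    CALLS.append(address)
    known = {"SCHOOL A,WARD 1": (1.0, 2.0), "SQUARE B,WARD 2": (3.0, 4.0)}
    return known.get(address, (None, None))

@lru_cache(maxsize=None)
def geocode_cached(address):
    """Look up each distinct address only once; repeats come from the cache."""
    return fake_geocode(address)

addresses = ["SCHOOL A,WARD 1", "SQUARE B,WARD 2", "SCHOOL A,WARD 1"]
coords = [geocode_cached(a) for a in addresses]

print(coords)      # three results
print(len(CALLS))  # only two lookups were actually made
```

With a real geocoder, the same wrapper could sit around geocode_address, which may help when the API has a request quota.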

A quick look at our dataset:

[Image: preview of the dataset]

The function works, and I was able to obtain the Latitude and Longitude columns.
As there are still null values in these two columns, I impute them with scikit-learn's SimpleImputer, which replaces the missing values with the column mean:

from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy = 'mean')
data[['Latitude', 'Longitude']] = imputer.fit_transform(data[['Latitude', 'Longitude']])
data.to_csv('AKWA_IBOM_geocode.csv', index = False)
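
One consequence of mean imputation is that every polling unit without coordinates ends up at the same point, so those units become neighbours of one another at any radius. A small sketch of recording which rows were imputed, so that later steps can treat them separately if desired. The Geocoded column name and the coordinates are assumptions for illustration, and fillna with the column means is used here as a pandas-only equivalent of the mean strategy:

```python
import pandas as pd

# Made-up coordinates; the second row failed to geocode.
df = pd.DataFrame({
    "Latitude": [5.0, None, 5.2],
    "Longitude": [7.9, None, 8.1],
})

cols = ["Latitude", "Longitude"]

# Flag rows that have real coordinates before filling the gaps.
df["Geocoded"] = df[cols].notna().all(axis=1)

# Fill missing values with each column's mean.
df[cols] = df[cols].fillna(df[cols].mean())

print(df)
```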

Identifying Neighbours

I defined a radius of 1 km to identify which polling units are considered neighbours:

# Calculate distances and find neighbours
from geopy.distance import geodesic

def neighbouring_pu(data, radius=1.0):
  """Map each row index to the indices of polling units within `radius` km."""
  neighbours = {}
  for i, row in data.iterrows():
    neighbours[i] = []
    for j, row2 in data.iterrows():
      if i != j:
        distance = geodesic((row['Latitude'], row['Longitude']), (row2['Latitude'], row2['Longitude'])).km
        if distance <= radius:
          neighbours[i].append(j)
  return neighbours

neighbours = neighbouring_pu(data, radius=1.0)
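
The nested loop above measures every pair of polling units twice, once in each direction. As a rough sketch of the same idea that visits each pair only once, the following uses the haversine formula from the standard library rather than geopy's geodesic. This is a swapped-in spherical approximation, so distances can differ slightly from the geodesic values. The function names and coordinates are made up for illustration:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def neighbours_within(points, radius_km=1.0):
    """points: list of (lat, lon). Returns {index: [neighbour indices]}."""
    result = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):  # each pair is visited once
            if haversine_km(*points[i], *points[j]) <= radius_km:
                result[i].append(j)
                result[j].append(i)
    return result

# Made-up coordinates: the first two are roughly 0.56 km apart,
# the third is more than 10 km from both.
points = [(5.000, 7.900), (5.005, 7.900), (5.100, 7.900)]
print(neighbours_within(points))  # {0: [1], 1: [0], 2: []}
```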

Outlier Calculation - Score
I define a function, get_outlier_scores, that calculates outlier scores for the voting data. It compares the votes each polling unit recorded for each party (APC, LP, PDP, NNPP) with the average votes recorded by its neighbouring units, which are given in the neighbours dictionary.
For each row, the function computes the absolute difference between that row's votes and the average votes of its neighbours for each party, and stores these differences as outlier scores. It then returns a new DataFrame that combines the original data with the calculated outlier scores, which makes it possible to identify polling units whose voting patterns differ markedly from those of their neighbours.

def get_outlier_scores(data, neighbours):
  """Return the data with one <party>_outlier_score column added per party."""
  outlier_scores = []
  parties = ['APC', 'LP', 'PDP', 'NNPP']
  for i, row in data.iterrows():
    scores = {}
    for party in parties:
      votes = row[party]
      # Mean votes among neighbours; 0 when the unit has no neighbours
      neighbour_votes = data.loc[neighbours[i], party].mean() if neighbours[i] else 0
      scores[party + '_outlier_score'] = abs(votes - neighbour_votes)
    outlier_scores.append(scores)
  # Build the scores frame once, after the loop, aligned to the original index
  outlier_scores_data = pd.DataFrame(outlier_scores, index=data.index)
  return pd.concat([data, outlier_scores_data], axis=1)

outlier_scores_df = get_outlier_scores(data, neighbours)
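
To make the score concrete, here is a small worked example of the same rule on three polling units, written in plain Python. The vote counts and neighbour lists are made up for illustration:

```python
# Made-up vote counts for three polling units.
votes = {
    0: {"APC": 100, "PDP": 10},
    1: {"APC": 20, "PDP": 30},
    2: {"APC": 40, "PDP": 50},
}
# Made-up neighbour lists: unit 2 has no neighbours within the radius.
neighbours = {0: [1, 2], 1: [0], 2: []}

def outlier_score(unit, party):
    """Absolute difference between a unit's votes and its neighbours' mean."""
    nbrs = neighbours[unit]
    mean = sum(votes[n][party] for n in nbrs) / len(nbrs) if nbrs else 0
    return abs(votes[unit][party] - mean)

print(outlier_score(0, "APC"))  # |100 - (20 + 40) / 2| = 70.0
print(outlier_score(2, "APC"))  # no neighbours, so the score equals the raw count: 40
```

Note the second case: with the fallback of 0, a unit that has no neighbours receives a score equal to its own vote count, so isolated units with many votes can rank high for that reason alone.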

Sorting and Reporting
I sorted the data by the outlier scores for each party and obtained the following report, which lists the top five outliers for each party with the PU-Code, number of votes, and outlier score.

All Progressives Congress (APC)

PU-Code        APC   APC_outlier_score
03-05-11-009   324   228.52
03-29-05-013   194   167.334
03-30-07-001   180   153.325
03-05-09-014   194   152.149
03-28-05-003   180   138.132

Labour Party (LP)

PU-Code        LP   LP_outlier_score
03-05-11-009   59   45.451
03-29-05-013   42   6.65894
03-30-07-001   29   6.34942
03-05-09-014   3    26.5831
03-28-05-003   91   61.5261

People’s Democratic Party (PDP)

PU-Code        PDP   PDP_outlier_score
03-05-11-009   7     27.3627
03-29-05-013   181   145.232
03-30-07-001   17    18.8739
03-05-09-014   36    24.2221
03-28-05-003   12    48.2519

New Nigeria People’s Party (NNPP)

PU-Code        NNPP   NNPP_outlier_score
03-05-11-009   0      0.27451
03-29-05-013   6      4.14865
03-30-07-001   0      1.85521
03-05-09-014   0      2.36104
03-28-05-003   0      2.36104
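
A report like this can be produced by ranking each party's score column separately. The following is only a sketch of one way to do that with pandas nlargest, not necessarily how the tables above were generated. The helper name top_outliers and the toy values are assumptions, while the column names follow the ones used in this article:

```python
import pandas as pd

# Made-up scores for four polling units and two parties.
df = pd.DataFrame({
    "PU-Code": ["a", "b", "c", "d"],
    "APC": [12, 90, 45, 8],
    "APC_outlier_score": [5.0, 50.0, 20.0, 1.0],
    "LP": [60, 4, 15, 75],
    "LP_outlier_score": [30.0, 2.0, 8.0, 40.0],
})

def top_outliers(scores_df, party, n=5):
    """Return the n rows with the largest outlier score for one party."""
    score_col = f"{party}_outlier_score"
    return scores_df.nlargest(n, score_col)[["PU-Code", party, score_col]]

for party in ["APC", "LP"]:
    print(top_outliers(df, party, n=2))
```

Ranking per party means each table can contain a different set of polling units.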

Visualize the neighbours

I generated scatterplots to visualize the geographical distribution of polling units by outlier score for the four political parties (APC, LP, PDP, NNPP).
Each point represents a polling unit plotted by its latitude and longitude and coloured by its outlier score.
The plots show how the outlier scores are distributed geographically, making it easier to spot patterns or anomalies in the data.

import matplotlib.pyplot as plt
import seaborn as sns

parties = ['APC', 'LP', 'PDP', 'NNPP']
for party in parties:
  plt.figure(figsize=(10, 6))
  # Longitude on the x-axis and latitude on the y-axis, so the plot is oriented like a map
  sns.scatterplot(data=outlier_scores_df, x='Longitude', y='Latitude', hue=party + '_outlier_score', palette='viridis')
  plt.title(f'Polling Units by {party} Outlier Score')
  plt.xlabel('Longitude')
  plt.ylabel('Latitude')
  plt.legend(title=party + ' Outlier Score')
  plt.savefig(f'polling_units_{party}_outlier_score.png')
  plt.show()

[Images: one scatterplot per party (APC, LP, PDP, NNPP), showing polling units by outlier score]

Deliverables

  1. Find the full Notebook here
  2. Full Report - Top five outliers for each party.
  3. File with Latitude and Longitude - CSV
  4. File with sorted polling units by outlier scores - CSV