๐งพ End-to-End Audit Analytics Project
This project simulates the role of a Auditor Data Analyst and walks through a complete end-to-end audit analytics workflow using synthetic financial data.
We identify fraud, policy violations, and financial anomalies using Python, SQL, and Excel, and generate a full audit report/dashboard in Excel for stakeholders.
๐ Project Overview
This project covers:
- General Ledger Analysis
- Vendor Payment Analysis
- Employee Expense Reimbursement Analysis
- Audit Dashboard & Reporting in Excel (INCOMPLETE)
Each section uses synthetic data generated using the Faker
library and applies common audit checks performed during external or internal audits.
๐ ๏ธ Tools & Technologies
Tool | Purpose |
---|---|
Python (Pandas, Faker) | Data generation, cleaning, audit logic |
SQLite (via pandas/sqlite3) | Run SQL queries and filters |
Excel | Final dashboard, pivot tables, formatting |
OpenPyXL | Export DataFrames to multi-tab Excel |
Jupyter Notebook | Interactive data exploration/reporting |
โ Audit Checks Performed
๐ General Ledger
- Round-value entries (estimates/manual)
- Weekend entries (outside normal ops)
- Suspicious descriptions (misc, test, etc.)
- High-value journal entries
- Unusual accounts usage
๐ธ Vendor Payments
- Duplicate payments (vendor + date + amount)
- Large one-off payments (potential fraud)
- Payments made outside working days
- Top vendors by spend
- Descriptive anomalies
๐งพ Employee Expenses
- Duplicate claims (employee + date + amount)
- Policy violations (meals > 100, etc.)
- Weekend or late-night claims
- Round amounts (fabrication risks)
- Vague descriptions
- Frequent small claims (threshold abuse)
- Top spenders by amount
๐ Deliverables
- โ Cleaned datasets (Ledger, Payments, Expenses)
- โ Flagged risk entries with Boolean indicators
- โ Summary Dashboard (Excel) with (INCOMPLETE):
- Key metric tables
- Pivot tables
- Charts (Top vendors, category distribution)
- Conditional formatting to highlight red flags
๐ Use Cases
- โ Resume & Portfolio project for Data Analyst / Audit Analyst roles
- โ Client work showcase for data services agencies
- โ Practice case for audit interview prep
- โ Internal training tool for audit/data teams
# Import Libraries
import pandas as pd
# Importing Datasets
# General Ledger dataset
gl_df = pd.read_csv('../data/general_ledger.csv', parse_dates=["date"])
# Vendors Dataset
vendors_df = pd.read_csv('../data/vendors.csv')
# Payments Dataset
payments_df = pd.read_csv('../data/vendor_payments.csv', parse_dates=["payment_date"])
# Expenses Dataset
expenses_df = pd.read_csv('../data/employee_expenses.csv', parse_dates=["date"])
Weekend/Holidays Postings
# Extracting Weekdays (0= Monday)
gl_df['weekday'] = gl_df['date'].dt.weekday
weekend_postings = gl_df[gl_df['weekday'] >= 5]
f"{len(weekend_postings)} entries were posted on weekends"
'290 entries were posted on weekends'
weekend_postings[['date', 'time', 'user', 'account', 'amount']].head()
Missing/Vague Descriptions
# Define Vague Keywords
vague_keywords = ['misc', 'adjustment', 'test', 'unknown']
# Filter entries with vague keywords or missing descriptions
gl_df['desc_clean'] = gl_df['description'].fillna('').str.lower()
vague_entries = gl_df[
gl_df['desc_clean'].str.strip().eq('') | # description missing
gl_df['desc_clean'].str.contains('|'.join(vague_keywords)) # description including vague_keywords
]
f"{len(vague_entries)} entries have vague or missing descriptions"
'2 entries have vague or missing descriptions'