Contact Information:
Location: Reading, UK
LinkedIn: linkedin.com/in/perusha-moodley
Website: www.perusha.dev
Highly experienced developer, technical team lead and AI researcher with over 18 years in software development and consulting. PhD in Deep Reinforcement Learning, with strong foundations in AI model alignment, evaluation and technical governance. Enthusiastic lifelong learner, adept at analysing and implementing research papers and at adopting and adapting new technologies. Proven track record of leading projects and developing innovative solutions in high-profile, regulated production systems. Seeking an impactful role driving responsible AI adoption.
Software Engineering: Python, PyTorch, Inspect, API development, data engineering, Java, C++, SQL, enterprise web services and message queues.
LLMs: SFT and DPO fine-tuning (using PEFT/QLoRA) in the Hugging Face ecosystem; synthetic dataset generation; prompt engineering; agentic scaffolding; evaluations; LangChain.
Reinforcement Learning (RL/DRL): SB3, Gym/Gymnasium, Ray, RLlib, CleanRL, experiment tracking in Weights & Biases (W&B). Algorithms: DQN, PPO, A3C, multimodal Decision Transformers.
Soft skills: Leadership & team management (Agile), consulting & client engagement
Role: Founder & Lead Consultant
Duration: 2006 - Present
Managing a small consultancy (<10 people) specialising in MLOps, AI research and software development. We consult on ML projects, architect integration pipelines and develop applications for clients across sectors, with significant experience in the QA-critical pharma and energy sectors.
Designed and developed ML algorithms spanning supervised, unsupervised and RL; experienced with training and customisation of multimodal transformer models and agent-based development (LangChain).
Fine-tuned LLMs using PEFT/QLoRA, covering SFT and DPO/RLHF preference training.
Developing skills in AI safety and alignment including synthetic dataset generation, benchmark design, model evaluations (capabilities and alignment) and AI control protocols to support technical governance processes.
Steering committee member for the Thames Valley AI Hub (TVAI), an initiative supporting collaboration between academia and industry; driving the skills events workgroup.
AISF Certification: Blue Dot AI Safety Alignment course.
Duration: 2006 - 2017
Designed and developed integration frameworks, complex business process engines, web services and logging & monitoring tools in enterprise environments for automated warehousing/CRM systems. Took code through unit, functional and regression testing to deployment on highly regulated (FDA) production systems in the pharma industry.
Technical team lead for successful EU-wide project delivery.
Business analyst for supply chain, warehousing, and financial processes.
Developed backend applications using C++/Java (WebLogic servers), SQL databases (Oracle), web services and enterprise messaging tools (message queues).
Duration: 2010 - 2015
Co-designed a product for automating payments using OCR integrated with ERP systems (web services/WebLogic server).
Duration: 2005 - 2006
Development Lead responsible for global development standards and support.
Duration: 2001 - 2005
Developer and functional analyst for finance, supply chain procurement and planning during EU rollout.
Duration: 2000 - 2001
Provided consulting on business systems.
Duration: 1998 - 2000
Provided consulting on business systems, leveraging expertise as a developer and white-hat hacker.
Ph.D. in Computer Science
University of Reading, UK
Focus: Deep Reinforcement Learning (RL/DRL)
Researched methods for learning and exploiting relational action structure in Deep Reinforcement Learning (DRL) problems in online (Relational Network with PPO) and offline (Decision Transformers) settings, with supervised and unsupervised techniques. My work improved transfer in multi-task settings using methods such as clustering, contrastive methods and auxiliary signals.
Decision Transformers: generated datasets of trajectory data (each dataset in excess of 300GB) using PPO for offline RL training. Pre-trained a transformer model from scratch on a multimodal dataset; modified the data loader, improving efficiency by a factor of 6; designed alternative action space tokenisations and position encodings to improve token communication; used mechanistic interpretability techniques to analyse models.
M.Sc. in Mechanical Engineering
University of Natal, South Africa
Focus: Manufacturing automation
B.Sc. in Mechanical Engineering
University of Natal, South Africa
Focus: Robotics and automation
Interpreting Decision Transformer: Insights from Continuous Control Tasks, Accepted for Conference 2024.
Multi-State-Action Tokenisation in Decision Transformers for Multi-Discrete Action Spaces, arXiv [cs.LG], Pending Review.
A Conservative Q-Learning Approach for Handling Distribution Shift in Sepsis Treatment Strategies, NeurIPS 2021 Workshop.
Understanding structure of concurrent actions, Springer 2019.
Blue Dot AISF Alignment project: Jan 2025 - Feb 2025
Researching alignment impact on Constitutional AI (CAI) models trained with alternative country and cultural principles.
Generated synthetic datasets for SFT fine-tuning and DPO based on alternative country and cultural constitutions.
Designed capability and alignment evaluations for models; currently implementing them in the Inspect framework.
AI Safety Camp Project Team: AI Safety Scientist: Jan 2025 - Present
Extending Sakana’s AI Scientist for AI Safety to include tasks such as synthetic data generation, evaluation automation, risk assessments and other scaffolding tools and wrappers.
Implementing agents in Inspect to support more complex tasks for benchmarking and automating evals.
Building an AI control protocol setup for automated agent-based tests.
Feature steering and analysis in a multi-agent setting (APART Hackathon, Mar 2025):
Participated in the Women in AI Safety hackathon from APART Research (https://github.com/moodlep/apart_gf_hackathon).
Our team is continuing to extend the preliminary experiments from Prisoner’s Dilemma to more real-world scenarios using Concordia.
APART Hackathon (Mar 2025):
Participating in the AI Control hackathon from APART Research, developing and testing new protocols for the defer-to-trusted (DTT) setting.
Code is implemented using the Control Arena framework from AISI and will be released after the hackathon.
Community engagement:
Former organiser of Google Developer Group events from 2015-2021; grew the group to >2.5k people, ran monthly technical talks and workshops; organised yearly conferences attracting up to 300 people. Actively helped people upskill and retrain for new careers.
Moderator for ML Collective RL group; organised paper reading groups and coding sessions.
Mentoring:
Mentored ~10 women in tech on a regular basis; continue to support and mentor on an ad hoc basis.
Mentored 2 DL Indaba students.
Teaching:
Former teaching assistant for ML postgraduate courses.
Facilitated courses and workshops for Google Developers.
Undergraduate mathematics lecturer at ML Sultan Technikon (1996-1997).
Open Source:
Early contributor to the OpenMined open-source project.