
Hi, I am Jeremy

Jeremy DiBattista

Founding Machine Learning Engineer at Bronson

I am an analytical, innovative, outgoing, and creative leader specializing in deep learning, data analytics, statistics, and cloud development. I use my cloud-first artificial intelligence expertise to apply unique design and cutting-edge technologies to lead development and build businesses from the ground up.

Skills

Experiences

Bronson

May 2024 - Present

Remote / San Juan

Legal technology startup utilizing AI to help protect against health and environmental harm

Founding Machine Learning Engineer

May 2024 - Present

Responsibilities:
  • Leading machine learning efforts in legal technology by integrating modern AI capabilities into existing law firms to automate data entry, extraction, and organization, saving firms 30% in administrative costs.
  • Utilizing OCR, a custom LLM implementation, and multi-modal RAG to parse and organize millions of unstructured legal, environmental, and medical data files.

Spiny.ai

November 2020 - May 2024

Remote

Spiny aims to change the way publishers interact with their customers by providing in-depth analytics on the heartbeat of their business

Data Analytics Lead / Machine Learning Engineer

September 2021 - May 2024

Senior Machine Learning Engineer

November 2020 - September 2021

Responsibilities:
  • Team lead for the data analytics team, overseeing 8+ unique products spanning a complete data analytics suite including data ingestion, customer management, ETL, pre-aggregation, reporting, predictive analytics, and natural language/ML products.
  • Coordinated delivery of our built-from-scratch data products with team leads and stakeholders, and led team meetings, backlog refinement, and sprint planning for a team of 5 engineers, while upholding engineering standards that allowed us to maintain a 99.9% uptime SLA.
  • Created a boosting model that mimics advertising bids and attributes revenue to individually triggered ad events in real time with 97+% accuracy. This first-of-its-kind model is used to provide granular advertising insights and determine customer RFM (recency, frequency, monetary value). The product is the core value proposition of the company.
  • Developed a proprietary natural language model utilizing an ensemble of NLP methodologies that can determine the SEO tags of an article, custom-fit to each customer’s individual writing.
  • Coordinated and created complex ETL pipelines utilizing Snowflake, DBT, AWS, and Cube to reliably deliver, process, and store data.
  • Developed ON3 Sports’s proprietary NIL model, predicting the approximate NIL value for every prominent college athlete. The algorithm has become the industry standard for evaluating student-athlete endorsements and has been used by CEOs of top brands like Nike.
  • Vendor Management (Snowflake, Deepchecks, Cube, DBT, etc.).

Healthbridge

August 2019 - February 2020

Atlanta, GA

South African healthcare company providing technology solutions to modernize patient care

Cloud Machine Learning Engineer

August 2019 - February 2020

Responsibilities:
  • Developed a proprietary multilayer perceptron by augmenting multi-label binarizers with cross-correlation data to predict patient diagnoses given the drugs and tests prescribed to the patient. The methodology increased 5-shot accuracy by 36%.
  • Predicted ICD10-3 codes given the ATC3 codes from a doctor encounter (based on unpublished cross-correlation CNN research).

Chick-fil-A

January 2017 - May 2019

Atlanta, GA

Innovation Engineer (co-op)

January 2017 - May 2019

Responsibilities:
  • Leveraged LDA to deploy an unsupervised ML model for classifying and visualizing call center audio files.
  • Created a customer segmentation engine that utilizes PySpark in an EMR environment to build a K Clustering model to process, group, and visualize customer data to drive customer market insights.
  • Created a speech detection bot that can assist, display, respond, and access in-store operations to increase team member productivity.
  • The speech bot was leveraged to gain innovation funding, leading to the opening of the Midtown innovation center.

Capital One

June 2019 - August 2019

Richmond, VA

Data Engineer Intern

June 2019 - August 2019

Responsibilities:
  • Created a real-time Python Dash application that utilizes Pandas and Kafka streams to monitor, alert, and provide analytics into customer indirect payments throughout each step of file processing.

Citi

June 2018 - August 2018

Dallas, TX

Machine Learning Engineer Intern

June 2018 - August 2018

Responsibilities:
  • Built a model to project future internal server usage, along with an alerting system to detect usage anomalies and assist in future server scaling.

Education

M.S. Machine Learning
GPA: 3.84 out of 4
B.S. in Computer Science
GPA: 3.73 out of 4
Taken Courses:
  • Computational Linguistics
  • Deep Learning
  • Knowledge Based AI
  • Machine Learning for Trading
Extracurricular Activities:
  • Data Visualization Researcher
Minor in Computing in Business (Denning Technology and Management Program)
Extracurricular Activities:
  • Capstone research project with Equifax regarding lifetime value of subprime auto loans

Projects

Grammatical Error Correction Course
Creator 2023

Course building students' knowledge of NLP, covering topics from simple spellcheck through Transformers

Documentation Automation
Creator 2022

An open-source Python script for automated documentation

Data Visualization lectures
Creator 2018

Data Visualization lectures in collaboration with Georgia Tech’s data viz lab

Featured and Recent Articles

Set Up a Local ChatGPT-Like Interface + Copilot in Less Than 10 Minutes

Using Ollama, Llama3, Continue, and Open WebUI to bring a safe, local, open source, and free virtual assistant experience

NLP Demystified, Using the Viterbi Algorithm for Part-of-Speech Tagging
Geek Culture 3 May 2024

Understand with examples how Hidden Markov Models perform part-of-speech tagging

Choosing the Best ML Time Series Model for Your Data

ARIMA, Prophet, LSTMs, CNNs, GPVAR, Seasonal Decomposition, DeepAR, and more. When it comes to time series models, there are a plethora of methods, meaning it is important to consider your options before committing to a model.

Easily Automate Your Documentation and Never Touch It Again
Towards Data Science 12 April 2022

Completely automated documentation process in Python using MkDocs

The most Efficient way to Merge/Join Pandas Dataframes

Why almost everyone is writing them inefficiently, and a few tips for optimal data science!

Teaching and Volunteering

Volunteer Co-Teacher AP CSP
Microsoft Foundation August 2022 - May 2024

Partner with local schools and teachers lacking technical resources to bring equitable CS programs to the Atlanta area. Currently teaching AP Computer Science for Drew High School.

Educative Course Writer
Educative December 2022 - September 2023

Developed a course on advanced NLP topics, starting with simple topics like Norvig spell checking and covering topics like Markov Models, Viterbi, and Levenshtein distance. The course culminates by teaching transformer-based spell checking and grammar correction (e.g., GECToR).

Head Graduate Teaching Assistant
Georgia Institute of Technology May 2020 – Dec 2020

Worked closely with the professor to create a curriculum of assignments while assisting students in ensuring they develop content mastery. Head GTA for Computer Networking, GTA for Game Design.

Data Visualization Researcher and Lecturer
Georgia Institute of Technology Jan 2018 – Dec 2018

Created workshops for students and faculty detailing how to use Python libraries to create complex visualizations. Workshops spanned multiple offerings per semester and covered various visualization tools and levels of expertise.

Technical Writer
Towards Data Science April 2020 - Present

Long-form technical writing covering a variety of subjects in deep learning, Python, and data science; SEO-optimized articles have helped generate over 150 thousand readers.