Skip to main content

Virtual Datathon Draws More Than 100 Students From Universities Across N.C.

The event was organized by NC State Statistics graduate students and sponsored by John Deere.

A screen shot from one of the NC State Datathon entries
NC State graduate students Sam Galloway and Valliappan Muthukaruppan won first place in the graduate division for their project focusing on NC State campus safety.

This article was submitted by Emily Griffith, associate research professor in the Department of Statistics and director of the Statistical Consulting Core.

Thanks to generous funding from John Deere, the NC State Statistics Graduate Student Association organized and held a virtual Datathon the weekend of March 19 that drew more than 100 graduate and undergraduate students from universities across North Carolina.

During the event, participants received a dataset and spent the weekend analyzing, reporting and visualizing the data any way they chose. Thanks to John Deere’s sponsorship, each participant received a DoorDash gift card to feed themselves while they worked; participants also received an NC State Datathon swag bag. Judges viewed the recorded presentations of each submission while scoring each one according to a pre-established rubric. Students came from NC State, Duke University, UNC Chapel Hill, UNC Greensboro, UNC Wilmington and Wake Forest University. 

The kick-off event was a virtual meeting Friday afternoon where NC State graduate student organizers Jimmy Hickey, Anna Wojciechowski and Shakthi Unnithan unveiled the dataset and explained the project rules and guidelines. The dataset used was the Raleigh National Incident-Based Reporting System data, provided by Open Data Raleigh. Students were allowed to supplement the police dataset with any other publicly available dataset . 

Hickey shared some details about the data. Unnithan organized the presentation of resources that NC State University Libraries provided for the datathon participants. These included workshops in R for data science, data visualization in the R programming language, and Python and Tableau tutorials. A GitHub link contained more workshops in R and Python about machine learning and data cleaning. Student organizers also put together a suggested schedule for the undergraduate and graduate divisions, divided into beginner, intermediate and advanced levels. 

On Saturday morning, Jeff Essic, GIS and Data Librarian at NC State University Libraries, gave a helpful workshop on  mapping and visualizing the crime data using the open source QGIS software. More than 20 teams sent a member to participate in the workshop, and several of the presentations used QGIS to make maps of their final projects. 

On Sunday morning, 12 teams of graduate students and 16 teams of undergraduate students submitted their videos. The winning teams were announced Sunday afternoon and were awarded cash prizes. The top teams will have the invaluable opportunity to present their findings to officers of the Raleigh Police Department.

The winning teams are below and their projects can be viewed on the Datathon website.


1st Place: Sam Galloway (NC State) and Valliappan Muthukaruppan (NC State)
NC State Campus Safety

2nd Place: Risa Sayre (UNC Chapel Hill)
Does change in housing price predict change in crime?

3rd Place: Anna Yanchenko (Duke) and Eric Yanchenko (NC State)
Do Sporting Events Influence Crime in Raleigh?


1st Place: Trenton Wallis (NC State) and Lauren White (NC State)
A Cyclist’s Point of View: Hillsborough Street

2nd Place: Tejas Pruthi (UNC Chapel Hill), Andrew Shooman (UNC Chapel Hill) and Ben Dod (UNC Chapel Hill)
Exploring Neighborhoods With High Societal Crime

3rd Place: Bhrij Patel (Duke), Rohit Raguram (Duke), Jayesh Gupta (Duke) and Kevin Day (Duke)
Geographical analysis and visualization of police arrests in Raleigh and the surrounding area

Congratulations to these winning teams! Thanks to the generosity of John Deere and the hard work of the organizers and judges, all of the participants walked away with valuable experiences and a deeper understanding of how to extract meaningful insights from large complex data sets.