It Was DataFest Time Again

Apr 29, 2024

“DataFest is a data hackathon for students, founded at UCLA in 2011 as a way to make data analysis more fun and meaningful while incentivizing good scientific practice and presentation. Now supported by the American Statistical Association, ASA DataFest is run each year through several host institutions around the world.”

At California State University, Fresno, the driving force behind DataFest is Professor Earvin Balderama, a statistics expert in the Mathematics department. Other faculty members help with workshops and mentoring.

  1. The hackathon starts with a Friday kick-off. An R workshop and a Python workshop run in parallel, since these are the two main technology stacks students choose. Because of the statistics curriculum, R has a very strong foundation, so it is usually a smaller group that chooses my area of expertise: Python + pandas + scikit-learn + Seaborn.
  2. The dataset is also revealed on Friday night. All campuses running the event share the same dataset, which always comes from a real-world setting and is always very interesting. Participants should retain the dataset only for the purpose and duration of the hackathon.
  3. Teams can start to work once the dataset is revealed and the hackathon grind begins.
  4. Teams work tirelessly through Saturday and into Sunday, until the submission deadline before noon on Sunday.
  5. Judging takes place and winners are announced in the afternoon.

During the two years I participated, students and mentors also traveled from UC Merced, UC Santa Cruz, and other campuses around California. In 2023 I helped Earvin recruit judges for the panel. I managed to mobilize more data science people than expected, so I gave up my judge seat, and Agustin Rivera from NVIDIA and Andrew Sweet from Assemi Group hopped on to complement Marcela Alfaro Córdoba from UC Santa Cruz. I ended up featured as a speaker instead, with a talk about MLOps and thoughts on LLM consciousness and problem-solving.

It was great to talk with friends from the data science meetup. Agustin has gathered a ton of experience at NVIDIA, and Andrew Sweet does deep research, mostly on time series machine learning and forecasting. He has even presented talks at large conferences.

Last year’s dataset came from the American Bar Association’s pro bono help service for lower-income people dealing with legal issues. The dataset was anonymized, and because of the legal domain it contained a lot of text (user conversations with the agents), which lent itself to NLP (Natural Language Processing) techniques.

The 2024 DataFest dataset came from the CourseKata online learning platform, and the task was somewhat “meta” for the students: the aim was to find patterns that could help students learn. The teams showed interesting visualizations and patterns. In general, students’ lives are hard, and they can struggle to meet all their commitments.

The best part is the socializing. For example, in 2024 I tried to retreat to a classroom to give my talk about Cloud Sustainability for GDG Tucson. One team came in and overheard my lecture. Later we spoke for hours about generative AI. I learned that those kids already had some projects in the making that were already generating some revenue. Mind-blowing!

I hope Fresno State will keep participating in DataFest under the leadership of Earvin Balderama, and that I can keep contributing as a mentor or in any other capacity.
