Rishiraj Acharya

Google Summer of Code 2022
Kaggle examples for TensorFlow Decision Forests

Contributor: Rishiraj Acharya, Mentor: Josh Gordon

Project Summary

Libraries like XGBoost and LightGBM dominated most Kaggle competitions on tabular data and achieved state-of-the-art results on a variety of datasets. TensorFlow Decision Forests (TF-DF) was also a powerful new library with the potential to give highly accurate predictions. However, because the Kaggle community had little awareness of it and few detailed examples to draw on, it was not being used to its full potential. I addressed this with a series of notebooks that demonstrated the use of TensorFlow Decision Forests on a variety of datasets. I created beginner-friendly baseline examples and also explored fine-tuned advanced examples that achieved some of the best scores in tabular competitions. As an additional contribution, I am working on a project that auto-trains, auto-tunes and auto-serves the best TF-DF model directly from CSV files.

Contributions

To demonstrate getting started with TF-DF and automated hyperparameter tuning, I worked on a real Kaggle competition using the Tabular Playground Series - Feb 2021 dataset. It is a tabular dataset with 300,000 rows and 26 columns in the training set (a 93.66 MiB CSV training file plus a 58.85 MiB CSV test file) and is suited to regression problems. The feature columns cat0 to cat9 are categorical, and the feature columns cont0 to cont13 are continuous. I chose a regression task because the official tutorial already had examples for classification tasks.
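A baseline along these lines can be sketched as follows. This is a minimal illustration, not the code from the notebooks: the rows are synthetic stand-ins for the competition file, the `id` and `target` column names and the 80/20 split are assumptions, and the helper names `make_rows` and `train_baseline` are hypothetical. The training step relies on the TF-DF Keras API (`pd_dataframe_to_tf_dataset`, `GradientBoostedTreesModel`) with default hyperparameters.

```python
import random

CAT_COLS = [f"cat{i}" for i in range(10)]    # cat0 .. cat9 (categorical)
CONT_COLS = [f"cont{i}" for i in range(14)]  # cont0 .. cont13 (continuous)


def make_rows(n, seed=0):
    """Build n synthetic rows shaped like the competition table (assumed layout)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        row = {"id": i}
        row.update({c: rng.choice("ABCD") for c in CAT_COLS})
        row.update({c: rng.random() for c in CONT_COLS})
        # Hypothetical target: a noisy function of two continuous features.
        row["target"] = 7.0 + row["cont0"] - row["cont1"] + rng.gauss(0, 0.1)
        rows.append(row)
    return rows


def train_baseline(rows, label="target"):
    """Train a gradient boosted trees regressor with TF-DF defaults."""
    # Imported lazily so the data helper above works without these packages.
    import pandas as pd
    import tensorflow_decision_forests as tfdf

    # The row identifier carries no signal, so it is dropped before training.
    df = pd.DataFrame(rows).drop(columns=["id"])
    split = int(0.8 * len(df))  # assumed 80/20 train/validation split
    task = tfdf.keras.Task.REGRESSION
    train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(
        df.iloc[:split], label=label, task=task)
    valid_ds = tfdf.keras.pd_dataframe_to_tf_dataset(
        df.iloc[split:], label=label, task=task)

    model = tfdf.keras.GradientBoostedTreesModel(task=task)
    model.compile(metrics=["mse"])
    model.fit(train_ds)
    return model, model.evaluate(valid_ds, return_dict=True)
```

With TF-DF and pandas installed, `model, metrics = train_baseline(make_rows(1000))` trains on the synthetic rows. Swapping `make_rows` for the rows of the real training CSV is the only change needed to run the same baseline on the competition data, provided the column names match the assumptions above.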

Merged PRs:

Future Contributions

I am working on a project that auto-trains TF-DF models with k-fold cross-validation directly from CSV files, auto-tunes them using Keras Tuner and auto-serves the best model using FastAPI: https://github.com/tensorflow/decision-forests/issues/123
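The k-fold cross-validation step of such a pipeline could look roughly like the sketch below. It uses only the Python standard library and is not taken from the project itself: the function names are hypothetical, and `train_and_score` stands for any callable that trains a model (for example a TF-DF model) on the training rows and returns a validation score. Tuning and serving are left out.

```python
import csv
import io


def read_csv_rows(handle):
    """Read a CSV file object into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))


def k_fold_indices(n_rows, k):
    """Yield (train_idx, valid_idx) pairs; each row is held out exactly once."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, valid
        start += size


def cross_validate(rows, k, train_and_score):
    """Average the score returned by train_and_score over k folds."""
    scores = []
    for train_idx, valid_idx in k_fold_indices(len(rows), k):
        train = [rows[i] for i in train_idx]
        valid = [rows[i] for i in valid_idx]
        scores.append(train_and_score(train, valid))
    return sum(scores) / len(scores)
```

The folds here are contiguous blocks, so rows should be shuffled beforehand if the CSV file is sorted in any meaningful way. A call such as `cross_validate(read_csv_rows(open("train.csv", newline="")), 5, train_and_score)` would then give the mean validation score across five folds, where `train.csv` is a placeholder path.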

Acknowledgements

I loved the way GSoC went for me this year, all thanks to the helpful, friendly, safe, and smooth environment created for me by my mentor Josh Gordon. I had ample space to think on my own and implement ideas without being pressured. At the same time, I was offered all kinds of guidance and assistance for staying focused on my goals. GSoC has made me realize that this is what makes me happy and what I want to do all my life. Hence I’ll soon be applying for a job at Google. I would also like to thank Sayak Paul, GDE in ML, for introducing me to the TF-DF library, which helped me become one of its earliest adopters.

References