I am looking for someone to help with regression analysis of a large dataset of golf-related stats. I would like the model output to be usable in Excel. I am happy to pay by the hour or a set fee for each project.
I have experimented with Azure (including ensemble models) and Google’s AutoML, but the results haven’t been good enough, I suspect because of overfitting. So far I have been making predictions through the endpoints these services provide.
Therefore, I am looking for someone to:
• Analyse the large dataset to create a better dataset by removing columns, adding extra columns, explaining links between columns, etc. Many of the columns are highly correlated with each other, so this may be part of the reason for the overfitting.
• Run the improved dataset through several appropriate machine learning models. The model should be as complex as needed but no more complex than that.
• Output a model that I can then easily use in Excel to make future predictions. Ideally, I would also like to integrate this fully into my SQL database.
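To make the first point concrete, here is a minimal sketch of one way to prune highly correlated columns before modelling. It is only an illustration under stated assumptions: the column names, the sample values and the 0.95 threshold are all hypothetical, and a real analysis would likely use pandas or a similar library on the full dataset.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / sqrt(vx * vy)

def drop_correlated(columns, threshold=0.95):
    """Keep a column only if it is not strongly correlated with one already kept.

    The threshold is an assumed value, not a recommendation.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical columns: two near-duplicate scoring averages and a round count.
columns = {
    "avg_last_10": [70.1, 71.3, 69.8, 72.4, 70.9],
    "avg_last_12": [70.2, 71.2, 69.9, 72.3, 71.0],
    "rounds_played": [40, 12, 55, 23, 31],
}
kept = drop_correlated(columns)
print(kept)
```

With these made-up values the second scoring average is dropped because it carries almost the same information as the first. Which column of a correlated pair to keep is a judgment call; this sketch simply keeps whichever comes first.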
I would like this performed on my smaller women’s golf dataset and, if successful, repeated on my larger men’s golf dataset. In addition, in the future I will be creating more models for data subsets where even more data is available.
The main part of the database is the process which converts raw scores to adjusted scores. The raw scores are generally between 63 and 80 and you can see them here (R1-R4 columns)
I take these raw scores and convert them to adjusted scores in my SQL database. The conversion mainly involves calculating the field strength and then offsetting the raw scores accordingly. Field strength is a measure of the average ability of the players within a tournament. By offsetting raw scores to adjusted scores, I can compare two golfers on opposite sides of the world, even though they may never have played in the same tournament.
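As a rough illustration of the idea only (not the actual SQL process, which the detailed spec would describe), the sketch below offsets one round of raw scores by a field-strength estimate. It assumes each player has a prior rating expressed as an expected adjusted score; the function name `adjust_round`, the player names and all numbers are hypothetical.

```python
from statistics import mean

def adjust_round(raw_scores, prior_ratings):
    """Offset raw scores so the field's average matches its expected average.

    Assumption: field strength is the mean prior rating of the players in
    the field, and the offset is how far the field's raw average sits from it.
    """
    field_strength = mean(prior_ratings[p] for p in raw_scores)
    offset = mean(raw_scores.values()) - field_strength
    return {player: score - offset for player, score in raw_scores.items()}

# Hypothetical round: the field averaged 72 but was expected to average 70,
# so every raw score is moved down by 2 strokes.
raw_scores = {"player_a": 72, "player_b": 75, "player_c": 69}
prior_ratings = {"player_a": 70.0, "player_b": 71.0, "player_c": 69.0}
adjusted = adjust_round(raw_scores, prior_ratings)
print(adjusted)
```

Because the offset depends only on the field, the gaps between players in the same round are unchanged; only the comparison across tournaments shifts.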
The adjusted scores are then converted to hundreds of different categories, over different time periods, along with a few other columns such as the date.
I have been performing simple regression on this data to make future predictions of a golfer’s ability, but I would like to experiment with a more complex model in the hope of creating more accurate predictions.
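For reference, this is the sort of Excel-friendly output I mean: a minimal sketch that fits a one-variable least-squares line and writes it out as a formula that can be pasted into a cell. The data, the column name `past_avg` and the cell reference `B2` are hypothetical, and a more complex model may need a different export route.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope

def excel_formula(intercept, coefs, cells):
    """Build an Excel formula string from linear-model coefficients.

    `cells` maps each feature name to the worksheet cell holding its value.
    """
    terms = [f"{intercept:.4f}"]
    for name, coef in coefs.items():
        # Parentheses keep negative coefficients valid inside the formula.
        terms.append(f"({coef:.4f})*{cells[name]}")
    return "=" + "+".join(terms)

# Hypothetical data: past adjusted average vs. next adjusted score.
past_avg = [69.0, 70.0, 71.0, 72.0]
next_score = [69.5, 70.0, 70.5, 71.0]

intercept, slope = fit_line(past_avg, next_score)
formula = excel_formula(intercept, {"past_avg": slope}, {"past_avg": "B2"})
print(formula)
```

The same `excel_formula` helper accepts any number of coefficients, so a regularised multi-column linear model could be exported in the same way.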
Many thanks for taking the time to read through this. Please let me know if you would like me to send you a more detailed spec as well as some sample data. After receiving the more detailed spec and sample data, please explain which ML model(s) you intend to use and why you think they are a good fit for my project. Please also explain how I will then use this model for future predictions.