Very interesting stuff, thanks for sharing! I have some experience doing this kind of modeling in my work in football analytics. I believe that the choice of variables and the way they are normalized (e.g. by minutes played, number of touches, share of team totals, % of an event subtype w.r.t. the total, etc.) is as important as (if not more important than) the technicalities of the clustering process itself. Would you agree with that?
Oh, completely. How you represent the variables dictates the dissimilarity between observations, and therefore everything else downstream.
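A toy illustration of that point, with invented numbers. The players, stats, and helper names are all hypothetical, and plain Euclidean distance is assumed as the dissimilarity. On raw counts, player A's nearest neighbour is C. After normalizing to per-90-minute rates, A and B turn out to have identical profiles, because B simply played three times as many minutes.

```python
import math

# Hypothetical season totals: (minutes played, passes, shots). Numbers are invented.
players = {
    "A": (900, 400, 20),
    "B": (2700, 1200, 60),
    "C": (900, 300, 60),
}

def raw_profile(stats):
    """Feature vector built from raw counting stats."""
    _, passes, shots = stats
    return (passes, shots)

def per90_profile(stats):
    """Feature vector normalized to a per-90-minutes rate."""
    minutes, passes, shots = stats
    factor = 90 / minutes
    return (passes * factor, shots * factor)

def nearest(target, profile):
    """Return the other player closest to `target` under Euclidean distance."""
    t = profile(players[target])
    others = [p for p in players if p != target]
    return min(others, key=lambda p: math.dist(t, profile(players[p])))

print(nearest("A", raw_profile))    # C
print(nearest("A", per90_profile))  # B
```

Same data, same distance, same "clustering" step (here just nearest neighbour). Only the normalization changed, and the answer flipped.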