2 Comments

Very interesting stuff, thanks for sharing! I have some experience with this kind of modeling from my work in football analytics. I believe the choice of variables and the way they are normalized (e.g. per minutes played, per number of touches, as a share of team totals, or as the percentage of an event subtype relative to the total) is as important as the technicalities of the clustering process itself, if not more so. Would you agree?


Oh, completely: how you represent the variables dictates the dissimilarities and everything else downstream.
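A toy illustration of that point, using made-up numbers for three hypothetical players (A, B, C) and plain Euclidean distance as an assumed dissimilarity. B has the same style as A but played a third of the minutes. On raw season totals, A's nearest neighbour is C. Once the same variables are expressed per 90 minutes, A and B coincide.

```python
import math

# Hypothetical toy data (not from any real dataset): season totals per player.
players = {
    "A": {"minutes": 3000, "passes": 1500, "shots": 60},
    "B": {"minutes": 1000, "passes": 500, "shots": 20},  # same profile as A, fewer minutes
    "C": {"minutes": 3000, "passes": 600, "shots": 90},
}

def raw_vector(p):
    # Raw season totals: heavily driven by playing time.
    return [p["passes"], p["shots"]]

def per90_vector(p):
    # Rates per 90 minutes played: removes the playing-time effect.
    return [p["passes"] * 90 / p["minutes"], p["shots"] * 90 / p["minutes"]]

def nearest(name, to_vector):
    # Closest other player under Euclidean distance for the chosen representation.
    target = to_vector(players[name])
    others = [n for n in players if n != name]
    return min(others, key=lambda n: math.dist(target, to_vector(players[n])))

print(nearest("A", raw_vector))    # C
print(nearest("A", per90_vector))  # B
```

The clustering algorithm is identical in both cases; only the representation changed. Raw totals also let passes swamp shots simply because of their scale, which is a separate reason to standardize before computing distances.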
