After the skin classifier has tested every pixel in the image, we are left with isolated blobs of potential body parts. Combining them into a complete shape is a merging (or grouping) problem. As a simple first attack on this task I used a morphological ‘close’ operation (a dilation followed by an erosion).
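To make the ‘close’ operation concrete, here is a minimal pure-Python sketch on a binary mask. The 3x3 square structuring element, the border handling and the function names are assumptions made for illustration; a real pipeline would normally call an image library instead.

```python
def dilate(mask):
    """Binary dilation with a 3x3 square structuring element (assumed shape)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel turns on if any pixel in its 3x3 neighbourhood is on.
            out[y][x] = int(any(
                mask[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ))
    return out


def erode(mask):
    """Binary erosion with the same 3x3 element.

    The neighbourhood is clipped at the image border, so pixels outside
    the image never cause erosion (an assumption of this sketch).
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel stays on only if its whole 3x3 neighbourhood is on.
            out[y][x] = int(all(
                mask[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ))
    return out


def morph_close(mask, iterations=1):
    """Morphological close: `iterations` dilations, then as many erosions."""
    out = mask
    for _ in range(iterations):
        out = dilate(out)
    for _ in range(iterations):
        out = erode(out)
    return out


# Two 2x2 skin blobs separated by a one-pixel gap (column 4).
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
closed = morph_close(mask)
```

In this toy mask the gap between the two blobs is filled while the outer outline stays the same. Gaps wider than the structuring element can bridge are left open, which is one reason to look at other merging approaches.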
Among other approaches, one that seems to work for me is merging nearby blobs that have similar color signatures. For each of the two blobs we cluster its pixel colors using a Gaussian Mixture Model (GMM). Then we throw away the variance information and keep only the center and weight of each cluster. Intuitively, if we pile earth at the center of each cluster, in an amount proportional to the cluster’s weight, then the Earth Mover's Distance (EMD) between the two piles can be treated as the distance between the two color signatures, and hence between the two corresponding blobs. Details are in the paper listed at the end of this post.
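As a rough illustration of the distance step, the sketch below computes the EMD between two signatures given as cluster centers and weights. It skips the GMM fitting and assumes the centers and weights are already available. The Euclidean ground distance, the function name `emd` and the sample Lab values are assumptions, and the solver is a plain successive-shortest-path min-cost flow chosen for brevity, not necessarily what a production implementation would use.

```python
import math


def emd(centers_a, weights_a, centers_b, weights_b, eps=1e-12):
    """Earth Mover's Distance between two signatures (centers + weights)."""
    n, m = len(weights_a), len(weights_b)
    target = min(sum(weights_a), sum(weights_b))  # amount of earth to move
    if target <= 0:
        raise ValueError("signatures must carry positive total weight")

    # Nodes: 0 = source, 1..n = clusters of A, n+1..n+m = clusters of B,
    # n+m+1 = sink. Edge k and edge k ^ 1 form a forward/backward pair.
    src, sink, num_nodes = 0, n + m + 1, n + m + 2
    edges = []  # each edge is [to, residual capacity, cost per unit]
    adj = [[] for _ in range(num_nodes)]

    def add_edge(u, v, cap, cost):
        adj[u].append(len(edges))
        edges.append([v, cap, cost])
        adj[v].append(len(edges))
        edges.append([u, 0.0, -cost])

    for i in range(n):
        add_edge(src, 1 + i, weights_a[i], 0.0)
    for j in range(m):
        add_edge(1 + n + j, sink, weights_b[j], 0.0)
    for i in range(n):
        for j in range(m):
            # Ground distance: Euclidean distance between cluster centers.
            d = math.dist(centers_a[i], centers_b[j])
            add_edge(1 + i, 1 + n + j, math.inf, d)

    total_flow = total_cost = 0.0
    while total_flow < target - eps:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        best = [math.inf] * num_nodes
        via = [-1] * num_nodes
        best[src] = 0.0
        for _ in range(num_nodes - 1):
            changed = False
            for u in range(num_nodes):
                if best[u] == math.inf:
                    continue
                for k in adj[u]:
                    v, cap, cost = edges[k]
                    if cap > eps and best[u] + cost < best[v] - eps:
                        best[v], via[v], changed = best[u] + cost, k, True
            if not changed:
                break
        if via[sink] == -1:
            break  # no augmenting path left
        # Find the bottleneck along the path, then push that much flow.
        push, v = target - total_flow, sink
        while v != src:
            push = min(push, edges[via[v]][1])
            v = edges[via[v] ^ 1][0]
        v = sink
        while v != src:
            k = via[v]
            edges[k][1] -= push
            edges[k ^ 1][1] += push
            v = edges[k ^ 1][0]
        total_flow += push
        total_cost += push * best[sink]
    return total_cost / total_flow


# Hypothetical Lab cluster centers and weights for two blobs.
blob_a = ([(60.0, 15.0, 20.0), (45.0, 10.0, 12.0)], [0.7, 0.3])
blob_b = ([(58.0, 14.0, 19.0), (30.0, 5.0, -25.0)], [0.6, 0.4])
distance = emd(*blob_a, *blob_b)
```

Two blobs would then be merge candidates when `distance` falls below some threshold, which has to be tuned on real footage.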
One known weakness is that the color distance is sometimes small for completely unrelated blobs, which leads to a merging error. As an extreme case, I am thinking of children in a swimming pool, crowding near the edge and listening to their coach. Finding the owner of each body part will be a challenge there, and this approach may not work.
Assume we know that two given blobs belong to a single owner. How do we merge them at the pixel level? Since I am interested in capturing the whole color range of a given swimmer (for the sake of telling that swimmer apart from the others), I connect the blobs with a ‘thin bridge’, expecting that it will cover part of the swimsuit.
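One possible way to build such a bridge is sketched below. It assumes that each blob is a set of (x, y) pixel coordinates, that the bridge joins the closest pair of pixels, and that it is a one-pixel-wide Bresenham line. These choices and the function names are illustrative assumptions, not a description of the exact method used here.

```python
def bresenham(p, q):
    """Integer pixels on the straight segment from p to q, inclusive."""
    (x0, y0), (x1, y1) = p, q
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    points = []
    while True:
        points.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:  # step along x
            err += dy
            x0 += sx
        if e2 <= dx:  # step along y
            err += dx
            y0 += sy
    return points


def bridge(blob_a, blob_b):
    """Merge two blobs by adding a thin line between their closest pixels."""
    # Brute-force closest pair: fine for a sketch, slow for large blobs.
    p, q = min(
        ((a, b) for a in blob_a for b in blob_b),
        key=lambda ab: (ab[0][0] - ab[1][0]) ** 2 + (ab[0][1] - ab[1][1]) ** 2,
    )
    return set(blob_a) | set(blob_b) | set(bresenham(p, q))


# Two small blobs on the same row, separated by a three-pixel gap.
arm = {(0, 0), (1, 0)}
shoulder = {(5, 0), (6, 0)}
merged = bridge(arm, shoulder)
```

The pixels under the bridge can then be sampled from the original image, so that whatever lies between the two skin blobs contributes to the swimmer's color signature.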
Another weakness of this approach is the performance penalty from the color conversion (RGB to Lab), GMM fitting, EMD computation and the actual merging of blobs at the pixel level.
“Visual tracking using the Earth Mover's Distance between Gaussian mixtures and Kalman filtering”, Vasileios Karavasilis.