PHPlab: Human segmentation algorithm
Claude Desjardins, Real!ty Medias
In the search for an optimal way to enhance users' media experience, I have come to several technical realizations. One of them is an algorithm that finds and maps human regions in a media in order to center and/or crop the media according to the coordinates of those regions. Technologies that enable human recognition are barely available even for high-end platforms, so finding one that would replicate the same process on a public system (such as a web server) was practically impossible.
Constructing a system capable of finding humans in a media requires a design paradigm that can process a considerable pool of data quickly and efficiently. Keeping the process as simple as possible is the key: the easiest and simplest way to detect humans in a media is to find human skin.
Design paradigm: human skin segmentation
Because of differences in race, skin tint, media source and ambient lighting, the media to process has to be flattened to a normalized scene by removing luminance from the RGB values (ref. MIT). This is done by converting the scene to chromatic (pure) representations of the colors in the absence of luminance. We can achieve this with the following process: R' = R/(R+G+B), G' = G, B' = B/(R+G+B)
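As an illustration of the normalization above, here is a minimal Python sketch. The function name `to_chromatic` and the choice to return only the red and blue shares (the two values the later tests rely on) are assumptions made for this example, not part of the original source.

```python
def to_chromatic(r, g, b):
    """Return the (red, blue) shares of a pixel with luminance factored out.

    r, g, b are 0-255 channel values. A black pixel has no chromatic
    information, so (0.0, 0.0) is returned to avoid dividing by zero.
    """
    n = r + g + b
    if n == 0:
        return 0.0, 0.0
    return r / n, b / n


# The same tint at two brightness levels gives the same chromatic values.
print(to_chromatic(100, 50, 50))
print(to_chromatic(200, 100, 100))
```

Both calls describe the same color at different intensities, which is why the normalized values match.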
The distribution of skin colors is found to be clustered in a small chromatic color space (ref. fig.2). Skin colors of different races are close enough to form a starting point in the analysis as they differ in intensity more than in color.
As shown in Fig.1, the chromatic R (red) values of skin tend to stay clustered in the 40+/255 space and the B (blue) values in the 20+/255 space, while the luminance stays over 105.
Because luminance is a perceptual value (depending on the scene, transport and environment), we rely on a perception rule to find its value. The general luminance rule is the sum of the R (red) value multiplied by 0.3, the G (green) value multiplied by 0.59, and the B (blue) value multiplied by 0.11.
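The luminance rule can be sketched in Python as follows. This assumes that each channel is weighted (multiplied) by its coefficient, as in the common 0.3 / 0.59 / 0.11 rule; the function name `luminance` is chosen only for this example.

```python
def luminance(r, g, b):
    """Perceived brightness of a pixel from its 0-255 channel values."""
    return 0.3 * r + 0.59 * g + 0.11 * b


# Green contributes most to perceived brightness, blue the least.
print(luminance(255, 0, 0), luminance(0, 255, 0), luminance(0, 0, 255))
```

Since the three weights add up to 1, the result stays on the same 0-255 scale as the channels, so it can be compared directly with the 105 threshold.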
From those facts, we may now build a segmentation algorithm that discriminates the potentially non-skin regions of a media. Here is the segmentation algorithm:
R = (RGB >> 16) & 0xFF
G = (RGB >> 8) & 0xFF
B = RGB & 0xFF
L = (R*0.3) + (G*0.59) + (B*0.11)
N = R+G+B
we conclude that if ...
N > 0 AND R/(R+G+B) * 100 > 40 AND
B > 0 AND B/(R+G+B) * 100 > 20 AND
L > 105
then a given (x/y) unit (pixel) is considered a potential skin segment.
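Put together, the per-pixel test might look like the Python sketch below. It assumes the pixel arrives as a packed 0xRRGGBB integer and that green is weighted by 0.59 in the luminance. The name `is_skin` is hypothetical.

```python
def is_skin(rgb):
    """Return True when a packed 0xRRGGBB pixel passes the skin thresholds."""
    r = (rgb >> 16) & 0xFF
    g = (rgb >> 8) & 0xFF
    b = rgb & 0xFF
    n = r + g + b
    if n == 0:
        return False  # black pixel: no chromatic information
    lum = 0.3 * r + 0.59 * g + 0.11 * b
    red_share = r / n * 100
    blue_share = b / n * 100
    return red_share > 40 and blue_share > 20 and lum > 105


print(is_skin(0xC89078))  # a light, reddish tone
print(is_skin(0x00FF00))  # pure green
```

A pixel that has the right tint but is too dark (for instance 0x402820) fails on the luminance test alone.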
Narrowing by connectivity analysis
Even though we have found potential skin zones in the media, it does not mean we have actually found skin. Compression technologies, dithering, the quality of the original media and the scene itself may induce shifted values, which cause the algorithm to catch pixels that are not the ones we are looking for. A discriminatory process removes non-consecutive entities from the pixel array as we run the previous algorithm. A residual stack of the previously analysed entities reveals the linear (left to right) contiguity between them. A linearity of 1.5% of the media width constitutes a fair contiguous group. Removing "lonesome" entities not only helps match the final groups but also speeds up the later processes.
Now that we have an array of all the potential skin segments of the media, we will analyse their neighborhood connectivity. A "synapse" system checks for enabled entities in the pixel neighborhood; since the media is analysed by line then by column (left to right, then one row down, and so on), connectivity is tested by checking for enabled entities among the left, top/left, top and top/right neighbors.
The synapse system attributes tags to pixels according to the number of connected pixels found:
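The linear contiguity filter can be sketched in Python as below. The text does not say how 1.5% of the width is rounded, so this example assumes the minimum run length is rounded up (2 pixels for a 100-pixel row). The name `filter_row` and the list-of-booleans row format are also assumptions.

```python
import math


def filter_row(row, ratio=0.015):
    """Keep only horizontal runs of enabled pixels at least ratio * width long."""
    width = len(row)
    min_run = max(1, math.ceil(width * ratio))
    out = [False] * width
    start = None  # index where the current run began, if any
    for x in range(width + 1):
        on = x < width and row[x]
        if on and start is None:
            start = x
        elif not on and start is not None:
            if x - start >= min_run:
                for i in range(start, x):
                    out[i] = True
            start = None
    return out


row = [False] * 100
row[5] = True                # a "lonesome" pixel
for i in (10, 11, 12):       # a contiguous run of three pixels
    row[i] = True
kept = [x for x, on in enumerate(filter_row(row)) if on]
print(kept)
```

Here the isolated pixel is dropped while the three-pixel run survives.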
Zero connected pixels
Create and assign a new family code to the analysed pixel.
One connected pixel
Assign the analysed pixel the family code of its neighbor.
More than one connected pixel
The lowest neighboring family code is attributed to the analysed pixel, and the family codes of all the connected neighbors (left, top/left, top and top/right) are injected into a reconnection table.
Once this pass is completed, if a reconnection table exists, it is analysed and merged: whenever an array key or member containing a matching family code is found, both arrays are merged into one. This creates a family code reconstruction array, which converts entities of given family codes to other family codes. A third loop applies this conversion, and the result forms the "zones".
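The tagging and merging passes can be sketched in Python as follows. This is not the author's array-merging code: the reconnection table is modelled here with a small union-find structure (a dictionary mapping each family code to a lower equivalent code), which is a swapped-in technique giving the same grouping. The name `label_zones` and the grid-of-booleans input are assumptions.

```python
def label_zones(mask):
    """Label connected skin pixels; returns a grid of family codes (0 = none)."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    parent = {}  # reconnection table: family code -> equivalent lower code

    def find(code):
        while parent[code] != code:
            code = parent[code]
        return code

    next_code = 1
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            codes = set()
            # left, top/left, top, top/right neighbors
            for dx, dy in ((-1, 0), (-1, -1), (0, -1), (1, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and ny >= 0 and labels[ny][nx]:
                    codes.add(labels[ny][nx])
            if not codes:
                parent[next_code] = next_code
                labels[y][x] = next_code
                next_code += 1
            else:
                lowest = min(codes)
                labels[y][x] = lowest
                for code in codes:  # record that these families are connected
                    a, b = find(lowest), find(code)
                    if a != b:
                        parent[max(a, b)] = min(a, b)

    # final loop: convert every family code to its merged code
    for y in range(height):
        for x in range(width):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    return labels


u_shape = [[True, False, True],
           [True, False, True],
           [True, True, True]]
print(label_zones(u_shape))
```

In the U-shaped example the two vertical arms first receive different family codes and are only joined when the bottom row connects them, which is the case the reconnection table exists to handle.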
Threshold boundaries
Now that groups are formed, minorities are discriminated: any group with a member count below the average unit count per group is removed. From the pixel coordinates, we can now find the leftmost, topmost, rightmost and bottommost pixels, which form the threshold boundaries (Fig.4).
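A minimal Python sketch of this step is given below. It assumes the zones are stored as a grid of family codes with 0 meaning "no skin", and that a group whose size equals the average is kept. The name `threshold_boundaries` is hypothetical.

```python
from collections import Counter


def threshold_boundaries(labels):
    """Return (left, top, right, bottom) around the groups of at least average size."""
    counts = Counter(code for row in labels for code in row if code)
    if not counts:
        return None  # no skin group was found
    average = sum(counts.values()) / len(counts)
    kept = {code for code, n in counts.items() if n >= average}
    xs = [x for row in labels for x, code in enumerate(row) if code in kept]
    ys = [y for y, row in enumerate(labels) for code in row if code in kept]
    return min(xs), min(ys), max(xs), max(ys)


zones = [[1, 1, 0, 2],
         [1, 1, 0, 0],
         [0, 0, 0, 0]]
print(threshold_boundaries(zones))
```

In this example group 2 has a single member against an average of 2.5, so only group 1 contributes to the boundaries.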
Finalize the operation
For processing speed and resource usage considerations, we resized the media to a ~100 pixels thumbnail at the beginning of the process. As the process uses percentages, resizing the media does not considerably alter the performance of the algorithms. The boundary coordinates we have at this point apply to the resized media. We re-calculate our boundaries proportionally to the original media and return those values as an array to the caller object and/or process.
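The proportional re-calculation can be sketched in Python as below. It assumes inclusive pixel coordinates and (width, height) size tuples. The name `scale_boundaries` and the rounding choice are assumptions made for this example.

```python
def scale_boundaries(box, thumb_size, original_size):
    """Map an inclusive (left, top, right, bottom) box from thumbnail to original pixels."""
    left, top, right, bottom = box
    sx = original_size[0] / thumb_size[0]  # horizontal scale factor
    sy = original_size[1] / thumb_size[1]  # vertical scale factor
    return (
        int(left * sx),
        int(top * sy),
        int((right + 1) * sx) - 1,   # last original pixel covered by `right`
        int((bottom + 1) * sy) - 1,  # last original pixel covered by `bottom`
    )


print(scale_boundaries((10, 20, 49, 59), (100, 75), (1000, 750)))
```

Scaling the far edge of the last thumbnail pixel, rather than its near edge, keeps the box from shrinking by up to one thumbnail pixel on the right and bottom sides.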
License & Source code
License
Usage of this work is authorized for PERSONAL or EDUCATIONAL USES ONLY. Any commercial use or redistribution
of this source code is prohibited. "The SOFTWARE" is a Copyright of Real!ty Medias, 2007. All
rights are reserved.
If you want to use this software for commercial purposes, to integrate and/or redistribute it into another
software or to redistribute it, please contact the author.
Licensing fees may be applicable.
If you found this software useful, you might consider donating to the author. Any amount is greatly appreciated.
Source code
© 2007, Real!ty Medias. All rights reserved.

Fig. 1 - Chromatic tendencies

Fig. 2 - Luminance segment

Fig. 3 - Connectivity clustering

Fig. 4 - Threshold boundaries