Dienstag, 21. Mai 2013

Parallel all permutations - non-recursive - GPU optimal

This post is a general one - not voxel related.

Some time ago I developed a a simple, fast and yet non-recursive method to directly compute the n'th permutation. Thats useful especially when working on the graphics card with CUDA for parallel evaluations. (it might have been published somewhere already - so if you know the reference, you can post it as a comment)

The algorithm is based on the following property. Lets say there are 6 characters in the string to permute. Then the result is 6! = 720 combinations. In the first column of the results, we get 720/6=120 lines of each character. This means if the index counts from 0 to 719, then the first character to be removed is computed as remove_index = index/120.

For the next column, 5 numbers are left. This means, that 120 = 5 * 24 lines of the same character - and so on. 

We the following list of removes:
column0: remove_index = (a/120)%5;
column1: remove_index = (a/24)%4;
column2: remove_index = (a/6)%3;
column3: remove_index = (a/2)%2;
column4: remove_index = (a/1)%1;

Code snipplet:

std::string default = "12345";
int perm=1, digits=default.size();
for (int i=1;i<=digits;perm*=i++);
for (int a=0;a
<perm;a++)
{
std::string avail=default;
for (int b=digits,div=perm;b>0; b--)
{
div/=b;
int index = (a/div)%b;
printf("%c", avail[index] );
//avail[index]=avail[b-1]; // non-lexigraphic but fast
avail.erase(index,1) ; // lexigraphically correct
}
printf("\n");
}
printf("permutations:%d\n",perm);

In the code, perm defines the Lehmer code (wiki)