Folks,
I have two questions:
- I noticed that the default boundary alignment in Bullet (e.g. in btPoolAllocator) is 16 bytes (I understand the user can configure it). What is the reason for choosing 16?
- My other question: it seems that only the first address of the block is 16-byte aligned, and the rest are not.
For example, say I use the pool allocator for 3 objects of 50 bytes each. The returned start address of the 150 bytes is 16-byte aligned, but the other 2 objects are not.
Let's take an example, assuming 16 is the starting address of the allocated 150 bytes.
The next object will be placed by btPoolAllocator at 16+50=66, and 66 is not 16-byte aligned.
The same holds for the 3rd object, which will be placed at 66+50=116, again not 16-byte aligned.
So what is the use of aligning only the start of the block and not the rest?
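To make the arithmetic concrete, here is a minimal sketch. The helper names isAligned, roundUp and elementAddresses are made up for illustration and are not part of Bullet. It also shows, purely as an assumption, what would change if each element size were rounded up to a multiple of 16; it does not claim that btPoolAllocator does this rounding itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if `offset` is a multiple of `alignment` (which must be a power of two).
inline bool isAligned(std::size_t offset, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset & (alignment - 1)) == 0;
}

// Round `size` up to the next multiple of `alignment` (a power of two).
inline std::size_t roundUp(std::size_t size, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

// Addresses of `count` elements placed `stride` bytes apart, starting at `base`.
// Hypothetical model of a pool layout, not Bullet's actual code.
inline std::vector<std::size_t> elementAddresses(std::size_t base,
                                                 std::size_t stride,
                                                 std::size_t count)
{
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < count; ++i)
        out.push_back(base + i * stride);
    return out;
}
```

With base 16 and a stride of 50, elementAddresses gives 16, 66, 116, and only the first is a multiple of 16. With the stride rounded up to roundUp(50, 16) = 64, it gives 16, 80, 144, all multiples of 16, at the cost of 14 padding bytes per element.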
Any help is highly appreciated.
Thanks and Regards
Rachit
Boundary alignment in Bullet
Re: Boundary alignment in Bullet
Several items (like 3D vectors and quaternions) are 16-byte entries (4 * 32-bit floats).
Modern processors have special instructions for dealing with this kind of floating-point data.
These are the SSE processor instructions, and they have a special alignment requirement:
16-byte data entries used with the aligned forms of these instructions must lie on 16-byte boundaries.
You can check whether an allocation is aligned with assert(((size_t)(new int[4]) & 0xf) == 0);
If you try to operate on unaligned data with the aligned SSE instructions (e.g. MOVAPS, generated by _mm_load_ps()), you will get a runtime exception,
unless you use the special MOVUPS instruction, which can be generated with _mm_loadu_ps().
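The alignment check above can be written a bit more explicitly. This is only a sketch: isAligned16, Vec4 and Pair are hypothetical names for illustration, not Bullet types, and the code assumes a C++11 compiler for alignas. It deliberately avoids the SSE intrinsics themselves so that it compiles on any platform.

```cpp
#include <cstddef>
#include <cstdint>

// True if `p` lies on a 16-byte boundary (low 4 address bits are zero).
inline bool isAligned16(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 0xF) == 0;
}

// A 4-float vector forced onto a 16-byte boundary.
// Hypothetical type for illustration, not Bullet's btVector3.
struct alignas(16) Vec4
{
    float v[4];
};

// Two vectors back to back: sizeof(Vec4) is 16, a multiple of its alignment,
// so the second member starts on a 16-byte boundary as well.
struct Pair
{
    Vec4 a;
    Vec4 b;
};

// Usage idea:
//   Vec4 x;
//   isAligned16(x.v)      -> true:  safe for an aligned load such as _mm_load_ps
//   isAligned16(x.v + 1)  -> false: 4 bytes past the boundary, needs _mm_loadu_ps
```

The point of the Pair struct is that consecutive elements stay aligned only when the element size is itself a multiple of 16.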
By the way, I have found that using the tcmalloc library from Google Perf Tools is very good practice.
It virtually solves all the problems with new() alignment, so there is no need for non-standard magic aligning allocators.
It always allocates aligned memory, except when allocating 4- or 8-byte memory blocks.
It also behaves well in a multithreaded environment.