[Solved] Storing Vectormath SSE Matrix4 in a float[16] ?

dfalcone
Posts: 2
Joined: Wed Nov 21, 2012 10:21 am

[Solved] Storing Vectormath SSE Matrix4 in a float[16] ?

Post by dfalcone »

Since I want to use Bullet Physics for my project, I thought it would be nice to also use the included Vectormath library. The only problem I'm having is that the library doesn't seem to offer an efficient way to get the float data out of a Matrix4 to pass to OpenGL/DirectX for transformation. This works fine with the Scalar library, but the SSE library stores the values in __m128, so you can't just pass the memory address of the matrix.

Other libraries have a simple method to get this data. For example, Microsoft's math library (DirectXMath) uses functions such as 'XMStoreFloat4x4' to write the float values out. Based on that implementation I've started doing this:

Code: Select all

static float rotY = 0.0f;
rotY += .0005f;
Vectormath::Aos::Matrix4 modelMatrix = Vectormath::Aos::Matrix4::rotationY(rotY); //Rotating Model Matrix for example

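float tmpMtx[16]; //Plain array, not guaranteed to be 16-byte aligned

//_mm_storeu_ps has no alignment requirement; _mm_store_ps needs a 16-byte aligned destination
_mm_storeu_ps( &tmpMtx[0] , modelMatrix.getCol(0).get128() );
_mm_storeu_ps( &tmpMtx[4] , modelMatrix.getCol(1).get128() );
_mm_storeu_ps( &tmpMtx[8] , modelMatrix.getCol(2).get128() );
_mm_storeu_ps( &tmpMtx[12] , modelMatrix.getCol(3).get128() );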

unsigned int locModelMat = glGetUniformLocation(m_programID, "modelMat"); //Model Matrix Location
glUniformMatrix4fv(locModelMat, 1, GL_FALSE, &tmpMtx[0]);                 //Copy Float[16] model matrix to OpenGL Shader
Is there a better way to do this? If not, would it be possible to get functions that do this integrated into the library?
Last edited by dfalcone on Wed Nov 28, 2012 12:19 am, edited 1 time in total.
dfalcone
Posts: 2
Joined: Wed Nov 21, 2012 10:21 am

(Solved) Storing Vectormath SSE Matrix4 in a float[16] ?

Post by dfalcone »

After a lot of reading and re-reading, I finally figured this out.

SSE data must be 16-byte aligned for aligned loads and stores. That means the pointer to the data must hold a memory address divisible by 16. Once the SIMD Matrix4 (which is a struct of 4 __m128 vectors) is aligned this way, you can pass it directly to the GPU, which reads it as 16 consecutive floats. Since I can't find any code on the net for using the Vectormath library with DirectX or OpenGL, I'll post both samples.
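
To make the alignment rule concrete, here is a minimal sketch of the idea on an x86/x64 compiler. `Mat4Sse`, `isAligned16`, `asFloats` and `makeCounting` are hypothetical names made up for illustration. They are not part of Vectormath, and the struct is only a stand-in for the four-column layout described above.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics (x86/x64 only)

// Hypothetical stand-in for an SSE matrix: four __m128 columns.
// This is NOT the real Vectormath type, only the same general layout.
struct Mat4Sse {
    __m128 col[4];
};

// The layout assumptions that make a direct pointer cast workable.
static_assert(sizeof(Mat4Sse) == 16 * sizeof(float), "expected 16 packed floats");
static_assert(alignof(Mat4Sse) == 16, "expected 16-byte alignment");

// True when the address is divisible by 16.
inline bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// View the matrix as 16 contiguous floats, column after column.
inline const float* asFloats(const Mat4Sse& m) {
    assert(isAligned16(&m));
    return reinterpret_cast<const float*>(&m);
}

// Build a test matrix whose column c holds {4c, 4c+1, 4c+2, 4c+3}.
inline Mat4Sse makeCounting() {
    Mat4Sse m;
    for (int c = 0; c < 4; ++c) {
        // _mm_setr_ps takes lanes in memory order (lowest address first).
        m.col[c] = _mm_setr_ps(4.0f * c, 4.0f * c + 1.0f,
                               4.0f * c + 2.0f, 4.0f * c + 3.0f);
    }
    return m;
}
```

A local `Mat4Sse` gets its 16-byte alignment from the `__m128` members, which is why the assert in `asFloats` should hold for stack variables. Heap allocations on older compilers may need an aligned allocator.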

OpenGL:

Code: Select all

unsigned int locProjMat = glGetUniformLocation(m_programID, "projMat");
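// Vectormath's perspective() takes the vertical field of view in radians; 0.785398f is 45 degrees
ATTRIBUTE_ALIGNED16(Vectormath::Aos::Matrix4 proj) = Vectormath::Aos::Matrix4::perspective(0.785398f, 4.0f/3.0f, 0.1f, 100.0f);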
glUniformMatrix4fv(locProjMat, 1, GL_FALSE, (GLfloat*)&proj);
DirectX 11:

Code: Select all

m_d3dDevice->CreateBuffer( &constDesc, 0, &m_cbChangeOnResize );
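// Vectormath's perspective() takes the vertical field of view in radians; 0.785398f is 45 degrees
ATTRIBUTE_ALIGNED16(Vectormath::Aos::Matrix4 proj) = Vectormath::Aos::Matrix4::perspective(0.785398f, 4.0f/3.0f, 0.1f, 100.0f);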
m_d3dDeviceContext->UpdateSubresource( m_cbChangeOnResize, 0, 0, &proj, 0, 0 );
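
If the matrix cannot be kept 16-byte aligned, or the API needs a plain float[16], a copy with unaligned stores is a possible fallback. The sketch below assumes an x86/x64 compiler. `storeColumns` is a hypothetical helper written for this thread, not a Vectormath function.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86/x64 only)

// Hypothetical helper, not part of Vectormath: copy four __m128 columns
// into a plain float[16] in column-major order.
inline void storeColumns(float out[16],
                         const __m128& c0, const __m128& c1,
                         const __m128& c2, const __m128& c3) {
    // _mm_storeu_ps has no alignment requirement, unlike _mm_store_ps,
    // so `out` may sit at any float-aligned address.
    _mm_storeu_ps(out + 0,  c0);
    _mm_storeu_ps(out + 4,  c1);
    _mm_storeu_ps(out + 8,  c2);
    _mm_storeu_ps(out + 12, c3);
}
```

With the accessors from the first post, the call would look something like `storeColumns(tmpMtx, m.getCol(0).get128(), m.getCol(1).get128(), m.getCol(2).get128(), m.getCol(3).get128());`. This costs one extra copy per matrix compared with passing the aligned matrix directly.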