I was curious to see how compiler optimization would affect your example code. I had to modify it a bit in order to prevent the compiler from optimizing your loops out entirely:
Code:
#include <cstring>
#include <cstdio>
#include <ctime>

class Vector {
public:
    float _a, _b, _c;

    inline Vector( float a, float b, float c ) {
        _a = a;
        _b = b;
        _c = c;
    }

    inline void add( const Vector& b ) {
        _a += b._a;
        _b += b._b;
        _c += b._c;
    }
};

const Vector ONES(1,1,1);
const unsigned int MAX_I = 1000000000;

int main( int argc, char **argv ) {
    if( argc <= 1 ) {
        std::printf( "usage: %s (new|copy)\n", argv[0] );
        return 1;
    }

    Vector total = ONES;
    std::clock_t start = std::clock();
    if( std::strcmp( argv[1], "new" ) == 0 ) {
        for( unsigned int i = 0; i < MAX_I; ++i ) {
            Vector v(1,1,1);    // construct a fresh Vector each iteration
            total.add(v);
        }
    } else {
        for( unsigned int i = 0; i < MAX_I; ++i ) {
            Vector v = ONES;    // copy-construct from the constant
            total.add(v);
        }
    }
    std::clock_t end = std::clock();

    std::printf( "%g %g total = < %f, %f, %f >\n",
        double(end - start), double(end - start) / CLOCKS_PER_SEC,
        total._a, total._b, total._c );
    return 0;
}
And here are the results of my tests. What is interesting is that the "new" method is about the same as the "copy" method in the optimized case, but not in the unoptimized case. There was noise in the results, and since the numbers were so close I ran the comparison several times; the optimized "new" still beat the optimized "copy" more than half the time. The lesson here is not to forget that the compiler is your friend, and to always test your assumptions about what makes fast code.
Code:
$ gcc -o speed-test-unoptimized speed-test.cpp
$ ./speed-test-unoptimized copy
4.71759e+06 4.71759 total = < 16777216.000000, 16777216.000000, 16777216.000000 >
$ ./speed-test-unoptimized new
7.33109e+06 7.33109 total = < 16777216.000000, 16777216.000000, 16777216.000000 >
$ gcc -O3 -o speed-test-optimized speed-test.cpp
$ ./speed-test-optimized copy
878747 0.878747 total = < 16777216.000000, 16777216.000000, 16777216.000000 >
$ ./speed-test-optimized new
878324 0.878324 total = < 16777216.000000, 16777216.000000, 16777216.000000 >
Note that the floating point values in the total are just wrong: each component should be 1,000,000,001, but a single-precision float stops counting at 2^24 = 16,777,216, the point where adding 1.0f no longer changes the stored value. The error goes away, and the relative runtimes remain proportional, when the loop count is dropped by a factor of 100.