I was curious to see how compiler optimization would affect your example code. I had to modify it a bit in order to prevent the compiler from optimizing your loops out entirely:
Code:
#include <cstring>
#include <cstdio>
#include <ctime>

class Vector {
public:
    float _a, _b, _c;

    inline Vector( float a, float b, float c ) {
        _a = a;
        _b = b;
        _c = c;
    }

    inline void add( const Vector& b ) {
        _a += b._a;
        _b += b._b;
        _c += b._c;
    }
};

const Vector ONES(1,1,1);
const unsigned int MAX_I = 1000000000;

int main( int argc, char **argv ) {
    if( argc <= 1 ) {
        std::printf( "usage: %s (new|copy)\n", argv[0] );
        return 1;
    }

    Vector total = ONES;
    std::clock_t start = std::clock();
    if( std::strcmp( argv[1], "new" ) == 0 ) {
        for( unsigned int i = 0; i < MAX_I; ++i ) {
            Vector v(1,1,1);    // construct a fresh Vector each iteration
            total.add(v);
        }
    } else {
        for( unsigned int i = 0; i < MAX_I; ++i ) {
            Vector v = ONES;    // copy-construct from the constant
            total.add(v);
        }
    }
    std::clock_t end = std::clock();

    std::printf( "%g %g total = < %f, %f, %f >\n",
        double(end - start), double(end - start) / CLOCKS_PER_SEC,
        total._a, total._b, total._c );
    return 0;
}
And here are the results of my tests. What is interesting is that the "new" method is about the same as the "copy" method in the optimized case, but not in the unoptimized case. There was noise in the results, and since the numbers were so close I ran the comparison several times; the optimized "new" still beat the optimized "copy" more than half the time. The lesson here is not to forget that the compiler is your friend, and to always test your assumptions about what makes fast code.
Code:
$ gcc -o speed-test-unoptimized speed-test.cpp
$ ./speed-test-unoptimized copy
4.71759e+06 4.71759 total = < 16777216.000000, 16777216.000000, 16777216.000000 >
$ ./speed-test-unoptimized new
7.33109e+06 7.33109 total = < 16777216.000000, 16777216.000000, 16777216.000000 >
$ gcc -O3 -o speed-test-optimized speed-test.cpp
$ ./speed-test-optimized copy
878747 0.878747 total = < 16777216.000000, 16777216.000000, 16777216.000000 >
$ ./speed-test-optimized new
878324 0.878324 total = < 16777216.000000, 16777216.000000, 16777216.000000 >
Note that the floating point values in the total are just wrong: each component should be 1,000,000,001, but a single-precision float stops counting at 2^24 = 16,777,216, the point where adding 1.0f no longer changes the stored value. The error goes away, and the relative runtimes remain proportional, when the loop count is dropped by a factor of 100.