Higher OpenGL performance on Linux: OpenGL SuperBible study notes, performance comparison

Date: 2019-11-23 06:28:34

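This article uses a complex model with a large amount of vertex data to compare the performance and memory use of three approaches: glBegin()/glEnd() immediate mode, display lists, and indexed vertex arrays.

The F-16 Thunderbird aircraft model consists of 3704 individual triangles. After being packed in indexed form by the Deep Exploration tool, it has 1898 unique vertices, 2716 normals, and 2925 texture coordinates.

The DrawBody function below walks through the indices and, for every individual triangle, looks up and sends the texture coordinate, normal, and vertex position of each corner.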

void DrawBody(void)
{
    int iFace, iPoint;

    glBegin(GL_TRIANGLES);
    for(iFace = 0; iFace < 3704; iFace++)      // for each triangle
        for(iPoint = 0; iPoint < 3; iPoint++)  // for each corner of the triangle
        {
            // Look up and send the texture coordinate
            glTexCoord2fv(textures[face_indicies[iFace][iPoint+6]]);

            // Look up and send the normal
            glNormal3fv(normals[face_indicies[iFace][iPoint+3]]);

            // Look up and send the vertex position
            glVertex3fv(vertices[face_indicies[iFace][iPoint]]);
        }
    glEnd();
}

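This approach is acceptable when you must minimize the storage space of the model, for example to save memory on an embedded system or to reduce the amount of data sent over a network. For real-time applications, however, it performs very poorly: every vertex is sent to OpenGL separately, and doing so takes a very large number of function calls.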
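To put a number on "a very large number of function calls", the following sketch counts the calls made by one pass of DrawBody. It assumes the 3704 triangles stated for the model and simply multiplies out the loop; the constant names are made up for this illustration.

```cpp
// Back-of-the-envelope count of the OpenGL calls issued by one DrawBody() pass.
constexpr int kTriangles = 3704;         // triangles in the body, as stated in the article
constexpr int kCornersPerTriangle = 3;
constexpr int kCallsPerCorner = 3;       // glTexCoord2fv + glNormal3fv + glVertex3fv

constexpr int kCornerCalls = kTriangles * kCornersPerTriangle * kCallsPerCorner;
constexpr int kTotalCalls = kCornerCalls + 2;  // plus one glBegin and one glEnd

// Three calls for each of the 11,112 triangle corners.
static_assert(kCornerCalls == 33336, "unexpected per-corner call count");
```

That is more than thirty thousand calls for every frame in which the body is drawn this way.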

The obvious way to speed this code up is to place it in a display list:

glNewList(bodyList, GL_COMPILE);
DrawBody();
glEndList();

...

glCallList(bodyList);

Next, we compare the display list approach with the indexed vertex array approach.

Measuring the cost

First, we work out how much memory the packed vertex data requires.

// Thunderbird body
extern short face_indicies[3704][9];
extern GLfloat vertices [1898][3];
extern GLfloat normals [2716][3];
extern GLfloat textures [2925][2];

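Here face_indicies holds, for each triangle, three vertex indices, three normal indices, and three texture coordinate indices: short face_indicies[3704][9] = {{6,8,7 ,0,1,2 ,0,1,2 }, {6,9,8 ,0,3,1 ,0,3,1 }, {10,8,11 ,4,1,5 ,4,1,5 }....}

face_indicies needs 3704*9*sizeof(short) = 66,672 bytes. In the same way, vertices needs 22,776 bytes, normals 32,592 bytes, and textures 23,400 bytes. The total is 145,440 bytes, or about 142 KB.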
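The sizes follow directly from the array declarations, so they can be checked mechanically. The sketch below does so under the assumption that short is 2 bytes and float (standing in for GLfloat) is 4 bytes; the constant names are illustrative only.

```cpp
#include <cstddef>

// Assumption: the common case of a 2-byte short and a 4-byte float.
static_assert(sizeof(short) == 2 && sizeof(float) == 4, "sizes assumed in this sketch");

// One constant per packed array, using the dimensions from the declarations.
constexpr std::size_t kFaceIndexBytes = 3704 * 9 * sizeof(short);  // 66,672
constexpr std::size_t kVertexBytes    = 1898 * 3 * sizeof(float);  // 22,776
constexpr std::size_t kNormalBytes    = 2716 * 3 * sizeof(float);  // 32,592
constexpr std::size_t kTexCoordBytes  = 2925 * 2 * sizeof(float);  // 23,400

// Total footprint of the packed model data.
constexpr std::size_t kOriginalDataBytes =
    kFaceIndexBytes + kVertexBytes + kNormalBytes + kTexCoordBytes;  // 145,440
```

Dividing kOriginalDataBytes by 1024 gives roughly 142 KB.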

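With a display list, a copy of this data has to be made into the list (the commands and data in a display list are optimized and then placed in a command buffer or on the graphics hardware). We cannot calculate exactly how much memory the display list uses, but we can estimate the size of the vertex data. Each triangle needs 3 vertices, 3 normals, and 3 texture coordinates, all stored as floating-point values. Assuming sizeof(float) is 4 bytes:

3704*3 = 11,112 vertices. Each vertex has 3 components (x, y, z), giving 11,112*3 = 33,336 floating-point values. The normals likewise take 33,336 floats, and the texture coordinates, with 2 components each, take 22,224 floats. Adding these up and multiplying by 4 bytes gives 355,584 bytes. Together with the 145,440 bytes of original data, that is 501,024 bytes, about 489 KB, which is just under 0.5 MB. On top of that, all 11,112 vertices have to go through the OpenGL transformation pipeline, which involves a great many matrix operations.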
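The display list estimate can be reproduced the same way. This is only a sketch of the estimate described above, not a measurement of what a driver actually allocates; the figure for the packed data is entered as a literal and the names are hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t kCorners = 3704 * 3;           // 11,112 triangle corners
constexpr std::size_t kFloatsPerCorner = 3 + 3 + 2;  // position + normal + texture coordinate
constexpr std::size_t kBytesPerFloat = 4;            // assumption stated in the text

// Fully expanded vertex data, one complete vertex per triangle corner.
constexpr std::size_t kExpandedBytes =
    kCorners * kFloatsPerCorner * kBytesPerFloat;    // 355,584

// Packed arrays: 3704*9*2 + 22,776 + 32,592 + 23,400 bytes.
constexpr std::size_t kPackedBytes = 145440;

constexpr std::size_t kDisplayListEstimate = kExpandedBytes + kPackedBytes;  // 501,024
```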

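Creating a suitable indexed vertex array

The data as stored above cannot be used directly as OpenGL vertex arrays. OpenGL requires the vertex array, the normal array, and the texture coordinate array to be the same size, so that all of them are traversed in the same way: element 0 of the vertex array corresponds to element 0 of the normal array, and so on. The index array is bound by the same rule: each index selects the same element in every array.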
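The constraint is easier to see on a tiny example. The sketch below uses made-up data for a single quad (it is not taken from the Thunderbird model) and a hypothetical fetchCorner helper to show how one index addresses all three arrays at once. The packed model format, by contrast, stores separate vertex, normal, and texture indices for each corner.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical data for one quad. Every attribute array has the same length (4),
// so element i of each array describes the same vertex.
const float kPositions[4][3] = {{0.0f, 0.0f, 0.0f}, {1.0f, 0.0f, 0.0f},
                                {1.0f, 1.0f, 0.0f}, {0.0f, 1.0f, 0.0f}};
const float kNormals[4][3]   = {{0.0f, 0.0f, 1.0f}, {0.0f, 0.0f, 1.0f},
                                {0.0f, 0.0f, 1.0f}, {0.0f, 0.0f, 1.0f}};
const float kTexCoords[4][2] = {{0.0f, 0.0f}, {1.0f, 0.0f}, {1.0f, 1.0f}, {0.0f, 1.0f}};

// Two triangles, six indices, but only four stored vertices.
const unsigned short kIndices[6] = {0, 1, 2, 0, 2, 3};

struct Corner {
    const float* position;
    const float* normal;
    const float* texCoord;
};

// Resolve the n-th entry of the index array: a single index selects the
// same slot in all three attribute arrays.
Corner fetchCorner(std::size_t n)
{
    assert(n < 6);
    const unsigned short i = kIndices[n];
    return Corner{kPositions[i], kNormals[i], kTexCoords[i]};
}
```

For instance, fetchCorner(4) resolves index 2 and returns the position (1, 1, 0) together with the normal and texture coordinate stored in the same slot.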

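In the example below, a class processes the existing arrays and builds the index for them. This is the code that processes the fuselage and the glass canopy: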

CTriangleMesh thunderBirdBody;
CTriangleMesh thunderBirdGlass;

// Temporary workspace for one triangle
M3DVector3f vVerts[3];
M3DVector3f vNorms[3];
M3DVector2f vTex[3];

// Start assembling the body mesh; set the maximum number of vertices
thunderBirdBody.BeginMesh(3704*3);

// Loop through all the faces
for(int iFace = 0; iFace < 3704; iFace++)
{
    // Copy the three corners of this face into the workspace
    for(int iPoint = 0; iPoint < 3; iPoint++)
    {
        memcpy(&vVerts[iPoint][0], &vertices[face_indicies[iFace][iPoint]][0], sizeof(M3DVector3f));
        memcpy(&vNorms[iPoint][0], &normals[face_indicies[iFace][iPoint+3]][0], sizeof(M3DVector3f));
        memcpy(&vTex[iPoint][0], &textures[face_indicies[iFace][iPoint+6]][0], sizeof(M3DVector2f));
    }

    thunderBirdBody.AddTriangle(vVerts, vNorms, vTex);
}

// Finish the mesh and scale the vertices so the model fits the screen
thunderBirdBody.EndMesh();
thunderBirdBody.Scale(fScale);

// Repeat the same steps for the glass canopy
thunderBirdGlass.BeginMesh(352*3);

for(int iFace = 0; iFace < 352; iFace++)
{
    for(int iPoint = 0; iPoint < 3; iPoint++)
    {
        memcpy(&vVerts[iPoint][0], &verticesGlass[face_indiciesGlass[iFace][iPoint]][0], sizeof(M3DVector3f));
        memcpy(&vNorms[iPoint][0], &normalsGlass[face_indiciesGlass[iFace][iPoint+3]][0], sizeof(M3DVector3f));
        memcpy(&vTex[iPoint][0], &texturesGlass[face_indiciesGlass[iFace][iPoint+6]][0], sizeof(M3DVector2f));
    }

    thunderBirdGlass.AddTriangle(vVerts, vNorms, vTex);
}

thunderBirdGlass.EndMesh();
thunderBirdGlass.Scale(fScale);

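First we declare the two triangle mesh objects:

CTriangleMesh thunderBirdBody;
CTriangleMesh thunderBirdGlass;

Next we tell the mesh the maximum number of vertices it may have to hold. In the worst case there could be 3704*3 = 11,112 unique vertices, but in practice many vertices are shared and have identical values.

thunderBirdBody.BeginMesh(3704*3);

We then walk through every face of the fuselage, gather each individual triangle, and pass it to AddTriangle, which builds the index array. AddTriangle compares each incoming vertex with the vertices already stored to see whether it is a duplicate; if it is, the index array simply reuses the existing index. The code that does this inside AddTriangle is: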

// e is the comparison tolerance, defined outside this excerpt
for(GLuint iVertex = 0; iVertex < 3; iVertex++)
{
    GLuint iMatch = 0;
    for(iMatch = 0; iMatch < nNumVerts; iMatch++)
    {
        // If the vertex positions are the same
        if(m3dCloseEnough(pVerts[iMatch][0], verts[iVertex][0], e) &&
           m3dCloseEnough(pVerts[iMatch][1], verts[iVertex][1], e) &&
           m3dCloseEnough(pVerts[iMatch][2], verts[iVertex][2], e) &&

           // AND the Normal is the same...
           m3dCloseEnough(pNorms[iMatch][0], vNorms[iVertex][0], e) &&
           m3dCloseEnough(pNorms[iMatch][1], vNorms[iVertex][1], e) &&
           m3dCloseEnough(pNorms[iMatch][2], vNorms[iVertex][2], e) &&

           // And Texture is the same...
           m3dCloseEnough(pTexCoords[iMatch][0], vTexCoords[iVertex][0], e) &&
           m3dCloseEnough(pTexCoords[iMatch][1], vTexCoords[iVertex][1], e))
        {
            // Then add the index only
            pIndexes[nNumIndexes] = iMatch;
            nNumIndexes++;
            break;
        }
    }

    // No match for this vertex, add to end of list
    if(iMatch == nNumVerts)
    {
        memcpy(pVerts[nNumVerts], verts[iVertex], sizeof(M3DVector3f));
        memcpy(pNorms[nNumVerts], vNorms[iVertex], sizeof(M3DVector3f));
        memcpy(pTexCoords[nNumVerts], &vTexCoords[iVertex], sizeof(M3DVector2f));
        pIndexes[nNumIndexes] = nNumVerts;
        nNumIndexes++;
        nNumVerts++;
    }
}
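For readers who want to experiment with the idea outside the book's framework, here is a minimal, self-contained sketch of the same linear-search deduplication. It is not the CTriangleMesh implementation: the Vertex and IndexedMesh types are invented for this example, closeEnough is a hypothetical stand-in for m3dCloseEnough, and std::vector replaces the preallocated arrays.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One complete vertex: position, normal, and texture coordinate together.
struct Vertex {
    float pos[3];
    float normal[3];
    float tex[2];
};

// Hypothetical stand-in for m3dCloseEnough: true when a and b differ by less than eps.
bool closeEnough(float a, float b, float eps)
{
    return std::fabs(a - b) < eps;
}

// Two vertices match only if position, normal, AND texture coordinate all match.
bool sameVertex(const Vertex& a, const Vertex& b, float eps)
{
    for (int i = 0; i < 3; ++i) {
        if (!closeEnough(a.pos[i], b.pos[i], eps)) return false;
        if (!closeEnough(a.normal[i], b.normal[i], eps)) return false;
    }
    return closeEnough(a.tex[0], b.tex[0], eps) && closeEnough(a.tex[1], b.tex[1], eps);
}

struct IndexedMesh {
    std::vector<Vertex> vertices;         // unique vertices only
    std::vector<unsigned short> indices;  // three entries per triangle

    void addTriangle(const Vertex (&tri)[3], float eps = 1e-6f)
    {
        for (const Vertex& v : tri) {
            // Linear search for an already stored vertex that matches.
            std::size_t match = 0;
            while (match < vertices.size() && !sameVertex(vertices[match], v, eps))
                ++match;

            // No match: append the vertex, so 'match' becomes its new index.
            if (match == vertices.size())
                vertices.push_back(v);

            assert(match <= 65535);  // must fit in a 16-bit index
            indices.push_back(static_cast<unsigned short>(match));
        }
    }
};
```

Feeding it the two triangles of a flat quad, which share two corners, leaves 4 stored vertices and 6 indices. The search is linear in the number of stored vertices, so building a large mesh this way costs time quadratic in the vertex count; that is usually acceptable because it happens once at load time rather than per frame.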

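Comparing the costs

Now let us compare the cost of the three ways of rendering the model. According to the statistics kept by the CTriangleMesh class, the Thunderbird body has 3265 unique vertices (each including its normal and texture coordinate) and 11,112 indices. Each vertex position and each normal has 3 floating-point components, and each texture coordinate has 2, so:

3265*8 = 26,120 floating-point values. Multiplying by 4 gives 104,480 bytes. Adding the index array, which uses the short type, 11,112*2 = 22,224 bytes, gives a total of 126,704 bytes, or about 124 KB.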
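The same arithmetic, written out as a sketch with illustrative constant names and the byte sizes assumed in the text (4-byte floats, 2-byte indices):

```cpp
#include <cstddef>

constexpr std::size_t kUniqueVertices = 3265;   // reported by CTriangleMesh
constexpr std::size_t kIndexCount = 11112;      // 3704 triangles * 3 corners

// 8 floats per unique vertex: 3 position + 3 normal + 2 texture coordinate.
constexpr std::size_t kVertexDataBytes = kUniqueVertices * 8 * 4;        // 104,480
constexpr std::size_t kIndexBytes = kIndexCount * 2;                     // 22,224
constexpr std::size_t kIndexedTotalBytes = kVertexDataBytes + kIndexBytes;  // 126,704

// Share of the 11,112 triangle corners that remain as distinct vertices.
constexpr double kUniqueRatio = static_cast<double>(kUniqueVertices) / kIndexCount;
static_assert(kUniqueRatio < 1.0 / 3.0, "less than a third of the corners are unique");
```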

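Comparison table:

| Rendering mode | Memory used | Vertices to transform |
|---|---|---|
| Immediate mode | about 142 KB | 11,112 |
| Display list | about 489 KB | 11,112 |
| Indexed vertex array | about 124 KB | 3,265 |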

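As the table shows, the indexed vertex array not only uses less memory but also needs to process less than a third of the vertices required by the other modes. How much sharing occurs depends on the model: a model with many sharp corners or edges shares fewer vertices, while a model with smooth surfaces shares more. Where vertices are shared heavily, as here, indexed vertex arrays can improve performance substantially.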
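The effect of sharp edges on sharing can be illustrated with a cube, the extreme case of a sharp-edged model. The sketch below is an invented example, not part of the Thunderbird data: it counts the unique (position, normal) pairs among the 36 triangle corners of a cube, once with a separate normal per face and once with a single normal per corner. Texture coordinates are left out for brevity, and the smooth normals are unnormalized integer vectors so they can be compared exactly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>

using Key = std::array<int, 6>;  // x, y, z, nx, ny, nz

// Count unique (position, normal) pairs over the 12 triangles of a cube
// whose corners are at (+-1, +-1, +-1).
std::size_t countUniqueCubeVertices(bool flatShaded)
{
    std::set<Key> unique;
    const int uv[4][2] = {{-1, -1}, {1, -1}, {1, 1}, {-1, 1}};  // corners of one face
    const int order[6] = {0, 1, 2, 0, 2, 3};                    // two triangles per face

    for (int axis = 0; axis < 3; ++axis) {
        for (int sign = -1; sign <= 1; sign += 2) {
            for (int k : order) {
                // Position of this corner on the face perpendicular to 'axis'.
                int p[3];
                p[axis] = sign;
                p[(axis + 1) % 3] = uv[k][0];
                p[(axis + 2) % 3] = uv[k][1];

                // Smooth: normal points along the corner direction (unnormalized).
                int n[3] = {p[0], p[1], p[2]};
                // Flat: every corner of the face uses the face normal instead.
                if (flatShaded) {
                    n[0] = n[1] = n[2] = 0;
                    n[axis] = sign;
                }

                Key key = {p[0], p[1], p[2], n[0], n[1], n[2]};
                unique.insert(key);
            }
        }
    }
    assert(unique.size() <= 36);  // cannot exceed the number of triangle corners
    return unique.size();
}
```

With per-face normals the function returns 24, because each of the 8 corners needs a different normal on each of the 3 faces that meet there. With one normal per corner it returns 8, so the same 36 indices refer to only a third as many vertices.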
