[Compilation - Compilation Notes] GCC Optimization Levels Explained

Date: 2022-07-10 20:40:21

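GCC provides a large number of optimization options. Different options let you balance three dimensions: compilation time, object file size, and execution efficiency. The optimization levels differ slightly between GCC versions; this article uses GCC 7.5 as the example.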

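At the time of writing, the latest release is GCC 11.2.0. The set of optimization levels has been the same since GCC 4.8 (-Ofast was added in GCC 4.6 and -Og in GCC 4.8); only the individual options enabled by each level differ slightly between versions.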

1 Optimization Levels

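GCC 4.8 and later provide the optimization levels -O0, -O1, -O2, -O3, -Os, -Ofast, and -Og. Among -O1, -O2, and -O3, a larger number means more aggressive optimization, which in a sense comes at the cost of the program's debuggability.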
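GCC also exposes the chosen level to the program through predefined macros, which gives a quick way to confirm what kind of build you are running. The following is a minimal sketch assuming GCC or a compatible compiler; `build_mode` and `print_build_mode` are hypothetical helper names introduced here for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: describes the optimization mode this file was built with.
 * GCC defines __OPTIMIZE__ in all optimizing compilations and additionally
 * defines __OPTIMIZE_SIZE__ when optimizing for size (-Os).
 * At -O0 neither macro is defined. */
const char *build_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "optimized for size";
#elif defined(__OPTIMIZE__)
    return "optimized";
#else
    return "not optimized";
#endif
}

/* Prints the detected mode; call it from main() to check a build. */
void print_build_mode(void)
{
    printf("build mode: %s\n", build_mode());
}
```

Building the same file once with -O0 and once with -Os and calling print_build_mode() from main should report different modes, since the macros are fixed at compile time.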

All of the optimization options are listed below:

-faggressive-loop-optimizations -falign-functions[=n] -falign-jumps[=n] -falign-labels[=n] -falign-loops[=n]
-fassociative-math -fauto-profile -fauto-profile[=path] -fauto-inc-dec -fbranch-probabilities
-fbranch-target-load-optimize -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves
-fcombine-stack-adjustments -fconserve-stack -fcompare-elim -fcprop-registers -fcrossjumping
-fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules -fcx-limited-range
-fdata-sections -fdce -fdelayed-branch -fdelete-null-pointer-checks -fdevirtualize
-fdevirtualize-speculatively -fdevirtualize-at-ltrans -fdse -fearly-inlining -fipa-sra
-fexpensive-optimizations -ffat-lto-objects -ffast-math -ffinite-math-only -ffloat-store
-fexcess-precision=style -fforward-propagate -ffp-contract=style -ffunction-sections
-fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity -fgcse-sm
-fhoist-adjacent-loads -fif-conversion -fif-conversion2 -findirect-inlining
-finline-functions -finline-functions-called-once -finline-limit=n -finline-small-functions
-fipa-cp -fipa-cp-clone -fipa-bit-cp -fipa-vrp -fipa-pta -fipa-profile -fipa-pure-const
-fipa-reference -fipa-icf -fira-algorithm=algorithm -fira-region=region -fira-hoist-pressure
-fira-loop-pressure -fno-ira-share-save-slots -fno-ira-share-spill-slots
-fisolate-erroneous-paths-dereference -fisolate-erroneous-paths-attribute
-fivopts -fkeep-inline-functions -fkeep-static-functions -fkeep-static-consts
-flimit-function-alignment -flive-range-shrinkage -floop-block -floop-interchange
-floop-strip-mine -floop-unroll-and-jam -floop-nest-optimize -floop-parallelize-all
-flra-remat -flto -flto-compression-level -flto-partition=alg -fmerge-all-constants
-fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves -fmove-loop-invariants
-fno-branch-count-reg -fno-defer-pop -fno-fp-int-builtin-inexact -fno-function-cse
-fno-guess-branch-probability -fno-inline -fno-math-errno -fno-peephole -fno-peephole2
-fno-printf-return-value -fno-sched-interblock -fno-sched-spec -fno-signed-zeros
-fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss
-fomit-frame-pointer -foptimize-sibling-calls -fpartial-inlining -fpeel-loops
-fpredictive-commoning -fprefetch-loop-arrays -fprofile-correction -fprofile-use
-fprofile-use=path -fprofile-values -fprofile-reorder-functions -freciprocal-math
-free -frename-registers -freorder-blocks -freorder-blocks-algorithm=algorithm
-freorder-blocks-and-partition -freorder-functions -frerun-cse-after-loop
-freschedule-modulo-scheduled-loops -frounding-math -fsched2-use-superblocks
-fsched-pressure -fsched-spec-load -fsched-spec-load-dangerous
-fsched-stalled-insns-dep[=n] -fsched-stalled-insns[=n] -fsched-group-heuristic
-fsched-critical-path-heuristic -fsched-spec-insn-heuristic -fsched-rank-heuristic
-fsched-last-insn-heuristic -fsched-dep-count-heuristic -fschedule-fusion
-fschedule-insns -fschedule-insns2 -fsection-anchors -fselective-scheduling
-fselective-scheduling2 -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops
-fsemantic-interposition -fshrink-wrap -fshrink-wrap-separate -fsignaling-nans
-fsingle-precision-constant -fsplit-ivs-in-unroller -fsplit-loops -fsplit-paths
-fsplit-wide-types -fssa-backprop -fssa-phiopt -fstdarg-opt -fstore-merging
-fstrict-aliasing -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp
-ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-coalesce-vars -ftree-copy-prop
-ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -fcode-hoisting
-ftree-loop-if-convert -ftree-loop-im -ftree-phiprop -ftree-loop-distribution
-ftree-loop-distribute-patterns -ftree-loop-ivcanon -ftree-loop-linear
-ftree-loop-optimize -ftree-loop-vectorize -ftree-parallelize-loops=n -ftree-pre
-ftree-partial-pre -ftree-pta -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra
-ftree-switch-conversion -ftree-tail-merge -ftree-ter -ftree-vectorize -ftree-vrp
-funconstrained-commons -funit-at-a-time -funroll-all-loops -funroll-loops
-funsafe-math-optimizations -funswitch-loops -fipa-ra
-fvariable-expansion-in-unroller -fvect-cost-model -fvpt -fweb -fwhole-program
-fwpa -fuse-linker-plugin

1.1 -O0

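This is GCC's default optimization level. If none of the optimization levels above is specified, the compiler uses -O0, which performs essentially no optimization; it keeps compile time short and makes debugging produce the expected results.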

1.2 -O1

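This is the most basic optimization level. The compiler tries to reduce code size and execution time without performing optimizations that take a great deal of compile time, mainly optimizing branches, constants, and expressions. This level turns on the following options: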

-fauto-inc-dec -fbranch-count-reg -fcombine-stack-adjustments -fcompare-elim -fcprop-registers -fdce -fdefer-pop -fdelayed-branch -fdse -fforward-propagate -fguess-branch-probability -fif-conversion2 -fif-conversion -finline-functions-called-once -fipa-pure-const -fipa-profile -fipa-reference -fmerge-constants -fmove-loop-invariants -freorder-blocks -fshrink-wrap -fshrink-wrap-separate -fsplit-wide-types -fssa-backprop -fssa-phiopt -ftree-bit-ccp -ftree-ccp -ftree-ch -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-phiprop -ftree-sink -ftree-slsr -ftree-sra -ftree-pta -ftree-ter -funit-at-a-time

1.3 -O2

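Compared with -O1, -O2 spends more compile time in exchange for more efficient generated code. In addition to everything in -O1, it turns on the following options: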

-fthread-jumps -falign-functions -falign-jumps -falign-loops -falign-labels -fcaller-saves -fcrossjumping -fcse-follow-jumps -fcse-skip-blocks -fdelete-null-pointer-checks -fdevirtualize -fdevirtualize-speculatively -fexpensive-optimizations -fgcse -fgcse-lm -fhoist-adjacent-loads -finline-small-functions -findirect-inlining -fipa-cp -fipa-bit-cp -fipa-vrp -fipa-sra -fipa-icf -fisolate-erroneous-paths-dereference -flra-remat -foptimize-sibling-calls -foptimize-strlen -fpartial-inlining -fpeephole2 -freorder-blocks-algorithm=stc -freorder-blocks-and-partition -freorder-functions -frerun-cse-after-loop -fsched-interblock -fsched-spec -fschedule-insns -fschedule-insns2 -fstore-merging -fstrict-aliasing -fstrict-overflow -ftree-builtin-call-dce -ftree-switch-conversion -ftree-tail-merge -fcode-hoisting -ftree-pre -ftree-vrp -fipa-ra

1.4 -Os

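-Os builds on -O2 but leaves out the optimizations that tend to increase the size of the final executable. Choose this level when you want a smaller executable.

Relative to -O2, it turns off options including the following: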

-falign-functions -falign-jumps -falign-loops -falign-labels -fprefetch-loop-arrays
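Source code can also react to a size-optimized build. The sketch below is only an illustration of that idea, assuming GCC's predefined macro `__OPTIMIZE_SIZE__` (defined when optimizing for size); `bit_count` is a hypothetical function that picks a compact loop under -Os and a small lookup table otherwise.

```c
#include <stdint.h>

/* Counts the set bits in one byte.
 * Assumption for illustration: under -Os (GCC defines __OPTIMIZE_SIZE__)
 * prefer the compact loop; otherwise use a 16-entry nibble table that
 * trades a little data for fewer iterations. */
unsigned bit_count(uint8_t value)
{
#ifdef __OPTIMIZE_SIZE__
    unsigned count = 0;
    while (value != 0) {
        count += value & 1u;   /* add the lowest bit */
        value >>= 1;           /* move on to the next bit */
    }
    return count;
#else
    static const uint8_t nibble_bits[16] = {
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
    };
    return nibble_bits[value & 0x0Fu] + nibble_bits[value >> 4];
#endif
}
```

Both branches return the same result for every input; whether the table or the loop is actually smaller or faster on a given target is something to measure rather than assume.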

1.5 -O3

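-O3 performs further optimizations on top of -O2, such as inlining of ordinary functions and additional loop optimizations. It lengthens compile time and usually produces larger binaries that use more memory. Larger code is not always faster, and aggressive optimization is more likely to expose latent problems in the source (such as undefined behavior) as build failures or unpredictable program behavior, so compiling every package with -O3 is not recommended.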

This level includes all -O2 optimizations and additionally turns on the following options:

-finline-functions -funswitch-loops -fpredictive-commoning -fgcse-after-reload -ftree-loop-vectorize -ftree-loop-distribute-patterns -fsplit-paths -ftree-slp-vectorize -fvect-cost-model -ftree-partial-pre -fpeel-loops -fipa-cp-clone
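Since -O3 is not a good default for everything, one way to experiment is to keep the build at a lower level and raise it for a single hot function. The sketch below uses GCC's `optimize` function attribute for that. `HOT_O3` and `dot_product` are hypothetical names introduced here, and the GCC manual describes this attribute as meant for debugging rather than production code, so treat it as an experiment.

```c
#include <stddef.h>

/* HOT_O3 is a hypothetical convenience macro. On GCC it expands to the
 * `optimize` function attribute; on other compilers (Clang is excluded
 * here on the assumption that it does not honor this attribute) it
 * expands to nothing. */
#if defined(__GNUC__) && !defined(__clang__)
#define HOT_O3 __attribute__((optimize("O3")))
#else
#define HOT_O3
#endif

/* Dot product of two arrays of length n. On GCC this function is compiled
 * with -O3 settings even if the rest of the file uses a lower level. */
HOT_O3 double dot_product(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Comparing the output of gcc -O2 -S with and without the attribute shows whether the higher level changed the generated loop at all.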

1.6 -Ofast

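-Ofast builds on -O3 and adds optimizations that disregard strict standards compliance, for example -ffast-math, which relaxes the standard rules for floating-point arithmetic and math functions. It is therefore generally not recommended.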
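One concrete case where this matters is compensated (Kahan) summation: the correction term is algebraically zero, so once -ffast-math allows the compiler to reassociate floating-point arithmetic, it is permitted to simplify the correction away. A minimal sketch, with `kahan_sum` and `fast_math_enabled` as hypothetical function names; `__FAST_MATH__` is the macro GCC defines when -ffast-math is in effect.

```c
#include <stddef.h>

/* Returns 1 if this file was compiled with -ffast-math (which -Ofast
 * turns on), 0 otherwise. */
int fast_math_enabled(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Kahan (compensated) summation of n doubles.
 * The compensation term relies on floating-point operations being
 * evaluated exactly in the order written. */
double kahan_sum(const double *values, size_t n)
{
    double sum = 0.0;
    double compensation = 0.0;  /* running estimate of lost low-order bits */
    for (size_t i = 0; i < n; i++) {
        double y = values[i] - compensation;
        double t = sum + y;
        /* Algebraically zero; numerically the rounding error of sum + y. */
        compensation = (t - sum) - y;
        sum = t;
    }
    return sum;
}
```

Code like this is a reason to check fast_math_enabled() in a test, or to keep such files out of an -Ofast build.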

1.7 -Og

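-Og builds on -O1 but leaves out the optimizations that interfere with debugging, so it is the level to use when the build is meant for debugging. -Og alone is not enough, though: it only tells the compiler that the generated code should not get in the way of debugging, while the debug information itself still has to be generated with -g.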

To see which options a given optimization level enables in your version of GCC, run gcc -Q --help=optimizers together with that level:

$ gcc -Q --help=optimizers -O1

2 Example

The following example illustrates the differences between optimization levels.

The complete code is as follows:

/**
  ******************************************************************************
  * @file    main.c
  * @author  BruceOu
  * @version V1.0
  * @date    -12-06
  * @blog    /
  * @Official Accounts 嵌入式实验楼
  * @brief
  ******************************************************************************
  */
/**Includes*********************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>

/**Typedef**********************************************************************/
typedef int data_t;

typedef struct _node_
{
    data_t data;
    struct _node_ *next;
} linknode_t, *linklist;

typedef struct
{
    linklist front, rear;
} linkqueue;

/**Function********************************************************************/
linkqueue *CreateEmptyLinkqueue();
int EmptyLinkqueue(linkqueue *lqueue);
int EnLinkqueue(linkqueue *lqueue, data_t x);
int DeLinkqueue(linkqueue *lqueue, data_t *x);
void Linkqueue_show(linkqueue *lqueue);
void ClearLinkqueue(linkqueue *lqueue);
void DestroyLinkqueue(linkqueue *lqueue);

/**
  * @brief  Main function
  * @param  None
  * @retval None
  */
int main(int argc, char **argv)
{
    int i, x;
    linkqueue *lqueue;

    lqueue = CreateEmptyLinkqueue();

    for (i = 1; i <= 6; i++)
    {
        EnLinkqueue(lqueue, i);
    }
    Linkqueue_show(lqueue);

    i = 3;
    while (i--)
    {
        DeLinkqueue(lqueue, &x);
        printf("%d ", x);
    }
    printf("\n");

    Linkqueue_show(lqueue);

    ClearLinkqueue(lqueue);
    if (EmptyLinkqueue(lqueue))
    {
        printf("The lqueue is empty!\n");
    }

    DestroyLinkqueue(lqueue);
    printf("The lqueue is destroyed!\n");

    return 0;
}

/**
  * @brief  Create an empty linked queue
  * @param  None
  * @retval The new queue on success
  */
linkqueue *CreateEmptyLinkqueue()
{
    linkqueue *lqueue;

    lqueue = (linkqueue *)malloc(sizeof(linkqueue));
    if (lqueue == NULL)
        return NULL;

    /* front and rear both start at the head (sentinel) node */
    lqueue->front = lqueue->rear = (linklist)malloc(sizeof(linknode_t));
    if (lqueue->front == NULL)
        return NULL;

    lqueue->front->next = NULL;

    return lqueue;
}

/**
  * @brief  Check whether the linked queue is empty
  * @param  lqueue
  * @retval 1 if empty, 0 if not empty, -1 on failure
  */
int EmptyLinkqueue(linkqueue *lqueue)
{
    if (lqueue == NULL)
        return -1;

    return ((lqueue->front == lqueue->rear) ? 1 : 0);
}

/**
  * @brief  Enqueue an element
  * @param  lqueue x
  * @retval 0 on success, -1 on failure
  */
int EnLinkqueue(linkqueue *lqueue, data_t x)
{
    linklist p;

    if (lqueue == NULL)
        return -1;

    p = (linklist)malloc(sizeof(linknode_t));
    if (p == NULL)
    {
        return -1;
    }
    p->data = x;
    p->next = NULL;

    if (lqueue->front->next == NULL)
    {
        lqueue->front->next = lqueue->rear = p;
    }
    else
    {
        lqueue->rear->next = p;
        lqueue->rear = p;
    }

    return 0;
}

/**
  * @brief  Dequeue an element
  * @param  lqueue x
  * @retval 0 on success, -1 on failure
  */
int DeLinkqueue(linkqueue *lqueue, data_t *x)
{
    linknode_t *node_remove;

    if (lqueue == NULL || lqueue->front->next == NULL)
    {
        return -1;
    }

    node_remove = lqueue->front->next;
    lqueue->front->next = node_remove->next;

    if (x != NULL)
    {
        *x = node_remove->data;
    }
    free(node_remove);

    return 0;
}

/**
  * @brief  Print the data in the linked queue
  * @param  lqueue
  * @retval None
  */
void Linkqueue_show(linkqueue *lqueue)
{
    linknode_t *p;

    if (lqueue->front)
    {
        p = lqueue->front->next;
    }

    while (p)
    {
        printf("%d ", p->data);
        p = p->next;
    }
    printf("\n");
}

/**
  * @brief  Clear the linked queue
  * @param  lqueue
  * @retval None
  */
void ClearLinkqueue(linkqueue *lqueue)
{
    linknode_t *qnode;

    while (lqueue->front)
    {
        qnode = lqueue->front;
        lqueue->front = qnode->next;
        free(qnode);
    }
    lqueue->rear = NULL;
}

/**
  * @brief  Destroy the linked queue
  * @param  lqueue
  * @retval None
  */
void DestroyLinkqueue(linkqueue *lqueue)
{
    if (lqueue != NULL)
    {
        ClearLinkqueue(lqueue);
        free(lqueue);
    }
}

The default way to compile it is:

$ gcc -O0 main.c -o main-O0

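Next, try higher optimization levels.

The higher the optimization level, the longer compilation takes, but the program generally runs more efficiently. Note that -Os optimizes for size as well as for speed, so it produces a smaller binary than the other levels. This option is especially important in embedded development, where MCU resources are scarce.

File sizes alone do not make for an easy comparison, so let's look at the generated assembly instead.

To make GCC emit assembly, just add the -S option.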

$ gcc -O0 -S main.c -o main-O0.s
$ gcc -Os -S main.c -o main-Os.s

This generates assembly files for two different optimization levels, -O0 and -Os.

【main-O0.s】

	.file	"main.c"
	.text
	.section	.rodata
.LC0:
	.string	"%d "
.LC1:
	.string	"The lqueue is empty!"
.LC2:
	.string	"The lqueue is destroyed!"
	.text
	.globl	main
	.type	main, @function
main:
.LFB5:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$48, %rsp
	movl	%edi, -36(%rbp)
	movq	%rsi, -48(%rbp)
	movq	%fs:40, %rax
	movq	%rax, -8(%rbp)
	xorl	%eax, %eax
	movl	$0, %eax
	call	CreateEmptyLinkqueue
	movq	%rax, -16(%rbp)
	movl	$1, -20(%rbp)
	jmp	.L2
.L3:
	movl	-20(%rbp), %edx
	movq	-16(%rbp), %rax
	movl	%edx, %esi
	movq	%rax, %rdi
	call	EnLinkqueue
	addl	$1, -20(%rbp)
.L2:
	cmpl	$6, -20(%rbp)
	jle	.L3
	movq	-16(%rbp), %rax
	movq	%rax, %rdi
	call	Linkqueue_show
	movl	$3, -20(%rbp)
	jmp	.L4
.L5:
	leaq	-24(%rbp), %rdx
	movq	-16(%rbp), %rax
	movq	%rdx, %rsi
	movq	%rax, %rdi
	call	DeLinkqueue
	movl	-24(%rbp), %eax
	movl	%eax, %esi
	leaq	.LC0(%rip), %rdi
	movl	$0, %eax
	call	printf@PLT
.L4:
	movl	-20(%rbp), %eax
	leal	-1(%rax), %edx
	movl	%edx, -20(%rbp)
	testl	%eax, %eax
	jne	.L5
	movl	$10, %edi
	call	putchar@PLT
	movq	-16(%rbp), %rax
	movq	%rax, %rdi
	call	Linkqueue_show
	movq	-16(%rbp), %rax
	movq	%rax, %rdi
	call	ClearLinkqueue
	movq	-16(%rbp), %rax
	movq	%rax, %rdi
	call	EmptyLinkqueue
	testl	%eax, %eax
	je	.L6
	leaq	.LC1(%rip), %rdi
	call	puts@PLT
.L6:
	movq	-16(%rbp), %rax
	movq	%rax, %rdi
	call	DestroyLinkqueue
	leaq	.LC2(%rip), %rdi
	call	puts@PLT
	movl	$0, %eax
	movq	-8(%rbp), %rcx
	xorq	%fs:40, %rcx
	je	.L8
	call	__stack_chk_fail@PLT
.L8:
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE5:
	.size	main, .-main
	.globl	CreateEmptyLinkqueue
	.type	CreateEmptyLinkqueue, @function
CreateEmptyLinkqueue:
.LFB6:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$16, %rsp
	movl	$16, %edi
	call	malloc@PLT
	movq	%rax, -8(%rbp)
	cmpq	$0, -8(%rbp)
	jne	.L10
	movl	$0, %eax
	jmp	.L11
.L10:
	movl	$16, %edi
	call	malloc@PLT
	movq	%rax, %rdx
	movq	-8(%rbp), %rax
	movq	%rdx, 8(%rax)
	movq	-8(%rbp), %rax
	movq	8(%rax), %rdx
	movq	-8(%rbp), %rax
	movq	%rdx, (%rax)
	movq	-8(%rbp), %rax
	movq	(%rax), %rax
	testq	%rax, %rax
	jne	.L12
	movl	$0, %eax
	jmp	.L11
.L12:
	movq	-8(%rbp), %rax
	movq	(%rax), %rax
	movq	$0, 8(%rax)
	movq	-8(%rbp), %rax
.L11:
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE6:
	.size	CreateEmptyLinkqueue, .-CreateEmptyLinkqueue
	.globl	EmptyLinkqueue
	.type	EmptyLinkqueue, @function
EmptyLinkqueue:
.LFB7:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movq	%rdi, -8(%rbp)
	cmpq	$0, -8(%rbp)
	jne	.L14
	movl	$-1, %eax
	jmp	.L15
.L14:
	movq	-8(%rbp), %rax
	movq	(%rax), %rdx
	movq	-8(%rbp), %rax
	movq	8(%rax), %rax
	cmpq	%rax, %rdx
	sete	%al
	movzbl	%al, %eax
.L15:
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE7:
	.size	EmptyLinkqueue, .-EmptyLinkqueue
	.globl	EnLinkqueue
	.type	EnLinkqueue, @function
EnLinkqueue:
.LFB8:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$32, %rsp
	movq	%rdi, -24(%rbp)
	movl	%esi, -28(%rbp)
	cmpq	$0, -24(%rbp)
	jne	.L17
	movl	$-1, %eax
	jmp	.L18
.L17:
	movl	$16, %edi
	call	malloc@PLT
	movq	%rax, -8(%rbp)
	cmpq	$0, -8(%rbp)
	jne	.L19
	movl	$-1, %eax
	jmp	.L18
.L19:
	movq	-8(%rbp), %rax
	movl	-28(%rbp), %edx
	movl	%edx, (%rax)
	movq	-8(%rbp), %rax
	movq	$0, 8(%rax)
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	movq	8(%rax), %rax
	testq	%rax, %rax
	jne	.L20
	movq	-24(%rbp), %rax
	movq	-8(%rbp), %rdx
	movq	%rdx, 8(%rax)
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	movq	-24(%rbp), %rdx
	movq	8(%rdx), %rdx
	movq	%rdx, 8(%rax)
	jmp	.L21
.L20:
	movq	-24(%rbp), %rax
	movq	8(%rax), %rax
	movq	-8(%rbp), %rdx
	movq	%rdx, 8(%rax)
	movq	-24(%rbp), %rax
	movq	-8(%rbp), %rdx
	movq	%rdx, 8(%rax)
.L21:
	movl	$0, %eax
.L18:
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE8:
	.size	EnLinkqueue, .-EnLinkqueue
	.globl	DeLinkqueue
	.type	DeLinkqueue, @function
DeLinkqueue:
.LFB9:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$32, %rsp
	movq	%rdi, -24(%rbp)
	movq	%rsi, -32(%rbp)
	cmpq	$0, -24(%rbp)
	je	.L23
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	movq	8(%rax), %rax
	testq	%rax, %rax
	jne	.L24
.L23:
	movl	$-1, %eax
	jmp	.L25
.L24:
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	movq	8(%rax), %rax
	movq	%rax, -8(%rbp)
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	movq	-8(%rbp), %rdx
	movq	8(%rdx), %rdx
	movq	%rdx, 8(%rax)
	cmpq	$0, -32(%rbp)
	je	.L26
	movq	-8(%rbp), %rax
	movl	(%rax), %edx
	movq	-32(%rbp), %rax
	movl	%edx, (%rax)
.L26:
	movq	-8(%rbp), %rax
	movq	%rax, %rdi
	call	free@PLT
	movl	$0, %eax
.L25:
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE9:
	.size	DeLinkqueue, .-DeLinkqueue
	.globl	Linkqueue_show
	.type	Linkqueue_show, @function
Linkqueue_show:
.LFB10:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$32, %rsp
	movq	%rdi, -24(%rbp)
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	testq	%rax, %rax
	je	.L29
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	movq	8(%rax), %rax
	movq	%rax, -8(%rbp)
	jmp	.L29
.L30:
	movq	-8(%rbp), %rax
	movl	(%rax), %eax
	movl	%eax, %esi
	leaq	.LC0(%rip), %rdi
	movl	$0, %eax
	call	printf@PLT
	movq	-8(%rbp), %rax
	movq	8(%rax), %rax
	movq	%rax, -8(%rbp)
.L29:
	cmpq	$0, -8(%rbp)
	jne	.L30
	movl	$10, %edi
	call	putchar@PLT
	nop
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE10:
	.size	Linkqueue_show, .-Linkqueue_show
	.globl	ClearLinkqueue
	.type	ClearLinkqueue, @function
ClearLinkqueue:
.LFB11:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$32, %rsp
	movq	%rdi, -24(%rbp)
	jmp	.L32
.L33:
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	movq	%rax, -8(%rbp)
	movq	-8(%rbp), %rax
	movq	8(%rax), %rdx
	movq	-24(%rbp), %rax
	movq	%rdx, (%rax)
	movq	-8(%rbp), %rax
	movq	%rax, %rdi
	call	free@PLT
.L32:
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	testq	%rax, %rax
	jne	.L33
	movq	-24(%rbp), %rax
	movq	$0, 8(%rax)
	nop
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE11:
	.size	ClearLinkqueue, .-ClearLinkqueue
	.globl	DestroyLinkqueue
	.type	DestroyLinkqueue, @function
DestroyLinkqueue:
.LFB12:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$16, %rsp
	movq	%rdi, -8(%rbp)
	cmpq	$0, -8(%rbp)
	je	.L36
	movq	-8(%rbp), %rax
	movq	%rax, %rdi
	call	ClearLinkqueue
	movq	-8(%rbp), %rax
	movq	%rax, %rdi
	call	free@PLT
.L36:
	nop
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE12:
	.size	DestroyLinkqueue, .-DestroyLinkqueue
	.ident	"GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
	.section	.note.GNU-stack,"",@progbits

【main-Os.s】

	.file	"main.c"
	.text
	.globl	CreateEmptyLinkqueue
	.type	CreateEmptyLinkqueue, @function
CreateEmptyLinkqueue:
.LFB26:
	.cfi_startproc
	pushq	%rbx
	.cfi_def_cfa_offset 16
	.cfi_offset 3, -16
	movl	$16, %edi
	call	malloc@PLT
	testq	%rax, %rax
	jne	.L2
.L4:
	xorl	%ebx, %ebx
	jmp	.L1
.L2:
	movl	$16, %edi
	movq	%rax, %rbx
	call	malloc@PLT
	testq	%rax, %rax
	movq	%rax, 8(%rbx)
	movq	%rax, (%rbx)
	je	.L4
	movq	$0, 8(%rax)
.L1:
	movq	%rbx, %rax
	popq	%rbx
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE26:
	.size	CreateEmptyLinkqueue, .-CreateEmptyLinkqueue
	.globl	EmptyLinkqueue
	.type	EmptyLinkqueue, @function
EmptyLinkqueue:
.LFB27:
	.cfi_startproc
	orl	$-1, %eax
	testq	%rdi, %rdi
	je	.L10
	movq	8(%rdi), %rax
	cmpq	%rax, (%rdi)
	sete	%al
	movzbl	%al, %eax
.L10:
	ret
	.cfi_endproc
.LFE27:
	.size	EmptyLinkqueue, .-EmptyLinkqueue
	.globl	EnLinkqueue
	.type	EnLinkqueue, @function
EnLinkqueue:
.LFB28:
	.cfi_startproc
	testq	%rdi, %rdi
	je	.L25
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	pushq	%rbx
	.cfi_def_cfa_offset 24
	.cfi_offset 3, -24
	movq	%rdi, %rbx
	movl	$16, %edi
	movl	%esi, %ebp
	subq	$8, %rsp
	.cfi_def_cfa_offset 32
	call	malloc@PLT
	testq	%rax, %rax
	jne	.L26
	orl	$-1, %eax
	jmp	.L13
.L26:
	movq	(%rbx), %rdx
	movq	$0, 8(%rax)
	movl	%ebp, (%rax)
	cmpq	$0, 8(%rdx)
	jne	.L17
	movq	%rax, 8(%rbx)
	movq	%rax, 8(%rdx)
	jmp	.L24
.L17:
	movq	8(%rbx), %rdx
	movq	%rax, 8(%rdx)
	movq	%rax, 8(%rbx)
.L24:
	xorl	%eax, %eax
.L13:
	popq	%rdx
	.cfi_def_cfa_offset 24
	popq	%rbx
	.cfi_def_cfa_offset 16
	popq	%rbp
	.cfi_def_cfa_offset 8
	ret
.L25:
	.cfi_restore 3
	.cfi_restore 6
	orl	$-1, %eax
	ret
	.cfi_endproc
.LFE28:
	.size	EnLinkqueue, .-EnLinkqueue
	.globl	DeLinkqueue
	.type	DeLinkqueue, @function
DeLinkqueue:
.LFB29:
	.cfi_startproc
	orl	$-1, %eax
	testq	%rdi, %rdi
	je	.L36
	movq	(%rdi), %rdx
	movq	8(%rdx), %rdi
	testq	%rdi, %rdi
	je	.L36
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	movq	8(%rdi), %rax
	testq	%rsi, %rsi
	movq	%rax, 8(%rdx)
	je	.L29
	movl	(%rdi), %eax
	movl	%eax, (%rsi)
.L29:
	call	free@PLT
	xorl	%eax, %eax
	popq	%rdx
	.cfi_def_cfa_offset 8
	ret
.L36:
	ret
	.cfi_endproc
.LFE29:
	.size	DeLinkqueue, .-DeLinkqueue
	.section	.rodata.str1.1,"aMS",@progbits,1
.LC0:
	.string	"%d "
	.text
	.globl	Linkqueue_show
	.type	Linkqueue_show, @function
Linkqueue_show:
.LFB30:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	pushq	%rbx
	.cfi_def_cfa_offset 24
	.cfi_offset 3, -24
	subq	$8, %rsp
	.cfi_def_cfa_offset 32
	movq	(%rdi), %rax
	testq	%rax, %rax
	je	.L40
	movq	8(%rax), %rbx
.L40:
	leaq	.LC0(%rip), %rbp
.L41:
	testq	%rbx, %rbx
	je	.L47
	movl	(%rbx), %edx
	movq	%rbp, %rsi
	movl	$1, %edi
	xorl	%eax, %eax
	call	__printf_chk@PLT
	movq	8(%rbx), %rbx
	jmp	.L41
.L47:
	popq	%rax
	.cfi_def_cfa_offset 24
	popq	%rbx
	.cfi_def_cfa_offset 16
	popq	%rbp
	.cfi_def_cfa_offset 8
	movl	$10, %edi
	jmp	putchar@PLT
	.cfi_endproc
.LFE30:
	.size	Linkqueue_show, .-Linkqueue_show
	.globl	ClearLinkqueue
	.type	ClearLinkqueue, @function
ClearLinkqueue:
.LFB31:
	.cfi_startproc
	pushq	%rbx
	.cfi_def_cfa_offset 16
	.cfi_offset 3, -16
	movq	%rdi, %rbx
.L49:
	movq	(%rbx), %rdi
	testq	%rdi, %rdi
	je	.L52
	movq	8(%rdi), %rax
	movq	%rax, (%rbx)
	call	free@PLT
	jmp	.L49
.L52:
	movq	$0, 8(%rbx)
	popq	%rbx
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE31:
	.size	ClearLinkqueue, .-ClearLinkqueue
	.globl	DestroyLinkqueue
	.type	DestroyLinkqueue, @function
DestroyLinkqueue:
.LFB32:
	.cfi_startproc
	testq	%rdi, %rdi
	je	.L53
	pushq	%rbx
	.cfi_def_cfa_offset 16
	.cfi_offset 3, -16
	movq	%rdi, %rbx
	call	ClearLinkqueue
	movq	%rbx, %rdi
	popq	%rbx
	.cfi_restore 3
	.cfi_def_cfa_offset 8
	jmp	free@PLT
.L53:
	ret
	.cfi_endproc
.LFE32:
	.size	DestroyLinkqueue, .-DestroyLinkqueue
	.section	.rodata.str1.1
.LC1:
	.string	"The lqueue is empty!"
.LC2:
	.string	"The lqueue is destroyed!"
	.section	.text.startup,"ax",@progbits
	.globl	main
	.type	main, @function
main:
.LFB25:
	.cfi_startproc
	pushq	%r12
	.cfi_def_cfa_offset 16
	.cfi_offset 12, -16
	pushq	%rbp
	.cfi_def_cfa_offset 24
	.cfi_offset 6, -24
	movl	$1, %ebp
	pushq	%rbx
	.cfi_def_cfa_offset 32
	.cfi_offset 3, -32
	subq	$16, %rsp
	.cfi_def_cfa_offset 48
	movq	%fs:40, %rax
	movq	%rax, 8(%rsp)
	xorl	%eax, %eax
	call	CreateEmptyLinkqueue
	movq	%rax, %rbx
.L59:
	movl	%ebp, %esi
	movq	%rbx, %rdi
	incl	%ebp
	call	EnLinkqueue
	cmpl	$7, %ebp
	jne	.L59
	leaq	4(%rsp), %r12
	movq	%rbx, %rdi
	movl	$4, %ebp
	call	Linkqueue_show
.L60:
	decl	%ebp
	je	.L69
	movq	%r12, %rsi
	movq	%rbx, %rdi
	call	DeLinkqueue
	movl	4(%rsp), %edx
	leaq	.LC0(%rip), %rsi
	movl	$1, %edi
	xorl	%eax, %eax
	call	__printf_chk@PLT
	jmp	.L60
.L69:
	movl	$10, %edi
	call	putchar@PLT
	movq	%rbx, %rdi
	call	Linkqueue_show
	movq	%rbx, %rdi
	call	ClearLinkqueue
	movq	%rbx, %rdi
	call	EmptyLinkqueue
	testl	%eax, %eax
	je	.L62
	leaq	.LC1(%rip), %rdi
	call	puts@PLT
.L62:
	movq	%rbx, %rdi
	call	DestroyLinkqueue
	leaq	.LC2(%rip), %rdi
	call	puts@PLT
	xorl	%eax, %eax
	movq	8(%rsp), %rcx
	xorq	%fs:40, %rcx
	je	.L63
	call	__stack_chk_fail@PLT
.L63:
	addq	$16, %rsp
	.cfi_def_cfa_offset 32
	popq	%rbx
	.cfi_def_cfa_offset 24
	popq	%rbp
	.cfi_def_cfa_offset 16
	popq	%r12
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE25:
	.size	main, .-main
	.ident	"GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
	.section	.note.GNU-stack,"",@progbits

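The assembly shows that more than the size has changed; many details have been optimized. For example, at -O0 every local variable lives in a stack slot and is reloaded around each call (lqueue in main is kept at -16(%rbp)), whereas at -Os main keeps lqueue in %rbx, the functions no longer set up a %rbp stack frame, and calls in tail position become jumps (jmp putchar@PLT, jmp free@PLT).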
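The full listings are long, so the same comparison can be easier to follow on a tiny function. The sketch below is a hypothetical stand-alone file (`sum_to` is not part of the article's program); compiling it with the same gcc -O0 -S and gcc -Os -S commands used earlier gives two short listings in which the loop counter and accumulator can be compared side by side.

```c
/* Sums the integers 1..n; returns 0 when n < 1. */
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```

At -O0 the locals total and i are kept in stack slots; at higher levels they typically end up in registers, and the exact code depends on the GCC version and target.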

For details on GCC's optimization options, see the Optimize-Options section of the official GCC manual.

This article analyzes gcc; arm-gcc behaves much the same, with only a few options differing between versions.