@echo off
rem call "D:/Microsoft Visual Studio 10.0/VC/bin/VCVARS32.BAT"
call "C:/Program Files/Intel/Parallel Studio 2011/ips-vars.cmd"
icl /c /o3 fftsg_h.c currTime.c
icl /c /o3 test_speedFFT.cpp
xilink /subsystem:console test_speedFFT.obj fftsg_h.obj currTime.obj
rem icl /help
del *.obj
pause
test_speedFFT.exe
Compiling with the Intel C++ compiler can improve performance.
FFT results when compiled with Intel C++:
len        fft time      time/(n*log2(n))
64         0.00000057    0.000000001481
128        0.00000138    0.000000001544
256        0.00000269    0.000000001313
512        0.00000778    0.000000001688
1024       0.00001426    0.000000001393
2048       0.00003525    0.000000001565
4096       0.00007433    0.000000001512
8192       0.00016798    0.000000001577
16384      0.00033520    0.000000001461
32768      0.00079489    0.000000001617
65536      0.00160501    0.000000001531
131072     0.00372449    0.000000001672
262144     0.00771416    0.000000001635
524288     0.01943485    0.000000001951
1048576    0.05234874    0.000000002496
2097152    0.10692816    0.000000002428
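The third column divides the measured time by n*log2(n), the operation-count scale of an FFT of length n, so a roughly constant value means the run time grows as expected. The source of test_speedFFT.cpp is not shown here, so the following is only a minimal sketch of how such a column could be computed. The names normalized_time and average_seconds are hypothetical, seconds are assumed as the time unit (the table does not state one), and the code assumes a C++11 compiler, which is newer than the VS2010 toolchain used above.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>

// Hypothetical helper: elapsed time divided by n*log2(n).
// This mirrors the third column of the table, not the author's actual code.
inline double normalized_time(double seconds, std::size_t n) {
    assert(n >= 2);  // log2(n) must be positive
    const double len = static_cast<double>(n);
    return seconds / (len * std::log2(len));
}

// Hypothetical helper: average wall-clock seconds per call of `work`,
// measured over `repeats` back-to-back calls.
template <typename F>
double average_seconds(F&& work, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        work();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / repeats;
}
```

Calling normalized_time(0.00000057, 64) gives roughly 1.48e-9, which agrees with the first row up to the rounding of the printed time.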
Results when compiled with cl from VS2010:
len        fft time      time/(n*log2(n))
64         0.00000295    0.000000007686
128        0.00000363    0.000000004052
256        0.00000418    0.000000002039
512        0.00001109    0.000000002407
1024       0.00002207    0.000000002155
2048       0.00005774    0.000000002563
4096       0.00011116    0.000000002262
8192       0.00027207    0.000000002555
16384      0.00053441    0.000000002330
32768      0.00125360    0.000000002550
65536      0.00258079    0.000000002461
131072     0.00585856    0.000000002629
262144     0.01220177    0.000000002586
524288     0.03064302    0.000000003076
1048576    0.07136638    0.000000003403
2097152    0.14600080    0.000000003315
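To put a number on the difference between the two compilers, the cl time can be divided by the icl time for the same length. The sketch below does this for three rows copied from the two result listings. The Row struct and the speedup and sample_rows functions are hypothetical names introduced only for illustration, and the code assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One FFT length with the time measured for each compiler's build.
struct Row {
    std::size_t len;
    double icl_time;  // "fft time" column of the Intel C++ results
    double cl_time;   // "fft time" column of the VS2010 cl results
};

// Ratio > 1 means the Intel C++ build was faster for this length.
inline double speedup(const Row& r) {
    assert(r.icl_time > 0.0);
    return r.cl_time / r.icl_time;
}

// Three rows copied from the two result listings: smallest, middle, largest.
inline std::vector<Row> sample_rows() {
    return {
        {64,      0.00000057, 0.00000295},
        {1024,    0.00001426, 0.00002207},
        {2097152, 0.10692816, 0.14600080},
    };
}
```

For these three rows the ratio is about 5.2 at length 64, about 1.55 at length 1024, and about 1.37 at length 2097152, so the advantage of the Intel build is largest for the shortest transform.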
Commands:
Running icl /help prints the command-line reference:
Intel(R) C++ Compiler Help
==========================
usage: icl [options] file1 [file2 ...] [/link linker_options]
where options represents zero or more compiler options
fileN is a C/C++ source (.c .cc .cpp .cxx .i), assembly (.asm), object (.obj), static library (.lib), or other linkable file
linker_options represents zero or more linker options
Notes
-----
1. Most Microsoft* Visual C++* compiler options are supported; a warning is printed for most unsupported options. The precise behavior of performance options does not always match that of the Microsoft Visual C++ compiler.
2. Intel C++ compiler options may be placed in your icl.cfg file.
3. Most options beginning with /Q are specific to the Intel C++ compiler:
   (*I) indicates other options specific to the Intel C++ compiler
   (*M) indicates /Q options supported by the Microsoft Visual C++ compiler
Some options listed are only available on a specific system
   i32 indicates the feature is available on systems based on IA-32 architecture
   i64em indicates the feature is available on systems using Intel(R) 64 architecture
Compiler Option List
--------------------

Optimization
------------
/O1  optimize for maximum speed, but disable some optimizations which increase code size for a small speed benefit
/O2  optimize for maximum speed (DEFAULT)
/O3  optimize for maximum speed and enable more aggressive optimizations that may not improve performance on some programs
/Ox  enable maximum optimizations (same as /O2)
/Os  enable speed optimizations, but disable some optimizations which increase code size for small speed benefit (overrides /Ot)
/Ot  enable speed optimizations (overrides /Os)
/Od  disable optimizations
/Oi[-]  enable/disable inline expansion of intrinsic functions
/Oy[-]  enable/disable using EBP as a general purpose register (no frame pointer) (i32 only)
/fast  enable /QxHOST /O3 /Qipo /Qprec-div-
    options set by /fast cannot be overridden with the exception of /QxHOST, list options separately to change behavior
/Oa[-]  assume no aliasing in program
/Ow[-]  assume no aliasing within functions, but assume aliasing across calls
Code Generation
---------------
/Qx<code>  generate specialized code to run exclusively on processors indicated by <code> as described below
    Host  generate instructions for the highest instruction set and processor available on the compilation host machine
    SSE2  Intel Pentium 4 and compatible Intel processors. Enables new optimizations in addition to Intel processor-specific optimizations
    SSE3  Intel(R) Core(TM) processor family with Streaming SIMD Extensions 3 (Intel(R) SSE3) instruction support
    SSSE3  Intel(R) Core(TM)2 processor family with Supplemental Streaming SIMD Extensions 3 (SSSE3)
    SSE4.1  Intel(R) 45nm Hi-k next generation Intel Core(TM) microarchitecture with support for Streaming SIMD Extensions 4 (Intel(R) SSE4) Vectorizing Compiler and Media Accelerator instructions
    SSE4.2  Can generate Intel(R) SSE4 Efficient Accelerated String and Text Processing instructions supported by Intel(R) Core(TM) i7 processors. Can generate Intel(R) SSE4 Vectorizing Compiler and Media Accelerator, Intel(R) SSSE3, SSE3, SSE2, and SSE instructions and it can optimize for the Intel(R) Core(TM) processor family.
    AVX  Enable Intel(R) Advanced Vector Extensions instructions
    SSE3_ATOM  Can generate MOVBE instructions for Intel processors and can optimize for the Intel(R) Atom(TM) processor.
/Qax<code1>[,<code2>,...]  generate code specialized for processors specified by <codes> while also generating generic IA-32 instructions. <codes> includes one or more of the following:
    SSE2  Intel Pentium 4 and compatible Intel processors. Enables new optimizations in addition to Intel processor-specific optimizations
    SSE3  Intel(R) Core(TM) processor family with Streaming SIMD Extensions 3 (Intel(R) SSE3) instruction support
    SSSE3  Intel(R) Core(TM)2 processor family with Supplemental Streaming SIMD Extensions 3 (SSSE3)
    SSE4.1  Intel(R) 45nm Hi-k next generation Intel Core(TM) microarchitecture with support for Streaming SIMD Extensions 4 (Intel(R) SSE4) Vectorizing Compiler and Media Accelerator instructions
    SSE4.2  Can generate Intel(R) SSE4 Efficient Accelerated String and Text Processing instructions supported by Intel(R) Core(TM) i7 processors. Can generate Intel(R) SSE4 Vectorizing Compiler and Media Accelerator, Intel(R) SSSE3, SSE3, SSE2, and SSE instructions and it can optimize for the Intel(R) Core(TM) processor family.
    AVX  Enable Intel(R) Advanced Vector Extensions instructions
/arch:<code>  generate specialized code to optimize for processors indicated by <code> as described below
    SSE  Intel Pentium III and compatible Intel processors
    SSE2  Intel Pentium 4 and compatible Intel processors. Enables new optimizations in addition to Intel processor-specific optimizations
    SSE3  Intel(R) Core(TM) processor family. Code is expected to run properly on any processor that supports SSE3, SSE2 and SSE instruction sets
    SSSE3  Intel(R) Core(TM)2 processor family with Supplemental Streaming SIMD Extensions 3 (SSSE3)
    SSE4.1  Intel(R) 45nm Hi-k next generation Intel Core(TM) microarchitecture with support for Streaming SIMD Extensions 4 (Intel(R) SSE4) Vectorizing Compiler and Media Accelerator instructions
    IA32  generate generic IA-32 architecture code for Intel Pentium III and compatible Intel processors. Disables any default or previously set extended instruction setting
/Qinstruction:<keyword>  Refine instruction set output for the selected target processor
    [no]movbe - Do/do not generate MOVBE instructions with SSE3_ATOM (requires /QxSSE3_ATOM)
/GR[-]  enable/disable C++ RTTI
/Qcxx-features  enable standard C++ features (/GX /GR)
/EHa  enable asynchronous C++ exception handling model
/EHs  enable synchronous C++ exception handling model
/EHc  assume extern "C" functions do not throw exceptions
/Qsafeseh[-]  Registers exceptions for safe exception handling (DEFAULT)
/Gd  make __cdecl the default calling convention
/Gr  make __fastcall the default calling convention
/Gz  make __stdcall the default calling convention
/Qregcall  make __regcall the default calling convention
/hotpatch[:n]  generate padding bytes for function entries to enable image hotpatching. If specified, use 'n' as the padding.
Interprocedural Optimization (IPO)
----------------------------------
/Qip[-]  enable(DEFAULT)/disable single-file IP optimization within files
/Qipo[n]  enable multi-file IP optimization between files
/Qipo-c  generate a multi-file object file (ipo_out.obj)
/Qipo-S  generate a multi-file assembly file (ipo_out.asm)
/Qip-no-inlining  disable full and partial inlining
/Qip-no-pinlining  disable partial inlining
/Qipo-separate  create one object file for every source file (overrides /Qipo[n])
/Qipo-jobs<n>  specify the number of jobs to be executed simultaneously during the IPO link phase
Advanced Optimizations
----------------------
/Qunroll[n]  set maximum number of times to unroll loops. Omit n to use default heuristics. Use n=0 to disable the loop unroller
/Qunroll-aggressive[-]  enables more aggressive unrolling heuristics
/Qscalar-rep[-]  enable(DEFAULT)/disable scalar replacement (requires /O3)
/Qansi-alias[-]  enable/disable(DEFAULT) use of ANSI aliasing rules optimizations; user asserts that the program adheres to these rules
/Qansi-alias-check[-]  enable(DEFAULT)/disable ANSI alias checking when using /Qansi-alias
/Qcomplex-limited-range[-]  enable/disable(DEFAULT) the use of the basic algebraic expansions of some complex arithmetic operations. This can allow for some performance improvement in programs which use a lot of complex arithmetic at the loss of some exponent range.
/Qalias-const[-]  enable/disable(DEFAULT) a heuristic stating that if two arguments to a function have pointer type, a pointer to const does not alias a pointer to non-const. Also known as the input/output buffer rule, it assumes that input and output buffer arguments do not overlap.
/Qalias-args[-]  enable(DEFAULT)/disable C/C++ rule that function arguments may be aliased; when disabling the rule, the user asserts that this is safe
/Qopt-multi-version-aggressive[-]  enables more aggressive multi-versioning to check for pointer aliasing and scalar replacement
/Qopt-ra-region-strategy[:<keyword>]  select the method that the register allocator uses to partition each routine into regions
    routine - one region per routine
    block - one region per block
    trace - one region per trace
    loop - one region per loop
    default - compiler selects best option
/Qvec[-]  enables(DEFAULT)/disables vectorization
/Qvec-guard-write[-]  enables cache/bandwidth optimization for stores under conditionals within vector loops
/Qvec-threshold[n]  sets a threshold for the vectorization of loops based on the probability of profitable execution of the vectorized loop in parallel
/Qopt-malloc-options:{0|1|2|3|4}  specify malloc configuration parameters. Specifying a non-zero <n> value will cause alternate configuration parameters to be set for how malloc allocates and frees memory
/Qopt-jump-tables:<arg>  control the generation of jump tables
    default - let the compiler decide when a jump table, a series of if-then-else constructs or a combination is generated
    large - generate jump tables up to a certain pre-defined size (64K entries)
    <n> - generate jump tables up to <n> in size
    use /Qopt-jump-tables- to lower switch statements as chains of if-then-else constructs
/Qopt-block-factor:<n>  specify blocking factor for loop blocking
/Qfreestanding  compile in a freestanding environment where the standard library may not be present
/Qopt-streaming-stores:<arg>  specifies whether streaming stores are generated
    always - enables generation of streaming stores under the assumption that the application is memory bound
    auto - compiler decides when streaming stores are used (DEFAULT)
    never - disables generation of streaming stores
/Qipp[:<arg>]  link some or all of the Intel(R) Integrated Performance Primitives (Intel(R) IPP) libraries and bring in the associated headers
    common - link using the main libraries set. This is the default value when /Qipp is specified
    crypto - link using the main libraries set and the crypto library
/Qmkl[:<arg>]  link to the Intel(R) Math Kernel Library (Intel(R) MKL) and bring in the associated headers
    parallel - link using the threaded Intel(R) MKL libraries. This is the default when /Qmkl is specified
    sequential - link using the non-threaded Intel(R) MKL libraries
    cluster - link using the Intel(R) MKL Cluster libraries plus the sequential Intel(R) MKL libraries
/Qtbb  link to the Intel(R) Threading Building Blocks (Intel(R) TBB) libraries and bring in the associated headers
/Qopt-subscript-in-range[-]  assumes no overflows in the intermediate computation of the subscripts
/Quse-intel-optimized-headers[-]  take advantage of the optimized header files
/Qcilk-serialize  run a Cilk program as a C/C++ serialized program
/Qarray-notation[-]  enable/disable(DEFAULT) C/C++ array extensions for data parallel programming
/Qopt-matmul[-]  replace matrix multiplication with calls to intrinsics and threading libraries for improved performance (DEFAULT at /O3 /Qparallel)
/Qsimd[-]  enables(DEFAULT)/disables vectorization using simd pragma
/Qguide-opts:<arg>  tells the compiler to analyze certain code and generate recommendations that may improve optimizations
/Qguide-file[:<filename>]  causes the results of guided auto-parallelization to be output to a file
/Qguide-file-append[:<filename>]  causes the results of guided auto-parallelization to be appended to a file
/Qguide[:<level>]  lets you set a level (1 - 4) of guidance for auto-vectorization, auto-parallelization, and data transformation (DEFAULT is 4 when the option is specified)
/Qguide-data-trans[:<level>]  lets you set a level (1 - 4) of guidance for data transformation (DEFAULT is 4 when the option is specified)
/Qguide-par[:<level>]  lets you set a level (1 - 4) of guidance for auto-parallelization (DEFAULT is 4 when the option is specified)
/Qguide-vec[:<level>]  lets you set a level (1 - 4) of guidance for auto-vectorization (DEFAULT is 4 when the option is specified)
Profile Guided Optimization (PGO)
---------------------------------
/Qprof-dir <dir>  specify directory for profiling output files (*.dyn and *.dpi)
/Qprof-src-root <dir>  specify project root directory for application source files to enable relative path resolution during profile feedback on sources below that directory
/Qprof-src-root-cwd  specify the current directory as the project root directory for application source files to enable relative path resolution during profile feedback on sources below that directory
/Qprof-src-dir[-]  specify whether directory names of sources should be considered when looking up profile records within the .dpi file
/Qprof-file <file>  specify file name for profiling summary file
/Qprof-data-order[-]  enable/disable(DEFAULT) static data ordering with profiling
/Qprof-func-order[-]  enable/disable(DEFAULT) function ordering with profiling
/Qprof-gen[:keyword]  instrument program for profiling. Optional keyword may be srcpos or globdata
/Qprof-gen-  disable profiling instrumentation
/Qprof-use[:<arg>]  enable use of profiling information during optimization
    weighted - invokes profmerge with -weighted option to scale data based on run durations
    [no]merge - enable(default)/disable the invocation of the profmerge tool
/Qprof-use-  disable use of profiling information during optimization
/Qcov-gen  instrument program for profiling
/Qcov-dir <dir>  specify directory for profiling output files (*.dyn and *.dpi)
/Qcov-file <file>  specify file name for profiling summary file
/Qfnsplit[-]  enable/disable function splitting (enabled with /Qprof-use)
/Qopt-prefetch[:n]  enable levels of prefetch insertion, where 0 disables. n may be 0 through 4 inclusive. Default is 2.
......