The Intel C++ Command Line in Parallel Studio 2011

    Tech  2022-05-20

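    @echo off
    rem (disabled) Visual Studio 2010 environment setup:
    rem call "D:/Microsoft Visual Studio 10.0/VC/bin/VCVARS32.BAT"

    rem Set up the Intel Parallel Studio 2011 build environment.
    call "C:/Program Files/Intel/Parallel Studio 2011/ips-vars.cmd"

    rem Compile only (/c) with /O3, the optimization level as spelled in the icl /help listing.
    icl /c /O3 fftsg_h.c currTime.c
    icl /c /O3 test_speedFFT.cpp

    rem Link the object files into a console executable.
    xilink /subsystem:console test_speedFFT.obj fftsg_h.obj currTime.obj

    rem icl /help

    del *.obj
    pause
    test_speedFFT.exe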

     

     

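    Compiling with the Intel C++ compiler improved performance in this test: in the FFT timings below, the icl build is faster than the cl build from VS2010 at every length, by roughly 1.4 to 1.6 times for lengths of 256 and above.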

     

    FFT results when compiled with Intel C++:

        len    fft time   time/(n*log2(n))
         64  0.00000057   0.000000001481
        128  0.00000138   0.000000001544
        256  0.00000269   0.000000001313
        512  0.00000778   0.000000001688
       1024  0.00001426   0.000000001393
       2048  0.00003525   0.000000001565
       4096  0.00007433   0.000000001512
       8192  0.00016798   0.000000001577
      16384  0.00033520   0.000000001461
      32768  0.00079489   0.000000001617
      65536  0.00160501   0.000000001531
     131072  0.00372449   0.000000001672
     262144  0.00771416   0.000000001635
     524288  0.01943485   0.000000001951
    1048576  0.05234874   0.000000002496
    2097152  0.10692816   0.000000002428
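The third column can be reproduced from the first two. Here is a minimal sketch, written in Python rather than the post's C/C++ purely for brevity. The function name `normalized_time` is made up for illustration and does not come from the original test program. Because the printed times are rounded, a recomputed value can differ from the table in the last digit.

```python
import math

def normalized_time(n: int, fft_time: float) -> float:
    """Return fft_time / (n * log2(n)), the quantity in the table's third column."""
    return fft_time / (n * math.log2(n))

# Two rows copied from the Intel C++ table above: (len, fft time).
rows = [(1024, 0.00001426), (2097152, 0.10692816)]

for n, t in rows:
    print(f"{n:8d}  {t:.8f}  {normalized_time(n, t):.12f}")
```

If the normalized value stays roughly flat as `len` grows, the run time is scaling like n*log2(n), which is what one would expect from an FFT.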

     

     

    Results when compiled with cl from VS2010:

        len    fft time   time/(n*log2(n))
         64  0.00000295   0.000000007686
        128  0.00000363   0.000000004052
        256  0.00000418   0.000000002039
        512  0.00001109   0.000000002407
       1024  0.00002207   0.000000002155
       2048  0.00005774   0.000000002563
       4096  0.00011116   0.000000002262
       8192  0.00027207   0.000000002555
      16384  0.00053441   0.000000002330
      32768  0.00125360   0.000000002550
      65536  0.00258079   0.000000002461
     131072  0.00585856   0.000000002629
     262144  0.01220177   0.000000002586
     524288  0.03064302   0.000000003076
    1048576  0.07136638   0.000000003403
    2097152  0.14600080   0.000000003315
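To put a number on the difference, the cl time can be divided by the icl time for the same length. The sketch below does this for three rows copied from the two tables above. It is Python rather than C/C++ only to keep it short, and the helper name `speedup` is illustrative, not part of the original benchmark.

```python
# (len, icl time, cl time) copied from three rows of the two tables above.
timings = [
    (64, 0.00000057, 0.00000295),
    (16384, 0.00033520, 0.00053441),
    (2097152, 0.10692816, 0.14600080),
]

def speedup(icl_time: float, cl_time: float) -> float:
    """Return how many times faster the icl build is than the cl build."""
    return cl_time / icl_time

for n, icl_t, cl_t in timings:
    print(f"{n:8d}  speedup = {speedup(icl_t, cl_t):.2f}x")
```

The smallest lengths show the largest ratios, but those runs take well under a microsecond according to the tables, so timer resolution and call overhead may dominate there. The mid-range and large lengths are probably the more representative comparison.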

     

    Commands:

    icl /help   prints the command-line reference

                          Intel(R) C++ Compiler Help
                          ==========================

      usage: icl [options] file1 [file2 ...] [/link linker_options]

         where options represents zero or more compiler options

         fileN is a C/C++ source (.c .cc .cpp .cxx .i), assembly (.asm),
             object (.obj), static library (.lib), or other linkable file

         linker_options represents zero or more linker options

    Notes
    -----
    1. Most Microsoft* Visual C++* compiler options are supported; a warning is
       printed for most unsupported options.  The precise behavior of performance
       options does not always match that of the Microsoft Visual C++ compiler.

    2. Intel C++ compiler options may be placed in your icl.cfg file.

    3. Most options beginning with /Q are specific to the Intel C++ compiler:
       (*I) indicates other options specific to the Intel C++ compiler
       (*M) indicates /Q options supported by the Microsoft Visual C++ compiler

       Some options listed are only available on a specific system
       i32    indicates the feature is available on systems based on IA-32
              architecture
       i64em  indicates the feature is available on systems using Intel(R) 64
              architecture

                             Compiler Option List
                             --------------------

    Optimization
    ------------

    /O1       optimize for maximum speed, but disable some optimizations which
              increase code size for a small speed benefit
    /O2       optimize for maximum speed (DEFAULT)
    /O3       optimize for maximum speed and enable more aggressive optimizations
              that may not improve performance on some programs
    /Ox       enable maximum optimizations (same as /O2)
    /Os       enable speed optimizations, but disable some optimizations which
              increase code size for small speed benefit (overrides /Ot)
    /Ot       enable speed optimizations (overrides /Os)
    /Od       disable optimizations
    /Oi[-]    enable/disable inline expansion of intrinsic functions
    /Oy[-]    enable/disable using EBP as a general purpose register (no frame
              pointer) (i32 only)
    /fast     enable /QxHOST /O3 /Qipo /Qprec-div-
              options set by /fast cannot be overridden with the exception of
              /QxHOST, list options separately to change behavior
    /Oa[-]    assume no aliasing in program
    /Ow[-]    assume no aliasing within functions, but assume aliasing across calls

    Code Generation
    ---------------

    /Qx<code>
              generate specialized code to run exclusively on processors
              indicated by <code> as described below
                Host generate instructions for the highest instruction set and
                     processor available on the compilation host machine
                SSE2 Intel Pentium 4 and compatible Intel processors.  Enables new
                     optimizations in addition to Intel processor-specific
                     optimizations
                SSE3    Intel(R) Core(TM) processor family with Streaming SIMD
                        Extensions 3 (Intel(R) SSE3) instruction support
                SSSE3   Intel(R) Core(TM)2 processor family with Supplemental
                        Streaming SIMD Extensions 3 (SSSE3)
                SSE4.1  Intel(R) 45nm Hi-k next generation Intel Core(TM)
                        microarchitecture with support for Streaming SIMD
                        Extensions 4 (Intel(R) SSE4) Vectorizing
                        Compiler and Media Accelerator instructions
                SSE4.2  Can generate Intel(R) SSE4 Efficient Accelerated String
                        and Text Processing instructions supported by Intel(R)
                        Core(TM) i7 processors. Can generate Intel(R) SSE4
                        Vectorizing Compiler and Media Accelerator, Intel(R) SSSE3,
                        SSE3, SSE2, and SSE instructions and it can optimize for
                        the Intel(R) Core(TM) processor family.
                AVX     Enable Intel(R) Advanced Vector Extensions instructions
                SSE3_ATOM Can generate MOVBE instructions for Intel processors and
                          can optimize for the Intel(R) Atom(TM) processor.
    /Qax<code1>[,<code2>,...]
              generate code specialized for processors specified by <codes>
              while also generating generic IA-32 instructions.
              <codes> includes one or more of the following:
                SSE2 Intel Pentium 4 and compatible Intel processors.  Enables new
                     optimizations in addition to Intel processor-specific
                     optimizations
                SSE3    Intel(R) Core(TM) processor family with Streaming SIMD
                        Extensions 3 (Intel(R) SSE3) instruction support
                SSSE3   Intel(R) Core(TM)2 processor family with Supplemental
                        Streaming SIMD Extensions 3 (SSSE3)
                SSE4.1  Intel(R) 45nm Hi-k next generation Intel Core(TM)
                        microarchitecture with support for Streaming SIMD
                        Extensions 4 (Intel(R) SSE4) Vectorizing
                        Compiler and Media Accelerator instructions
                SSE4.2  Can generate Intel(R) SSE4 Efficient Accelerated String
                        and Text Processing instructions supported by Intel(R)
                        Core(TM) i7 processors. Can generate Intel(R) SSE4
                        Vectorizing Compiler and Media Accelerator, Intel(R) SSSE3,
                        SSE3, SSE2, and SSE instructions and it can optimize for
                        the Intel(R) Core(TM) processor family.
                AVX     Enable Intel(R) Advanced Vector Extensions instructions
    /arch:<code>
              generate specialized code to optimize for processors indicated by
              <code> as described below
                SSE  Intel Pentium III and compatible Intel processors
                SSE2 Intel Pentium 4 and compatible Intel processors.  Enables new
                     optimizations in addition to Intel processor-specific
                     optimizations
                SSE3 Intel(R) Core(TM) processor family.  Code is expected to run
                     properly on any processor that supports SSE3, SSE2 and SSE
                     instruction sets
                SSSE3   Intel(R) Core(TM)2 processor family with Supplemental
                        Streaming SIMD Extensions 3 (SSSE3)
                SSE4.1  Intel(R) 45nm Hi-k next generation Intel Core(TM)
                        microarchitecture with support for Streaming SIMD
                        Extensions 4 (Intel(R) SSE4) Vectorizing
                        Compiler and Media Accelerator instructions
                IA32    generate generic IA-32 architecture code for Intel Pentium
                        III and compatible Intel processors.  Disables any default
                        or previously set extended instruction setting
    /Qinstruction:<keyword>
              Refine instruction set output for the selected target processor

                [no]movbe  - Do/do not generate MOVBE instructions with SSE3_ATOM
                             (requires /QxSSE3_ATOM)

    /GR[-]    enable/disable C++ RTTI
    /Qcxx-features
              enable standard C++ features (/GX /GR)
    /EHa      enable asynchronous C++ exception handling model
    /EHs      enable synchronous C++ exception handling model
    /EHc      assume extern "C" functions do not throw exceptions
    /Qsafeseh[-]
              Registers exceptions for safe exception handling (DEFAULT)
    /Gd       make __cdecl the default calling convention
    /Gr       make __fastcall the default calling convention
    /Gz       make __stdcall the default calling convention
    /Qregcall
              make __regcall the default calling convention
    /hotpatch[:n]
              generate padding bytes for function entries to enable image
              hotpatching. If specified, use 'n' as the padding.

    Interprocedural Optimization (IPO)
    ----------------------------------

    /Qip[-]   enable(DEFAULT)/disable single-file IP optimization
              within files
    /Qipo[n]  enable multi-file IP optimization between files
    /Qipo-c   generate a multi-file object file (ipo_out.obj)
    /Qipo-S   generate a multi-file assembly file (ipo_out.asm)
    /Qip-no-inlining
              disable full and partial inlining
    /Qip-no-pinlining
              disable partial inlining
    /Qipo-separate
              create one object file for every source file (overrides /Qipo[n])
    /Qipo-jobs<n>
              specify the number of jobs to be executed simultaneously during the
              IPO link phase

    Advanced Optimizations
    ----------------------

    /Qunroll[n]
              set maximum number of times to unroll loops.  Omit n to use default
              heuristics.  Use n=0 to disable the loop unroller
    /Qunroll-aggressive[-]
              enables more aggressive unrolling heuristics
    /Qscalar-rep[-]
              enable(DEFAULT)/disable scalar replacement (requires /O3)
    /Qansi-alias[-]
              enable/disable(DEFAULT) use of ANSI aliasing rules optimizations;
              user asserts that the program adheres to these rules
    /Qansi-alias-check[-]
              enable(DEFAULT)/disable ANSI alias checking when using /Qansi-alias
    /Qcomplex-limited-range[-]
              enable/disable(DEFAULT) the use of the basic algebraic expansions of
              some complex arithmetic operations.  This can allow for some
              performance improvement in programs which use a lot of complex
              arithmetic at the loss of some exponent range.
    /Qalias-const[-]
              enable/disable(DEFAULT) a heuristic stating that if two arguments to
              a function have pointer type, a pointer to const does not alias a
              pointer to non-const. Also known as the input/output buffer rule, it
              assumes that input and output buffer arguments do not overlap.
    /Qalias-args[-]
              enable(DEFAULT)/disable C/C++ rule that function arguments may be
              aliased; when disabling the rule, the user asserts that this is safe
    /Qopt-multi-version-aggressive[-]
              enables more aggressive multi-versioning to check for pointer
              aliasing and scalar replacement
    /Qopt-ra-region-strategy[:<keyword>]
              select the method that the register allocator uses to partition each
              routine into regions
                routine - one region per routine
                block   - one region per block
                trace   - one region per trace
                loop    - one region per loop
                default - compiler selects best option
    /Qvec[-]  enables(DEFAULT)/disables vectorization
    /Qvec-guard-write[-]
              enables cache/bandwidth optimization for stores under conditionals
              within vector loops
    /Qvec-threshold[n]
              sets a threshold for the vectorization of loops based on the
              probability of profitable execution of the vectorized loop in
              parallel
    /Qopt-malloc-options:{0|1|2|3|4}
              specify malloc configuration parameters.  Specifying a non-zero <n>
              value will cause alternate configuration parameters to be set for
              how malloc allocates and frees memory
    /Qopt-jump-tables:<arg>
              control the generation of jump tables
                default - let the compiler decide when a jump table, a series of
                          if-then-else constructs or a combination is generated
                large   - generate jump tables up to a certain pre-defined size
                          (64K entries)
                <n>     - generate jump tables up to <n> in size
              use /Qopt-jump-tables- to lower switch statements as chains of
              if-then-else constructs
    /Qopt-block-factor:<n>
              specify blocking factor for loop blocking
    /Qfreestanding
              compile in a freestanding environment where the standard library
              may not be present
    /Qopt-streaming-stores:<arg>
              specifies whether streaming stores are generated
                always - enables generation of streaming stores under the
                         assumption that the application is memory bound
                auto   - compiler decides when streaming stores are used (DEFAULT)
                never  - disables generation of streaming stores
    /Qipp[:<arg>]
              link some or all of the Intel(R) Integrated Performance Primitives
              (Intel(R) IPP) libraries and bring in the associated headers
                common        - link using the main libraries set.  This is the
                                default value when /Qipp is specified
                crypto        - link using the main libraries set and the crypto
                                library
    /Qmkl[:<arg>]
              link to the Intel(R) Math Kernel Library (Intel(R) MKL) and bring
              in the associated headers
                parallel   - link using the threaded Intel(R) MKL libraries. This
                             is the default when /Qmkl is specified
                sequential - link using the non-threaded Intel(R) MKL libraries
                cluster    - link using the Intel(R) MKL Cluster libraries plus
                             the sequential Intel(R) MKL libraries
    /Qtbb     link to the Intel(R) Threading Building Blocks (Intel(R) TBB)
              libraries and bring in the associated headers
    /Qopt-subscript-in-range[-]
              assumes no overflows in the intermediate computation of the
              subscripts
    /Quse-intel-optimized-headers[-]
              take advantage of the optimized header files
    /Qcilk-serialize
              run a Cilk program as a C/C++ serialized program
    /Qarray-notation[-]
              enable/disable(DEFAULT) C/C++ array extensions for data parallel
              programming
    /Qopt-matmul[-]
              replace matrix multiplication with calls to intrinsics and threading
              libraries for improved performance (DEFAULT at /O3 /Qparallel)
    /Qsimd[-]
              enables(DEFAULT)/disables vectorization using simd pragma
    /Qguide-opts:<arg>
              tells the compiler to analyze certain code and generate
              recommendations that may improve optimizations
    /Qguide-file[:<filename>]
              causes the results of guided auto-parallelization to be output to a
              file
    /Qguide-file-append[:<filename>]
              causes the results of guided auto-parallelization to be appended to
              a file
    /Qguide[:<level>]
              lets you set a level (1 - 4) of guidance for auto-vectorization,
              auto-parallelization, and data transformation (DEFAULT is 4 when the
              option is specified)
    /Qguide-data-trans[:<level>]
              lets you set a level (1 - 4) of guidance for data transformation
              (DEFAULT is 4 when the option is specified)
    /Qguide-par[:<level>]
              lets you set a level (1 - 4) of guidance for auto-parallelization
              (DEFAULT is 4 when the option is specified)
    /Qguide-vec[:<level>]
              lets you set a level (1 - 4) of guidance for auto-vectorization
              (DEFAULT is 4 when the option is specified)

    Profile Guided Optimization (PGO)
    ---------------------------------

    /Qprof-dir <dir>
              specify directory for profiling output files (*.dyn and *.dpi)
    /Qprof-src-root <dir>
              specify project root directory for application source files to
              enable relative path resolution during profile feedback on sources
              below that directory
    /Qprof-src-root-cwd
              specify the current directory as the project root directory for
              application source files to enable relative path resolution during
              profile feedback on sources below that directory
    /Qprof-src-dir[-]
              specify whether directory names of sources should be
              considered when looking up profile records within the .dpi file
    /Qprof-file <file>
              specify file name for profiling summary file
    /Qprof-data-order[-]
              enable/disable(DEFAULT) static data ordering with profiling
    /Qprof-func-order[-]
              enable/disable(DEFAULT) function ordering with profiling
    /Qprof-gen[:keyword]
              instrument program for profiling.
              Optional keyword may be srcpos or globdata
    /Qprof-gen-
              disable profiling instrumentation
    /Qprof-use[:<arg>]
              enable use of profiling information during optimization
                weighted  - invokes profmerge with -weighted option to scale data
                            based on run durations
                [no]merge - enable(default)/disable the invocation of the profmerge
                            tool
    /Qprof-use-
              disable use of profiling information during optimization
    /Qcov-gen
              instrument program for profiling
    /Qcov-dir <dir>
              specify directory for profiling output files (*.dyn and *.dpi)
    /Qcov-file <file>
              specify file name for profiling summary file
    /Qfnsplit[-]
              enable/disable function splitting (enabled with /Qprof-use)
    /Qopt-prefetch[:n]
              enable levels of prefetch insertion, where 0 disables.
              n may be 0 through 4 inclusive.  Default is 2.

    ......

     

