Knowing that disabling the L1 cache for global loads with the "-Xptxas -dlcm=cg" option at compile time can reduce over-fetch (for example, in the case of scattered memory accesses), I ran an experiment today to see whether it would make my programme faster.
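For reference, here is a minimal sketch of the kind of test I mean. The file name, the kernel and the index pattern are all made up for illustration, and the effect of the flag depends on the GPU generation, so treat it as an assumption-laden example rather than my actual programme. It times a gather through an index table, which is a simple way to produce scattered loads.

```cuda
// scatter_test.cu (hypothetical file name)
// Default caching (L1 + L2):  nvcc -O3 -o scatter_ca scatter_test.cu
// Global loads bypass L1:     nvcc -O3 -Xptxas -dlcm=cg -o scatter_cg scatter_test.cu
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread reads one element through an index table, so neighbouring
// threads touch unrelated cache lines (a scattered access pattern).
__global__ void gather(const float* in, const int* idx, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[idx[i]];
}

int main()
{
    const int n = 1 << 22;

    // Build a large-stride permutation of 0..n-1 on the host.
    int* h_idx = (int*)malloc(n * sizeof(int));
    for (int i = 0; i < n; ++i)
        h_idx[i] = (int)(((long long)i * 7919) % n);

    float *d_in = 0, *d_out = 0;
    int* d_idx = 0;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_idx, n * sizeof(int));
    cudaMemset(d_in, 0, n * sizeof(float));
    cudaMemcpy(d_idx, h_idx, n * sizeof(int), cudaMemcpyHostToDevice);

    // Time a single kernel launch with CUDA events.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    gather<<<(n + 255) / 256, 256>>>(d_in, d_idx, d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("gather: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_idx);
    free(h_idx);
    return 0;
}
```

Building the same file twice, once with and once without the option, and comparing the printed times gives a rough idea of whether bypassing L1 helps for this access pattern.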
However, I was annoyed to find that VS2010 has no proper place to add such an option. I tried adding it under Project Properties -> Configuration Properties -> CUDA C/C++ -> Command Line -> Additional Options, but the linker complained that it did not recognise the "-Xptxas -dlcm=cg" option, and the build failed. This looks like a bug in VS2010 or Nsight, since the Additional Options on the NVCC command line shouldn't have anything to do with the linker command line.
In the end, I had to compile the .cu file in cmd, compile the .cpp files separately in VS2010, and link the executable using the Link Only feature in VS2010 (located in Build Menu -> Project Only). Luckily VS2010 has the Link Only command; otherwise I would have had to link the executable in cmd as well.
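A rough sketch of what the separately compiled .cu file can look like is below. The file name, the kernel and the launch_scale wrapper are hypothetical, and I am assuming that the resulting object file (plus the CUDA runtime library) is made visible to the VS2010 linker, for example as an additional linker input. The idea is that the .cpp files only see a plain C function and never any CUDA syntax.

```cuda
// kernels.cu (hypothetical file name), compiled in cmd with something like:
//   nvcc -c -O3 -Xptxas -dlcm=cg kernels.cu -o kernels.obj
#include <cuda_runtime.h>

// Example kernel: multiply every element by a constant factor.
__global__ void scale(float* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Plain C entry point, so .cpp files built by the VS2010 compiler can
// call into this object file without including any CUDA headers.
extern "C" int launch_scale(float* d_data, float factor, int n)
{
    scale<<<(n + 255) / 256, 256>>>(d_data, factor, n);
    // 0 (cudaSuccess) means the launch was accepted.
    return (int)cudaGetLastError();
}
```

On the .cpp side, a matching declaration such as extern "C" int launch_scale(float* d_data, float factor, int n); is enough to call the wrapper, as long as d_data points to device memory.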