Using a sampling profiler has several benefits:
it does not significantly affect execution speed, either through its own execution time or through instrumentation code polluting the CPU instruction and data caches (it injects none), so you get a measure of actual performance, as if no profiler were running.
it is immune to the heisenbug effect of instrumenting profilers, which disproportionately inflate the execution time of small procedures invoked in tight loops or from many places in an application's code.
it can measure the time spent in other OS components or DLLs (such as the video driver or OpenGL), not just the time spent in your application's own code.
profiler-induced latencies won't mask your application's own latencies (hard disk accesses, network accesses, video driver waits…), which matters particularly if your application performs asynchronous accesses.
it can pinpoint bottlenecks at the code-line level (not just procedure level), for the entire application.
it can profile over long periods of time, such as a full batch run of computations or a complete game level; you can literally leave an application running under the profiler for days.
being lightweight, it lets you profile multiple applications simultaneously (such as a client and a server running on the same development machine).
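The mechanism behind these benefits fits in a few lines. Below is a minimal sketch, not how any particular profiler is implemented: a background thread periodically looks at where the target thread currently is and counts hits per source line, without modifying the profiled code at all. Native sampling profilers typically suspend the thread and read its instruction pointer through OS APIs. This Python version substitutes the CPython-specific `sys._current_frames()`. The names `LineSampler` and `hot_loop` and the 1 ms interval are illustrative assumptions. Only the innermost frame is counted, so the report shows "self" time per line.

```python
import collections
import sys
import threading
import time


class LineSampler:
    """Hypothetical minimal sampling profiler: a background thread snapshots
    the target thread's innermost frame at a fixed interval and counts hits
    per (file, function, line). The profiled code is never instrumented."""

    def __init__(self, target_thread_id, interval=0.001):
        self.target = target_thread_id
        self.interval = interval  # seconds between samples (assumed value)
        self.counts = collections.Counter()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            # CPython-specific: maps thread ids to their current top frame.
            frame = sys._current_frames().get(self.target)
            if frame is not None:
                code = frame.f_code
                self.counts[(code.co_filename, code.co_name, frame.f_lineno)] += 1
            time.sleep(self.interval)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        return False

    def report(self, top=5):
        # Share of all samples that landed on each line, hottest first.
        total = sum(self.counts.values()) or 1
        for (filename, func, line), hits in self.counts.most_common(top):
            print(f"{hits / total:6.1%}  {func} ({filename}:{line})")


def hot_loop(n):
    total = 0
    for i in range(n):
        total += i * i  # most samples are expected to land on this loop
    return total


if __name__ == "__main__":
    with LineSampler(threading.get_ident()) as sampler:
        hot_loop(3_000_000)
    sampler.report()
```

The sampler only reads whichever line the target thread is on, so time the thread spends blocked in a system call (a file or socket read, for example) is attributed to the Python line that issued it. This mirrors the point above about measuring time spent outside your own code. In standard CPython builds, sampling CPU-bound pure-Python code is further limited by the interpreter's thread switch interval (`sys.getswitchinterval()`, 5 ms by default), so the effective rate can be lower than `interval` suggests.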