Most likely the program is not waiting for CPU time; it is waiting on blocking I/O. If you have 8 cores, the OS will schedule the other programs on the other cores.
Threading is, most of the time, a hack around blocking API calls. It is very rare that a normal desktop program actually has enough CPU work to keep more than one core busy.
You are also forgetting the cost of locking overhead, both in programming time and at runtime. A single-threaded program does not need atomic operations, which can be quite expensive on a multi-core platform. If, for instance, a program uses reference counting (std::shared_ptr), the reference count updates have to be atomic. If you know the program is single-threaded, that overhead is not needed. That kind of overhead might make the single-threaded program run faster than the multi-threaded one, which very often cannot use all 8 cores anyway due to lock contention.
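To get a rough feel for the atomic overhead mentioned above, here is a minimal sketch that times a plain counter against a std::atomic counter. The helper names (count_plain, count_atomic, seconds_for) are made up for illustration, and the relaxed memory order is only an assumption about the cheapest kind of atomic increment a reference count might use. The actual numbers depend heavily on the CPU, the compiler and the optimization flags, so treat it as an experiment rather than a benchmark.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Hypothetical helper: increments a plain (non-atomic) counter n times.
// volatile stops the compiler from collapsing the loop into one addition.
std::uint64_t count_plain(std::uint64_t n) {
    volatile std::uint64_t counter = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        counter = counter + 1;
    }
    return counter;
}

// Hypothetical helper: the same loop, but every increment is an atomic
// read-modify-write, similar in spirit to bumping a shared reference count.
std::uint64_t count_atomic(std::uint64_t n) {
    std::atomic<std::uint64_t> counter{0};
    for (std::uint64_t i = 0; i < n; ++i) {
        counter.fetch_add(1, std::memory_order_relaxed);
    }
    return counter.load();
}

// Hypothetical helper: returns the wall-clock seconds taken by f(n).
template <typename F>
double seconds_for(F f, std::uint64_t n) {
    auto start = std::chrono::steady_clock::now();
    volatile std::uint64_t sink = f(n);  // keep the result alive
    (void)sink;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling seconds_for(count_plain, n) and seconds_for(count_atomic, n) with a large n (say 100 million) from main and printing both values lets you compare the two on your own machine. Note that this only measures the uncontended case on one thread; with several threads hitting the same counter, the atomic version would be expected to get slower still.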