On the other hand, counting cycles only makes sense for some embedded architectures these days. On x86_64, your speed is going to be most affected by cache misses and pipeline stalls, neither of which you can determine just by looking at the assembly. For that matter, you don't even really know how many cycles any particular instruction will take or which instructions may overlap. When you're writing the code, the processors that will end up running it may not even have been selected yet, so you can't know their microarchitecture.
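To make the cache point concrete, here's a rough sketch in C. The names (`grid`, `fill_grid`, `sum_row_major`, `sum_col_major`) and the 1024x1024 size are made up for illustration. Both functions do the same additions over the same data and differ only in the order they walk memory, so counting instructions tells you almost nothing about which one is faster.

```c
#include <stddef.h>

#define ROWS 1024
#define COLS 1024

/* Hypothetical example data: C stores this matrix row by row. */
static int grid[ROWS][COLS];

/* Fill the matrix with small known values. */
void fill_grid(void) {
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            grid[r][c] = (int)((r + c) % 7);
}

/* Sequential walk: consecutive accesses are adjacent in memory. */
long sum_row_major(void) {
    long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Same additions, but consecutive accesses are COLS * sizeof(int)
   bytes apart, so each one tends to land on a different cache line. */
long sum_col_major(void) {
    long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}
```

Call `fill_grid()` once and then time the two sums. On typical hardware I'd expect the column-order version to be noticeably slower once the matrix outgrows the cache, but how much depends on the processor, and an optimizing compiler may reorder the loops, so measure rather than assume.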
In any case, I think that learning assembly is a terrible thing to do first, but it's a good thing to do before learning algorithms. Assembly would be a good first language for people who are somehow already programmers, but to turn someone into a programmer, it's better to pick a language that lets them do something practical with only a small amount of effort.