It's a non-privileged instruction, so it doesn't need any kernel mediation through /dev/hwrng; applications can use it directly if they want. For example, OpenSSL has an rdrand engine you can invoke with "openssl rand -engine rdrand <num>" (where <num> is a count of bytes, not bits), which simply calls rdrand and returns its output.
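To show what "using it directly" looks like, here is a minimal sketch, assuming x86-64 with GCC or Clang. It checks CPUID leaf 1 (ECX bit 30) before executing RDRAND, then calls the _rdrand64_step intrinsic and retries a few times when the instruction reports that no value is ready. The names rdrand_bytes, rdrand64_retry and cpu_has_rdrand are hypothetical helpers, and the retry count of 10 is an assumption taken from Intel's general guidance, not from OpenSSL's engine.

```c
#include <cpuid.h>      /* __get_cpuid (GCC/Clang, x86 only) */
#include <immintrin.h>  /* _rdrand64_step */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CPUID leaf 1 reports RDRAND support in ECX bit 30. */
static int cpu_has_rdrand(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx >> 30) & 1;
}

/* target("rdrnd") lets this compile without passing -mrdrnd. */
__attribute__((target("rdrnd")))
static int rdrand64_retry(uint64_t *out)
{
    /* The intrinsic returns 0 when RDRAND clears CF (no value ready),
       so retry a small, bounded number of times. */
    for (int i = 0; i < 10; i++) {
        unsigned long long v;
        if (_rdrand64_step(&v)) {
            *out = (uint64_t)v;
            return 1;
        }
    }
    return 0;
}

/* Hypothetical helper: fill buf with len bytes straight from RDRAND,
   with no kernel involvement. Returns 0 on success, -1 if the CPU
   lacks RDRAND or the instruction keeps failing. */
int rdrand_bytes(unsigned char *buf, size_t len)
{
    if (!cpu_has_rdrand())
        return -1;
    while (len > 0) {
        uint64_t r;
        if (!rdrand64_retry(&r))
            return -1;
        size_t n = len < sizeof r ? len : sizeof r;
        memcpy(buf, &r, n);   /* copy only as many bytes as still needed */
        buf += n;
        len -= n;
    }
    return 0;
}
```

A caller would do something like `unsigned char key[32]; if (rdrand_bytes(key, sizeof key) != 0) { /* fall back to /dev/urandom */ }`. The CPUID check matters because executing RDRAND on a CPU without it raises an invalid-opcode fault rather than returning an error.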
However, most applications get their entropy from /dev/[u]random, so for them to benefit, the kernel pools need to be fed from rdrand. Modern versions of rngd do this by executing the instruction directly in user mode and then pushing the resulting entropy into the kernel. Additionally, as described in the article, if you have a 3.6+ kernel with "Architectural RNG" enabled, the kernel XORs rdrand output into all /dev/random and /dev/urandom reads.
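The "pushing the entropy into the kernel" step can be sketched with Linux's RNDADDENTROPY ioctl on /dev/random, which mixes a buffer into the input pool and credits a stated number of entropy bits. This is a minimal illustration under that assumption, not rngd's actual code; feed_kernel_pool is a hypothetical name, and the ioctl requires CAP_SYS_ADMIN, so an unprivileged caller gets -1 with errno set to EPERM.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/random.h>   /* struct rand_pool_info, RNDADDENTROPY */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: mix len bytes of data into the kernel pool and
   credit entropy_bits bits of entropy. Returns 0 on success, -1 on
   failure with errno preserved from the failing call. */
int feed_kernel_pool(const unsigned char *data, size_t len, int entropy_bits)
{
    struct rand_pool_info *info = malloc(sizeof *info + len);
    if (info == NULL)
        return -1;
    info->entropy_count = entropy_bits; /* credited entropy, in bits */
    info->buf_size = (int)len;          /* payload size, in bytes */
    memcpy(info->buf, data, len);

    int rc = -1;
    int saved_errno;
    int fd = open("/dev/random", O_RDWR);
    if (fd >= 0) {
        rc = ioctl(fd, RNDADDENTROPY, info);
        saved_errno = errno;            /* close() may clobber errno */
        close(fd);
    } else {
        saved_errno = errno;
    }
    free(info);
    errno = saved_errno;
    return rc < 0 ? -1 : 0;
}
```

In a daemon loop, data would come from rdrand (for example via a helper like the one sketched for direct use) and entropy_bits would be a conservative estimate; crediting 0 bits still mixes the data in without claiming any entropy for it.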
You can check if rdrand is available with "grep rdrand /proc/cpuinfo".
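If you want the same check from inside a program instead of the shell, a minimal sketch is to read /proc/cpuinfo and look for "rdrand" as a whole token on a "flags" line (grep matches substrings, which happens to work for this flag). cpuinfo_has_flag is a hypothetical name, and the sketch assumes Linux on x86, where that line exists; other architectures use different field names and simply get 0.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first "flags" line of
   /proc/cpuinfo lists flag as a whole token, 0 if it does not,
   and -1 if the file cannot be opened. */
int cpuinfo_has_flag(const char *flag)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return -1;

    char line[8192];   /* flags lines are long, but well under this */
    int found = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) != 0)
            continue;
        char *colon = strchr(line, ':');
        if (colon == NULL)
            continue;
        /* strtok is not thread-safe; fine for a one-shot check */
        for (char *tok = strtok(colon + 1, " \t\n"); tok != NULL;
             tok = strtok(NULL, " \t\n")) {
            if (strcmp(tok, flag) == 0) {
                found = 1;
                break;
            }
        }
        break;   /* flags are normally identical for every CPU */
    }
    fclose(f);
    return found;
}
```

Calling `cpuinfo_has_flag("rdrand")` then mirrors the grep above. Note that this reports what the kernel exposes in /proc/cpuinfo; code that executes the instruction itself usually checks CPUID directly.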