Understanding The "Perfect" Sound
How do you define a perfect sound coming out of your software, operating system and device? Truth bomb: there is actually no such thing. But audiophiles always like to mention a special term for it - bit-perfect.
They like to believe that it is the audio quality most true to the original experience [intended by the creators]: without any artificial distortion of the original signal, or with that distortion minimized to the lowest level possible.
This is just a very simplified version of what the recording and audio world really looks like.
Can this holy grail actually be achieved? To understand that, you need to go back to the very beginning: what is the actual path that the sound takes, from recording it to hearing it? Where does everything start, and where does everything end? Only after gaining this knowledge about the beginning will you be able to understand the end.
It all begins with the source audio file
The whole path of sound in our audio devices [PC, laptop, tablet, phone, CD player, mp3 player etc.] starts with opening the source audio file. Therefore the source audio file is the most important factor for the resulting audio quality at the far end of the whole audio path. Even a complete high-end audiophile setup will sound terrible when you play a terrible source file on it.
The whole path of sound finally ends in our ears - more specifically, in our brain. Don't think that the path of sound ends in your gear - speakers and headphones. After the sound leaves the speakers/headphones, it still has to pass through the environment and your body before your brain can interpret it. But we will talk about this in a moment.
Source audio file - what actually is it?
In the digital audio world we currently live in, the source audio file is, in short, a pack of many data samples represented by numbers, which can be saved in two main forms:
Form 1: lossless audio file
Form 2: lossy audio file
The differences between these two forms are described on a separate information page:
FLAC vs mp3: Differences in lossless vs lossy audio sound formats
For now, it's best to focus only on the sound path.
The source audio file - no matter whether it's lossless or lossy - has to contain some kind of information about the sound recorded, generated, manipulated or saved in it. In other words: the source audio file also has its own source - a point of origin or beginning - somewhere. All those data numbers have to come from somewhere. It's logical - you can't have something that came from nothing.
#1 Step: Every source audio file also has its source somewhere
If you want to record your voice, you first need a device capable of recording that sound - in audio, it's a microphone. If you want to create a specific sound, you need something that will create or generate this sound. It can be a musical instrument, a software sound generator [think of software as an instrument - also made by humans], your vocal cords, even your hands. Even wind produces a sound - and all these sounds have to be recorded first, so that they can be played back in the future.
So if you think about it, all sound has to originate from somewhere, and you can easily tell that the most important things on which the resulting sound quality depends, from start to end, are the quality of the instruments that create those sounds, the quality of the microphones that record them, and the skill of all the engineers and musicians who worked on those recordings, instruments and microphones. By instruments we can understand just about everything that creates a sound we want to record, including the musician's or vocalist's skills. But of course it is not that simple.
A good instrument and a good microphone alone won't create good sound
Like with all things, there is also a physical and human factor: if the vocalist has a sore throat or not enough skill, if the instrument is out of tune or damaged, if the microphone is connected to a bad audio recording interface, or if all of this is recorded and worked on by a person without enough knowledge of what they are doing, in an environment totally unsuitable for recording anything [like an empty garage], even the most perfect instruments and microphones won't help.
Likewise, even the most perfect audiophile audio setup won't improve the audio quality of those recordings if they are already bad at the start. It will only play those tainted recordings with more detail about how bad they were from the beginning.
#2 Step: The microphone has to be connected to an interface
Not only the instrument quality and microphone quality matter. The environment and general knowledge about audio physics are also very important factors. But the microphone itself also has to be connected to some kind of device that records the sound - this is the next step in our audio path.
The device used to record sound from a microphone is most often called an Audio Interface. Audio Interfaces are like external sound cards - complex devices which contain everything necessary to properly record sound, convert it into digital form and monitor the signal. At the same time, they are also playback devices that drive various output equipment - headphones and speakers.
A microphone's output signal is very weak and has to be amplified, and condenser microphones also need to be powered (usually with +48 V phantom power), so Audio Interfaces most often contain a microphone Pre-Amplifier [Pre-Amp for short]. If a totally perfect instrument and microphone are connected to a bad quality Audio Interface, and the mic is not amplified or powered correctly, the resulting recording cannot be good.
All USB microphones, for example, include a very small and usually cheap audio interface built into them. It has to be there so that you don't have to buy a separate (better) device and connect the mic to it. The audio quality of these microphones depends not only on the quality of the microphone itself, but especially on the quality of the audio interface built into them. With such USB microphones you are always stuck with this built-in interface.
#3 Step: The Audio Interface has converters
Physically, every sound you hear is a sound wave, which the microphone turns into an analog electrical signal. But computers don't recognize such a thing as "waves"; they work only with numbers. Therefore all sound recorded on an Audio Interface connected to a PC has to be converted into a numeric representation of the waves. For this purpose, Audio Interfaces include an Analog-to-Digital Converter (ADC), which converts the analog signal [sound waves] into digital data [audio file] on the fly.
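To make the idea concrete, here is a minimal Python sketch of what an ADC does conceptually, assuming an idealized converter: it measures a signal at fixed time intervals and rounds each measurement to the nearest integer step, using 44,1 kHz and 16 bits (the two parameters described below). The function names `adc` and `a440` are hypothetical; real converters are hardware circuits (typically oversampling delta-sigma designs with built-in filtering), not code like this.

```python
import math
from typing import Callable

def adc(signal: Callable[[float], float], duration_s: float,
        sample_rate: int = 44100, bit_depth: int = 16) -> list[int]:
    """Idealized ADC: sample signal(t) (range -1.0..+1.0) and quantize it to integers."""
    max_code = 2 ** (bit_depth - 1) - 1   # 32767 for 16-bit
    min_code = -(2 ** (bit_depth - 1))    # -32768 for 16-bit
    samples = []
    for n in range(int(duration_s * sample_rate)):
        t = n / sample_rate                    # time of the n-th measurement, in seconds
        code = round(signal(t) * max_code)     # quantize: round to the nearest integer step
        samples.append(max(min_code, min(max_code, code)))  # clip to the valid range
    return samples

def a440(t: float) -> float:
    """Hypothetical 'analog' input: a 440 Hz tone at half of full scale."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)

pcm = adc(a440, duration_s=0.001)   # one millisecond of audio
print(len(pcm))   # 44 samples for 1 ms at 44,1 kHz
print(pcm[:3])    # the first few 16-bit integers that would be stored in the file
```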
Digital audio files are nothing more than a set of numbers, coming one after another, representing the shapes of the recorded sound waves. Two numbers describe how these values are stored:
Samples - also called Sample Rate - is a number in Hz or kHz (1 kHz = 1000 Hz). This number tells how many times per second the current value of the signal is measured and saved in the file (how many samples it has in one second). The most commonly used Sample Rates are 44100 Hz (44,1 kHz) and 48000 Hz (48 kHz). A 44,1 kHz file means that the sound wave's shape in that file is recorded or played 44100 times per second. More samples per second need more processing power - both for recording and for playing. The Sample Rate in sound is not the same as resolution in monitors and cannot be directly compared to it. A 44100 Hz sound file can hold sound waves of up to 22050 Hz (half the sample rate, called the Nyquist frequency), because, according to the Nyquist-Shannon sampling theorem, a sound wave can be restored accurately and in a mathematically correct way as long as it is sampled more than twice per cycle of its highest frequency.
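The Nyquist limit can be checked numerically. This minimal Python sketch (the function names are hypothetical, chosen for illustration) samples two cosine waves at 44,1 kHz: one at 19000 Hz and one at 25100 Hz, which lies above the 22050 Hz limit. Because 44100 - 25100 = 19000, both produce exactly the same sample values, so a 44,1 kHz file cannot tell them apart. This effect is called aliasing, and it is why frequencies above half the sample rate are filtered out before conversion.

```python
import math

SAMPLE_RATE = 44100  # samples per second (CD standard)

def nyquist(sample_rate: int) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate / 2

def sample_cosine(freq_hz: float, sample_rate: int, count: int) -> list[float]:
    """Measure a cosine wave of freq_hz 'count' times at the given sample rate."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]

print(nyquist(SAMPLE_RATE))  # 22050.0

# A 25100 Hz tone is above the limit: its samples are identical to a 19000 Hz tone's.
too_high = sample_cosine(25100, SAMPLE_RATE, 10)
alias = sample_cosine(19000, SAMPLE_RATE, 10)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(too_high, alias)))  # True
```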
Bits - also called Bit depth - is a number in bits. This number is a mystery for most people, but it's not that hard to understand. It tells about the dynamic range of the whole signal - more specifically, how far the loudest possible recorded sound is from the lowest possible noise in a digital audio file (its noise floor). Each bit gives about 6 dB of Dynamic Range. The most commonly used value today is 16 bits, which gives a total of about 96 dB of Dynamic Range: about -96 dB is the noise floor, and 0 dB (dBFS, decibels relative to full scale) is the maximum possible signal loudness. 24-bit files (about 144 dB) are mostly used only for recording and working on the original audio files. 16-bit audio is totally fine for playback on all home audio, as even the best professional microphones can have a lower dynamic range (signal-to-noise ratio) than 96 dB. Therefore the noise that you hear on most 16-bit recordings will most often not be the noise floor of the digital audio file itself, but the noise of the equipment used to record and process it, and often even the background noise of the environment in which the recording took place.
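The "6 dB per bit" rule comes from the formula 20 * log10(2 ** bits). A short Python sketch (the function name is hypothetical) shows where the 96 dB and 144 dB figures come from. It is the simple rule of thumb used above; it leaves out refinements such as the extra ~1.76 dB in the full-scale sine signal-to-noise formula and the effect of dither.

```python
import math

def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bits), about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bit_depth)

for bits in (8, 16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
# 8-bit: 48.2 dB
# 16-bit: 96.3 dB
# 24-bit: 144.5 dB
```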
Now, the most important thing to remember:
All sound waves perceivable by a human being and meant for playback on devices can be represented 100% accurately and mathematically correctly in a 44,1 kHz, 16-bit audio file.
Really nothing more is needed for playback. That's why 44,1 kHz and 16-bit became the de facto standard for CD music, and it is still used today.
The representation is so mathematically accurate that all sound waves saved in a 44,1 kHz/16-bit file can be losslessly restored from it. And there is really no need to use anything more - unless you want to capture and then play sounds that human beings can't perceive, or you need more samples to work with during the recording, filtering, editing, mixing and mastering process. During recording and working with those sounds, more samples are somewhat useful, but for playback on a consumer device they don't matter, unless you really want to play ultrasounds above 22050 Hz for your dogs [assuming your gear can actually play those sounds, and the manufacturer doesn't just brag about them in the specs, as is usually the case].
Check this video about Analog-Digital and Digital-Analog conversions by the Xiph.org Foundation. After understanding this process, you will never waste your disk space on anything other than 44,1 kHz/16-bit for pure playback.
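Disk space is easy to estimate, because uncompressed PCM audio takes sample rate x bit depth x channels bits per second. This minimal Python sketch (hypothetical function names) compares 44,1 kHz/16-bit stereo with two common high-resolution formats. The figures are for uncompressed audio; lossless compression such as FLAC makes the actual files smaller.

```python
def pcm_bitrate(sample_rate_hz: int, bit_depth: int, channels: int = 2) -> int:
    """Bits per second of uncompressed (linear PCM) audio."""
    return sample_rate_hz * bit_depth * channels

def megabytes_per_minute(sample_rate_hz: int, bit_depth: int, channels: int = 2) -> float:
    """Uncompressed size of one minute of audio, in megabytes (10**6 bytes)."""
    return pcm_bitrate(sample_rate_hz, bit_depth, channels) * 60 / 8 / 1_000_000

for rate, bits in ((44100, 16), (96000, 24), (192000, 24)):
    print(f"{rate} Hz / {bits}-bit: {pcm_bitrate(rate, bits) / 1000:.1f} kbit/s, "
          f"{megabytes_per_minute(rate, bits):.1f} MB per minute")
# 44100 Hz / 16-bit: 1411.2 kbit/s, 10.6 MB per minute
# 96000 Hz / 24-bit: 4608.0 kbit/s, 34.6 MB per minute
# 192000 Hz / 24-bit: 9216.0 kbit/s, 69.1 MB per minute
```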
#4 Step: The Digital Audio file has to be processed
When a sound engineer has sounds recorded and saved on a disk in a digital audio format, they have to process them correctly. This process is called editing.
For example, one instrument is most often saved as one mono track. Each of these tracks - one for each instrument - then has to be edited - that is, prepared before mixing it into the whole song. This preparation process often includes such things as denoising, removing (silencing) unwanted noises (like clicking, popping, humming), adjusting the timing between various tracks, and even adding that mythical color to the vocals or instruments' sounds.
All this is done in software, just like photographers process their photos in, for example, Photoshop or Lightroom. After this comes the next step.
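Silencing unwanted noise can be sketched with a very simple noise gate. This Python example is hypothetical and greatly simplified: it splits a track into short blocks (441 samples, 10 ms at 44,1 kHz), measures each block's RMS level and silences blocks that are quieter than a threshold in dBFS. Real gates in audio editors also use attack, hold and release times so that sounds are not cut off abruptly.

```python
import math

def rms(block: list[float]) -> float:
    """Root-mean-square level of a block of samples (1.0 = full scale)."""
    return math.sqrt(sum(s * s for s in block) / len(block)) if block else 0.0

def block_noise_gate(samples: list[float], threshold_db: float,
                     block_size: int = 441) -> list[float]:
    """Silence every block whose RMS level is below threshold_db (dBFS)."""
    threshold = 10 ** (threshold_db / 20)   # dBFS -> linear amplitude
    out: list[float] = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        out.extend(block if rms(block) >= threshold else [0.0] * len(block))
    return out

hum = [0.001 * math.sin(2 * math.pi * 50 * n / 44100) for n in range(441)]   # quiet 50 Hz hum
note = [0.5 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(441)]   # the wanted sound
cleaned = block_noise_gate(hum + note, threshold_db=-50)
print(max(abs(s) for s in cleaned[:441]))  # 0.0  -> hum block silenced
print(cleaned[441:] == note)               # True -> note block untouched
```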
#5 Step: Mixing all tracks together
Once all the tracks have gone through these basic preparatory processes, they can be put together - mixed - into one file containing them all (the song). In this process, all those separate tracks are mixed down into one single stereo file. Mixing comes directly before mastering. This is a very important step in the audio path, as it largely determines what the end result will sound like after mastering. That's why it's so important for it to be as good as possible.
In this step, single tracks are corrected and compressed, effects are added, plugins and filters are used, panning is done (all the sounds are positioned in space), and the sound and volume of each track are adjusted to the other tracks. And just like photographers, sound engineers can also use layers - putting one slightly different track of the same instrument on top of another - to create a fuller sound.
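Panning and mixing down to stereo can be illustrated with a short Python sketch. It assumes a constant-power pan law (left gain = cos of the pan angle, right gain = sin of it), which is one common choice; mixing software offers different pan laws, so the exact numbers are only illustrative. Each mono track gets a volume (gain) and a pan position, and all tracks are summed into one left and one right channel. The function names are hypothetical.

```python
import math

def pan_gains(pan: float) -> tuple[float, float]:
    """Constant-power pan law: pan=-1 is hard left, 0 is center, +1 is hard right."""
    angle = (pan + 1) * math.pi / 4          # maps -1..+1 to 0..pi/2
    return math.cos(angle), math.sin(angle)

def mix_to_stereo(tracks: list[tuple[list[float], float, float]]) -> tuple[list[float], list[float]]:
    """Mix mono tracks, each given as (samples, gain, pan), into one stereo pair."""
    length = max(len(samples) for samples, _, _ in tracks)
    left = [0.0] * length
    right = [0.0] * length
    for samples, gain, pan in tracks:
        gain_l, gain_r = pan_gains(pan)
        for i, s in enumerate(samples):
            left[i] += s * gain * gain_l
            right[i] += s * gain * gain_r
    return left, right

vocals = [0.5, 0.5, 0.5]   # panned to the center
guitar = [0.2, 0.2, 0.2]   # panned hard left, at 80% volume
left, right = mix_to_stereo([(vocals, 1.0, 0.0), (guitar, 0.8, -1.0)])
print([round(x, 3) for x in left])   # [0.514, 0.514, 0.514]
print([round(x, 3) for x in right])  # [0.354, 0.354, 0.354]
```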
#6 Step: Mastering the final result
After mixing comes the time to send the resulting raw mixes to the mastering engineer, who will work on them. Mastering is the last process of working on sound before the release to the public. The mastering engineer puts all the songs together into one album, sets appropriate loudness for all tracks, and also makes sure that every song sounds consistent across the whole album (proper tonal balance and dynamics). The gaps between songs are set, and special codes for use on CDs are applied.
When the mastering process is finished, the so-called Gold Master (or Master recording) is made, from which all other CDs are then manufactured.
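Setting consistent levels across an album can be roughly illustrated with peak normalization: measure each song's highest sample in dBFS and apply one gain so that every song peaks at the same level. This Python sketch (hypothetical names) is a big simplification - mastering engineers judge perceived loudness (for example in LUFS, as defined by ITU-R BS.1770) and use tools like compression and limiting, not just a single gain.

```python
import math

def peak_dbfs(samples: list[float]) -> float:
    """Level of the highest absolute sample in dBFS (1.0 = full scale = 0 dBFS)."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20 * math.log10(peak) if peak > 0 else -math.inf

def normalize_peak(samples: list[float], target_dbfs: float = -1.0) -> list[float]:
    """Apply one gain so that the highest sample lands exactly at target_dbfs."""
    current = peak_dbfs(samples)
    if current == -math.inf:                       # pure silence: nothing to scale
        return list(samples)
    gain = 10 ** ((target_dbfs - current) / 20)    # dB difference -> linear factor
    return [s * gain for s in samples]

album = {"quiet song": [0.1, -0.25, 0.2], "loud song": [0.6, -0.9, 0.3]}
for title, song in album.items():
    mastered = normalize_peak(song, target_dbfs=-1.0)
    print(f"{title}: {peak_dbfs(song):.1f} dBFS -> {peak_dbfs(mastered):.1f} dBFS")
# quiet song: -12.0 dBFS -> -1.0 dBFS
# loud song: -0.9 dBFS -> -1.0 dBFS
```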
As you can see, the audio path involves many steps before the final release. A short summary:
Environment -> Instrument -> Microphone -> Audio Interface -> Digital Audio Processing -> Mixing -> Mastering
At every part of this long path, some original audio quality will be lost - whether due to lack of skills, lack of knowledge, lack of proper equipment, an improper environment, subjective taste for some specific sound, or even something as prosaic as popular loudness trends (mastering).
And all this happens even before the sound is opened and played on our audio gear. For that, we have a whole big additional step.
#7 Step: Playback of the final result
When a recording is released, you buy it somewhere or listen to it on Spotify, Tidal or Apple Music, on your specific audio gear and in a specific digital audio format and quality. This also applies to games and movies.
Audio gear can be:
Electronic devices - like PC, console, phone, mp3 player, Audio CD player, Hi-fi tower etc.
Software - operating systems, music players, movie players etc. including software drivers
Sound cards, DACs and amplifiers - like Integrated Audio, internal sound cards, external sound cards, USB DACs, Stereo amplifiers etc.
Output devices - headphones and speakers
Those resulting sound files are always saved with some specific parameters, and these differ from medium to medium. The software (system, players and drivers) will also change the original sound depending on its settings and the use case. To hear the correct result, hopefully unchanged from the original, you will need to set up your audio gear properly each time, depending on the use case and medium.
You will find direct advice for specific use cases below:
But for now, we are back at the beginning, where I said that the whole path of sound in our audio devices starts with opening the source audio file.
Well then... what actually happens each time you open/play that file on your device? I will show you by explaining, step by step, what this path of sound looks like inside Windows systems.
Audio path in Windows systems from start till the end:
1. Application - Audio Player, Video Player, Game, Browser etc.
2. Microsoft APIs - WASAPI, XAudio2, MIDI and others
3. Windows Audio Engine - Audio Device Graph - audiodg.exe
4. Software Audio Effects - APO - Audio Processing Objects present in the drivers
5. Hardware Driver - receives the audio sent by Windows and passes it to the hardware for processing
6. Hardware Audio Effects and signal processing - DSP - Digital Signal Processor
7. Hardware DAC - Digital-Analog Converter
8. Analog Audio - final audio sent through cables to speakers/headphones
1. Application - Audio Player
The audio file is first read, opened and then played back in the audio player. An audio player can be anything that plays sound: music players (Windows Media Player, AIMP, Spotify, iTunes), video players (Media Player Classic, VLC), and also games and browsers.
Important to remember:
Audio players also have their own filters, volume controls, sound effects, resamplers and equalizers - or just general audio signal processing coded into them or provided by plugins - which can be applied to the original audio file in real time, even before the Windows Audio Engine in point 3. receives the audio for playback. The audio application is the first layer in the system in which the original audio file can be changed - often without the user even realizing that something has been changed.
You will find specific advice about how to configure your applications to minimize their interference with the original audio files on the next pages of this guide.
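Why does a software volume control matter for "bit-perfect" playback? This hypothetical Python sketch scales 16-bit integer samples the way a simple volume control might (multiply, round, clip). At exactly 100% volume the numbers come out unchanged, but any other setting produces different numbers, so what reaches Windows is no longer identical to the file. Real players may work in floating point and apply dither, but the principle is the same.

```python
def apply_volume(samples: list[int], volume: float) -> list[int]:
    """Hypothetical software volume: scale 16-bit samples, round, clip to the 16-bit range."""
    out = []
    for s in samples:
        scaled = round(s * volume)                    # back to whole 16-bit steps
        out.append(max(-32768, min(32767, scaled)))   # keep inside the 16-bit range
    return out

original = [1000, -1001, 32767, 3]               # a few 16-bit samples from a file
print(apply_volume(original, 1.0) == original)   # True: 100% volume, bit-perfect
print(apply_volume(original, 0.9))               # [900, -901, 29490, 3]: values changed
print(apply_volume(original, 0.9) == original)   # False
```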
2. Microsoft API - Application Programming Interface
An application alone is not enough to start playing an audio file in Windows. Windows needs to receive the audio samples through a special route designed specifically for this purpose. These routes in software are APIs - Application Programming Interfaces. Through these API routes - something like highways for audio - a programmer can send audio data so that Windows receives it for playback. They act as a middleman between the program and Windows.
These highways are generally transparent to the user, and only the software programmers know which route their program uses. Some programs, though, allow the user to change the API currently in use. This is useful, because through some special APIs the program can send the audio data directly to the hardware driver, bypassing points 3. and 4., which lowers latency considerably and can potentially also improve audio quality in some cases. An example of such an API is ASIO (developed by Steinberg, not Microsoft). But not all hardware and drivers support ASIO natively. Fortunately, ASIO support can also be easily emulated on Windows systems for almost all hardware, by using Microsoft's dedicated WaveRT Port Driver (Windows Vista and later) or legacy Direct Kernel Streaming (Windows XP and earlier).
On the next pages, I will show you step by step how to provide easy ASIO support for your hardware and potentially improve sound quality.
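The latency benefit comes largely from buffering: each layer that holds audio in a buffer delays it by the buffer length divided by the sample rate. This Python sketch uses that simplified model. The buffer sizes in it are hypothetical examples, not the actual values used by Windows, any driver or ASIO; real latency also includes converter and driver overheads.

```python
def buffer_latency_ms(buffer_frames: int, sample_rate_hz: int) -> float:
    """Delay added by one buffer of audio frames, in milliseconds."""
    return buffer_frames / sample_rate_hz * 1000

def path_latency_ms(buffers: list[int], sample_rate_hz: int) -> float:
    """Simplified total: sum of the delays of every buffer along the audio path."""
    return sum(buffer_latency_ms(b, sample_rate_hz) for b in buffers)

# Hypothetical buffer sizes (in frames) at 48 kHz:
shared_path = [480, 480, 480]   # e.g. application, mixer and driver buffers
direct_path = [128]             # e.g. one small buffer handed straight to the driver

print(f"{path_latency_ms(shared_path, 48000):.2f} ms")  # 30.00 ms
print(f"{path_latency_ms(direct_path, 48000):.2f} ms")  # 2.67 ms
```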
3. & 4. Windows Audio Engine & Software Audio Effects
- sometimes also called system mixer, kernel audio mixer, Windows mixer etc.
At this point, Windows receives the audio stream sent by the application. The Audio Engine mixes and processes all current audio streams together, and also applies software filters or effects if they are enabled (like an equalizer in the control panel of our sound card's drivers). For example, it will mix the song playing in the player with all other sounds coming from other running programs or from the Windows system itself - notifications, communication, alerts, game sounds, YouTube videos etc. It also resamples, in real time, any audio whose sample rate differs from the device's format. This way Windows acts as a kind of real-time Audio Engineer, gathering everything into one big stream with a set Sample Rate and Bit Depth.
This unified stream will have the Sample Rate and Bit Depth set globally in the Advanced properties of our audio hardware in the Windows Control Panel's sound settings. This is called the Default Format in Windows. The mode of mixing and processing all sounds together and applying filters and effects in the system is called Shared Mode. This single unified audio stream is then sent to the Hardware Driver.
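Shared Mode can be sketched in a few lines of Python: a song at 44,1 kHz is converted to a 48 kHz Default Format and then summed with a system sound, with the result clipped to full scale. This is hypothetical and greatly simplified - the resampler here uses plain linear interpolation, while real audio engines use more sophisticated resamplers and may handle overloads differently - but it shows why the numbers that reach the driver are no longer the numbers stored in the file.

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Naive resampler: linear interpolation between neighbouring input samples.
    (Real resamplers also low-pass filter the signal to avoid aliasing.)"""
    out_len = len(samples) * dst_rate // src_rate
    out = []
    for i in range(out_len):
        pos = i * src_rate / dst_rate          # where output sample i falls in the input
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out

def mix_streams(*streams: list[float]) -> list[float]:
    """Sum all streams sample by sample and clip the result to -1.0..+1.0 (full scale)."""
    length = max(len(s) for s in streams)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams if i < len(s))
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed

song_44k = [0.0, 0.3, 0.6, 0.9] * 441          # 1764 samples = 40 ms at 44,1 kHz
song_48k = resample_linear(song_44k, 44100, 48000)
notification = [0.5] * 480                     # hypothetical 10 ms system sound at 48 kHz
output = mix_streams(song_48k, notification)

print(len(song_44k), len(song_48k))  # 1764 1920
print(max(output))                   # 1.0 (song + notification exceeded full scale, clipped)
```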
5. Hardware Driver - receives the audio sent by Windows
Once Windows and the software are done mixing, filtering or generally just changing our beloved original audio file, the stream containing all this data is sent to the Hardware Drivers. The drivers then send this stream to our hardware - an internal or external sound card - for additional processing. Hardware Drivers can optionally contain additional filters and effects, called APOs - Audio Processing Objects - which are loaded and applied by Windows in points 3. and 4.
6. Hardware Audio Effects - or Digital Signal Processing inside the hardware
Audio hardware - like an internal or external sound card or DAC - can contain an additional electronic component called a DSP - Digital Signal Processor. Through the DSP, filtering, resampling, equalization and effects can be done directly in hardware, without using the CPU. This is not required and is totally optional, because in modern Windows systems all audio processing can also be done purely in software (by APOs in the Windows mixer). However, a hardware DSP can also be included - instead or in addition. This can give various performance benefits and lower the latency of audio processing. Most typical consumer Windows audio devices, however (like HDAudio, USB and Bluetooth devices), don't normally use a hardware DSP.
The most notable example of hardware DSPs is Creative's famous X-Fi line of audio chips (E-mu 20K), used in Sound Blaster X-Fi internal sound cards. Their high processing power was used to aid other DSPs present on the cards, lower the CPU usage of audio processing (especially in games and with EAX active), process effects directly in hardware and considerably increase the audio quality of internal resampling.
7. Hardware DAC
[ section still in progress - please come back later ]
8. Analog Audio sent to speakers/headphones
[ section still in progress - please come back later ]
#8 Step: When audio comes out, it has to be heard
[ section still in progress - please come back later ]
Links and useful resources
Useful links and resources that, together with my experience in sound, software and programming, were used to create this guide:
- Microsoft's official developer references for Windows audio stack
- Old article in Polish about improving audio quality in Windows systems
- Creative's official manuals (really more people should read them)
- ...and maaaany years of testing sound cards in games/movies/music on various headphones, DACs and sound cards.
Thank you for reading
Feel free to share a link to this information page to help more people get the best sound experience possible.
Written by rezno[R].
If you like what I'm doing and feel that I deserve some support, I would be honoured if you did it here:
I'm open to any suggestions for improving this article: contact information.