Deb here,

People seem to be anxious about ACX submission guidelines. ACX is where audiobooks are submitted for distribution on Audible.com, Amazon’s audiobook service. So, I thought I’d break down the techy side of the requirements, from an audio engineer’s perspective.

“Your submitted audiobook must:

be consistent in overall sound and formatting…”

This can be a huge problem for amateurs. It happens when you change your preamp levels each time you record, change your distance or angle from the mic, or are inconsistent with your effects settings. At home, I set my preamp and leave it. I also have a few preset effects settings. This keeps my sound consistent from project to project.

Another common offender is recording parts of the audiobook in different spaces and/or with different equipment. If you don’t take the time to carefully match the sound of the previous setup, the new recordings won’t match the old ones. Whenever I work on a project that I know might go past a day, I carefully note the settings for everything used in that session: the mic used, the distance from the reader, the preamp setting, and the effects settings. This helps me maintain as consistent a sound as possible, even if a month goes by between sessions.
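If you’d rather keep that session log in a file than on a sticky note, here’s a minimal Python sketch. The field names are my own hypothetical choices that mirror the list above (mic, distance, preamp setting, effects preset); the idea is simply to compare today’s setup against the last session and flag anything that changed.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SessionSettings:
    """Hypothetical record of one recording session's setup."""
    mic: str
    distance_in: float      # mouth-to-mic distance, in inches
    preamp_gain_db: float   # where the preamp gain knob was set
    effects_preset: str     # name of the saved effects chain


def changed_settings(previous: SessionSettings, current: SessionSettings) -> dict:
    """Return {setting: (old, new)} for every setting that differs."""
    old, new = asdict(previous), asdict(current)
    return {name: (old[name], new[name]) for name in old if old[name] != new[name]}


if __name__ == "__main__":
    day_one = SessionSettings("LDC mic", 6.0, 32.0, "VO chain A")
    day_two = SessionSettings("LDC mic", 8.0, 32.0, "VO chain A")
    print(changed_settings(day_one, day_two))  # {'distance_in': (6.0, 8.0)}
```

An empty result means the new session matches the old one on everything you logged.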

“be comprised of all mono or all stereo files…”

Again, a rookie mistake. It is really important that you stay consistent with mono (a single channel) or stereo (two linked channels, left and right) throughout a project. The only reason to use stereo files is if you’re adding stereo music or sound effects (SFX). That said, what if you use music at the beginning and end of an audiobook, but none in between? Then the whole book has to be delivered as stereo files, even though most of it contains no music. Otherwise, you’ll have to convert the music and SFX to mono.
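To make the two options concrete, here’s a minimal Python sketch of both directions. It assumes audio held as plain lists of floating-point samples between -1.0 and 1.0, with stereo as (left, right) pairs; your editor does this for you, and the function names are hypothetical.

```python
from typing import List, Tuple

Mono = List[float]
Stereo = List[Tuple[float, float]]


def stereo_to_mono(frames: Stereo) -> Mono:
    """Downmix by averaging left and right.

    Averaging (instead of adding) keeps the result from exceeding full scale,
    but anything panned hard to one side ends up about 6dB quieter.
    """
    return [(left + right) / 2.0 for left, right in frames]


def mono_to_stereo(samples: Mono) -> Stereo:
    """'Dual mono': copy the single channel to both left and right."""
    return [(s, s) for s in samples]
```

Either way, pick one format and apply it to every file in the project.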

“include a retail audio sample that is between one and five minutes long”

Consumers want to hear a sample of a book before they buy it, and ACX isn’t going to take the time to find the perfect 1 to 5 minutes of your book to showcase; that part is up to you. Once you’ve chosen a section, you can either end the sample on a good sentence or fade the audio out.
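Here’s a minimal Python sketch of both checks: confirming the clip runs between one and five minutes, and applying a simple linear fade-out. It assumes mono audio as a list of float samples at 44.1kHz; the linear ramp is just one choice (many editors also offer curved fades), and the function names are hypothetical.

```python
SAMPLE_RATE = 44_100  # samples per second


def sample_length_ok(num_samples: int, sample_rate: int = SAMPLE_RATE) -> bool:
    """True if the clip runs between 1 and 5 minutes, inclusive."""
    seconds = num_samples / sample_rate
    return 60.0 <= seconds <= 300.0


def fade_out(samples, fade_seconds: float, sample_rate: int = SAMPLE_RATE):
    """Return a copy with a linear fade over the last `fade_seconds`."""
    out = list(samples)
    n = min(len(out), int(fade_seconds * sample_rate))
    start = len(out) - n
    for i in range(n):
        # Gain falls from just under 1.0 to exactly 0.0 on the final sample.
        out[start + i] *= 1.0 - (i + 1) / n
    return out
```

How long the fade should be is a judgment call; listen to the result before you upload it.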

“Each uploaded audio file must:...”

“have room tone at the beginning and end and be free of extraneous sounds…”

We talked about the importance of capturing room tone in a previous blog post. The easiest thing to do is to record at least 1 second of room tone, set it aside, and copy/paste it to the beginning and end of each audio file.
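The copy/paste step can be sketched in Python like this, assuming mono audio as lists of float samples and a room-tone clip you recorded separately. The one-second defaults follow the rule of thumb above; the function name is hypothetical.

```python
def pad_with_room_tone(take, room_tone, head_seconds=1.0, tail_seconds=1.0,
                       sample_rate=44_100):
    """Return `take` with room tone pasted before and after it."""
    head = int(head_seconds * sample_rate)
    tail = int(tail_seconds * sample_rate)
    if len(room_tone) < max(head, tail):
        raise ValueError("room tone clip is shorter than the padding requested")
    # Both ends are sliced from the start of the clip to keep this simple;
    # any clean stretch of room tone would do.
    return list(room_tone[:head]) + list(take) + list(room_tone[:tail])
```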

As far as being free of extraneous sounds, c’mon. You can hear that dog barking, that fridge buzzing, and those kids playing, and so can we. Same goes for your breathing, fidgeting, mouth and stomach noises. That’s one reason why I like to wear headphones while I record -- so that I can really hear what’s being captured.

“measure between -23dB and -18dB RMS and have -3dB peak values and a maximum -60dB noise floor…”

Ladies and gents, we have finally arrived at the truly techy part of the requirements. Very simply, this has to do with your recording levels, compression, normalization, and the Signal-to-Noise ratio of your voice to the background HISS.

Let’s start with the RMS levels. This requirement makes sure you’re recording at good levels and not over- or under-compressing the audio. RMS (Root Mean Square) tells you the average loudness of the audio, and your goal is an RMS between -23dB (quietest) and -18dB (loudest). Your average recording level should be in or close to this range anyway, with peaks at -6dB, tops (ideally).
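For the curious, here’s what an RMS meter computes, as a minimal Python sketch: square every sample, average them, take the square root, and convert to decibels relative to full scale. It assumes float samples between -1.0 and 1.0. Meters differ slightly in convention (some add about 3dB so that a full-scale sine wave reads 0dB), so treat your DAW’s meter as the final word.

```python
import math


def rms_dbfs(samples) -> float:
    """Average loudness as RMS, in dB relative to full scale (1.0)."""
    if not samples:
        raise ValueError("no samples to measure")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(math.sqrt(mean_square))


def rms_in_acx_range(samples, low_db=-23.0, high_db=-18.0) -> bool:
    """True if the RMS falls within the -23dB to -18dB window."""
    return low_db <= rms_dbfs(samples) <= high_db
```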

The idea is to record at a slightly lower level (-6dB peaks) to guarantee there’s no clipped (overdriven, distorted) audio, and then raise the loudness with compression and normalization. Just don’t overdo it: always avoid over-compressing. As Scott likes to say, you use compression to “give the audio a haircut,” trimming off the peaks so that you can increase the overall level.
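To show what that haircut means in practice, here’s a minimal, textbook-style feed-forward compressor in Python. It is a simplified sketch (peak envelope follower, hard knee, no look-ahead), not how any particular plugin works, and the default threshold, ratio, and timing values are illustrative assumptions rather than recommended settings.

```python
import math


def compress(samples, threshold_db=-20.0, ratio=3.0, attack_ms=5.0,
             release_ms=80.0, makeup_db=0.0, sample_rate=44_100):
    """Turn down whatever rises above the threshold, then apply makeup gain."""
    # One-pole smoothing coefficients: closer to 1.0 means slower reaction.
    attack = math.exp(-1.0 / (attack_ms * 0.001 * sample_rate))
    release = math.exp(-1.0 / (release_ms * 0.001 * sample_rate))
    makeup = 10.0 ** (makeup_db / 20.0)
    envelope = 0.0
    out = []
    for s in samples:
        level = abs(s)
        # Follow rising levels at the attack speed, falling ones at release speed.
        coeff = attack if level > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * level
        if envelope > 0.0:
            over_db = 20.0 * math.log10(envelope) - threshold_db
        else:
            over_db = 0.0
        # Above the threshold, keep only 1/ratio of the overshoot.
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0.0 else 0.0
        out.append(s * 10.0 ** (gain_db / 20.0) * makeup)
    return out
```

Quiet passages below the threshold pass through untouched; only the loud parts get trimmed, which is what leaves room for the makeup gain to raise the overall level.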

Next is the -3dB peak value. Decibels (dB) measure level on a logarithmic scale. In digital audio, levels are measured relative to full scale, so 0dB is the loudest level a file can hold and everything below it is negative. When you Normalize the audio, set your Max level to -3dB. This guarantees that the audio never peaks above -3dB. After all, the voice is different from music; we don’t need to hear it as loudly.
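Peak normalization itself is simple arithmetic: find the loudest sample, then scale everything by the same amount so that sample lands at -3dB. Here’s a minimal Python sketch, assuming float samples where 1.0 is full scale (0dB). It looks at sample peaks only; true-peak meters also estimate peaks between samples, which can read slightly higher.

```python
import math


def peak_normalize(samples, target_peak_db=-3.0):
    """Scale all samples by one gain so the loudest lands at target_peak_db."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = 10.0 ** (target_peak_db / 20.0) / peak
    return [s * gain for s in samples]


def peak_dbfs(samples) -> float:
    """Loudest sample, in dB relative to full scale."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20.0 * math.log10(peak) if peak > 0.0 else float("-inf")
```

Because the same gain is applied everywhere, normalizing raises the RMS by the same number of dB as the peak. That’s why compression comes first: trimming the peaks leaves more room to bring the average up.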

Finally, a maximum -60dB noise floor. The “noise floor” is all of the sound in the recording that’s not your voice: the “shhhhh” hiss in the background. It’s caused by your equipment, computer fan, air conditioner, fridge, etc. A noise floor no higher than -60dB puts the noise roughly 40dB below your average loudness (the -23dB to -18dB RMS range). That is about the minimum ratio of signal (your voice) to noise (the hiss) for words to stay intelligible. If the noise is any louder, it starts to mask the end consonants of words, making “Bat,” “Bad,” and “Bath” difficult to tell apart.

Ideally, the noise floor will be even lower than that. A 70dB signal-to-noise ratio is the pro-audio standard, which would put the noise floor all the way down around -90dB (70dB below an average level of about -20dB).
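The arithmetic behind those numbers is easy to check in Python. This minimal sketch assumes the noise floor is measured as the RMS of a voice-free stretch of room tone (ACX’s tools and your DAW may measure it differently) and uses the standard conversion between decibels and amplitude ratios, where every 20dB is a factor of 10 in amplitude. The function names are hypothetical.

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """How many times larger one amplitude is than another, given a dB gap."""
    return 10.0 ** (db / 20.0)


def noise_floor_dbfs(room_tone) -> float:
    """RMS level of a room-tone-only clip, in dB relative to full scale."""
    if not room_tone:
        raise ValueError("no room tone to measure")
    mean_square = sum(s * s for s in room_tone) / len(room_tone)
    return 20.0 * math.log10(math.sqrt(mean_square)) if mean_square else float("-inf")


def snr_db(voice_rms_db: float, noise_floor_db: float) -> float:
    """Signal-to-noise ratio: how far the noise sits below the voice, in dB."""
    return voice_rms_db - noise_floor_db


if __name__ == "__main__":
    print(snr_db(-20.0, -60.0))        # 40.0
    print(db_to_amplitude_ratio(40))   # 100.0: the voice is ~100x the hiss in amplitude
    print(db_to_amplitude_ratio(70))   # about 3162
```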

Ways to lower the noise floor include: getting better equipment (a higher-end mic and/or preamp makes a huge difference; check out this video to hear how); finding as quiet a space as possible, away from the fridge or A/C vents; separating your computer and any other noisy electronics from your recording space, or making sure your computer fan is very quiet; and using a Noise Reduction effect. Noise Reduction (NR) helps a lot; just use it gingerly. If there’s a lot of noise to remove, do several light passes of NR rather than one big pass. Otherwise, it can eat into the quality of your voice. NR is another reason to capture at least a second of pure “room tone,” since many NR tools learn what to remove from a sample of the noise alone.

“be a 192kbps or higher MP3, Constant Bit Rate (CBR) at 44.1 kHz”

You should be recording your audiobooks as WAV or AIFF files at 44.1kHz anyway. You can record at higher sample rates, but it won’t really improve your sound (the human voice doesn’t necessarily benefit from higher sample rates). As for MP3s at 192kbps or higher, this is a setting you choose when you export your files as MP3s: select a bit rate of 192kbps or higher. These settings are good for final products. Lower settings, like 128kbps, are fine for auditions and sending samples of work.
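If you want to double-check a master before exporting, Python’s standard-library wave module can read the format straight from a WAV file’s header. A minimal sketch follows; it only handles plain PCM WAV files (not AIFF), and the helper names are hypothetical.

```python
import os
import tempfile
import wave


def describe_wav(path: str) -> dict:
    """Read channel count, sample rate, bit depth, and length from a WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bit_depth": wav.getsampwidth() * 8,
            "seconds": wav.getnframes() / rate,
        }


def is_44k1(path: str) -> bool:
    """True if the file was recorded at 44.1kHz."""
    return describe_wav(path)["sample_rate_hz"] == 44_100


if __name__ == "__main__":
    # Write one second of 16-bit mono silence, then inspect it.
    path = os.path.join(tempfile.mkdtemp(), "silence.wav")
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)         # 2 bytes = 16-bit
        out.setframerate(44_100)
        out.writeframes(b"\x00\x00" * 44_100)
    print(describe_wav(path), is_44k1(path))
```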

As for CBR, most programs default to “constant.” If yours doesn’t, you’ll see “Variable Bit Rate” as an option; make sure you choose “Constant Bit Rate” when you pick your MP3 setting during export.
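If you export from the command line rather than from your editor, one option is ffmpeg with the LAME MP3 encoder. Below is a minimal Python sketch that builds and runs the command; it assumes ffmpeg is installed and built with libmp3lame. With libmp3lame, setting a bitrate via -b:a without requesting VBR or ABR gives a constant bit rate. The file names in the usage line are hypothetical.

```python
import shutil
import subprocess


def mp3_export_command(src: str, dst: str, bitrate_kbps: int = 192,
                       sample_rate: int = 44_100) -> list:
    """Build an ffmpeg command for a constant-bit-rate MP3."""
    if bitrate_kbps < 192:
        raise ValueError("ACX asks for 192kbps or higher")
    return [
        "ffmpeg", "-i", src,
        "-codec:a", "libmp3lame",      # the LAME MP3 encoder
        "-b:a", f"{bitrate_kbps}k",    # fixed bitrate -> CBR with libmp3lame
        "-ar", str(sample_rate),       # output sample rate: 44.1kHz
        dst,
    ]


def export_mp3(src: str, dst: str, **options) -> None:
    """Run the export; raises if ffmpeg is missing or the encode fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(mp3_export_command(src, dst, **options), check=True)
```

Usage would look like `export_mp3("chapter01.wav", "chapter01.mp3")`, run once per chapter file.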

And that’s the techy side of ACX explained! If you want to learn more about recording yourself, check us out at vrbootcamp.com.

Voice Recording Bootcamp

ACX Guidelines? We've got you covered.
