By: Patrik Höglund

What is Automatic Gain Control?

It’s time to talk about advanced media quality tests again! As experienced Google testing blog readers know, when I write an article it’s usually about WebRTC, and the unusual testing solutions we build to test it. This article is no exception. Today we’re going to talk about Automatic Gain Control, or AGC. This is a feature that’s on by default for WebRTC applications, such as http://apprtc.appspot.com. It uses various means to adjust the microphone signal so your voice makes it loud and clear to the other side of the peer connection. For instance, it can attempt to adjust your microphone gain or try to amplify the signal digitally.

Figure 1. How Auto Gain Control works [code here].
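To make the feedback idea concrete, here is a minimal sketch of a purely digital gain loop. It is a toy, not WebRTC’s AGC: `ToyAgc`, `Rms`, and `ProcessBlock` are hypothetical names, the target level and step size are made-up parameters, and there is no voice activity detection or microphone-gain control.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical toy AGC state; not WebRTC's implementation.
struct ToyAgc {
  double target_rms;  // Desired output level, linear scale (0..1).
  double step;        // Fraction of the gain error corrected per block.
  double gain;        // Current digital gain.
};

// Root-mean-square level of one block of samples.
double Rms(const std::vector<double>& block) {
  assert(!block.empty());
  double sum_squares = 0.0;
  for (double s : block) sum_squares += s * s;
  return std::sqrt(sum_squares / static_cast<double>(block.size()));
}

// Applies the current gain to the block in place (clipping to [-1, 1]),
// then nudges the gain toward the value that would have hit the target.
// Returns the RMS level of the processed block.
double ProcessBlock(ToyAgc& agc, std::vector<double>& block) {
  for (double& s : block) s = std::max(-1.0, std::min(1.0, s * agc.gain));
  const double out_rms = Rms(block);
  if (out_rms > 1e-9) {  // Leave the gain alone on silent blocks.
    const double wanted_gain = agc.gain * agc.target_rms / out_rms;
    agc.gain += agc.step * (wanted_gain - agc.gain);
  }
  return out_rms;
}
```

Fed quiet blocks (constant amplitude 0.01) with a target of 0.1, the gain in this sketch climbs toward 10; fed loud blocks, it drops below 1.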
AGC is an example of automatic control engineering (another example would be the classic PID controller) and happens in real time. Therefore, if you move closer to the mic while speaking, the AGC will notice the output stream is too loud, and reduce mic volume and/or digital gain. When you move further away, it tries to adapt up again. The fancy voice activity detector is there so we only amplify speech, and not, say, the microwave oven your spouse just started in the other room.

Testing the AGC

Now, how do we make sure the AGC works? The first thing is obviously to write unit tests and integration tests. You didn’t think about building that end-to-end test first, did you? Once we have the lower-level tests in place, we can start looking at a bigger test. While developing the WebRTC implementation in Chrome, we had several bugs where the AGC code was working by itself, but was misconfigured in Chrome. In one case, it was simply turned off for all users. In another, it was only turned off in Hangouts.

An earlier article covered stable, low-maintenance audio quality tests with the ability to record Chrome’s output sound for analysis. I encourage you to read that article, but the bottom line is that those tests can run a WebRTC call in two tabs and record the audio output to a file. Those tests run the PESQ algorithm on input and output to see how similar they are. To turn them into an AGC test, I needed to:

1. Add file support to Chrome’s fake audio input device, so we can play a known file. The original audio test avoided this by using WebAudio, but AGC doesn’t run in the WebAudio path, just the microphone capture path, so that won’t work.
2. Instead of running PESQ, run an analysis that compares the gain between input and output.

Adding Fake File Support

This is always a big part of the work in media testing: controlling the input and output. It’s unworkable to tape microphones to loudspeakers or point cameras at screens to capture the media, so the easiest solution is usually to add a debug flag. That is exactly what I did here.
It was a lot of work, but I won’t go into much detail since Chrome’s audio pipeline is complex. The core is this:

    int FileSource::OnMoreData(AudioBus* audio_bus, uint32 total_bytes_delay) {
      // Load the file if we haven't already. This load needs to happen on the
      // audio thread, otherwise we'll run on the UI thread on Mac for instance.
      // This will massively delay the first OnMoreData, but we'll catch up.
      if (!wav_audio_handler_)
        // ...
      if (load_failed_)
        return 0;

      // Stop playing if we've played out the whole file.
      if (wav_audio_handler_->AtEnd(wav_file_read_pos_))
        return 0;

      // This pulls data from ProvideInput.
      // ...
      return audio_bus->frames();
    }

The fake device is then selected on the Chrome command line:

    chrome --use-fake-device-for-media-stream \
        ...

The Analysis Stage

Next I had to get the analysis stage figured out. It turned out there was something called an AudioPowerMonitor in the Chrome code, which you feed audio data into and get the average audio power for the data you fed in. This is a measure of how “loud” the audio is. Since the whole point of the AGC is getting to the right audio power level, we’re looking to compute

    A_diff = A_out - A_in

A_diff should be 0 dB if the AGC is turned off, and it should be > 0 dB if the AGC is on and we feed in a low power audio file. Computing the average energy of an audio file was straightforward to implement:

    // ...
    ... "Expected to write entire file into bus.";

    // Set the filter coefficient to the whole file's duration; this will make
    // the power monitor take the entire file into account.
    // ...
    return power_monitor.ReadCurrentPowerAndClip().first;

I computed A_in by running the above algorithm on the reference file (which I fed in using the flag I implemented above) and A_out on the recording of the output audio. At this point I pretty much thought I was done. I ran a WebRTC call with the AGC turned off, expecting to get zero… and got a huge number. Turns out I wasn’t done.

What Went Wrong?

I needed more debugging information to figure out what went wrong.
Since the AGC was off, I would expect the power curves for output and input to be identical. All I had was the average audio power over the entire file, so I started plotting the audio power for each 10 millisecond segment instead to understand where the curves diverged. I could then plot the detected audio power over the time of the test. I started by plotting A_diff:

Figure 2. Plot of A_diff.
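The per-window measurement can be sketched roughly as follows. This is not Chrome’s AudioPowerMonitor: it assumes the samples are already decoded to floating point in [-1, 1] and uses a plain mean-square power in dB. `PowerDb`, `AveragePowerDb`, and `PowerPerWindowDb` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean-square power of samples[begin, end) in dB relative to full scale.
// A floor keeps digital silence from producing -infinity.
double PowerDb(const std::vector<double>& samples, std::size_t begin,
               std::size_t end) {
  assert(begin < end && end <= samples.size());
  double sum_squares = 0.0;
  for (std::size_t i = begin; i < end; ++i)
    sum_squares += samples[i] * samples[i];
  const double mean_square = sum_squares / static_cast<double>(end - begin);
  const double kFloor = 1e-10;  // About -100 dB.
  return 10.0 * std::log10(mean_square > kFloor ? mean_square : kFloor);
}

// Average power over the whole signal, in dB.
double AveragePowerDb(const std::vector<double>& samples) {
  return PowerDb(samples, 0, samples.size());
}

// Power of each consecutive window (10 ms by default), in dB.
// A trailing partial window is dropped.
std::vector<double> PowerPerWindowDb(const std::vector<double>& samples,
                                     int sample_rate_hz, int window_ms = 10) {
  assert(sample_rate_hz > 0 && window_ms > 0);
  const std::size_t window =
      static_cast<std::size_t>(sample_rate_hz) * window_ms / 1000;
  assert(window > 0);
  std::vector<double> result;
  for (std::size_t start = 0; start + window <= samples.size();
       start += window) {
    result.push_back(PowerDb(samples, start, start + window));
  }
  return result;
}
```

The overall gain difference for a pair of files is then `AveragePowerDb(output) - AveragePowerDb(input)`, and subtracting the two per-window vectors element by element gives a curve over time.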
I then plotted A_out and A_in next to each other:

Figure 3. Plot of A_out and A_in.
Clock Drift and Packet Loss

Let me explain. As a part of WebRTC audio processing, we run a complex module called NetEq on the received audio stream. When sending audio over the Internet, there will inevitably be packet loss and clock drift. Packet losses always happen on the Internet, depending on the network path between sender and receiver. Clock drift happens because the sample clocks on the sending and receiving sound cards are not perfectly synced. NetEq compensates for both, so the recorded output can end up shifted in time relative to the reference file.

Silence Splitting to the Rescue!

I could probably have solved this with math and postprocessing of the results (least squares maybe?), but I had another idea. The reference file happened to consist of five segments with small pauses between them. What if I made these pauses longer, split the files on the pauses and trimmed away all the silence? This would effectively align the start of each segment with its corresponding segment in the reference file.

Figure 4. Before silence splitting.
Figure 5. After silence splitting. 
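As a rough illustration of the idea, the sketch below splits a decoded signal on silence and drops the silent parts. It is a stand-in under stated assumptions, not what the test itself runs: the amplitude threshold, the minimum pause length, and the name `SplitOnSilence` are all hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Splits `samples` into non-silent segments. A sample counts as silent when
// |s| < threshold; a run of at least `min_silence` silent samples ends the
// current segment. Leading, trailing, and separating silence is discarded,
// while pauses shorter than `min_silence` stay inside their segment.
std::vector<std::vector<double>> SplitOnSilence(
    const std::vector<double>& samples, double threshold,
    std::size_t min_silence) {
  assert(threshold > 0.0 && min_silence > 0);
  std::vector<std::vector<double>> segments;
  std::vector<double> current;
  std::size_t silent_run = 0;  // Length of the current run of silent samples.
  for (double s : samples) {
    if (std::fabs(s) >= threshold) {
      silent_run = 0;
      current.push_back(s);
      continue;
    }
    ++silent_run;
    if (current.empty()) continue;  // Still inside leading or long silence.
    current.push_back(s);
    if (silent_run == min_silence) {
      // The pause is long enough: drop the silent tail, close the segment.
      current.resize(current.size() - silent_run);
      segments.push_back(current);
      current.clear();
    }
  }
  if (!current.empty()) {
    // Trim a short trailing pause before keeping the last segment.
    current.resize(current.size() - silent_run);
    segments.push_back(current);
  }
  return segments;
}
```

Because every segment starts at its first non-silent sample, segment i of the recording can be compared directly with segment i of the reference, as long as both files split into the same number of segments.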
Result

Here is the final test implementation:

    base::FilePath reference_file = ...
    std::vector<base::FilePath> ref_segments = ...
    // Keep the recording and split files if the analysis fails.
    std::vector<base::FilePath> actual_segments = ...
    ... false);
    ... true);

    void AnalyzeSegmentsAndPrintResult(
        const std::vector<base::FilePath>& ref_segments,
        const std::vector<base::FilePath>& actual_segments,
        const base::FilePath& reference_file,
        const std::string& perf_modifier) {
      ... "Failed to split reference file on silence; sox is likely broken.";
      ... "The recording did not result in the same number of audio segments "
          "after on splitting on silence; WebRTC must have deformed the audio "
          "too much.";
      for (size_t i = 0; i < ref_segments.size(); i++) {
        float difference_in_decibel = AnalyzeOneSegment(ref_segments[i], ...
        std::string trace_name = MakeTraceName(reference_file, i);
        ... "agc_energy_diff", perf_modifier, trace_name, ... "dB", false);
      }
    }

The results look like this:

Figure 6. Average A_diff values for each segment on the y axis, Chromium revisions on the x axis.
 
Hi Ted!
Having a perfect zero in the agc-off case would mean that the power curves for the ref and actual audio are identical. That outcome is unlikely since there are several sources of distortion along the way. In fact, I would be worried something was broken if we got perfect zeroes :)
One source of distortion is NetEq speech expansion, which we can't eliminate (although we can mitigate it by having short segments). Another is that recording isn't perfect, and the outcome is slightly different per platform as a result.