3. Wave Field Synthesis

Wave Field Synthesis, also known as WFS, denotes a spatial audio reproduction procedure. Unlike all conventional audio procedures, it no longer depends on the psychoacoustic perception of phantom sound sources. Instead, the sound field is reconstructed physically. For this purpose, the synthesis imitates nature by assembling wave fronts from elementary waves according to Huygens' principle. A computer independently drives a large number of separately controlled loudspeaker membranes, usually arranged as an array around the listener, so that each one moves at exactly the moment the wave front of a virtual point source would reach its position in space.

3.1 Mathematical basis

The WFS procedure was developed in the late 1980s by Prof. Berkhout at the Delft University of Technology. Its mathematical basis is the Kirchhoff-Helmholtz integral (KHI). It states that if the sound pressure and the particle velocity are known at every point on the surface of a source-free volume, then the sound pressure at any point within this volume is determined. According to the Rayleigh II integral, the sound pressure at a point A within a half-space is already determined if only the pressure distribution on the bounding plane is known. Such a plane radiates a sound field to both sides; if the radiation to the rear is suppressed, emission into a single half-space results.
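For reference, both integrals can be written out. The notation below is one common convention and an assumption of this sketch rather than something taken from the text: harmonic time dependence e^{jωt}, wavenumber k = ω/c, free-field Green's function G, and a surface normal n that points into the volume (for Rayleigh II, into the half-space containing the observation point), with derivatives taken with respect to the surface point r_S along n.

```latex
% Free-field Green's function (time dependence e^{j\omega t}, k = \omega / c)
G(\mathbf{r}\,|\,\mathbf{r}_S) = \frac{e^{-jk|\mathbf{r}-\mathbf{r}_S|}}{4\pi\,|\mathbf{r}-\mathbf{r}_S|}

% Kirchhoff-Helmholtz integral, r inside the source-free volume V with surface S
p(\mathbf{r}) = \oint_{S} \left[ p(\mathbf{r}_S)\,\frac{\partial G(\mathbf{r}\,|\,\mathbf{r}_S)}{\partial n}
  - G(\mathbf{r}\,|\,\mathbf{r}_S)\,\frac{\partial p(\mathbf{r}_S)}{\partial n} \right] dS,
\qquad \frac{\partial p}{\partial n} = -j\omega\rho_0\, v_n

% Rayleigh II integral, r in the half-space in front of the plane
p(\mathbf{r}) = \frac{1}{2\pi} \int_{\text{plane}} p(\mathbf{r}_S)\,
  \frac{\partial}{\partial n}\!\left( \frac{e^{-jk|\mathbf{r}-\mathbf{r}_S|}}{|\mathbf{r}-\mathbf{r}_S|} \right) dS
```

The first term of the Kirchhoff-Helmholtz integral corresponds to a layer of dipole sources driven by the pressure, the second to a layer of monopole sources driven by the normal particle velocity v_n. Rayleigh II keeps only the dipole layer, which is why the pressure on the plane alone is sufficient.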

3.2 Physical principle

3.2.1 Virtual acoustic sources

For true spatial audio, the task is to construct sound sources whose perceived position no longer shifts with the listener's position. As mentioned above, phantom sound sources cannot provide true spatial audio for precisely this reason. One way to create such fixed sound sources would be to mount a large number of loudspeakers on a spherical surface. Regardless of our own position, we would then always perceive the origin of the radiated wave front at the center of such a sphere.

However, it is impossible to place such spheres at every source and reflection origin described in the second chapter. Fortunately, the Wave Field Synthesis principle can produce an unlimited number of virtual sound sources from the elementary waves of a loudspeaker line.

3.2.2 Elementary waves

As Christiaan Huygens discovered, each point of a wave front is the origin of an elementary wave. More than 300 years ago, the Dutch mathematician explained diffraction effects with this principle. It applies to any kind of wave propagation, light waves as well as sound waves, and is one of the most important concepts in physics. In acoustics today, it makes it possible to reconstruct genuine sound waves from such elementary waves.

In this animation, we regard the holes in the baffle, which correspond to the loudspeakers, as such origins of elementary waves. As long as the size and spacing of the holes remain small compared to the wavelength, the sound pressure does not differ between the two sides of a hole. The superposition of enough such elementary waves fully restores a genuine wave front that seems to emanate from the virtual sound source. All we need is the dry recorded source signal and the distance from the virtual source to each origin of an elementary wave.
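A minimal Python sketch of that rule, assuming a straight loudspeaker line on y = 0 with the virtual point source behind it and a speed of sound of 343 m/s: each loudspeaker receives the travel time from the virtual source as its delay and a 1/distance gain. The function name `point_source_driving` and the layout are hypothetical, and the amplitude weighting is deliberately simplified compared with published WFS driving functions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C


def point_source_driving(source, speakers, c=SPEED_OF_SOUND):
    """Return one (delay_s, gain) pair per loudspeaker for a virtual point source.

    Simplified sketch: every loudspeaker replays the same dry signal, delayed by
    the travel time from the virtual source to that loudspeaker and attenuated
    by 1/distance. Published WFS driving functions add a frequency-dependent
    prefilter and geometry-dependent amplitude weights, omitted here.
    """
    driving = []
    for sx, sy in speakers:
        distance = math.hypot(sx - source[0], sy - source[1])
        driving.append((distance / c, 1.0 / distance))
    return driving


# Hypothetical layout: five loudspeakers 0.5 m apart on the line y = 0,
# virtual source 2 m behind the array.
speakers = [(0.5 * i, 0.0) for i in range(-2, 3)]
for (x, _), (delay, gain) in zip(speakers, point_source_driving((0.0, -2.0), speakers)):
    print(f"x = {x:+.1f} m   delay = {1000 * delay:.2f} ms   gain = {gain:.3f}")
```

The loudspeaker nearest to the virtual source (x = 0) gets the shortest delay and fires first, the outer ones follow; this is what shapes the convex wave front that seems to originate at the virtual source.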

Unfortunately, the sound field in the recording room does not consist of the direct wave front alone. Most of the sound energy is contained in reflections. For true spatial audio, we cannot radiate the reflections only from the direction of the main source, as is often done in conventional audio reproduction. The directional difference between the first reflections and the origin of the direct wave provides the most important cues to the source distance and to the distances of the recording room walls. The huge number of subsequent reflections that make up the reverberation tail matters less for direction, but conveys information about the fine structure and properties of the recording room surfaces.

However, WFS loudspeaker arrangements are able to create more than one virtual sound source. The signals of these sources are independent and may originate from different sources. If the same signal content radiates from different positions, we perceive it as a reflection of the main source signal. As described in the second chapter, the genuine sound field in a recording room is formed by a huge number of such origins carrying the same signal content. If we were able to reconstruct all of these positions, the spatial sound field could be completely recovered from single, dry recorded mono source signals. The main difficulty in restoring the genuine sound field is determining the origins of all reflections in the recording room.

3.2.3 The model-based approach

Wave Field Synthesis offers two different approaches to this problem. The simpler one is the model-based approach. Following the mirror source model, the origins of the reflections are calculated from the recording room geometry. The distance from each of these virtual source positions to each loudspeaker position determines the run times and levels. The wall reflection factors are included in this calculation, as is the directional radiation pattern of the primary source. However, such a procedure is practicable only for restoring the direct wave and the first reflections in the recording room. The huge number of discrete reflections in the reverberation tail makes a correct reconstruction of the complete sound field impossible with the model-based approach.
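The mirror source idea can be sketched for the simplest case, a rectangular room seen in the horizontal plane. This is an illustration only: the room size, the single frequency-independent reflection factor of 0.8, and the names `first_order_images` and `arrival` are assumptions; only first-order reflections are generated, and the directional pattern of the primary source is ignored. Each returned virtual source would then be rendered by the loudspeaker array like any other point source, carrying the same dry signal.

```python
import math
from dataclasses import dataclass


@dataclass
class VirtualSource:
    x: float
    y: float
    amplitude: float  # product of the wall reflection factors along the path
    label: str


def first_order_images(src, width, depth, reflection=0.8):
    """Direct source plus its four first-order mirror sources in a rectangular
    room spanning [0, width] x [0, depth] (horizontal plane only).

    Assumes one frequency-independent reflection factor for all walls and an
    omnidirectional primary source.
    """
    sx, sy = src
    return [
        VirtualSource(sx, sy, 1.0, "direct"),
        VirtualSource(-sx, sy, reflection, "wall x = 0"),
        VirtualSource(2 * width - sx, sy, reflection, "wall x = width"),
        VirtualSource(sx, -sy, reflection, "wall y = 0"),
        VirtualSource(sx, 2 * depth - sy, reflection, "wall y = depth"),
    ]


def arrival(vs, point, c=343.0):
    """Run time (s) and relative level (amplitude / distance) from a virtual source to a point."""
    distance = math.hypot(point[0] - vs.x, point[1] - vs.y)
    return distance / c, vs.amplitude / distance


# Hypothetical 10 m x 8 m recording room, source at (3, 2), observation point at (5, 6).
for vs in first_order_images((3.0, 2.0), 10.0, 8.0):
    t, level = arrival(vs, (5.0, 6.0))
    print(f"{vs.label:15s} at ({vs.x:5.1f}, {vs.y:5.1f})  {1000 * t:6.2f} ms  level {level:.3f}")
```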

3.2.4 The data-based approach

For that reason, the common practice at scientific institutes is the data-based approach. Before the transmission, the spatial impulse response of the recording room is captured. For this purpose, a line array of microphones is set up in the recording room, matching the arrangement of the loudspeakers in the playback room. To capture the spatial impulse response, a short impulse is emitted at the later position of the primary sound source and recorded by the microphone array. The nearest microphone picks up the impulse first; when the audio signal is convolved with the different impulse responses, the associated loudspeaker therefore radiates it ahead of all other loudspeakers during playback. The other microphones in the recording room register the impulse later, in turn. In this way, convolving each loudspeaker signal with its assigned impulse response recreates the direct wave and all its reflections in the recording room from their correct origins. [1]
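The convolution step can be illustrated with a few lines of Python. The impulse responses below are hypothetical toy arrays of a few samples, not measured data, and the helper names are illustrative. The direct-form loop is written for readability; real systems would use FFT-based (partitioned) convolution on impulse responses that are thousands of samples long.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution; the output has len(signal) + len(impulse_response) - 1 samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def loudspeaker_feeds(dry_signal, impulse_responses):
    """One feed per loudspeaker: the dry source signal convolved with the impulse
    response measured at the corresponding microphone of the array."""
    return [convolve(dry_signal, ir) for ir in impulse_responses]


# Toy impulse responses (hypothetical, in samples): the nearest microphone
# hears the impulse first (sample 1), the other one later (sample 3), each
# followed by a weaker reflection.
irs = [
    [0.0, 1.0, 0.0, 0.0, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.8, 0.0, 0.3],
]
dry = [1.0, 0.5]
for n, feed in enumerate(loudspeaker_feeds(dry, irs)):
    print(n, feed)
```

Because the dry signal starts at sample 0, each feed begins exactly where its impulse response begins, so the loudspeaker assigned to the nearest microphone starts first.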

In practice, however, it is impossible to record the spatial impulse response at every microphone position in the recording room for every possible position of the sound source. The measured results must therefore be extrapolated and interpolated during playback for all the different positions. This calculation also has to include all mirror source positions of the reflections. Beyond that, the microphone array has a different acoustic length than the loudspeaker arrangement, because the temperature in the playback room differs from that in the recording room. Unless the different propagation speed is included in the calculation, this causes a loss in the upper frequency range. Moreover, the loudspeaker positions normally differ from the microphone positions. The immense computational load is hardly manageable in real time with currently available computing power, especially when the sound source moves.
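To give a feel for the size of the temperature effect, the sketch below uses the ideal-gas approximation for the speed of sound in dry air, c ≈ 331.3·√(1 + T/273.15) m/s, and an assumed 10 m propagation path measured at 18 °C but replayed at 24 °C. The function names are hypothetical.

```python
import math


def speed_of_sound(temp_c):
    """Speed of sound in dry air (m/s), ideal-gas approximation."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)


def run_time_error(path_m, temp_recording_c, temp_playback_c):
    """Difference (s) between the run time over path_m at the recording-room
    temperature and at the playback-room temperature."""
    return path_m / speed_of_sound(temp_recording_c) - path_m / speed_of_sound(temp_playback_c)


def phase_error_deg(freq_hz, time_error_s):
    """Phase error (degrees) that a given timing error produces at freq_hz."""
    return 360.0 * freq_hz * time_error_s


# Hypothetical case: 10 m path, impulse responses measured at 18 °C, playback at 24 °C.
dt = run_time_error(10.0, 18.0, 24.0)
for f in (100, 1_000, 10_000):
    print(f"{f:>6} Hz: timing error {1e3 * dt:.3f} ms -> phase error {phase_error_deg(f, dt):.0f} deg")
```

The timing error of about 0.3 ms is harmless at 100 Hz but amounts to roughly three full periods at 10 kHz, which is consistent with the loss in the upper frequency range described above.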

3.3 Procedure advantages

In principle, though, wave field synthesis can produce a virtual, physical copy of a genuine sound field, at least within the listener's horizontal plane. All sound sources and all their reflections in the recording room are recreated virtually at their correct origins. This differs from conventional procedures, which normally attempt to transmit all spatial information through time and level differences between a few separate audio channels. Synthesizing the reflection pattern from the dry recorded source signal, in the same way that the source itself creates all spatial impressions in the recording room through its distributed reflections, is the more natural route to true spatial audio.

Experts agree with some purists who regard mono as the best audio. Only direct capture of the source signal, free of the phase, comb-filter, and ITDG (initial time delay gap) problems that inevitably come with spaced microphone arrangements, can provide tangible audio. By comparison, recordings made with microphones placed farther away give a pleasant, enveloping, but more "ghost-like" impression.

The natural and coherent sound of mono recordings is essentially preserved as long as the direct wave is complemented by its correct reflections. In this way, wave field synthesis recreates the true spatial impression of the recording room from such mono tracks.

The obvious advantage of this solution is that the listener is no longer bound to a narrow sweet spot. Wave Field Synthesis restores the entire sound field. Any change of the listener's position in the playback room causes the same change in perception as a corresponding movement in the recording room would. This is true spatial audio reproduction! It could never be achieved with psychoacoustic phantom sources, because their position shifts with the listener's position. The virtual sound sources produced from a sufficient number of elementary waves, in contrast, behave like real sound sources. The loudspeaker itself is no longer the reference point.

In addition, Wave Field Synthesis makes it possible to place the virtual sound source in front of the loudspeaker arrangement. In the principle animation above, the delay times would not differ whether the virtual source lies behind or in front of the microphone row, so we would always perceive its origin behind the loudspeakers. If, however, the delay times are inverted according to the “Time Mirror Approach”, the outer loudspeakers radiate first. This produces concave wave fronts, and the virtual source appears at a focus point inside the playback area. To a certain degree, we can walk around it.
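The time mirror can be sketched by computing relative delays from the distances between each loudspeaker and the source position, then reversing them in time. This assumes a straight array on y = 0 with the listeners at y > 0, a speed of sound of 343 m/s, and a hypothetical helper `relative_delays`; amplitude weights are ignored, so this is not a complete focused-source driving function. The example also reproduces the observation in the text: a source 1 m behind the array and one mirrored 1 m in front of it yield identical delays unless the delays are inverted.

```python
import math


def relative_delays(source, speakers, c=343.0, focused=False):
    """Relative loudspeaker delays (s); the loudspeaker that fires first gets 0.

    focused=False: virtual source behind the array; the nearest loudspeaker
    fires first and a convex wave front leaves the array.
    focused=True: the same delays mirrored in time, so the outermost
    loudspeakers fire first and a concave wave front converges on the
    source position in front of the array.
    """
    distances = [math.hypot(x - source[0], y - source[1]) for x, y in speakers]
    if focused:
        longest = max(distances)
        return [(longest - d) / c for d in distances]
    shortest = min(distances)
    return [(d - shortest) / c for d in distances]


# Hypothetical array: seven loudspeakers 0.5 m apart on y = 0; the listener side is y > 0.
speakers = [(0.5 * i, 0.0) for i in range(-3, 4)]
behind = relative_delays((0.0, -1.0), speakers)
mirror = relative_delays((0.0, 1.0), speakers)  # same geometry, delays not inverted
focus = relative_delays((0.0, 1.0), speakers, focused=True)
print("behind :", [round(1e3 * t, 2) for t in behind])
print("focused:", [round(1e3 * t, 2) for t in focus])
```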

Even so, the most important advantage of Wave Field Synthesis is hardly mentioned in any of the scientific publications. In conventional audio, all signal components, such as the direct wave, the first reflections, and the reverberation, merge inseparably into one common signal. It is therefore impossible to treat each of these components differently during playback. In Wave Field Synthesis, however, we synthesize all these components during playback. As a result, the timing and level of every single component of the sound signal can be configured independently. This advantage can hardly be exploited in the data-based approach, but for this very reason it opens up breathtaking possibilities in the model-based procedure.

3.4 Remaining problems

3.4.1 Horizontal restriction

The Wave Field Synthesis principle is not limited to a plane. In principle, the procedure could restore the sound field in all three dimensions. For the data-based solution, however, the available computing power has so far been insufficient for 3D audio. Besides, covering all walls of the playback room with loudspeakers is hardly a practical approach. In search of a practicable solution, the developers compromised on the reproduction of elevation. Reducing the loudspeakers to a single line around the listener was an acceptable solution that could already be achieved in the 1990s. Our localization in azimuth works mainly through time differences, which the horizontal loudspeaker lines reconstruct perfectly. Such solutions with hundreds of loudspeakers are possible today. Even so, the horizontal limitation remains clearly audible, especially in strongly damped environments. Other procedures, such as Ambisonics or Vector Base Amplitude Panning (VBAP), have shown that a truly three-dimensional reproduction of the sound event is essential.

3.4.2 Disturbing playback room acoustics

Unfortunately, to meet the source-free volume requirement of the Kirchhoff-Helmholtz integral, WFS requires the playback room reflections to be suppressed. The loudspeaker rows cannot solve the problem of the disturbing additional playback room reflections in the transmission chain. To reproduce only the recording room acoustics, the playback room acoustics must be suppressed completely.

Moreover, a horizontal row of loudspeakers does not really produce directed, parallel wave fronts. It radiates cylindrical waves, which lose 3 dB of level every time the distance doubles. This lost energy returns as unwanted playback room acoustics. Besides, for a listener close to the loudspeakers, the rising level of the nearby loudspeakers becomes disturbing. In addition, the acceptance of such loudspeaker rows all around the room is very low.
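The difference between the two spreading laws is easy to quantify. The sketch assumes ideal free-field propagation; the cylindrical law holds for a long, continuous line source, while a finite loudspeaker row gradually approaches spherical spreading at large distances. The helper name `level_drop_db` is hypothetical.

```python
import math


def level_drop_db(r1, r2, wave="spherical"):
    """Free-field level decrease (dB) between distances r1 and r2.

    spherical (point source):       pressure ~ 1/r       -> 20*log10(r2/r1)
    cylindrical (long line source): pressure ~ 1/sqrt(r) -> 10*log10(r2/r1)
    """
    if wave == "spherical":
        return 20.0 * math.log10(r2 / r1)
    if wave == "cylindrical":
        return 10.0 * math.log10(r2 / r1)
    raise ValueError(f"unknown wave type: {wave!r}")


for r in (2.0, 4.0, 8.0):
    print(f"1 m -> {r:.0f} m: spherical {level_drop_db(1.0, r):5.2f} dB, "
          f"cylindrical {level_drop_db(1.0, r, 'cylindrical'):5.2f} dB")
```

Per doubling of distance, the spherical wave of a real point source loses about 6 dB, the cylindrical wave of the row only about 3 dB.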

3.4.3 Aliasing Effects

The Kirchhoff-Helmholtz integral assumes an unlimited number of elementary waves. In practice, though, the number of loudspeakers is limited. As with any spatial sampling, this causes aliasing effects. Within the playback area, depending on the wavelength, points of higher level alternate with points of reduced level across the room. At any single point, the resulting dips and peaks in the frequency response are very narrow. Fortunately, these effects are less disturbing perceptually than the measured frequency response curves suggest.

The extent of these dips and peaks depends on the distance between the elementary wave sources, as well as on the listener and source positions relative to the radiating loudspeaker arrangement. Aliasing-free reproduction would require a loudspeaker spacing of less than one inch. Some improvement for a given number of loudspeakers is described in DE102009006762A1: seemingly random spacing at defined positions reduces aliasing, in the same way that the wagon wheel in a Western film no longer appears to run backwards if its spokes are set at randomized angles.
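A common rule of thumb for a uniform line array puts the worst-case spatial aliasing frequency at f_al = c/(2Δx), where Δx is the loudspeaker spacing. The sketch below applies it, assuming c = 343 m/s; the function names are hypothetical, and, as the text notes, the actual onset of aliasing also depends on the source and listener positions.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def aliasing_frequency(spacing_m, c=SPEED_OF_SOUND):
    """Worst-case spatial aliasing frequency (Hz) of a uniform line array,
    using the common rule of thumb f_al = c / (2 * spacing)."""
    return c / (2.0 * spacing_m)


def max_spacing(freq_hz, c=SPEED_OF_SOUND):
    """Largest loudspeaker spacing (m) that keeps freq_hz below the worst-case aliasing limit."""
    return c / (2.0 * freq_hz)


print(f"20 cm spacing: aliasing above about {aliasing_frequency(0.20):.0f} Hz")
print(f"aliasing-free up to 20 kHz: spacing below {1000 * max_spacing(20_000):.1f} mm")
```

For the 20 cm spacing of the installations described in the listening impressions, this gives roughly 860 Hz; reproduction free of aliasing up to 20 kHz would need a spacing of under 1 cm, comfortably below one inch.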

3.4.4 Truncation effect

Unless the loudspeaker arrangement is completely closed around the listener, the ends of the radiating surface cause the “Truncation Effect”. As visible in the animation of the WFS principle, no further elementary waves contribute to the sound pressure beyond these ends. This abruptly changes the resulting superposition, and a shadow wave arises.

This effect can be reduced to a certain extent by lowering the level of the outer loudspeakers. As long as the virtual sound source lies behind the loudspeakers, the shadow wave reaches the listener later than the direct wave front. If, however, the shadow wave arrives ahead of the actual wave front, it is audible and disturbing to the listener.
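Lowering the level of the outer loudspeakers is typically done with a tapering window across the array. The sketch below assumes a raised-cosine fade over the outer 20 % of the loudspeakers at each end; both the window shape and the fraction are illustrative choices, and `edge_taper` is a hypothetical name. The resulting gains would multiply the per-loudspeaker gains of the virtual source.

```python
import math


def edge_taper(n_speakers, taper_fraction=0.2):
    """Gain per loudspeaker: 1.0 in the middle, raised-cosine fade toward 0
    over taper_fraction of the array at each end."""
    n_taper = max(1, int(round(taper_fraction * n_speakers)))
    gains = []
    for i in range(n_speakers):
        edge_distance = min(i, n_speakers - 1 - i)  # loudspeakers from the nearest end
        if edge_distance >= n_taper:
            gains.append(1.0)
        else:
            # x runs from just above 0 (outermost) to just below 1 (inner edge of the taper)
            x = (edge_distance + 1) / (n_taper + 1)
            gains.append(0.5 - 0.5 * math.cos(math.pi * x))
    return gains


print([round(g, 2) for g in edge_taper(10)])
```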

3.4.5 Concave wave fronts

Another problem with virtual sound sources inside the playback area is the incorrect ITDs (interaural time differences) of the concave wave fronts. All acoustic wave fronts in nature are convex; we have no other listening experience. The time differences of these inward-curved wave fronts produce a thoroughly odd perception.

Misleading cues therefore arise when the listener is positioned between the radiating loudspeakers and the virtual sound source. Two different solutions to this problem are described in patent EP1637012 and in application DE 10 2006 054 961 A1.

3.4.6 Parallax problems

The DE application mentioned above solves a further problem for realistic perception: WFS makes it possible to produce virtual sound sources inside the audience area, but the corresponding picture cannot be shown at that point inside the audience area. The described solution combines the advantages of the physical principle with psychoacoustic principles for simulating source positions. This points to the breathtaking possibilities of this audio principle, which will become highly important for 3D vision.

3.5 Compatibility

Wave field synthesis is an object-based approach. We have to transmit the pure, dry recorded audio (content) and, in addition, the data on the recording room properties (form). In computer games, the object-based standard has long been in use because of its efficiency in recording. At the German Fraunhofer Institute, developers are working on the MPEG-4 standard, which is applicable to such object-based audio broadcasting. Unfortunately, most traditional components cannot play back that standard at present. The devices would have to be able to perform convolution in order to merge the separate components.

On the other hand, WFS loudspeaker arrangements can play traditional audio. However, the fundamental advantage of producing the true spatial impression of the recording room is lost in such reproduction. The channels are routed to virtual panning spots: simulated loudspeakers positioned far behind the real playback room walls. This reduces the influence of the actual listener position, because the angles and levels relative to these distant loudspeakers hardly change between different points in the playback room. The enlarged sweet spot covers nearly the whole playback room. Nevertheless, the perception remains that of traditional audio, with all the disadvantages of phantom source perception described in the first chapter.
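Why distant virtual panning spots enlarge the sweet spot can be checked with simple geometry. The sketch assumes a listener moving 1.5 m sideways and compares a left-channel loudspeaker at about 3.4 m with a virtual one ten times farther away in the same direction; the positions, the 1/r level law, and the helper names are assumptions for illustration.

```python
import math


def azimuth_deg(listener, source):
    """Direction of a source as seen from the listener, in degrees (0 = straight ahead along +y)."""
    return math.degrees(math.atan2(source[0] - listener[0], source[1] - listener[1]))


def distance(listener, source):
    return math.hypot(source[0] - listener[0], source[1] - listener[1])


def change_across_room(source, seat_a, seat_b):
    """Change in azimuth (deg) and level (dB, 1/r law) when moving from seat_a to seat_b."""
    d_az = abs(azimuth_deg(seat_b, source) - azimuth_deg(seat_a, source))
    d_level = abs(20.0 * math.log10(distance(seat_b, source) / distance(seat_a, source)))
    return d_az, d_level


# Hypothetical left-channel loudspeaker: once at about 3.4 m, once as a virtual
# panning spot ten times farther away in the same direction.
seat_a, seat_b = (0.0, 0.0), (1.5, 0.0)
near = change_across_room((-1.5, 3.0), seat_a, seat_b)
far = change_across_room((-15.0, 30.0), seat_a, seat_b)
print(f"near: {near[0]:.1f} deg, {near[1]:.1f} dB   far: {far[0]:.1f} deg, {far[1]:.1f} dB")
```

For the near loudspeaker, the direction changes by about 18° and the level by about 2 dB; for the distant virtual one, by only about 2° and 0.2 dB.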

3.6 Stage of development

For almost thirty years, Wave Field Synthesis has been a subject of research at many respected scientific institutes around the world. Today it can be implemented without unsolvable problems; only the effort involved remains an obstacle. Currently, the largest installation in practical use is the loudspeaker array in the lecture hall of the Technical University of Berlin, Germany, where 2700 loudspeakers work together to simulate acoustic environments. Particularly notable in the audio world was a very successful live transmission of an organ concert from Cologne Cathedral to the Berlin WFS loudspeaker array in the summer of 2008. The most remarkable WFS loudspeaker installation in America was built in Mann's Chinese Theatres in Hollywood by the German company IOSONO ® GmbH.

3.7 Subjective impression

Impressions are always subjective, but many readers will not yet have had the opportunity to hear Wave Field Synthesis in person. There are currently several installations in Europe, but in the USA only the Mann's Chinese 6 Theatre in L.A. is open to the public. Therefore, let me describe my impressions from various listening tests in terms as neutral as possible:

So far, the installations have not reached the goal of a perception congruent with the genuine sound event. Most noticeable is the restriction to the horizontal plane. This is especially distracting because the damped playback rooms produce hardly any reflections of their own outside the plane of the loudspeakers. In the installations realized so far, the loudspeakers are spaced at least 20 cm apart, which in theory causes deep dips in the frequency response due to spatial aliasing. Even so, these effects were not really disturbing. More audible was a tonal inaccuracy, especially a loss in the upper frequency range.

The spatial impression, however, is incomparably better than with any traditional procedure. The source positions are absolutely stable. Traditional audio will never achieve such an unambiguous perception of source distance. No loudspeakers are audible; the sound seems independent of all loudspeakers, both outside and inside the playback room. The source origin remains unchanged even when the listener moves across the playback room, and the level of the simulated source varies correctly with its distance. Only very close to the loudspeaker rows do the loudspeakers themselves become audible, because of the incorrectly rising level of the cylindrical waves. Besides, virtual sources inside the playback room that are very close to the listener become vague in perception and, lacking a concrete source position, sometimes seem to be inside the head.

Most of the remaining problems seem solvable in the foreseeable future; the first installations of densely packed two-dimensional loudspeaker arrays show promising results. The HOLOPLOT solution realized by Advanced Acoustic SF GmbH in Potsdam, Germany, based on the acquired patented procedures described above, already delivers breathtaking dynamics and dry sound even in a very reverberant playback room.