Inputting real-time audio

by Richard Russell, November 2006

Most PCs provide the capability of inputting audio, typically either from a microphone or from a line-level input. This article describes how a BBC BASIC for Windows program can read and process audio data from this source in real-time (i.e. without it having to be recorded to a file first).

Preliminaries


As is usual for programs that access the Windows API, it is important to trap both errors and the closing of the window, so that the necessary 'cleanup' operations can take place:

      ON ERROR PROCcleanup : SYS "MessageBox", @hwnd%, REPORT$, 0, 48 : QUIT
      ON CLOSE PROCcleanup : QUIT

The PROCcleanup routine is listed later. You may want to change the error reporting to a different method.
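
As one possible alternative, during development it can be convenient to print the error message and line number in the program's output window rather than show a message box. This is only a sketch of that idea, using the standard REPORT$ and ERL functions:

      ON ERROR PROCcleanup : PRINT REPORT$; " at line "; ERL : END
      ON CLOSE PROCcleanup : QUIT

Here the program ends with END rather than QUIT after an error, so the window stays open and the message remains readable.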

Selecting the audio format


The first step is to decide on the audio format. The principal choices are the sampling rate (commonly 11025 Hz, 22050 Hz or 44100 Hz) and the number of channels (1 for mono, 2 for stereo). The higher the sampling rate, the higher the audio frequency that can be captured, but the more work your software needs to do. Normally you should choose the lowest sampling rate suitable for your application, remembering that it needs to be at least double the highest audio frequency in which you are interested (according to the Nyquist criterion).
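
The rule of thumb above can be sketched in a few lines. The variable names MaxFreq% and Rate% are illustrative assumptions, not part of the original program, and the value 4000 is just an example:

      MaxFreq% = 4000 : REM Hypothetical: highest audio frequency of interest, in Hz
      Rate% = 44100   : REM Default to the highest of the three common rates
      REM Step down to a lower rate only if it is still at least double MaxFreq%:
      IF 2 * MaxFreq% <= 22050 THEN Rate% = 22050
      IF 2 * MaxFreq% <= 11025 THEN Rate% = 11025
      PRINT "Chosen sampling rate: "; Rate%; " Hz"

With MaxFreq% set to 4000 this selects 11025 Hz. The sketch does not handle the case where even 44100 Hz is too low for the frequency of interest.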

You set up the required audio format and open the wave input device as follows:

      DIM Format{wFormatTag{l&,h&}, nChannels{l&,h&}, nSamplesPerSec%, \
      \          nAvgBytesPerSec%, nBlockAlign{l&,h&}, wBitsPerSample{l&,h&}, \
      \          cbSize{l&,h&}}
      Format.wFormatTag.l& = 1 : REM WAVE_FORMAT_PCM
      Format.nChannels.l& = 1  : REM Monaural
      Format.nSamplesPerSec% = 44100
      Format.wBitsPerSample.l& = 16
      Format.nBlockAlign.l& = Format.nChannels.l& * Format.wBitsPerSample.l& / 8
      Format.nAvgBytesPerSec% = Format.nSamplesPerSec% * Format.nBlockAlign.l&
      _WAVE_MAPPER = -1
      SYS "waveInOpen", ^WaveIn%, _WAVE_MAPPER, Format{}, 0, 0, 0 TO ret%
      IF ret% ERROR 100, "waveInOpen failed: "+STR$~ret%

In this example a sampling rate of 44100 Hz has been selected. Note that you cannot be sure that the incoming audio is actually being sampled at the specified rate. On some PCs the sampling rate may be as low as 11025 Hz even if a higher frequency has been selected in your program.
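
To make the two derived fields concrete, the short sketch below evaluates the same formulas used for nBlockAlign and nAvgBytesPerSec for mono and stereo 16-bit audio at 44100 Hz. The variable names are illustrative only and are separate from the Format{} structure:

      BitsPerSample% = 16
      SampleRate% = 44100
      FOR Channels% = 1 TO 2
        REM Bytes per sample frame (all channels), as for Format.nBlockAlign
        BlockAlign% = Channels% * BitsPerSample% DIV 8
        PRINT "Channels: "; Channels%; "  bytes per frame: "; BlockAlign%;
        PRINT "  bytes per second: "; SampleRate% * BlockAlign%
      NEXT

For mono this gives 2 bytes per frame and 88200 bytes per second; for stereo, 4 bytes per frame and 176400 bytes per second.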

Creating and initialising the buffers


The next step is to decide how many audio buffers you need and how large they should be. To some extent this is an arbitrary decision, but it will depend on things like latency (how much time elapses between the audio arriving and it being processed by your program) and the amount of work needed to process the received audio data.

Normally you should have at least three buffers: one inputting the sampled sound, one being processed by your program, and one spare (the buffers are reused cyclically). It is vitally important that your program can process the audio data quickly enough, otherwise data will be lost, with undesirable results. If there is any variability in the rate at which you can process the data (for example it depends on disk or network accesses) then you may need to use more and/or larger buffers to 'iron out' the fluctuation. Using more buffers is generally preferable to using larger buffers, to minimise any increase in latency.
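
To get a feel for the trade-off between buffer size and latency, the following sketch prints the duration of one buffer for a range of sizes. The values and variable names are illustrative assumptions; substitute your own sampling rate:

      Rate% = 44100 : REM Samples per second (example value)
      Samples% = 256
      REPEAT
        REM One buffer must fill completely before it can be processed,
        REM so its duration is a lower bound on the latency
        PRINT ;Samples%; " samples per buffer = "; 1000 * Samples% / Rate%; " ms"
        Samples% *= 2
      UNTIL Samples% > 4096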

In the example below there are three buffers, each 1024 samples long; at 44100 Hz each buffer holds about 23 milliseconds of audio (1024 / 44100 s), so the latency is at least that much. The code for creating and initialising the buffers is as follows:

      nBuffers% = 3
      SamplesPerBuffer% = 1024
      BytesPerBuffer% = SamplesPerBuffer% * Format.nBlockAlign.l&
      DIM _WAVEHDR{lpData%, dwBufferLength%, dwBytesRecorded%, dwUser%, \
      \            dwFlags%, dwLoops%, lpNext%, Reserved%}
      DIM Headers{(nBuffers%-1)} = _WAVEHDR{}
      FOR buff% = 0 TO nBuffers%-1
        DIM buffer% BytesPerBuffer% - 1
        Headers{(buff%)}.lpData% = buffer%
        Headers{(buff%)}.dwBufferLength% = BytesPerBuffer%
        SYS "waveInPrepareHeader", WaveIn%, Headers{(buff%)}, DIM(_WAVEHDR{}) TO ret%
        IF ret% ERROR 100, "waveInPrepareHeader failed: "+STR$~ret%
        SYS "waveInAddBuffer", WaveIn%, Headers{(buff%)}, DIM(_WAVEHDR{}) TO ret%
        IF ret% ERROR 100, "waveInAddBuffer failed: "+STR$~ret%
      NEXT

Note that in this case the audio buffers are allocated from BASIC's heap; you could alternatively use the Windows API to allocate the memory.
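
As a rough sketch of the Windows API alternative, the DIM inside the loop could be replaced by a call to GlobalAlloc. The constant name _GMEM_FIXED is introduced here for readability; its value (0) requests fixed memory, so the call returns a pointer directly:

      _GMEM_FIXED = 0
      FOR buff% = 0 TO nBuffers%-1
        SYS "GlobalAlloc", _GMEM_FIXED, BytesPerBuffer% TO buffer%
        IF buffer% = 0 ERROR 100, "GlobalAlloc failed"
        Headers{(buff%)}.lpData% = buffer%
        REM ... set dwBufferLength%, then prepare and add the header as before ...
      NEXT

      REM Later, once the wave input device has been closed:
      FOR buff% = 0 TO nBuffers%-1
        IF Headers{(buff%)}.lpData% THEN SYS "GlobalFree", Headers{(buff%)}.lpData%
      NEXT

Unlike memory taken from BASIC's heap, memory allocated this way must be freed explicitly, so the second loop would belong in your cleanup code.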

Starting audio capture


Once you have prepared the wave input system using the above code, you can start the real-time capture as follows:

      SYS "waveInStart", WaveIn% TO ret%
      IF ret% ERROR 100, "waveInStart failed: "+STR$~ret%


Inputting in real-time


Once the above code has been executed you need to process the received audio buffers fast enough to keep up with the incoming data. The following code constantly checks whether any of the buffers needs processing and if so calls the PROCprocessbuffer routine:

      _WHDR_DONE = 1
      REPEAT
        FOR buff% = 0 TO nBuffers%-1
          IF Headers{(buff%)}.dwFlags% AND _WHDR_DONE THEN
            PROCprocessbuffer(Headers{(buff%)}.lpData%, SamplesPerBuffer%)
            Headers{(buff%)}.dwFlags% AND= NOT _WHDR_DONE
            SYS "waveInAddBuffer", WaveIn%, Headers{(buff%)}, DIM(_WAVEHDR{}) TO ret%
            IF ret% ERROR 100, "waveInAddBuffer failed: "+STR$~ret%
          ENDIF
        NEXT
        SYS "Sleep", 1
      UNTIL FALSE

In this example the audio processing continues indefinitely, but you can terminate the program prematurely if you wish. If you do, don't forget to execute PROCcleanup before exiting the program.
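
One simple way to do that, shown here only as a sketch, is to leave the loop when any key is pressed. INKEY(0) returns -1 when no key is waiting:

      REPEAT
        REM ... check and process the buffers exactly as in the loop above ...
        SYS "Sleep", 1
      UNTIL INKEY(0) <> -1
      PROCcleanup
      QUIT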

Processing the audio data


Obviously it's only possible to describe this aspect in general terms, because precisely what audio processing takes place will depend on what the program is designed to do. The code below simply calculates the RMS (Root Mean Square) value of the incoming audio:

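      DEF PROCprocessbuffer(B%,N%)
      LOCAL I%, V%, sumsq
      FOR I% = 0 TO N%*2-2 STEP 2
        REM Take the low 16 bits of the 32-bit read and convert them to a signed value:
        V% = B%!I% AND &FFFF : IF V% >= &8000 V% -= 65536
        sumsq += V%^2
      NEXT
      RMS = SQR(sumsq / N%) : REM The result is left in the global variable RMS
      ENDPROC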

This code is appropriate for monaural input (one channel) where each audio sample consists of a signed 16-bit value in the range -32768 to +32767.
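
For stereo input (Format.nChannels.l& = 2) the samples are interleaved, so each 4-byte frame holds a left sample followed by a right sample. The routine below is a sketch of how the same calculation might be adapted; the names PROCprocessstereo, RMSleft and RMSright are illustrative assumptions. N% is the number of sample frames in the buffer:

      DEF PROCprocessstereo(B%,N%)
      LOCAL I%, frame%, left%, right%, sumL, sumR
      FOR I% = 0 TO N%*4-4 STEP 4
        frame% = B%!I% : REM Left sample in the low 16 bits, right in the high 16 bits
        left% = frame% AND &FFFF : IF left% >= &8000 left% -= 65536
        right% = frame% >> 16 : REM Arithmetic shift keeps the sign
        sumL += left%^2 : sumR += right%^2
      NEXT
      RMSleft = SQR(sumL / N%) : RMSright = SQR(sumR / N%)
      ENDPROC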

Cleaning up


When you stop the sound capture, or exit the program, you need to shut down the audio input in a controlled fashion:

      DEF PROCcleanup
      WaveIn% += 0 : IF WaveIn% THEN
        SYS "waveInStop", WaveIn%
        SYS "waveInReset", WaveIn%
        SYS "waveInClose", WaveIn%
        WaveIn% = 0
      ENDIF
      ENDPROC

This code might form part of a larger routine, if there are other things that need to be shut down.
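
As an example of such a larger routine, the variant below also unprepares the buffer headers before closing the device, which the Windows documentation recommends doing before the buffers are freed. It assumes a hypothetical global nPrepared% that the setup loop maintains (for example by setting nPrepared% = buff% + 1 after each successful call to waveInPrepareHeader), so that cleanup remains safe if an error occurs before the headers exist:

      DEF PROCcleanup
      LOCAL buff%
      WaveIn% += 0 : nPrepared% += 0
      IF WaveIn% THEN
        SYS "waveInStop", WaveIn%
        SYS "waveInReset", WaveIn% : REM Returns all pending buffers to the program
        IF nPrepared% THEN
          FOR buff% = 0 TO nPrepared%-1
            SYS "waveInUnprepareHeader", WaveIn%, Headers{(buff%)}, DIM(_WAVEHDR{})
          NEXT
          nPrepared% = 0
        ENDIF
        SYS "waveInClose", WaveIn%
        WaveIn% = 0
      ENDIF
      ENDPROC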
