Bill of materials

In this module we will use the following components:

  • STM32 NUCLEO-F072RB

  • USB cable - 6" A/MiniB

  • Adafruit I2S MEMS Microphone Breakout

  • Adafruit I2S Stereo Decoder

  • Jumper Wires

In principle, any board from the STM32 family can be used for these exercises, as long as it is supported by STM32CubeIDE and exposes at least two I2S buses, since both the microphone and the DAC (stereo decoder) require a dedicated I2S bus for audio transfers.

Prerequisites

  • basic knowledge of C and Python programming

  • a PC with a USB port (the microcontroller will be programmed and powered by your PC via a USB cable)

  • and, of course, having completed the previous DSP modules!


DSP4.3 - Real-Time DSP

by Eric Bezzam, Adrien Hoffet and Paolo Prandoni. (c) LCAV, EPFL, 2020.

Hi, and welcome to this final module of the DSP course, in which we will learn how to build a real-time signal processing system using a general-purpose microcontroller unit.

Developing DSP applications for a low-level device does not affect the theory behind the algorithms that we implement. However, the constraints imposed by a low-power CPU will demand that we pay particular attention to details such as code efficiency, memory use and input/output routines.

In addition to that, working "close to the metal", so to speak, will give us the chance to look in some detail at the often neglected physical aspect of DSP, namely, the actual electronic components and the signalling protocols that are used to move data around in a complex circuit.

We will focus this module on audio signal processing, so as to build a system whose functions we can immediately enjoy; in particular, we will design and implement a variety of voice scramblers, that is, devices that you can use to alter your speaking voice in real time and sound, say, like a chipmunk or like Darth Vader.

The skills and the experience needed to port a real-time signal processing algorithm to an embedded board are the province of the truly accomplished DSP engineers; in this module you will be able to get a first taste of the challenges and the struggles of the job, together with the priceless satisfaction of truly "making" a device come to life.

In this online book we will provide you with step-by-step instructions together with links to videos on Coursera that illustrate all the steps leading to the "finished product". Working with hardware can often prove overwhelming, and dealing with the numerous protocols, lengthy documentation, and specific components can be frustrating at times. But, thanks to these notes and the videos, you should be able to navigate around these issues with ease and taste the fun and exciting side of practical DSP right away.

Even if you don't have access to the selected hardware platform, you will still be able to appreciate the main lessons on real-time, low-level programming, and we still recommend that you read the section on Audio I/O theory nevertheless.

We hope you'll find this tutorial instructive and entertaining!


The ST Nucleo

In this eBook we will develop real-time signal processing algorithms for a specific piece of hardware, namely the STM32 NUCLEO-F072RB microcontroller board manufactured by STMicroelectronics (often abbreviated as ST). ST provides many inexpensive development boards that are used by hobbyists, students, and professionals to prototype countless applications.

The STM32 Nucleo development board.

In principle, any board from the STM32 family can be used for the exercises, as long as it exposes at least two I2S buses. You can find more information about this family of boards by reading the official documentation.

In order to facilitate the development of applications, ST provides fully integrated development environments (IDEs) that you can use to program their boards. These tools can sometimes be overwhelming, as they allow for a lot of customization, but they are meant to make your life easier! Attention to detail and reading the documentation will help you set up a successful workflow. We will cover these tools and their installation in the next section.

We will conclude this introductory part by illustrating in detail how to build a first simple application on the microcontroller.

Useful tips

Here are some useful shortcuts and debugging tips for using the IDE. You will find this section more useful later on, once you become familiar with the programming environment.

Shortcuts

Below are some shortcuts we find particularly useful. For macOS, replace "Ctrl" with "Command".

  • "Ctrl" + click on a function/variable to jump to its definition.

  • "Ctrl + Space" for function/variable auto-completion (keep "Ctrl" on macOS too, since "Command + Space" opens the Spotlight application).

  • "Ctrl + B" to build the active project.

  • More shortcuts can be found here.

Debugging

    Below are some useful debugging tips. Although these may not make a lot of sense upon first reading, just remember that these tips are here for later, when you start coding your applications!

    • At one point you might end up with the error presented below.

    Don't worry: you just have to stop the previously started debug session, as the driver cannot access the target board twice (this is the same error as if the board were disconnected or not properly powered). Press the "Stop" or the "Disconnect" button as shown below.

    • If you rename or copy/paste a project (useful to make a backup of a working project!), you might need to edit the debug configuration manually. Indeed, the debug configuration will still contain the old binary file's name and will therefore use it to program the board. The easiest workaround is to manually delete the binary file and create a new debug session, which will automatically pick up the new binary file. First, you will need to open the "Debug Configuration" window as shown below.

    And then you can proceed to deleting the old debug configuration files and creating a new session.

    Consequently, building the project after renaming it and performing these steps will result in an ELF file with the new project's name!

    • You can double-click just to the left of a line number to create a breakpoint at a particular line. When running the program in "Debug" mode, the execution will stop at this line and you can resume using one of the buttons on the toolbar.

    • If you right-click a variable, you can select "Add watch expression", which lets you monitor and edit the value of that variable. Just note that watch expressions are only updated when the microcontroller is stopped at a breakpoint or paused with the pause button.

    Some shortcuts for debugging with breakpoints:

    • F5 - "Step into"

    • F6 - "Step over"

    • F7 - "Step return"

    • F8 - "Resume"

    • "Ctrl+Shift+B" - "Toggle breakpoint"

    About

    Text and code: Eric Bezzam, Adrien Hoffet and Paolo Prandoni. For inquiries and information please write to [email protected]

    You can download the code for the examples described in this gitbook here.

    (c) LCAV, EPFL, 2020


    The Adafruit Boards

    Any real-time audio application running on the microcontroller will need to acquire data from a source (for instance, a microphone) and deliver data to an output sink (for instance, a digital-to-analog converter connected to a loudspeaker) that we can listen to. Here is a brief description of the components we selected.

    The microphone breakout board

    The component used to capture sound is the I2S MEMS Microphone Breakout by Adafruit. The actual microphone on this mini-board produces an analog signal (continuous in time and amplitude), but the device also contains an Analog-to-Digital Converter that returns a digital signal (discrete in time and amplitude), which is the format we need in order to pass the data to our microcontroller. We will describe the component in more detail later.

    The DAC breakout board

    The microcontroller accepts and produces digital signals; in order to playback its output on a pair of headphones, it is necessary to create an analog signal and this can be achieved via a Digital-to-Analog Converter (DAC). We will use Adafruit's I2S Stereo Decoder Breakout, which contains the DAC, an audio jack for connecting headphones, and the necessary additional components. We will describe the DAC in more detail later.


    Introduction

    We will now start the most interesting part of this module, the one where we start implementing actual audio DSP algorithms on the microcontroller.

    As we said in the beginning, the theme of our examples will be the design of increasingly sophisticated voice transformers that you can use in real time to modify the sound of your own voice.

    Before proceeding with this section, you should download and play with the Voice Transformer Jupyter notebook that we prepared for this module. The notebook is also available in your Coursera workspace in module DSP4 if you prefer not to run it locally.

    In the notebook you will find a theoretical explanation of the algorithms that we will try to implement in the microcontroller, together with the code and with sound examples that you can listen to.

    Reading and understanding the notebook is fundamental to understanding the sections that follow since, from now on, we will focus solely on the implementation details associated with our specific hardware and on the need to implement the algorithms in strict real time.

    Figure: The Voice Transformer Jupyter notebook

    The audio passthrough project

    A "passthrough" can be viewed as the audio processing equivalent of a "hello world" program. In this section we will program the Nucleo to simply pass the audio samples from the microphone to the DAC.

    Using the CubeMX software, we will first update the configuration of the microcontroller. We will then guide you through the wiring and, finally, we will program our passthrough using the SW4STM32 software.

    Highlighted boxes, as shown below, specify a task for which you need to find out the appropriate solution and implementation.

    TASK: This is a task for you!

    A passthrough is a great sanity check when first beginning with an audio DSP system. Moreover, it serves as a useful starting point for new projects, as we will see in the following chapters when we develop more complicated programs.

    Real-time audio I/O

    The microphone and the DAC are peripheral components external to the microcontroller board and therefore we need to understand two fundamental things:

    • the protocol used by external peripherals to electrically transfer data to and from the microcontroller board; for audio, this is usually the I2S protocol

    • the mechanism by which the data transfer is handled; in our case this will be a so-called DMA transfer.

    Code efficiency

    In this section we will illustrate some common coding practices that are used in real-time DSP implementations and that will be used later in our examples.

    Circular buffers

    In Lecture 2.2.5a of the second DSP course we discussed some implementation issues related to discrete-time filters and, in particular, we talked about circular buffers.

    As a quick recap, remember that if you need to store past values of a signal, the best solution is to use a circular buffer; assume that you need to access at most the past $M$ values of the signal $x[n]$. The step-by-step procedure, together with a short code example, is given below.

    The stereo DAC

    The microphone we are using measures an analog signal and returns a digital signal, which can be further processed by our microcontroller entirely in the digital domain. In order to play back or listen to this digital signal, it is necessary to convert it back to analog form; this can be done with a DAC (Digital-to-Analog Converter). We will be using Adafruit's I2S Stereo Decoder Breakout, which contains a DAC, an audio jack for connecting headphones, and the necessary additional components. In the following subsections, we will explain the important inputs/outputs of the DAC we will be using, the I2S stereo output protocol our application will have to conform to, and the breakout board from Adafruit.

    DAC inputs/outputs

    The DAC component in the Adafruit breakout is the UDA1334ATS by NXP, whose block diagram is shown below.

    Alien Voice

    In this section we will implement the "alien voice" effect on the microcontroller. As shown in the Jupyter notebook, the alien voice effect is achieved simply by performing sinusoidal modulation on the input signal in order to shift the voice spectrum up in frequency.

    Given a modulation frequency $f_c$ (in Hz) and an input sample $x[n]$, we can compute each output sample $y[n]$ instantaneously as

    $y[n] = x[n] \, \cos\left(2\pi \frac{f_c}{F_s} n\right) = x[n] \, \cos(\omega_c n),$

    where $F_s$ is the system's sampling frequency (in Hz). The modulation frequency must be kept small in order to preserve intelligibility; still, the resulting signal will be affected by aliasing and other artifacts that we cannot really control.

    As mentioned in the Jupyter notebook, this voice transformer is great for real-time applications as it requires only a single multiplication per sample. This means that, compared to the passthrough project, we will not have to write too much new code. But the devil, as they say, is in the details!

            .     .       .  .   . .   .   . .    +  .
              .     .  :     .    .. :. .___---------___.
                   .  .   .    .  :.:. _".^ .^ ^.  '.. :"-_. .
                .  :       .  .  .:../:            . .^  :.:\.
                    .   . :: +. :.:/: .   .    .        . . .:\
             .  :    .     . _ :::/:               .  ^ .  . .:\
              .. . .   . - : :.:./.                        .  .:\
              .      .     . :..|:                    .  .  ^. .:|
                .       . : : ..||        .                . . !:|
              .     . . . ::. ::\(                           . :)/
             .   .     : . : .:.|. ######              .#######::|
              :.. .  :-  : .:  ::|.#######           ..########:|
             .  .  .  ..  .  .. :\ ########          :######## :/
              .        .+ :: : -.:\ ########       . ########.:/
                .  .+   . . . . :.:\. #######       #######..:/
                  :: . . . . ::.:..:.\           .   .   ..:/
               .   .   .  .. :  -::::.\.       | |     . .:/
                  .  :  .  .  .-:.":.::.\             ..:/
             .      -.   . . . .: .:::.:.\.           .:/
            .   .   .  :      : ....::_:..:\   ___.  :/
               .   .  .   .:. .. .  .: :.:.:\       :/
                 +   .   .   : . ::. :.:. .:.|\  .:/|
                 .         +   .  .  ...:: ..|  --.:|
            .      . . .   .  .  . ... :..:.."(  ..)"
             .   .       .      :  .   .: ::/  .  .::\

    Granular Synthesis

    We will now implement the second voice transformation method described in the Jupyter notebook, namely pitch shifting via granular synthesis.

    While the alien voice transformer simply alters the quality of the voice, with a pitch shifter we will be able to move the perceived pitch of the speaker either down (to create a "Darth Vader" sound) or up (to create a "Chipmunks" voice).

    We recommend you study the relevant section of the notebook carefully before proceeding, since the algorithmic details are a bit trickier than what we have seen so far; please make sure you understand the theory and that you are comfortable with the offline implementation before tackling the real-time version of the transformer.

    May the Force be with you!

    Figure: Modified from source.


                                       _.-'~~~~~~~~~~~~`-._
                                      /         ||         \
                                     /          ||          \
                                    |           ||          |
                                    | __________||__________|
                                    |/ -------- \/ ------- \|
                                   /     (     )  (     )    \
                                  / \     ----- () -----    / \
                                 /   \         /||\        /   \
                                /     \       /||||\      /     \
                               /       \     /||||||\    /       \
                              /_        \o=============o/        _\
                                `--...__|`-._       _.-'|__...--'
                                        |    `-----'    |

    Storing and accessing the past $M$ values of a signal $x[n]$ with a circular buffer works as follows:

  • set up an array x_buf[] of length $M$ (of the appropriate data type)

  • set up an index variable ix, initialized at zero

  • every time you receive a new sample, store it in the array at position ix and increment ix modulo $M$;

  • with this, the expression $x[n-k]$, for $k < M$, can be accessed as x[(ix + M - k) % M].

    In a microcontroller, where each CPU cycle counts, modulo operations are expensive, but they can be avoided and replaced by binary masks if we choose $M$ to be a power of two. In that case, ix % M is equivalent to ix & (M-1), and the bitwise AND is a much faster operation. Since $M$ is the minimum number of past values that we need access to, we can always increase $M$ until it reaches a power of two, especially when $M$ is small.

    Here is a simple example:

    #define BUF_LEN 16
    #define BUF_MSK 15 /* binary mask is always len - 1 */
    uint16_t x_buf[BUF_LEN];
    uint16_t ix = 0;

    /* storing sample x */
    x_buf[ix++] = x;
    ix &= BUF_MSK;

    /* accessing x[n-k] */
    uint16_t x_k = x_buf[(ix + BUF_LEN - k) & BUF_MSK];

    Speaking of circular buffers, remember that we also set the DMA transfer buffers to be circular!

    Sinusoidal lookup tables

    Most signal processing algorithms require the use of sinusoidal functions. In a microcontroller, however, computing trigonometric values for arbitrary values of the angle is an expensive operation, since it always involves some form of Taylor series approximation. Even using just a few terms, as in

    $\cos x = 1 - \dfrac{x^2}{2!} + \dfrac{x^4}{4!} - \dfrac{x^6}{6!} + \mathcal{O}(x^8),$

    clearly requires a significant number of multiplications. A computationally cheaper alternative is based on the use of a lookup table: we precompute the sinusoidal values that we need and use the time index $n$ simply to retrieve the correct value.

    In sinusoidal modulation we need to know the values of the sequence $\cos(\omega_c n)$ for all values of $n$. However, if $\omega_c$ is a rational multiple of $2\pi$, that is, if $\omega_c = 2\pi(M/N)$ for $M, N \in \mathbb{N}$, then the sequence of sinusoidal values repeats exactly every $N$ samples.

    For instance, assume the input sampling frequency is $F_s = 32$ kHz and that our modulation frequency is $f_c = 400$ Hz. In this case $\omega_c = 2\pi/80$ and therefore we simply need to precompute 80 values for the cosine and store them in an array C[0], ..., C[79]. The equation

    $y[n] = x[n] \, \cos(\omega_c n)$

    becomes simply

    y[n] = x[n] * C[n % 80]

    Of course, we are trading computation time for memory here, so if the denominator $N$ is impractically large, the table lookup method may become prohibitive, especially on architectures such as the Nucleo which do not have a lot of onboard memory. Also note that this is one case in which we most likely won't be able to use binary masks instead of modulo operations, since the period of the sinusoid is unlikely to be a power of two.

    Another difficulty arises when $\omega_c$ is not a rational multiple of $2\pi$. In this case, we may want to slightly adjust the modulation frequency to a value for which the rational multiple expression becomes valid.
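    To make this concrete, here is a minimal sketch of how such a table could be computed at startup and used for the modulation; all names are ours, and in a real project the 80 values would more likely be precomputed offline and stored as a constant array:

    #include <math.h>
    #include <stdint.h>

    #define COS_TABLE_LEN 80    /* one period of cos(2*pi*n/80) */

    static int16_t cos_table[COS_TABLE_LEN];

    /* fill the table once at startup, scaling cos() to 16-bit precision */
    void cos_table_init(void)
    {
        for (int n = 0; n < COS_TABLE_LEN; n++)
            cos_table[n] = (int16_t)(32767.0f * cosf(6.2831853f * n / COS_TABLE_LEN));
    }

    /* one modulation step: y[n] = x[n] * cos(w_c n), rescaled to 16 bits */
    int16_t modulate(int16_t x, uint16_t *ix)
    {
        int32_t y = ((int32_t)x * cos_table[*ix]) >> 15;
        *ix = (*ix + 1) % COS_TABLE_LEN;    /* 80 is not a power of two: modulo, not mask */
        return (int16_t)y;
    }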

    State variables

    All discrete-time signal processing data and algorithms make use of a free "time" variable $n$. As we know, in theory $n \in \mathbb{Z}$, so its value ranges from minus infinity to plus infinity. In an actual DSP application we are much more likely to:

    • start the processing with all buffers empty and with $n = 0$ (initial conditions)

    • store $n$ in an unsigned integer variable and increment it at each iteration.

    The second point in particular means that, in real-time applications that may run for an arbitrary amount of time, $n$ will increase until it reaches the maximum positive value that can be expressed by the variable and then roll over to zero. Since we certainly do not want this rollover to happen at random times, and since the rollover is unavoidable, we need to establish a strategy to carry it out explicitly.

    In practice, all real-time applications only use circular buffers, either explicitly (to access past input and output values or to access lookup tables) or implicitly (to compute the output of functions that are inherently periodic). As a consequence, we never need the exact value of $n$ but only the position of a set of indices into synchronous circular buffers.

    In our code, therefore, we will explicitly roll over these indexes independently and incrementally. To this end:

    • in functions, indexes will be defined as static variables so that their value will be preserved between consecutive function calls.

    • to make sure that state variables used by different functions are stepped synchronously, we will define them as global-scope variables at the application level.

    These types of variables are often referred to as state variables in C programming and they are usually frowned upon; the truth is, in a microcontroller real-time application where performance is key, they simply cannot be avoided.
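    As a small illustration of both conventions (a static local index and a global, volatile state variable, each rolled over explicitly), here is a sketch with made-up names:

    #include <stdint.h>

    #define TABLE_LEN 80

    /* global state variable, shared so that different functions stay in sync;
       volatile because it may be updated from an interrupt service routine */
    volatile uint16_t table_ix = 0;

    int16_t next_table_value(const int16_t *table)
    {
        /* static state variable: its value survives between consecutive calls */
        static uint16_t local_ix = 0;
        int16_t v = table[local_ix];
        local_ix = (local_ix + 1) % TABLE_LEN;    /* explicit, controlled rollover */
        return v;
    }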

    I2S data transfer protocol

    Our digital microphone and DAC components rely on the I2S (Inter-IC Sound) bus specification for transferring digital audio samples to and from the microcontroller. Ultimately, the data that transits on the bus is simply a sequence of binary digits (zeros and ones) that are mapped to two distinct voltage levels, HIGH and LOW; each audio sample is encoded by a fixed number of bits (usually 24 or 32), that is, by a binary word. The bus will require some form of synchronization in order to determine when words begin and end. Finally, note that the audio data is usually stereo, that is, it consists of a time-multiplexed stream in which left and right channel data words are interleaved.

    The I2S bus is a 3-line serial bus consisting of:

    1. A clock (CLK) line that indicates the timing for each binary digit

    2. A data line for the actual sequence of binary digits.

    3. A word select (WS) line to indicate the beginning of a binary word and its channel (Left or Right).

    A typical word transfer over the I2S bus looks like this:

    We will look at the details in the next section but, for now, notice the following:

    1. the data signal is synchronized to the rising edge of the clock signal and is kept constant for the duration of a clock cycle.

    2. the beginning of a word is signaled by a state transition in the word select signal

    3. words are sent starting from the most significant bit (MSB)

    4. in this example words are 32 bits long; however, only 18 bits are actually used for the data. Bits 19 to 24 are set to zero and, from the 25th to the 32nd clock cycle, the data signal is set to tri-state, a high-impedance mode that essentially removes an output port from the circuit in order to avoid a short circuit (a short code sketch on recovering such 18-bit samples follows this list).

    5. words are started either on the rising or on the falling edge of the WS signal, depending on the configuration of the DAC. In the above figure, words are started on the falling edge: the output is kept on tri-state after the rising edge at the end of the diagram and until the next falling edge of WS. This is to allow for two DACs to operate in parallel when building a stereo system, with the WS signal selecting one out of the two possible channels for data transmission.
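    As an aside, here is a minimal sketch of how a sample packed in this way could be recovered in C; the function name is ours and we assume the layout of the diagram above, i.e. 18 data bits left-justified in a 32-bit slot:

    #include <stdint.h>

    /* recover a signed 18-bit sample sent MSB-first, left-justified in a
       32-bit I2S slot (bits 31 down to 14 carry the data) */
    int32_t i2s_slot_to_sample(uint32_t slot)
    {
        /* the cast makes bit 31 the sign bit; the right shift (arithmetic on
           ARM with GCC) then sign-extends the 18-bit value */
        return ((int32_t)slot) >> 14;
    }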

    More information about the I2S bus specification can be read here.

    We first discuss the I2S protocol with respect to the microphone and then for the DAC. We recommend reading in this order as the microphone section is easier to grasp and will introduce some common terminology used later on.

    For the STM board that we are using, we will configure two I2S buses: one for the input and the other for the output. This configuration process will be covered here.

    DMA transfers

    The microcontroller has a certain amount of onboard memory that it can access, and the input samples need to be stored in this memory before they can be processed. It would however be too onerous for the microcontroller to explicitly fetch each new input sample from the input peripheral and, similarly, to deliver each sample to the output peripheral explicitly. To free the microcontroller from these tasks and use the CPU power primarily for processing, peripherals can access the onboard memory directly, both to write and to read data; such transfers are called Direct Memory Access (DMA) transfers, and the peripherals only contact the microcontroller (via a so-called interrupt) to signal that a transfer has just been completed.

    DMA transfers occur automatically, but they need to be configured; for an input DMA, for instance, we need to decide:

    • where in memory the peripheral should store the data; this means that we need to set up a buffer reserved for input DMA

    • how much data a DMA transfer should handle before notifying the microcontroller; this will determine the size of the DMA buffer.

    Obviously, the same design decisions need to be performed for an output DMA.

    The buffer's length is a key parameter that needs to be fine-tuned to the demands of a specific audio application. In general, the longer the buffer, the fewer DMA transfers per second, which is desirable since it minimizes the number of interrupts and allows for more code optimization. Additionally, certain types of signal processing operations provide results that are dependent on the buffer length; the DFT of a signal, for instance, will provide a frequency resolution that improves with the buffer's length. On the other hand, a long buffer also introduces significant latency, as we need to wait for more samples to arrive before we can begin processing. For real-time audio applications, low latency is extremely important for the user experience, so we are in a situation of conflicting requirements for which a suitable compromise needs to be determined on a case-by-case basis.

    For a refresher on buffering in real-time applications, please refer to Lecture 2.2.5b in the second DSP module on Coursera.

    Finally, remember that, in real-time DSP applications, we usually need to use alternating buffers for DMA transfers. Consider for instance an input DMA transfer: while the incoming samples are placed into an array by the DMA controller, the incoming array should not be accessed by our application until the DMA transfer is complete. When the DMA interrupt is signaled, it is then safe to copy the data from the incoming buffer into a safe area for processing. We will see later that, in our case, half-buffer interrupts will allow us to process the data in place.
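    To make the mechanism concrete, here is a minimal sketch of the half- and full-buffer callback pattern for an input stream, assuming the STM32 HAL, a circular DMA buffer and illustrative names (the actual project code will differ):

    #include "stm32f0xx_hal.h"

    #define FRAME_LEN 32
    int16_t dma_rx_buf[2 * FRAME_LEN];    /* double buffer filled by the DMA controller */

    extern void process(int16_t *in, uint16_t n);

    /* the first half of the buffer is ready while the DMA fills the second half */
    void HAL_I2S_RxHalfCpltCallback(I2S_HandleTypeDef *hi2s)
    {
        process(&dma_rx_buf[0], FRAME_LEN);
    }

    /* the second half is ready while the DMA wraps around to the first half */
    void HAL_I2S_RxCpltCallback(I2S_HandleTypeDef *hi2s)
    {
        process(&dma_rx_buf[FRAME_LEN], FRAME_LEN);
    }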

    Figure: UDA1334ATS block diagram, p. 5 of the aforementioned datasheet.

    A couple of interesting things to take note of:

    1. The "DIGITAL INTERFACE" block takes an I2S input, and therefore exposes the three lines BCK, WS, DATA that are used in the I2S protocol. Note: I2S input is not a necessary feature of DACs; other input formats are also possible.

    2. This component has two DACs; one for the left channel (VOUTL) and another for the right channel (VOUTR) for stereo output.

    All input and output pins are briefly explained in the figure below.

    Figure: UDA1334ATS pinning, p. 6 of the aforementioned datasheet.

    Compared to the microphone, which only had six pins, the above list of pins may seem overwhelming! But not to worry: we will explain the important settings for our application, referred to as "audio mode" in the datasheet. Moreover, as we will see later on, the breakout board by Adafruit nicely abstracts the interfacing between our microcontroller and the UDA1334ATS component.

    Mode configuration (p. 7 of datasheet)

    PLL stands for "phase-locked loop"; you can find more information about PLLs on Wikipedia. In the UDA1334ATS component, the PLL is used to generate the internal system clock from the WS signal in "audio mode". In fact, in order to enable "audio mode", PLL0 (Pin 10) must be set to LOW. Moreover, SYSCLK/PLL1 (Pin 6) should also be set to LOW to select a sampling frequency typical for audio applications, i.e. within $f_s = 16$–$50$ kHz.

    Input configuration (p. 9 of datasheet)

    In addition to I2S input, the DAC also accepts other formats. Therefore, we must explicitly configure the chip to expect an I2S input. This is done by setting both SFOR1 (Pin 7) and SFOR0 (Pin 11) to LOW. BCK (Pin 1), WS (Pin 2), and DATAI (Pin 3) will then serve as our I2S inputs.

    De-emphasis is a low-pass filter that undoes a high-frequency boost (aka pre-emphasis) that may have been applied at the ADC (Analog-to-Digital Converter). We do not expect any pre-emphasis, and de-emphasis only applies at 44.1 kHz, so we can set DEEM/CLKOUT (Pin 9) to LOW to turn de-emphasis off.

    In our application, we may wish to toggle the mute control. For this reason, we will create a physical link (wire) between our microcontroller and MUTE (Pin 8).

    Powering the chip

    As you may have noticed from the list of pins above, there are two power supplies:

    1. Digital: VDDD (Pin 4) and VSSD (Pin 5).

    2. Analog (DAC): VDDA (Pin 13) and VSSA (Pin 15).

    The breakout board we are using will nicely abstract these signals into a "single" supply as we will see later on.

    Output pins

    VOUTL (Pin 14) and VOUTR (Pin 16) are our output pins for the left and right channels respectively. In order to output these two signals, they must be used with Vref(DAC) (Pin 12) as a reference voltage when supplying the output to an analog output device such as an audio jack. As our breakout board incorporates an audio jack and the necessary wiring, we will not have to worry about this! We will still have access to these pins though, which can be useful for debugging purposes, e.g. with an oscilloscope.

    I2S output timing

    The UDA1334ATS chip supports word lengths up to 24 bits on the I2S bus. As our microphone has a maximum bit precision of only 18 bits anyway, we do not need to go above this precision.

    There are also some requirements on the BCK and WS signals (p. 9 of datasheet):

    1. BCK frequency can be at most 64 times the WS frequency.

    2. The WS signal must change at the negative edge of the BCK signal.

    In the figure below, we have a timing diagram for an I2S input signal. We can see that the second requirement is met. Moreover, we can observe that the Most-Significant Bit (MSB) should be the first bit. This is always the case for the I2S bus; we can observe the same property in the microphone timing diagram as well.

    Finally, the first requirement is met, since the BCK frequency equals exactly 64 times the WS frequency for the microphone.

    Figure: UDA1334ATS I2S timing, p. 10 of the aforementioned datasheet.

    UDA1334ATS Wiring / Adafruit Breakout

    As we are interested in using the UDA1334ATS component in "audio mode", this requires the wiring shown in the figure below.

    Figure: UDA1334ATS audio mode wiring, p. 15 of the aforementioned datasheet.

    In addition to the capacitors and resistors needed for the UDA1334ATS component, we would also like to listen to the resulting audio output with headphones. For this, an audio jack is ideal.

    Having to wire up all these components on a breadboard and connect them to our microcontroller would be a bit laborious. For this reason, we will be using Adafruit's I2S Stereo Decoder Breakout which contains the UDA1334ATS component, an audio jack, the necessary capacitors and resistors, and all the inter-connections.

    Using this breakout board has a few other benefits when used in "audio mode", as Adafruit assumes most users will be using it in this mode:

    1. SFOR1, SFOR0, PLL0, SYSCLK/PLL1, and DEEM/CLKOUT of UDA1334ATS are all pulled LOW by the breakout board; so the SF1, SF0, PLL, and DEEM pins of the breakout board do not need to be set for our application as we are interested in "audio mode".

    2. We can provide a 3V to 5V power on the VIN and GND pins of the breakout board; a built-in regulator will take care of supplying the digital voltage supply (VDDD and VSSD) and the DAC supply voltage (VDDA and VSSA).

    3. As an audio jack is already built into the breakout board, we do not need to worry about connecting the VOUTR, VOUTL, and Vref(DAC) pins of the UDA1334ATS component. However, we can easily debug these signals from Lout, AGND, and Rout of the breakout board.

    Check Adafruit's website for more information on each pin.


    STM32 Cube IDE

    The ST Nucleo board hosts a microcontroller that is both:

    • highly configurable, in the sense that some of its electrical pins can be rerouted in software and assigned to specific functions

    • programmable at a high level, since we can write C code and use a compiler to produce the microcode that will be uploaded onboard

    To handle this great flexibility, ST provides us with an integrated development environment (IDE) that we can use to manage both aspects of Nucleo programming. This is the STM32CubeIDE, an Eclipse-based IDE for programming STM32 microcontrollers. From the description webpage:

    STM32CubeIDE is an all-in-one multi-OS development tool, which is part of the STM32Cube software ecosystem. STM32CubeIDE is an advanced C/C++ development platform with peripheral configuration, code generation, code compilation, and debug features for STM32 microcontrollers and microprocessors. It is based on the ECLIPSE™/CDT framework and GCC toolchain for the development, and GDB for the debugging.

    The IDE includes a chip configuration graphical interface called CubeMX:

    Figure: Screenshot of STM32CubeMX

    and an Eclipse-based programming environment:

    Figure: Screenshot of STM32CubeIDE.

    Installation

    Note: these instructions and images were produced on October 1, 2019.

    The following steps are the same for Windows, Linux, and MacOS and they simply consist of downloading the installation files.

    Download instructions

    Please refer to the distributor's website for detailed installation instructions, and to the next subsection for additional remarks for macOS users.

    1) Go to the ST official download page with your favorite browser.

    2) Select the download link according to your operating system.

    4) You will be asked to log in in order to continue with the download; please create an account and follow the instructions.

    5) Once you have logged in, the download will start.

    6) Open the installer and follow the steps. You should perform a standard installation. Some drivers will also be installed during the process; don't skip this, otherwise you will not be able to download your code onto the microcontroller.

    Here we provide some useful shortcuts and tips for working with Eclipse-based tools like SW4STM32.

    Specific instructions for MacOS

    Note: the instructions were tested using MacOS Mojave, Version 10.14 on October 1, 2019.

    You may encounter the following dialog on macOS; if so, please follow the instructions below.

    1) Open your System Preferences and navigate to Security & Privacy

    2) In the General tab, click Open Anyway


    Benchmarking

    When discussing the code architecture of a generic real-time audio device, we already remarked that if our processing callback is too slow with respect to the frequency of the DMA transfers, we will run into a condition called buffer underflow (or overflow, if you look at it from the point of view of the input DMA).

    It is therefore very important to make sure that our processing is fast enough, and to find out if and where the code is using up a lot of time. Fortunately, the microcontroller provides us with the functionality to monitor exactly that.

    Timers

    The HAL library includes a function uint32_t HAL_GetTick(void); which returns the number of ticks since the start of the microcontroller, in milliseconds. Unfortunately we cannot use this tool because a resolution of one millisecond is too coarse for most audio sampling frequencies. For instance, with $F_s = 32$ kHz the time available to process one sample is $T_s = 31.25\,\mu s$, so a millisecond granularity is way too coarse.

    In order to have a finer timebase, we will use the Nucleo's onboard timer, whose full technical details can be found here. Briefly, all computing boards (and microcontrollers are no exception) possess an internal clock that provides a reference timebase signal; this timebase is usually generated by a crystal oscillator. The onboard timer is a roll-over counter that is incremented in lockstep with the timebase signal, often via a prescaler that can be used to lower its frequency, since the oscillator is usually very fast.

    For our application, we will use a timer with a large counting capacity (32 bits) and we will set it to increment itself every microsecond.

    Setting up the timer

    To set up the timer we will use CubeMX and then regenerate the initialization code. Open the CubeMX file by double-clicking the .ioc file of the copied project in the IDE project explorer.

    In order to activate a timer, you need to set a "Clock Source". Open TIM2 in the Timers menu (TIM2 happens to be a 32-bit timer) and activate its clock by setting the Clock Source to "Internal Clock".

    Figure: Timer activation

    Next, we need to configure the timer in the configuration panel that appears:

    Figure: Timer configuration

    TASK 1: Set the Prescaler value (in the figure above) in order to achieve a $1\,\mu s$ period for "TIM2", i.e. we want our timer to have a $1\,\mu s$ resolution.

    Hint: Go to the "Clock Configuration" tab (from the main window pane) to see what the frequency of the input clock to "TIM2" is. From this, calculate the prescaler value needed to increase the timer's period to $1\,\mu s$.

    Set the Counter Period to 0xFFFFFFFF; this ensures that the 32-bit timer counter only rolls over at its maximum value. You can leave the rest of the parameters as is for "TIM2". Finally, you can update the initialization code by saving the .ioc file.

    Using the timer

    In order to use the timer we configured, we will define a couple of macros to start and stop the timer and a global variable to keep track of the time that elapses between calls. Between the USER CODE BEGIN PV and USER CODE END PV comment tags, add the following lines. Note the volatile declaration for the timer, which underscores how this variable is a global variable modified by an interrupt service routine independently of the normal control flow of the rest of the code.
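    The code listing did not survive the export of this page, so here is a plausible reconstruction, assuming the TIM2 handle generated by CubeMX is named htim2:

    /* USER CODE BEGIN PV */
    volatile uint32_t timer_value_us = 0;    /* volatile: written from an ISR context */

    #define START_TIMER() do { \
        __HAL_TIM_SET_COUNTER(&htim2, 0); \
        HAL_TIM_Base_Start(&htim2); \
    } while (0)

    #define STOP_TIMER() do { \
        timer_value_us = __HAL_TIM_GET_COUNTER(&htim2); \
        HAL_TIM_Base_Stop(&htim2); \
    } while (0)
    /* USER CODE END PV */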

    For instance, to benchmark the passthrough example, we can modify the Process function like so:
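    Again as a hedged sketch (the original listing is missing, and the exact signature depends on your passthrough project):

    void Process(int16_t *bufferInStereo, int16_t *bufferOutStereo, uint16_t size)
    {
        START_TIMER();
        for (uint16_t i = 0; i < size; i++)
            bufferOutStereo[i] = bufferInStereo[i];    /* plain passthrough */
        STOP_TIMER();    /* timer_value_us now holds the elapsed time in microseconds */
    }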

    Benchmarking live

    In a real-time audio application the processing time cannot exceed the time between successive DMA calls; if this is not the case, we have a so-called buffer underflow which results in extremely corrupted audio. We will use our benchmarking timer to make sure we are within the limits.

    TASK 2: In the passthrough example, the macro FRAMES_PER_BUFFER determines the length of the DMA transfer. In our code, we set this length to 32 (stereo) samples.

    What is the maximum processing time that we can afford in this case?

    What if we change the value to 512 samples?

    To check the actual time used by our processing function we will use an extremely convenient facility provided by the STM32 IDE, namely the possibility to monitor the live value of the variables in our code while the code runs.

    Pull up the passthrough example and modify the processing function as shown in the previous section by inserting the START_TIMER and STOP_TIMER macros. Then launch the application in the debugger.

    In the debugging window in the top right corner of the screen, select the "Live Expressions" tab and add the variable timer_value_us.

    You can see that the passthrough code takes about 33 microseconds to execute, which is well below the maximum available time. This is good news!

    Solutions

    Are you ready to see the answer? :)

    As proposed in the hint, if you go to the tab Clock Configuration of CubeMX, you will see the following graph:

    Note the last block in the right column, APB1 Timer clocks (MHz): 48. It means that your timer is "driven" by a base tick frequency of 48 MHz; in order to reduce this to a $1\,\mu s$ period, in other words 1 MHz, you have to divide it by 48 (since the counter clock is the input clock divided by the prescaler value plus one, the value to enter is 47). This leads to the following timer configuration:

    Note the Counter Period: it is the value at which the counter rolls over and the interrupt is triggered; here it is set to the maximum value.

    For TASK 2, the time between DMA calls for a sampling frequency $F_s$ and a buffer of $N$ samples is

    $t_{\max} = 10^6 \, N/F_s \ \mu s.$

    Since the audio peripherals are working at 32 kHz, the time between DMA calls for a buffer of 32 samples is 1000 $\mu s$ (i.e. one millisecond); for a buffer of 512 samples, the maximum processing time is 16,000 $\mu s$.

    Basic implementation

    Assuming you have successfully implemented the passthrough, you can simply copy and paste that project from within the STM32CubeIDE environment. We recommend choosing a name with the current date and "alien_voice" in it. Remember to delete the old binary (ELF) file inside the copied project.

    Lookup table

    Remember that in the passthrough we set up the system to work with a sampling frequency of 32 kHz and a sample precision of 16 bits. Here we will use a modulation frequency of 400 Hz for the frequency shifting, so that the digital frequency is a rational multiple of $2\pi$ as in the previous example:

    $\omega_c = 2\pi\frac{400}{32{,}000} = \frac{2\pi}{80} \approx 0.0785398$

    The values for one period of the digital sinusoid can be encoded in a circular lookup table of length 80, where each element is the corresponding cosine value in 16-bit precision.

    Low Level Debugging

    If you are writing your code from scratch, you might need several iterations before getting the result you aimed for. Here are some tools you can use in order to debug a non-working microcontroller.

    Breakpoint and watch

    The first and maybe most instinctive way to check whether code is working as expected is to put a breakpoint at a critical line. This makes it possible to check whether the microcontroller goes through a certain instruction, and to step through the code starting from the breakpoint.

    A breakpoint is added by a double click on a line number in the code window. It can be added either during execution (debug session already started) or during editing. When a breakpoint is reached, the view jumps to the breakpoint's line and you will see the following view:

    Figure: Program execution stopped at a breakpoint on line 480 of main.c


    The lookup table is provided here for your convenience. Begin by copying it between the USER CODE BEGIN PV and USER CODE END PV comment tags.

    TASK 1: Write a short Python function that prints out the above table given a value for the period of the sinusoid.

    Gain

    Let's also define a gain factor to increase the output volume. As we said before, we will use a gain that is a power of two and therefore just define its exponent. If you find that the sound is distorted, you may want to reduce this number.

    The processing function

    In the following version of the main processing function you will need to provide a couple of lines yourself. In the meantime please note the following:

    • we are assuming that we're using the LEFT channel for the microphone, so we go through the input buffer two samples at a time, while we duplicate the output to produce a signal for both ears.

    • ix is the state variable that keeps track of our position in the lookup table. Since the alien voice is an instantaneous transformation, this is the only global time reference that we need to keep.

    • the function also implements a simple DC notch. Since this filter only requires memory of a single past input sample, there is no need for a circular buffer and we just use a single static variable in the function.

    • and the result is scaled back to 16 bits; we take the gain into account in this rescaling.
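    The listing itself was lost in the export, so here is an illustrative skeleton consistent with the bullet points above; all names are assumptions and the modulation lines are deliberately left for TASK 2:

    #define GAIN_EXP 2    /* output gain of 2^2, as a power of two */

    void VoiceEffect(int16_t *bufferInStereo, int16_t *bufferOutStereo, uint16_t size)
    {
        static uint16_t ix = 0;      /* position in the 80-entry cosine lookup table */
        static int16_t x_prev = 0;   /* one-sample memory for the FIR DC notch */

        for (uint16_t i = 0; i < size; i += 2) {    /* LEFT channel samples only */
            int32_t x = (int32_t)bufferInStereo[i] - x_prev;    /* simple DC notch */
            x_prev = bufferInStereo[i];

            int32_t y = 0;
            /* TASK 2: multiply x by the lookup-table value at ix, advance ix
               modulo the table length, and rescale the product back to 16 bits
               while folding in the gain (e.g. a right shift by 15 - GAIN_EXP) */

            bufferOutStereo[i]     = (int16_t)y;    /* duplicate the output */
            bufferOutStereo[i + 1] = (int16_t)y;    /* to both channels */
        }
    }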

    TASK 2: Complete the function to perform sinusoidal modulation.

    Now place the function between the USER CODE BEGIN 4 and USER CODE END 4 comment tags and test the application!

    Going further

    You can now try changing the modulation frequency by creating your own lookup tables!

    Solutions

    Are you sure you are ready to see the solution? ;)

    Here is the complete function:

    In the previous image you can see several interesting things. First, on the left, you can see that the code is currently executing the process() function; you can also see that this function was called by HAL_I2S_TxHalfCpltCallback(), together with the whole hierarchy of function calls that led to the current execution line.

    In the center part of the screen you can see the green line where the microcontroller actually stopped. The current position of the execution can be slightly different from the breakpoint location, particularly if the breakpoint is set on a line that was optimized during compilation. It can also happen when the compiled (assembly) code is too different from the C code; in such cases an instruction can take several cycles to execute.

    Then, on the right side, all currently available variables are displayed. The content of the variables is accessible; for example, look at the input buffer in the following screenshot:

    Figure: Exploring the content of variables while stopped at a breakpoint

    When the microcontroller is stopped, you can either resume or use the advanced stepping methods to continue the execution of the code.

    Common debug instructions for managing code execution during a debug session

    Be careful: breakpoints can also break the synchronization of your internal peripherals or even lead to serious hazards. Imagine putting a breakpoint in the control loop of a coffee machine: this could stop the system with the heater on, and you would end up melting the whole thing because the control loop is no longer active.

    For this reason, you might want to watch the internal state of your microcontroller without stopping it. Modern IDEs usually offer live monitoring; in the case of STM32CubeIDE, there is a Live Expressions tab where you can watch global variables of your program and check their values, as we have seen in the benchmarking section.

    Figure: Watching the value of your variables; here the state is changed by pressing a hardware button

    External tools

    When interacting with peripherals that are external to the microcontroller, the interaction will either be through digital signals (in our case, the I2S protocol) or sometimes through analog signals (imagine reading the analog value of an ambient light sensor). In both cases you will need to assess whether the input and output signals are consistent with what you expect.

    Oscilloscope

    To visualise signals, there are usually two options: a logic analyser or an oscilloscope. Nowadays oscilloscopes tend to include logic analyser features as well. An oscilloscope lets you visualise a signal and perform measurements on it. For example, in the screenshot below you can see an analog signal on top and a logic analyser view of the I2S bus on the bottom.

    Figure: View of an analog and 3 digital signals using a digital oscilloscope.

    In the past, oscilloscopes plotted only two signals on a cathode-ray screen, with very few parameters available. Now, with digital systems and particularly USB oscilloscopes, the analysis possibilities are endless. We recommend the Analog Discovery 2 USB oscilloscope, as it provides a lot of I/Os at a reasonable price compared to more conventional bench-top oscilloscopes.

    Trigger setting

    The screen of an oscilloscope displays the signal over a period of time that can be very short; in the image below, the whole screen spans only $200\,\mu s$. As the display is continuously updating, one notion is important in order to obtain a stable picture. Most signals that you will watch are somehow periodic; to let you visualise a very fast changing signal, even with our "slow" brains, the oscilloscope will try to synchronise successive frames so as to always draw the same part of the signal at the same place on the screen. To do this, the oscilloscope has a trigger setting: it senses when the signal crosses a certain threshold and synchronises all frames to this event. The threshold level is set by a dedicated knob on bench-top machines, and corresponds to the yellow arrow on the right side of the display in our digital oscilloscope. The trigger can be set to react to a positive or a negative slope. On a digital analyser, the trigger event can be more elaborate; for example, it could be a particular start sequence of a bus communication.

    Figure: Measuring an actual signal with an analog oscilloscope

    It is much easier to get a feel for how to handle an oscilloscope when the signals are moving. For this reason we made two videos: one for the analog mode and one for the digital mode.


    Last Details

    In the previous section we implemented a basic granular synthesis voice transformer that lowers the pitch of the input voice. In this section we will address some remaining issues, namely:

    • implement an effect that raises the pitch of the voice (aka the "Chipmunks" effect)

    • properly initialize the buffer as a function of the pitch change

    • optimize the code a little more

    The Chipmunks

    To raise the pitch of the voice we need to set the resampling factor to values larger than one. As we have seen, this makes the effect noncausal, which we need to address by introducing some processing delay.

    The way to achieve this is to place the audio buffer's input index forward with respect to the output index; let's do this properly by creating an initialization function for the buffer that takes the resampling factor as the input.

    TASK 1: Determine the proper initial value for buf_ix when the resampling factor is larger than one in the function below.
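    The function did not survive the export; here is an illustrative skeleton under assumed names (buf_ix is the write index into the effect's circular audio buffer), with the expression of TASK 1 left blank:

    extern uint16_t buf_ix;    /* input (write) index into the audio buffer */

    void GS_Buffer_Init(float alpha)    /* alpha: the resampling factor */
    {
        buf_ix = 0;
        if (alpha > 1.0f) {
            /* TASK 1: place the write index ahead of the read index by the
               maximum displacement that the pitch-raising effect can require */
            buf_ix = 0 /* your expression here */;
        }
    }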

    By now you know where to place this code, but don't forget to:

    • add the following line to the file main.h between the /* USER CODE BEGIN Includes */ tags.

    • declare the function prototype in the USER CODE BEGIN PFP block

    • call the function before launching the DMA transfers:

    Switching between effects

    We can use the blue button on the Nucleo board to switch between Darth Vader and the Chipmunks; to do so, define the following constants at the beginning of the code

    and modify the user button callback like so:
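    Both listings were lost in the export; a plausible sketch follows, assuming the CubeMX-generated name B1_Pin for the blue user button and a global flag that the processing function reads:

    #define EFFECT_VADER    0    /* pitch down */
    #define EFFECT_CHIPMUNK 1    /* pitch up */

    volatile uint8_t effect = EFFECT_VADER;

    /* EXTI callback fired by the blue user button on the Nucleo */
    void HAL_GPIO_EXTI_Callback(uint16_t GPIO_Pin)
    {
        if (GPIO_Pin == B1_Pin) {
            effect = (effect == EFFECT_VADER) ? EFFECT_CHIPMUNK : EFFECT_VADER;
            /* re-initialize the audio buffer for the new resampling factor here */
        }
    }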

    Final optimizations

    In the main processing loop, we are performing two checks on the value of grain_m per output sample. However, in the current implementation, both the stride and the taper lengths are multiples of the size of the DMA half-buffer. This allows us to move these checks outside of the processing loop and perform them once per call rather than once per sample.

    TASK 2: Modify the VoiceEffect() function to reduce the number of if statements per call. Benchmark the result and observe the change in performance.

    Solutions

    Are you ready to see the answers ? :)

    We have seen in the previous section the maximum displacement between the current output index and the needed input index. Since this value can be non-integer, we round it up to the nearest integer value:

    Since the DMA transfer size is an exact divisor of both the grain stride and the taper length, the boundaries that we check grain_m against can only be crossed at the end of a function call. We can therefore rewrite the function like so:

    With this implementation, the computational cost per sample drops measurably, representing a saving of almost one microsecond per sample or, equivalently, a performance increase of at least 9%.

    Signal levels

    Gain

    One thing that you might have noticed from the passthrough example is that the output signal is not very loud. To correct this, we can add a gain factor to the process function that multiplies each signal sample by a constant.

    In order to take advantage of the architecture of the microcontroller's internal multiplier, it is recommended to use factors that are a power of 2, since in this case a multiplication corresponds to a simple binary shift of the integer values to the left. We measured a $1\,\mu s$ difference in processing time when testing with the first voice transformer algorithm.
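    As a two-line sketch (GAIN_EXP is our name), widening to 32 bits before shifting so that the gain does not overflow the 16-bit sample:

    #define GAIN_EXP 3                       /* gain of 2^3 = 8 */
    int32_t y = ((int32_t)x) << GAIN_EXP;    /* a shift instead of a multiplication */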

    Removing the DC offset

    In general, in DSP applications we assume that input signals are zero mean. This is no different in the case of our microphone, so that, if there is no sound, we expect a sequence of zeros. If you actually look at the input samples, however, you will almost certainly find out that this is not so. In fact, the internal circuitry in the microphone almost always adds a voltage offset, and sometimes different microphones (from the same manufacturer) will have different offsets. We typically call this shift in the waveform a DC offset.

    DC offsets are highly undesirable since they limit the dynamic range of our system; in other words, we are "wasting" binary digits on a constant that serves no purpose.

    TASK 1: From your passthrough implementation, determine the value of the offset. Is it significant compared to the range of the microphone?

    Hint: put a breakpoint in the process function while being quiet; then with the debug tool, check the content of the input buffer.

    We have talked about DC offset removal in the second DSP course. Recall that a DC component corresponds to a nonzero frequency value at $\omega = 0$, so the idea is to use a filter with a zero at $z = 1$. A very simple example is the so-called FIR "DC notch" whose CCDE is simply

    $y[n] = x[n] - x[n-1].$

    Unfortunately this filter has the very poor frequency response shown here and, while good as a first approximation, it is not really recommended if audio quality is important to you.

    A better filter is obtained by using an IIR DC notch which, while marginally more expensive computationally, provides a much flatter frequency response over the audio frequency band:

    $y[n] = \lambda y[n-1] + x[n] - x[n-1].$

    When $\lambda$ is close to (but less than) one, we can get a magnitude response like this:

    TASK 2: Assume that our input samples are between -1 and +1 and are encoded as signed 16-bit integers. Write a C function that implements an IIR DC notch with $\lambda$ close to one using integer arithmetic.
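    If you want something to compare your attempt against, here is one possible sketch (not necessarily the course's official solution), with $\lambda$ expressed in Q15 fixed point; the value 31457, roughly 0.96, is an arbitrary choice of ours, and a real implementation would also saturate the output:

    #include <stdint.h>

    #define LAMBDA_Q15 31457    /* lambda close to one, in Q15: 0.96 * 2^15 */

    /* IIR DC notch: y[n] = x[n] - x[n-1] + lambda * y[n-1] */
    int16_t dc_notch(int16_t x)
    {
        static int16_t x_prev = 0;
        static int32_t y_prev = 0;
        int32_t y = (int32_t)x - x_prev + (((int32_t)LAMBDA_Q15 * y_prev) >> 15);
        x_prev = x;
        y_prev = y;
        return (int16_t)y;
    }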

    Tasks solutions

    Are you sure you are ready to see the solution? ;)

When the code is running, you can double-click on any line number to add a breakpoint.

We suggest you add a breakpoint at line 430:

Figure: Breakpoint set at line 430

If the microcontroller is connected and a debug session is ongoing, you will see a change in the software and the following list:

Figure: Execution call hierarchy when stopped at a breakpoint

This is the hierarchy of the functions executed by the microcontroller; indeed, main() is the root. Please note that the Skip All Breakpoints button should not be activated, otherwise the microcontroller will not stop at the given line.

Figure: Breakpoint bypass button

It is then possible to right-click in the editor and select Add Watch Expression; you can now enter the name of the variable you want to explore and it will show up in the Expression viewer panel. Unfold the array and you should see something close to this:

Figure: Variable watch panel

Notice that, even if the values fluctuate, the average is around -1540. This is the offset we were looking for; it is introduced by the microphone and can vary from one device to another.

In the DC notch function we will use the key points we saw in the section on numerical precision:

• the multiplication is performed in double precision and then rescaled;

• since $x[n]$ and $x[n-1]$ are usually close in value (audio signals do not swing wildly), the chance of overflow in the addition and subtraction is negligible.

#define LAMBDA 0x7333  // (int32_t)(0.9 * 32768)

static inline int16_t DCNotch(int16_t x) {
  static int16_t x_prev = 0;
  static int16_t y_prev = 0;
  y_prev = (int16_t)((((int32_t)y_prev * LAMBDA) >> 15) - x_prev + x);
  x_prev = x;
  return y_prev;
}

The above DC notch is better than the simple one-step difference, but its fixed-point implementation can be refined further; here is an interesting article about that.

    Connecting the peripherals

    Now that we have initialized the different peripherals that we will use to interface with the outside world (from the point of view of the microcontroller), we are ready to wire everything up! Make sure that the STM32 board is not powered, i.e. unplugged, while connecting the microphone and DAC breakout boards.

    For this task, we will have to refer to the card provided with the STM32 board (see below) and the image of the chip on the "Pinout" tab of our CubeMX project (further below).

    Adafruit I2S MEMS Microphone Breakout

    As previously mentioned, make sure that the STM32 board is powered off! We can then begin by connecting the microphone's ground pin. In electronics, it is common practice to first ground a component/circuit.

    TASK 3: Connect the microphone's GND pin to one of the STM32 board's GND pins, e.g. slot 22 on the CN7 (left) header.

    Tip: try to keep all the connector cables attached to each other to avoid messy wiring!

    We can now connect the supply voltage pin.

    TASK 4: Connect the microphone's 3V pin to the STM32 board's 3V3 pin.

    Note: the microphone component accepts voltage levels between 1.6V and 3.6V so do not use the STM32 board's 5V pin!

    Previously, we configured I2S2 for the microphone so we will have to connect the following pins (see image of chip from "Pinout" tab for the names on the left side of the arrow) to the corresponding pins on the microphone breakout board (right side of the arrow):

• I2S2_SD ← DOUT

• I2S2_CK → BCLK

• I2S2_WS → LRCL

    TASK 5: From the "Pinout" configuration on CubeMX, determine which pins of the STM32 board are used by I2S2. Using the card provided with the board (see PDF figure above), use the jumper cables to wire the pins from the STM32 board to the appropriate pins on the microphone breakout board.

    Hint: for example, from the "Pinout" tab we can see that I2S2_SD is output on pin PC3. From the card provided with the board, we see PC3 is located in the bottom left corner of the board's pin header extensions. Therefore, we will use a wire to connect this pin to the DOUT pin of the microphone breakout board.

    Finally, we configured an additional GPIO pin in order to select whether we would like the microphone to be assigned to the left or right channel.

    TASK 6: Connect the microphone's SEL pin to the pin on the STM32 board corresponding to LR_SEL.

    BONUS: do we have to connect the microphone's SEL pin for the passthrough to work? What would happen if we didn't?

    Adafruit I2S Stereo Decoder

    As previously mentioned, make sure that the STM32 board is powered off! We can then begin by connecting the DAC's power supply, starting with the ground pin.

    TASK 7: Connect the DAC's GND and VIN pins to the STM32 board.

    Note: you can provide 5V to the VIN pin and the built-in regulator will produce a 3.3V supply, which is also available on the 3VO pin.

    Previously, we configured I2S1 for the DAC so we will have to connect the following pins to the appropriate pins on the DAC breakout board:

    • I2S1_SD

    • I2S1_CK

    • I2S1_WS

    Moreover, we configured an additional GPIO pin in order to mute the output.

    • MUTE

    TASK 8: Connect the above four pins from the STM32 board to the appropriate pins on the DAC breakout board.

Hint: see the DAC chip explanation and Adafruit's site for more information on wiring the DAC component.

With everything correctly wired up, we can proceed to coding the passthrough on the SW4STM32 software!

    Tasks solutions

Sadly we cannot connect all the wires for you or double-check your connections. However, we did our best to help you with this wiring by making a step-by-step video accessible at this address.

    Are you sure you are ready to see the solution? ;)

Indeed we have to connect the SEL pin of the microphone, otherwise the microphone might send its signal randomly on the left or right channel (however, it is common practice for these types of input pins to have pull-down or pull-up resistors in order to have a well-defined default state).

    The ON/OFF button

    The Nucleo board has a user programmable push button. We will now use it as an ON/OFF button for the alien voice effect.

    Configuration

    The idea is to use the push button to call an asynchronous routine in our code. To do that, we need to configure the button to trigger an interrupt and then we need to catch the interrupt in our code.

Go into CubeMX by clicking on the ioc file in your alien voice project; in the left panel click on "System > NVIC" and enable the line "EXTI line 4 to 15" by ticking the corresponding checkbox. The pin PC13 is linked to EXTI13 in the hardware of the microcontroller. Interrupts are used because they provide very fast access to the core of the system and thus a very fast reaction time.

    Still in CubeMX, verify that the label for pin PA5 is "LD2" and the label for pin PC13 is "B1".

Add the following state variable to the USER CODE BEGIN PV section:

char user_button = 0;  /* user button status */

and add the following interrupt handler to the USER CODE BEGIN 0 section:

void HAL_GPIO_EXTI_Callback(uint16_t GPIO_Pin) {
  if (GPIO_Pin == B1_Pin) {
    // blue button pressed
    if (user_button) {
      user_button = 0;
      // turn off LED
      HAL_GPIO_WritePin(LD2_GPIO_Port, LD2_Pin, GPIO_PIN_RESET);
    } else {
      user_button = 1;
      // turn on LED
      HAL_GPIO_WritePin(LD2_GPIO_Port, LD2_Pin, GPIO_PIN_SET);
    }
  }
}

    The interrupt handler toggles the variable user_button and switches the LED on when its value is true.

TASK 1: Modify the alien voice Process function so that it switches between a passthrough and the alien voice.

    Benchmarking

Now that we have an ON/OFF button, we can use the benchmarking timer we defined before to see how expensive it is to compute the alien voice.

    TASK 2: Add the timing macros to the Process function and use the push button to compare execution times.

    Solution

    Are you sure you are ready to see the solution? ;)

We don't want to check the user_button status variable every time we process a sample, so we will place the logic at the DMA interrupt level, before we process a data buffer. First, rename the function that implements the alien voice from Process to VoiceEffect. Then modify the function prototypes between the /* USER CODE BEGIN PFP */ tags like so:

void VoiceEffect(int16_t *pIn, int16_t *pOut, uint16_t size);

void Process(int16_t *pIn, int16_t *pOut, uint16_t size) {
  if (user_button == 1) {
    VoiceEffect(pIn, pOut, size);
  } else { // just pass through
    for (uint16_t i = 0; i < size; pIn += 2, i += 2) {
      *pOut++ = *pIn;
      *pOut++ = *pIn;
    }
  }
}

The modified Process function is trivial since we just need to add the timing macros before and after the code:

void Process(int16_t *pIn, int16_t *pOut, uint16_t size) {
  START_TIMER

  if (user_button == 1) {
    VoiceEffect(pIn, pOut, size);
  } else {
    // just pass through
    for (uint16_t i = 0; i < size; pIn += 2, i += 2) {
      *pOut++ = *pIn;
      *pOut++ = *pIn;
    }
  }

  STOP_TIMER
}

    You should find that, while the passthrough requires approximately 33 microseconds, the alien voice effect requires 94 microseconds.

    /* USER CODE BEGIN PV */
    volatile int32_t timer_value_us;
    
    #define START_TIMER {\
      HAL_TIM_Base_Init(&htim2);\
      HAL_TIM_Base_Start(&htim2); }
    
    #define STOP_TIMER {\
      timer_value_us = __HAL_TIM_GET_COUNTER(&htim2);\
      HAL_TIM_Base_Stop(&htim2); }
    void inline Process(int16_t *pIn, int16_t *pOut, uint16_t size) {
      START_TIMER
    
  ... // passthrough code here
    
      STOP_TIMER
      // at this point the variable timer_value_us will contain
      //  the number of microseconds used by the portion of code
    }
import numpy as np

def make_cos_table(period):
        c = 0x7FFF * np.cos(2 * np.pi * np.arange(0, period) / period)
        print('#define COS_TABLE_LEN {}'.format(period))
        print('static int16_t COS_TABLE[COS_TABLE_LEN] = {', end='\n\t')
        for n in range(period - 1):
            print('0x{:04X}, '.format(np.uint16(c[n])), \
                end='' + '\n\t' if (n+1) % 12 == 0 else '')
        print('0x{:04X}}};'.format(np.uint16(c[period-1])))
    void inline Process(int16_t *pIn, int16_t *pOut, uint16_t size) {
      static int16_t x_prev = 0;
      static uint8_t ix = 0;
    
      // we assume we're using the LEFT channel
      for (uint16_t i = 0; i < size; i += 2) {
        // simple DC notch
        int32_t y = (int32_t)(*pIn - x_prev);
        x_prev = *pIn;
    
        // modulation
        y = y * COS_TABLE[ix++];
        ix %= COS_TABLE_LEN;
    
        // rescaling to 16 bits
        y >>= (15 - GAIN);
    
        // duplicate output to LEFT and RIGHT channels
        *pOut++ = (int16_t)y;
        *pOut++ = (int16_t)y;
        pIn += 2;
      }
    }
    cos_table[n] = (int16_t)(32767.0 * cos(0.0785398 * n));
    #define COS_TABLE_LEN 80
    static int16_t COS_TABLE[COS_TABLE_LEN] = {
        0x7FFF, 0x7F99, 0x7E6B, 0x7C75, 0x79BB, 0x7640, 0x720B, 0x6D22, 0x678D, 0x6154, 0x5A81, 0x5320, 
        0x4B3B, 0x42E0, 0x3A1B, 0x30FB, 0x278D, 0x1DE1, 0x1405, 0x0A0A, 0x0000, 0xF5F6, 0xEBFB, 0xE21F, 
        0xD873, 0xCF05, 0xC5E5, 0xBD20, 0xB4C5, 0xACE0, 0xA57F, 0x9EAC, 0x9873, 0x92DE, 0x8DF5, 0x89C0, 
        0x8645, 0x838B, 0x8195, 0x8067, 0x8001, 0x8067, 0x8195, 0x838B, 0x8645, 0x89C0, 0x8DF5, 0x92DE, 
        0x9873, 0x9EAC, 0xA57F, 0xACE0, 0xB4C5, 0xBD20, 0xC5E5, 0xCF05, 0xD873, 0xE21F, 0xEBFB, 0xF5F6, 
        0x0000, 0x0A0A, 0x1405, 0x1DE1, 0x278D, 0x30FB, 0x3A1B, 0x42E0, 0x4B3B, 0x5320, 0x5A81, 0x6154, 
        0x678D, 0x6D22, 0x720B, 0x7640, 0x79BB, 0x7C75, 0x7E6B, 0x7F99};
    #define GAIN 3  /* multiply the output by a factor of 2^GAIN */
    void inline Process(int16_t *pIn, int16_t *pOut, uint16_t size) {
      static int16_t x_prev = 0;
      static uint8_t ix = 0;
    
      // we assume we're using the LEFT channel
      for (uint16_t i = 0; i < size; i += 2) {
        // simple DC notch
        int32_t y = (int32_t)(*pIn - x_prev);
        x_prev = *pIn;
    
        // modulation
        y = ...
        ...
    
        // rescaling to 16 bits
        y >>= (15 - GAIN);
    
        // duplicate output to LEFT and RIGHT channels
        *pOut++ = (int16_t)y;
        *pOut++ = (int16_t)y;
        pIn += 2;
      }
    }
Hint: the multiplications should be performed using 32-bit integers.
    static void InitBuffer(float Alpha) {
      memset(buffer, 0, BUF_LEN * sizeof(int16_t));
    
      alpha = (int32_t)(0x7FFF * Alpha);
      // input index for inserting DMA data
      if (Alpha <= 1)
          buf_ix = 0;
      else
        buf_ix = ...;
    
      prev_ix = BUF_LEN - GRAIN_STRIDE;
      curr_ix = 0;
      grain_m = 0;
    }
#include <string.h>  /* for memset */
    UNMUTE
    SET_MIC_LEFT
    
    InitBuffer(3.0 / 2.0);
    
    // begin DMAs
    HAL_I2S_Transmit_DMA(&hi2s1, (uint16_t *) dma_tx, FULL_BUFFER_SIZE);
    HAL_I2S_Receive_DMA(&hi2s2, (uint16_t *) dma_rx, FULL_BUFFER_SIZE);
    #define DARTH (2.0 / 3.0)
    #define CHIPMUNK (3.0 / 2.0)
    void HAL_GPIO_EXTI_Callback(uint16_t GPIO_Pin) {
      if (GPIO_Pin == B1_Pin) {
        // blue button pressed
        if (user_button) {
          user_button = 0;
          HAL_GPIO_WritePin(LD2_GPIO_Port, LD2_Pin, GPIO_PIN_RESET);
          InitBuffer(DARTH);
        } else {
          user_button = 1;
          HAL_GPIO_WritePin(LD2_GPIO_Port, LD2_Pin, GPIO_PIN_SET);
          InitBuffer(CHIPMUNK);
        }
      }
    }

    A simple test project

In this section we will guide you step by step through the process of coding a simple application for the microcontroller, connecting the board, and running the application on the microcontroller. This first application does not use any peripherals and simply makes an onboard LED blink; it will serve as a basic project template that we can reuse many times later.

    Please note that, if you get stuck, you can always download the working STM32 projects for the examples in this gitbook here.

    Open the IDE and select a workspace

    1) Open the STM32CubeIDE that you just installed in the previous section.

2) Select a workspace; this will be the folder where all your projects are stored. You can create multiple workspaces if you work on different projects.

3) Once you select a valid folder you can Launch the IDE.

Figure: Workspace selection

    Create a new project

    The first time you open the software, you will be prompted by the screen shown below. If a pop-up appears, asking if you would like to initialize all peripherals with their default mode, simply press Yes.

    Press the Start new STM32 project button in order to launch CubeMX and start initializing the project.

    If you have a workspace that already contains a project, the new project button is in the top left corner.

    Configuring the hardware with CubeMX

    We will be using CubeMX's graphical interface to generate the initialization steps for the board and its peripherals. Once the board is configured, the IDE will translate our configuration choices into automatically-generated C code.

    Board selection

    When all necessary downloads are completed, you should eventually see something similar to the screenshot below. Click on the Board Selector tab in the top toolbar to the left.

    Make sure the "Board Selector" tab is the active one (top-left corner) and look for our board, the "NUCLEO-F072RB" (you can use the "Part Number Search" facility). Double-click the board in the search results. Note that if you are using a different ST board that , you should select the model you actually have.

Choose an appropriate name for the project (including the date, the project goal, etc.) and leave the options at their default values.

    When clicking next, you will see a pop-up asking if you want to initialize all peripherals to their default mode: this applies to the external circuits that may have been added to the Nucleo board. Peripheral initialization will be relevant later, when we add a microphone and an audio output module, but in this case we are only using an onboard LED and a button. Press Yes in any case.

    When the board has loaded, you should see something similar to the following screenshot:

    Extend the central pane if it was hidden, because it will be needed later!

    Code generation

When a Nucleo template is selected and all peripherals are initialized to their default values, the blue button B1 and the LED LD2 are already configured; this is sufficient for our first project.

We are now ready to generate the initialization code. Save your project by pressing Ctrl + S; the code will be regenerated automatically if a modification was made. In this case, since we did not change the layout, you may have to trigger the code generation manually by pressing Alt + K. CubeMX will generate some C files, using the HAL libraries, that encode all the settings that were selected via the GUI.

HAL is short for Hardware Abstraction Layer and it is a set of libraries provided by ST to help developers produce portable code, i.e. code that can be used across the whole family of STM32 boards. For more information on HAL, please refer to this document.

    The user application

    From the "Project Explorer", open the file "Src/main.c"; this is the code automatically generated by CubeMX and it will look like so:

    If you look at the C code, you can notice matched commented lines that read USER CODE BEGIN and USER CODE END; it is only between these tags that you should write your code!

    All other lines of code have been generated automatically by CubeMX according to the configuration we specified via the graphical tool. If you go back and change some of the configuration parameters, CubeMX will overwrite all the code that is not between the USER CODE tags!

    Blinking an LED

    We will now program the board to perform a simple task - make an onboard LED blink!

In the code, look for the infinite loop between the comments USER CODE BEGIN WHILE and USER CODE END WHILE, and add the following lines to the body of the loop:

/* Infinite loop */
/* USER CODE BEGIN WHILE */
while (1) {
    HAL_GPIO_TogglePin(LD2_GPIO_Port, LD2_Pin);
    HAL_Delay(1000);  // in ms
/* USER CODE END WHILE */

/* USER CODE BEGIN 3 */

}
/* USER CODE END 3 */

HAL_GPIO_TogglePin and HAL_Delay are commands provided by the ST HAL library for toggling the voltage level on a pin and for pausing execution, respectively. Remember that you can always look up the definition of a function or of a variable by pressing Ctrl and clicking the function/variable.

    The first command toggles the value of the pin corresponding to the LED at pin LD2; this turns the LED on for one iteration of the while loop and off for the next iteration. In order to actually be able to observe the LED blinking we must set a delay between each toggle operation, otherwise the blinking would be too fast to be perceived. This is what the second command accomplishes by placing a delay of 1 second; the argument of the function HAL_Delay is indeed in milliseconds.

    Building the project

    Before plugging in the board, let's try building the project. This can be done by pressing the hammer icon on the top toolbar, or by using the shortcut Ctrl + B ("Command + B" on MacOS). Make sure you are building for the Debug target and for the correct project.

    In the figure below, we can see the two signs of a successful build:

    • A "Binaries" folder was created, as can be seen in the "Project Explorer", and it contains an ELF file corresponding to our project. It should have the same name as your project. If this does not appear, it may be necessary to refresh the project by right-clicking the project directory and selecting Refresh (or using the shortcut F5).

    • There are no errors in the "Console" pane.

Now we can program the board! Plug the board into your computer using the USB Type-A to Mini-B cable. A couple of LEDs on the board should light up as it is powered by your computer.

    Debugging the code

    Click on the bug icon from the toolbar and select Debug As > STM32 MCU C/C++ Application (see below).

    If there are no debug configurations available from the menu, set up a configuration first by choosing "Debug configurations..." and clicking on the STM32 Cortex-M option.

    If this is your first time debugging in this workspace, you should see a pop-up similar to the one below appear. Click "Yes" as this perspective will be very useful, and you can check the box for "Remember my decision" so that this pop-up does not appear again.

If something similar to the following error appears:

"Unplugged target or STLink already in use or STLink USB driver not installed."

    make sure the board is properly plugged in and/or try another USB port.

If the Nucleo's firmware is outdated, you might be asked to update it via the following pop-up:

    Just press OK and then Yes.

When the Nucleo is reconnected, first press Open in update mode and then Upgrade to update the firmware of your Nucleo.

After the upgrade, you can press the bug button again to resume debugging.

    A view similar to the one below should then appear. This is the typical "Debug perspective" in Eclipse.

    Your program should be momentarily paused as is the case in the figure above at Line 90. You can continue the program by pressing the Resume button as pointed out above.

    You should now observe the green "LD2" LED (see below) blinking!

Figure: Top view of a NUCLEO board. Red arrow pointing out the location of the "LD2" LED. Picture source.

    Terminating the program

In order to properly stop the debugger, it is also necessary to disconnect from the board; both can be done by pressing the Disconnect button on the top toolbar (see below).

    Finally, you can switch back to the normal perspective by pressing the button to the left of the bug icon in the top-right corner (see below).

    The digital microphone

For the input we will use the I2S MEMS Microphone Breakout by Adafruit, which we will simply refer to as the Adafruit mic. In the following subsections, we will explain the key inputs and outputs of the MEMS microphone component, the I2S protocol used for the data transfer, and what is meant by a "breakout board".

    Overview of the MEMS microphone pins

    For portable devices, digital MEMS microphones are the popular choice for audio capture since they integrate both the analog microphone and the analog-to-digital converter that samples and quantizes the audio. MEMS is short for MicroElectroMechanical System, a process technology used to create tiny integrated devices or systems that combine mechanical and electrical components; MEMS are small, cheap, and easy to integrate into one's desired application.

    The connectors on a MEMS microphone are the following:

    The basic input pins are:

    • VDD: (usually) 3.3V to power the device.

    • GND: ground.

    • CLK: an external "clock" signal that drives the sampler in the A/D circuit. The sampling frequency for the Adafruit mic is , that is, the input clock should be 64 times the desired audio sampling frequency.

A standard MEMS microphone typically returns a PDM (Pulse-Density Modulation) signal. This is essentially a 1-bit, 64-times oversampled signal that requires downsampling and filtering in order to obtain a PCM (Pulse-Code Modulation) signal. PCM is the format typically used for storing and processing audio and it is indeed the format that we want to provide to the microcontroller. You can read more about PDM and PCM here and here, and you can play with one-bit, oversampled signals here.

Luckily for us, the MEMS component in the Adafruit mic already provides us with a PCM signal (the circuit implements a decimator and a low-pass filter), which it outputs in the I2S format that we have seen in the previous section. Each sample is encoded over 32 nominal bits (that is, the binary word is 32 bits long) and word synchronization requires an additional input signal:

    • WS: a "word select" signal whose level transitions mark the beginning of a binary word; since there will be a data word per audio sample, the frequency for the WS signal must be equal to the sampling frequency, that is, equal to the CLK frequency divided by 64. Since two MEMS microphones can be connected in parallel to provide an interleaved stereo signal, the following convention is used: when WS goes HIGH, the MEMS whose SEL signal is HIGH will start to transmit while the MEMS whose SEL is LOW will remain in a tri-state output (essentially disconnected); conversely, when WS goes LOW, the MEMS whose SEL is low will start to transmit. Note that, because of the interleaving, the sampling frequency will need to be twice the nominal value.

    I2S timing diagram example

    Let's look at an example timing diagram from the single Adafruit microphone we will be using. We assume we have configured our microphone to be the left channel (that is, we set SEL=0).

Figure: I2S MEMS microphone output timing diagram. The output data format is I2S, 24 bit, 2's complement, MSB first. From p. 7 of the datasheet.

    From the figure above, we can make several observations:

    1. After WS switches to LOW, we receive the first bit of information on the DATA line from the microphone, since SEL=0. When WS switches to HIGH (meaning a word is expected from the right channel microphone) the left channel microphone stays disconnected from the data bus.

    2. Each new bit is received at a rising edge and held for an entire period of CLK.

3. The first 18 bits after a rising or falling edge of the WS signal correspond to actual audio data, starting with the Most-Significant Bit (MSB) and finishing with the Least-Significant Bit (LSB).

• Bits 19-24 are set to 0, so our data precision is essentially 18 bits. Nonetheless, this zero-padding is required as the output format chosen by the manufacturer is: I2S, 24 bit, 2's complement, MSB first (p. 7 of the datasheet).

• Bits 25-32 are set to tri-state, effectively disconnecting the circuit from the data bus; it will stay disconnected until a new transition of WS to LOW is detected, in order not to corrupt the signal from the microphone on the other channel.
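To make the format concrete, here is a small hedged sketch of how a signed 16-bit sample could be recovered if the raw 32-bit frame were available in software; in our project the I2S peripheral will be configured to deliver 16-bit words directly, so this is for illustration only:

#include <stdint.h>

/* Keep the 16 most significant of the 18 valid bits, assuming the
   MSB-first, 2's complement layout described above. */
static int16_t frame_to_sample(uint32_t frame) {
  int32_t s = (int32_t)frame;   /* bit 31 carries the sign */
  return (int16_t)(s >> 16);    /* arithmetic shift keeps the sign */
}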

    I2S wiring example

In general, two MEMS microphones are usually connected in parallel according to the following diagram; the component called "I2S Master" would be our microcontroller. The terms "master" and "slave" are quite common in electronics to describe the device that acts as the controller and the device(s) being controlled, respectively. See here for more information on the terminology.

Figure: I2S MEMS microphone wiring for stereo use. Note that in our exercises we will be using a mono, i.e. one-channel, setup. From p. 7 of the datasheet.

    Some important observations can be made:

• The DATA lines for the two microphones are connected to each other and are supplied as a single input (DIN) to the I2S Master.

    • The SEL input for each microphone is set differently: SEL=VDD for the right-channel microphone and SEL=GND for the left-channel microphone. This is absolutely essential if two microphones are to share the same DATA line, as we explained before.

• The two microphones use the same BCLK (aka CLK) and WS signals; this is also necessary for synchronization purposes when several microphones use the same DATA line.

    In this module, we will only use a single microphone, but the wiring from the microcontroller to the MEMS is identical.

    Adafruit breakout

From the diagram above, we can observe that a MEMS microphone requires several additional components (capacitors and resistors) on top of several interconnecting wires. Instead of taking care of this part ourselves, we can simply use a pre-made breakout board: all the necessary components are pre-installed and we simply provide the connections for the signals/ports that need to interact with our microcontroller. In the case of our microphone, all the components (microphone, resistors, capacitors) are soldered on a compact board and convenient access is given to the following signals:

    1. VDD and GND: provided by the microcontroller to power the microphone.

    2. WS and BCLK: generated by the microcontroller for the I2S transfer.

    3. SEL: wired by the user to either VDD or GND to configure the microphone appropriately.

It is possible to design your own breakout boards using CAD tools for PCBs (Printed Circuit Boards), but for popular components like microphones it is easy to find breakout boards that have already been designed. Adafruit is a great place to find such boards and other cool electronics for personal projects, along with very well-explained user guides. The I2S MEMS Microphone Breakout is the component that perfectly fits our needs.

    The Formulas

    The key technique behind simple pitch shifting is that of playing the audio file either more slowly (to lower the pitch) or faster (to raise the pitch). This is just like spinning an old record at a speed different than the nominal RPM value.

In the digital world we can always simulate the effect of changing an analog playing speed by using fractional resampling; for a refresher on this technique please refer to Lecture 3.3.2 on Coursera. Resampling, however, has two problems:

    • the pitch of speech is changed but so is the overall speed, which we do not want;

    • the ratio of output to input samples for the operation is not one, so it cannot be implemented in real time.

    To overcome these limitations, we can use granular synthesis: we split the input signal into chunks of a given length (the grains) and we perform resampling on each grain independently to produce a sequence of equal-length output grains.

    Grain rate vs length

    In order to implement granular synthesis in real time we need to take into account the concepts of grain length and grain stride. A grain should be long enough so that it contains enough pitched speech for resampling to work; but it should also be short enough so that it doesn't straddle too many different sounds in an utterance. Experimentally, the best results for speech are obtained using grains between 20 and 40ms.

    The grain stride indicates the displacement in samples between successive grains and it is a function of grain length and of the overlap between successive grains. With no overlap, the grain stride is equal to the grain length; however, overlap between neighboring grains is essential to reduce the artifacts due to the segmentation. Overlapping output grains are blended together using a tapering window; the window is designed so that it performs linear interpolation between samples from overlapping grains.

Call $\rho$ the amount of overlap (as a percentage) between neighboring grains. With $\rho = 0$ there is no overlap, whereas with $\rho = 1$ all the samples in a grain overlap with another grain. The relationship between grain length $L$ and grain stride $S$ is $L = (1+\rho)\,S$. This is illustrated in the following figure for varying degrees of overlap and a stride of $S=100$ samples; grains are represented using the shape of the appropriate tapering window:

Note that the stride is constant for any amount of overlap and that each grain starts at the same time instant independently of the overlap; this is the key observation that will allow us to implement granular synthesis in real time.
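As a concrete example, with the 32 kHz sampling rate used in our setup, the relationship $L = (1+\rho)\,S$ can be baked into compile-time constants; the values below are illustrative (30 ms grains, 50% overlap) and simply mirror the macro names used later in this chapter:

#define GRAIN_LEN    960   /* L = 0.030 * 32000            */
#define GRAIN_STRIDE 640   /* S = L / (1 + rho), rho = 1/2 */
#define TAPER_LEN    320   /* W = L - S = rho * S          */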

    The grains' content

We can express the content of the $k$-th output grain as

$$g_k[m] = x(kS + \alpha m), \qquad 0 \leq m < L$$

where $x(t)$ is the interpolated, continuous-time version of the input signal and $\alpha$ is the sampling rate change factor (with $\alpha < 1$ for oversampling, i.e. to lower the pitch, and $\alpha > 1$ for subsampling, i.e. to raise the pitch). Note that the $k$-th grain starts at $n=kS$ and is built using input data from $n=kS$ as well.

In practice we will obviously perform local interpolation rather than full interpolation to continuous time, as explained in Lecture 3.3.2 on Coursera. Let $t = kS + \alpha m$ and set $T = \lfloor t \rfloor$ and $\tau = t - T$; with this, the interpolation can be approximated as

$$g_k[m] \approx (1-\tau)\,x[T] + \tau\, x[T+1].$$
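Here is a minimal fixed-point sketch of such an interpolator; the names buffer, BUFLEN_MASK and the Q15 factor alpha mirror the listings used elsewhere in this chapter, but this is an illustration under those assumptions, not the exact project code:

#include <stdint.h>

static int16_t buffer[2048];     /* circular input buffer (power-of-two length) */
#define BUFLEN_MASK 2047
static int32_t alpha;            /* rate change factor in Q15 */

static int16_t Resample(int32_t m, int32_t start_ix) {
  int32_t t   = alpha * m;       /* t = alpha * m, in Q15 */
  int32_t T   = t >> 15;         /* integer part of t */
  int32_t tau = t & 0x7FFF;      /* fractional part of t, in Q15 */
  int32_t x0  = buffer[(start_ix + T) & BUFLEN_MASK];
  int32_t x1  = buffer[(start_ix + T + 1) & BUFLEN_MASK];
  /* (1 - tau) * x[T] + tau * x[T+1], computed in 32 bits and rescaled */
  return (int16_t)((x0 * (0x8000 - tau) + x1 * tau) >> 15);
}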

    Causality

Note that when we lower the voice's pitch (i.e. we implement the "Darth Vader" voice transformer), since $\alpha < 1$, the computation of the output grains is strictly causal, that is, at any point in time we only need to access past input samples. Indeed, when we oversample, only a fraction of the grain's data will be used to regenerate its content; if a grain's length is, say, 100, and we are lowering the frequency by $\alpha=2/3$, we will only need 2/3 of the grain's original data to build the new grain.

    By contrast when we raise the pitch we are using subsampling, that is, samples are being discarded to create an output grain and so, to fill the grain, we will need to "look ahead" and borrow data from beyond the original grain's end boundary. The algorithm therefore is noncausal but, crucially, we can exactly quantify the amount of lookahead and handle it via buffering.

For instance, if we are raising the frequency by $\alpha=3/2$ and our grain length is, say, 100 samples, we will need a buffer of 50 "future" samples; this can be accomplished by accepting an additional processing delay of 50 samples. The difference between over- and under-sampling is clear when we look at the illustration in the notebook that shows the input sample index as a function of the output sample index:

    We will see in the next sections that buffering is required anyway in order to implement overlapping windows, so that the extra buffering required by subsampling will just be an extension of the general setup.

    The tapering window

The tapering window is as long as the grain and it is shaped so that the overlapping grains are linearly interpolated. The left sloping part of the window is $W$ samples long, with $W = L - S = \rho S$. The tapering weights are therefore expressed by the formula:

$$w[n] = \begin{cases} n/W & 0 \leq n < W \\ 1 & W \leq n < S \end{cases}$$
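In fixed point, the rising slope of $w[n]$ can be precomputed once as a Q15 table; here is a minimal sketch, where TAPER_LEN plays the role of $W$ (the names mirror the listings in this chapter, but the exact project code may differ):

#include <stdint.h>

#define TAPER_LEN 320                /* W: length of the sloping part */
static int16_t TAPER[TAPER_LEN];     /* Q15 weights, TAPER[n] = n/W */

static void InitTaper(void) {
  for (int32_t n = 0; n < TAPER_LEN; n++)
    TAPER[n] = (int16_t)((n << 15) / TAPER_LEN);
}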

    The output signal

    The full output signal can be expressed in closed form by looking at the following picture, which shows the periodic pattern of overlapping grains:

Any output index $n$ can be written as

$$n = kS + m, \qquad k, m \in \mathbb{Z}, \; 0 \leq m < S;$$

$k$ is the index of the current grain and $m$ is the index of the sample within the current grain. Note that the sample at $n$ is also the sample with index $S+m$ with respect to the previous grain. With this, the output at $n$ is the sum of sample number $m$ from the current grain plus sample number $S+m$ from the previous grain; both samples are weighed by the linear tapering slope $w[\cdot]$:

$$y[n] = (1-w[m])\,g_{k-1}[S+m] + w[m]\,g_k[m]$$

    Buffering

Consider once again the grain computation pattern, periodic with period $S$; let's use the index $m$ to indicate the current position inside the pattern. As $m$ goes from zero to $S$ we need to compute:

• $g_k[m]$ for all values of $m$;

• $g_{k-1}[S+m]$ for $0 \leq m < W$ (the tail of the previous grain).

Which audio samples do we need to have access to at any given time? Without loss of generality, consider the grain for $k = 0$ as in the following figure:

    We need to compute:

• $g_0[m] = x(\alpha m)$ for $0 \leq m < S$

• $g_{-1}[S+m] = x(\alpha m + (\alpha - 1)\,S)$ for $0 \leq m < W$

If $\alpha \leq 1$ both expressions are causal, so we can use a standard buffer to store past values. The size of the buffer is determined by "how far" into the past we need to reach; in the limit, for $\alpha$ close to zero, we need to access $x(-S)$ from $m=W$ when we compute the end of the tapering section, so that, in the worst case, the buffer must be as long as the grain size $L = S+W$. The overall processing delay of the voice changer in this case is equal to the size of the DMA transfer.

If $\alpha > 1$, on the other hand, we also need to access future samples; this is of course not possible, but we can circumvent the problem by introducing a larger processing delay. This is achieved by moving the input data pointer in the buffer further ahead with respect to the output data pointer. The maximum displacement between the current time and the future sample that we need occurs at $m = W$ (i.e., at the end of the tapering slope), for which:

$$[\alpha m + (\alpha - 1)\,S - m]_{m=W} = (\alpha - 1)(S+W) = (\alpha - 1)\,L$$

By offsetting the input and output pointers by $D = (\alpha - 1)\,L$ samples, we can raise the pitch of the voice by $\alpha$ at the price of a processing delay equal to $D$ samples. For example, with $\alpha = 3/2$ and $L = 960$ samples (30 ms at 32 kHz), the lookahead is $D = 480$ samples, i.e. 15 ms of extra delay.

TASK 1: Determine the maximum range for $\alpha$ if the size of the audio buffer is equal to the grain size $L$.

    Solutions

    Are you ready to see the answer? :)

We have already seen that for $\alpha < 1$ we need a causal buffer whose maximum length is equal to $L$. For $\alpha > 1$ the needed lookahead is $(\alpha - 1)\,L$, so if the maximum buffer size is $L$, we must have $0 \leq \alpha < 2$.


    Setting up the I/O

    The initialization code we generated in the blinking LED example will need to be updated as it does not perform the setup for the two I2S buses that we will need to communicate with the microphone and the DAC.

    Create a new project

First, let's make a copy of our working LED-blinking project: we want to keep track of old projects in order to be able to go back to a known working configuration if something stops functioning. To copy the project, use the "Project Explorer" of the SW4STM32 software: open the project you want and do a simple copy/paste operation. When you paste the project, a pop-up will ask you to rename the copied project; we recommend choosing a name that includes the current date and the word "passthrough" for bookkeeping purposes.

    To finish the copying process:

    • make sure that the binary file of the original project is removed by deleting the .elf file in the Binaries folder of the new project.

    • rename the .ioc file with the name of the project

    Now we are ready to update the initialization code. From the project explorer, click on the IOC file of the new project and open the CubeMX configurator.

    Enable and configure the I2S buses

    When the IOC file has successfully loaded, you should see something similar to the figure below. On the left-hand column, select "Multimedia" and expose the I2S1 and I2S2 selectors.

    I2S1 (DAC)

    Let's begin by setting up the I2S channel that communicates with the DAC. Click on I2S1 and select the "Half-Duplex Master" for the Mode in the top middle panel.

You should see several pins highlighted in green: after enabling an I2S bus, the interface shows in green the electrical pins in the microcontroller that will be devoted to the signals used in the I2S protocol. Recall that an I2S bus uses three pins according to the I2S specification:

    1. Clock (CK).

    2. Word select (WS).

    3. Serial data (SD).

    Move your attention now to the "Configuration" panel below; we'll need to set up the structure of the data that transits on the bus (bits per word and per frame) and the data rate.

    Select the "Parameter Setting" tab and set the transmission mode to "Mode Master Transmit" and the Communication Standard to "I2S Philips".

Now let's configure the DMA transfers. Select the "DMA Settings" tab and press "Add". Adjust the settings so that DMA Request is set to "SPI1_TX", Data Width is set to "Half Word" and Mode is set to "Circular", as in the screenshot below. Note that the DMA stream can differ if you are using a different microcontroller, as it depends on the physical implementation of the internal circuitry.

TASK 1: Finish the set up for I2S1 so that it can be used to communicate with the DAC by setting the Data and Frame Format and the Audio Frequency. You will have to check the DAC datasheet in order to find the correct parameters (sampling frequency, data and frame format).

    I2S2 (microphone)

    Repeat the previous steps for I2S2 with the following differences:

    • set the Transmission Mode to "Mode Master Receive"

    • set the DMA request to "SPI2_RX

    Finally, complete the configuration:

TASK 2: Finish the set up for I2S2 so that it can be used to communicate with the microphone by setting the Data and Frame Format and the Audio Frequency. You will have to check the microphone datasheet in order to find the correct parameters (sampling frequency, data and frame format).

Hint: make sure that the DAC and the microphone have the same "Selected Audio Frequency" while satisfying the specifications detailed on the datasheets! An audio frequency below the specified limits will most likely result in aliasing.

    As a final sanity check, click on "NVIC" under "System" in the left column and ensure that the interrupts are enabled for both selected DMA channels, as below.

    Configure the GPIO pins

    The configuration we have done so far would be sufficient in order to create an audio passthrough. However, we will configure two more pins of the microcontroller so that we can programmatically:

    1. Mute the DAC.

    2. Assign the microphone to the left or the right channel.

    Go back to the "Pinout" tab, as seen below.

    By clicking on any of the pins, you should be able to see the different functions that particular pin can assume, see below.

We are interested in using two pins as "GPIO_Output" (GPIO stands for "General-Purpose Input/Output") in order to output a HIGH or LOW value to the Adafruit breakout boards. Set the pins "PC0" and "PC1" to "GPIO_Output" (see below). You can reset a pin to having no function by selecting "Reset_State".

    Just as in the case of variables in a program, we should give meaningful names to our GPIO pins. We will rename "PC0" and "PC1" as "MUTE" and "LR_SEL" respectively. You can rename a pin by right-clicking it and selecting "Enter User Label" (see below).

    Update initialization code

    If you now save the IOC file (or if you change perspective) the source code will be updated:

If you have any of the source files open in SW4STM32, they should refresh automatically to reflect the settings you have changed in CubeMX. Remember that this is why you should not add or modify any section of the code outside of the USER CODE BEGIN and USER CODE END comments: outside of these tags, all code will usually be replaced after a change in configuration.

With the peripherals and initialization code updated, we can proceed to wiring the breakout boards!

    Tasks solutions

    Are you sure you are ready to see the solution? ;)

The transmission mode is defined by the fact that the peripheral is a DAC; thus, the I2S internal peripheral of the microcontroller will have to transmit data to the DAC. The mode to select is then "Master Transmit".

The communication standard can be either "I2S" or "LSB-justified", as shown in section 1.2 of the datasheet; we will choose "I2S Philips" since it is the default value selected when SF0 and SF1 of the breakout are not connected.

    The second paragraph of section 3 of the datasheet says:

    The UDA1334ATS supports the I2S-bus data format with word lengths of up to 24 bits and the LSB-justified serial data format with word lengths of 16, 20 and 24 bits.

In the code, we will be using 16-bit samples, so the word size is 16 bits. It is not so clear what is meant by "frame" in this context, since the term is not part of the original I2S specification. Nevertheless, we assume that, since the word size can be up to 24 bits, we should choose a "frame" of 32 bits. This is confirmed experimentally in the sense that, if we choose a frame of 16 bits, the passthrough does not work. You could also test both parameters and check the actual frame length with a logic analyzer. This kind of missing information is often encountered when reading a datasheet.

Lastly, the Audio frequency has to be defined. It is important to keep in mind that a faster sampling frequency leaves less time for the microcontroller to process each sample; on the other hand, a slower sampling frequency impacts the quality of the signal as it reduces its bandwidth.

The pin called "PLL0" is set to 0 by default (according to the schematic), which means that the chip is in audio mode. Section 8.1.1 explains that in this mode the pin "PLL1" selects an audio frequency range from 16 to 50 kHz (PLL1 = LOW) or from 50 to 100 kHz (PLL1 = HIGH). On this breakout, PLL1 is set to LOW according to the schematic. We will choose 32 kHz; this choice will be confirmed by Task 2.

The transmission mode is defined by the fact that the peripheral is a microphone; thus, the I2S internal peripheral of the microcontroller will have to receive data from the microphone. The mode to select is then "Master Receive".

The communication standard is "I2S" or "LSB-justified", as shown in the first paragraph of page 7 of the datasheet; we will choose "I2S Philips" as we did for I2S1.

The datasheet also gives more information about the Data and Frame format. We will choose the same parameters as for I2S1: figure 7 of the datasheet shows that the frame is 32 bits long and that the microphone sends 18 bits with the actual value, then 6 zero bits and then 8 tri-state bits. Nevertheless, we will choose "16 Bits Data on 32 Bits Frame" in order to have faster processing.

Finally, the Audio frequency has to be defined. This device is a bit more restrictive than the DAC: on page 7 of the datasheet we can read that "clock frequencies from 2.048MHz to 4.096MHz are supported so sampling rates from 32KHz to 64KHz can be had by changing the clock frequency". We clearly see that a frequency lower than 32 kHz will not work properly.

    Numerical precision

    Coding a "real-world" DSP application on dedicated hardware is a bit of a shock when we are used to the idealized world of theoretical derivations and nowhere is the disconnect more profound than when we need to take numerical precision explicitly into account.

    float vs. int

Floating-point numbers, as implemented in most architectures today, free us from the need to explicitly consider the range of the numeric values that appear in our algorithm. Although not without caveats, a 64-bit double in C is pretty much equivalent to an ideal real number for most intents and purposes, with a dynamic range (that is, the ratio between the smallest and largest representable numbers) in excess of $10^{600}$.

However, operations with floating-point variables can take significantly more time than the same operations with integer variables on a microcontroller; on the Nucleo, for instance, we noticed that an implementation with floating-point variables can take up to 35% more processing time than an equivalent implementation with integer variables.

    If we try to avoid floats, then we need to use some form of fixed-point representation for our quantities. Implementing algorithms in fixed point is truly an art, and a difficult one at that. In the rest of this section we will barely scratch the surface and give you some ideas on how to proceed.

    Fixed-point representation

The idea behind fixed-point representations is to encode fractional numbers as integers, assuming an implicit position for the decimal point.

In our case, let's start with a reasonable assumption: the audio samples produced by the soundcard are signed decimal numbers in the open interval $(-1, 1)$. How can we represent numbers in this interval via integers and, more importantly, how does this affect the way we perform computations?

Since we are all more familiar with numbers in base 10, let's start with a 2-digit fixed-point representation in base 10 for fractional numbers between -1 and 1. With this, for instance, the number 0.35 will be represented by the integer 35; more examples are shown in this table:

| decimal representation | 2-digit fixed-point representation |
| --- | --- |
| 0.35 | +35 |
| -0.2 | -20 |
| 0.1234 | +12 |
| 1.3 | +99 |

Note that since we only have 2 digits, the number 0.1234 has to be truncated to the representation 12. Similarly, we will not be able to encode numbers greater than 0.99 or smaller than -0.99; trying to do so induces an overflow in the representation. That's OK: a finite number of digits involves a loss of precision, and this makes sense.

It is clear that in this representation we go from decimal numbers to integers by multiplying the decimal number by $10^2 = 100$ (note the 2 in the exponent: that's our number of digits) and taking the integer part of the result. Vice versa, we can go back to the decimal representation by dividing the integer by 100.
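As a quick illustration (our own sketch, not project code), the two conversions in C:

#include <stdint.h>

static int8_t to_fixed2(double x)  { return (int8_t)(x * 100); }  /* 0.35 -> +35 */
static double to_decimal(int8_t f) { return f / 100.0; }          /* +35 -> 0.35 */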

We can also choose, at some point, to increase the precision of our representation. In this example, if we were to use five digits:

| decimal representation | 5-digit fixed-point representation |
| --- | --- |
| 0.35 | +35000 |
| -0.2 | -20000 |
| 0.1234 | +12340 |
| 1.3 | +99999 |

    It's clear that we can convert a 2-digit representation into a 5-digit representation by adding three zeros (i.e. by multiplying by 1000), and vice versa. Note however that increasing the precision does not protect us against overflow: the maximum range of our variables does not change in fixed point, only the granularity of the representation.

    Fixed-point arithmetic

    The tricky part with fixed-point is when we start to do math. Let's have a quick look at the basic principles, but remember that the topic is very vast!

    Multiplication

    The first obvious thing is that when we multiply two 2-digit integers the result can take up to four digits. This case is however easy to handle because it only requires renormalization and it entails "simply" a loss of precision but not overflow.

For example, if we were to multiply two decimal numbers together, we would have something like:

$$0.23 \times 0.31 = 0.0713 \approx 0.07$$

If we use fixed-point representations, as long as the multiplication is carried out in double precision, we can renormalize to the original precision by dropping the two least significant digits:

$$(+23) \times (+31) = +0713 \longrightarrow +07$$

In the next sections we will use square brackets to indicate a multiplication carried out in double precision and followed by renormalization:

$$[(+23) \times (+31)] = +07.$$
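The same operation, sketched in C: the product is computed in a wider type and then renormalized by dropping the two least significant decimal digits (fixmul10 is our own illustrative name):

#include <stdint.h>

static int8_t fixmul10(int8_t a, int8_t b) {
  int16_t full = (int16_t)a * b;    /* e.g. 23 * 31 = 713, i.e. 0.0713 */
  return (int8_t)(full / 100);      /* renormalize: +07, i.e. 0.07 */
}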

    Addition

Addition is a bit trickier in the sense that the sum (or difference) of two numbers can result in overflow:

$$0.72 + 0.55 = 1.27 > 1$$

This is of course mirrored by the fixed-point representation:

$$(+72) + (+55) = 127 > 99$$

    The result is not representable with two digits and if we cap it at 99 we have a type of distortion that is very different from the rounding that we performed in the case of multiplication.

There is no easy solution to this problem and often it all depends on writing the code that performs the required operations in a smart way that avoids overflow (or makes it very unlikely). For instance, suppose we want to compute the average $\frac{a+b}{2}$ of two numbers.

In theory, the way in which the average is computed makes no difference and, if $a=0.72$ and $b=0.55$, we would usually compute the sum first and then divide by two:

$$(0.72 + 0.55) \times 0.5 = 0.635.$$

In fixed point, however, the order of operations does matter. If we start with the sum, we immediately overflow and, assuming overflows are capped at their maximum value, we obtain

$$[((+72) + (+55)) \times (+50)] = [(+99) \times (+50)] = 49$$

which is a really wrong value. On the other hand, suppose we compute the average as $a/2 + b/2$. In fixed point this becomes

$$[(+72) \times (+50)] + [(+55) \times (+50)] = (+36) + (+27) = (+63)$$

    which is a totally acceptable approximation of the average's true value!
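The same trick in our 16-bit format, sketched in C (the constants are illustrative Q15 values):

#include <stdint.h>

static int16_t average_q15(int16_t a, int16_t b) {
  /* halve first, then add: computing a + b first could overflow 16 bits */
  return (int16_t)((a >> 1) + (b >> 1));
}
/* e.g. a = 0x5C29 (~0.72) and b = 0x4666 (~0.55) give ~0.635 without overflow */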

    Two's complement

To encode signed integers in binary representation, the most common format is known as two's complement; this format allows normal addition operations to work across the whole range of representable positive and negative numbers.

The main idea is that of addition of positive integers with truncated overflow, and it originates in mechanical calculators whose digits roll around to zero after an overflow. Suppose that we are using a single decimal digit: we can obviously use it to represent the ten values from 0 to 9. Alternatively, we can use the digits from 0 to 4 to represent themselves and map the digits from 5 to 9 to the negative numbers -5 to -1, in that order. With this "complement" representation, here is how addition now works:

| normal notation | complement notation |
| --- | --- |
| 1 + 1 = 2 | 1 + 1 = 2 |
| 2 + 1 = 3 | 2 + 1 = 3 |
| 2 - 2 = 0 | 2 + 8 = 0 (10, with truncated overflow) |
| 3 - 2 = 1 | 3 + 8 = 1 (11, with truncated overflow) |
| -2 - 2 = -4 | 8 + 8 = 6 (6 is mapped to -4) |

The same concept can be extended to multi-digit numbers and, obviously, to binary digits, in which case the representation is called "two's complement". In the binary case, the notation is particularly simple: to negate a positive binary number we need to invert all its digits and add one. For instance, using 4 bits, the decimal value 4 is 0100; the value -4 is therefore 1011 + 0001 = 1100. With this, 4 - 4 = 0100 + 1100 = (1)0000 = 0.

    Note that in two's complement notation, the value of the leading bits indicates the sign of the number, with zeros for positive quantities and ones for negatives. With 16-bit words and using hexadecimal notation, for instance, the numbers 0x0000 to 0x7FFF (zero to 32767 in decimal) have their most significant bit equal to zero and they represent positive quantities. Conversely, the number 0x8000 is mapped to -32768, 0x8001 to -32767, all the way up to 0xFFFF which represents -1.
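These bit patterns can be checked directly in C (a small illustrative snippet of our own):

#include <stdint.h>

static void tc_ranges(void) {
  int16_t a = (int16_t)0x7FFF;   /*  32767, largest positive value */
  int16_t b = (int16_t)0x8000;   /* -32768, most negative value */
  int16_t c = (int16_t)0xFFFF;   /*     -1 */
  (void)a; (void)b; (void)c;     /* silence unused-variable warnings */
}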

    This representation allows for an easy implementation of divisions by powers of two as right shifts: when dividing by two, we simply need to shift the word to the right by one, making sure to extend the value of the most significant bit. Consider four-bit words for simplicity:

| decimal | binary two's complement |
| --- | --- |
| 4 / 4 = 1 | 0100 >> 2 = 0001 |
| -4 / 4 = -1 | 1100 >> 2 = 1111 |

    In the C language standard, the implementation of a right shift is left undetermined as to the propagation of the sign bit. On the Nucleo, however, you can safely use right-shift renormalization since the shifts preserve the sign.
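A quick sanity check of this behavior on the Nucleo toolchain (arm-none-eabi-gcc performs arithmetic right shifts on signed values):

#include <stdint.h>
#include <assert.h>

static void shift_check(void) {
  int16_t x = -4;
  assert((int16_t)(x >> 2) == -1);   /* the sign bit is propagated: 1100 -> 1111 */
}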

    Fixed-point programming in C

In the C language standard, the behavior of many numeric types is not fully specified and depends on the compiler. To avoid unexpected side effects, in numerical programming it is customary to include the header <stdint.h>, in which numeric types are defined precisely. In our code we will use the following types:

    • int16_t: 16-bit integers, two's complement representation. Ranges from -32768 (0x8000) to 32767 (0x7FFF). Zero is 0x0000 and -1 is 0xFFFF.

    • int32_t: 32-bit integers, two's complement representation. Used to perform multiplications prior to rescaling.

The provided types also include unsigned versions such as uint8_t and uint16_t, which can be used when the sign is not needed; for instance, a uint16_t ranges from zero to 65535 (0xFFFF).

Since we will be using integer arithmetic, here are a few practical rules that will be useful to understand and write the C code:

• all audio samples, unless specified otherwise, are assumed to be values in the $[-1, 1)$ range, represented by 16-bit words in two's complement;

• to convert a floating-point number $x \in [-1, 1)$ to its fixed-point representation, use int16_t x16 = (int16_t)(x * 0x7FFF);

• to multiply two 16-bit variables using double precision and rescaling, use int16_t z = (int16_t)(((int32_t)x * (int32_t)y) >> 15);

    • careful with overflow when performing addition.

As a worked example, consider a fixed-point representation with two decimal digits, where values in [-1, 1) are scaled by 100 and brackets denote rescaling (keeping only the two most significant digits of a product). Multiplication requires rescaling the result: 0.23 × 0.31 = 0.0713 ≈ 0.07 becomes (+23) × (+31) = +0713 ⟶ +07, that is, [(+23) × (+31)] = +07. Addition, on the other hand, can overflow: 0.72 + 0.55 = 1.27 > 1, and indeed (+72) + (+55) = 127 > 99. For this reason, when computing an average (a + b)/2 we must rescale before adding: the true value is (0.72 + 0.55) × 0.5 = 0.635, but adding first gives [((+72) + (+55)) × (+50)] = [(+99) × (+50)] = 49 (with the sum saturated at +99), whereas rescaling each term first gives the correct result [(+72) × (+50)] + [(+55) × (+50)] = (+36) + (+27) = (+63).
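In C, the same precautions translate into code like the following sketch (our own illustration, using the Q15 conventions above):

#include <stdint.h>
#include <stdio.h>

int main(void) {
  // convert floating point values in [-1, 1) to 16-bit fixed point
  int16_t a = (int16_t)(0.23 * 0x7FFF);
  int16_t b = (int16_t)(0.31 * 0x7FFF);

  // multiply with a 32-bit intermediate, then rescale by 15 bits
  int16_t prod = (int16_t)(((int32_t)a * (int32_t)b) >> 15);

  // overflow-safe average: halve each term *before* adding
  int16_t x = (int16_t)(0.72 * 0x7FFF);
  int16_t y = (int16_t)(0.55 * 0x7FFF);
  int16_t avg = (int16_t)((x >> 1) + (y >> 1));

  printf("prod = %f, avg = %f\n", prod / 32768.0, avg / 32768.0);
  return 0;
}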

Lastly, the audio frequency has to be defined. It is important to keep in mind that a faster sampling frequency leaves the microcontroller less time to process each sample; on the other hand, a slower sampling frequency degrades the quality of the signal, since it reduces its bandwidth.

The pin called "PLL0" is set to 0 by default (according to the schematic), which means that the chip is in audio mode. Section 8.1.1 explains that, in this mode, the pin "PLL1" selects an audio frequency range of either 16 to 50 kHz (PLL1 = LOW) or 50 to 100 kHz (PLL1 = HIGH). On this breakout, PLL1 is set to LOW according to the schematic. We will choose 32 kHz; this choice will be confirmed by Task 2.

The transmission mode is determined by the fact that the peripheral is a microphone: the I2S peripheral of the microcontroller will have to receive data from the microphone. The mode to select is therefore "Master Receive".

The communication standard is "I2S" or "LSB-justified", as shown in the first paragraph of page 7 of the datasheet; we will therefore choose "I2S Philips", as we did for I2S1.

The datasheet also gives more information about the data and frame format. We will choose the same parameters as for I2S1, but figure 7 of the datasheet shows that the frame is 32 bits long and that the microphone sends 18 bits containing the actual value, followed by 6 bits at zero and 8 bits in tri-state. Nevertheless, we will choose "16 Bits Data on 32 Bits Frame" for faster processing.

The audio frequency has to be defined here as well. This device is a bit more restrictive than the DAC: on page 7 of the datasheet we can read that "Clock frequencies from 2.048MHz to 4.096MHz are supported so sampling rates from 32KHz to 64KHz can be had by changing the clock frequency". Since the bit clock runs at 64 times the sampling rate (32 bits times two channels per frame), 2.048 MHz corresponds exactly to 32 kHz; a sampling frequency lower than 32 kHz will therefore not work properly.


    Implementation

    We are building a real-time system, so the output data rate will necessarily be equal to the input data rate. In the previous section we saw that grains are produced via a periodic pattern whose period is equal to the stride length. It would make perfect sense, therefore, to set the length of the DMA buffer equal to the stride and let that be the cadence of the processing function.

    Unfortunately this simple approach clashes with the capabilities of the hardware and so we need to trade resources for some extra code complexity: welcome to the world of embedded DSP!

    Memory limitations

    If we play around with the Jupyter notebook implementation of granular synthesis, we can quickly verify that the voice changer works best with a grain length of about 30ms and an overlap factor of about 50%. Using the formula derived in the previous section, this gives us a grain stride of 20ms.

Now, remember that the smallest sampling frequency of our digital microphone is 32KHz, so that 20ms correspond to 640 samples. Each sample is 2 bytes and the I2S protocol requires us to allocate a stereo buffer. This means that each DMA half-buffer will take 2 × 2 × 640 = 2560 bytes.

Since we need to use double buffering for DMA, and since we need symmetric input and output buffers, in total we will need to allocate 2560 × 2 × 2 = 10240 bytes, that is, over 10KB of RAM, for the DMA buffers alone; when we start adding the internal buffering required for computation, we are going to quickly exceed the 16KB available on the Nucleo F072RB!

    (As a side note, although 16KB may seem ludicrously low these days, remember that small memory footprints are absolutely essential for all devices that are not a personal computer. The success of IoT hinges upon low memory and low power consumption!)

To avoid the need for large DMA buffers, we will implement granular synthesis using the following tricks:

    • to save memory, all processing will be carried out on a mono signal;

    • we will use a single internal circular buffer that holds enough data to build the grains; we have seen in the previous section that we need a buffer at most as long as the grain. Using mono samples, this will require a length of 1024 samples, for a memory footprint of 2 KBytes.

    • we will fill the internal buffer with short DMA input transfers and compute a corresponding amount of output samples for each DMA call; DMA transfers can be as short as 16 or 32 samples each, thereby reducing the amount of memory required by the DMA buffers.

    The code

    To code the granular synthesis algorithm, copy and paste the Alien Voice project from within the STM32CubeIDE environment. We recommend choosing a name with the current date and "granular" in it. Remember to delete the old binary (ELF) file inside the copied project.

    Here, we will set up the code for the "Darth Vader" voice transformer and will consider more advanced modifications in the next section.

    DMA size

    As we explained, the idea is to fill the main audio buffer in small increments to save memory. To this end, set the DMA half-buffer size to 32 samples in the USER CODE BEGIN PV section:

    Grain size and taper

We will use a grain length of L = 1024 samples, which corresponds to about 30ms (32ms, to be exact) at a sampling rate of 32KHz. The overlap is set at roughly 50%, i.e., we will use a tapering slope of W = 384 samples. The resulting grain stride is S = L - W = 640 samples, i.e. 20ms; together, these values form a "smart choice" for the sizes of the grain, the tapering and the DMA transfer, designed to minimize processing.

    TASK 1: Write a short Python function that returns the values of a tapering slope for a given length.

    Add the following lines to the USER CODE BEGIN 4 section in main.c, where the values for the tapering slope are those computed by your Python function:

    Main buffer

We choose the buffer length to be equal to the size of the grain since, in any case, the voice transformer doesn't sound too good for α > 1.5. With a size equal to a power of two, we will be able to use bit masking to enforce circular access to the buffer (for BUF_LEN = 1024, ix & 0x3FF wraps any index into [0, 1023]). Add the following lines after the previous ones:

    With these values the buffers are set up for causal operation (i.e., for lowering the voice pitch); we will tackle the problem of noncausal operation later.

You can now examine the memory footprint of the application by compiling the code and looking at the "Build Analyzer" tab in the lower right corner of the IDE. You should see that we are using less than 30% of the onboard RAM.

    Processing function

    This is the main processing function:

The processing loop uses an auxiliary function Resample(uint16_t m, uint16_t N) that is supposed to return the interpolated value x(N + αm).

A simplistic implementation is to return the sample whose integer index is closest to N + αm:

TASK 2: Write a version of Resample() that performs proper linear interpolation between neighboring samples.

    Benchmarking

    Since our processing function is becoming a bit more complex than before, it is interesting to start benchmarking its performance.

Remember that, at 32KHz, we can use at most about 30μs per sample (the sampling period is 31.25μs); we can modify the timing function to return the number of microseconds per sample like so:

If we now use the method described before, we can see that the current implementation (with the full fractional resampling code) requires between 5.2μs and 8.5μs per sample, which is well below the limit. The oscillation between the two values reflects the larger computational load incurred during the tapering slope.

    Solutions

    Are you ready to see the answer? :)

With W = 384 the resulting table is

    Here is the complete resampling function:

    #define FRAMES_PER_BUFFER 32
    // grain length; 1024 samples correspond to 32ms @ 32KHz
    #define GRAIN_LEN 1024
    // length of the tapering slope using 50% overlap
    #define TAPER_LEN 384
    #define GRAIN_STRIDE (GRAIN_LEN - TAPER_LEN)
    
    // tapering slope, from 0 to 1 in TAPER_LEN steps
    static int32_t TAPER[TAPER_LEN] = {...};
    #define BUF_LEN 1024
    #define BUFLEN_MASK (BUF_LEN-1)
    static int16_t buffer[BUF_LEN];
    
    // input index for inserting DMA data
    static uint16_t buf_ix = 0;
    // index to beginning of current grain
    static uint16_t curr_ix = 0;
    // index to beginning of previous grain
    static uint16_t prev_ix = BUF_LEN - GRAIN_STRIDE;
    // index of sample within grain
    static uint16_t grain_m = 0;
    inline static void VoiceEffect(int16_t *pIn, int16_t *pOut, uint16_t size) {
      // put LEFT channel samples to mono buffer
      for (int n = 0; n < size; n += 2) {
        buffer[buf_ix++] = pIn[n];
        buf_ix &= BUFLEN_MASK;
      }
    
      // compute output samples
      for (int n = 0; n < size; n += 2) {
        // sample from current grain
        int16_t y = Resample(grain_m, curr_ix);
        // if we are in the overlap zone, compute sample from previous grain and mix using tapering slope
        if (grain_m < TAPER_LEN) {
          int32_t z = Resample(grain_m + GRAIN_STRIDE, prev_ix) * (0x07FFF - TAPER[grain_m]);
          z += y * TAPER[grain_m];
          y = (int16_t)(z >> 15);
        }
        // put sample into both LEFT and RIGHT output slots
        pOut[n] = pOut[n+1] = y;
        // update index inside grain; if we are at the end of the stride, update buffer indices
        if (++grain_m >= GRAIN_STRIDE) {
          grain_m = 0;
          prev_ix = curr_ix;
          curr_ix = (curr_ix + GRAIN_STRIDE) & BUFLEN_MASK;
        }
      }
    }
    // rate change factor
    static int32_t alpha = (int32_t)(0x7FFF * 2.0 / 3.0);
    
    inline static int16_t Resample(uint16_t m, uint16_t start) {
      // non-integer index
      int32_t t = alpha * (int32_t)m;
      int16_t T = (int16_t)(t >> 15) + (int16_t)start;
      return buffer[T & BUFLEN_MASK];
    }
    #define STOP_TIMER {\
      timer_value_us = 1000 * __HAL_TIM_GET_COUNTER(&htim2) / FRAMES_PER_BUFFER;\
      HAL_TIM_Base_Stop(&htim2); }
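As a usage sketch (assuming a START_TIMER counterpart from the earlier benchmarking setup; if it is not already defined, it could look like the first macro below), the pair wraps the call to the processing function:

// assumed counterpart of STOP_TIMER: reset the counter and start the timer
#define START_TIMER { \
  __HAL_TIM_SET_COUNTER(&htim2, 0); \
  HAL_TIM_Base_Start(&htim2); }

// in the Tx callback, around the processing call:
START_TIMER;
VoiceEffect(pIn, pOut, size);
STOP_TIMER;  // timer_value_us now holds the per-sample processing time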
import numpy as np

def make_taper(W):
    taper = 32767.0 * np.arange(0, W) / W
    print("#define TAPER_LEN {}".format(W))
    print("static int32_t TAPER[TAPER_LEN] = {", end='\n\t')
    for n in range(W - 1):
        print('0x{:04X}, '.format(np.uint16(taper[n])), end='\n\t' if (n + 1) % 12 == 0 else '')
    print('0x{:04X}}};'.format(np.uint16(taper[W - 1])))
    static int32_t TAPER[TAPER_LEN] = {
      0x0000, 0x0055, 0x00AA, 0x00FF, 0x0155, 0x01AA, 0x01FF, 0x0255, 0x02AA, 0x02FF, 0x0355, 0x03AA,
      0x03FF, 0x0455, 0x04AA, 0x04FF, 0x0555, 0x05AA, 0x05FF, 0x0655, 0x06AA, 0x06FF, 0x0755, 0x07AA,
      0x07FF, 0x0855, 0x08AA, 0x08FF, 0x0955, 0x09AA, 0x09FF, 0x0A55, 0x0AAA, 0x0AFF, 0x0B55, 0x0BAA,
      0x0BFF, 0x0C55, 0x0CAA, 0x0CFF, 0x0D55, 0x0DAA, 0x0DFF, 0x0E55, 0x0EAA, 0x0EFF, 0x0F55, 0x0FAA,
      0x0FFF, 0x1055, 0x10AA, 0x10FF, 0x1155, 0x11AA, 0x11FF, 0x1255, 0x12AA, 0x12FF, 0x1355, 0x13AA,
      0x13FF, 0x1455, 0x14AA, 0x14FF, 0x1555, 0x15AA, 0x15FF, 0x1655, 0x16AA, 0x16FF, 0x1755, 0x17AA,
      0x17FF, 0x1855, 0x18AA, 0x18FF, 0x1955, 0x19AA, 0x19FF, 0x1A55, 0x1AAA, 0x1AFF, 0x1B55, 0x1BAA,
      0x1BFF, 0x1C55, 0x1CAA, 0x1CFF, 0x1D55, 0x1DAA, 0x1DFF, 0x1E55, 0x1EAA, 0x1EFF, 0x1F55, 0x1FAA,
      0x1FFF, 0x2055, 0x20AA, 0x20FF, 0x2155, 0x21AA, 0x21FF, 0x2255, 0x22AA, 0x22FF, 0x2355, 0x23AA,
      0x23FF, 0x2455, 0x24AA, 0x24FF, 0x2555, 0x25AA, 0x25FF, 0x2655, 0x26AA, 0x26FF, 0x2755, 0x27AA,
      0x27FF, 0x2855, 0x28AA, 0x28FF, 0x2955, 0x29AA, 0x29FF, 0x2A55, 0x2AAA, 0x2AFF, 0x2B54, 0x2BAA,
      0x2BFF, 0x2C54, 0x2CAA, 0x2CFF, 0x2D54, 0x2DAA, 0x2DFF, 0x2E54, 0x2EAA, 0x2EFF, 0x2F54, 0x2FAA,
      0x2FFF, 0x3054, 0x30AA, 0x30FF, 0x3154, 0x31AA, 0x31FF, 0x3254, 0x32AA, 0x32FF, 0x3354, 0x33AA,
      0x33FF, 0x3454, 0x34AA, 0x34FF, 0x3554, 0x35AA, 0x35FF, 0x3654, 0x36AA, 0x36FF, 0x3754, 0x37AA,
      0x37FF, 0x3854, 0x38AA, 0x38FF, 0x3954, 0x39AA, 0x39FF, 0x3A54, 0x3AAA, 0x3AFF, 0x3B54, 0x3BAA,
      0x3BFF, 0x3C54, 0x3CAA, 0x3CFF, 0x3D54, 0x3DAA, 0x3DFF, 0x3E54, 0x3EAA, 0x3EFF, 0x3F54, 0x3FAA,
      0x3FFF, 0x4054, 0x40AA, 0x40FF, 0x4154, 0x41AA, 0x41FF, 0x4254, 0x42AA, 0x42FF, 0x4354, 0x43AA,
      0x43FF, 0x4454, 0x44AA, 0x44FF, 0x4554, 0x45AA, 0x45FF, 0x4654, 0x46AA, 0x46FF, 0x4754, 0x47AA,
      0x47FF, 0x4854, 0x48AA, 0x48FF, 0x4954, 0x49AA, 0x49FF, 0x4A54, 0x4AAA, 0x4AFF, 0x4B54, 0x4BAA,
      0x4BFF, 0x4C54, 0x4CAA, 0x4CFF, 0x4D54, 0x4DAA, 0x4DFF, 0x4E54, 0x4EAA, 0x4EFF, 0x4F54, 0x4FAA,
      0x4FFF, 0x5054, 0x50AA, 0x50FF, 0x5154, 0x51AA, 0x51FF, 0x5254, 0x52AA, 0x52FF, 0x5354, 0x53AA,
      0x53FF, 0x5454, 0x54AA, 0x54FF, 0x5554, 0x55A9, 0x55FF, 0x5654, 0x56A9, 0x56FF, 0x5754, 0x57A9,
      0x57FF, 0x5854, 0x58A9, 0x58FF, 0x5954, 0x59A9, 0x59FF, 0x5A54, 0x5AA9, 0x5AFF, 0x5B54, 0x5BA9,
      0x5BFF, 0x5C54, 0x5CA9, 0x5CFF, 0x5D54, 0x5DA9, 0x5DFF, 0x5E54, 0x5EA9, 0x5EFF, 0x5F54, 0x5FA9,
      0x5FFF, 0x6054, 0x60A9, 0x60FF, 0x6154, 0x61A9, 0x61FF, 0x6254, 0x62A9, 0x62FF, 0x6354, 0x63A9,
      0x63FF, 0x6454, 0x64A9, 0x64FF, 0x6554, 0x65A9, 0x65FF, 0x6654, 0x66A9, 0x66FF, 0x6754, 0x67A9,
      0x67FF, 0x6854, 0x68A9, 0x68FF, 0x6954, 0x69A9, 0x69FF, 0x6A54, 0x6AA9, 0x6AFF, 0x6B54, 0x6BA9,
      0x6BFF, 0x6C54, 0x6CA9, 0x6CFF, 0x6D54, 0x6DA9, 0x6DFF, 0x6E54, 0x6EA9, 0x6EFF, 0x6F54, 0x6FA9,
      0x6FFF, 0x7054, 0x70A9, 0x70FF, 0x7154, 0x71A9, 0x71FF, 0x7254, 0x72A9, 0x72FF, 0x7354, 0x73A9,
      0x73FF, 0x7454, 0x74A9, 0x74FF, 0x7554, 0x75A9, 0x75FF, 0x7654, 0x76A9, 0x76FF, 0x7754, 0x77A9,
      0x77FF, 0x7854, 0x78A9, 0x78FF, 0x7954, 0x79A9, 0x79FF, 0x7A54, 0x7AA9, 0x7AFF, 0x7B54, 0x7BA9,
      0x7BFF, 0x7C54, 0x7CA9, 0x7CFF, 0x7D54, 0x7DA9, 0x7DFF, 0x7E54, 0x7EA9, 0x7EFF, 0x7F54, 0x7FA9};
    inline static int16_t Resample(uint16_t m, uint16_t start) {
      // non-integer index
      int32_t t = alpha * (int32_t)m;
      // anchor sample
      int16_t T = (int16_t)(t >> 15) + (int16_t)start;
      // fractional part
      int32_t tau = t & 0x07FFF;
      // compute linear interpolation
      int32_t y = (0x07FFF - tau) * buffer[T & BUFLEN_MASK] + tau * buffer[(T+1) & BUFLEN_MASK];
      return (int16_t)(y >> 15);
    }

    Coding the passthrough

    In this section, we will guide you through programming the microcontroller in order to implement the passthrough. Many of the concepts in this section lay the foundations for how to structure and code a real-time audio application on the microcontroller. In later sections we will build more complex processing functions, but the architecture of the code will remain the same.

    In the previous section, you should have copied the blinking LED project before updating the IOC file with CubeMX. From the SW4STM32 software, open the file "Src/main.c" in the new project; we will be making all of our modifications here.

    Macros

In programming a microcontroller, it is customary to define preprocessor macros to set the values of reusable constants and to concisely package simple tasks that do not require much logic or flow control and for which, therefore, a function call would be overkill. See here for more on macros and preprocessor directives when programming in C.

    Macros are usually defined before the main function; we will place our macros between the USER CODE BEGIN Includes and USER CODE END Includes comment tags.

    The MUTE macro

    As an example, we will begin by creating macros to change the logical level of the MUTE pin. As in the blinking LED example, we will be using HAL library calls in order to modify the state of the MUTE GPIO pin.

TASK 1: Complete the two macros below - MUTE and UNMUTE - in order to mute/unmute the output. Simply replace the XXX in the definitions with either GPIO_PIN_SET or GPIO_PIN_RESET, according to whether you need a HIGH or LOW level.

Hint: you should check the datasheet of the DAC to determine whether you need a HIGH or LOW value to turn on the mute function of the DAC.

    Note how the MUTE pin that we configured before automatically generates two constants called MUTE_GPIO_Port and MUTE_Pin, which is why we suggested giving meaningful names to pins configured with the CubeMX tool.

    If you press "Ctrl" ("Command" on MacOS) + click on MUTE_GPIO_Port or MUTE_Pin to see its definition, you should see how the values are defined according to the pin we selected for MUTE. In our case, we chose pin PC0 which means that Pin 0 on the GPIO C port will be used. The convenience of the CubeMX software is that we do not need to manually write these definitions for the constants! The same can be said for LR_SEL.

    The Channel Select macro

    We will now define two more macros in order to assign the MEMS microphone to the left or right channel of the I2S bus, using the LR_SEL pin we defined previously. As before, you should place these macros between the USER CODE BEGIN Includes and USER CODE END Includes comments.

    TASK 2: Define two macros - SET_MIC_RIGHT and SET_MIC_LEFT - in order to assign the microphone to the left or right channel. You will need to use similar commands as for the MUTE macros.

Hint: you should check the datasheet of the microphone (and perhaps the I2S protocol) to determine whether you need a HIGH or LOW value to set the microphone to the left/right channel.

    Private variables (aka Constants)

    In most applications we will need to set some numerical constants that define key parameters used in the application.

These definitions are also preprocessor macros and they are usually grouped together at the beginning of the code between the USER CODE BEGIN PV and USER CODE END PV comment tags.

    We will now define a few constants which will be useful in coding our application. Before defining them in our code, let's clarify some of the terminology:

    1. Sample: a sample is a single discrete-time value; for a stereo signal, a sample can belong either to the left or right channel.

    2. Frame: a frame collects all synchronous samples from all channels. For a stereo signal, a frame will contain two samples, left and right.

3. Buffer: a buffer is a collection of frames, stored in memory and ready for processing (or ready for a DMA transfer). The buffer's length is a key parameter that needs to be fine-tuned to the demands of our application, as we explained before.
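With 16-bit samples, the frames of a stereo buffer are stored interleaved, one channel sample after the other; here is a minimal sketch (our own helper, purely illustrative, assuming the LEFT sample comes first in each frame) of how frame i is addressed:

#include <stdint.h>

// extract frame i from an interleaved stereo buffer
static inline void GetFrame(const int16_t *buf, uint16_t i,
                            int16_t *left, int16_t *right) {
  *left = buf[2 * i];       // first sample of frame i
  *right = buf[2 * i + 1];  // second sample of frame i
}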

    Audio Parameters

    Add the following lines to define the frame length (in terms of samples) and the buffer length (in terms of frames):

    SAMPLES_PER_FRAME is set to 2 as we have two input channels (left and right) as per the I2S protocol.

Since our application is a simple passthrough, which involves no processing, we can set the buffer length - FRAMES_PER_BUFFER - to a low value, e.g. 32 (at a 32KHz sampling rate, that is just 1ms of audio per half-buffer).

    Data buffers

Again, as explained in Lecture 2.2.5b of the second DSP module, for real-time processing we normally need to use alternating buffers for input and output DMA transfers. The I2S peripheral of our microcontroller, however, conveniently sends two interrupt signals, one when the buffer is half-full and one when the buffer is full. Because of this feature, we can simply use an array that is twice the size of our target application's buffer and let the DMA transfer fill one half of the buffer while we simultaneously process the samples in the other half.

    TASK 3: Using the constants defined before - SAMPLES_PER_FRAME and FRAMES_PER_BUFFER - define two more constants for the buffer size and for the size of the double buffer. Just replace the ellipsis in the macros below with the appropriate expressions.

    Finally, we can create the input and output buffers as such:

    Private function prototypes

    In this section we will declare the function prototypes that implement the final application. The code should be placed between the USER CODE BEGIN PFP and USER CODE END PFP comment tags.

    Main processing function

    Ultimately, the application will work by obtaining a fresh data buffer filled by the input DMA transfer, processing the buffer and placing the result in a data buffer for the output DMA to ship out. We will therefore implement a main processing function with the following arguments:

    1. a pointer to the input buffer to process

    2. a pointer to the output buffer to fill with the processed samples

    3. the number of samples to read/write.

    The resulting function prototype is:

    This will be the main processing function which will be invoked by the interrupts raised by the DMA transfer every time either the first or the second half of the buffer has been filled.

    DMA callback functions

    As previously mentioned, the STM32 board uses DMA to transfer data in and out of memory from the peripherals and issues interrupts when the DMA buffer is half full and when it's full.

The HAL family of instructions allows us to define callback functions triggered by these interrupts. Add the following function definitions for the callbacks, covering the four cases of two input and output DMAs times two interrupt signals:

Note that the Rx callbacks (that is, the callbacks triggered by the input DMA) have an empty body and only the Tx callbacks (that is, the ones driven by the output process) perform the processing via our Process function.

    This is a simple but effective way of synchronizing the input and the output peripherals when we know that the data throughput should be the same for both devices. Of course we can see that if the process function takes too long, the buffer will not be ready in time for the next callback and there will be audio losses. In the next chapter, we will introduce a mechanism to monitor this.
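As an illustration of the problem (just a sketch, not the monitoring mechanism of the next chapter), one could flag an overrun by checking whether the previous call has completed when a new callback fires; the in_process flag below is a name we introduce for this example only:

// sketch only: detect a processing overrun in the half-transfer callback
static volatile uint8_t in_process = 0;

void HAL_I2S_TxHalfCpltCallback(I2S_HandleTypeDef *hi2s) {
  if (in_process) {
    // previous half-buffer not processed in time: audio will glitch
    // (e.g. turn on an error LED here)
  }
  in_process = 1;
  Process(dma_in, dma_out, HALF_BUFFER_SIZE);
  in_process = 0;
}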

    You can read more about the HAL functions for DMA Input/Output for the I2S protocol in the comments of the file "Drivers/STM32F0XX_HAL_Driver/Src/stm32f0xx_hal_i2s.c" from the SW4STM32 software:

    The user application

    Between the USER CODE BEGIN 4 and USER CODE END 4 comment tags, we will define the body of the process function which, in this case, implements a simple passthrough.

    TASK 4: Complete the main processing function which simply copies the input to the output buffer.

    Initial Setup

    Between the USER CODE BEGIN 2 and USER CODE END 2 comment tags, we need to initialize our STM32 board, namely we need to:

1. un-mute the DAC using the macro defined before.

2. set the microphone to either the left or right channel using the macro defined above.

    3. start the receive and transmit DMAs with HAL_I2S_Receive_DMA and HAL_I2S_Transmit_DMA respectively.

    This is accomplished by the following lines:

    We can now try building and debugging the project (remember to press Resume after entering the Debug perspective). If all goes well, you should have a functioning passthrough and you should be able to hear in the headphones the sound captured by the microphone.

    Going a bit further

If you still have time and you are curious to go a bit further, we propose a modification to the Process function. In the current implementation, since the input is mono and the output is stereo, you may have noticed that only one output channel carries the audio while the other is silent. Wouldn't it be nice if both had audio, thereby converting the mono input to a stereo output?

BONUS: Modify the Process function so that both output channels contain audio.

    Note: remember to copy your project before making any significant modifications; that way you will always be able to go back to a stable solution!

Congrats on completing the passthrough! This project will serve as an extremely useful starting point for the following (more interesting) applications. The first one we will build is an alien voice effect. But first, let's talk about some key issues in real-time DSP programming.

    Solutions

    Are you sure you are ready to see the solution? ;)

    Here you are asked to modify the macros and change the string

    to be either

    or

Table 6 in section 8.6.3 of the DAC datasheet says: LOW = mute off, HIGH = mute on. We will thus define the following macros:

In the same way as we did for the DAC, we look in the microphone datasheet. The information we need is on page 6: The Tri-state Control (gray) uses the state of the WS and SELECT inputs to determine if the DATA pin is driven or tri-stated. This allows 2 microphones to operate on a single I2S port. When SELECT=HIGH the DATA pin drives the SDIN bus when WS=HIGH otherwise DATA=tri-state. When SELECT=LOW the DATA pin drives the SDIN bus when WS=LOW otherwise DATA=tri-state. As the WS pin is LOW when the left channel is transmitted (cf. fig. 5 of the DAC datasheet), we define the macros as follows:

The arithmetic is quite simple here; a quick recap:

    • a sample is "a value at a certain time for one channel"

    • a frame is "the package of a left and a right sample"

Thus, in our case, the buffer has length SAMPLES_PER_FRAME x FRAMES_PER_BUFFER; since every sample is 16 bits (one half-word), a buffer will be 32 x 2 = 64 half-words long.

    The double buffer size is then 128 values.

The pass-through is implemented by copying the input buffer onto the output buffer. This is done like so:

    There are always several ways to achieve the same goal in C. Here is a possible solution:

    In the code, we first check the GPIO pin to see which channel the microphone has been assigned to and use the value to offset the input pointer to the first audio sample. Then we simply copy the same audio sample in two consecutive output samples.

    /* USER CODE BEGIN PV */
    #define SAMPLES_PER_FRAME 2      
    #define FRAMES_PER_BUFFER 32     
    
    #define HALF_BUFFER_SIZE (FRAMES_PER_BUFFER * SAMPLES_PER_FRAME)
    #define FULL_BUFFER_SIZE (2 * HALF_BUFFER_SIZE)
    void inline Process(int16_t *pIn, int16_t *pOut, uint16_t size) {
      for (uint16_t i = 0; i < size; i++)
        *pOut++ = *pIn++;
    }
    void inline Process(int16_t *pIn, int16_t *pOut, uint16_t size) {
        // if using the RIGHT channel, advance the input pointer
        if (HAL_GPIO_ReadPin(LR_SEL_GPIO_Port, LR_SEL_Pin) == GPIO_PIN_SET)
            pIn++;
    
      // advance by two now, since we're duplicating the input
      for (uint16_t i = 0; i < size; i += 2) {
        *pOut++ = *pIn;
        *pOut++ = *pIn;
        pIn += 2;
      }
    }
    #define MUTE HAL_GPIO_WritePin(MUTE_GPIO_Port, MUTE_Pin, XXX);
    #define UNMUTE HAL_GPIO_WritePin(MUTE_GPIO_Port, MUTE_Pin, XXX);
    #define SAMPLES_PER_FRAME 2   /* stereo signal */
    #define FRAMES_PER_BUFFER 32  /* user-defined */
    #define HALF_BUFFER_SIZE (...)
    #define FULL_BUFFER_SIZE (...)
    int16_t dma_in[FULL_BUFFER_SIZE];
    int16_t dma_out[FULL_BUFFER_SIZE];
    void Process(int16_t *pIn, int16_t *pOut, uint16_t size);
    void HAL_I2S_RxHalfCpltCallback(I2S_HandleTypeDef *hi2s) {
    }
    
    void HAL_I2S_RxCpltCallback(I2S_HandleTypeDef *hi2s) {
    }
    
    void HAL_I2S_TxHalfCpltCallback(I2S_HandleTypeDef *hi2s) {
      Process(dma_in, dma_out, HALF_BUFFER_SIZE);
    }
    
    void HAL_I2S_TxCpltCallback(I2S_HandleTypeDef *hi2s) {
      Process(dma_in + HALF_BUFFER_SIZE, dma_out + HALF_BUFFER_SIZE, HALF_BUFFER_SIZE);
    }
    /* 
    ...
    *** DMA mode IO operation ***
    ==============================
    [..] 
    (+) Send an amount of data in non blocking mode (DMA) using HAL_I2S_Transmit_DMA() 
    (+) At transmission end of half transfer HAL_I2S_TxHalfCpltCallback is executed and user can 
    add his own code by customization of function pointer HAL_I2S_TxHalfCpltCallback 
    (+) At transmission end of transfer HAL_I2S_TxCpltCallback is executed and user can 
    add his own code by customization of function pointer HAL_I2S_TxCpltCallback
    (+) Receive an amount of data in non blocking mode (DMA) using HAL_I2S_Receive_DMA() 
    (+) At reception end of half transfer HAL_I2S_RxHalfCpltCallback is executed and user can 
    add his own code by customization of function pointer HAL_I2S_RxHalfCpltCallback 
    (+) At reception end of transfer HAL_I2S_RxCpltCallback is executed and user can 
    add his own code by customization of function pointer HAL_I2S_RxCpltCallback
    (+) In case of transfer Error, HAL_I2S_ErrorCallback() function is executed and user can 
    add his own code by customization of function pointer HAL_I2S_ErrorCallback
    (+) Pause the DMA Transfer using HAL_I2S_DMAPause()
    (+) Resume the DMA Transfer using HAL_I2S_DMAResume()
    (+) Stop the DMA Transfer using HAL_I2S_DMAStop()
    ...
    */
    void inline Process(int16_t *pIn, int16_t *pOut, uint16_t size) {
      // copy input to output
      ...
    }
    // Control of the codec
    UNMUTE
    SET_MIC_LEFT
    
    // Start DMAs
    HAL_I2S_Transmit_DMA(&hi2s1, (uint16_t*) dma_out, FULL_BUFFER_SIZE);
    HAL_I2S_Receive_DMA(&hi2s2, (uint16_t*) dma_in, FULL_BUFFER_SIZE);
    GPIO_PIN_SET_OR_RESET
    GPIO_PIN_SET
    GPIO_PIN_RESET
    /* USER CODE BEGIN Includes */
    
    #define MUTE HAL_GPIO_WritePin(MUTE_GPIO_Port, MUTE_Pin, GPIO_PIN_SET);
    #define UNMUTE HAL_GPIO_WritePin(MUTE_GPIO_Port, MUTE_Pin, GPIO_PIN_RESET);
    
    /* USER CODE END Includes */
    /* USER CODE BEGIN Includes */
    
    #define MUTE HAL_GPIO_WritePin(MUTE_GPIO_Port, MUTE_Pin, GPIO_PIN_SET);
    #define UNMUTE HAL_GPIO_WritePin(MUTE_GPIO_Port, MUTE_Pin, GPIO_PIN_RESET);
    
    #define SET_MIC_RIGHT HAL_GPIO_WritePin(LR_SEL_GPIO_Port, LR_SEL_Pin, GPIO_PIN_SET);
    #define SET_MIC_LEFT HAL_GPIO_WritePin(LR_SEL_GPIO_Port, LR_SEL_Pin, GPIO_PIN_RESET);
    
    /* USER CODE END Includes */