In the previous section we implemented a basic granular synthesis voice transformer that lowers the pitch of the input voice. In this section we will address some remaining issues, namely:
- implement an effect that raises the pitch of the voice (aka the "Chipmunks" effect)
- properly initialize the buffer as a function of the pitch change
- optimize the code a little more
The Chipmunks
To raise the pitch of the voice we need to set α to values larger than one. As we have seen, this makes the effect noncausal, which we need to address by introducing some processing delay.
The way to achieve this is to place the audio buffer's input index ahead of the output index, so that the samples the output needs have already been written when they are read. Let's do this properly by creating an initialization function for the buffer that takes the resampling factor as its input.
TASK 1: Determine the proper initial value for buf_ix when α>1 in the function below.
We can use the blue button on the Nucleo board to switch between Darth Vader and the Chipmunks; to do so, define the following constants at the beginning of the code
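The constants themselves are not reproduced here. As a minimal sketch, assuming the two effects differ only in their resampling factor, the definitions and a toggle helper might look like the following. The names DARTH_VADER, CHIPMUNKS and ToggleEffect(), as well as the values 2/3 and 3/2, are illustrative assumptions and not taken from the original code.

```c
#include <stdbool.h>

// Hypothetical resampling factors (assumed values, pick your own):
// alpha < 1 lowers the pitch, alpha > 1 raises it
#define DARTH_VADER (2.0f / 3.0f)
#define CHIPMUNKS   (3.0f / 2.0f)

// true while the Chipmunks effect is selected
static bool chipmunks_on = false;

// Flip the selected effect and return the new resampling factor.
// Intended to be called once per press of the blue button; the buffer
// should then be re-initialized for the new factor.
static float ToggleEffect(void) {
  chipmunks_on = !chipmunks_on;
  return chipmunks_on ? CHIPMUNKS : DARTH_VADER;
}
```

On the board, the returned factor would be passed to the buffer initialization function each time a button press is detected.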
In the main processing loop we perform two checks on the value of grain_m for every output sample. However, in the current implementation, both the stride and the taper lengths are multiples of the size of the DMA half-buffer. This allows us to move these checks outside of the processing loop and perform them once per call rather than once per sample.
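Since the optimization relies entirely on this divisibility, it may be worth guarding it at compile time. The sketch below does so with C11 `_Static_assert` and adds a small simulation that counts boundary crossings happening in the middle of a call. The numeric values and the name SAMPLES_PER_CALL (the number of mono samples handled per call) are assumptions chosen for illustration; replace them with the project's actual values.

```c
// Hypothetical sizes, for illustration only
#define SAMPLES_PER_CALL 16    // mono samples processed per call (assumed)
#define TAPER_LEN        384   // assumed taper length
#define GRAIN_STRIDE     1024  // assumed grain stride

// If either check fails, grain_m could cross a boundary in the middle of
// a call and per-call tests would no longer match per-sample tests.
_Static_assert(TAPER_LEN % SAMPLES_PER_CALL == 0,
               "taper length must be a multiple of the samples per call");
_Static_assert(GRAIN_STRIDE % SAMPLES_PER_CALL == 0,
               "grain stride must be a multiple of the samples per call");

// Count how often grain_m reaches TAPER_LEN or GRAIN_STRIDE strictly
// inside a call rather than on its last sample.
static int CountMidCallCrossings(int num_calls) {
  int grain_m = 0, crossings = 0;
  for (int c = 0; c < num_calls; c++) {
    for (int k = 0; k < SAMPLES_PER_CALL; k++) {
      grain_m++;
      int at_boundary = (grain_m == TAPER_LEN) || (grain_m == GRAIN_STRIDE);
      if (at_boundary && k != SAMPLES_PER_CALL - 1) crossings++;
      if (grain_m == GRAIN_STRIDE) grain_m = 0;
    }
  }
  return crossings;
}
```

With sizes that satisfy the two assertions, the count stays at zero, which is the property the once-per-call checks depend on.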
TASK 2: Modify the VoiceEffect() function to reduce the number of if statements per call. Benchmark the result and observe the change in performance.
Are you ready to see the answers? :)
We have seen in the previous section that the maximum displacement between current output index and needed input index is D=(α−1)L. Since this value can be non-integer, we round it up to the nearest integer value:
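As a minimal sketch of this rounding, the helper below computes the initial input index for a given resampling factor, assuming the output index starts at zero. The name InitialBufIx and its parameters are hypothetical; L is passed in explicitly because its definition is not shown in this section.

```c
#include <stdint.h>

// Hypothetical helper: initial input (write) index for resampling factor
// alpha, where L is the length appearing in D = (alpha - 1) * L.
// For alpha <= 1 the effect is causal and no lead is needed.
static uint16_t InitialBufIx(float alpha, uint16_t L) {
  if (alpha <= 1.0f) return 0;
  float d = (alpha - 1.0f) * (float)L;  // maximum displacement D
  uint16_t n = (uint16_t)d;             // truncate toward zero
  if ((float)n < d) n++;                // round up when D is not an integer
  return n;
}
```

For example, with alpha = 1.5 and L = 1024 the input index starts 512 samples ahead of the output index. The result is meant to be used as the initial value of buf_ix and must stay below the buffer length.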
Since the DMA transfer size is an exact divisor of both grain stride and taper length, the boundaries that we check grain_m against can only be crossed at the end of a function call. We can therefore rewrite the function like so:
inline static void VoiceEffect(int16_t *pIn, int16_t *pOut, uint16_t size) {
  // store the incoming samples in the circular buffer (one channel only)
  for (int n = 0; n < size; n += 2) {
    buffer[buf_ix++] = pIn[n];
    buf_ix &= BUFLEN_MASK;
  }

  if (grain_m < TAPER_LEN) {
    // we are inside the tapering slope: cross-fade previous and current grain
    for (int n = 0; n < size; n += 2) {
      int32_t z = Resample(grain_m + GRAIN_STRIDE, prev_ix) * (0x07FFF - TAPER[grain_m]);
      z += Resample(grain_m, curr_ix) * TAPER[grain_m];
      pOut[n] = pOut[n+1] = (int16_t)(z >> 15);
      grain_m++;  // advance inside the grain, as in the non-tapered branch
    }
  } else {
    for (int n = 0; n < size; n += 2)
      pOut[n] = pOut[n+1] = Resample(grain_m++, curr_ix);
  }

  // end of stride?
  if (grain_m >= GRAIN_STRIDE) {
    grain_m = 0;
    prev_ix = curr_ix;
    curr_ix = (curr_ix + GRAIN_STRIDE) & BUFLEN_MASK;
  }
}
With this implementation, the computational cost oscillates between 4.4 μs and 7.8 μs per sample. Compared with the previous version, this is a saving of almost one microsecond per sample or, equivalently, a performance increase of at least 9%.
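To put these timings in perspective, they can be compared with the time available per sample, which is the sampling period. The sketch below does that arithmetic; the function name LoadPercent is hypothetical, and the 32 kHz rate used in the example afterwards is an assumption, since the sampling rate is not stated in this section.

```c
// Share of the available time that is used, in percent, when each sample
// costs cost_us microseconds and samples arrive at fs_hz per second.
static float LoadPercent(float cost_us, float fs_hz) {
  float period_us = 1.0e6f / fs_hz;  // time budget per sample
  return 100.0f * cost_us / period_us;
}
```

For instance, assuming a 32 kHz sampling rate, the period is 31.25 μs, so the worst-case cost of 7.8 μs corresponds to roughly 25% of the available time.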