AudioFrame proposal: Reference external buffer  #205

@microbit-carlos

Many of the open issues discussing enhancements to the parts of the API that use AudioFrames could be resolved by using memoryview. However, a memoryview cannot be played or recorded into, because the relevant functions also need a rate attached to the data.

So if AudioFrame behaved a bit more like memoryview, specifically when using slices, we could easily achieve a lot of the discussed functionality without additional unnecessary memory copies.

Proposal: AudioFrames to be able to reference external buffers

  • An AudioFrame created from the constructor, or from microphone.record(), would allocate its own buffer.
  • Each AudioFrame would also contain "start" and "end" markers (or a buffer pointer and a length)
    • These markers are an implementation detail and invisible to the user
    • This is similar to how memoryview can point to other buffers
    • Exposing the start and end markers can be tempting, but it can make AudioFrames harder to understand, and it's not clear how far they could be moved. As there isn't a way to retrieve the real start and end of the referenced buffer, these markers could only be used to shrink the AudioFrame, not to grow it
  • AudioFrames generated from slices would reference the buffer of the original AudioFrame, with their own internal start and end markers
  • AudioFrame.copy() does make a copy of the buffer
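
The behaviour in the list above can be modelled in plain Python. This is only a minimal sketch, not the micro:bit implementation: FrameModel, its _start/_end fields and the default rate are hypothetical, and a bytearray stands in for the real audio buffer.

```python
class FrameModel:
    """Hypothetical stand-in for an AudioFrame that may reference an external buffer."""

    def __init__(self, size=0, rate=7812, _buffer=None, _start=0, _end=None):
        # A frame built by the constructor allocates its own buffer;
        # a frame built from a slice reuses the parent's buffer.
        self._buffer = bytearray(size) if _buffer is None else _buffer
        self._start = _start
        self._end = len(self._buffer) if _end is None else _end
        self.rate = rate

    def __len__(self):
        return self._end - self._start

    def _index(self, index):
        # Bounds are checked against the frame, not the underlying buffer.
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("FrameModel index out of range")
        return index

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Resolve the slice against this frame's length, then shift it
            # by the hidden start marker. Only step 1 is modelled here.
            start, stop, step = key.indices(len(self))
            if step != 1:
                raise ValueError("only step 1 slices are modelled")
            stop = max(stop, start)
            return FrameModel(rate=self.rate, _buffer=self._buffer,
                              _start=self._start + start,
                              _end=self._start + stop)
        return self._buffer[self._start + self._index(key)]

    def __setitem__(self, index, value):
        self._buffer[self._start + self._index(index)] = value

    def copy(self):
        # copy() is the one operation that duplicates the data
        # (slicing a bytearray returns a new bytearray).
        return FrameModel(rate=self.rate,
                          _buffer=self._buffer[self._start:self._end])


original = FrameModel(size=1024)
tail = original[512:]      # shares original's buffer
tail[0] = 255              # visible through original[512]
detached = tail.copy()     # independent buffer
detached[0] = 1            # does not affect original
```

Note that a slice of a slice still points at the original buffer, so the markers can only narrow the visible range, as described above.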

Disadvantages

It might come as a surprise to a user that modifying a slice can change the original AudioFrame:

original_af = audio.AudioFrame(size=1024)
new_af = original_af[512:]
new_af[0] = 255    # This also changes original_af[512] to 255
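
For comparison, the same aliasing already happens with memoryview in standard Python, which is the behaviour the proposal borrows. A minimal sketch, with a bytearray standing in for the AudioFrame's buffer:

```python
original = bytearray(1024)           # stands in for the AudioFrame's buffer
view = memoryview(original)[512:]    # no copy: the view shares the buffer
view[0] = 255                        # writes through to the original
print(original[512])                 # prints 255
```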

Alternative

We could have a new class that is essentially a memoryview, but which can also point to the rate of the original AudioFrame. This has the advantage of making it a lot more obvious that we are not dealing with a new AudioFrame with its own copy of the data.

Because getting a different class instance from a slice is a bit weird, rather than use slices we could use a method call.
For example:

audio_frame = audio.AudioFrame(size=1000)
first_half = audio_frame.track(end=500)
second_half = audio_frame.track(start=500)
middle_half = audio_frame.track(start=250, end=750)
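
One way this alternative could look is sketched below in plain Python. RecordingModel and TrackModel are hypothetical names used only for illustration, a bytearray stands in for the audio data, and the default rate is an assumption; nothing here is an existing micro:bit API.

```python
class RecordingModel:
    """Hypothetical stand-in for the AudioFrame that owns the data."""

    def __init__(self, size, rate=7812):
        self.data = bytearray(size)
        self.rate = rate

    def __len__(self):
        return len(self.data)

    def track(self, start=0, end=None):
        # Returns a view object rather than a new frame, so the
        # different type signals that the data is shared.
        end = len(self.data) if end is None else end
        return TrackModel(self, memoryview(self.data)[start:end])


class TrackModel:
    """Hypothetical memoryview-like class that also knows the parent's rate."""

    def __init__(self, parent, view):
        self._parent = parent
        self._view = view

    @property
    def rate(self):
        # The rate is read from the parent, so a change made there
        # is reflected in every track.
        return self._parent.rate

    def __len__(self):
        return len(self._view)

    def __getitem__(self, index):
        return self._view[index]

    def __setitem__(self, index, value):
        self._view[index] = value


audio_frame = RecordingModel(size=1000)
middle_half = audio_frame.track(start=250, end=750)
middle_half[0] = 42          # writes to audio_frame.data[250]
audio_frame.rate = 11000     # visible through middle_half.rate
```

Reading the rate through the parent, rather than copying it, is a design choice made for this sketch; the text above only says the new class can point to the original rate.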

AudioFrame nomenclature

As we consider an "AudioTrack" being created from an AudioFrame, it's becoming more obvious that the AudioFrame name doesn't quite fit the current implementation. As a "frame" is generally small, deriving a "track" out of it doesn't make much sense. The original intent of grouping multiple frames to create longer audio makes more sense than the current implementation, where a single frame can last several seconds.

Perhaps we should leave AudioFrame as it was implemented in V1, and rename the current expanded version to something along the lines of "AudioRecording" (it could be something different, not necessarily related to recording from the microphone), for which having multiple "tracks" would make more sense.

Use cases

Copying multiple chunks of data into a single AudioFrame

There isn't slice assignment on AudioFrame, bytearray, or memoryview, and AudioFrame.copyfrom() always copies data to the beginning of the AudioFrame. So, we have to go byte by byte:

Before

af = audio.AudioFrame(size=(sum([len(c) for c in chunks])))
i = 0
for chunk in chunks:
    for byte in chunk:
        af[i] = byte
        i += 1

After, new AudioFrame

This allows us to copy full chunks in one operation, instead of byte by byte.

af = audio.AudioFrame(size=(sum([len(c) for c in chunks])))
i = 0
for chunk in chunks:
    small_af = af[i:]
    small_af.copyfrom(chunk)
    i += len(chunk)

After, slice assignment

Slice assignment might not be that obvious to novice programmers, but could be an even more succinct option.

af = audio.AudioFrame(size=(sum([len(c) for c in chunks])))
i = 0
for chunk in chunks:
    af[i:i+len(chunk)] = chunk
    i += len(chunk)

Break down AudioFrame into smaller chunks

The best method for this currently is to use a memoryview (could also create a bytes object from the AudioFrame and slice it, but memoryview saves copying the data):

Before

af = audio.AudioFrame(duration=1000)
m = memoryview(af)
for i in range(0, len(m), PACKET_SIZE):
    radio.send_bytes(m[i:i+PACKET_SIZE])

After

With this approach we could use slices directly on the AudioFrame without creating unnecessary copies:

af = audio.AudioFrame(duration=1000)
for i in range(0, len(af), PACKET_SIZE):
    radio.send_bytes(af[i:i+PACKET_SIZE])
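
In either version the last slice can be shorter than PACKET_SIZE when the length is not an exact multiple of it. A minimal sketch of the chunking in standard Python, where iter_packets is a hypothetical helper, the value of PACKET_SIZE is assumed, and a bytearray stands in for the AudioFrame data:

```python
def iter_packets(buffer, packet_size):
    """Yield consecutive packet_size views of buffer without copying the data."""
    view = memoryview(buffer)
    for i in range(0, len(view), packet_size):
        # Slicing past the end is clamped, so the last packet may be shorter.
        yield view[i:i + packet_size]


PACKET_SIZE = 32                       # assumed value for illustration
data = bytearray(range(100))           # stands in for the AudioFrame data
sizes = [len(p) for p in iter_packets(data, PACKET_SIZE)]
print(sizes)                           # prints [32, 32, 32, 4]
```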

Playing an AudioFrame from an arbitrary position

As a memoryview cannot be played directly, and an AudioFrame is always played from the beginning, we need to create a new AudioFrame that starts from the point we'd like to play back from.

Before

original_af = microphone.record(1000)
memoryview_af = memoryview(original_af)
shorter_af = audio.AudioFrame(duration=500)
shorter_af.copyfrom(memoryview_af[500:])
audio.play(shorter_af)

After

original_af = microphone.record(1000)
audio.play(original_af[500:])

Playing just a portion of the AudioFrame

This works fine in the current implementation. The only caveat is that the most common way of timing it would be sleep() (rather than measuring time with time.ticks_ms()), and the CODAL uBit.sleep() has a resolution of 4 ms, plus any extra overhead from calling functions, so it might not be very accurate.

Before

af = microphone.record(2000)
audio.play(af, wait=False)
sleep(1000)
audio.stop()

After

This should play for exactly the specified time:

af = microphone.record(2000)
audio.play(af[:len(af)//2])
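
Slice indices are sample positions and must be integers, so a duration has to be converted to a sample count first. A rough sketch of that arithmetic, assuming one byte per sample and a rate of 7812 samples per second (both assumptions for illustration); samples_for_ms is a hypothetical helper:

```python
def samples_for_ms(rate, milliseconds):
    """Number of samples covering the given duration at rate samples per second."""
    return rate * milliseconds // 1000


RATE = 7812   # assumed sampling rate in samples per second
af_len = samples_for_ms(RATE, 2000)      # length of a 2000 ms recording
one_second = samples_for_ms(RATE, 1000)  # slice end index for the first second
jitter = samples_for_ms(RATE, 4)         # samples covered by a 4 ms sleep tick
print(af_len, one_second, jitter)        # prints 15624 7812 31
```

Under these assumptions, a 4 ms timing error with sleep() corresponds to about 31 samples, whereas a slice ends on an exact sample boundary.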
