Description
Many of the open issues discussing enhancements to the APIs that use AudioFrames could be resolved by using memoryview. However, a memoryview cannot be played or recorded into, because the relevant functions also need a sample rate attached to the data.
So if AudioFrame behaved a bit more like memoryview, specifically when slicing, we could achieve a lot of the discussed functionality without unnecessary memory copies.
Proposal: AudioFrames to be able to reference external buffers
- An AudioFrame created from the constructor, or from microphone.record(), would allocate its own buffer.
- Each AudioFrame would also contain "start" and "end" markers (or a buffer pointer and a length).
  - These markers are an implementation detail and invisible to the user.
  - This is similar to how memoryview can point to other buffers.
  - Exposing the start and end markers can be tempting, but it can make AudioFrames harder to understand, and it's also not clear how far the markers could be moved. E.g. as there isn't a way to retrieve the real start and end of the referenced buffer, these markers could only be used to shrink the AudioFrame and not to grow it.
- AudioFrames generated from slices would reference the buffer of the original AudioFrame, and only change their own internal start and end markers.
- AudioFrame.copy() does make a copy of the buffer.
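To make the shared-buffer semantics concrete, here is a minimal sketch in plain Python. The class name SharedAudioFrame, its private _view attribute and the default rate value are hypothetical and only for illustration; the real change would be made in the C implementation of AudioFrame.

```python
class SharedAudioFrame:
    """Hypothetical stand-in for the proposed AudioFrame (illustration only)."""

    def __init__(self, size=32, rate=7812, _view=None):
        # A frame built by the constructor owns a fresh buffer; a frame built
        # from a slice receives a view into its parent's buffer instead.
        self._view = memoryview(bytearray(size)) if _view is None else _view
        self.rate = rate  # assumed default, carried over to every slice

    def __len__(self):
        return len(self._view)

    def __getitem__(self, index):
        if isinstance(index, slice):
            if index.step not in (None, 1):
                raise ValueError("only contiguous slices are supported")
            # Slicing the memoryview moves the hidden start/end markers
            # without copying any samples.
            return SharedAudioFrame(rate=self.rate, _view=self._view[index])
        return self._view[index]

    def __setitem__(self, index, value):
        self._view[index] = value

    def copyfrom(self, data):
        # Writes bytes-like data starting at the beginning of this frame.
        n = min(len(data), len(self._view))
        self._view[:n] = bytes(data[:n])

    def copy(self):
        # copy() is the explicit way to get an independent buffer.
        return SharedAudioFrame(
            rate=self.rate, _view=memoryview(bytearray(self._view))
        )


original = SharedAudioFrame(size=1024)
tail = original[512:]          # shares the buffer with original
tail[0] = 255                  # also visible as original[512]
independent = original.copy()  # owns a separate buffer
independent[0] = 1             # original[0] is unaffected
```

Here tail behaves like a full frame (it has a length and a rate), so it could be passed to anything that accepts an AudioFrame, while independent shows that an explicit copy is still available when sharing is not wanted.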
Disadvantages
It might come as a surprise to a user that modifying a slice can change the original AudioFrame:
original_af = audio.AudioFrame(size=1024)
new_af = original_af[512:]
new_af[0] = 255  # This also changes original_af[512] to 255
Alternative
We could have a new class that is essentially a memoryview, which can also point to the rate of the original AudioFrame. This has the advantage of making it a lot more obvious that we are not dealing with a new AudioFrame with its own copy of the data.
Because getting a different class instance from a slice is a bit weird, rather than use slices we could use a method call.
For example:
audio_frame = audio.AudioFrame(size=1000)
first_half = audio_frame.track(end=500)
second_half = audio_frame.track(start=500)
middle_half = audio_frame.track(start=250, end=750)
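A rough sketch of how this alternative could behave, again in plain Python. The names AudioTrack and FrameWithTracks, the data attribute and the default rate are all hypothetical; only the track(start=..., end=...) call shape comes from the example above.

```python
class AudioTrack:
    """Hypothetical view over part of a frame's buffer (illustration only)."""

    def __init__(self, parent, start, end):
        self._parent = parent
        # A memoryview slice references the parent's buffer without copying.
        self._view = memoryview(parent.data)[start:end]

    @property
    def rate(self):
        # The rate is looked up on the parent each time, so the track
        # "points to" the parent's rate instead of storing its own copy.
        return self._parent.rate

    def __len__(self):
        return len(self._view)

    def __getitem__(self, index):
        return self._view[index]

    def __setitem__(self, index, value):
        self._view[index] = value


class FrameWithTracks:
    """Hypothetical frame type exposing track() instead of slicing."""

    def __init__(self, size, rate=7812):
        self.data = bytearray(size)
        self.rate = rate  # assumed default rate

    def track(self, start=0, end=None):
        if end is None:
            end = len(self.data)
        return AudioTrack(self, start, end)


audio_frame = FrameWithTracks(size=1000)
middle_half = audio_frame.track(start=250, end=750)
middle_half[0] = 7        # writes through to audio_frame.data[250]
audio_frame.rate = 11000  # the track reports the new rate as well
```

Because track() returns a visibly different type, it is harder to mistake the result for an independent AudioFrame, which is the main argument for this alternative.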
AudioFrame nomenclature
As we consider an "AudioTrack" being created from an AudioFrame, it's becoming more obvious that the AudioFrame name doesn't quite fit the current implementation. As a "frame" is generally small, deriving a "track" out of it doesn't make that much sense. The original intent of grouping multiple frames to create longer audio makes more sense than the current implementation of having frames taking several seconds.
Perhaps we should leave AudioFrame as it was implemented in V1, and rename the current expanded version to something along the lines of "AudioRecording" (it could be something different, maybe not directly related to recording from the microphone), for which having multiple "tracks" would make more sense.
Use cases
Copying multiple chunks of data into a single AudioFrame
There isn't slice assignment on AudioFrame, bytearray, nor memoryview, and AudioFrame.copyfrom() always writes starting at the beginning of the destination AudioFrame. So, we have to go byte by byte:
Before
af = audio.AudioFrame(size=sum(len(c) for c in chunks))
i = 0
for chunk in chunks:
    for byte in chunk:
        af[i] = byte
        i += 1
After, new AudioFrame
This allows us to copy full chunks in one operation, instead of byte by byte.
af = audio.AudioFrame(size=sum(len(c) for c in chunks))
i = 0
for chunk in chunks:
    small_af = af[i:]         # references af's buffer, no copy
    small_af.copyfrom(chunk)  # writes into af starting at offset i
    i += len(chunk)
After, slice assignment
Slice assignment might not be that obvious to novice programmers, but could be an even more succinct option.
af = audio.AudioFrame(size=sum(len(c) for c in chunks))
i = 0
for chunk in chunks:
    af[i:i + len(chunk)] = chunk
    i += len(chunk)
Break down AudioFrame into smaller chunks
The best method for this currently is to use a memoryview (we could also create a bytes object from the AudioFrame and slice it, but memoryview saves copying the data):
Before
af = audio.AudioFrame(duration=1000)
m = memoryview(af)
for i in range(0, len(m), PACKET_SIZE):
    radio.send_bytes(m[i:i + PACKET_SIZE])
After
With this approach we could use slices directly on the AudioFrame without creating unnecessary copies:
af = audio.AudioFrame(duration=1000)
for i in range(0, len(af), PACKET_SIZE):
    radio.send_bytes(af[i:i + PACKET_SIZE])
Playing an AudioFrame from an arbitrary position
As a memoryview cannot be played directly, and an AudioFrame is always played from the beginning, we need to create a new AudioFrame that starts at the point we'd like to play back from.
Before
original_af = microphone.record(1000)
memoryview_af = memoryview(original_af)
shorter_af = audio.AudioFrame(duration=500)
shorter_af.copyfrom(memoryview_af[500:])
audio.play(shorter_af)
After
original_af = microphone.record(1000)
audio.play(original_af[500:])
Playing just a portion of the AudioFrame
This works fine in the current implementation. The only caveat is that the most common way of doing this would be to measure time with sleep() (instead of time.ticks_ms()), and the CODAL uBit.sleep() has a resolution of 4 ms, plus any extra overhead from calling functions. So it might not be extremely accurate.
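To put the 4 ms figure in perspective, the small calculation below converts a timing resolution into a number of samples. The 7812 Hz rate is an assumption used for illustration, and samples_per_tick is a hypothetical helper, not part of the micro:bit API.

```python
def samples_per_tick(rate_hz, tick_ms=4):
    """Whole samples played during one timer tick of tick_ms milliseconds."""
    return rate_hz * tick_ms // 1000


# With an assumed rate of 7812 samples per second, a 4 ms tick spans 31 samples.
print(samples_per_tick(7812))  # 31
```

So stopping playback with sleep() can be off by a few dozen samples per tick of error, whereas a slice boundary is exact to the sample.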
Before
af = microphone.record(2000)
audio.play(af, wait=False)
sleep(1000)
audio.stop()
After
This should play for exactly the specified time, independent of the sleep() resolution:
af = microphone.record(2000)
audio.play(af[:len(af) // 2])