-
Notifications
You must be signed in to change notification settings - Fork 48
Support encoding to file-like object #754
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
4fd3c85
8318fbe
10cdd5b
67962d8
78276a2
aa10ed1
dfa5bcb
6adb7dc
8951870
9b6d9ee
b3fc714
a78ef8b
fb6e463
6e88ee6
1c6fad8
eb1b51d
4d82cbb
88b335a
843ff79
558c8f7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -66,6 +66,7 @@ function(make_torchcodec_libraries | |
set(decoder_sources | ||
AVIOContextHolder.cpp | ||
AVIOTensorContext.cpp | ||
AVIOFileLikeContext.cpp | ||
FFMPEGCommon.cpp | ||
Frame.cpp | ||
DeviceInterface.cpp | ||
|
@@ -142,15 +143,6 @@ function(make_torchcodec_libraries | |
"${pybind_ops_sources}" | ||
"${pybind_ops_dependencies}" | ||
) | ||
# pybind11 limits the visibility of symbols in the shared library to prevent | ||
# stray initialization of py::objects. The rest of the object code must | ||
# match. See: | ||
# https://pybind11.readthedocs.io/en/stable/faq.html#someclass-declared-with-greater-visibility-than-the-type-of-its-field-someclass-member-wattributes | ||
target_compile_options( | ||
${pybind_ops_library_name} | ||
PUBLIC | ||
"-fvisibility=hidden" | ||
) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This can be removed now because the new |
||
# If we don't make sure this flag is set, we run into segfauls at import | ||
# time on Mac. See: | ||
# https://github.com/pybind/pybind11/issues/3907#issuecomment-1170412764 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
#pragma once | ||
#include <torch/types.h> | ||
#include "src/torchcodec/_core/AVIOFileLikeContext.h" | ||
#include "src/torchcodec/_core/AVIOTensorContext.h" | ||
#include "src/torchcodec/_core/FFMPEGCommon.h" | ||
#include "src/torchcodec/_core/StreamOptions.h" | ||
|
@@ -20,13 +21,27 @@ class AudioEncoder { | |
int sampleRate, | ||
std::string_view fileName, | ||
const AudioStreamOptions& audioStreamOptions); | ||
|
||
// We need one constructor for each type of AVIOContextHolder. We can't have a | ||
// single constructor that accepts the base AVIOContextHolder class and hold | ||
// that as attribute, because we are calling the getOutputTensor() method on | ||
// the AVIOToTensorContext, which is not available in the base class. | ||
AudioEncoder( | ||
const torch::Tensor& samples, | ||
int sampleRate, | ||
std::string_view formatName, | ||
std::unique_ptr<AVIOToTensorContext> avioContextHolder, | ||
std::unique_ptr<AVIOToTensorContext> AVIOToTensorContext, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: variable name should start with lower case, maybe |
||
const AudioStreamOptions& audioStreamOptions); | ||
|
||
AudioEncoder( | ||
const torch::Tensor& samples, | ||
int sampleRate, | ||
std::string_view formatName, | ||
std::unique_ptr<AVIOFileLikeContext> AVIOFileLikeContext, | ||
const AudioStreamOptions& audioStreamOptions); | ||
|
||
void encode(); | ||
|
||
torch::Tensor encodeToTensor(); | ||
|
||
private: | ||
|
@@ -49,8 +64,8 @@ class AudioEncoder { | |
|
||
const torch::Tensor samples_; | ||
|
||
// Stores the AVIOContext for the output tensor buffer. | ||
std::unique_ptr<AVIOToTensorContext> avioContextHolder_; | ||
std::unique_ptr<AVIOToTensorContext> avioToTensorContext_; | ||
std::unique_ptr<AVIOFileLikeContext> avioFileLikeContext_; | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think a simpler approach is to only store and accept a A dynamic cast, in C++, can be used to say "I only have a pointer to This might solve the linking-visibility problem above, I'm not sure. The reason I think it might is that we remove |
||
bool encodeWasCalled_ = false; | ||
}; | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -153,6 +153,59 @@ def create_from_file_like( | |
return _convert_to_tensor(_pybind_ops.create_from_file_like(file_like, seek_mode)) | ||
|
||
|
||
def encode_audio_to_file_like( | ||
samples: torch.Tensor, | ||
sample_rate: int, | ||
format: str, | ||
file_like: Union[io.RawIOBase, io.BufferedIOBase], | ||
bit_rate: Optional[int] = None, | ||
num_channels: Optional[int] = None, | ||
) -> None: | ||
"""Encode audio samples to a file-like object. | ||
|
||
Args: | ||
samples: Audio samples tensor | ||
sample_rate: Sample rate in Hz | ||
format: Audio format (e.g., "wav", "mp3", "flac") | ||
file_like: File-like object that supports write() and seek() methods | ||
bit_rate: Optional bit rate for encoding | ||
num_channels: Optional number of output channels | ||
""" | ||
assert _pybind_ops is not None | ||
|
||
if samples.dtype != torch.float32: | ||
raise ValueError(f"samples must have dtype torch.float32, got {samples.dtype}") | ||
|
||
# We're having the same problem as with the decoder's create_from_file_like: | ||
# We should be able to pass a tensor directly, but this leads to a pybind | ||
# error. In order to work around this, we pass the pointer to the tensor's | ||
# data, and its shape, in order to re-construct it in C++. For this to work: | ||
# - the tensor must be float32 | ||
# - the tensor must be contiguous, which is why we call contiguous(). | ||
# In theory we could avoid this restriction by also passing the strides? | ||
# - IMPORTANT: the input samples tensor and its underlying data must be | ||
# alive during the call. | ||
# | ||
# A more elegant solution would be to cast the tensor into a py::object, but | ||
# casting the py::object backk to a tensor in C++ seems to lead to the same | ||
# pybing error. | ||
|
||
samples = samples.contiguous() | ||
_pybind_ops.encode_audio_to_file_like( | ||
samples.data_ptr(), | ||
list(samples.shape), | ||
sample_rate, | ||
format, | ||
file_like, | ||
bit_rate, | ||
num_channels, | ||
) | ||
|
||
# This check is useless but it's critical to keep it to ensures that samples | ||
# is still alive during the call to encode_audio_to_file_like. | ||
assert samples.is_contiguous() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I hate that we have to do this but I do not see any other obvious way to keep the input @scotts, any thoughts? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the keep-alive part, I believe Claude is right. If we pass something as a Of course, we would ideally just pass the tensor - but we run into problems passing tensors as tensors into the pybind11 code. The next simplest thing that we probably can't do for performance reasons is to copy the tensor into either bytes or a list, and then pass those as Most workarounds I can think of are worse. One that might be just as bad, but could potentially apply to both this situation and decoder creation:
As it is right now, we're doing a lot of ugly pointer casting with tensors. The above may actually be better, as then the pybind11 code is only really concerned with creating AVIOFIleLikeContext objects. It doesn't even need to know about encoders and decoders. |
||
|
||
|
||
# ============================== | ||
# Abstract impl for the operators. Needed by torch.compile. | ||
# ============================== | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,7 +10,9 @@ | |
#include <string> | ||
|
||
#include "src/torchcodec/_core/AVIOFileLikeContext.h" | ||
#include "src/torchcodec/_core/Encoder.h" | ||
#include "src/torchcodec/_core/SingleStreamDecoder.h" | ||
#include "src/torchcodec/_core/StreamOptions.h" | ||
|
||
namespace py = pybind11; | ||
|
||
|
@@ -31,15 +33,58 @@ int64_t create_from_file_like( | |
realSeek = seekModeFromString(seek_mode.value()); | ||
} | ||
|
||
auto avioContextHolder = std::make_unique<AVIOFileLikeContext>(file_like); | ||
auto avioContextHolder = | ||
std::make_unique<AVIOFileLikeContext>(file_like, /*isForWriting=*/false); | ||
|
||
SingleStreamDecoder* decoder = | ||
new SingleStreamDecoder(std::move(avioContextHolder), realSeek); | ||
return reinterpret_cast<int64_t>(decoder); | ||
} | ||
|
||
void encode_audio_to_file_like( | ||
int64_t data_ptr, | ||
const std::vector<int64_t>& shape, | ||
int64_t sample_rate, | ||
std::string_view format, | ||
py::object file_like, | ||
std::optional<int64_t> bit_rate = std::nullopt, | ||
std::optional<int64_t> num_channels = std::nullopt) { | ||
// We assume float32 *and* contiguity, this must be enforced by the caller. | ||
auto tensor_options = torch::TensorOptions().dtype(torch::kFloat32); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we keep this technique, we can probably allow all dtypes by passing in the dtype from the Python side as ints. I assume the Python and C++ enums agree on values, but even if they don't, we can figure out the mapping. Ugly, but possible. |
||
auto samples = torch::from_blob( | ||
reinterpret_cast<void*>(data_ptr), shape, tensor_options); | ||
|
||
// TODO Fix implicit int conversion: | ||
// https://github.com/pytorch/torchcodec/issues/679 | ||
// same for sample_rate parameter below | ||
AudioStreamOptions audioStreamOptions; | ||
audioStreamOptions.bitRate = bit_rate; | ||
audioStreamOptions.numChannels = num_channels; | ||
|
||
auto avioContextHolder = | ||
std::make_unique<AVIOFileLikeContext>(file_like, /*isForWriting=*/true); | ||
|
||
AudioEncoder encoder( | ||
samples, | ||
static_cast<int>(sample_rate), | ||
format, | ||
std::move(avioContextHolder), | ||
audioStreamOptions); | ||
encoder.encode(); | ||
} | ||
|
||
PYBIND11_MODULE(decoder_core_pybind_ops, m) { | ||
m.def("create_from_file_like", &create_from_file_like); | ||
m.def( | ||
"encode_audio_to_file_like", | ||
&encode_audio_to_file_like, | ||
"data_ptr", | ||
"shape", | ||
"sample_rate", | ||
"format", | ||
"file_like", | ||
"bit_rate", | ||
"num_channels"); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, wow, we're not listing the parameters for |
||
} | ||
|
||
} // namespace facebook::torchcodec |
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this has been causing some headaches for me.
Forget about the
__attribute__((visibility("default")))
for a second.In this PR, the
AVIOFileLikeContext
class is now a direct dependency of theEncoder
[1]. It means that this file must be compiled along with thelibtorchcodec_decoderN
library. Before (i.e. inmain
), this file was only compiled withlibtorchcodec_pybind_opsN
.And we know from https://pybind11.readthedocs.io/en/stable/faq.html#someclass-declared-with-greater-visibility-than-the-type-of-its-field-someclass-member-wattributes that this class should be compiled with "hidden" visibility: it holds a
py::object fileLike_
attribute, which itself is set to "hidden" by pybind.But if we were to set
AVIOFileLikeContext
to "hidden", then we'd have another problem: we'd get the same kind of warning, but for theEncoder
, because theEncoder
would not be "hidden":We can't set the Encoder as hidden, that would make it invisible to custom ops and other things. It just wouldn't work.
[1] See comment below in this PR about the need for 2 different constructors: one for
AVIOFileLikeContext
, one forAVIOToTensorContext
.So here with
__attribute__((visibility("default")))
we are explicitly setting the visibility ofAVIOFileLikeContext
to "default"."default" actually means public, i.e. NOT hidden. This override pybind's desire of having the
py::object fileLike_
to be hidden, and avoid the related warnings/errors. I do not think this is a great solution though, because IIUC, we're not actually solving the issue, we're mainly silencing a warning by forcing something to be "default"/public when it shouldn't. I think we are thus exposing ourselves to symbol conflicts:If we don't want to do this (by this I mean setting
AVIOFileLikeContext
to "default"), then I think we'll need to avoid relying onAVIOFileLikeContext
within theEncoder
. That means we will only want to interact with the baseAVIOContextHolder
class, and I think that means we'll need to expose thegetOutputTensor
as virtual, even know it will only make sense forAVIOToTensorContext
.I hope this makes somewhat sense. I'm still wrapping my head around all this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NicolasHug, we might be able to get around this, see my comment below about storing and accepting
AVIOContextHolder
and doing a dynamic cast toAVIOToTensorContext
as needed.