Support encoding to file-like object #754

NicolasHug · 2025-07-05T14:49:10Z

This PR adds the to_file_like() method to the AudioEncoder. This allows users to encode samples into a custom file-like object that supports seek and write methods.

Marking this as draft because there are unresolved points (see below), but this is still ready for a solid first review round.

NicolasHug · 2025-07-06T13:02:48Z

src/torchcodec/_core/ops.py

+
+    # This check is useless but it's critical to keep it to ensures that samples
+    # is still alive during the call to encode_audio_to_file_like.
+    assert samples.is_contiguous()


I hate that we have to do this but I do not see any other obvious way to keep the input samples alive for the duration of the call.
Claude is saying that we could just pass samples as a py::object. We won't be able to turn it back to a tensor (as mentioned in the code comment above), but claude claims that passing it as a parameter will ensure that pybind will keep it alive. I cannot verify this.

@scotts, any thoughts?

On the keep-alive part, I believe Claude is right. If we pass something as a py::object, that gets properly reference-counted which will keep the object alive. When we launder a pointer as an int, there's no reference counting.

Of course, we would ideally just pass the tensor - but we run into problems passing tensors as tensors into the pybind11 code. The next simplest thing that we probably can't do for performance reasons is to copy the tensor into either bytes or a list, and then pass those as py::object. But since samples will be large, I don't think we want to do that.

Most workarounds I can think of are worse. One that might be just as bad, but could potentially apply to both this situation and decoder creation:

On the pybind11 side, we only create the AVIOFileLikeContext. We don't create the encoder or decoder. We do still accept the file-like objects, and they are still stored in the AVIOFileLIkeContext.

We return an int from the C++ side to the Python side where that int is a pointer to the AVIOFileLikeContext.

On the PyTorch custom ops side, we have functions for create-from-file-like and encode-to-file-like that accept the int value and do a reinterpret_cast<AVIOFileLikeContext*> in the C++. Those are then passed to the decoder or encode.

As it is right now, we're doing a lot of ugly pointer casting with tensors. The above may actually be better, as then the pybind11 code is only really concerned with creating AVIOFIleLikeContext objects. It doesn't even need to know about encoders and decoders.

NicolasHug · 2025-07-06T13:21:28Z

src/torchcodec/_core/AVIOFileLikeContext.h

-class AVIOFileLikeContext : public AVIOContextHolder {
+// TODO: explain this. We probably don't want it.
+class __attribute__((visibility("default"))) AVIOFileLikeContext
+    : public AVIOContextHolder {


So, this has been causing some headaches for me.

Forget about the __attribute__((visibility("default"))) for a second.
In this PR, the AVIOFileLikeContext class is now a direct dependency of the Encoder [1]. It means that this file must be compiled along with the libtorchcodec_decoderN library. Before (i.e. in main), this file was only compiled with libtorchcodec_pybind_opsN.

And we know from https://pybind11.readthedocs.io/en/stable/faq.html#someclass-declared-with-greater-visibility-than-the-type-of-its-field-someclass-member-wattributes that this class should be compiled with "hidden" visibility: it holds a py::object fileLike_ attribute, which itself is set to "hidden" by pybind.

But if we were to set AVIOFileLikeContext to "hidden", then we'd have another problem: we'd get the same kind of warning, but for the Encoder, because the Encoder would not be "hidden":

Encoder declared with greater visibility than the type of its field Encoder::avioFileLikeContext_

We can't set the Encoder as hidden, that would make it invisible to custom ops and other things. It just wouldn't work.

[1] See comment below in this PR about the need for 2 different constructors: one for AVIOFileLikeContext, one for AVIOToTensorContext.

So here with __attribute__((visibility("default"))) we are explicitly setting the visibility of AVIOFileLikeContext to "default".
"default" actually means public, i.e. NOT hidden. This override pybind's desire of having the py::object fileLike_ to be hidden, and avoid the related warnings/errors. I do not think this is a great solution though, because IIUC, we're not actually solving the issue, we're mainly silencing a warning by forcing something to be "default"/public when it shouldn't. I think we are thus exposing ourselves to symbol conflicts:

As to why -fvisibility=hidden is necessary, because pybind modules could have been compiled under different versions of pybind itself, it is also important that the symbols defined in one module do not clash with the potentially-incompatible symbols defined in another. While Python extension modules are usually loaded with localized symbols (under POSIX systems typically using dlopen with the RTLD_LOCAL flag), this Python default can be changed, but even if it isn’t it is not always enough to guarantee complete independence of the symbols involved when not using -fvisibility=hidden.

If we don't want to do this (by this I mean setting AVIOFileLikeContext to "default"), then I think we'll need to avoid relying on AVIOFileLikeContext within the Encoder. That means we will only want to interact with the base AVIOContextHolder class, and I think that means we'll need to expose the getOutputTensor as virtual, even know it will only make sense for AVIOToTensorContext.

I hope this makes somewhat sense. I'm still wrapping my head around all this.

@NicolasHug, we might be able to get around this, see my comment below about storing and accepting AVIOContextHolder and doing a dynamic cast to AVIOToTensorContext as needed.

NicolasHug · 2025-07-06T13:23:59Z

src/torchcodec/_core/CMakeLists.txt

-        ${pybind_ops_library_name}
-        PUBLIC
-      "-fvisibility=hidden"
-    )


This can be removed now because the new __attribute__((visibility("default"))) has priority over any compilation flag. See other big comment above.

scotts · 2025-07-17T15:29:11Z

src/torchcodec/_core/pybind_ops.cpp

+      "format",
+      "file_like",
+      "bit_rate",
+      "num_channels");


Oh, wow, we're not listing the parameters for create_from_file_like(). Are there benefits to doing so?

scotts · 2025-07-17T16:55:21Z

src/torchcodec/_core/pybind_ops.cpp

+    std::optional<int64_t> bit_rate = std::nullopt,
+    std::optional<int64_t> num_channels = std::nullopt) {
+  // We assume float32 *and* contiguity, this must be enforced by the caller.
+  auto tensor_options = torch::TensorOptions().dtype(torch::kFloat32);


If we keep this technique, we can probably allow all dtypes by passing in the dtype from the Python side as ints. I assume the Python and C++ enums agree on values, but even if they don't, we can figure out the mapping. Ugly, but possible.

scotts · 2025-07-21T14:56:38Z

src/torchcodec/_core/Encoder.h

  AudioEncoder(
      const torch::Tensor& samples,
      int sampleRate,
      std::string_view formatName,
-      std::unique_ptr<AVIOToTensorContext> avioContextHolder,
+      std::unique_ptr<AVIOToTensorContext> AVIOToTensorContext,


Nit: variable name should start with lower case, maybe avioToTensorContext.

scotts · 2025-07-21T15:07:28Z

src/torchcodec/_core/Encoder.h

-  // Stores the AVIOContext for the output tensor buffer.
-  std::unique_ptr<AVIOToTensorContext> avioContextHolder_;
+  std::unique_ptr<AVIOToTensorContext> avioToTensorContext_;
+  std::unique_ptr<AVIOFileLikeContext> avioFileLikeContext_;



I think a simpler approach is to only store and accept a AVIOContextHolder, but when you need to access AVIOToTensorContext specific methods, do a std::dynamic_cast<AVIOToTensorContext*>. Dynamic casts are also ugly, but it localizes the ugliness to the one place we need it, and avoids having parallel variables and parallel constructors.

A dynamic cast, in C++, can be used to say "I only have a pointer to Base, but I know that this object is actually type Derived: return me a pointer to Derived." If it fails, you get back nullptr.

This might solve the linking-visibility problem above, I'm not sure. The reason I think it might is that we remove AVIOFileLikeContext from the declaration of Encoder. It will only appear inside the implementation.

NicolasHug added 11 commits July 5, 2025 13:10

WIP

4fd3c85

Add tests

8318fbe

Linter

10cdd5b

Renaming

67962d8

Add tests

78276a2

Avoid depending on numpy for bytes conversion

aa10ed1

Use string_view

dfa5bcb

make shape a vec

6adb7dc

dataptr is int64_t

8951870

lifetime management

9b6d9ee

Add contiguity check

b3fc714

facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jul 5, 2025

NicolasHug marked this pull request as draft July 5, 2025 14:49

NicolasHug added 9 commits July 6, 2025 08:46

refac

a78ef8b

refac

fb6e463

mend

6e88ee6

WIP

1c6fad8

WIP

eb1b51d

bypass pybind warning

4d82cbb

Simplify some stuff

88b335a

Add comment

843ff79

Fix timeout

558c8f7

NicolasHug commented Jul 6, 2025

View reviewed changes

NicolasHug marked this pull request as ready for review July 6, 2025 14:51

NicolasHug marked this pull request as draft July 7, 2025 09:14

scotts reviewed Jul 17, 2025

View reviewed changes

scotts reviewed Jul 21, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support encoding to file-like object #754

Support encoding to file-like object #754

Uh oh!

NicolasHug commented Jul 5, 2025 •

edited

Loading

Uh oh!

NicolasHug Jul 6, 2025 •

edited

Loading

Uh oh!

scotts Jul 17, 2025 •

edited

Loading

Uh oh!

NicolasHug Jul 6, 2025 •

edited

Loading

Uh oh!

scotts Jul 21, 2025

Uh oh!

NicolasHug Jul 6, 2025

Uh oh!

scotts Jul 17, 2025

Uh oh!

scotts Jul 17, 2025

Uh oh!

scotts Jul 21, 2025

Uh oh!

scotts Jul 21, 2025

Uh oh!

Uh oh!

Support encoding to file-like object #754

Are you sure you want to change the base?

Support encoding to file-like object #754

Uh oh!

Conversation

NicolasHug commented Jul 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NicolasHug Jul 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

scotts Jul 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

NicolasHug Jul 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

scotts Jul 21, 2025

Choose a reason for hiding this comment

Uh oh!

NicolasHug Jul 6, 2025

Choose a reason for hiding this comment

Uh oh!

scotts Jul 17, 2025

Choose a reason for hiding this comment

Uh oh!

scotts Jul 17, 2025

Choose a reason for hiding this comment

Uh oh!

scotts Jul 21, 2025

Choose a reason for hiding this comment

Uh oh!

scotts Jul 21, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

NicolasHug commented Jul 5, 2025 •

edited

Loading

NicolasHug Jul 6, 2025 •

edited

Loading

scotts Jul 17, 2025 •

edited

Loading

NicolasHug Jul 6, 2025 •

edited

Loading