Bug Description
I'm testing BERT with `python perf_run.py --backends=dynamo --inputs="(512, 128)@int32;(512, 128)@int32" ...` and `python perf_run.py --backends=dynamo --inputs="(256, 128)@int32;(256, 128)@int32" ...`. Only the inputs differ. I saved the TRT engines from both runs, but the two engines have different sizes:
Using `--inputs="(256, 128)@int32;(256, 128)@int32"`:

```
Number of layers: 1277
Number of inputs: 2
Number of outputs: 2
Input 0: input_ids, shape: (256, 128), dtype: DataType.INT32
Input 1: attention_mask, shape: (256, 128), dtype: DataType.INT32
Output 0: output0, shape: (256, 128, 768), dtype: DataType.FLOAT
Output 1: output1, shape: (256, 768), dtype: DataType.FLOAT
TRT Engine uses: 516.5105247497559 Mb of Memory
```

Using `--inputs="(512, 128)@int32;(512, 128)@int32"`:

```
Number of layers: 1277
Number of inputs: 2
Number of outputs: 2
Input 0: input_ids, shape: (512, 128), dtype: DataType.INT32
Input 1: attention_mask, shape: (512, 128), dtype: DataType.INT32
Output 0: output0, shape: (512, 128, 768), dtype: DataType.FLOAT
Output 1: output1, shape: (512, 768), dtype: DataType.FLOAT
TRT Engine uses: 612.4900169372559 Mb of Memory
```
To Reproduce
Steps to reproduce the behavior:

```shell
cd tools/perf
python perf_run.py ...
```
Expected behavior
The Torch-TensorRT Dynamo backend looks faster than Inductor but slightly slower than the ONNX path. To find out why, I pulled out the engines and ran them directly. When inspecting them, I found that the engine size grows as the input size increases, whereas the engine sizes exported from the ONNX path stay the same.
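As a back-of-the-envelope check (this is only arithmetic on the two sizes reported above, not an inspection of the engine internals, and it assumes the engine is roughly a fixed weight blob plus buffers that grow linearly with batch size):

```python
# Engine sizes reported above (the tool prints "Mb"; treated here as MiB)
size_b256 = 516.5105247497559  # --inputs="(256, 128)@int32;(256, 128)@int32"
size_b512 = 612.4900169372559  # --inputs="(512, 128)@int32;(512, 128)@int32"

# Extra memory when the batch dimension doubles from 256 to 512
delta = size_b512 - size_b256

# If the engine were (constant weights) + (buffers linear in batch size),
# doubling the batch adds one more per-256-samples chunk, so the implied
# constant portion is:
fixed = size_b256 - delta  # == 2 * size_b256 - size_b512

print(f"per-batch growth: {delta:.2f} MiB, implied fixed portion: {fixed:.2f} MiB")
# The implied fixed portion lands near 110e6 params * 4 bytes ~= 420 MiB,
# roughly the size of BERT-base FP32 weights.
```

Under that assumption the growth is consistent with the engines being specialized to a static batch size, which would explain why each input shape yields a differently sized engine while a single engine covering both shapes would not.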