Skip to content

[concurrency] Implement the Task allocator as bump-pointer allocator. #34880

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 10, 2020

Conversation

eeckstein
Copy link
Contributor

@eeckstein eeckstein commented Nov 27, 2020

The task allocator is implemented by using the new StackAllocator utility.
A StackAllocator performs fast allocation and deallocation of memory by implementing a bump-pointer allocation strategy. In contrast to a pure bump-pointer allocator, it's possible to free memory.
Allocations and deallocations must follow a strict stack discipline.
In general, slabs which become unused are not freed, but reused for subsequent allocations. The first slab is tail allocated to the StackAllocator.

TODO: we could pass an initial pre-allocated first slab to the allocator, which is allocated on the stack or with the parent task's allocator.

rdar://problem/71157018

@eeckstein eeckstein requested a review from rjmccall November 27, 2020 19:14
@rxwei
Copy link
Contributor

rxwei commented Nov 28, 2020

Cool! I was just implementing an allocator in the Swift runtime for use with automatic differentiation — I used llvm::BumpPtrAllocator and it worked fine for me. Is there a reason to implement a new allocator instead of using llvm::BumpPtrAllocator? If so, perhaps AutoDiff can adopt this implementation as well.

@eeckstein
Copy link
Contributor Author

@rxwei There are several differences, most importantly that the task allocator also can free memory (in a stack discipline).
Also, the task allocator uses "inlined" memory for the first slab, which avoid malloc at all if only a small amount of memory is needed.

@eeckstein eeckstein force-pushed the task-allocator branch 2 times, most recently from edfb23b to 5eb6027 Compare November 30, 2020 18:06
@rxwei
Copy link
Contributor

rxwei commented Nov 30, 2020

@rxwei There are several differences, most importantly that the task allocator also can free memory (in a stack discipline).
Also, the task allocator uses "inlined" memory for the first slab, which avoid malloc at all if only a small amount of memory is needed.

These features would be needed for AutoDiff as well. I’m interested in using this to replace the allocator in #34886. Do you think it can be moved to a header file that we can import?

@eeckstein
Copy link
Contributor Author

@swift-ci benchmark

@swift-ci
Copy link
Contributor

swift-ci commented Dec 2, 2020

Performance: -O

Regression OLD NEW DELTA RATIO
UTF8Decode_InitFromBytes_ascii_as_ascii 449 533 +18.7% 0.84x (?)
 
Improvement OLD NEW DELTA RATIO
UTF8Decode_InitFromData_ascii_as_ascii 800 642 -19.7% 1.25x (?)

Code size: -O

Performance: -Osize

Improvement OLD NEW DELTA RATIO
FlattenListLoop 2503 1631 -34.8% 1.53x (?)
FlattenListFlatMap 5790 4143 -28.4% 1.40x (?)

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@eeckstein
Copy link
Contributor Author

@swift-ci smoke test

@eeckstein
Copy link
Contributor Author

@rxwei makes sense. I moved it to a separate file.

/// Reserved for the use of the task-local stack allocator and it's first
/// slab.
/// TODO: decide what's the optimal size, e.g. based on typical memory needs.
void *AllocatorPrivate[64];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every task, including child tasks, will have these 64 words of storage. I had thought of child tasks as being fairly lightweight abstractions, but this makes them significantly larger. Is that okay?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't every task need at least one context allocation? If so, then this inline-buffer avoids an malloc call.

The size is TBD, though. 64 is just an initial random number.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"its"

I think this initial slab buffer should not be in the main AsyncTask structure, which is ABI. If you want to have swift_task_create preallocate some space and then hand it off to the allocator during initialization, that seems fine, but you should figure out a runtime-private way of doing it. That does mean it won't be tail to the allocator, but I think you should be able to deal with that.

@eeckstein
Copy link
Contributor Author

@swift-ci smoke test linux

/// Reserved for the use of the task-local stack allocator and it's first
/// slab.
/// TODO: decide what's the optimal size, e.g. based on typical memory needs.
void *AllocatorPrivate[64];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"its"

I think this initial slab buffer should not be in the main AsyncTask structure, which is ABI. If you want to have swift_task_create preallocate some space and then hand it off to the allocator during initialization, that seems fine, but you should figure out a runtime-private way of doing it. That does mean it won't be tail to the allocator, but I think you should be able to deal with that.

@@ -0,0 +1,76 @@
//===--- TaskStatus.cpp - Unit tests for the task-status API --------------===//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TaskAlloc.cpp

/// A single linked list of all allocated slaps.
Slab *next = nullptr;

// Capacity and offset do not include this header fields.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these

/// This struct is actually just the slab header. The slab buffer is tail
/// allocated after Slab.
struct Slab {
/// A single linked list of all allocated slaps.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slabs


// No space in the next slab. Although it's empty, the size exceeds its
// capacity.
// As we have to allocate a new slab anyway, free all successor slaps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slabs


// Return a slab which is suitable to allocate \p size memory.
Slab *getSlabForAllocation(size_t size) {
Slab *slab = (lastAllocation ? lastAllocation->slab : getFirstSlab());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the linked list of slabs starts at the pre-allocated slab, and then next points to the slab allocated after the last, right? So starting at lastAllocation->slab means you actually don't check to see if there's any space in any of the older slabs. Is that intentional? It's certainly possible that you have plenty of space for small allocations in older slabs but you needed a big new slab for the last one. But maybe not looking in them is the right trade-off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjmccall Yes, this is intention. As you said, it's a trade-off.

@eeckstein
Copy link
Contributor Author

I pushed a new version, which now does not use a pre-allocated first slab - to keep the task allocation small.
We can improve this in folllow-up work, e.g. by providing stack-allocated space or space allocated with the parent task's allocator.

@eeckstein
Copy link
Contributor Author

@swift-ci test

@swift-ci
Copy link
Contributor

swift-ci commented Dec 8, 2020

Build failed
Swift Test OS X Platform
Git Sha - 8220224b5b854d4c0e7357ef268d67099732ac12

@swift-ci
Copy link
Contributor

swift-ci commented Dec 8, 2020

Build failed
Swift Test Linux Platform
Git Sha - 8220224b5b854d4c0e7357ef268d67099732ac12

@eeckstein
Copy link
Contributor Author

@swift-ci test

@swift-ci
Copy link
Contributor

swift-ci commented Dec 8, 2020

Build failed
Swift Test Linux Platform
Git Sha - 635a89ca02067344409c76afe16c07e9181efac0

@swift-ci
Copy link
Contributor

swift-ci commented Dec 8, 2020

Build failed
Swift Test OS X Platform
Git Sha - 635a89ca02067344409c76afe16c07e9181efac0

A StackAllocator performs fast allocation and deallocation of memory by implementing a bump-pointer allocation strategy.
In contrast to a pure bump-pointer allocator, it's possible to free memory.
Allocations and deallocations must follow a strict stack discipline.

In general, slabs which become unused are _not_ freed, but reused for subsequent allocations.
The first slab can be placed into pre-allocated memory.
Use the StackAllocator as task allocator.

TODO: we could pass an initial pre-allocated first slab to the allocator, which is allocated on the stack or with the parent task's allocator.

rdar://problem/71157018
@eeckstein
Copy link
Contributor Author

@swift-ci test

Copy link
Contributor

@rjmccall rjmccall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eeckstein eeckstein merged commit ad58de3 into swiftlang:main Dec 10, 2020
@eeckstein eeckstein deleted the task-allocator branch December 10, 2020 07:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants