Cherry-picking latest swifttailcc changes #2461

Merged
11 changes: 8 additions & 3 deletions llvm/docs/CodeGenerator.rst
@@ -2064,11 +2064,12 @@ Tail call optimization
----------------------

Tail call optimization, callee reusing the stack of the caller, is currently
supported on x86/x86-64, PowerPC, and WebAssembly. It is performed on x86/x86-64
and PowerPC if:
supported on x86/x86-64, PowerPC, AArch64, and WebAssembly. It is performed on
x86/x86-64, PowerPC, and AArch64 if:

* Caller and callee have the calling convention ``fastcc``, ``cc 10`` (GHC
calling convention), ``cc 11`` (HiPE calling convention), or ``tailcc``.
calling convention), ``cc 11`` (HiPE calling convention), ``tailcc``, or
``swifttailcc``.

* The call is a tail call - in tail position (ret immediately follows call and
ret uses value of call or is void).
@@ -2093,6 +2094,10 @@ PowerPC constraints:
* On ppc32/64 GOT/PIC only module-local calls (visibility = hidden or protected)
are supported.

AArch64 constraints:

* No variable argument lists are used.

WebAssembly constraints:

* No variable argument lists are used
36 changes: 22 additions & 14 deletions llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -181,9 +181,13 @@ static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",

STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");

/// Returns the argument pop size.
static uint64_t getArgumentPopSize(MachineFunction &MF,
MachineBasicBlock &MBB) {
// Returns how much of the incoming argument stack area we should clean up in an
// epilogue. For the C calling convention this will be 0, for guaranteed tail
// call conventions it can be positive (a normal return or a tail call to a
// function that uses less stack space for arguments) or negative (for a tail
// call to a function that needs more stack space than us for arguments).
static int64_t getArgumentStackToRestore(MachineFunction &MF,
MachineBasicBlock &MBB) {
MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
bool IsTailCallReturn = false;
if (MBB.end() != MBBI) {
@@ -197,7 +201,7 @@ static uint64_t getArgumentPopSize(MachineFunction &MF,
}
AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();

uint64_t ArgumentPopSize = 0;
int64_t ArgumentPopSize = 0;
if (IsTailCallReturn) {
MachineOperand &StackAdjust = MBBI->getOperand(1);
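For illustration only (not part of the patch): the sign convention described by the new getArgumentStackToRestore comment can be summarised in a small standalone sketch. The helper name and byte counts below are made up for this example.

#include <cstdint>
#include <iostream>

// Bytes of incoming-argument stack the epilogue should clean up:
//   > 0 : normal return, or a tail call whose callee needs less argument stack
//   < 0 : tail call whose callee needs more argument stack than the caller got
int64_t argStackToRestore(int64_t incomingArgBytes, int64_t outgoingTailArgBytes) {
  return incomingArgBytes - outgoingTailArgBytes;
}

int main() {
  std::cout << argStackToRestore(32, 0) << "\n";  // 32: plain return
  std::cout << argStackToRestore(32, 16) << "\n"; // 16: smaller tail callee
  std::cout << argStackToRestore(32, 48) << "\n"; // -16: larger tail callee
}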

@@ -261,10 +265,10 @@ static unsigned getFixedObjectSize(const MachineFunction &MF,
const AArch64FunctionInfo *AFI, bool IsWin64,
bool IsFunclet) {
if (!IsWin64 || IsFunclet) {
// Only Win64 uses fixed objects, and then only for the function (not
// funclets)
return 0;
return AFI->getTailCallReservedStack();
} else {
assert(AFI->getTailCallReservedStack() == 0 &&
"don't know how guaranteed tail calls might work on Win64");
// Var args are stored here in the primary function.
const unsigned VarArgsArea = AFI->getVarArgsGPRSize();
// To support EH funclets we allocate an UnwindHelp object
@@ -1644,9 +1648,9 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
}
});

// Initial and residual are named for consistency with the prologue. Note that
// in the epilogue, the residual adjustment is executed first.
uint64_t ArgumentPopSize = getArgumentPopSize(MF, MBB);
// How much of the stack used by incoming arguments this function is expected
// to restore in this particular epilogue.
int64_t ArgumentStackToRestore = getArgumentStackToRestore(MF, MBB);

// The stack frame should be like below,
//
@@ -1681,7 +1685,7 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);

uint64_t AfterCSRPopSize = ArgumentPopSize;
int64_t AfterCSRPopSize = ArgumentStackToRestore;
auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
// We cannot rely on the local stack size set in emitPrologue if the function
// has funclets, as funclets have different local stack size requirements, and
@@ -1699,8 +1703,10 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
// Converting the last ldp to a post-index ldp is valid only if the last
// ldp's offset is 0.
const MachineOperand &OffsetOp = Pop->getOperand(Pop->getNumOperands() - 1);
// If the offset is 0, convert it to a post-index ldp.
if (OffsetOp.getImm() == 0)
// If the offset is 0 and the AfterCSR pop is not actually trying to
// allocate more stack for arguments (in space that an untimely interrupt
// may clobber), convert it to a post-index ldp.
if (OffsetOp.getImm() == 0 && AfterCSRPopSize >= 0)
convertCalleeSaveRestoreToSPPrePostIncDec(
MBB, Pop, DL, TII, PrologueSaveSize, NeedsWinCFI, &HasWinCFI, false);
else {
@@ -1870,6 +1876,8 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
// assumes the SP is at the same location as it was after the callee-save save
// code in the prologue.
if (AfterCSRPopSize) {
assert(AfterCSRPopSize > 0 && "attempting to reallocate arg stack that an "
"interrupt may have clobbered");
// Find an insertion point for the first ldp so that it goes before the
// shadow call stack epilog instruction. This ensures that the restore of
// lr from x18 is placed after the restore from sp.
@@ -1885,7 +1893,7 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
adaptForLdStOpt(MBB, FirstSPPopI, LastPopI);

emitFrameOffset(MBB, FirstSPPopI, DL, AArch64::SP, AArch64::SP,
StackOffset::getFixed((int64_t)AfterCSRPopSize), TII,
StackOffset::getFixed(AfterCSRPopSize), TII,
MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
}
if (HasWinCFI)
9 changes: 7 additions & 2 deletions llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -5311,6 +5311,11 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
// can actually shrink the stack.
FPDiff = NumReusableBytes - NumBytes;

// Update the required reserved area if this is the tail call requiring the
// most argument stack space.
if (FPDiff < 0 && FuncInfo->getTailCallReservedStack() < (unsigned)-FPDiff)
FuncInfo->setTailCallReservedStack(-FPDiff);

// The stack pointer must be 16-byte aligned at all times it's used for a
// memory operation, which in practice means at *all* times and in
// particular across call boundaries. Therefore our own arguments started at
@@ -5322,7 +5327,7 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
// Adjust the stack pointer for the new arguments...
// These operations are automatically eliminated by the prolog/epilog pass
if (!IsSibCall)
Chain = DAG.getCALLSEQ_START(Chain, NumBytes, 0, DL);
Chain = DAG.getCALLSEQ_START(Chain, IsTailCall ? 0 : NumBytes, 0, DL);

SDValue StackPtr = DAG.getCopyFromReg(Chain, DL, AArch64::SP,
getPointerTy(DAG.getDataLayout()));
@@ -5590,7 +5595,7 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
// we've carefully laid out the parameters so that when sp is reset they'll be
// in the correct location.
if (IsTailCall && !IsSibCall) {
Chain = DAG.getCALLSEQ_END(Chain, DAG.getIntPtrConstant(NumBytes, DL, true),
Chain = DAG.getCALLSEQ_END(Chain, DAG.getIntPtrConstant(0, DL, true),
DAG.getIntPtrConstant(0, DL, true), InFlag, DL);
InFlag = Chain.getValue(1);
}
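For illustration only (not part of the patch): the FPDiff bookkeeping added to LowerCall above keeps a running maximum of the extra argument stack any tail call in this function needs, which is what TailCallReservedStack ends up holding. A hedged sketch with a made-up struct name:

#include <algorithm>
#include <cstdint>

struct FunctionInfoSketch {
  unsigned TailCallReservedStack = 0; // bytes reserved just below incoming SP

  // FPDiff < 0 means a tail call needs -FPDiff more bytes of argument stack
  // than this function received; remember the largest such shortfall.
  void noteTailCall(int64_t FPDiff) {
    if (FPDiff < 0)
      TailCallReservedStack =
          std::max<unsigned>(TailCallReservedStack, (unsigned)-FPDiff);
  }
};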
11 changes: 11 additions & 0 deletions llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
@@ -54,6 +54,12 @@ class AArch64FunctionInfo final : public MachineFunctionInfo {
/// callee is expected to pop the args.
unsigned ArgumentStackToRestore = 0;

// Space just below incoming stack pointer reserved for arguments being passed
// on the stack during a tail call. This will be the difference between the
// largest tail call argument space needed in this function and what's already
// available by reusing space of incoming arguments.
unsigned TailCallReservedStack = 0;

/// HasStackFrame - True if this function has a stack frame. Set by
/// determineCalleeSaves().
bool HasStackFrame = false;
@@ -180,6 +186,11 @@ class AArch64FunctionInfo final : public MachineFunctionInfo {
ArgumentStackToRestore = bytes;
}

unsigned getTailCallReservedStack() const { return TailCallReservedStack; }
void setTailCallReservedStack(unsigned bytes) {
TailCallReservedStack = bytes;
}

bool hasCalculatedStackSizeSVE() const { return HasCalculatedStackSizeSVE; }

void setStackSizeSVE(uint64_t S) {
9 changes: 7 additions & 2 deletions llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp
@@ -888,6 +888,11 @@ bool AArch64CallLowering::lowerTailCall(
// actually shrink the stack.
FPDiff = NumReusableBytes - NumBytes;

// Update the required reserved area if this is the tail call requiring the
// most argument stack space.
if (FPDiff < 0 && FuncInfo->getTailCallReservedStack() < (unsigned)-FPDiff)
FuncInfo->setTailCallReservedStack(-FPDiff);

// The stack pointer must be 16-byte aligned at all times it's used for a
// memory operation, which in practice means at *all* times and in
// particular across call boundaries. Therefore our own arguments started at
@@ -929,12 +934,12 @@ bool AArch64CallLowering::lowerTailCall(
// sequence start and end here.
if (!IsSibCall) {
MIB->getOperand(1).setImm(FPDiff);
CallSeqStart.addImm(NumBytes).addImm(0);
CallSeqStart.addImm(0).addImm(0);
// End the call sequence *before* emitting the call. Normally, we would
// tidy the frame up after the call. However, here, we've laid out the
// parameters so that when SP is reset, they will be in the correct
// location.
MIRBuilder.buildInstr(AArch64::ADJCALLSTACKUP).addImm(NumBytes).addImm(0);
MIRBuilder.buildInstr(AArch64::ADJCALLSTACKUP).addImm(0).addImm(0);
}

// Now we can add the actual call instruction to the correct basic block.
2 changes: 1 addition & 1 deletion llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
@@ -1974,7 +1974,7 @@ bool ARMExpandPseudo::ExpandMI(MachineBasicBlock &MBB,
}

auto NewMI = std::prev(MBBI);
for (unsigned i = 1, e = MBBI->getNumOperands(); i != e; ++i)
for (unsigned i = 2, e = MBBI->getNumOperands(); i != e; ++i)
NewMI->addOperand(MBBI->getOperand(i));


100 changes: 79 additions & 21 deletions llvm/lib/Target/ARM/ARMFrameLowering.cpp
@@ -142,6 +142,41 @@ ARMFrameLowering::canSimplifyCallFramePseudos(const MachineFunction &MF) const {
return hasReservedCallFrame(MF) || MF.getFrameInfo().hasVarSizedObjects();
}

// Returns how much of the incoming argument stack area we should clean up in an
// epilogue. For the C calling convention this will be 0, for guaranteed tail
// call conventions it can be positive (a normal return or a tail call to a
// function that uses less stack space for arguments) or negative (for a tail
// call to a function that needs more stack space than us for arguments).
static int getArgumentStackToRestore(MachineFunction &MF,
MachineBasicBlock &MBB) {
MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
bool IsTailCallReturn = false;
if (MBB.end() != MBBI) {
unsigned RetOpcode = MBBI->getOpcode();
IsTailCallReturn = RetOpcode == ARM::TCRETURNdi ||
RetOpcode == ARM::TCRETURNri;
}
ARMFunctionInfo *AFI = MF.getInfo<ARMFunctionInfo>();

unsigned ArgumentPopSize = 0;
if (IsTailCallReturn) {
MachineOperand &StackAdjust = MBBI->getOperand(1);

// For a tail-call in a callee-pops-arguments environment, some or all of
// the stack may actually be in use for the call's arguments, this is
// calculated during LowerCall and consumed here...
ArgumentPopSize = StackAdjust.getImm();
} else {
// ... otherwise the amount to pop is *all* of the argument space,
// conveniently stored in the MachineFunctionInfo by
// LowerFormalArguments. This will, of course, be zero for the C calling
// convention.
ArgumentPopSize = AFI->getArgumentStackToRestore();
}

return ArgumentPopSize;
}

static void emitRegPlusImmediate(
bool isARM, MachineBasicBlock &MBB, MachineBasicBlock::iterator &MBBI,
const DebugLoc &dl, const ARMBaseInstrInfo &TII, unsigned DestReg,
@@ -773,7 +808,13 @@ void ARMFrameLowering::emitEpilogue(MachineFunction &MF,
"This emitEpilogue does not support Thumb1!");
bool isARM = !AFI->isThumbFunction();

unsigned ArgRegsSaveSize = AFI->getArgRegsSaveSize();
// Amount of stack space we reserved next to incoming args for either
// varargs registers or stack arguments in tail calls made by this function.
unsigned ReservedArgStack = AFI->getArgRegsSaveSize();

// How much of the stack used by incoming arguments this function is expected
// to restore in this particular epilogue.
int IncomingArgStackToRestore = getArgumentStackToRestore(MF, MBB);
int NumBytes = (int)MFI.getStackSize();
Register FramePtr = RegInfo->getFrameRegister(MF);

@@ -787,8 +828,8 @@ void ARMFrameLowering::emitEpilogue(MachineFunction &MF,
DebugLoc dl = MBBI != MBB.end() ? MBBI->getDebugLoc() : DebugLoc();

if (!AFI->hasStackFrame()) {
if (NumBytes - ArgRegsSaveSize != 0)
emitSPUpdate(isARM, MBB, MBBI, dl, TII, NumBytes - ArgRegsSaveSize,
if (NumBytes - ReservedArgStack != 0)
emitSPUpdate(isARM, MBB, MBBI, dl, TII, NumBytes - ReservedArgStack,
MachineInstr::FrameDestroy);
} else {
// Unwind MBBI to point to first LDR / VLDRD.
@@ -802,7 +843,7 @@ void ARMFrameLowering::emitEpilogue(MachineFunction &MF,
}

// Move SP to start of FP callee save spill area.
NumBytes -= (ArgRegsSaveSize +
NumBytes -= (ReservedArgStack +
AFI->getFPCXTSaveAreaSize() +
AFI->getGPRCalleeSavedArea1Size() +
AFI->getGPRCalleeSavedArea2Size() +
@@ -874,9 +915,13 @@ void ARMFrameLowering::emitEpilogue(MachineFunction &MF,
if (AFI->getFPCXTSaveAreaSize()) MBBI++;
}

if (ArgRegsSaveSize)
emitSPUpdate(isARM, MBB, MBBI, dl, TII, ArgRegsSaveSize,
if (ReservedArgStack || IncomingArgStackToRestore) {
assert(ReservedArgStack + IncomingArgStackToRestore >= 0 &&
"attempting to restore negative stack amount");
emitSPUpdate(isARM, MBB, MBBI, dl, TII,
ReservedArgStack + IncomingArgStackToRestore,
MachineInstr::FrameDestroy);
}
}
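For illustration only (not part of the patch): the rewritten ARM epilogue above folds two quantities into one final SP update, and the new assert requires their sum to be non-negative. A rough standalone sketch with illustrative names:

#include <cassert>
#include <cstdint>

// Final SP adjustment next to the incoming arguments: the vararg/tail-call
// reserve plus whatever incoming-argument stack this epilogue must restore.
int64_t finalEpilogueSPAdjust(unsigned reservedArgStack,
                              int64_t incomingArgStackToRestore) {
  int64_t Total = (int64_t)reservedArgStack + incomingArgStackToRestore;
  assert(Total >= 0 && "attempting to restore negative stack amount");
  return Total; // emitted as a single add to SP
}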

/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
@@ -2195,31 +2240,39 @@ MachineBasicBlock::iterator ARMFrameLowering::eliminateCallFramePseudoInstr(
MachineBasicBlock::iterator I) const {
const ARMBaseInstrInfo &TII =
*static_cast<const ARMBaseInstrInfo *>(MF.getSubtarget().getInstrInfo());
ARMFunctionInfo *AFI = MF.getInfo<ARMFunctionInfo>();
bool isARM = !AFI->isThumbFunction();
DebugLoc dl = I->getDebugLoc();
unsigned Opc = I->getOpcode();
bool IsDestroy = Opc == TII.getCallFrameDestroyOpcode();
unsigned CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;

assert(!AFI->isThumb1OnlyFunction() &&
"This eliminateCallFramePseudoInstr does not support Thumb1!");

int PIdx = I->findFirstPredOperandIdx();
ARMCC::CondCodes Pred = (PIdx == -1)
? ARMCC::AL
: (ARMCC::CondCodes)I->getOperand(PIdx).getImm();
unsigned PredReg = TII.getFramePred(*I);

if (!hasReservedCallFrame(MF)) {
// Bail early if the callee is expected to do the adjustment. If
// CalleePopAmount is valid but 0 anyway, Amount will be 0 too so it doesn't
// matter if we continue a bit longer.
if (IsDestroy && CalleePopAmount != 0)
return MBB.erase(I);

// If we have alloca, convert as follows:
// ADJCALLSTACKDOWN -> sub, sp, sp, amount
// ADJCALLSTACKUP -> add, sp, sp, amount
MachineInstr &Old = *I;
DebugLoc dl = Old.getDebugLoc();
unsigned Amount = TII.getFrameSize(Old);
unsigned Amount = TII.getFrameSize(*I);
if (Amount != 0) {
// We need to keep the stack aligned properly. To do this, we round the
// amount of space needed for the outgoing arguments up to the next
// alignment boundary.
Amount = alignSPAdjust(Amount);

ARMFunctionInfo *AFI = MF.getInfo<ARMFunctionInfo>();
assert(!AFI->isThumb1OnlyFunction() &&
"This eliminateCallFramePseudoInstr does not support Thumb1!");
bool isARM = !AFI->isThumbFunction();

// Replace the pseudo instruction with a new instruction...
unsigned Opc = Old.getOpcode();
int PIdx = Old.findFirstPredOperandIdx();
ARMCC::CondCodes Pred =
(PIdx == -1) ? ARMCC::AL
: (ARMCC::CondCodes)Old.getOperand(PIdx).getImm();
unsigned PredReg = TII.getFramePred(Old);
if (Opc == ARM::ADJCALLSTACKDOWN || Opc == ARM::tADJCALLSTACKDOWN) {
emitSPUpdate(isARM, MBB, I, dl, TII, -Amount, MachineInstr::NoFlags,
Pred, PredReg);
Expand All @@ -2229,6 +2282,11 @@ MachineBasicBlock::iterator ARMFrameLowering::eliminateCallFramePseudoInstr(
Pred, PredReg);
}
}
} else if (CalleePopAmount != 0) {
// If the calling convention demands that the callee pops arguments from the
// stack, we want to add it back if we have a reserved call frame.
emitSPUpdate(isARM, MBB, I, dl, TII, -CalleePopAmount,
MachineInstr::NoFlags, Pred, PredReg);
}
return MBB.erase(I);
}
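For illustration only (not part of the patch): schematically, the rewritten eliminateCallFramePseudoInstr distinguishes three cases. The enum and helper below are a simplified sketch under that reading, not the real MachineFunction API.

#include <cstdint>

enum class Action { EraseOnly, FoldIntoSPUpdate, ReallocateCalleePop };

// hasReservedCallFrame: the call frame is folded into the fixed stack frame.
// calleePopAmount: bytes a callee-pops convention (e.g. a guaranteed tail
// call) removes from our argument area.
Action eliminateCallFramePseudo(bool hasReservedCallFrame, bool isDestroy,
                                uint64_t calleePopAmount) {
  if (!hasReservedCallFrame) {
    if (isDestroy && calleePopAmount != 0)
      return Action::EraseOnly;      // callee already adjusted SP for us
    return Action::FoldIntoSPUpdate; // emit sub/add sp, sp, #frame-size here
  }
  if (calleePopAmount != 0)
    return Action::ReallocateCalleePop; // re-reserve (sub sp) what the callee popped
  return Action::EraseOnly;
}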