Skip to content

[include-cleaner] Handle symbols from system headers. #66089

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hokein
Copy link
Collaborator

@hokein hokein commented Sep 12, 2023

This is an initial version -- we add a file-path-based mapping to handle symbols from system headers.

The mapping list is not completed (I just added the most frequent problematic system headers).

I'm looking for initial feedback. In the long run, we might replace the clangd's CanonicalInclude with this one.

@hokein hokein requested a review from kadircet September 12, 2023 14:07
This is the initial version -- we add a file-path-based mapping to handle
sysmbols from system headers.

The mapping list is not completed (I just added the most frequent
problematic system headers).

I'm looking for initial feedback. In the long run, we might replace the
clangd's CanonicalInclude with this one.
@hokein hokein force-pushed the system-include-mapping branch from 83c89f0 to 5aabc90 Compare September 14, 2023 13:11
Copy link
Member

@kadircet kadircet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot, and so sorry for taking so long on getting to this

// The first element is the suffix of the physical file path for the system
// header, the second element is the final verbatim header spelling.
const std::pair<llvm::StringRef, llvm::StringRef> IncludeMappings[] = {
{"bits/types/struct_timeval.h", "<sys/time.h>"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about extracting these into a SystemHeaderSuffixes.inc with entries like SYSTEM_HEADER_DETAIL(bits/types/struct_timeval.h, <sys/time.h>). that way we can have an easy way to inject more mappings going forward for other libraries in the future.

Comment on lines +235 to +249
assert(llvm::all_of(*SystemHeaderMap, [](const auto &KeyAndValue) {
return std::distance(
llvm::sys::path::begin(KeyAndValue.first,
llvm::sys::path::Style::posix),
llvm::sys::path::end(KeyAndValue.first)) <=
MaxSuffixComponents;
}));
// ... and precise.
assert(llvm::any_of(*SystemHeaderMap, [](const auto &KeyAndValue) {
return std::distance(
llvm::sys::path::begin(KeyAndValue.first,
llvm::sys::path::Style::posix),
llvm::sys::path::end(KeyAndValue.first)) ==
MaxSuffixComponents;
}));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can move these assertions into the lambda, no need to perform them every time we access the singleton

MaxSuffixComponents;
}));

Instance.SuffixHeaderMapping = SystemHeaderMap;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're modifying a global with every call here, in a non-thread safe way.

// A mapping for system headers based on a set of filename rules.
class SystemIncludeMap {
public:
static const SystemIncludeMap &get() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's important that we also make these mappings different per language. most of them will likely be language independent, but at least libc extensions are likely to have different spellings for c vs c++.

moreover these fell more like private -> public mappings we perform via PragmaIncludes, so what about just moving them there? that way we can also initialize the mapping based on language options and also benefit all the other users of pragma includes with similar mappings?

// Returns the overridden verbatim spellings for HeaderPath that can be
// directly included.
llvm::SmallVector<llvm::StringRef>
mapHeader(llvm::StringRef HeaderPath) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about making this one static instead and getting rid of the extra singleton interface?

for (auto It = FoundRange.first; It != FoundRange.second; ++It)
Headers.push_back(It->second);
}
return Headers;;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra ;

static SystemIncludeMap Instance;
static const auto *SystemHeaderMap = [] {
const size_t Size = sizeof(IncludeMappings) / sizeof(IncludeMappings[0]);
auto *HeaderMap = new std::multimap<llvm::StringRef, llvm::StringRef>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have a multipmap here instead of a just StringMap? do we expect to have suffixes that map to multiple umbrella headers?

@@ -188,6 +290,20 @@ llvm::SmallVector<Hinted<Header>> findHeaders(const SymbolLocation &Loc,
OptionalFileEntryRef FE = SM.getFileEntryRefForID(FID);
if (!FE)
return {};

if (SrcMgr::isSystem(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am afraid this check might be overly strict, maybe drop this initially and see if we have too many false-positives?

Results.emplace_back(Header, Hints::PublicHeader);
assert(!Results.empty());
Results.front().Hint |= Hints::PreferredHeader;
// FIXME: should we include the original header as well?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i think we should treat these similar to IWYU private -> public mappings.

// A hand-mainained list of include mappings for system headers.
// The first element is the suffix of the physical file path for the system
// header, the second element is the final verbatim header spelling.
const std::pair<llvm::StringRef, llvm::StringRef> IncludeMappings[] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we discussed offline that this list is mostly for demonstration purposes, but wanted to remind it once again. we probably want to start with at least what we have in clangd, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants