CWG3015 Handling of header-names for #include and #embed

tkoeppe · tkoeppe · commit 91b1a21f9ca1 · 2025-06-24T13:20:13.000+01:00
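A minimal sketch of the directive form that this change rewords in [cpp.include]: when the operand of #include is not already a header-name, it is macro-replaced first, and an attempt is then made to form a header-name from the spellings of the resulting tokens. The names HEADER and kLiteralSize are hypothetical and chosen only for illustration. How whitespace between the replaced tokens is treated is implementation-defined, so the sketch avoids having any.

```cpp
// HEADER is a hypothetical macro name used only for this sketch.
#define HEADER <cstddef>

// The operand is not a header-name, so the pp-tokens form applies:
// HEADER is macro-replaced, then an attempt is made to form the
// header-name <cstddef> from the spellings of `<`, `cstddef`, `>`.
#include HEADER

// std::size_t is available only if the directive above found <cstddef>.
constexpr std::size_t kLiteralSize = sizeof("ab");  // 'a', 'b', '\0'
```

If the expansion did not yield something from which a header-name can be formed, the behavior would be undefined for #include, whereas the corresponding #embed wording in this diff makes the program ill-formed.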
diff --git a/source/lex.tex b/source/lex.tex
@@ -588,16 +588,23 @@
 the next preprocessing token is the longest sequence of
 characters that could constitute a preprocessing token, even if that
 would cause further lexical analysis to fail,
-except that a \grammarterm{header-name}\iref{lex.header} is only formed
+except that
 \begin{itemize}
 \item
-after the \tcode{include} or \tcode{import} preprocessing token in a
-\tcode{\#include}\iref{cpp.include} or
-\tcode{import}\iref{cpp.import} directive, or
-
+a \grammarterm{header-name}\iref{lex.header} is only formed
+\begin{itemize}
 \item
-within a \grammarterm{has-include-expression}.
-
+immediately after the \tcode{include}, \tcode{embed}, or \tcode{import} preprocessing token in a
+\tcode{\#include}\iref{cpp.include}, \tcode{\#embed}\iref{cpp.embed}, or
+\tcode{import}\iref{cpp.import} directive, respectively, or
+\item
+immediately after a preprocessing token sequence of \xname{has_include}
+or \xname{has_embed} immediately followed by \tcode{(}
+in a \tcode{\#if}, \tcode{\#elif}, or \tcode{\#embed} directive\iref{cpp.cond,cpp.embed} and
+\end{itemize}
+\item
+a \grammarterm{string-literal} token is never formed
+when a \grammarterm{header-name} token can be formed.
 \end{itemize}
 \end{itemize}
 
diff --git a/source/preprocessor.tex b/source/preprocessor.tex
@@ -363,7 +363,8 @@
 \indextext{\idxxname{has_embed}}%
 \begin{bnf}
 \nontermdef{has-embed-expression}\br
-    \terminal{\xname{has_embed}} \terminal{(} pp-balanced-token-seq \terminal{)}
+    \terminal{\xname{has_embed}} \terminal{(} header-name \opt{pp-balanced-token-seq} \terminal{)}\br
+    \terminal{\xname{has_embed}} \terminal{(} header-name-tokens \opt{pp-balanced-token-seq} \terminal{)}
 \end{bnf}
 
 \indextext{\idxxname{has_cpp_attribute}}%
@@ -405,17 +406,12 @@
 \tcode{\#undef}
 directive with the same subject identifier), \tcode{0} if it is not.
 
-\pnum
-The second form of \grammarterm{has-include-expression}
-is considered only if the first form does not match,
-in which case the preprocessing tokens are processed just as in normal text.
-
 \pnum
 The header or source file identified by
 the parenthesized preprocessing token sequence
 in each contained \grammarterm{has-include-expression}
 is searched for as if that preprocessing token sequence
-were the \grammarterm{pp-tokens} in a \tcode{\#include} directive,
+were the \grammarterm{pp-tokens} of a \tcode{\#include} directive,
 except that no further macro expansion is performed.
 If such a directive would not satisfy the syntactic requirements
 of a \tcode{\#include} directive, the program is ill-formed.
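As a hedged illustration, assuming a hosted C++17 implementation, the sketch below shows a has-include-expression whose operand is searched for as if it were the operand of #include. The sequence <string_view> is lexed as a single header-name immediately after __has_include( and immediately after include, while < elsewhere remains an ordinary operator. The names kHasStringView and is_less are hypothetical.

```cpp
// Immediately after `__has_include(` in a #if directive, <string_view>
// is lexed as one header-name token rather than `<`, identifier, `>`.
#if __has_include(<string_view>)
#  include <string_view>   // header-name immediately after `include`
inline constexpr bool kHasStringView = true;    // hypothetical name
#else
inline constexpr bool kHasStringView = false;
#endif

// Outside those contexts no header-name is formed:
// `<` here is the less-than operator.
constexpr bool is_less(int a, int b) { return a < b; }
```

On an implementation that ships <string_view>, kHasStringView is true; the #else branch only keeps the sketch well-formed where the header is absent.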
@@ -424,10 +420,11 @@
 to \tcode{0} if the search fails.
 
 \pnum
-The parenthesized \grammarterm{pp-balanced-token-seq} in each contained
+The parenthesized preprocessing token sequence of each contained
 \grammarterm{has-embed-expression} is processed as if that
-\grammarterm{pp-balanced-token-seq} were the \grammarterm{pp-tokens} in the
-third form of a \tcode{\#embed} directive\iref{cpp.embed}.
+preprocessing token sequence were the \grammarterm{pp-tokens}
+of a \tcode{\#embed} directive\iref{cpp.embed},
+except that no further macro expansion is performed.
 If such a directive would not satisfy the syntactic requirements of a
 \tcode{\#embed} directive, the program is ill-formed.
 The \grammarterm{has-embed-expression} evaluates to:
@@ -686,83 +683,81 @@
 \indextext{\idxcode{\#include}}%
 
 \pnum
-A
-\tcode{\#include}
-directive shall identify a header or source file
-that can be processed by the implementation.
+A \defnadj{header}{search} for a sequence of characters
+searches a sequence of places for a header
+identified uniquely by that sequence of characters.
+How the places are determined or the header identified
+is \impldef{determination of places and identification of headers during header search}.
+
+\pnum
+A \defnadj{source file}{search} for a sequence of characters
+attempts to identify a source file that is named by the sequence of characters.
+The named source file is searched for
+in an \impldef{search for source files during source file search} manner.
+If the implementation does not support a source file search
+for that sequence of characters, or if the search fails,
+the result of the source file search
+is the result of a header search for the same sequence of characters.
 
 \pnum
 A preprocessing directive of the form
 \begin{ncsimplebnf}
-\terminal{\# include <} h-char-sequence \terminal{>} new-line
+\terminal{\# include} header-name new-line
 \end{ncsimplebnf}
-searches a sequence of
-\impldef{sequence of places searched for a header}
-places
-for a header identified uniquely by the specified sequence
-between the
-\tcode{<}
-and
-\tcode{>}
-delimiters,
-and causes the replacement of that
-directive by the entire contents of the header.
-How the places are specified
-or the header identified
-is \impldef{search locations for \tcode{<>} header}.
+causes the replacement of that directive by the entire contents
+of the header or source file identified by \grammarterm{header-name}.
 
 \pnum
-A preprocessing directive of the form
+If the \grammarterm{header-name} is of the form
 \begin{ncsimplebnf}
-\terminal{\# include "} q-char-sequence \terminal{"} new-line
+\terminal{<} h-char-sequence \terminal{>}
 \end{ncsimplebnf}
-causes the replacement of that
-directive by the entire contents of the
-source file identified by the specified sequence between the
-\tcode{"}
-delimiters.
-The named source file is searched for in an
-\impldef{manner of search for included source file}
-manner.
-If this search is not supported,
-or if the search fails,
-the directive is reprocessed as if it read
+a header is identified by a header search for the sequence of characters
+of the \grammarterm{h-char-sequence}.
+
+\pnum
+If the \grammarterm{header-name} is of the form
 \begin{ncsimplebnf}
-\terminal{\# include <} h-char-sequence \terminal{>} new-line
+\terminal{"} q-char-sequence \terminal{"}
 \end{ncsimplebnf}
-with the identical contained sequence (including
-\tcode{>}
-characters, if any) from the original directive.
+the source file or header is identified by a source file search
+for the sequence of characters of the \grammarterm{q-char-sequence}.
+
+\pnum
+If a header search fails, or if a source file search or header search
+identifies a header or source file that cannot be processed by the implementation,
+the program is ill-formed.
+\begin{note}
+If the header or source file cannot be processed,
+the program is ill-formed even when evaluating \xname{has_include}.
+\end{note}
 
 \pnum
 A preprocessing directive of the form
 \begin{ncsimplebnf}
 \terminal{\# include} pp-tokens new-line
 \end{ncsimplebnf}
-(that does not match one of the two previous forms) is permitted.
+(that does not match the previous form) is permitted.
 The preprocessing tokens after
 \tcode{include}
 in the directive are processed just as in normal text
 (i.e., each identifier currently defined as a macro name is replaced by its
 replacement list of preprocessing tokens).
-If the directive resulting after all replacements does not match
-one of the two previous forms, the behavior is
-undefined.
+Then, an attempt is made to form a \grammarterm{header-name}
+preprocessing token\iref{lex.header} from the whitespace and the characters
+of the spellings of the resulting sequence of preprocessing tokens;
+the treatment of whitespace
+is \impldef{treatment of whitespace when processing a \tcode{\#include} directive}.
+If the attempt succeeds, the directive with the so-formed \grammarterm{header-name}
+is processed as specified for the previous form.
+Otherwise, the behavior is undefined.
 \begin{note}
 Adjacent \grammarterm{string-literal}s are not concatenated into
 a single \grammarterm{string-literal}
 (see the translation phases in~\ref{lex.phases});
 thus, an expansion that results in two \grammarterm{string-literal}s is an
 invalid directive.
 \end{note}
-The method by which a sequence of preprocessing tokens between a
-\tcode{<}
-and a
-\tcode{>}
-preprocessing token pair or a pair of
-\tcode{"}
-characters is combined into a single header name
-preprocessing token is \impldef{search locations for \tcode{""""} header}.
 
 \pnum
 The implementation shall provide unique mappings for
@@ -838,35 +833,58 @@
 
 \rSec2[cpp.embed.gen]{General}
 
+\pnum
+A \defnadj{bracket resource}{search} for a sequence of characters
+searches a sequence of places for a resource identified uniquely
+by that sequence of characters.
+How the places are determined or the resource identified
+is \impldef{determination of places and identification of resources during bracket resource search}.
+
+\pnum
+A \defnadj{quote resource}{search} for a sequence of characters
+attempts to identify a resource that is named by the sequence of characters.
+The named resource is searched for
+in an \impldef{search for resources during quote resource search} manner.
+If the implementation does not support a quote resource search
+for that sequence of characters, or if the search fails,
+the result of the quote resource search
+is the result of a bracket resource search for the same sequence of characters.
+
 \pnum
 A preprocessing directive of the form
 \begin{ncsimplebnf}
-\terminal{\# embed <} h-char-sequence \terminal{>} \opt{pp-tokens} new-line
+\terminal{\# embed} header-name \opt{pp-tokens} new-line
 \end{ncsimplebnf}
-searches a sequence of
-\impldef{sequence of places searched for an embedded resource}
-places for a resource identified uniquely by the specified sequence between
-the \tcode{<} and \tcode{>} delimiters.
-How the places are specified or the resource identified is
-\impldef{search locations for embedded resources specified with \tcode{<>}}.
+causes the replacement of that directive
+by preprocessing tokens derived from data
+in the resource identified by \grammarterm{header-name},
+as specified below.
 
 \pnum
-A preprocessing directive of the form
+If the \grammarterm{header-name} is of the form
 \begin{ncsimplebnf}
-\terminal{\# embed "} q-char-sequence \terminal{"} \opt{pp-tokens} new-line
+\terminal{<} h-char-sequence \terminal{>}
 \end{ncsimplebnf}
-searches for a resource identified by the specified sequence between the
-\tcode{"} delimiters.
-The named resource is searched for in an
-\impldef{manner of search for named resource}
-manner.
-If this search is not supported, or if the search fails, the directive is
-reprocessed as if it read
+the resource is identified by a bracket resource search
+for the sequence of characters of the \grammarterm{h-char-sequence}.
+
+\pnum
+If the \grammarterm{header-name} is of the form
 \begin{ncsimplebnf}
-\terminal{\# embed <} h-char-sequence \terminal{>} \opt{pp-tokens} new-line
+\terminal{"} q-char-sequence \terminal{"}
 \end{ncsimplebnf}
-with the identical contained sequence (including \tcode{>} characters, if any)
-from the original directive.
+the resource is identified by a quote resource search
+for the sequence of characters of the \grammarterm{q-char-sequence}.
+
+\pnum
+If a bracket resource search fails,
+or if a quote or bracket resource search identifies a resource
+that cannot be processed by the implementation, the program is ill-formed.
+\begin{note}
+If the resource cannot be processed, the program is ill-formed
+even when processing \tcode{\#embed} with \tcode{limit(0)}\iref{cpp.embed.param.limit}
+or evaluating \xname{has_embed}.
+\end{note}
 
 \pnum
 \recommended A mechanism similar to, but distinct from, the
@@ -987,53 +1005,48 @@
 \begin{ncsimplebnf}
 \terminal{\# embed} pp-tokens new-line
 \end{ncsimplebnf}
-(that does not match one of the two previous forms) is permitted.
+(that does not match the previous form) is permitted.
 The preprocessing tokens after \tcode{embed} in the directive are processed
 just as in normal text (i.e., each identifier currently defined as a macro
 name is replaced by its replacement list of preprocessing tokens).
-The directive resulting after all replacements of the third form shall match
-one of the two previous forms.
+Then, an attempt is made to form a \grammarterm{header-name}
+preprocessing token\iref{lex.header} from the whitespace and the characters
+of the spellings of the resulting sequence of preprocessing tokens immediately after \tcode{embed};
+the treatment of whitespace
+is \impldef{treatment of whitespace when processing a \tcode{\#embed} directive}.
+If the attempt succeeds, the directive with the so-formed \grammarterm{header-name}
+is processed as specified for the previous form.
+Otherwise, the program is ill-formed.
 \begin{note}
 Adjacent \grammarterm{string-literal}{s} are not concatenated into a single
 \grammarterm{string-literal} (see the translation phases in \iref{lex.phases});
 thus, an expansion that results in two \grammarterm{string-literal}{s} is an
 invalid directive.
 \end{note}
-
-Any further processing as in normal text described for the two previous
-forms is not performed.
+Any further processing as in normal text described for the previous
+form is not performed.
 \begin{note}
 That is, processing as in normal text happens once and only once for the entire
 directive.
 \end{note}
-
 \begin{example}
-If the directive matches the third form, the whole directive is replaced.
-If the directive matches the first two forms, everything after the name is
-replaced.
-
+If the directive matches the second form, the whole directive is replaced.
+If the directive matches the first form, everything after the name is replaced.
 \begin{codeblock}
-#define prefix(ARG) suffix(ARG)
-#define THE_ADDITION "teehee"
-#define THE_RESOURCE ":3c"
-#embed ":3c"        prefix(THE_ADDITION)
-#embed THE_RESOURCE prefix(THE_ADDITION)
+#define EMPTY
+#define X myfile
+#define Y rsc
+#define Z 42
+#embed <myfile.rsc> prefix(Z)
+#embed EMPTY <X.Y>  prefix(Z)
 \end{codeblock}
-
 is equivalent to:
-
 \begin{codeblock}
-#embed ":3c" suffix("teehee")
-#embed ":3c" suffix("teehee")
+#embed <myfile.rsc> prefix(42)
+#embed <myfile.rsc> prefix(42)
 \end{codeblock}
 \end{example}
 
-\pnum
-The method by which a sequence of preprocessing tokens between a \tcode{<} and
-a \tcode{>} preprocessing token pair or a pair of \tcode{"} characters is
-combined into a single resource name preprocessing token is
-\impldef{search locations for \tcode{""""} resource}.
-
 \rSec2[cpp.embed.param]{Embed parameters}
 \rSec3[cpp.embed.param.limit]{limit parameter}
 \pnum
@@ -1783,17 +1796,15 @@
 Otherwise, the original spelling of each preprocessing token in the
 stringizing argument is retained in the character string literal,
 except for special handling for producing the spelling of
-\grammarterm{string-literal}s and \grammarterm{character-literal}s:
-a
-\tcode{\textbackslash}
-character is inserted before each
-\tcode{"}
-and
-\tcode{\textbackslash}
-character of a \grammarterm{character-literal} or \grammarterm{string-literal}
-(including the delimiting
-\tcode{"}
-characters).
+\grammarterm{header-name}s,
+\grammarterm{string-literal}s,
+and \grammarterm{character-literal}s:
+a \tcode{\textbackslash} character is inserted before each
+\tcode{"} and \tcode{\textbackslash} character of a
+\grammarterm{header-name},
+\grammarterm{character-literal},
+or \grammarterm{string-literal}
+(including the delimiting \tcode{"} characters).
 If the replacement that results is not a valid character string literal,
 the behavior is undefined. The character string literal corresponding to
 an empty stringizing argument is \tcode{""}.
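The escaping rule for stringizing can be sketched as follows. This diff extends the rule to header-names; the sketch uses a string-literal argument, for which the rule is unchanged. STR and kStringized are hypothetical names, and the example assumes C++17 for std::string_view.

```cpp
#include <string_view>

// STR is a hypothetical helper macro: `#x` stringizes its argument.
#define STR(x) #x

// The argument is the string-literal "a\n", spelled with five characters:
//   "  a  \  n  "
// Stringizing inserts a backslash before each " and each \ of that
// spelling, so the result is the literal "\"a\\n\"", whose value is
// those same five characters.
constexpr std::string_view kStringized = STR("a\n");
```

The stringized value therefore contains a backslash followed by n, not a newline character.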