From b4f901bae4e1a4b30446a20258915b442007554b Mon Sep 17 00:00:00 2001
From: Pavol Vaskovic <pali@pali.sk>
Date: Mon, 5 Nov 2018 18:00:28 +0100
Subject: [PATCH 1/5] [benchmark] Naming Convention

New benchmark naming convention for better readability and improved naming system that accounts for performance coverage growth going forward.
---
 benchmark/Naming.md                        | 148 +++++++++++++++++++++
 benchmark/scripts/Benchmark_Driver         |  21 ++-
 benchmark/scripts/test_Benchmark_Driver.py |  26 +++-
 3 files changed, 183 insertions(+), 12 deletions(-)
 create mode 100644 benchmark/Naming.md
diff --git a/benchmark/Naming.md b/benchmark/Naming.md
new file mode 100644
index 0000000000000..b8546e90eb0be
--- /dev/null
+++ b/benchmark/Naming.md
@@ -0,0 +1,148 @@
+# Naming Convention
+
+Historically, benchmark names in the Swift Benchmark Suite were derived from the
+name of the `runFunction`, which by convention started with prefix `run_`,
+followed by the benchmark name. Therefore most of the legacy benchmark names
+conform to the [`UpperCamelCase`](http://bit.ly/UpperCamelCase) convention.
+After introduction of
+[`BenchmarkInfo`](https://github.com/apple/swift/pull/12048)
+to describe the benchmark metadata, names can be any string. To create more
+cohesive and well structured system, names of newly added benchmarks should meet
+the following set of requirements:
+
+<ul>
+<li>
+<!-- The <li> content with <details> is pre-formatted as HTML, to work around
+Markdown renderer interrupting the paragraph, which creates an ugly gap. -->
+<strong>Letters, numbers and dots.</strong> Start with short unique name in
+<code>UpperCamelCase</code>.
+For a family of benchmarks, separate the name components with periods.
+<details>
+
+Very long compound names using `UpperCamelCase` are hard to read. Use `.` to
+increase readability and structure.
+
+Prefer unique and creative name to nondescript generic term, unless the
+benchmark is testing individual method on a concrete type.
+
+````
+⛔️ Dictionary2
+✅ AngryPhonebook
+✅ Dictionary.AnyHashable.String.update
+✅ Array.append.Array.Int
+````
+
+Benchmark names are used to run individual tests when passed as command line
+arguments to the benchmark driver. Special characters that could be interpreted
+by the shell would require quoting. Stick to ASCII letters, numbers and period.
+Exceptionally:
+
+* Use **`-`** only to denote control flow constructs like `for-in` or `if-let`.
+* Use **`!`** and **`?`** for optional types, conditional or forced downcasting
+and optional chaining.
+
+````
+✅ OCB.NSArray.AnyObject.as?.Array.NSString
+✅ OCB.NSArray.AnyObject.as!.Array.String
+✅ Array.append.Array.Int?
+✅ Flatten.Array.Tuple4.for-in.reserved
+````
+</details><p><!-- spacer --></p></li>
+<li>
+<strong>Use groups and variants</strong> to structure the benchmark family by
+degrees of freedom (e.g. different types implementing a protocol, value vs.
+reference types etc.). Use <strong>numbered suffixes</strong> to denote
+differently sized variants.
+<details>
+
+Benchmarks in a family can be grouped by the tested operation, method or varied
+by types and different workload sizes. It might be necessary to abbreviate some names to fit the size limit, based on the longest combination. Choose consistent names for the components throughout all members in the family, to allow for relative comparison across the different axis of variation.
+
+````
+✅ Seq.dropFirst.Array
+✅ Seq.dropLast.Range.lazy
+✅ Seq.dropWhile.UnfoldSeq
+✅ Seq.prefix.AnySeq.RangeIter.lazy
+✅ Seq.prefixWhile.AnyCol.Array
+✅ Seq.suffix.AnySeq.UnfoldSeq.lazy
+
+✅ Existential.Array.ConditionalShift.Ref1
+✅ Existential.Array.Mutating.Ref2
+✅ Existential.Array.method.1x.Ref3
+✅ Existential.Array.method.2x.Ref4
+✅ Existential.Array.Shift.Val0
+✅ Existential.MutatingAndNonMutating.Val1
+✅ Existential.Mutating.Val2
+✅ Existential.method.1x.Val3
+✅ Existential.method.2x.Val4
+✅ Existential.Pass2.method.1x.Ref1
+✅ Existential.Pass2.method.2x.Ref2
+
+✅ Set.isSubset.Int25
+✅ Set.symmetricDifference.Int50
+````
+
+</details><p><!-- spacer --></p></li>
+<li>
+<strong>Groups and types are <code>UpperCase</code>, methods are
+<code>lowerCase</code>.</strong>
+<details>
+
+Use periods to separate the name components in variants derived from specialised
+generic types or significant method chains.
+
+````
+⛔️ InsertCharacterStartIndex
+⛔️ InsertCharacterTowardsEndIndexNonASCII
+````
+
+There's no need to be literal with type names. **Be descriptive**:
+
+````
+✅ Flatten.Array.Tuple4.lazy.flatMap
+✅ String.insert.ASCIIChar.StartIndex
+✅ String.insert.EmojiChar.NearEnd
+````
+
+</details><p><!-- spacer --></p></li>
+<li>
+<strong>Keep it short.</strong> 40 characters at most. Abbreviate if necessary!
+<details>
+
+Benchmarking results are reported on GitHub and very long names are causing
+horizontal table scrolling which unfortunately obscures the columns with actual
+measurements. Fixed upper size limit also helps with the formatted console
+output, when measuring locally. *It is more important for benchmark's name to be
+unique and short, than overly descriptive.*
+
+Prefer concise names for potential benchmark family extensions. Leave out the
+nested types from variants if they aren't strictly necessary for disambiguation.
+If there's potentially valuable future variant, like testing `ContinuousArray`,
+keep the `Array` now, allowing for addition of `ContArr` variants later.
+
+Use **`Val`** and **`Ref`** as short descriptors for variants that compare value
+types (`struct`, `Int`) with reference types (often named with `Class` in the
+legacy-style).
+Prefer **`Char`** to `Character`, which can be combined with codepage or
+language prefix/suffix when necessary (`ASCIIChar`). For benchmarks that measure
+`String`'s Unicode performance for various languages, use
+[two letter codes](https://en.wikipedia.org/wiki/ISO_639-1) instead of spelling
+out the whole language names.
+
+In a pinch, even C-style naming with two letter prefixes `OC` (for Objective-C)
+or abbreviations like  `Str` and `Arr` are OK, if it helps to fit a system with
+descriptive names into 40 characters:
+
+````
+✅ CharCount.ContArr.Str.kr
+✅ Seq.prefixWhile.AnySeq.UnfoldSeq.lazy
+````
+
+As a last resort, use *numbered suffixes* to disambiguate between benchmarks
+with minor implementation variations.
+
+</details></li>
+</ul>
+
+Technically, the benchmark's name must match the following regular expression:
+`[A-Z][a-zA-Z0-9\-\.!?]+`
diff --git a/benchmark/scripts/Benchmark_Driver b/benchmark/scripts/Benchmark_Driver
index 3e9cf0fe8225a..961296d3f6405 100755
--- a/benchmark/scripts/Benchmark_Driver
+++ b/benchmark/scripts/Benchmark_Driver
@@ -293,7 +293,7 @@ class BenchmarkDoctor(object):
         self.log.addHandler(self.console_handler)
         self.log.debug('Checking tests: %s', ', '.join(self.driver.tests))
         self.requirements = [
-            self._name_matches_capital_words_convention,
+            self._name_matches_benchmark_naming_convention,
             self._name_is_at_most_40_chars_long,
             self._no_setup_overhead,
             self._reasonable_setup_time,
@@ -305,20 +305,29 @@ class BenchmarkDoctor(object):
         """Unregister handler on exit."""
         self.log.removeHandler(self.console_handler)
 
-    capital_words_re = re.compile('[A-Z][a-zA-Z0-9]+')
+    benchmark_naming_convention_re = re.compile(r'[A-Z][a-zA-Z0-9\-\.!?]+')
+    camel_humps_re = re.compile(r'[a-z][A-Z]')
 
     @staticmethod
-    def _name_matches_capital_words_convention(measurements):
+    def _name_matches_benchmark_naming_convention(measurements):
         name = measurements['name']
-        match = BenchmarkDoctor.capital_words_re.match(name)
+        match = BenchmarkDoctor.benchmark_naming_convention_re.match(name)
         matched = match.group(0) if match else ''
+        composite_words = len(BenchmarkDoctor.camel_humps_re.findall(name)) + 1
 
         if name != matched:
             BenchmarkDoctor.log_naming.error(
-                "'%s' name doesn't conform to UpperCamelCase convention.",
+                "'%s' name doesn't conform to benchmark naming convention.",
                 name)
             BenchmarkDoctor.log_naming.info(
-                'See http://bit.ly/UpperCamelCase')
+                'See http://bit.ly/BenchmarkNaming')
+
+        if composite_words > 4:
+            BenchmarkDoctor.log_naming.warning(
+                "'%s' name is composed of %d words.", name, composite_words)
+            BenchmarkDoctor.log_naming.info(
+                "Split '%s' name into dot-separated groups and variants. "
+                "See http://bit.ly/BenchmarkNaming", name)
 
     @staticmethod
     def _name_is_at_most_40_chars_long(measurements):
diff --git a/benchmark/scripts/test_Benchmark_Driver.py b/benchmark/scripts/test_Benchmark_Driver.py
index f3adeb6fa7edd..d1e43530a2be4 100644
--- a/benchmark/scripts/test_Benchmark_Driver.py
+++ b/benchmark/scripts/test_Benchmark_Driver.py
@@ -509,10 +509,14 @@ def test_measure_10_independent_1s_benchmark_series(self):
              'Measuring B1, 5 x i1 (2048 samples), 5 x i2 (2048 samples)'],
             self.logs['debug'])
 
-    def test_benchmark_name_matches_capital_words_conventions(self):
+    def test_benchmark_name_matches_naming_conventions(self):
         driver = BenchmarkDriverMock(tests=[
             'BenchmarkName', 'CapitalWordsConvention', 'ABBRName',
-            'wrongCase', 'Wrong_convention'])
+            'TooManyCamelCaseHumps',
+            'Existential.Array.method.1x.Val4',
+            'Flatten.Array.Array.Str.for-in.reserved',
+            'Flatten.Array.String?.as!.NSArray',
+            'wrongCase', 'Wrong_convention', 'Illegal._$%[]<>{}@^()'])
         with captured_output() as (out, _):
             doctor = BenchmarkDoctor(self.args, driver)
             doctor.check()
@@ -522,12 +526,22 @@ def test_benchmark_name_matches_capital_words_conventions(self):
         self.assertNotIn('BenchmarkName', output)
         self.assertNotIn('CapitalWordsConvention', output)
         self.assertNotIn('ABBRName', output)
+        self.assertNotIn('Existential.Array.method.1x.Val4', output)
+        self.assertNotIn('Flatten.Array.Array.Str.for-in.reserved', output)
+        self.assertNotIn('Flatten.Array.String?.as!.NSArray', output)
+        err_msg = " name doesn't conform to benchmark naming convention."
         self.assert_contains(
-            ["'wrongCase' name doesn't conform to UpperCamelCase convention.",
-             "'Wrong_convention' name doesn't conform to UpperCamelCase "
-             "convention."], self.logs['error'])
+            ["'wrongCase'" + err_msg, "'Wrong_convention'" + err_msg,
+             "'Illegal._$%[]<>{}@^()'" + err_msg], self.logs['error'])
         self.assert_contains(
-            ['See http://bit.ly/UpperCamelCase'], self.logs['info'])
+            ["'TooManyCamelCaseHumps' name is composed of 5 words."],
+            self.logs['warning'])
+        self.assert_contains(
+            ['See http://bit.ly/BenchmarkNaming'], self.logs['info'])
+        self.assert_contains(
+            ["Split 'TooManyCamelCaseHumps' name into dot-separated groups "
+             "and variants. See http://bit.ly/BenchmarkNaming"],
+            self.logs['info'])
 
     def test_benchmark_name_is_at_most_40_chars_long(self):
         driver = BenchmarkDriverMock(tests=[

From bc0064d28586cd4f39463df5a86cc634b295b29d Mon Sep 17 00:00:00 2001
From: Pavol Vaskovic <pali@pali.sk>
Date: Mon, 19 Nov 2018 10:03:02 +0100
Subject: [PATCH 2/5] [benchmark] Simpler naming convention regex

---
 benchmark/Naming.md                | 2 +-
 benchmark/scripts/Benchmark_Driver | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/benchmark/Naming.md b/benchmark/Naming.md
index b8546e90eb0be..fb08d669e499a 100644
--- a/benchmark/Naming.md
+++ b/benchmark/Naming.md
@@ -145,4 +145,4 @@ with minor implementation variations.
 </ul>
 
 Technically, the benchmark's name must match the following regular expression:
-`[A-Z][a-zA-Z0-9\-\.!?]+`
+`[A-Z][a-zA-Z0-9\-.!?]+`
diff --git a/benchmark/scripts/Benchmark_Driver b/benchmark/scripts/Benchmark_Driver
index 961296d3f6405..0eec383a36992 100755
--- a/benchmark/scripts/Benchmark_Driver
+++ b/benchmark/scripts/Benchmark_Driver
@@ -305,7 +305,7 @@ class BenchmarkDoctor(object):
         """Unregister handler on exit."""
         self.log.removeHandler(self.console_handler)
 
-    benchmark_naming_convention_re = re.compile(r'[A-Z][a-zA-Z0-9\-\.!?]+')
+    benchmark_naming_convention_re = re.compile(r'[A-Z][a-zA-Z0-9\-.!?]+')
     camel_humps_re = re.compile(r'[a-z][A-Z]')
 
     @staticmethod

From e828adc058bf16ec7d086be32c309090ee0b1a04 Mon Sep 17 00:00:00 2001
From: Pavol Vaskovic <pali@pali.sk>
Date: Mon, 19 Nov 2018 10:05:11 +0100
Subject: [PATCH 3/5] [benchmark] Naming: emphasize ABBR are last resort

---
 benchmark/Naming.md | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/benchmark/Naming.md b/benchmark/Naming.md
index fb08d669e499a..ee3732b06e27a 100644
--- a/benchmark/Naming.md
+++ b/benchmark/Naming.md
@@ -33,20 +33,22 @@ benchmark is testing individual method on a concrete type.
 ````
 
 Benchmark names are used to run individual tests when passed as command line
-arguments to the benchmark driver. Special characters that could be interpreted
-by the shell would require quoting. Stick to ASCII letters, numbers and period.
+arguments to the benchmark driver. Stick to ASCII letters, numbers and period.
 Exceptionally:
 
 * Use **`-`** only to denote control flow constructs like `for-in` or `if-let`.
-* Use **`!`** and **`?`** for optional types, conditional or forced downcasting
-and optional chaining.
+* Use **`!`** and **`?`** for optional types, conditional or forced downcasting,
+optional chaining etc.
 
 ````
-✅ OCB.NSArray.AnyObject.as?.Array.NSString
-✅ OCB.NSArray.AnyObject.as!.Array.String
 ✅ Array.append.Array.Int?
 ✅ Flatten.Array.Tuple4.for-in.reserved
+✅ OCB.NSArray.as!.Array.NSString
 ````
+
+Note: Special characters that could be interpreted by the shell require escaping
+(`\!`) or quoting the name, when running such benchmarks individually.
+
 </details><p><!-- spacer --></p></li>
 <li>
 <strong>Use groups and variants</strong> to structure the benchmark family by
@@ -56,7 +58,10 @@ differently sized variants.
 <details>
 
 Benchmarks in a family can be grouped by the tested operation, method or varied
-by types and different workload sizes. It might be necessary to abbreviate some names to fit the size limit, based on the longest combination. Choose consistent names for the components throughout all members in the family, to allow for relative comparison across the different axis of variation.
+by types and different workload sizes. It might be necessary to abbreviate some
+names to fit the size limit, based on the longest combination. Choose consistent
+names for the components throughout all members in the family, to allow for
+relative comparison across the different axis of variation.
 
 ````
 ✅ Seq.dropFirst.Array
@@ -106,7 +111,7 @@ There's no need to be literal with type names. **Be descriptive**:
 
 </details><p><!-- spacer --></p></li>
 <li>
-<strong>Keep it short.</strong> 40 characters at most. Abbreviate if necessary!
+<strong>Keep it short.</strong> 40 characters at most. Abbreviate if necessary.
 <details>
 
 Benchmarking results are reported on GitHub and very long names are causing
@@ -117,7 +122,7 @@ unique and short, than overly descriptive.*
 
 Prefer concise names for potential benchmark family extensions. Leave out the
 nested types from variants if they aren't strictly necessary for disambiguation.
-If there's potentially valuable future variant, like testing `ContinuousArray`,
+If there's potentially valuable future variant, like testing `ContiguousArray`,
 keep the `Array` now, allowing for addition of `ContArr` variants later.
 
 Use **`Val`** and **`Ref`** as short descriptors for variants that compare value
@@ -129,12 +134,13 @@ language prefix/suffix when necessary (`ASCIIChar`). For benchmarks that measure
 [two letter codes](https://en.wikipedia.org/wiki/ISO_639-1) instead of spelling
 out the whole language names.
 
-In a pinch, even C-style naming with two letter prefixes `OC` (for Objective-C)
-or abbreviations like  `Str` and `Arr` are OK, if it helps to fit a system with
-descriptive names into 40 characters:
+*In a pinch*, it's OK to use short C-style naming prefixes like `OCB` (for
+Objective-C Bridging) or consistent abbreviations in the benchmark family like
+`Str` and `Arr`, but *only if* it helps to fit a system with descriptive
+names into 40 characters:
 
 ````
-✅ CharCount.ContArr.Str.kr
+✅ OCB.NSDict.as!.Dictionary.NSString.NSNum
 ✅ Seq.prefixWhile.AnySeq.UnfoldSeq.lazy
 ````
 

From 9857b0df9eeca89d06627a6d87747d51a63b3f40 Mon Sep 17 00:00:00 2001
From: Pavol Vaskovic <pali@pali.sk>
Date: Wed, 5 Dec 2018 14:33:19 +0100
Subject: [PATCH 4/5] [benchmark] Replace OCB with Bridging

Removed the controversial mention of C-style name prefixes.
---
 benchmark/Naming.md | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/benchmark/Naming.md b/benchmark/Naming.md
index ee3732b06e27a..736a12e23faf6 100644
--- a/benchmark/Naming.md
+++ b/benchmark/Naming.md
@@ -43,7 +43,7 @@ optional chaining etc.
 ````
 ✅ Array.append.Array.Int?
 ✅ Flatten.Array.Tuple4.for-in.reserved
-✅ OCB.NSArray.as!.Array.NSString
+✅ Bridging.NSArray.as!.Array.NSString
 ````
 
 Note: Special characters that could be interpreted by the shell require escaping
@@ -97,16 +97,15 @@ Use periods to separate the name components in variants derived from specialised
 generic types or significant method chains.
 
 ````
-⛔️ InsertCharacterStartIndex
 ⛔️ InsertCharacterTowardsEndIndexNonASCII
 ````
 
 There's no need to be literal with type names. **Be descriptive**:
 
 ````
-✅ Flatten.Array.Tuple4.lazy.flatMap
-✅ String.insert.ASCIIChar.StartIndex
 ✅ String.insert.EmojiChar.NearEnd
+✅ String.insert.ASCIIChar.StartIndex
+✅ Flatten.Array.Tuple4.lazy.flatMap
 ````
 
 </details><p><!-- spacer --></p></li>
@@ -129,18 +128,16 @@ Use **`Val`** and **`Ref`** as short descriptors for variants that compare value
 types (`struct`, `Int`) with reference types (often named with `Class` in the
 legacy-style).
 Prefer **`Char`** to `Character`, which can be combined with codepage or
-language prefix/suffix when necessary (`ASCIIChar`). For benchmarks that measure
-`String`'s Unicode performance for various languages, use
+language prefix/suffix when necessary (`EmojiChar`, `ASCIIChar`). For benchmarks
+that measure `String`'s Unicode performance for various languages, use
 [two letter codes](https://en.wikipedia.org/wiki/ISO_639-1) instead of spelling
 out the whole language names.
 
-*In a pinch*, it's OK to use short C-style naming prefixes like `OCB` (for
-Objective-C Bridging) or consistent abbreviations in the benchmark family like
-`Str` and `Arr`, but *only if* it helps to fit a system with descriptive
-names into 40 characters:
+When necessary, use *consistent* abbreviations like `Str` and `Arr` within the
+benchmark family, to fit a system with descriptive names into 40 characters:
 
 ````
-✅ OCB.NSDict.as!.Dictionary.NSString.NSNum
+✅ Bridging.NSDict.as!.Dict.NSString.NSNum
 ✅ Seq.prefixWhile.AnySeq.UnfoldSeq.lazy
 ````
 

From 419ec8b238364439eda5527bb6e1bcfdf92b3451 Mon Sep 17 00:00:00 2001
From: Pavol Vaskovic <pali@pali.sk>
Date: Thu, 13 Dec 2018 09:49:26 +0100
Subject: [PATCH 5/5] [benchmark] Use for ISO 639-3 language variants

Matched the part of the ISO standard used for language code to match one used on https://www.unicode.org/udhr/translations.html

(I plan to use the Universal Declaration of Human Rights as corpus for string testing.)
---
 benchmark/Naming.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/benchmark/Naming.md b/benchmark/Naming.md
index 736a12e23faf6..a538473be9eef 100644
--- a/benchmark/Naming.md
+++ b/benchmark/Naming.md
@@ -130,8 +130,8 @@ legacy-style).
 Prefer **`Char`** to `Character`, which can be combined with codepage or
 language prefix/suffix when necessary (`EmojiChar`, `ASCIIChar`). For benchmarks
 that measure `String`'s Unicode performance for various languages, use
-[two letter codes](https://en.wikipedia.org/wiki/ISO_639-1) instead of spelling
-out the whole language names.
+the [three letter codes](https://en.wikipedia.org/wiki/ISO_639-3) instead of
+spelling out the whole language names.
 
 When necessary, use *consistent* abbreviations like `Str` and `Arr` within the
 benchmark family, to fit a system with descriptive names into 40 characters: