Locale display patterns importer #283

Open
wants to merge 5 commits into
base: master

aergonaut

This adds support for CLDR's Locale Display Names elements to the gem. It includes:

  1. Importer to extract the locale display patterns from the raw CLDR data
  2. Rake task to run the importer
  3. Integration of the importer into standard_importer_classes so it runs automatically
  4. LocaleDisplayName class implementing the Locale Display Name Algorithm

@camertron This is the locale display name stuff I had mentioned wanting to integrate at the SF Ruby meetup late last year. I finally got around to doing a deep-dive into the source and figuring out how to add this!

Background

At my work, we have a lot of languages available for users to choose from, so we need to display a locale selector showing the names of all these languages, both in the target language itself and in the currently active language. Machine translation isn't reliable for this, and CLDR already contains all the data necessary to derive these display names, so I wanted to leverage officially vetted data as much as possible.

While there are several gems that bundle and re-export CLDR data, I noticed one flaw common to all of them: most use CLDR's <languages> data, but none use <localeDisplayPattern>. The problem is that, while <languages> does contain the translated names of many languages, it usually does not contain translations of regional variants (e.g. fr-CA or pt-BR). This may have gone unnoticed because English does have overrides for regional variants, since English prefers forms like "Brazilian Portuguese", and overrides do appear in <languages>. But many other languages do not use overrides, meaning their regional variants are not represented in <languages> at all.

TwitterCldr::Shared::Languages.from_code_for_locale("pt-BR", "en")
#=> "Brazilian Portuguese"

TwitterCldr::Shared::Languages.from_code_for_locale("pt-BR", "de")
#=> nil

In general, locales with region tags or script tags (zh-Hant-HK) will not have specific translations in <languages>. To generate display names for these locales, the <localeDisplayPattern> must be used.
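As a rough illustration of what using the pattern involves, here is a minimal sketch that fills CLDR's root-level patterns ("{0} ({1})" for the locale pattern, "{0}, {1}" for the separator). The helper names `fill` and `format_locale_name` are hypothetical and are not part of the gem or of this PR.

```ruby
# Root-level CLDR patterns; individual locales may override them.
LOCALE_PATTERN   = "{0} ({1})"
LOCALE_SEPARATOR = "{0}, {1}"

# Hypothetical helper: substitute {0} and {1} in a single pass so that
# placeholder-like text inside the arguments is never re-substituted.
def fill(pattern, first, second)
  pattern.gsub(/\{([01])\}/) { [first, second][Regexp.last_match(1).to_i] }
end

# Hypothetical helper: join the qualifiers (region, script, ...) with the
# separator pattern, then wrap them around the language name.
def format_locale_name(language_name, qualifiers)
  return language_name if qualifiers.empty?

  joined = qualifiers.reduce { |acc, qualifier| fill(LOCALE_SEPARATOR, acc, qualifier) }
  fill(LOCALE_PATTERN, language_name, joined)
end

puts format_locale_name("Portugiesisch", ["Brasilien"])
puts format_locale_name("Chinese", ["Traditional", "Hong Kong SAR China"])
```

The translated names themselves still have to come from the <languages> and <territories> data; the pattern only says how to combine them.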

This PR

I was pleased to find, after reading the gem's source, that CldrLocale already handled inheritance correctly, merging data up to the root data set. This is especially important for <localeDisplayPattern>, as many languages rely on inheritance back to root to derive the correct data.
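To show why inheritance matters here, the following is a simplified sketch of merging a locale's data over root. It is not how CldrLocale is implemented; `deep_merge`, `resolve`, and the hash layout are assumptions, and the child locale's override is invented for illustration.

```ruby
# Hypothetical helper: recursively merge child data over parent data,
# letting the child win wherever both define a leaf value.
def deep_merge(parent, child)
  parent.merge(child) do |_key, parent_value, child_value|
    if parent_value.is_a?(Hash) && child_value.is_a?(Hash)
      deep_merge(parent_value, child_value)
    else
      child_value
    end
  end
end

# Hypothetical helper: resolve a chain ordered from root to most specific.
def resolve(chain)
  chain.reduce { |merged, more_specific| deep_merge(merged, more_specific) }
end

root = {
  locale_display_pattern: {
    locale_pattern: "{0} ({1})",
    locale_separator: "{0}, {1}"
  }
}

# Invented example: a locale that overrides only the separator.
child = { locale_display_pattern: { locale_separator: "{0}; {1}" } }

resolved = resolve([root, child])
puts resolved[:locale_display_pattern][:locale_pattern]
puts resolved[:locale_display_pattern][:locale_separator]
```

A locale that defines nothing of its own still ends up with a usable pattern, because everything it lacks comes from root.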

It was relatively easy to implement LocaleDisplayPatternImporter following the example of TerritoriesImporter. I ran the importer for the supported locales and spot-checked the results before committing them.

I made LocaleDisplayName to implement the algorithm from CLDR. I was also happy to find that Locale has a permutations method to give me all the combinations of present subtags, as this is part of the algorithm to find the longest pre-translated subtag for use as the base name.
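The longest-match step can be sketched roughly as follows, restricted to language and region subtags. This is a simplified illustration, not the code in this PR: the lookup tables, the `display_name` helper, and the hard-coded "{0} ({1})" pattern are all assumptions.

```ruby
# Hypothetical stand-ins for the <languages> and <territories> data.
LANGUAGE_NAMES = {
  "pt"    => "Portuguese",
  "pt_BR" => "Brazilian Portuguese",
  "zh"    => "Chinese"
}.freeze

TERRITORY_NAMES = {
  "BR" => "Brazil",
  "HK" => "Hong Kong SAR China"
}.freeze

# Hypothetical helper: prefer the longest code that has its own translation,
# and append the region in parentheses only when it was not already covered.
def display_name(code)
  language, region = code.tr("-", "_").split("_", 2)

  # Candidates ordered from most to least specific.
  candidates = [[language, region].compact.join("_"), language].uniq
  base_code = candidates.find { |candidate| LANGUAGE_NAMES.key?(candidate) }
  return nil unless base_code

  base = LANGUAGE_NAMES[base_code]
  return base if region.nil? || base_code.include?("_")

  # Fall back to the raw region code if no translation is available.
  territory = TERRITORY_NAMES.fetch(region, region)
  "#{base} (#{territory})"
end

puts display_name("pt-BR")
puts display_name("zh-HK")
```

With an override present ("pt_BR"), the override wins outright; without one ("zh_HK"), the name is assembled from the language and territory names.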

Combining the exported data and the algorithm, I verified that the output is as expected.

TwitterCldr::Shared::LocaleDisplayName.from_code_for_locale("zh-HK", :en)
#=> "Chinese (Hong Kong SAR China)"

TwitterCldr::Shared::LocaleDisplayName.new("pt-BR").display_name
#=> "Brazilian Portuguese"

TwitterCldr::Shared::LocaleDisplayName.new("pt-BR", :de).display_name
#=> "Portugiesisch (Brasilien)"

I gave LocaleDisplayName some class methods that match the API of Languages, for consistency.

Shortcomings

The gem does not currently support script, variant, or T/U subtags, as far as I could tell. These are all part of the full display name algorithm. Since they were not immediately relevant to my use case (displaying regional variants), I omitted them. In the future, I would like to revisit this and add these data points to the gem, so full display names can be generated for any locale.

I wasn't sure if there was anywhere else this should be integrated. Maybe it could be integrated into Locale or LocalizedSymbol? My use case didn't need anything more than this, but I'm open to suggestions.

@CLAassistant

CLAassistant commented Apr 22, 2025

CLA assistant check
All committers have signed the CLA.
