The Rosette Platform

Rosette is an open-source, git-based, modular internationalization framework written in Ruby.

intro

So what is internationalization?

Internationalization, or i18n for short, is the process of preparing a project to support multiple languages and/or markets. Most web and mobile frameworks (e.g. Rails, Django, iOS, Android) have a mechanism for providing translations for a project’s text. In the Rails world, for example, you’ll find a file containing English key/value pairs in config/locales/en.yml. These pairs map static string keys to dynamic English values. Although the values might change (maybe you want to explain something more clearly on your site), the keys rarely do. If you’d like to make your site available in Spanish, you’d create a similar file at config/locales/es.yml. This file contains the same keys as en.yml, but with values written in Spanish instead of English.

en.yml:

hello_intro: Hello, world!

es.yml:

hello_intro: ¡Hola, mundo!
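
To make the lookup concrete, here is a minimal sketch in plain Ruby that uses only the standard library's yaml. It is not how Rails implements its I18n lookups; the LOCALE_FILES and TRANSLATIONS constants and the translate helper are made up for illustration, and the YAML is inlined as strings rather than read from config/locales.

```ruby
require 'yaml'

# Inlined stand-ins for config/locales/en.yml and config/locales/es.yml.
LOCALE_FILES = {
  'en' => "hello_intro: Hello, world!\n",
  'es' => "hello_intro: ¡Hola, mundo!\n"
}.freeze

# Parse each locale's YAML once into a plain Hash of key => value.
TRANSLATIONS = LOCALE_FILES.transform_values { |text| YAML.safe_load(text) }.freeze

# Hypothetical helper: look up a key for a locale, falling back to
# English when the locale (or the key within it) is missing.
def translate(key, locale)
  TRANSLATIONS.fetch(locale, {}).fetch(key) { TRANSLATIONS['en'].fetch(key) }
end

puts translate('hello_intro', 'en')
puts translate('hello_intro', 'es')
```

The same key, hello_intro, selects a different value depending on the locale, which is why the keys can stay fixed while the values change.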

How Can Rosette Help?

Rosette automates the process of translating your project’s content. It’s capable of extracting strings from a number of file formats (including source code), which it then stores in the configured data store. Once phrases have been extracted, it’s time for translation. Rosette doesn’t handle translation by itself; that part is up to you. There are a number of third-party translation companies out there, like Transifex. You’ll need to either use their services or translate the phrases yourself. Check to see if an adapter (or ‘integration’ in Rosette parlance) already exists for your service provider by looking at the list of projects in the rosette-proj GitHub organization. Although Rosette doesn’t provide mechanisms for translating text, it does store translations in its data store and associate them with their phrase counterparts. Once translations have been received, Rosette can export (or ‘serialize’) the phrase/translation pairs to a file you can use in your apps.

How is Rosette Different?

There are a few internationalization platforms out there, but Rosette sets itself apart because of its deep integration with the source control system Git. In fact, Rosette requires that your source code be managed by git. If it isn’t, Rosette won’t work for you.

Git encourages small incremental changes, which are made in units known as “commits”. Each commit has its own unique identifier, called a commit id (often referred to as a SHA). When Rosette extracts phrases, it does so at the commit level. That is to say, new or changed phrases are tagged with the commit id they came from. This means that you can retrieve a list of the phrases and translations at any point in the history of your repository. As a cool side effect, this also means you can differentiate easily between the phrases and translations that pertain to different features, i.e. features in different git branches.
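
A toy model may help show why tagging phrases with commit ids makes history queries possible. This is only a sketch under simplifying assumptions: a linear history, no deleted phrases, and invented names (PhraseRecord, HISTORY, RECORDS, phrases_at). How Rosette actually stores phrases and handles branches and merges is not covered here.

```ruby
# Toy model: each extracted phrase remembers the commit it came from.
PhraseRecord = Struct.new(:key, :text, :commit_id)

# Hypothetical linear history, oldest commit first.
HISTORY = %w[a1b2c3 d4e5f6 0789ab].freeze

RECORDS = [
  PhraseRecord.new('hello_intro', 'Hello, world!', 'a1b2c3'),
  PhraseRecord.new('goodbye', 'Goodbye!', 'd4e5f6'),
  PhraseRecord.new('hello_intro', 'Hello there, world!', '0789ab')
].freeze

# Return key => text as of the given commit: for every key, the most
# recent record introduced at or before that commit wins.
def phrases_at(commit_id)
  position = HISTORY.index(commit_id)
  raise ArgumentError, "unknown commit #{commit_id}" if position.nil?

  visible = RECORDS.select { |record| HISTORY.index(record.commit_id) <= position }
  visible.sort_by { |record| HISTORY.index(record.commit_id) }
         .each_with_object({}) { |record, snapshot| snapshot[record.key] = record.text }
end

p phrases_at('d4e5f6')
p phrases_at('0789ab')
```

Asking for the snapshot at the middle commit returns the original greeting, while the latest commit returns the reworded one.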

Life of a Phrase

  • Engineers make changes to source code and add new phrases.
  • Engineers commit changes to git (git commit).
  • Engineers commit changes to Rosette (git rosette commit). Note: this can be automated.
  • Phrases get translated and stored.
  • Translations get exported from Rosette.
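
The extract, translate, and export stages of this life cycle can be sketched in a few lines of plain Ruby. None of this is Rosette code: extract_phrases, attach_translations, and export_translations are hypothetical helpers, the "data store" is just an array of hashes, and the git steps are left out entirely.

```ruby
require 'yaml'

# 1. Extraction: pull key/value phrases out of the English source file.
def extract_phrases(yaml_text)
  YAML.safe_load(yaml_text).map { |key, text| { key: key, text: text } }
end

# 2. Translation arrives from elsewhere (a vendor, a teammate) and is
#    paired with its phrase by key.
def attach_translations(phrases, translations_by_key)
  phrases.map { |phrase| phrase.merge(translation: translations_by_key[phrase[:key]]) }
end

# 3. Export: write the translated pairs back out as YAML, skipping any
#    phrase that has not been translated yet.
def export_translations(paired)
  translated = paired.reject { |pair| pair[:translation].nil? }
  translated.map { |pair| [pair[:key], pair[:translation]] }.to_h.to_yaml
end

EN_SOURCE = "hello_intro: Hello, world!\nfarewell: See you soon\n"

phrases = extract_phrases(EN_SOURCE)
paired = attach_translations(phrases, 'hello_intro' => '¡Hola, mundo!')
puts export_translations(paired)
```

Only hello_intro appears in the exported YAML, because farewell has no translation yet.
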
components

Projects

Rosette is not one large project, but rather a number of smaller projects that can be configured to work together. Each of the individual projects is distributed as a Ruby gem. There are several top-level projects (rosette-core and rosette-server) as well as a number of smaller components. Here’s what the top-level projects do:

  • rosette-core: Core Rosette classes. All the brains.
  • rosette-server: An API layer you can use (via rosette-client) to ask Rosette questions about itself, process new commits, submit and export translations, etc.

Components

Components are divided up into five categories:

  • Extractors: Responsible for reading and parsing files that contain translatable phrases.
  • Data Stores: Store and retrieve phrases and translations as well as other metadata.
  • Serializers: Responsible for writing phrases and translations to files.
  • Pre-Processors: Transform translations immediately before they get serialized. Can be used for text normalization, etc.
  • Integrations: Plugins that hook into Rosette to provide access to 3rd-party functionality or resources.

All of Rosette’s official projects and components can be found on GitHub under the rosette-proj organization.

Usage

Generally speaking, you won’t use any of the components directly. Instead, you’ll use them via Rosette’s #build_config method. For example, this is how you’d add a repository with an extractor:

require 'rosette/core'
require 'rosette/extractors/yaml-extractor'

config = Rosette.build_config do |config|
  config.add_repo('my_awesome_repo') do |repo_config|
    repo_config.set_path('/path/to/workspace/my_awesome_repo/.git')
    repo_config.set_description('A bowl of awesomeness')

    repo_config.add_extractor('yaml/rails') do |ext_config|
      ext_config.set_conditions do |cond|
        cond.match_path('config/locales/en.yml')
      end
    end
  end
end

You can use this config to stand up an instance of rosette-server, although we haven’t configured it to use a datastore yet. Most of rosette-server’s functionality won’t work without a datastore, so don’t push this to production just yet:

require 'puma' # or whatever webserver you want
require 'logger'
require 'rosette/server'

Rosette.logger = Logger.new(STDOUT)
Rosette::Server::V1.set_configuration(config)

run Rosette::Server::V1

Since rosette-server is a rack application, save all this code in a file called config.ru and run rackup to start the server. If you’re not familiar with Ruby or Rails and/or don’t know what a “rack application” is, you can get up to speed quickly by cloning the rosette-template and following the instructions. The template contains a preconfigured Rosette instance ready to start processing commits.

config

Repos and Extractors

In the previous section, I described how to configure a repository and add an extractor to it. Let’s go over each line in some detail.

First, we add a repository by calling the #add_repo method:

config.add_repo('my_awesome_repo') do |repo_config|

#add_repo yields a repo configuration object and takes a string argument that semantically identifies the repo. This repo name doesn’t really mean anything to Rosette - it’s only used to uniquely identify the repo. That said, it’s a good idea to use the name of your repository here, since rosette-client will use it when making API calls. For example, if your repository is on disk at /path/to/my_awesome_repo, then you should use ‘my_awesome_repo’ as the repo name.
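
If the "yields a configuration object" pattern is unfamiliar, the following stand-alone sketch shows the general idea. ToyConfig, ToyRepoConfig, and build_toy_config are hypothetical stand-ins written for this example; they mimic the shape of the calls above but are not Rosette's classes and say nothing about how Rosette implements them.

```ruby
# Hypothetical stand-ins showing the "yield a config object" pattern.
class ToyRepoConfig
  attr_reader :name, :path, :description

  def initialize(name)
    @name = name
  end

  def set_path(path)
    @path = path
  end

  def set_description(description)
    @description = description
  end
end

class ToyConfig
  attr_reader :repos

  def initialize
    @repos = {}
  end

  # Build a repo config, hand it to the caller's block for setup, then
  # file it under its name so it can be looked up later.
  def add_repo(name)
    repo_config = ToyRepoConfig.new(name)
    yield repo_config if block_given?
    @repos[name] = repo_config
  end
end

# Create a config, yield it for setup, and return it.
def build_toy_config
  ToyConfig.new.tap { |config| yield config }
end

TOY_CONFIG = build_toy_config do |config|
  config.add_repo('my_awesome_repo') do |repo_config|
    repo_config.set_path('/path/to/workspace/my_awesome_repo/.git')
    repo_config.set_description('A bowl of awesomeness')
  end
end

puts TOY_CONFIG.repos.fetch('my_awesome_repo').description
# prints "A bowl of awesomeness"
```

The name passed to add_repo becomes the lookup key, which is why it needs to be unique.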

Next, we tell Rosette where to find the repo on disk and give it a semantic description:

repo_config.set_path('/path/to/workspace/my_awesome_repo/.git')
repo_config.set_description('A bowl of awesomeness')

Note that the path you specify should point to your repo’s .git directory.

Finally, we add an extractor to the repo. As described above, extractors are responsible for reading and parsing files that contain translatable content:

repo_config.add_extractor('yaml/rails') do |ext_config|
  ext_config.set_conditions do |cond|
    cond.match_path('config/locales/en.yml')
  end
end

Just as #add_repo yields a repo configuration object, #add_extractor yields an extractor configuration object. Here we use the #set_conditions method to tell the extractor which files it should read and parse. Any files in the repository that match these conditions will have their phrases extracted and recorded in the configured datastore (for a full list of matchers, see rosette-core’s YARD docs).

You can chain matchers using #and and #or to build sets of complex matching rules. For example, if my Rails app also has an engine named ‘cart’ that has its own translations, I might configure the extractor like this:

repo_config.add_extractor('yaml/rails') do |ext_config|
  ext_config.set_conditions do |cond|
    cond.match_path('config/locales/en.yml').or(
      cond.match_path('cart/config/locales/en.yml')
    )
  end
end
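
To see how chained matchers can compose, here is a small self-contained sketch. ToyMatcher, toy_match_path, and toy_match_extension are invented for this example, and Rosette's real condition classes may be structured quite differently; the point is only that #and and #or each return a new matcher, so rules can be nested as deeply as needed.

```ruby
# Hypothetical matcher: wraps a predicate over a file path and can be
# combined with other matchers.
class ToyMatcher
  def initialize(&predicate)
    @predicate = predicate
  end

  def matches?(path)
    @predicate.call(path)
  end

  # Both matchers must accept the path.
  def and(other)
    ToyMatcher.new { |path| matches?(path) && other.matches?(path) }
  end

  # Either matcher may accept the path.
  def or(other)
    ToyMatcher.new { |path| matches?(path) || other.matches?(path) }
  end
end

# Matches one exact path.
def toy_match_path(expected)
  ToyMatcher.new { |path| path == expected }
end

# Matches any path with the given file extension (illustrative only).
def toy_match_extension(extension)
  ToyMatcher.new { |path| File.extname(path) == extension }
end

CONDITIONS = toy_match_path('config/locales/en.yml').or(
  toy_match_path('cart/config/locales/en.yml')
)

%w[config/locales/en.yml cart/config/locales/en.yml config/locales/es.yml].each do |path|
  puts "#{path}: #{CONDITIONS.matches?(path)}"
end
```

Both English files match, while es.yml does not, so only the English sources would be handed to the extractor.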

Serializers and Pre-Processors

Extractors handle pulling strings out of your repository, but eventually you’ll need to write translations back out to files. That’s what serializers are for. Serializers handle the process of converting the translation entries in Rosette’s datastore to files your application can read and make use of. For a Rails app, for example, we need to be able to serialize Spanish translations into the file at config/locales/es.yml.

Let’s configure a Rails yaml serializer for our repo:

require 'rosette/serializers/yaml-serializer'
repo_config.add_serializer('rails', format: 'yaml/rails')

That’s really all we have to do. Serializers optionally yield a serializer configuration object that we can use to configure pre-processors. Pre-processors modify translation text before it gets serialized and written. For example, I might want to normalize my translations using the Unicode normalization algorithm (in short, a process by which characters are “fixed” so they’re all consistently represented).

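require 'rosette/preprocessors/normalization-preprocessor'
# ser_config is the serializer configuration object yielded by #add_serializer
repo_config.add_serializer('rails', format: 'yaml/rails') do |ser_config|
  ser_config.add_preprocessor('normalization') do |pre_config|
    pre_config.set_normalization_form(:nfc)
  end
end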
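
To show what NFC normalization does to a string, here is a short example using Ruby's built-in String#unicode_normalize. It is not the pre-processor's own code, and normalize_translation is a hypothetical helper; it only demonstrates the effect of the :nfc form on text that looks identical but is encoded differently.

```ruby
# "é" written two ways: one precomposed code point versus a plain "e"
# followed by a combining acute accent.
COMPOSED   = "caf\u00E9"
DECOMPOSED = "cafe\u0301"

# Hypothetical helper mirroring what a normalization step might do.
def normalize_translation(text, form = :nfc)
  text.unicode_normalize(form)
end

puts COMPOSED == DECOMPOSED                         # false: different code points
puts COMPOSED == normalize_translation(DECOMPOSED)  # true: identical after NFC
puts DECOMPOSED.length                              # 5
puts normalize_translation(DECOMPOSED).length       # 4
```

Without normalization, the two spellings would compare as different strings even though they render the same way.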

Datastores and Caches

Up until this point, we haven’t configured a way for Rosette to store phrases and translations. To do so, we’ll go back to the top level and add one using the #use_datastore method. Rosette expects the datastore to be implementation independent - in other words, Rosette doesn’t care how the datastore actually works under the hood. All that’s required is that the datastore conform to a well-defined datastore interface.
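
The idea of an implementation-independent datastore can be illustrated with two toy stores that share the same methods. The two-method interface here (store_phrase and phrases_for) is invented for this sketch; Rosette's real datastore interface is larger and its method names may differ.

```ruby
# Toy store backed by a Hash of commit id => phrases.
class ToyMemoryStore
  def initialize
    @phrases = Hash.new { |hash, commit_id| hash[commit_id] = [] }
  end

  def store_phrase(commit_id, key, text)
    @phrases[commit_id] << { key: key, text: text }
  end

  def phrases_for(commit_id)
    @phrases.fetch(commit_id, [])
  end
end

# Toy store backed by an append-only log of rows.
class ToyLogStore
  def initialize
    @log = []
  end

  def store_phrase(commit_id, key, text)
    @log << [commit_id, key, text]
  end

  def phrases_for(commit_id)
    @log.select { |id, _, _| id == commit_id }
        .map { |_, key, text| { key: key, text: text } }
  end
end

# Caller code relies only on the shared methods, not on the class.
def record_and_count(store)
  store.store_phrase('a1b2c3', 'hello_intro', 'Hello, world!')
  store.phrases_for('a1b2c3').size
end

[ToyMemoryStore.new, ToyLogStore.new].each do |store|
  puts "#{store.class}: #{record_and_count(store)}"
end
```

Because record_and_count never inspects the store's class, either implementation can be swapped in without changing the calling code.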

In our examples here, we’re going to use a datastore backed by ActiveRecord, the data access layer provided by Rails. Support for ActiveRecord comes in the form of the rosette-active-record gem, and we can configure it like this:

require 'rosette/data_stores/active_record_data_store'
config.use_datastore(
  'active-record', {
    host: 'localhost',
    port: 3306,
    username: 'myname',
    password: 'secretsecret',
    adapter: 'jdbcmysql',
    database: 'rosette',
    encoding: 'utf8',
    pool: 16
  }
)

For this example, you’ll also need to add the activerecord-jdbcmysql-adapter gem to your Gemfile. Since Rosette runs on top of JRuby, you’ll always need to use a JDBC-based adapter if you want to use rosette-active-record.

Next, let’s configure a cache. Configuring a cache can help Rosette’s performance. Under the hood, Rosette uses ActiveSupport::Cache, so you’ll need to choose one of the caches it supports. In this example, we’ll use memcached via the dalli gem:

config.use_cache(:dalli_store, namespace: 'whatever', compress: true)

Integrations

Integrations provide a mechanism for adding support for 3rd-party services to your Rosette configuration. For example, you may want to add an integration for the translation management system you’ve selected (like Transifex or Smartling). In our example, we’re going to show how to integrate with Rollbar, an error reporting service. We’d like to report extraction errors as well as anything else that might go wrong to help us track down and fix bugs:

require 'rollbar'

notifier = Rollbar::Notifier.new.tap do |notifier|
  notifier.configure do |c|
    c.access_token = 'secretsecret'
    c.environment = 'production'
    c.logger = Rosette.logger
    c.enabled = true
  end
end

require 'rosette/integrations/rollbar_integration'

config.add_integration('rollbar') do |rollbar_config|
  rollbar_config.set_rollbar_notifier(notifier)
end

Now, any errors caught by Rosette will be sent to Rollbar.

server

rosette-server

Back in the components section, I described briefly how to stand up an instance of rosette-server. The server’s main goal is to give you programmatic access to your Rosette system via an API. Once your server is up and running, you should be able to visit http://localhost:9292/v1/swagger_doc to see a list of all available API endpoints:

{
  "apiVersion": "v1",
  "swaggerVersion": "1.2",
  "produces": ["application/json"],
  "apis": [
    {
      "path": "/extractors.{format}",
      "description": "Information about configured extractors."
    },{
      "path": "/git.{format}",
      "description": "Perform various git-insipired operations on phrases and translations"
    },{
      "path": "/translations.{format}",
      "description": "Perform various operations on translations"
    }
  ]
}

The most interesting parts here are the ‘git’ and ‘translations’ sections. To see more information about git-based operations, visit http://localhost:9292/v1/swagger_doc/git. Here’s a snippet of what you should see:

{
  "apiVersion": "v1",
  "swaggerVersion": "1.2",
  "resourcePath": "/git",
  "produces": ["application/json"],
  "apis": [
    {
      "path": "/v1/git/commit.{format}",
      "operations": [
        {
          "notes": "",
          "summary": "Extract phrases from a commit and store them in the datastore.",
          "nickname": "GET--version-git-commit---format-",
          "method": "GET",
          "parameters": [
            {
              "paramType": "query",
              "name": "repo_name",
              "description": "The name of the repository to examine. Must be configured in the current Rosette config.",
              "type": "string",
              "required": true,
              "allowMultiple": false
            },{
              "paramType": "query",
              "name": "ref",
              "description": "The git ref to commit phrases from. Can be either a git symbolic ref (i.e. branch name) or a git commit id.",
              "type": "string",
              "required": true,
              "allowMultiple": false
            }
          ],
          "type": "void"
        }
      ]
    }
  ],
  "basePath": "http://localhost:9292"
}
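
Based on the path and query parameters listed above, a request to the commit endpoint could be assembled with Ruby's standard library as sketched below. The commit_uri and commit_phrases helpers are hypothetical, and the json format, the master ref, and the local base URL are assumptions for a server running on your own machine. Only the URL is built here; commit_phrases performs a real request and is never called by the example itself.

```ruby
require 'uri'
require 'net/http'

# Build the URL for the commit endpoint shown in the swagger output.
# The base URL and the "json" format are assumptions for a local server.
def commit_uri(repo_name, ref, base: 'http://localhost:9292')
  uri = URI.join(base, '/v1/git/commit.json')
  uri.query = URI.encode_www_form(repo_name: repo_name, ref: ref)
  uri
end

# Performs the request; only call this with rosette-server running.
def commit_phrases(repo_name, ref)
  Net::HTTP.get_response(commit_uri(repo_name, ref))
end

puts commit_uri('my_awesome_repo', 'master')
# prints "http://localhost:9292/v1/git/commit.json?repo_name=my_awesome_repo&ref=master"
```

URI.encode_www_form takes care of escaping, so a ref such as feature/cart is sent as feature%2Fcart.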

rosette-client

Great, your server is working. Time now to interact with it a little. The rosette-client gem will allow you to do just that. This gem is capable of making API calls to a running instance of rosette-server and nicely printing the results. You can either add rosette-client as a dependency of another project and use it to make API calls by hand, or install it with RubyGems and use it via the command line.

The command line version installs an executable named git-rosette that you can use inside git repositories. To set up rosette-client, create a file at ~/.rosette/config.yml (in your home directory). It should have the following contents:

:host: localhost  # assuming you're running rosette-server locally
:port: 9292
:version: v1
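
Note that the keys in this file start with a colon, which Ruby's YAML parser reads as symbols. The sketch below shows one way such a file could be loaded and turned into a base URL. How rosette-client reads its config internally is not shown in this guide, so load_client_config and base_url are hypothetical helpers, the http scheme is an assumption, and the file contents are inlined instead of being read from ~/.rosette/config.yml.

```ruby
require 'yaml'

# Same contents as the ~/.rosette/config.yml example, inlined here.
CLIENT_CONFIG_TEXT = <<~CONFIG
  :host: localhost  # assuming you're running rosette-server locally
  :port: 9292
  :version: v1
CONFIG

# Keys such as ":host" parse as Ruby symbols, which safe_load only
# accepts when Symbol is explicitly permitted.
def load_client_config(text)
  YAML.safe_load(text, permitted_classes: [Symbol])
end

# Assemble the server's base URL from the parsed settings.
def base_url(config)
  "http://#{config.fetch(:host)}:#{config.fetch(:port)}/#{config.fetch(:version)}"
end

puts base_url(load_client_config(CLIENT_CONFIG_TEXT))
# prints "http://localhost:9292/v1"
```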

Next, install rosette-client by executing gem install rosette-client. If you installed JRuby with rbenv, don’t forget to run rbenv rehash to make the git-rosette executable available on your path.

Finally, change directory into your git repository, commit a few nonsense phrases, and run git rosette commit. If everything is configured properly, you should see something like this:

Added: 2
Removed: 0
Modified: 0

Now that you have some phrases committed, you should be able to run git rosette show:

diff --rosette a/config/locales/en.yml b/config/locales/en.yml
+ testing test (my_app.header.foobar)
+ purple eggplants (my_app.testing.megazork)