Multi-language and language associations
J4 essay - Why should Joomla! do Routing and Record's Language Association for you
Hugo Azevedo 782

J4 essay - Why should Joomla! do Routing and Record's Language Association for you

Note: This document is part of a set of documents which constitute an essay about what Joomla! is, or can be in the future. These documents do not constitute the current state of Joomla!'s development, nor are they just rants or brain dumps. They list a set of structured ideas about what Joomla! can be, and provide some ideas on how to get there. Their main purpose is to generate discussion over Joomla!'s community.

If you have developed content generating extensions in Joomla!, you'll know that there's a lot of repeated code that goes into your new extension. Routing and Language Associations are examples of this repeated code.

Routing

Routing plays a big part in MVC Components, on an application that uses SEF URLs. It's not needed if you're only using query strings in your URLs.

It can take many shapes and forms, but its main purpose is to convert the SEF's URL segments, and map them to a Controller's Method in a Component (ex:. /component/controller/method/record). One of its biggest strengths, is that it can route a SEF URL to a record's request, without needing to connect to a database. It does this by identifying in the URL's segments what the user wants, and forwarding that request to a method that will deliver the intended result.

In Joomla!, this happens a bit differently. Joomla! uses a record's stub, which is a very unaesthetic solution that joins a record's id to its alias, in order to identify the record in a SEF URL. Although not pretty, it's a very clever and efficient solution, in order to display the record's alias and still retain the record's id, that can easily be retrieved by type casting the URL segment to an integer.

Implementing routing in your Components, is not a peaceful task either. Joomla! lets you develop your own router for your Component, and you can implement it with relative freedom, but if you implement it any differently than the core extensions do, you'll end up having different looking URLs in your website.

Components shouldn't really have to deal with routing in Joomla!. That's Joomla!'s job! If a route is configured on the system (and it should be configurable), all extensions should be able to use it, without having to implement it themselves, which frees Joomla! to implement any king of routes. If Joomla! implements a default route, then all extensions should look the same. Some examples of routes could be:

In these examples, the only thing Joomla! should need, in an unique record alias in the last segment, and from there, it should be able to find even the Controller's Method to execute. This means, that record aliases should also be centralized.

But in order to achieve this, maybe Joomla! should use record mapping, instead of individual routers.

Because record aliases are dependent on languages, a possible implementation for record mapping will be described below.

Language associations

By default, Joomla! is a multi-language CMS, which is a good thing. Even if your website only has (initially) one language, it's still a multi-language website. Not because it will allow multiple simultaneous languages, but because it will allow you to set a different default language from a multiple selection of languages, and because it will allow you to translate any field/output it produces.

One of the biggest problems I have with this approach is, why should I have to develop the same code every single time in order to achieve this, and therefore, have to maintain it as well in case anything on Joomla!'s side changes. Another thing is, what exactly constitutes a record's translation in Joomla!. I should probably start with this last one.

What is a Record?

So, what is a record? In Joomla!, and considering the way Joomla! deals with multi-language translations, a record isn't really a record. A record is actually a set of independent records associated by the user. This approach creates a set of problems that proper record translation wouldn't have. Let me be more specific.

When you create an Article in Joomla!, you choose a primary language for that Article. If you need a translation for that Article, you'll need to create a second independent article, and associate the two in the end. Now, the problem is, although they talk about the same thing in different languages, they are independent records, and not translations.

Joomla! has a set of predefined database fields that hold information about a record:

Now, just think of this little example. Imagine you have a couple hundred Articles (/Products) created, in 5 or 6 languages, and want to know which Article is the most read. How would you do it?

Problems this approach creates

From the UX point of view, there's a problem (which you can test) that quickly stands out. When you create an Article for the first time, and even though you specify a language for it, Joomla! still doesn't know what language that Article is in, until you save it. This means that you'll need to save it first, just to get the Associations tab to filter the Article's language out, because initially, it will give you the impression that you can associate that Article with another one for the same language, which is misleading.

Another UX problem that comes from this approach, is that you'll need to use the filter (if available) in order to list all "records" from a specific language, and make the list shorter. If you don't, you'll see "repeated" Articles in the list, making the list much larger. Although a bit annoying, in the list View this doesn't constitute a big problem, you'll just need to change the language filter, which is only one more user interaction. But lets think about being already inside a specific translation of an Article/Product. There's no easy wait to jump straight to another language, without going back to the list View, then change the search filters, look for the right "independent" Article, and get back inside in the edit View. Now think about this previous example, where you have more than one language, and a couple hundred Articles...

If you still don't see a problem here, think about managing Categories on top of Articles/Products/etc. Think about a complex and extensive Category tree found in an e-commerce website, and try to set publishing dates to Categories and/or Products, or setting specific user permissions, or even setting them as featured...

From the data point of view, this also presents a number of problems. What is exactly a record? If you want to know when was a specific content first created, or how many hits it as, or who created it first, how do you know? The truth is, whatever your approach might be, it's not going to be very straight forward. You'll always have to use a work-around solution.

Another problem regarding data, is that the records (associations) aren't really connected on the database. Their association's data is stored in the database, but the association itself, is done via the application. This will always be slower.

Changing language associations to record translations

To achieve this, Joomla! needs to treat a record as a (single) record. That record has common properties that propagate to any translation it might have. It has an ID, it belongs to a Category, it has a Create and last Modified date, it has a Creator and a user that last modified it, and it's either Checked-in or not. And all of this shouldn't be dealt with by any single extension. It should be handled by Joomla! itself.

Here's a possible structure for a record (content) table:

CREATE TABLE `record` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Record ID',
  `catid` int(11) NOT NULL COMMENT 'Category ID',
  `created` datetime NOT NULL COMMENT 'Date the record was created',
  `created_by` int(11) NOT NULL DEFAULT '0' COMMENT 'Creators user ID',
  `modified` datetime NOT NULL COMMENT 'Date the record was last modified',
  `modified_by` int(11) NOT NULL DEFAULT '0' COMMENT 'Last modification user ID',
  `checked_out` int(11) DEFAULT NULL COMMENT 'The ID of the user that checked out the record',
  `checked_out_time` datetime DEFAULT NULL COMMENT 'The date the record was checked out',
  `state` int(11) NOT NULL DEFAULT '1' COMMENT 'The Record state',
  `publish_up` datetime DEFAULT NULL COMMENT 'Publication statring date',
  `publish_down` datetime DEFAULT NULL COMMENT 'Publication ending date',
  `ordering` int(11) NOT NULL DEFAULT '1' COMMENT 'The record order in a list',
  `hits` int(11) NOT NULL DEFAULT '0' COMMENT 'The record hit counter',
  `featured` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'If the record is a featured record',
  `extension` varchar(45) NOT NULL DEFAULT 'com_content' COMMENT 'The extension that will process the record',
  `task` varchar(45) NOT NULL DEFAULT 'controller.task' COMMENT 'The task that will execute to display this record',
  PRIMARY KEY (`id`) COMMENT 'Table primary key',
  CONSTRAINT `record_category`
    FOREIGN KEY (`catid`) REFERENCES `categories` (`id`)
    ON DELETE CASCADE
    ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

As you can see, the record's table doesn't even have a Title and/or Alias, but then again, why should it!!! Titles and Aliases are language specific, and in this example, a record doesn't need to know, or care to know about any Languages. It should only care about the content identification.

So, how would Joomla! distinguish between different Extension's content, if they all use the same content table? Well, that's were the last two columns get into play. The last two columns are 'extension' and 'task'. If you previously read the post about J4 essay - Why Joomla! needs both a Framework and a Platform, and the one about J4 essay - How could Joomla! change and improve its MVC pattern implementation, then you'll start to see the bigger picture.

A record will have a Controller's Method associated with it, that will handle what is displayed. And a Controller belongs to an Extension. The user would set the appropriate method to use, from a predefined list of methods, in the same way the user sets a Category for the record.

For Translations, you would have a secondary table that would handle them. The structure could be something like this:

CREATE TABLE `translations` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Translation ID',
  `recid` int(11) NOT NULL COMMENT 'The Record id',
  `language` varchar(10) NOT NULL COMMENT 'The language code for the Translation',
  `state` int(11) NOT NULL DEFAULT '0' COMMENT 'The Translation state',
  `title` varchar(250) NOT NULL COMMENT 'The title for the Translation',
  `alias` varchar(250) NOT NULL COMMENT 'The alias for the Translation',
  PRIMARY KEY (`id`),
  UNIQUE KEY `translation_unique_lang` (`recid`,`language`) COMMENT 'Single translation per record',
  UNIQUE KEY `translation_unique_alias` (`language`,`alias`) COMMENT 'Single alias per language',
  CONSTRAINT `translation_record`
    FOREIGN KEY (`recid`) REFERENCES `record` (`id`)
    ON DELETE CASCADE
    ON UPDATE CASCADE,
  CONSTRAINT `translation_language`
    FOREIGN KEY (`language`) REFERENCES `languages` (`language`)
    ON DELETE CASCADE
    ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

As you can see, this secondary table needs a reference to the record's table. Every record for a translation has to be attached to a single record. It doesn't make any sense to translate an nonexistent record. This would be enforced by a FOREIGN KEY to the record's table.

As a protection, and a way to speed up SELECT queries, a second UNIQUE INDEX would be created to include the record's ID and the Language, locking every distinct combination of the two. The system shouldn't have two translations for the same record, in the same language.

If information about translation is necessary, some of the columns defined in the record's table could also exist in the translation's table. Examples of this are the 'hits', or 'created' and 'modified' columns, that could also exist in the Translations table, just like 'state' also does.

So, how would a developer set a table for his/her extension?

The same way that a developer now adds a table with a 'catid' column, he/she will could now just add a column called 'transid'. But the truth is, the developer doesn't even need this 'extra' column, because just the ID could do the trick.

Think of an Extension for product management, that will hold 3 product properties. A price, a description and a photo. Today, that Extension's table could look something like this:

CREATE TABLE `joomla`.`product` (
    `id` INT NOT NULL AUTO_INCREMENT COMMENT 'Product ID',
    `catid` INT NOT NULL COMMENT 'Category ID',
    `created` DATETIME NOT NULL COMMENT 'Date the product was created',
    `created_by` INT NOT NULL COMMENT 'Creator ID',
    `modified` DATETIME NOT NULL COMMENT 'Date the product was last modified',
    `modified_by` INT NOT NULL COMMENT 'Last modification user ID',
    `checked_out` INT NULL DEFAULT NULL COMMENT 'The ID of the user that checked out the product',
    `checked_out_time` DATETIME NULL DEFAULT NULL COMMENT 'The date the product was checked out',
    `state` INT NOT NULL COMMENT 'The products state',
    `publish_up` DATETIME NULL DEFAULT NULL COMMENT 'Publication starting date',
    `publish_down` DATETIME NULL DEFAULT NULL COMMENT 'Publication ending date',
    `ordering` INT NOT NULL DEFAULT 1 COMMENT 'The product ordering',
    `hits` INT NOT NULL DEFAULT 0 COMMENT 'The product hit counter',
    `featured` TINYINT NOT NULL DEFAULT 0 COMMENT 'If the product is a featured product',
    `price` DOUBLE NOT NULL DEFAULT 0 COMMENT 'Products price',
    `description` TEXT NOT NULL COMMENT 'Products description',
    `image` VARCHAR(45) NOT NULL COMMENT 'Products image',
PRIMARY KEY (`id`)  COMMENT 'Tables primary key');

If the developer didn't follow this data table's structure, he/she wouldn't be unlocking all of Joomla!'s available functionality.

In the new structure, the developer would only have to create the following table:

CREATE TABLE `joomla`.`product` (
    `id` INT NOT NULL COMMENT 'Translation ID',
    `price` DOUBLE NOT NULL DEFAULT 0 COMMENT 'Product price',
    `description` TEXT NOT NULL COMMENT 'Product description',
    `image` VARCHAR(45) NOT NULL COMMENT 'Product image',
    KEY `product_index` (`id`),
    CONSTRAINT `product_translation`
        FOREIGN KEY (`id`) REFERENCES `translations` (`id`)
        ON DELETE CASCADE
        ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Because all other fields are in Joomla!'s core data tables, the developer wouldn't need to worry about managing them. Not only would it be more maintainable for the developer, but could also represent more freedom for Joomla! to develop the database's schemas, without any comparability issues.

As you can see, this last table doesn't really have a PRIMARY KEY set, only a UNIQUE INDEX and a FOREIGN KEY CONSTRAINT. The Product's ID would reference the corresponding Translation ID, and in this sense, the Extension's table would only provide extra fields that the core data tables wouldn't have.

This would still allow the developer to have any kind of structure in his/her tables. In fact, instead of having to replicate the same structure every single time he/she creates a new table, they would only need to enforce the UNIQUE INDEX and the FOREIGN KEY CONSTRAINT on one single column.

This would also facilitate Joomla!'s job in trying to understand what the client is requesting. The following query string '?option=ecommerce&view=product&layout=edit&format=html&&lang=en-US&id=4' would just become '?lang=en-US&id=4', because all other information is already set in the record's table. If other functionality would be necessary, the task variable would be used. The supplied ID is the Record's ID, not the Product's ID. The Language will give the translation, and from there, you'll get to the Product.

Benchmarking the solution

First, I created a Categories data table with only two columns (id, name), in order to have a FK in the Record's table. I inserted 1K (1000) categories in the table, which seams to be a very fair sized website.

Then I populated the Record's table, with groups of 1K randomly ordered records (SELECT INTO) until it had 1M records, which seamed to me that it would be a very respectable sized website. Keep in mind that, the Record's data table is supposed to be holding every single record in the system, independent of which extension a record might belong to. I ended up with a 110MB data table on disk.

After the Record's data table was created, I needed to create the Translations data table, but before that, I need a Languages data table so I could have a FK on the Translations. I added 6 distinct languages to it, and then I used them to insert randomly all the records in the Record's data table linked with each language, into the Translations data table. This gave me a 6M unordered records data table with 1GB.

I've finished by creating a Product's data table, and populated it with 100k randomly selected records (from Translations). From a system with 1M records, in 6 different languages (6M records), I assumed 100K were related to Products.

All the data tables were using MySQL's INNODB engine to make the tables ACID compliant, which makes them slower.

I also created a MyISAM table, with no FK's, with the current Joomla!'s structure that will hold the same information, and populated it with the exact same records. Here are the results:

Command executed. 1st MyISAM Min. MyISAM 1st INNODB Min. INNODB
SELECT command with simple paging (10 results, no index) 0,041s 0,012s 0,211s 0,134s
SELECT command filtering by date 0,00080s 0,00025s 0,0089s 0,0015s
SELECT command filtering by alias (single record) 0,025s 0,025s 0,142s 0,106s
UPDATE command (change product's price) 0,00019s 0,00019s 0,0064s 0,0064s
UPDATE command (change product's title) 0,00035s 0,00020s 0,0064s 0,0057s
UPDATE command (change product's modified date) 0,00055s 0,00020s 0,0064s 0,0055s
INSERT new product 0,00013s 0,00011s 0,0088s 0,0053s

As expected, MyISAM holds the best results, but the point of this exercise was, to show that a solution that distributes any Record information across 3 tables, using an ACID compliant engine, is viable. Running core Joomla!'s tables in INNODB, can make the system far more reliable.

Benefits for this approach

There are a several benefits for this approach. To begin with, any content on a Joomla! system, would have a single, and unique ID, independently of which Extension the content might belong to.

If Record and Translations tables are part of Joomla!'s core, then Joomla! can handle their data for the developer. Models would deal with these two tables for the developer. The developer would only have to deal with Extension's specific data table fields.

Extension's routing wouldn't be needed. Routing would become centralized as well, and therefore, would be uniform across all the website, independently of how many Extensions the User/Implementor has. Only one central Router is needed, and it can take any shape or form.

Centralized routing would also make the infamous ItemID variable obsolete, in and out of the query string. Joomla! wouldn't be so dependent on Menu Items for routing and/or SEF URLs. The Translation's data table enforces the uniqueness of a record's alias, and by linking back to the Record's data table, Joomla! can know which extension and method it should invoke. No more URLs like /component/extension/...

Module loading, wouldn't need any Menu Item either. Modules would be loaded according to the loaded extension (context), which is what makes sense.

Language Associations would cease to exist completely. Record translations would be automatically linked together, due to the fact that they would point to the same Record. This means that the Associations Tab in the UI would disappear, and Extension developers wouldn't have to worry about replicated extra code in their classes.

From the UX point of view, there are also a few benefits. On a list View (ex:. articles) the user would only see the records in the language he/she set to the UI, or the default language if the first one wasn't found. The truth is, the user wouldn't have any need to filter the list by language. There wouldn't be a purpose for it.

By editing a record, and selecting a new language in the language drop down list, the user would just need to input the information, and Joomla! would create/edit the correspondent records and translations. The records (ex:. Articles/Products/...) would be automatic linked together, because they would just act as translations to a Record in the parent table.

And at last, Joomla!'s query string would become a lot leaner. It would only need the Language variable (can come from session), and an ID. From there, Joomla! could load the correct Controller->Method combination, that would return View the developer intended, in the correct format, partial or full, etc...

SmarterQueue - The smartest way to do social media.

SmarterQueue - The smartest way to do social media.

LRTT - Limited Resource Teacher Training

LRTT - Limited Resource Teacher Training

shareOptic - Cyber security monitoring solution

shareOptic - Cyber security monitoring solution

RAD Php Blockz web framework

RAD Php Blockz web framework