Charset problem. ATutor is not fully UTF-8 compliant


tasmi
Subject: Charset problem. ATutor is not fully UTF-8 compliant
Hi all.

I'm the administrator of an ATutor system which has a lot of courses with Spanish content.

As you know, Spanish has some special characters, such as the accented vowels (á, é, í, ó, ú) and "n with tilde" (ñ).

Since ATutor moved to UTF-8 (version 1.6), I have been seeing these special characters garbled in phpMyAdmin (which connects to the database using UTF-8). For example, "e with acute accent" (é) is displayed in phpMyAdmin as "Ã©". However, these characters are displayed correctly in the browser, using UTF-8 as the page encoding.

I was confused for a long time about the reason for this strange behavior, but I recently investigated it and found the problem.

The problem is that PHP's MySQL extension uses 'latin1' as the default character set when connecting to the database. So, even if the MySQL server is configured to use utf8, PHP's connection uses latin1: the server treats the UTF-8 bytes that ATutor sends as latin1 characters and stores them in the DB that way.

The way to solve it is quite simple: after connecting, tell the server that you want to use utf8 for all transactions. We can do that with mysql_query("SET NAMES 'utf8'", $db_link) for PHP versions older than 5.2.3, or with mysql_set_charset('utf8', $db_link) for PHP >= 5.2.3 (mysql_set_charset also requires MySQL 5.0.7 or later).
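For readers on current PHP, where the mysql_* functions used in this thread no longer exist (the extension was removed in PHP 7.0), the same idea can be sketched with mysqli. This is only an illustration: the function name connect_with_charset and its parameters are hypothetical and not part of ATutor.

```php
<?php
/**
 * Open a connection and switch it to the given character set.
 * Hypothetical helper: the name and parameters are not part of ATutor.
 */
function connect_with_charset(string $host, string $user, string $pass,
                              string $dbName, string $charset = 'utf8'): mysqli
{
    // Make mysqli throw exceptions instead of returning false on errors.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $link = new mysqli($host, $user, $pass, $dbName);

    // Same effect on the server as SET NAMES, but it also informs the
    // client library, so escaping functions know the connection encoding.
    $link->set_charset($charset);

    return $link;
}
```

The charset defaults to 'utf8' to match the thread; a newer server may be better served by a different value.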

Currently, ATutor doesn't use these commands, so the connection uses latin1 as its encoding.
This is not a problem for English, because it has no characters that are encoded differently in UTF-8 and latin1. But in Spanish, French, Russian, etc., these characters are stored in the DB as if they were latin1, even though the table and field have utf8 as their character set.

The consequences: I have a DB with a lot of characters like Ã©, Ã³, Ã±, ...
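To make the mechanism concrete, here is a minimal sketch of the byte-level round trip, assuming PHP's mbstring extension is available. It uses ISO-8859-1 as a stand-in for MySQL's latin1 (which is closer to Windows-1252, so bytes in the 0x80-0x9F range can behave differently), and the variable names are only illustrative.

```php
<?php
// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
$sent = "\xC3\xA9";

// Over a latin1 connection the server reads each byte as one latin1
// character ("Ã" and "©") and converts both to UTF-8 for the utf8 column.
$stored = mb_convert_encoding($sent, 'UTF-8', 'ISO-8859-1');

// Reading back over the same latin1 connection undoes that conversion,
// which is why the browser still shows the text correctly.
$readBack = mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');

echo bin2hex($sent), "\n";     // c3a9
echo bin2hex($stored), "\n";   // c383c2a9
echo bin2hex($readBack), "\n"; // c3a9
```

A tool that connects with utf8, such as phpMyAdmin, sees the stored bytes as they are and therefore shows "Ã©".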

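You can check the encoding ATutor uses for its MySQL connections by including these lines in an ATutor script (for example, in the login page):
// Lists every server variable starting with 'c', which includes
// character_set_* and collation_*. $db is assumed to be ATutor's open MySQL link.
$sql = "SHOW VARIABLES LIKE 'c%'";
$result = mysql_query($sql, $db);
while ($row = mysql_fetch_array($result)) {
    echo $row['Variable_name'] . ': ' . $row['Value'] . "\n";
}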

I ran some tests on a new installation of ATutor 1.6.4, adding the "mysql_set_charset" command to the ATutor MySQL connection file (include/lib/mysql_connect.inc.php), but then I found other problems:
- Many of PHP's string functions are not multibyte-aware, so they mishandle special characters encoded in UTF-8. Functions like strtoupper, strlen, substr, etc. can give wrong results.
- In other functions, like htmlentities or htmlspecialchars, it is necessary to specify the charset of the text.

To manipulate strings with multibyte characters successfully, we need to use PHP's Multibyte String functions (http://es.php.net/manual/en/ref.mbstring.php).

I don't know how much work is needed to solve these problems, but I think this is a very important matter to focus on in the next version of ATutor.
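As a small illustration of the difference, the sketch below compares byte-oriented functions with their mbstring counterparts on a UTF-8 string. It assumes the mbstring extension is loaded; the sample word and variable names are just examples.

```php
<?php
$word = "\xC3\xB1and\xC3\xBA"; // "ñandú" in UTF-8: 5 characters, 7 bytes

// Byte-oriented functions count and cut bytes, not characters.
$byteLength = strlen($word);                   // 7
$firstByte  = substr($word, 0, 1);             // "\xC3", half of "ñ"

// The mbstring equivalents work on characters when told the encoding.
$charLength = mb_strlen($word, 'UTF-8');       // 5
$firstChar  = mb_substr($word, 0, 1, 'UTF-8'); // "ñ"
$upper      = mb_strtoupper($word, 'UTF-8');   // "ÑANDÚ"

// Passing the charset explicitly keeps the escaping functions from guessing.
$escaped = htmlspecialchars('<b>' . $word . '</b>', ENT_QUOTES, 'UTF-8');

echo $byteLength, ' ', $charLength, ' ', $upper, ' ', $escaped, "\n";
```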

Here are some resources with a lot of information about the PHP-MySQL-Charset problems:
www.phpwact.org/php/i18n/charsets
www.phpwact.org/php/i18n/utf-8/mysql
adviesenzo.nl/examples/php_mysql_charset_fix/
marc.info/?l=php-db&m=117026794909888&w=2


My environment:
Operating system ATutor is installed on - Windows, Linux, MacOS
ATutor version - From 1.6 to 1.6.4
Patch #s applied - All
ATutor theme name - Default and our own theme
PHP version - 5.2.6
MySQL version - 5.0.37 and 5.0.51a
Webserver & version - Apache 2.2
Encoding of PHP files: UTF-8
Posted: 2010-01-11 12:27:15
harris
Subject: Re: Charset problem. ATutor is not fully UTF-8 compliant
Hi,

Thank you for the detailed report. This is confirmed and we are looking into it. We have also added this to our bug tracker.

I don't think simply adding "SET NAMES 'UTF8'" will be enough, however: the current content will be encoded/decoded differently, and data will be lost on the client's end. We will issue a patch once we have a solution for it.

Thank you again for spending the time on this detailed report, we appreciate it.


Regards,
Harris
Posted: 2010-01-11 14:35:46
tasmi
Subject: Re: Charset problem. ATutor is not fully UTF-8 compliant
> I don't think it will be a simple fix by just adding the "SET NAMES 'UTF8'" however, the current content will be encoded/decoded differently and data will be lost on the client's end. We will issue a patch once we have a solution for it.

Sure, for existing installations of ATutor whose content is incorrectly encoded, this is a very big problem, because they have to "translate" all the content :(

In theory, MySQL can convert the content automatically if we change the type of the fields to binary and then back to text with the correct encoding, but this method doesn't work in this case because the content has invalid characters.

I've been working on a script to automatically convert the text fields (char, varchar, text, etc.) of the whole database. I've tested it on my database and it works quite well for me. If you want, I can send it to you.

Regards.
Posted: 2010-01-12 03:44:53
harris
Subject: Re: Charset problem. ATutor is not fully UTF-8 compliant
That would be great! You can send it to my email or just attach it here. I have a system with multiple languages on it to run a test on. Thanks!
Posted: 2010-01-12 09:54:20
tasmi
Subject: Re: Charset problem. ATutor is not fully UTF-8 compliant
I had to do some "tricks" to run the script successfully on my DB, because some special characters (such as Ñ) were badly encoded twice! So I had to run the script more than once on some tables (specifically at_messages and at_messages_sent).

I'm going through the script to "generalize" it, but I'm having some problems with PHP and its "strange behavior with encodings" :(

I hope to send it to you soon.
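The "more than one pass" idea can be sketched as follows. This is not the script attached to this thread, only a hypothetical helper (repair_mojibake is an invented name) that assumes mbstring is available and uses ISO-8859-1 as an approximation of MySQL's latin1. It peels off one layer of mis-encoding at a time and stops as soon as a step would lose data or produce invalid UTF-8.

```php
<?php
/**
 * Undo repeated latin1-as-UTF-8 mis-encoding of a UTF-8 string.
 * Hypothetical helper, not the script attached to this thread.
 */
function repair_mojibake(string $text, int $maxPasses = 3): string
{
    for ($i = 0; $i < $maxPasses; $i++) {
        // Reverse one layer: map each character back to a single byte.
        $candidate = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');

        // Stop when nothing changed, when the bytes are not valid UTF-8,
        // or when the step was lossy (characters outside latin1).
        if ($candidate === $text
            || !mb_check_encoding($candidate, 'UTF-8')
            || mb_convert_encoding($candidate, 'UTF-8', 'ISO-8859-1') !== $text) {
            break;
        }
        $text = $candidate;
    }
    return $text;
}

$good  = "\xC3\x91"; // "Ñ" in UTF-8
$once  = mb_convert_encoding($good, 'UTF-8', 'ISO-8859-1'); // encoded badly once
$twice = mb_convert_encoding($once, 'UTF-8', 'ISO-8859-1'); // encoded badly twice

echo bin2hex(repair_mojibake($twice)), "\n"; // c391
```

A heuristic like this can also "repair" text that legitimately contains sequences such as "Ã©", so it should be tried on a copy of the data first.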

Regards.
Posted: 2010-01-13 07:55:27
tasmi
convert_data_charset.zip
Subject: Re: Charset problem. ATutor is not fully UTF-8 compliant
Hi harris,

Here is the script. I tested it several times yesterday and I think it's ready.

Regards.
Posted: 2010-01-15 02:46:45
harris
Subject: Re: Charset problem. ATutor is not fully UTF-8 compliant
Thanks! I will run it on my test system and let you know the results asap.
Posted: 2010-01-15 10:00:48
jjcaicedob
Subject: Re: Charset problem. ATutor is not fully UTF-8 compliant
Can you help me with the script? Please send it to me at webmaster@educadorvirtual.org

In reply to:
Hi harris,

Here is the script. I've tested it some times yesterday and I think it's ready.

Regards.

Posted: 2010-06-15 09:39:13
tasmi
Subject: Re: Charset problem. ATutor is not fully UTF-8 compliant
Hi jjcaicedob,

You can download the script I attached some months ago in a previous message.
The download link is:
www.atutor.ca/forums/dl_attachment.php?pid=19428;f=convert_data_charset.zip;m=e61caada4030ed74b563f4628265d42c
Posted: 2010-06-15 10:19:03
jjcaicedob
Subject: Re: Charset problem. ATutor is not fully UTF-8 compliant
Sorry, I can't download the ZIP file. I see a blank page when I click the link, and I don't see a link in the previous message.

Thanks,

In reply to:
Hi jjcaicedob,

You can download the script I attached some months ago in a previous message.
The download link is:
www.atutor.ca/forums/dl_attachment.php?pid=19428;f=convert_data_charse...

Posted: 2010-06-17 17:47:20
tasmi
Subject: Re: Charset problem. ATutor is not fully UTF-8 compliant
I've just sent you the script.

Regards.
Posted: 2010-06-18 02:20:58
 