Kotoistus - Finnish exemplarCharacters suggestion 1.

CLDR version 1.2
Locales: fi fi_FI

exemplarCharacters: Two versions exist, the standard exemplarCharacters and the auxiliary exemplarCharacters.
Only lower-case letters need to be defined in exemplarCharacters.
exemplarCharacters defines the characters in use in the Finnish locale. It is used in the following two ways:

Check if some text contains only characters from the Finnish locale.
Check if some encoding (or font) contains all characters needed in the Finnish locale.
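
The two checks above can be sketched in Python. This is a minimal illustration, not part of CLDR: the helper names (parse_exemplars, uses_only, missing_from_encoding) are hypothetical, the parser only understands single characters and simple x-y ranges, and text is lower-cased before checking because only lower-case letters are listed in exemplarCharacters.

```python
def parse_exemplars(pattern):
    """Parse a simple set such as '[ a-z å ä ö š ž ]' into a set of characters.

    Only single characters and 'x-y' ranges are handled; this is not a
    full parser for the set syntax used in CLDR.
    """
    chars = set()
    for token in pattern.strip("[] ").split():
        if len(token) == 3 and token[1] == "-":
            chars.update(chr(c) for c in range(ord(token[0]), ord(token[2]) + 1))
        else:
            chars.update(token)
    return chars


# The standard Finnish set suggested in this document.
FI_MAIN = parse_exemplars("[ a-z å ä ö š ž ]")


def uses_only(text, exemplars):
    """True if every letter of the lower-cased text is in the exemplar set."""
    return all(ch in exemplars for ch in text.lower() if ch.isalpha())


def missing_from_encoding(exemplars, encoding):
    """Return the exemplar characters (either case) the encoding cannot represent."""
    missing = set()
    for ch in exemplars:
        for form in (ch, ch.upper()):
            try:
                form.encode(encoding)
            except UnicodeEncodeError:
                missing.add(form)
    return missing


print(uses_only("Žiguli ja šekki", FI_MAIN))
print(uses_only("mañana", FI_MAIN))
print(sorted(missing_from_encoding(FI_MAIN, "latin-1")))
print(sorted(missing_from_encoding(FI_MAIN, "iso8859-15")))
```

With the suggested set, ISO 8859-1 (latin-1) lacks š and ž in both cases, while ISO 8859-15 covers the whole set.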

Possible other uses for exemplarCharacters:

OCR (define characters we expect to find)
subtitles
header font selection
database indexes. Use exemplarCharacters to convert words to their basic form before indexing.
keyboard layout. A keyboard used in a certain locale should be able to produce characters in exemplarCharacters without difficulty.
applications can presume that characters in exemplarCharacters are available without special arrangements
characters in the auxiliary exemplarCharacters can appear in text, but the user might need help in producing them, e.g. a keyboard tool (or at least instructions) on a web form. A text processing application might provide keyboard shortcuts for these characters.
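
As one possible reading of the database index item above, the sketch below folds words to a basic form: letters in the standard Finnish set are kept, and any other letter is decomposed and stripped of its diacritics. The function name fold_for_index is hypothetical, and treating "basic form" as diacritic removal is an assumption made here, not something the list item specifies.

```python
import unicodedata

# The standard Finnish set suggested in this document.
FI_MAIN = set("abcdefghijklmnopqrstuvwxyzåäöšž")


def fold_for_index(word, exemplars=FI_MAIN):
    """Lower-case a word and strip diacritics from characters outside the set."""
    out = []
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch in exemplars:
            out.append(ch)  # native letter: keep as is, e.g. ä stays ä
            continue
        # Decompose and drop combining marks, e.g. é -> e, ñ -> n.
        base = "".join(
            c for c in unicodedata.normalize("NFD", ch)
            if not unicodedata.combining(c)
        )
        out.append(base)
    return "".join(out)


print(fold_for_index("Citroën"))
print(fold_for_index("Äiti"))
print(fold_for_index("mañana"))
```

Characters that have no canonical decomposition, such as ø, æ and ß, pass through unchanged; a real index would need an explicit mapping for them.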

CLDR data must not contain characters missing from exemplarCharacters. E.g. country names must be written using only characters in exemplarCharacters.
In the character tables below, the right-hand column contains comments.

exemplarCharacters (fi):
Characters needed for grammatically correct Finnish.
Suggestion: [ a-z å ä ö š ž ]

å   (00e5)
ä   (00e4)
ö   (00f6)
š   (0161)  s with caron ("hat s")
ž   (017e)  z with caron ("hat z")

exemplarCharacters auxiliary (fi):
Characters which are used in foreign words in Finnish.
Suggestion: [ a-z å ä ö š ž á à ã é è ë ï õ ô ü æ ø œ č ç ñ ř ß ]

Our criterion for inclusion in the auxiliary set has been whether a character is in widespread use in Finnish newspapers, books, etc.

"Names for characters" contains a list of Finnish character names.

Some characters' collation order in Finnish is different from the Unicode default. If a character's Finnish collation order differs from the Unicode default, it can be given a Finnish collation order, even if it doesn't appear in exemplarCharacters. However, this is problematic (or at least inconsistent), since only characters in common use should collate differently from the default. (See "thorn" for a Finnish exception.)

á   (00e1)  Northern Sami, Inari Sami
à   (00e0)  unit price marker (e.g. kynät à 1,00 €, "pens at 1.00 € each")
ã   (00e3)  Portuguese
é   (00e9)  Finland-Swedish names
è   (00e8)  French
ë   (00eb)  French (e.g. Noël, Citroën)
ï   (00ef)  French
õ   (00f5)  Estonian, Skolt Sami, Portuguese
ô   (00f4)  French
ü   (00fc)  German "y"
æ   (00e6)  Norwegian, Danish
ø   (00f8)  Norwegian, Danish
œ   (0153)  Norwegian, Danish
č   (010d)  Northern Sami, Inari Sami, Skolt Sami
ç   (00e7)  French, Portuguese
ñ   (00f1)  Spanish (e.g. mañana)
ř   (0159)  Czech
ß   (00df)  German double s

exemplarCharacters (fi_FI):
Preliminary version. Should include characters from the other domestic languages (Sami, Romani and Swedish).
Suggestion: [ a-z å ä ö š ž á à ã é è ë ï õ ô ü æ ø œ č ç ñ ř ß ʒ ǯ â đ ǥ ǧ ȟ ǩ ŋ ŧ ń ]

The intention is that, at least in theory, a programmer/system implementer can create exemplarCharacters supersets; e.g. an EU superset could be generated by taking a union of all EU countries' exemplarCharacters values (*_FI, *_SE, *_DE, *_FR, etc.).
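
The superset idea can be sketched as a plain set union. Everything except the Finnish set below is a placeholder invented for illustration: the sv_SE, de_DE and nb_NO values are simplified guesses rather than CLDR data, the fi_FI value is shortened to the standard Finnish set for brevity, and the function name superset is hypothetical.

```python
# Exemplar sets keyed by locale. Only the Finnish letters come from this
# document; the other entries are illustrative placeholders, not CLDR data.
EXEMPLARS = {
    "fi_FI": set("abcdefghijklmnopqrstuvwxyzåäöšž"),
    "sv_SE": set("abcdefghijklmnopqrstuvwxyzåäö"),   # placeholder
    "de_DE": set("abcdefghijklmnopqrstuvwxyzäöüß"),  # placeholder
    "nb_NO": set("abcdefghijklmnopqrstuvwxyzæøå"),   # placeholder
}


def superset(exemplars_by_locale, territories):
    """Union the exemplar sets of every locale whose territory is listed."""
    result = set()
    for locale, chars in exemplars_by_locale.items():
        _, _, territory = locale.partition("_")
        if territory in territories:
            result |= chars
    return result


# A partial "EU" superset built from three member states.
eu_sample = superset(EXEMPLARS, {"FI", "SE", "DE"})
print("".join(sorted(eu_sample)))
```

Selecting by territory code keeps the sketch close to the *_FI, *_SE pattern in the text; locales outside the chosen territories (nb_NO here) do not contribute.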

ʒ   (0292)  Skolt Sami
ǯ   (01ef)  Skolt Sami
â   (00e2)  Inari Sami, Skolt Sami
đ   (0111)  Northern Sami, Inari Sami, Skolt Sami
ǥ   (01e5)  Skolt Sami
ǧ   (01e7)  Skolt Sami
ȟ   (021f)  Romani
ǩ   (01e9)  Skolt Sami
ŋ   (014b)  Northern Sami, Skolt Sami
ŧ   (0167)  Northern Sami
ń   (0144)  Lule Sami. Readers of Sami can be expected to encounter this character often.

The following characters have been considered but not included in exemplarCharacters:

ð   (00f0)  Icelandic (REMOVED from collation)
þ   (00fe)  Icelandic (REMOVED from collation)
í   (00ed)  Icelandic
ű   (0171)  Hungarian
ő   (0151)  Hungarian
ā   (0101)  long a
ō   (014d)  long o
ū   (016b)  long u
