UTF8 Tools


Latest revision as of 16:48, 12 December 2018


About

This code lets you process Unicode text and determine, for each Unicode character:

  • whether it is a letter
  • whether it is a digit
  • whether it is upper-case or lower-case
  • whether it is white space
  • whether it is punctuation
  • etc.

It also includes a class for reading and writing Unicode text from and to a TStream.

Units

Using streams

Unit "charencstreams": loads and saves data from and to almost any text source:

  • ANSI, UTF-8, UTF-16, UTF-32
  • big-endian or little-endian
  • with or without BOM

Demo:

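 f := TCharEncStream.Create;
 try
   // The source encoding is detected on load; UTF8Text returns the content as UTF-8.
   f.LoadFromFile(OpenDialog1.FileName);
   Memo1.Text := f.UTF8Text;
 finally
   f.Free;
 end;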
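
To illustrate what the "with or without BOM" and endianness variants involve, here is a minimal sketch of byte order mark detection. It is not taken from charencstreams.pas; the names TBomKind and DetectBom are hypothetical, and the unit may well detect encodings differently (for example by also inspecting the content when no BOM is present).

```pascal
program BomSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical names for this sketch; not part of charencstreams.pas.
  TBomKind = (bkNone, bkUTF8, bkUTF16LE, bkUTF16BE, bkUTF32LE, bkUTF32BE);

// Report which byte order mark, if any, the buffer starts with.
function DetectBom(const Data: array of Byte): TBomKind;
var
  n: Integer;
begin
  n := Length(Data);
  Result := bkNone;
  // UTF-32 LE must be tested before UTF-16 LE: both begin with FF FE.
  if (n >= 4) and (Data[0] = $FF) and (Data[1] = $FE)
     and (Data[2] = 0) and (Data[3] = 0) then
    Result := bkUTF32LE
  else if (n >= 4) and (Data[0] = 0) and (Data[1] = 0)
     and (Data[2] = $FE) and (Data[3] = $FF) then
    Result := bkUTF32BE
  else if (n >= 3) and (Data[0] = $EF) and (Data[1] = $BB) and (Data[2] = $BF) then
    Result := bkUTF8
  else if (n >= 2) and (Data[0] = $FF) and (Data[1] = $FE) then
    Result := bkUTF16LE
  else if (n >= 2) and (Data[0] = $FE) and (Data[1] = $FF) then
    Result := bkUTF16BE;
end;

begin
  WriteLn(DetectBom([$EF, $BB, $BF, $41]));  // UTF-8 BOM followed by 'A'
  WriteLn(DetectBom([$FF, $FE, $41, $00]));  // UTF-16 LE BOM followed by 'A'
  WriteLn(DetectBom([$41, $42]));            // plain bytes, no BOM
end.
```

A BOM check alone is ambiguous in one corner case: UTF-16 LE text whose first character after the BOM is U+0000 looks like a UTF-32 LE BOM. Text without any BOM needs a different strategy altogether, which this sketch does not attempt.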

Character info

Unit "character": get information about code points using the TCharacter class. Demo:

 if TCharacter.IsLetter(s[i]) then
    s[i] := TCharacter.toLower(s[i]);
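
The property checks listed in the About section can be tried in a small console program. The sketch below is an assumption-laden illustration: it uses the Character unit that ships with the Free Pascal RTL, not the character.pas from utf8tools, and only IsLetter and toLower are confirmed by this page for the utf8tools class. The other method names (IsDigit, IsUpper, IsWhiteSpace, IsPunctuation) are those of the RTL unit and are assumed, not verified, to match.

```pascal
program CharInfoSketch;
{$mode objfpc}{$H+}

uses
  Character; // assumption: the FPC RTL unit; utf8tools ships its own character.pas

// Print a few Unicode properties of one UTF-16 code unit.
procedure Describe(c: UnicodeChar);
begin
  WriteLn('U+', HexStr(LongInt(Ord(c)), 4),
    ' letter=', TCharacter.IsLetter(c),
    ' digit=', TCharacter.IsDigit(c),
    ' upper=', TCharacter.IsUpper(c),
    ' space=', TCharacter.IsWhiteSpace(c),
    ' punct=', TCharacter.IsPunctuation(c));
end;

begin
  // Code points are given numerically so the source file encoding does not matter.
  Describe(UnicodeChar($00C4)); // LATIN CAPITAL LETTER A WITH DIAERESIS
  Describe(UnicodeChar($0663)); // ARABIC-INDIC DIGIT THREE
  Describe(UnicodeChar($00BF)); // INVERTED QUESTION MARK
end.
```

If both the RTL unit and the utf8tools unit are on the search path, the compiler picks whichever "character" it finds first, so a real project should keep only one of them visible.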


Access UTF-8 by code index

Unit "utf8scanner": access UTF-8 strings by code index, use case statements on UTF-8 strings and more. Demo:

 s := TUTF8Scanner.Create(Memo1.Text);
 try
   // s[i] addresses the i-th code point, not the i-th byte.
   for i := 1 to s.Length do
     if TCharacter.IsLetter(s[i]) then
       s[i] := TCharacter.toLower(s[i]);
   Memo1.Text := s.UTF8String;
 finally
   s.Free;
 end;
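
Access by code index matters because a UTF-8 string's byte index and code point index diverge as soon as a non-ASCII character appears. The following is a minimal sketch of that difference, not the TUTF8Scanner implementation; CodePointCount and CodePointOffset are hypothetical helpers, and the input is assumed to be valid UTF-8.

```pascal
program CodeIndexSketch;
{$mode objfpc}{$H+}

// Count code points: every byte that is not a continuation byte (10xxxxxx)
// starts a new code point.
function CodePointCount(const s: RawByteString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then
      Inc(Result);
end;

// Return the 1-based byte offset at which the Index-th code point starts,
// or 0 if Index is out of range.
function CodePointOffset(const s: RawByteString; Index: Integer): Integer;
var
  i, seen: Integer;
begin
  Result := 0;
  seen := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then
    begin
      Inc(seen);
      if seen = Index then
        Exit(i);
    end;
end;

var
  s: RawByteString;
begin
  s := 'a'#$C3#$A4'b';             // "a", U+00E4, "b" as explicit UTF-8 bytes
  WriteLn(Length(s));              // length in bytes
  WriteLn(CodePointCount(s));      // length in code points
  WriteLn(CodePointOffset(s, 3));  // byte offset of the third code point
end.
```

The sample string holds three code points in four bytes, so the third code point starts at byte 4 rather than byte 3. Both helpers scan linearly from the start of the string; a class meant for repeated indexed access would presumably cache or precompute positions instead.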

Case demo:

  s := TUTF8Scanner.Create(Memo1.Text);
  try
    // FindIndex returns the position of the current character within FindChars.
    s.FindChars := 'öäü';
    repeat
      case s.FindIndex(s.Next) of
    {ö} 0: s.Replace('oe');
    {ä} 1: s.Replace('ae');
    {ü} 2: s.Replace('ue');
      end;
    until s.Done;
    Memo1.Text := s.UTF8String;
  finally
    s.Free;
  end;

Download

Download utf8tools.zip: http://www.theo.ch/lazarus/utf8tools.zip