UTF8 Tools
From Free Pascal wiki
Latest revision as of 17:48, 12 December 2018
About
This code processes Unicode text and determines, for a given Unicode character, whether it is:
- a letter
- a digit
- upper-case or lower-case
- white space
- punctuation
- etc.
It also provides a class for reading and writing Unicode text from/to a TStream.
Units
Using streams
Unit "charencstreams": load and save text in almost any text encoding:
- ANSI, UTF-8, UTF-16, UTF-32
- big-endian, little-endian
- with/without BOM
Demo:
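```pascal
f := TCharEncStream.Create;
try
  f.LoadFromFile(OpenDialog1.FileName);
  Memo1.Text := f.UTF8Text;  // the loaded text, as UTF-8
finally
  f.Free;                    // released even if loading raises an exception
end;
```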
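The encoding variants listed above (UTF-8/16/32, big- or little-endian, with or without BOM) can often be told apart by the first bytes of the data. The following is a minimal sketch of BOM sniffing, not the charencstreams implementation: the enum TDetectedEncoding and the function DetectBOM are hypothetical names used only for illustration. UTF-32 LE is tested before UTF-16 LE because its mark FF FE 00 00 begins with the UTF-16 LE mark FF FE (a UTF-16 LE text whose first character is U+0000 would therefore be misread, which is rare in practice).

```pascal
program BomSniff;

{$mode objfpc}{$H+}

type
  // Hypothetical enum for this sketch; not part of charencstreams.
  TDetectedEncoding = (deNoBOM, deUTF8, deUTF16LE, deUTF16BE, deUTF32LE, deUTF32BE);

// Inspect the leading bytes of a buffer and report which BOM, if any, it starts with.
function DetectBOM(const B: array of Byte): TDetectedEncoding;
var
  N: Integer;
begin
  N := Length(B);
  // UTF-32 first: FF FE 00 00 would otherwise be taken for UTF-16 LE
  if (N >= 4) and (B[0] = $FF) and (B[1] = $FE) and (B[2] = $00) and (B[3] = $00) then
    Result := deUTF32LE
  else if (N >= 4) and (B[0] = $00) and (B[1] = $00) and (B[2] = $FE) and (B[3] = $FF) then
    Result := deUTF32BE
  else if (N >= 3) and (B[0] = $EF) and (B[1] = $BB) and (B[2] = $BF) then
    Result := deUTF8
  else if (N >= 2) and (B[0] = $FF) and (B[1] = $FE) then
    Result := deUTF16LE
  else if (N >= 2) and (B[0] = $FE) and (B[1] = $FF) then
    Result := deUTF16BE
  else
    Result := deNoBOM;  // ANSI, or a UTF variant saved without BOM
end;

begin
  WriteLn(DetectBOM([$EF, $BB, $BF, $41]) = deUTF8);
  WriteLn(DetectBOM([$FF, $FE, $41, $00]) = deUTF16LE);
  WriteLn(DetectBOM([$41, $42]) = deNoBOM);
end.
```

Text without a BOM cannot be classified this way; telling ANSI from BOM-less UTF-8 or UTF-16 needs a heuristic over the content, which this sketch does not attempt.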
Character info
Unit "character": get information about code points using the TCharacter class. Demo:
```pascal
if TCharacter.IsLetter(s[i]) then
  s[i] := TCharacter.toLower(s[i]);
```
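The categories from the About list map onto TCharacter class functions. As a rough, self-contained illustration, the sketch below uses Free Pascal's own RTL unit Character (FPC 3.x), which also declares a TCharacter class with IsLetter, IsDigit, IsUpper, IsWhiteSpace, IsPunctuation and ToLower. That is an assumption of this example: it is not necessarily identical to the "character" unit shipped with UTF8 Tools, and the helper Describe is a hypothetical name. It only handles single UTF-16 code units (characters in the Basic Multilingual Plane).

```pascal
program CharInfo;

{$mode objfpc}{$H+}

// Assumption: FPC's RTL unit "Character", not the utf8tools unit of the same name.
uses
  Character;

// Hypothetical helper: print a few properties of one UTF-16 code unit.
procedure Describe(C: UnicodeChar);
begin
  WriteLn('U+', HexStr(Ord(C), 4),
    ' letter=', TCharacter.IsLetter(C),
    ' digit=', TCharacter.IsDigit(C),
    ' upper=', TCharacter.IsUpper(C),
    ' space=', TCharacter.IsWhiteSpace(C),
    ' punct=', TCharacter.IsPunctuation(C));
end;

begin
  // Code points are given numerically so the source file encoding does not matter.
  Describe(UnicodeChar($00C4));  // LATIN CAPITAL LETTER A WITH DIAERESIS
  Describe(UnicodeChar($0037));  // DIGIT SEVEN
  Describe(UnicodeChar($0020));  // SPACE
  Describe(UnicodeChar($0021));  // EXCLAMATION MARK
  // Lower-case mapping of U+00C4, printed as hex
  WriteLn(HexStr(Ord(TCharacter.ToLower(UnicodeChar($00C4))), 4));
end.
```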
Access UTF-8 by code index
Unit "utf8scanner": access UTF-8 strings by code index, use case statements on UTF-8 strings and more. Demo:
```pascal
s := TUTF8Scanner.Create(Memo1.Text);
try
  for i := 1 to s.Length do             // i counts code points, not bytes
    if TCharacter.IsLetter(s[i]) then   // s[i] is the i-th code point
      s[i] := TCharacter.toLower(s[i]);
  Memo1.Text := s.UTF8String;
finally
  s.Free;
end;
```
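Indexing "by code index" means addressing the i-th code point rather than the i-th byte: in UTF-8 a code point occupies 1 to 4 bytes, and the lead byte says how many. The sketch below shows one way such an index can be mapped to a byte position. It assumes well-formed UTF-8, and the functions SeqLenFromLeadByte, CodePointCount and CodeIndexToByte are hypothetical helpers, not the utf8scanner API, whose internals are not described here. Each lookup scans from the start of the string, so it costs O(n); code that walks the string sequentially can avoid repeating that scan.

```pascal
program Utf8Index;

{$mode objfpc}{$H+}

// Bytes in the UTF-8 sequence starting with lead byte B.
// Assumes well-formed UTF-8; invalid lead bytes are counted as 1 byte.
function SeqLenFromLeadByte(B: Byte): Integer;
begin
  if B < $80 then Result := 1
  else if (B and $E0) = $C0 then Result := 2
  else if (B and $F0) = $E0 then Result := 3
  else if (B and $F8) = $F0 then Result := 4
  else Result := 1;
end;

// Number of code points in S.
function CodePointCount(const S: AnsiString): Integer;
var
  P: Integer;
begin
  Result := 0;
  P := 1;
  while P <= Length(S) do
  begin
    Inc(P, SeqLenFromLeadByte(Ord(S[P])));
    Inc(Result);
  end;
end;

// 1-based byte position of the 1-based code point Index in S, or 0 if out of range.
function CodeIndexToByte(const S: AnsiString; Index: Integer): Integer;
var
  P, Count: Integer;
begin
  P := 1;
  Count := 1;
  while P <= Length(S) do
  begin
    if Count = Index then
      Exit(P);
    Inc(P, SeqLenFromLeadByte(Ord(S[P])));
    Inc(Count);
  end;
  Result := 0;
end;

const
  // 'a', U+00E4 (2 bytes), U+20AC (3 bytes), 'b', written as raw UTF-8 bytes
  Sample = 'a'#$C3#$A4#$E2#$82#$AC'b';

begin
  WriteLn(Length(Sample));            // 7 bytes
  WriteLn(CodePointCount(Sample));    // 4 code points
  WriteLn(CodeIndexToByte(Sample, 3)); // the euro sign starts at byte 4
  WriteLn(CodeIndexToByte(Sample, 4)); // 'b' starts at byte 7
end.
```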
Case demo:
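```pascal
s := TUTF8Scanner.Create(Memo1.Text);
try
  s.FindChars := 'öäü';                 // characters to look for
  repeat
    // the case labels 0, 1, 2 follow the order of the characters in FindChars
    case s.FindIndex(s.Next) of
      {ö} 0: s.Replace('oe');
      {ä} 1: s.Replace('ae');
      {ü} 2: s.Replace('ue');
    end;
  until s.Done;
  Memo1.Text := s.UTF8String;
finally
  s.Free;
end;
```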
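For comparison, the same umlaut replacement can be sketched without the scanner: decode the UTF-8 text to a UnicodeString with the RTL function UTF8Decode, branch on each character's ordinal value in a case statement, and convert back with UTF8Encode. ReplaceUmlauts is a hypothetical helper name. The sketch works on UTF-16 code units, which is sufficient here because ö, ä and ü lie in the Basic Multilingual Plane; it is not a general substitute for code-point-aware scanning.

```pascal
program Umlauts;

{$mode objfpc}{$H+}

// Hypothetical helper: replace ö, ä, ü in UTF-8 text with oe, ae, ue.
function ReplaceUmlauts(const Utf8Text: RawByteString): RawByteString;
var
  W, Res: UnicodeString;
  I: Integer;
begin
  W := UTF8Decode(Utf8Text);     // UTF-8 bytes -> UTF-16 string
  Res := '';
  for I := 1 to Length(W) do
    case Ord(W[I]) of            // compare numeric code units, independent of source encoding
      $00F6: Res := Res + 'oe';  // ö
      $00E4: Res := Res + 'ae';  // ä
      $00FC: Res := Res + 'ue';  // ü
    else
      Res := Res + W[I];         // copy everything else unchanged
    end;
  Result := UTF8Encode(Res);     // UTF-16 string -> UTF-8 bytes
end;

begin
  // 'schön grün' written as raw UTF-8 bytes so the source file encoding does not matter
  WriteLn(ReplaceUmlauts('sch'#$C3#$B6'n gr'#$C3#$BC'n'));
end.
```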