UTF8 Tools
From Free Pascal wiki
About
This code processes Unicode text and determines, for a given Unicode character, whether it is:
- a letter
- a digit
- upper-case or lower-case
- white space
- punctuation
- etc.
It also provides a class for reading and writing Unicode text from and to a TStream.
Units
Using streams
Unit "charencstreams": load/save data from/to almost any text source:
- ANSI, UTF8, UTF16, UTF32
- big-endian, little-endian
- with/without BOM
Demo:
f := TCharEncStream.Create;
try
  f.LoadFromFile(OpenDialog1.FileName);
  Memo1.Text := f.UTF8Text;  // file content as UTF-8
finally
  f.Free;  // release the stream even if loading fails
end;
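To illustrate what handling files "with/without BOM" involves, here is a minimal sketch of byte order mark detection in plain Free Pascal. It is not taken from charencstreams and does not show how that unit works internally; the names TBomKind and DetectBom are hypothetical and exist only for this example. Only the byte patterns of the Unicode BOM itself are assumed.

```pascal
program BomSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical type for this sketch; not part of charencstreams.
  TBomKind = (bkNone, bkUTF8, bkUTF16LE, bkUTF16BE, bkUTF32LE, bkUTF32BE);

// Hypothetical helper: inspects the first bytes of a buffer and
// reports which byte order mark, if any, it starts with.
function DetectBom(const B: array of Byte): TBomKind;
var
  N: Integer;
begin
  N := Length(B);
  // UTF-32 LE is tested before UTF-16 LE because FF FE 00 00
  // also begins with the UTF-16 LE mark FF FE.
  if (N >= 4) and (B[0] = $FF) and (B[1] = $FE) and (B[2] = 0) and (B[3] = 0) then
    Result := bkUTF32LE
  else if (N >= 4) and (B[0] = 0) and (B[1] = 0) and (B[2] = $FE) and (B[3] = $FF) then
    Result := bkUTF32BE
  else if (N >= 3) and (B[0] = $EF) and (B[1] = $BB) and (B[2] = $BF) then
    Result := bkUTF8
  else if (N >= 2) and (B[0] = $FF) and (B[1] = $FE) then
    Result := bkUTF16LE
  else if (N >= 2) and (B[0] = $FE) and (B[1] = $FF) then
    Result := bkUTF16BE
  else
    Result := bkNone;
end;

begin
  if DetectBom([$EF, $BB, $BF, $41]) = bkUTF8 then
    WriteLn('UTF-8 BOM found');
  if DetectBom([$41, $42]) = bkNone then
    WriteLn('no BOM');
end.
```

When no BOM is present, the encoding has to be guessed from the content or supplied by the caller; this sketch does not attempt that. Note also that FF FE 00 00 could in principle be a UTF-16 little-endian BOM followed by a U+0000 character, so the ordering above is a heuristic.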
Character info
Unit "character": get information about code points using the TCharacter class. Demo:
if TCharacter.IsLetter(s[i]) then
  s[i] := TCharacter.toLower(s[i]);
Access UTF-8 by code index
Unit "utf8scanner": access UTF-8 strings by code point index, use case statements on UTF-8 strings, and more. Demo:
s := TUTF8Scanner.Create(Memo1.Text);
try
  // Lower-case every letter, addressing the text by index
  for i := 1 to s.Length do
    if TCharacter.IsLetter(s[i]) then
      s[i] := TCharacter.toLower(s[i]);
  Memo1.Text := s.UTF8String;
finally
  s.Free;
end;
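Indexing by code point differs from indexing by byte because a UTF-8 code point occupies one to four bytes. The following minimal sketch, in plain Free Pascal, shows the underlying idea: bytes of the form 10xxxxxx are continuation bytes, and every other byte starts a new code point. It does not show how utf8scanner is implemented; CodePointCount and CodePointOffset are hypothetical names used only here, and the input is assumed to be valid UTF-8.

```pascal
program Utf8IndexSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: counts code points by counting every byte
// that is not a continuation byte (10xxxxxx).
function CodePointCount(const B: array of Byte): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to High(B) do
    if (B[i] and $C0) <> $80 then
      Inc(Result);
end;

// Hypothetical helper: returns the 0-based byte offset at which the
// Index-th code point (1-based) starts, or -1 if Index is out of range.
function CodePointOffset(const B: array of Byte; Index: Integer): Integer;
var
  i, Seen: Integer;
begin
  Result := -1;
  Seen := 0;
  for i := 0 to High(B) do
    if (B[i] and $C0) <> $80 then
    begin
      Inc(Seen);
      if Seen = Index then
        Exit(i);
    end;
end;

const
  // 'a', then U+00F6 (ö) encoded as C3 B6, then 'b'
  Sample: array[0..3] of Byte = ($61, $C3, $B6, $62);

begin
  WriteLn(CodePointCount(Sample));      // 3 code points in 4 bytes
  WriteLn(CodePointOffset(Sample, 3));  // 'b' starts at byte offset 3
end.
```

Both helpers scan from the start of the buffer, so each lookup costs time proportional to the length of the text.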
Case demo:
s := TUTF8Scanner.Create(Memo1.Text);
try
  s.FindChars := 'öäü';
  // Replace each umlaut with its two-letter transliteration
  repeat
    case s.FindIndex(s.Next) of
      {ö} 0: s.Replace('oe');
      {ä} 1: s.Replace('ae');
      {ü} 2: s.Replace('ue');
    end;
  until s.Done;
  Memo1.Text := s.UTF8String;
finally
  s.Free;
end;