Difference between revisions of "AnsiString"

From Free Pascal wiki

Latest revision as of 13:27, 22 December 2023


AnsiString is a variable-length string data type. It can store characters that have a size of one Byte.

implementation

In FPC an AnsiString is implemented as a pointer. It is a managed data type. As such it is initialized with nil as soon as it enters the scope. Memory for the character sequence is dynamically allocated and freed.
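The initialization described above can be observed with a small sketch. The program and procedure names are made up for illustration; casting the string to Pointer is used here only to peek at the underlying reference and is not something ordinary code needs to do.

```pascal
program AnsiStringInit;

{$mode objfpc}{$H+}

procedure Demo;
var
  s: AnsiString;  // managed local variable: set to nil on entering the scope
begin
  if Pointer(s) = nil then
    WriteLn('s starts out as nil, that is, as an empty string');

  s := StringOfChar('x', 3);  // memory for the characters is allocated here
  if Pointer(s) <> nil then
    WriteLn('s now references allocated memory: ', s);
end;  // the memory is released automatically when s leaves the scope

begin
  Demo;
end.
```

The compiler may emit a hint that s does not seem to be initialized; for managed types this is harmless.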

An AnsiString points to the first character. This facilitates interfacing with libraries or foreign functions that expect pChar strings. To that end, an AnsiString always ends with a null Byte. In Pascal, this terminating null Byte has no bearing on the string’s value (including its length). An AnsiString also always carries some management data before the first character, namely:

  • a code page
  • the size of a character
  • a reference count
  • the length of the string.

AnsiString memory layout sample (32-bit platform):

  field                     bytes
  code page                 253 233
  maximum character size    0 1
  reference count           0 0 0 1
  length                    0 0 0 3
  payload                   'F' 'o' 'o'
  complimentary Null        #0

The pointer points to the first payload byte ('F').

In Pascal, only the length field determines where the string ends; the terminating null Byte is ignored. Consequently, an AnsiString may itself contain #0 characters.
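A minimal sketch of the difference between the Pascal view and the pChar view of the same string. The program name is illustrative; StrLen is the RTL routine that counts characters up to the first null Byte.

```pascal
program AnsiStringNull;

{$mode objfpc}{$H+}

var
  s: AnsiString;
begin
  s := 'Foo';
  s := s + #0 + 'Bar';        // a #0 in the middle of the payload

  WriteLn(Length(s));         // 7: Pascal reads the length field
  WriteLn(StrLen(PChar(s)));  // 3: a C-style scan stops at the first #0
end.
```

Code that hands such a string to a foreign function should therefore expect the callee to see only the part before the first #0.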

An AnsiString can furthermore be associated with a code page (since FPC 3.0.0).
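As a rough sketch, assuming FPC 3.0.0 or later: a code page can be attached to a string type in its declaration, and StringCodePage reports the code page stored with a value. TLatin1String is a hypothetical type name chosen for this example; UTF8String is declared in the system unit.

```pascal
program AnsiStringCodePage;

{$mode objfpc}{$H+}

type
  // hypothetical example type: AnsiString tagged with ISO-8859-1
  TLatin1String = type AnsiString(28591);

var
  u: UTF8String;     // AnsiString tagged with the UTF-8 code page
  l: TLatin1String;
begin
  u := 'Foo';
  l := 'Foo';

  // print the code page recorded in each string's management data
  WriteLn(StringCodePage(u));
  WriteLn(StringCodePage(l));
end.
```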

application

The data type AnsiString can be used like any other string data type. You may assign string literals to an AnsiString variable as normal. String values can be compared (=) just as usual. The fact that the variable is a pointer is entirely transparent.
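For illustration, a short sketch of everyday use; the variable names are arbitrary, and IntToStr comes from the SysUtils unit.

```pascal
program AnsiStringUse;

{$mode objfpc}{$H+}

uses
  SysUtils;  // for IntToStr

var
  a, b: AnsiString;
begin
  a := '0123ABCabc456';    // assign a string literal
  a := a + '!';            // concatenate
  a := a + IntToStr(45);   // append a converted integer

  b := '0123ABCabc456!45';
  if a = b then            // compares the values, not the pointers
    WriteLn('equal');

  WriteLn(a);              // 0123ABCabc456!45
end.
```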

Characters in AnsiString have a 1-based index. myAnsiString[1] refers to the first character.

Note: The linear character index is only guaranteed to work for strings that have a maximum character size of 1. That means that using an integer index on, for example, a UTF-8 encoded string (not exclusively containing ASCII characters) will produce erroneous results.
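The following sketch shows why. It builds the two-byte UTF-8 sequence for the letter a-umlaut byte by byte, so the result does not depend on the encoding of the source file; the program name is illustrative.

```pascal
program AnsiStringIndex;

{$mode objfpc}{$H+}

var
  s: AnsiString;
begin
  s := 'Foo';
  WriteLn(s[1]);       // F: the index is 1-based

  // one visible character, encoded in UTF-8 as the bytes $C3 $A4
  SetLength(s, 2);
  s[1] := Chr($C3);
  s[2] := Chr($A4);

  WriteLn(Length(s));  // 2: counted in bytes, not in characters
  WriteLn(Ord(s[1]));  // 195: s[1] is only the first half of the character
end.
```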

The length function, and likewise high, returns a string’s length by reading the length data field.
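A small sketch of both functions and of a typical loop over the characters. The names are arbitrary, and the value noted for high follows the statement above.

```pascal
program AnsiStringLength;

{$mode objfpc}{$H+}

var
  s: AnsiString;
  i: SizeInt;
begin
  s := 'Foo';

  WriteLn(Length(s));  // 3
  WriteLn(High(s));    // per the text above, also the length

  // visit every character using the 1-based index
  for i := 1 to Length(s) do
    Write(s[i], ' ');
  WriteLn;
end.
```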

Because an AnsiString is essentially a pointer, copying strings of this type is fast, since only the reference is copied and the reference count increased. Modifying one of the copies may then trigger a copy-on-write (COW).
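This behaviour can be made visible by comparing the underlying pointers before and after a modification. The sketch below is illustrative only; the casts to Pointer are for inspection and are not needed in normal code.

```pascal
program AnsiStringCow;

{$mode objfpc}{$H+}

var
  a, b: AnsiString;
begin
  a := StringOfChar('a', 3);         // heap-allocated string 'aaa'
  b := a;                            // copies the reference only

  WriteLn(Pointer(a) = Pointer(b));  // TRUE: both share one payload

  b[1] := 'X';                       // writing to b forces a private copy

  WriteLn(Pointer(a) = Pointer(b));  // FALSE: b now has its own payload
  WriteLn(a, ' ', b);                // aaa Xaa
end.
```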

caveats

  • The compiler directive {$longStrings on} (or {$H+}) aliases string (without a specified length) to AnsiString.
  • AnsiString as a managed data type introduces a certain overhead. See the wiki page “Avoiding implicit try finally section” for further explanation.
  • The sizeOf value of an AnsiString variable is merely the size of a pointer.
  • Assigning an empty string '' to an AnsiString variable will in fact assign nil to the variable and, if the reference count hits zero, release the underlying memory (if any was previously allocated at all). Empty strings are not stored as described above.
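Several of these caveats can be checked with a short sketch, assuming {$mode objfpc} together with {$H+}; the program name is illustrative.

```pascal
program AnsiStringCaveats;

{$mode objfpc}{$H+}

var
  s: string;  // with {$H+} this is an AnsiString
begin
  WriteLn(SizeOf(s) = SizeOf(Pointer));  // TRUE

  s := StringOfChar('x', 1000);
  WriteLn(SizeOf(s) = SizeOf(Pointer));  // TRUE: the payload size is irrelevant
  WriteLn(Length(s));                    // 1000

  s := '';                               // releases the payload
  WriteLn(Pointer(s) = nil);             // TRUE: empty strings are nil
end.
```

Use Length rather than SizeOf whenever the number of stored characters is needed.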

see also