ISO Pascal is defined using an alphabet of symbols all of which can be represented with ASCII. Vector Pascal uses Unicode to permit a wider range of symbols to be used in programs.
Programs should be submitted to the compiler as UTF-8 encoded Unicode. Since 7-bit ASCII is a subset of UTF-8, all valid ASCII encoded Vector Pascal programs are also valid UTF-8 programs.
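The subset relationship can be checked outside the compiler. The following is a minimal sketch in Python, used here purely for illustration and not part of Vector Pascal. It shows that ASCII-only program text has byte-identical ASCII and UTF-8 encodings, while the Unicode assignment arrow occupies several bytes. The variable names are hypothetical.

```python
# ASCII-only source text: its UTF-8 encoding is byte-for-byte the same as ASCII.
ascii_source = "a := iota 0;"
assert ascii_source.encode("utf-8") == ascii_source.encode("ascii")

# The same statement written with the Unicode operators.
unicode_source = "a ← ⍳ 0;"
utf8_bytes = unicode_source.encode("utf-8")

# Code points above 7f are encoded as multi-byte sequences.
arrow_bytes = "←".encode("utf-8")             # code point 2190
print(arrow_bytes.hex())                      # e28690
print(len(unicode_source), len(utf8_bytes))   # 8 12
```

An editor that saves plain ASCII therefore already produces valid input; only files containing the extended symbols need to be saved explicitly as UTF-8.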
ISO Pascal allows the Latin letters A-Z to be used in identifiers. Vector Pascal extends this by allowing letters from the Greek, Cyrillic, Katakana and Hiragana character sets.
| Alphabet | Position | Glyph | Code |
|---|---|---|---|
| Greek | low | Α | 0391 |
| | high | Ω | 03a9 |
| Cyrillic | low | А | 0410 |
| | high | Я | 042f |
| Katakana | low | ァ | 30a0 |
| | high | ヺ | 30fa |
| Hiragana | low | ぁ | 3041 |
| | high | ゔ | 3094 |
Identifiers are case insensitive: upper case and lower case versions of a given letter are treated as equivalent. Thus the identifier “δ” may also be written “Δ”. Identifiers drawn from these alphabets are strings of letters and digits starting with a letter.
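The range and case rules can be sketched as follows. This is an illustrative Python model, not the compiler's lexer: the function names are hypothetical, and mapping a letter to upper case before the range test is an assumption made to mirror the case rule in the text.

```python
# Ranges taken from the table above (hexadecimal code points, inclusive).
# The Hiragana upper bound uses the code point of the glyph ゔ, which is 3094.
LETTER_RANGES = {
    "Greek":    (0x0391, 0x03A9),
    "Cyrillic": (0x0410, 0x042F),
    "Katakana": (0x30A0, 0x30FA),
    "Hiragana": (0x3041, 0x3094),
}

def alphabet_of(ch):
    """Return the name of the alphabet whose range contains ch, or None."""
    upper = ch.upper()          # assumption: fold case before the range test
    if len(upper) != 1:         # a few letters expand when upper-cased
        return None
    code = ord(upper)
    if ord("A") <= code <= ord("Z"):
        return "Latin"
    for name, (low, high) in LETTER_RANGES.items():
        if low <= code <= high:
            return name
    return None

def same_identifier(a, b):
    """Case-insensitive comparison of two identifiers."""
    return a.upper() == b.upper()

print(alphabet_of("δ"))            # Greek
print(same_identifier("δ", "Δ"))   # True
print(alphabet_of("я"))            # Cyrillic
print(alphabet_of("1"))            # None
```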
Vector Pascal also allows ideographs drawn from the unified Chinese, Japanese and Korean sets (Unicode range 4e00-9fff) to be used as identifiers.
When using Unicode, certain mathematical operators that are otherwise written as a sequence of ASCII characters can be represented by a single Unicode character.
| Operation | ASCII form | Extended form | Unicode |
|---|---|---|---|
| Set membership | in | ∈ | 2208 |
| Assignment | := | ← | 2190 |
| Integer division | div | ÷ | 00f7 |
| Nary summation | \+ | ∑ | 2211 |
| Nary product | \* | ∏ | 220f |
| Square root | sqrt | √ | 221a |
| Less than or equal | <= | ≤ | 2264 |
| Greater than or equal | >= | ≥ | 2265 |
| Not equal | <> | ≠ | 2260 |
| Negation | not | ¬ | 00ac |
| Logical and | and | ∧ | 2227 |
| Logical or | or | ∨ | 2228 |
| Multiplication | * | ✕ | 2715 |
| Index generation | iota | ⍳ | 2373 |
The following shows the use of Unicode operators in place of the ASCII ones used in previous releases of Vector Pascal.
Program proddemo;
{ prints the product of the integers 1..5 and the square root of that product }
Var a:array[1..5] of Integer; Y:integer;
Begin
  { unicode version }
  a ← ⍳ 0; { form integers from 1 to 5 }
  Writeln(a);
  Y ← ∏ a; { get their product }
  Writeln(y, √y);
  { now using ascii }
  a := iota 0;
  writeln(a);
  y := \* a;
  writeln(y, sqrt(y));
End.
The built-in char type of Pascal is represented with 16 bits in Vector Pascal. This allows any Unicode character with a code in the range 0000-ffff to be handled.
Strings are held as arrays of char with a length word held in the first character. This is a simple extension of the mechanism used in Turbo Pascal. It potentially allows strings to be up to 65535 characters long. The type STRING written without a length specification stands for a string of length MAXSTRING.
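The layout described above can be modelled as a list of 16-bit cells whose first cell is the length word. The sketch below is an illustration in Python under that reading of the text; `pack`, `unpack` and `MAX_LENGTH` are hypothetical names and do not correspond to anything in the Vector Pascal runtime.

```python
MAX_LENGTH = 0xFFFF  # largest value a 16-bit length word can hold (65535)

def pack(text):
    """Return a list of 16-bit cells: the length word, then one cell per character."""
    if len(text) > MAX_LENGTH:
        raise ValueError("string too long for a 16-bit length word")
    cells = [len(text)]
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError("character does not fit in 16 bits")
        cells.append(code)
    return cells

def unpack(cells):
    """Read the length word, then that many characters."""
    length = cells[0]
    return "".join(chr(c) for c in cells[1:1 + length])

cells = pack("σ=1")
print(cells)           # [3, 963, 61, 49]
print(unpack(cells))   # σ=1
```

The 65535 limit follows directly from the width of the length word: a Turbo Pascal style string with an 8-bit length is limited to 255 characters in the same way.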
When strings or characters are read from a text file, they are automatically converted from UTF-8 to the internal 16-bit Unicode format.
When characters or strings are output to a text file, they are converted from the internal Unicode format to UTF-8.
Here is an example program that prints out a page of the Unicode character set. It illustrates the use of Unicode for variable names, Unicode within strings, and the manipulation of 16-bit characters.
Program printUnicode;

Procedure printpage(p:integer);
Var
  σ:string; { a Greek variable name }
  c:char;
  I,j:integer;
Begin
  Writeln(p);
  For I:=0 to 15 do
  Begin
    σ := '- ↣ - '; { Unicode arrow symbol in string }
    For j:=0 to 15 do
      { concatenate a 16 bit char onto a string }
      σ := σ + chr(j + 16*I + 256*p);
    Writeln(σ);
  End;
End;

Var
  page:integer;
Begin
  Write('Unicode page:');
  Readln(page);
  Printpage(page);
End.
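The expression j + 16*I + 256*p selects character j of row I on page p, so each page covers 256 consecutive code points. The same arithmetic can be traced in a short Python sketch; this is an illustration only, and `page_rows` is a hypothetical name rather than a translation the compiler would produce.

```python
def page_rows(p):
    """Return the 16 rows of page p, each holding 16 consecutive characters."""
    rows = []
    for i in range(16):
        row = "- ↣ - "                          # same prefix as the Pascal program
        for j in range(16):
            row += chr(j + 16 * i + 256 * p)    # code point within page p
        rows.append(row)
    return rows

rows = page_rows(3)     # page 3 covers code points 0300..03ff, including Greek
print(rows[9][6:])      # the 16 characters 0390..039f
```

Row 9 of page 3 starts at 256*3 + 16*9 = 912, which is 0390 in hexadecimal, so its second character is the Greek capital Α (0391) listed in the alphabet table.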