Chapter 12. String Manipulation

Table of Contents

1. String Comparison
1.1. The Compare Method
1.2. The ansi.strcmp Method
2. String Search within a String Set
3. Substring Search
3.1. The Contains Method
3.2. The ansi.strstr Method
3.3. Searching at the Beginning or the End
3.4. Position of a Substring
3.5. The IsEMail Method
3.6. The IsDate Method
3.7. The IsNumeric Method
4. String Transformation
4.1. Highlighting Substrings
4.2. Character Case Modification
4.3. Substring Extraction
4.4. Substring Insertion and Elimination
4.5. Replacement
4.6. String to Array Conversion
4.7. Addition of Characters (Padding)
4.8. Removing Superfluous Space
5. String Encoding and Decoding
5.1. ISO 8859-1 Format Encoding
5.2. Encoding of Special Characters in a URL
6. Number to String Conversion
7. The Eval Function

Biferno provides a rich variety of string manipulation functionality. In this chapter we analyze the most significant methods implemented by the string and ansi classes. A detailed list of all methods and properties of Biferno predefined classes is contained in "Biferno: Reference Guide".

1. String Comparison

Comparing two strings means comparing their characters one by one, starting from the first (the leftmost), until either two different characters are found or the end of one of the two strings is reached.

Two strings are considered equal if they consist of the same characters in the same sequence. If two different characters are found in a corresponding position, the relationship between these two characters determines the relationship between the strings.

A character is considered "less" than another character if it precedes it in the ASCII character table. A string is considered "less" than another if its first differing character is "less" than the corresponding character in the other string. If the first characters of the longer string are exactly the same as the shorter string, the longer string is always considered greater. Notice that the result of comparing characters with a code greater than 127 (i.e. outside the ASCII range) can depend on the operating system, because these characters can assume different values on different systems.

The simplest way to compare two Biferno strings is to use the comparison operators. With these operators the comparison is case sensitive, i.e. the two strings "sun" and "Sun" are considered different.

1.1. The Compare Method

An alternative is to use the Compare method of the string class, with prototype:

int Compare(string str, boolean caseSense = false)
     

This method takes as parameters the string to compare with and a Boolean value indicating whether the comparison should be case sensitive (the default is false). It returns an integer value: 0 (zero) if the two strings are equal, 1 if the string passed as parameter is greater than the string the method is applied to, or -1 if the string passed as parameter is smaller. An example is:

<?
	str1 = "a"
	str2 = "b"
	$str1.Compare(str2) // This instruction prints the value 1
	str1 = "sun"
	str2 = "Sun"
	$str1.Compare(str2) // This instruction prints the value 0
	$str1.Compare(str2, true) // This instruction prints the value -1
?>
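Because the sign convention is easy to misread, the following Python sketch models it outside Biferno. It is only an illustration: compare is a hypothetical helper, and it assumes that a case-insensitive comparison behaves like comparing lowercased copies of the two strings, which this chapter does not state.

```python
def compare(receiver: str, other: str, case_sense: bool = False) -> int:
    """Model of receiver.Compare(other, caseSense) as described above."""
    if not case_sense:
        # Assumption: ignoring case is equivalent to comparing lowercased copies.
        receiver, other = receiver.lower(), other.lower()
    if receiver == other:
        return 0
    # 1 when the argument is greater than the receiver, -1 when it is smaller.
    return 1 if other > receiver else -1


print(compare("a", "b"))            # 1
print(compare("sun", "Sun"))        # 0
print(compare("sun", "Sun", True))  # -1
```

The three calls mirror the Biferno example above and yield the same values.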
     

1.2. The ansi.strcmp Method

A third way of comparing two strings is to use the strcmp method of the ansi class:

static int strcmp(string str1, string str2)
     

This method is called statically and takes as parameters the two strings to be compared. The result of the comparison is an integer value, which can be 0 (zero), if the two strings are equal, a positive value if the first string is greater than the second, or a negative value if the first string is smaller than the second. If the result is non-zero, its value is the difference between the ASCII codes of the first two characters that differ between the strings.

In the following example the two strings str1 and str2 differ starting from the second character. The strcmp method returns the value -14, which indicates that str1 is smaller than str2; this value is the difference between the ASCII code of the a character (97) in str1 and the ASCII code of the corresponding character (o, code 111) in str2.

<?
	str1 = "salty"
	str2 = "solitary"
	$ansi.strcmp(str1, str2) // The instruction prints the value -14
?>
     

The comparisons executed by methods of the ansi class are always case sensitive.
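The rule "difference between the codes of the first two characters that differ" can be sketched in Python as follows. This is a model for illustration, not Biferno code. The treatment of the case where one string is a prefix of the other (the missing character counts as code 0, as in a typical C implementation) is an assumption that this chapter does not confirm.

```python
def strcmp(str1: str, str2: str) -> int:
    """Model of ansi.strcmp: code difference at the first differing position."""
    for c1, c2 in zip(str1, str2):
        if c1 != c2:
            return ord(c1) - ord(c2)
    # One string is a prefix of the other (or they are equal).
    # Assumption: the missing character is treated as code 0.
    if len(str1) > len(str2):
        return ord(str1[len(str2)])
    if len(str1) < len(str2):
        return -ord(str2[len(str1)])
    return 0


print(strcmp("salty", "solitary"))  # -14, i.e. 97 - 111
print(strcmp("sun", "sun"))         # 0
```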

2. String Search within a String Set

It is sometimes necessary to compare a string with several different strings at once. This is the same as asking whether the string is contained in a given string set. To this end the string class provides the In method with prototype:

boolean In(string str, char sep = ",")
    

The string set is passed as a single string in which the individual strings of the set are separated by a special character, called a separator. The default separator is the comma. An example is:

<?
	if (user.GetUsername().In("john,paul,george,ringo"))
		print("<p>Welcome " + user.firstName + "</p>\n")
	else
		print("<p>Unknown user. Access denied.</p>\n")
?>
    

The wildcard character * (star) can be used in the string set. In the following example the In method returns true if the keyword string either starts with "comp", or is "software", or is "hardware".

<?
	if (keyword.In("comp*,software,hardware"))
		category = 1
?>
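A rough Python model of this set membership test is shown below. The helper is_in is hypothetical. The sketch relies on fnmatchcase from the Python standard library, which also gives ? and [...] a special meaning that Biferno may not share, and it assumes a case-sensitive match, since the chapter does not say how In treats case.

```python
from fnmatch import fnmatchcase


def is_in(value: str, string_set: str, sep: str = ",") -> bool:
    """Model of value.In(string_set, sep) with * as a wildcard."""
    # Split the set on the separator and test each entry as a pattern.
    return any(fnmatchcase(value, pattern) for pattern in string_set.split(sep))


print(is_in("computer", "comp*,software,hardware"))  # True
print(is_in("firmware", "comp*,software,hardware"))  # False
```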
    

3. Substring Search

3.1. The Contains Method

To establish if a string contains another string we use the Contains method of the string class, with prototype:

boolean Contains(string str, boolean caseSense = false)
      

This method returns true if the str string is contained in the string that the method is applied to, or false otherwise.

<?
	str = client.userAgent
	if (str.Contains("MSIE", true))
		print("Your browser is Internet Explorer")
?>
     

Variants of this method are:

  • ContainsWordBegin, that determines if the string that the method is applied to contains a word beginning with the string passed as parameter. A word is defined as a group of contiguous characters delimited by spaces or other separators (punctuation marks, tabs, etc.).

  • ContainsWordEnd, that determines if the string that the method is applied to contains a word ending with the string passed as parameter.

  • ContainsWordExact, that determines if the string that the method is applied to contains a word matching exactly the string passed as parameter.
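The difference between the three variants can be modelled in Python as follows. The function names are hypothetical, and two points are assumptions: a word is taken to be a maximal run of letters, digits, or underscores (the chapter only says "spaces or other separators"), and matching is case sensitive.

```python
import re


def words(text: str) -> list:
    # Assumption: everything that is not a letter, digit, or underscore separates words.
    return re.findall(r"\w+", text)


def contains_word_begin(text: str, s: str) -> bool:
    return any(w.startswith(s) for w in words(text))


def contains_word_end(text: str, s: str) -> bool:
    return any(w.endswith(s) for w in words(text))


def contains_word_exact(text: str, s: str) -> bool:
    return s in words(text)


text = "Biferno is an object oriented language"
print(contains_word_begin(text, "lang"))  # True: "language" begins with "lang"
print(contains_word_end(text, "ject"))    # True: "object" ends with "ject"
print(contains_word_exact(text, "obj"))   # False: no word is exactly "obj"
```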

3.2. The ansi.strstr Method

The ansi class provides the strstr method with prototype:

static string strstr(string str1, string str2)
     

This method searches for str2 in str1 and, if str2 is contained in str1, returns a string containing the substring of str1 that starts with str2 and reaches up to the last character of str1. In the following example the string str2 is "MSIE 5.0; Mac_PowerPC)":

<?
	str1 = "Mozilla/4.0 (compatible; MSIE 5.0; Mac_PowerPC)"
	str2 = ansi.strstr(str1, "MSIE")
?>
     

3.3. Searching at the Beginning or the End

If we need to check if a string starts or ends with another string, we can use the Begins and Ends methods of the string class, with prototypes:

boolean Begins(string str, boolean caseSense = false)
boolean Ends(string str, boolean caseSense = false)
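As an illustration of the two prototypes, here is a minimal Python model. The helpers begins and ends are hypothetical, and the sketch assumes that a case-insensitive test behaves like a test on lowercased copies of both strings.

```python
def begins(text: str, s: str, case_sense: bool = False) -> bool:
    """Model of text.Begins(s, caseSense)."""
    if not case_sense:
        # Assumption: ignoring case is equivalent to lowercasing both strings.
        text, s = text.lower(), s.lower()
    return text.startswith(s)


def ends(text: str, s: str, case_sense: bool = False) -> bool:
    """Model of text.Ends(s, caseSense)."""
    if not case_sense:
        text, s = text.lower(), s.lower()
    return text.endswith(s)


print(begins("Mozilla/4.0 (compatible)", "mozilla"))        # True
print(begins("Mozilla/4.0 (compatible)", "mozilla", True))  # False
print(ends("report.PDF", ".pdf"))                           # True
```

Note that, as in the prototypes, the comparison ignores case unless the second argument is true.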
     

3.4. Position of a Substring

It is sometimes useful to know the position of the first character of a substring. For this we can use the Find method of the string class with prototype:

int Find(string str, boolean caseSense = false, int from)
     

The Find method returns the position of the first instance of the str string in the string the method is applied to, starting from the character position specified by the from parameter (the default is to start from the first character of the string); if the str string is not found the value 0 (zero) is returned.

<?
	str = "Biferno is an object oriented language"
	pos = str.Find("Biferno") // pos is 1
	pos = str.Find("lang") // pos is 31
	pos = str.Find("x") // pos is 0
?>
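Positions in Biferno strings start at 1, and 0 means "not found". The Python sketch below models this convention on top of Python's 0-based str.find. The helper find is hypothetical, and the lowercasing used for the case-insensitive search is an assumption.

```python
def find(text: str, s: str, case_sense: bool = False, start: int = 1) -> int:
    """Model of text.Find(s, caseSense, from) with 1-based positions."""
    if not case_sense:
        # Assumption: ignoring case is equivalent to lowercasing both strings.
        text, s = text.lower(), s.lower()
    # str.find is 0-based and returns -1 on failure, so adding 1 yields
    # a 1-based position, or 0 when the substring is not found.
    return text.find(s, start - 1) + 1


text = "Biferno is an object oriented language"
print(find(text, "Biferno"))       # 1
print(find(text, "lang"))          # 31
print(find(text, "x"))             # 0
print(find(text, "an", start=13))  # 32: skips the word "an" at position 12
```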
     

3.5. The IsEMail Method

The IsEMail method of the string class returns true if the string has the format of a valid email address.

<?
	userEmail = "john@domain.com"
	check_email = userEmail.IsEMail() // The method returns true
?>
     

The IsEMail method has the following prototype:

boolean IsEMail(boolean exists, string *msg)
     

If the exists parameter is true and the string is a legally formatted email address, the IsEMail method contacts the mail exchange host for the corresponding email domain and verifies whether a user corresponding to the supplied email address exists. The msg parameter returns the error message from the SMTP server, if there is one.

Contacting an SMTP server over the Internet to verify an email address is a slow operation that may take several seconds. On the other hand, verification can be useful to avoid sending email to nonexistent addresses, or to check that a user supplied an existing email address in a subscription form.

3.6. The IsDate Method

The IsDate method of the string class returns true if the string has the format of a valid date. The method has prototype:

boolean IsDate(string format)
     

The format parameter specifies the expected order of day, month, year (the default is the one defined by the application variable DATE_FORMAT). A dash (-), a forward slash (/), or the separator defined in DATE_FORMAT can be used as separators:

<?
	aDate = "24-10-2001"
	check_date = aDate.IsDate("d-m-y") // The method returns true
	check_date = aDate.IsDate("y-m-d") // The method returns false
?>
     

3.7. The IsNumeric Method

The IsNumeric method of the string class returns true if the string has the format of a valid number. The method has prototype:

boolean IsNumeric(void)
     

The method recognizes the thousand separator (as defined by the application variable THOUSAND_SEP or by the curScript.SetNumFormat method) and the decimal separator (as defined by the application variable DECIMAL_SEP or by the curScript.SetNumFormat method).

<?
	/* Assume:
		thousand separator = '.'
		decimal separator = ','
	*/
	aNum = "30.000"
	check_num = aNum.IsNumeric() // The method returns true
	aNum = "300.00"
	check_num = aNum.IsNumeric() // The method returns false
?>
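One way to read the two results above is that a thousand separator must always be followed by exactly three digits. The Python sketch below encodes that reading as a regular expression. It is a guess at the rule, not the actual Biferno implementation: is_numeric is a hypothetical helper, and the optional leading sign is an assumption.

```python
import re


def is_numeric(text: str, thous_sep: str = ".", decim_sep: str = ",") -> bool:
    """Model of IsNumeric for the given thousand and decimal separators."""
    t, d = re.escape(thous_sep), re.escape(decim_sep)
    # Integer part: either groups of three digits joined by the thousand
    # separator, or plain digits. Then an optional decimal part.
    pattern = rf"[+-]?(\d{{1,3}}({t}\d{{3}})+|\d+)({d}\d+)?"
    return re.fullmatch(pattern, text) is not None


print(is_numeric("30.000"))    # True
print(is_numeric("300.00"))    # False
print(is_numeric("1.234,57"))  # True
```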
     

4. String Transformation

The methods of the string class described in this section transform a string into another string, e.g. by eliminating some characters, extracting substrings, or replacing a group of characters with another. Remember that it is always possible to access the individual characters of a string (for reading or writing) using the char property, which contains the string representation as an array of characters.

4.1. Highlighting Substrings

The Hilite method searches for one or more strings in a text and inserts a given string before and after each occurrence of the string(s). This method has prototype:

string Hilite(boolean cs, boolean skipHTML, string pre, string post, obj strN...)
     

The cs parameter determines whether the search is case sensitive. The skipHTML parameter can be used to exclude HTML tags from the search. The pre and post parameters provide the strings to be inserted before and after each occurrence of the string. The strN parameter provides one or more strings (or arrays of strings, or a search object; see Chapter 15, Database Interaction) to highlight. If an array is passed, all strings in the array are highlighted. Simple strings and string arrays can be mixed.

<?
	str = "Welcome to the Biferno user manual"
	str = str.Hilite(false, true, "<b>", "</b>", "manual", "Biferno")
	print(str)
?>
     

This example generates the following HTML code:

Welcome to the <b>Biferno</b> user <b>manual</b>
     

Which produces the following output:

      Welcome to the Biferno user manual
     

The same example can also be written as:

<?
	str = "Welcome to the Biferno user manual"
	arrHilite = array("manual", "Biferno")
	str = str.Hilite(false, true, "<b>", "</b>", arrHilite)
	print(str)
?>
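The core of this behavior (wrap every occurrence of any target string in a pre and post string) can be modelled in a few lines of Python. This sketch is deliberately partial: hilite is a hypothetical helper with its own argument order, it does not model skipHTML or search objects, and it leaves the special symbols in pre and post untouched.

```python
import re


def hilite(text: str, pre: str, post: str, *targets, cs: bool = False) -> str:
    """Partial model of Hilite: wrap each occurrence of a target in pre/post."""
    # Accept plain strings and lists of strings, mixed freely.
    wanted = []
    for target in targets:
        wanted.extend(target if isinstance(target, (list, tuple)) else [target])
    pattern = "|".join(re.escape(w) for w in wanted)
    flags = 0 if cs else re.IGNORECASE
    return re.sub(pattern, lambda m: pre + m.group(0) + post, text, flags=flags)


text = "Welcome to the Biferno user manual"
print(hilite(text, "<b>", "</b>", "manual", "Biferno"))
print(hilite(text, "<b>", "</b>", ["manual", "Biferno"]))
```

Both calls print "Welcome to the <b>Biferno</b> user <b>manual</b>", whether the targets are passed one by one or as a list.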
     

The pre and post strings can contain some special symbols that are replaced with the string str to be highlighted:

  • The ** characters are replaced by str.

  • The $$ characters are replaced by str coded by the UrlEncode method (see the following section on string encoding).

  • The ## characters are replaced by str with ISO Latin encoding (see the following section on string encoding).

<?
	str = "Welcome to the Biferno user manual"
	str = str.Hilite(false, true, "<a href=\"http://www.tabasoft.it/**/\">", 
		"</a>", "Biferno")
	print(str)
?>
     

This example generates the following HTML code:

Welcome to the
<a href="http://www.tabasoft.it/biferno/">Biferno</a> user manual
     

Which produces the following output:

      Welcome to the Biferno user manual
     

If a variable of the search class is passed to Hilite, all substrings contained in the search are highlighted (see Chapter 15, Database Interaction).

4.2. Character Case Modification

The LowToUpper and UpToLower methods operate on the character case, transforming lowercase characters into uppercase characters, and vice versa. These methods have prototypes:

string LowToUpper(int from, int len)
string UpToLower(int from, int len)
     

The from parameter indicates the position of the first character in the string that should be converted (the default value is 1). The len parameter limits the number of characters to convert. The default is to convert all characters starting from the position indicated by the from parameter until the end of the string.

<?
	str = "biferno"
	str = str.LowToUpper()
	print(str + "<br>\n")
	str = str.UpToLower(3, 3)
	print(str + "<br>\n")
	str = str.UpToLower("len":2)
	print(str + "<br>\n")
	str = str.UpToLower(6)
	print(str + "<br>\n")
?>
     

The example above produces the following result:

BIFERNO
BIferNO
biferNO
biferno
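The from and len arithmetic is 1-based, which the following Python model makes explicit. The helper names are hypothetical. The sketch reproduces the four steps of the example above.

```python
def _apply(text: str, start: int, length, fn) -> str:
    # start is 1-based; length None means "up to the end of the string".
    begin = start - 1
    end = len(text) if length is None else begin + length
    return text[:begin] + fn(text[begin:end]) + text[end:]


def low_to_upper(text: str, start: int = 1, length=None) -> str:
    return _apply(text, start, length, str.upper)


def up_to_lower(text: str, start: int = 1, length=None) -> str:
    return _apply(text, start, length, str.lower)


s = low_to_upper("biferno")
print(s)                       # BIFERNO
s = up_to_lower(s, 3, 3)
print(s)                       # BIferNO
s = up_to_lower(s, length=2)
print(s)                       # biferNO
s = up_to_lower(s, 6)
print(s)                       # biferno
```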

The Capitalize method transforms into uppercase the first character of all words contained in a string, where a word is a group of characters delimited by spaces, line breaks, tabs, punctuation marks, quotes, parentheses, etc.

<?
	text = "The PEN is mightier than the sword"
	text = text.Capitalize()
	// text is "The Pen Is Mightier Than The Sword"
?>
     

Notice that the Capitalize method acts on all characters of a word, not just on the first character: as the example shows, the remaining characters are converted to lowercase ("PEN" becomes "Pen").

4.3. Substring Extraction

The SubString method returns a substring of any length of the given string, starting from a given character index. The method has prototype:

string SubString(int from, int len)
     

The from and len parameters have a meaning similar to the one described for the LowToUpper and UpToLower methods, i.e. the method extracts len characters starting from the position indicated by the from parameter. If len is omitted, all characters until the end of the string are extracted.

<?
	str = "This is a text"
	$str.SubString(1, 4) // prints "This"
	$str.SubString(6) // prints "is a text"
	$str.SubString(6, 2) // prints "is"
?>
     

4.4. Substring Insertion and Elimination

The InsertSubString method inserts a string within another string starting from a given character index. The method has prototype:

string InsertSubString(int pos, string subString)
     

E.g.:

<?
	str = "What a day!"
	str = str.InsertSubString(8, "nice ") // str is "What a nice day!"
?>
     

The RemoveSubString method removes a certain number of characters from a string starting from a given position and has prototype:

string RemoveSubString(int from, int len)
     

This method has the same parameters as the SubString method and returns the string obtained by removing len characters from the original string starting from the position with index from, as in:

<?
	str = "What a nice day!"
	str = str.RemoveSubString(8, 5) // str is "What a day!"
?>
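The three methods SubString, InsertSubString, and RemoveSubString share the same 1-based index arithmetic. A minimal Python model with hypothetical helper names, which reproduces the examples of the last two sections, is:

```python
def sub_string(text: str, start: int, length=None) -> str:
    # start is 1-based; length None means "up to the end of the string".
    begin = start - 1
    end = len(text) if length is None else begin + length
    return text[begin:end]


def insert_sub_string(text: str, pos: int, sub: str) -> str:
    # The inserted text ends up starting at 1-based position pos.
    return text[:pos - 1] + sub + text[pos - 1:]


def remove_sub_string(text: str, start: int, length=None) -> str:
    begin = start - 1
    end = len(text) if length is None else begin + length
    return text[:begin] + text[end:]


print(sub_string("This is a text", 6, 2))           # is
print(insert_sub_string("What a day!", 8, "nice "))  # What a nice day!
print(remove_sub_string("What a nice day!", 8, 5))   # What a day!
```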
     

4.5. Replacement

The Substitute method replaces a substring of the given string with another and has prototype:

string Substitute(string oldString, string newString, 
boolean cs = false, boolean skipHTML = false)
     

The cs and skipHTML parameters have the same meaning as in the Hilite method. The oldString and newString parameters are, respectively, the substring to search for and the string that replaces it.

<?
	str = "john,paul,george,ringo"
	str = str.Substitute(",","+") // str is "john+paul+george+ringo"
?>
     

4.6. String to Array Conversion

The ToArray method converts a string into an array of strings. The array elements are the substrings of the original string that are delimited by the specified separator. The original string is left unmodified. This method has prototype:

array ToArray(string separator=", ")
      

The default separator value is ", ", i.e. a comma followed by a space. In the following example the ToArray method applied to the str string generates an array of four elements containing the strings "john", "paul", "george", "ringo" in this order.

<?
	str = "john,paul,george,ringo"
	myArray = str.ToArray(",")
	$myArray[2] // prints the string "paul"
?>
     

4.7. Addition of Characters (Padding)

The Pad method is used to add repetitions of a given character in front of or at the end of a string, until a predefined length is reached. This method has the following prototype:

string Pad(int totChars, char padChar, boolean before = false)
      

An example is:

<?
	str = "123"
	str = str.Pad(8, "0", true) // str is "00000123"
	str = "123"
	str = str.Pad(6, "*") // str is "123***"
?>
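Padding corresponds closely to the rjust and ljust string methods in Python, as the following model shows. The helper pad is hypothetical. What Biferno does when the string is already longer than totChars is not stated in this chapter; the Python methods return the string unchanged in that case.

```python
def pad(text: str, tot_chars: int, pad_char: str, before: bool = False) -> str:
    """Model of text.Pad(totChars, padChar, before)."""
    if before:
        return text.rjust(tot_chars, pad_char)  # pad in front
    return text.ljust(tot_chars, pad_char)      # pad at the end


print(pad("123", 8, "0", True))  # 00000123
print(pad("123", 6, "*"))        # 123***
```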
     

4.8. Removing Superfluous Space

The Zap method removes all white space at the beginning and at the end of a string. This method has no parameters.

<?
	myString = "   John Smith "
	myString = myString.Zap() // myString is "John Smith"
?>
     

5. String Encoding and Decoding

5.1. ISO 8859-1 Format Encoding

The Encode method encodes a string according to the ISO 8859-1 format (Latin-1). More precisely, Biferno uses the ANSI character set (Windows-1252), which is an extension of the ISO 8859-1 set. E.g., the à character is encoded as &#224; (ampersand + hash + decimal numeric code of the character + semicolon). This encoding is necessary to output special characters on a Web page, such as vowels with accents and symbols that might otherwise not be displayed correctly by the browser (this is a potential issue on the MacOS platform).

The Encode method has the following prototype:

string Encode(boolean alsoCR=false, boolean tagsVisible=false, 
	boolean entities=false, obj tagList)
     

The alsoCR parameter specifies whether newline characters are encoded as <br> tags (default: no). The tagsVisible parameter indicates whether the < and > characters that delimit HTML tags are encoded as &#60; and &#62; (default: no). This is useful for displaying a fragment of HTML code within a Web page without having the browser interpret it. The < and > characters are always encoded when they are not part of an HTML tag. The tagList parameter determines what constitutes a tag. When tagList is true, all words introduced by the < character are valid tags (as in XML). When tagList is false, only HTML tags (such as <b>, <body>, etc.) are considered tags. If an associative array is passed as the value of the tagList parameter, the names of the elements of the associative array determine what is considered a tag; the actual values of the elements are ignored. Finally, the entities parameter specifies whether encoding uses the so-called HTML entities instead of numeric codes (e.g. the à character is encoded as &agrave; and not as &#224;).

In the following example we demonstrate how the effect of the Encode method changes by changing the values of these parameters:

<?
	str = "<p>Letters with accents: à, è, ì, ò, ù\nSpecial characters: \", &, ©, ®</p>"
	$str.Encode() + "<br>\n"
	$str.Encode(true) + "<br>\n"
	$str.Encode(false, true)
?>
     

This example generates the following HTML code:

<p>Letters with accents: &#224;, &#232;, &#236;, &#242;, &#249;
Special characters: &#34;, &#38;, &#169;, &#174;</p><br>
<p>Letters with accents: &#224;, &#232;, &#236;, &#242;, &#249;
<br>Special characters: &#34;, &#38;, &#169;, &#174;</p><br>
&#60;p&#62;Letters with accents: &#224;, &#232;, &#236;, &#242;, &#249;
Special characters: &#34;, &#38;, &#169;, &#174;&#60;/p&#62;
     

which in turn generates the following output:

Letters with accents: à, è, ì, ò, ù Special characters: ", &, ©, ®


Letters with accents: à, è, ì, ò, ù
Special characters: ", &, ©, ®


<p>Letters with accents: à, è, ì, ò, ù Special characters: ", &, ©, ®</p>

     

The Decode method decodes a string encoded in ISO 8859-1 format and has the following prototype:

string Decode(boolean alsoCR = false)
     

The alsoCR parameter indicates if we want to convert <br> tags into new line characters (default: no).
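For text that contains no tags, the character-level part of this encoding can be modelled in Python as shown below. The sketch is an approximation with a hypothetical helper name. It ignores tag detection, alsoCR, and entities, and it uses Unicode code points, which coincide with the codes used above only for characters in the Latin-1 range. Decoding is delegated to html.unescape from the Python standard library.

```python
import html


def encode_chars(text: str) -> str:
    """Model of Encode() for tag-free text: numeric character references."""
    out = []
    for ch in text:
        # Quotes, ampersands, angle brackets, and non-ASCII characters
        # are replaced by &# + decimal code + ;
        if ch in '"&<>' or ord(ch) > 127:
            out.append(f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)


encoded = encode_chars('à, è & "©"')
print(encoded)                 # &#224;, &#232; &#38; &#34;&#169;&#34;
print(html.unescape(encoded))  # à, è & "©"
```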

5.2. Encoding of Special Characters in a URL

A URL (Uniform Resource Locator) is a convention for describing a resource available on the Internet. The UrlEncode method applies to a string the encoding rules used in constructing a URL. All non-alphanumeric characters, including spaces, are replaced by the percent character (%) followed by a two-digit hexadecimal code (e.g. the à character is replaced by %E0). This encoding is necessary to avoid misinterpretation of special characters in a URL during network transmission.

This method has the following prototype:

string UrlEncode(boolean spaceToPlus, string pre)
     

The spaceToPlus parameter specifies whether space characters are replaced with the + character instead of %20, which is the standard encoding (default: no). The pre parameter provides the string to use in front of hexadecimal character codes (default: "%").

Let's see a couple of examples of use of the UrlEncode method.

<?
	str = "This string contains the special characters: / $ & %"
	$str.UrlEncode()
?>
     

This code fragment produces the following output:

This%20string%20contains%20the%20special%20characters%3A%20%2F%20%24%20%26%20%25
      

It is often useful to encode a string with UrlEncode in order to pass it as a parameter in a URL:

<?
	city = "Mexico City"
	url = "http://www.xyz-travels.com/search.bfr?dest=" + city.UrlEncode()
?>
     

In this example the city string is converted to "Mexico%20City".

The UrlDecode method decodes a URL-encoded string and has prototype:

string UrlDecode(boolean plusToSpace, string pre)
     

The plusToSpace parameter specifies if we want to replace + characters in the string with spaces (default: no). The pre parameter specifies the string in front of hexadecimal character codes in the text to be decoded (default: "%").

The pre parameter can be used to execute other types of encoding. E.g. some JavaScript calls require an encoding with a "\x" prefix string, as in "\x2E" (notice that the backslash character must be escaped using another backslash character in Biferno to avoid its interpretation as a special character), and sometimes in emails the text must be coded using a "=" prefix string, as in "=2E".
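The encoding rule and the role of the pre parameter can be sketched in Python as follows. Both helpers are hypothetical models built from the description in this section: every character that is not an ASCII letter or digit is replaced by the prefix plus its two-digit hexadecimal code, taken here from the Windows-1252 table. Characters outside that table raise an error in this sketch.

```python
import re


def url_encode(text: str, space_to_plus: bool = False, pre: str = "%") -> str:
    """Model of UrlEncode(spaceToPlus, pre)."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalnum():
            out.append(ch)                    # letters and digits pass through
        elif ch == " " and space_to_plus:
            out.append("+")
        else:
            for byte in ch.encode("cp1252"):  # Windows-1252 code of the character
                out.append(f"{pre}{byte:02X}")
    return "".join(out)


def url_decode(text: str, plus_to_space: bool = False, pre: str = "%") -> str:
    """Model of UrlDecode(plusToSpace, pre)."""
    if plus_to_space:
        text = text.replace("+", " ")
    # Replace each prefix + two hex digits with the corresponding character.
    return re.sub(re.escape(pre) + r"([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]).decode("cp1252"),
                  text)


print(url_encode("Mexico City"))        # Mexico%20City
print(url_encode("Mexico City", True))  # Mexico+City
print(url_encode(".", pre="="))         # =2E
print(url_encode(".", pre="\\x"))       # \x2E
print(url_decode("Mexico%20City"))      # Mexico City
```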

6. Number to String Conversion

We have described implicit conversion methods that allow automatic typecast from numbers (integers or with a decimal part) into strings. This kind of conversion is performed using an internal default format for numerical strings.

When it is necessary to convert a number into a string using a format other than the default, we can use the ToString method of the primitive numeric classes (int, long, double, etc.), with prototype:

string ToString(boolean wantThousandSep = false, int decimals = 2, 
boolean cutRightZero = true)
    

It is possible to specify whether a thousand separator is wanted (default: no), the number of digits after the decimal separator (default: 2), and whether trailing zeroes in the decimal part are cut (default: yes). When cutRightZero is false, the decimal part is padded with zeroes to reach the required length.

The ToString method should not be confused with the tostring method (see Chapter 11, User classes), which allows automatic string conversion for a user class (remember that identifiers in Biferno are case sensitive).

The curScript.SetNumFormat static method specifies, for the current script only, the decimal and thousand separators to be used during string conversion, temporarily replacing the application defaults defined in the "Biferno.config.bfr" file (THOUSAND_SEP and DECIMAL_SEP). This method has prototype:

void curScript.SetNumFormat(char thousSep, char decimSep)
    

The following example clarifies the use of these methods.

<? 
a = 1234.567 // a is of class double 
b = a.ToString() // b is "1234,57" (notice the rounding) 
c = a.ToString(true) //c is "1.234,57" 
d = a.ToString(true, 1) //d is "1.234,6" 
curScript.SetNumFormat(",",".") 
e = a.ToString(true, 5) //e is "1,234.567" 
f = a.ToString(true, 5, false) //f is "1,234.56700" 
?>
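The interplay of the three ToString parameters with the separators can be modelled in Python as shown below. This is a sketch with a hypothetical helper, in which the separators are passed explicitly instead of being set through curScript.SetNumFormat. Python's own rounding rules are assumed to match Biferno's for these values.

```python
def to_string(value: float, want_thousand_sep: bool = False, decimals: int = 2,
              cut_right_zero: bool = True,
              thous_sep: str = ".", decim_sep: str = ",") -> str:
    """Model of ToString(wantThousandSep, decimals, cutRightZero)."""
    # Format with Python's conventions first: "," for thousands, "." for decimals.
    if want_thousand_sep:
        text = f"{value:,.{decimals}f}"
    else:
        text = f"{value:.{decimals}f}"
    int_part, _, frac_part = text.partition(".")
    if cut_right_zero:
        frac_part = frac_part.rstrip("0")  # drop trailing zeroes
    int_part = int_part.replace(",", thous_sep)
    return int_part + (decim_sep + frac_part if frac_part else "")


a = 1234.567
print(to_string(a))                          # 1234,57
print(to_string(a, True))                    # 1.234,57
print(to_string(a, True, 1))                 # 1.234,6
print(to_string(a, True, 5, True, ",", "."))   # 1,234.567
print(to_string(a, True, 5, False, ",", "."))  # 1,234.56700
```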
    

7. The Eval Function

The Eval function processes a string of text containing Biferno code. The prototype is:

string Eval(string textToEval, boolean resume)
     

We can write:

<?
	textToEval = "a = 3"
	Eval(textToEval)
	$a
?>
    

The last line of the example prints the value of the variable a, which is 3. The a variable was defined and assigned the value 3 when the text passed to the Eval function was processed. Notice that the textToEval string does not start with the "<?" characters. This is because Eval implicitly assumes a "<?" tag before processing the text. If the textToEval string contains plain text, that text must be prefixed with the "?>" characters. An example is:

<?
	textToEval = "a = 3?><b>Hello Word</b>"
	result = Eval(textToEval)
	$result
?>
    

Notice that this behavior is different from the include behavior. In an included file, before writing Biferno code, the "<?" must be explicitly used.

The last example also shows the meaning of the return variable of the function, which contains the entire text sent as output during execution. The first example generated no output, and the function returned the empty string. In the second example the result string contains the text: "<b>Hello Word</b>".

A possible use of the Eval function is to process a text before sending it via email. The following code sends three emails using a text file that is interpreted via the Eval function:

<?
	email_host = "mailserver.mydomain.com"
	email_from = "me@mydomain.com"
	email_to = "him@hisdomain.com"
	userArr = array("John", "Bob", "Carl") 
	for (i = 1; i <= 3; i++)
	{	username = userArr[i]
		email_text = "Subject: SendMail Test \r\n\r\n"
		email_text += Eval(file("myMailBody.bfr").Get())
		status = smtp.SendMail(email_host, email_from, email_to, email_text)
	}
?>
    

The text file is a template containing variable parameters that are subject to change each time the code is run (in particular the username variable). The content of the file could look like:

Dear $username$,
Your subscription has expired!
    

The Eval function can also be used to invoke a function (or class member) whose name is contained in a variable and is not known in advance, as in the following example:

<?
	// myFunc is a value passed to the script (e.g. the string "Encode")
	myString = "John & Co."
	textToEval = "print(myString." + myFunc + "())"
	// textToEval is "print(myString.Encode())"
	result = Eval(textToEval)
?>
    

After execution of this script, assuming the string "Encode" has been passed to the script in the myFunc variable, the result variable will have the value: "John &#38; Co.". Assuming the string "UrlEncode" was passed, the value would be "John%20%26%20Co%2E".

Notice that the textToEval string must contain the print command (or $, $$) to output the result, so that it can be retrieved from the result variable.

What happens if the text passed to the Eval function (the textToEval string) generates an error? In this case the value of the second parameter (resume, left at its default value, false, in the previous examples) is crucial.

If an error is generated the following applies:

resume is false:

We can control whether Eval interrupts our script by calling error.Resume in the textToEval string, following the error handling rules described in Chapter 16, Error Handling and Debugging. Notice that, as discussed in that chapter, some errors (e.g. Err_BadSyntax) interrupt code execution even if error.Resume has been called. In any case, on the code line after the call to the Eval function, the global variable err will contain the code of the generated error.

resume is true:

The Eval function interrupts the execution of the text contained in textToEval according to the error handling rules but, instead of interrupting the execution of the calling script upon an error, it returns the name of the generated error in the return string. In any case, on the code line after the call to the Eval function, the global variable err will contain the code of the generated error.