Strings #1

Programmers deal with all kinds of 'data', but it essentially boils down to a couple of types: strings and numbers. In these pages we will talk about strings and how to manipulate them.

Strings (and numbers) are known as 'scalar' data. A scalar variable holds only one item, but that item can be either a string or a number.

Scalar data in Perl is denoted by the character '$'. Perhaps to remind us that it is '$calar'.

Arrays on the other hand are known as 'list' items - they hold a list of values, such as a list of strings or a list of numbers.

Arrays are referred to by the character '@'. Perhaps to remind us that they are '@rrays'.

When we refer to a single item in an array, we use the '$' character since we are referring to a single scalar item in the array, such as $foo[4].
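Here is a minimal sketch of the '$' and '@' characters at work. The array name @foo matches the $foo[4] example above; the other variable names and all of the values are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A scalar holds one item: a string or a number.
my $name  = "Perl";
my $count = 5;

# An array holds a list of values (these values are illustrative only).
my @foo = ("zero", "one", "two", "three", "four");

# A single element is a scalar, so it is written with '$', not '@'.
# Counting starts at 0, so index 4 is the fifth element.
my $item = $foo[4];

print "$name has $count items; item 4 is $item\n";
```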

Variables in Perl are case sensitive. $thisvar is not the same as $Thisvar.
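A quick sketch of what case sensitivity means in practice, using the two names from the sentence above. The values stored in them are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two separate variables: the names differ only in the case of one letter.
my $thisvar = "lower";
my $Thisvar = "upper";

print "$thisvar\n";   # prints the value stored in $thisvar
print "$Thisvar\n";   # prints the value stored in $Thisvar
```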

Strings are scalar data so a variable containing a string starts with $, as in '$This' or '$Name'.

We've seen one example of a string in the Arrays pages: '$_', but we will work with variables of our own creation for a bit.

As you will see, Perl has lots of ways to manipulate strings.

String Interpolation

In most programming languages, strings can be 'literal', or assigned to a variable. Literal strings contain actual characters that you want to print or do something with:
print "This is a literal string.";

Whereas $MyString = "This is a string."; assigns a string to a variable.

Interpolation is the process of inserting a list or scalar value into another value ('interpreting' the variable).

Wrapping a string variable or pattern in double quotes activates Perl's variable interpolation, so that the VALUE of the variable is substituted, NOT the variable itself:
print "$MyString"; and print qq{$MyString};
both print: This is a string.
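To see interpolation switched on and off, here is a small sketch. It reuses $MyString from the text; the other variable names are only for this example. The single-quoted line is there for contrast, since single quotes leave the variable name untouched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $MyString = "This is a string.";

# Double quotes and qq{} interpolate: the VALUE of the variable is used.
my $double  = "$MyString";
my $generic = qq{$MyString};

# Single quotes do not interpolate: the variable NAME stays as typed.
my $single = '$MyString';

print "$double\n";
print "$generic\n";
print "$single\n";
```

The first two lines of output are the sentence stored in $MyString. The third is the literal text $MyString.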


Perl has several methods for quoting text, depending on the circumstances. For example, if the string you are quoting contains double-quotes, how would you quote it? Likewise, a string containing single-quotes or an apostrophe can't be quoted using single-quotes unless each one is escaped with a backslash.

To give you some choices, Perl provides the following alternatives to double and single-quotes:

Generic   Meaning         Interpolation
q//       Literal         None
qq//      Literal         Backslash and variable
qw//      Word list       None
m//       Pattern match   Backslash and variable
s///      Substitution    Backslash and variable
y///      Translation     Backslash only
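The list above is easier to remember after trying each form once. This is only a sketch: $word, $text and the other names are hypothetical variables invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $word = "cat";          # hypothetical variable, used only in this sketch
my $text = "a cat sat";

# q// does not interpolate: this holds the 7 characters $word\n as typed.
my $literal = q/$word\n/;

# qq// interpolates variables and backslashes: "cat" plus a newline.
my $interp = qq/$word\n/;

# qw// splits on whitespace and does not interpolate: 3 items.
my @list = qw/$word red green/;

# m// interpolates $word into the pattern before matching.
my $found = ($text =~ m/$word/) ? "yes" : "no";

# s/// interpolates $word into the pattern; work on a copy of $text.
(my $changed = $text) =~ s/$word/dog/;

# y/// translates character by character; no variables are interpolated.
(my $upper = $text) =~ y/a-z/A-Z/;

print "literal: $literal\n";
print "interp:  $interp";
print "list has ", scalar(@list), " items\n";
print "found:   $found\n";
print "changed: $changed\n";
print "upper:   $upper\n";
```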

Besides the slash, Perl also allows us to use these delimiters, in pairs:

  • Parentheses ( )
  • Braces { }
  • Brackets [ ]
  • Angle brackets < >
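A short sketch of the paired delimiters in use. The strings and variable names are invented for illustration; the point is that quotes inside the text no longer need any special treatment.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = "Perl";   # hypothetical variable for this sketch

# Braces with q: single and double quotes inside need no escaping.
my $quote = q{It's a "quoted" string};

# Parentheses with qq: double quotes inside, and $name is still interpolated.
my $greeting = qq(She said "Hello, $name!");

# Brackets with qw: a list of three words.
my @colors = qw[red green blue];

# Angle brackets with q.
my $angle = q<plain text in angle brackets>;

# Braces with s///: handy when the pattern itself contains slashes.
(my $path = "/usr/bin/perl") =~ s{/usr/bin}{/usr/local/bin};

print "$quote\n";
print "$greeting\n";
print "@colors\n";
print "$angle\n";
print "$path\n";
```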

Most strings in Perl are enclosed in double-quotes ("this is a string") but may also be enclosed in single quotes. There are exceptions and special cases, but in these pages we'll follow the status quo. If you're the curious kind, look up 'barewords' on a Perl site.

To begin, let's create a string variable to work with. In your text editor, with a new file opened, start with our 3-line Perl startup:

#!/usr/bin/perl
use strict;
use warnings;

Save the file now as 'strings1.pl' in the same directory as your 'arrays1.pl' and 'arrays2.pl' files.

This may be a good time to explain those 2nd and 3rd lines.

'use strict;' puts Perl into a more vigilant state when you run scripts. It checks several things for proper syntax and basically tightens up the rules - no slackers here. This means you have to be more explicit about some things. In our examples it affects how we declare new variables.

'use warnings;' alerts us when a variable name is used only once, or when we've declared the same variable twice in one block of code, plus several other things.

You will notice that each time we introduce a NEW variable, we use 'my' in front of it.

This helps in many ways, but most notably it catches spelling mistakes (yes programmers make spelling mistakes).

For example if we declare a new variable '$string1', then later in our script try to print '$sting1', Perl would stop with an error message. That saves a lot of head-scratching, believe me.

If you had not included both lines, and tried the same thing, Perl would gladly run with no errors. But your output would probably not be what you expected.
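The difference can be seen side by side in the sketch below. One assumption to flag: it wraps each attempt in a string 'eval', a Perl function not covered in these pages, purely so that one script can catch the error and carry on. In a normal script you would simply see the error message and the script would stop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# String eval is used here only so this sketch can catch the compile error
# and keep running; it is not part of the tutorial's own scripts.

# With strict: the misspelled $sting1 was never declared, so Perl refuses.
my $strict_result = eval q{
    use strict;
    my $string1 = "hello";
    "[$sting1]";
};
my $error = $@;

# Without strict or warnings: the typo silently becomes an empty value.
my $loose_result = eval q{
    no strict;
    no warnings;
    my $string1 = "hello";
    "[$sting1]";
};

print "With strict, Perl stopped: $error";
print "Without strict, we got: $loose_result\n";
```

The second result is an empty pair of brackets: no error, but not the 'hello' we wanted either.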

So, to be safe and sound, make sure ALL your scripts have those 2 lines at the beginning.

OK, rant over.

Now add the following lines:

my $string1 = "This a string to play with in Perl.";
print "$string1\n";

Yes, there is a minor typo in that string but we can fix it.

We can use a function called substr (meaning substring).

This function lets us pull out specific parts of a string, but it also lets us replace part of a string with another string.

For now, let's print the word 'string'.

print substr($string1,7,6);
print "\n";

This is almost self-explanatory (Perl is like that). The arguments to the 'substr' function are:

  1. the string to work on
  2. the offset - where to start getting the substring (remember, counting starts at 0)
  3. the length - how many characters to grab

So in our example, we print 6 characters from '$string1', starting at position 7. Which is exactly what we did.
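A few more calls make the offset and length arguments concrete. This sketch reuses $string1 from above; the other variable names are invented here. The last call leaves out the length, which 'substr' also allows: it then returns everything from the offset to the end of the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string1 = "This a string to play with in Perl.";

# Offset 7, length 6: the word 'string'.
my $word = substr($string1, 7, 6);

# Offset 0, length 4: the first four characters.
my $first = substr($string1, 0, 4);

# Offset 30, no length: everything from position 30 to the end.
my $rest = substr($string1, 30);

print "$word\n";
print "$first\n";
print "$rest\n";
```

None of these calls changes $string1; they only read from it.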

Now to fix our little typo above. We need to add 'is' between 'This' and 'a'.

The 'substr' function will actually take 4 arguments. The last is the string we want to put in place of the part we selected.

substr($string1,5,2,"is a ");
print "\n$string1\n";

Cool!

We're starting at position 5 of $string1 (the 'a'), and replacing 2 characters (the 'a' and the space after it) with the string 'is a '.
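Two details are worth a quick sketch. The 4-argument call also returns the text it removed, and the same replacement can be written by assigning to a 3-argument 'substr'. The names $removed and $copy are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string1 = "This a string to play with in Perl.";
my $copy    = $string1;   # second copy, for the assignment form below

# 4-argument form: replaces 2 characters at offset 5 and returns
# the text that was there before.
my $removed = substr($string1, 5, 2, "is a ");

# Assignment form: the same replacement, written as an assignment.
substr($copy, 5, 2) = "is a ";

print "Removed: '$removed'\n";
print "$string1\n";
print "$copy\n";
```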

Let's try another one - adding something to the end of a string. Some new code to add:

my $string2="01234567";
print "\n$string2\n";
substr($string2,8,2," is a string too!");
print "$string2\n\n";

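Offset 8 is just past the last character of '$string2', so nothing is actually replaced and the new text is simply added on the end. When called with 4 arguments, the 'substr' function changes the original string - '$string2' was changed in this code.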

You might want to experiment with this to get a good understanding of how it works. For example, how to add something to the beginning of a string?
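If you get stuck on that one, here is one possible approach, offered as a sketch and not the only answer: a length of 0 replaces nothing, so the new text is inserted at the offset. The inserted strings are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string2 = "01234567";

# Offset 0, length 0: nothing is removed, the text goes in at the front.
substr($string2, 0, 0, "Digits: ");
print "$string2\n";

# The same trick at the far end: offset = current length of the string.
substr($string2, length($string2), 0, " (done)");
print "$string2\n";
```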

In our next refreshment we talk more about string theory - no, not that one - the Perl string theory.