Whether it's named directly or indirectly, and whether it's in a variable, or an array element, or is just a temporary value, a scalar always contains a single value. This value may be a number, a string, or a reference to another piece of data. Or, there might even be no value at all, in which case the scalar is said to be undefined. Although we might speak of a scalar as "containing" a number or a string, scalars are typeless: you are not required to declare your scalars to be of type integer or floating-point or string or whatever.[9]
[9] Future versions of Perl will allow you to insert int, num, and str type declarations, not to enforce strong typing, but only to give the optimizer hints about things that it might not figure out for itself. Generally, you'd only consider doing this in tight code that must run very fast, so we're not going to tell you how to do it yet. Optional types are also used by the pseudohash mechanism, in which case they can function as types do in a more strongly typed language. See Chapter 8, "References" for more.
Perl stores strings as sequences of characters, with no arbitrary constraints on length or content. In human terms, you don't have to decide in advance how long your strings are going to get, and you can include any characters including null bytes within your string. Perl stores numbers as signed integers if possible, or as double-precision floating-point values in the machine's native format otherwise. Floating-point values are not infinitely precise. This is important to remember because comparisons like (10/3 == 1/3*10) tend to fail mysteriously.
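One common way around the exact-equality trap is to compare within a tolerance. The following is only a sketch: `nearly_equal` and its default tolerance of 1e-9 are hypothetical choices made for illustration, not anything built into Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $x and $y differ by no more than $tol,
# scaled by the larger magnitude (with a minimum scale of 1).
sub nearly_equal {
    my ($x, $y, $tol) = @_;
    $tol = 1e-9 unless defined $tol;
    my $scale = abs($x) > abs($y) ? abs($x) : abs($y);
    $scale = 1 if $scale < 1;
    return abs($x - $y) <= $tol * $scale;
}

my $exact = (10/3 == 1/3*10);             # may well be false with native doubles
my $close = nearly_equal(10/3, 1/3*10);   # true

print "exact comparison: ", ($exact ? "equal" : "not equal"), "\n";
print "with tolerance:   ", ($close ? "equal" : "not equal"), "\n";
```

The right tolerance depends on the magnitude and provenance of your data, so treat the default here as a placeholder.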
Perl converts between the various subtypes as needed, so you can treat a number as a string or a string as a number, and Perl will do the Right Thing. To convert from string to number, Perl internally uses something like the C library's atof(3) function. To convert from number to string, it does the equivalent of an sprintf(3) with a format of "%.14g" on most machines. Improper conversions of a nonnumeric string like foo to a number count as numeric 0; these trigger warnings if you have them enabled, but are silent otherwise. See Chapter 5, "Pattern Matching", for examples of detecting what sort of data a string holds.
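A short sketch of these conversions in action; the variable names are illustrative, and the `no warnings 'numeric'` block is there only to silence the warnings that the improper conversions would otherwise trigger.

```perl
use strict;
use warnings;

my $sum  = "10" + 5;          # string used as a number: 15
my $text = 3.5 . " liters";   # number used as a string: "3.5 liters"

my ($partial, $none);
{
    no warnings 'numeric';        # these two would warn otherwise
    $partial = "3 apples" + 2;    # the leading digits are used: 5
    $none    = "foo" + 0;         # nothing numeric at all: 0
}

print "$sum, $text, $partial, $none\n";   # 15, 3.5 liters, 5, 0
```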
Although strings and numbers are interchangeable for nearly all intents, references are a bit different. They're strongly typed, uncastable pointers with built-in reference-counting and destructor invocation. That is, you can use them to create complex data types, including user-defined objects. But they're still scalars, for all that, because no matter how complicated a data structure gets, you often want to treat it as a single value.
By uncastable, we mean that you can't, for instance, convert a reference to an array into a reference to a hash. References are not castable to other pointer types. However, if you use a reference as a number or a string, you will get a numeric or string value, which is guaranteed to retain the uniqueness of the reference even though the "referenceness" of the value is lost when the value is copied from the real reference. You can compare such values or extract their type. But you can't do much else with the values, since there's no way to convert numbers or strings back into references. Usually, this is not a problem, because Perl doesn't force you to do pointer arithmetic--or even allow it. See Chapter 8, "References" for more on references.
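The behavior of references used as strings or numbers can be sketched as follows. The variable names are made up for illustration, and the address shown in the comment is only an example of the form such a string takes.

```perl
use strict;
use warnings;

my @list  = (1, 2, 3);
my $ref_a = \@list;
my $ref_b = \@list;       # a second reference to the same array
my $other = [1, 2, 3];    # a different array with the same contents

my $as_string = "$ref_a";               # something like "ARRAY(0x55d0c8a1e2b8)"
my $same      = ($ref_a == $ref_b);     # true: both point at @list
my $different = ($ref_a == $other);     # false: distinct arrays
my $type      = ref($ref_a);            # "ARRAY"

# The string copy keeps the uniqueness but not the "referenceness":
my $still_ref = ref($as_string);        # empty string, so not a reference

print "$as_string has type $type\n";
```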
Numeric literals are specified in any of several customary[10] floating-point or integer formats:
    $x = 12345;            # integer
    $x = 12345.67;         # floating point
    $x = 6.02e23;          # scientific notation
    $x = 4_294_967_296;    # underline for legibility
    $x = 0377;             # octal
    $x = 0xffff;           # hexadecimal
    $x = 0b1100_0000;      # binary
[10]Customary in Unix culture, that is. If you're from a different culture, welcome to ours!
Because Perl uses the comma as a list separator, you cannot use it to separate the thousands in a large number. Perl does allow you to use an underscore character instead. The underscore only works within literal numbers specified in your program, not for strings functioning as numbers or data read from somewhere else. Similarly, the leading 0x for hexadecimal, 0b for binary, and 0 for octal work only for literals. The automatic conversion of a string to a number does not recognize these prefixes--you must do an explicit conversion[11] with the oct function--which works for hex and binary numbers, too, as it happens, provided you supply the 0x or 0b on the front.
[11]Sometimes people think Perl should convert all incoming data for them. But there are far too many decimal numbers with leading zeros in the world to make Perl do this automatically. For example, the Zip Code for the O'Reilly & Associates office in Cambridge, MA, is 02140. The postmaster would get confused if your mailing label program turned 02140 into 1120 decimal.
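A brief sketch of the difference between automatic conversion and an explicit `oct` call, using the Zip Code from the footnote. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $hex    = oct("0xffff");        # 65535
my $binary = oct("0b11000000");    # 192
my $octal  = oct("0377");          # 255

my $zip        = "02140";
my $as_decimal = $zip + 0;         # 2140: the leading zero is just a digit
my $as_octal   = oct($zip);        # 1120: octal only because we asked for it

my $no_prefix;
{
    no warnings 'numeric';         # "0xffff" is not a valid decimal number
    $no_prefix = "0xffff" + 0;     # 0: the prefix is not recognized
}

print "$hex $binary $octal $as_decimal $as_octal $no_prefix\n";
```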
String literals are usually surrounded by either single or double quotes. They work much like Unix shell quotes: double-quoted string literals are subject to backslash and variable interpolation, but single-quoted strings are not (except for \' and \\, so that you can embed single quotes and backslashes into single-quoted strings). If you want to embed any other backslash sequences such as \n (newline), you must use the double-quoted form. (Backslash sequences are also known as escape sequences, because you "escape" the normal interpretation of characters temporarily.)
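The contrast between the two quoting styles can be sketched like this; `$name` and the other variables are illustrative.

```perl
use strict;
use warnings;

my $name = "camel";

my $single = 'Hello, $name\n';           # nothing interpolated: literal $name and \n
my $double = "Hello, $name\n";           # $name and \n are both interpreted
my $quoted = 'It\'s a backslash: \\';    # \' and \\ are the only escapes here

print length($single), " characters, single-quoted\n";   # 14
print length($double), " characters, double-quoted\n";   # 13
print "$quoted\n";
```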
A single-quoted string must be separated from a preceding word by a space because a single quote is a valid--though archaic--character in an identifier. Its use has been replaced by the more visually distinct :: sequence. That means that $main'var and $main::var are the same thing, but the second is generally considered easier to read for people and programs.
Double-quoted strings are subject to various forms of character interpolation, many of which will be familiar to programmers of other languages. These are listed in Table 2-1.
The \N{NAME} notation is usable only in conjunction with the use charnames pragma described in Chapter 31, "Pragmatic Modules". This allows you to specify character names symbolically, as in \N{GREEK SMALL LETTER SIGMA}, \N{greek:Sigma}, or \N{sigma}--depending on how you call the pragma. See also Chapter 15, "Unicode".
There are also escape sequences to modify the case or "meta-ness" of subsequent characters. See Table 2-2.
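A brief sketch of a few of these escapes, namely `\u`, `\U`, `\L`, `\Q`, and the terminating `\E`. The variable names are illustrative.

```perl
use strict;
use warnings;

my $word = "camel";

my $title = "\u$word";               # "Camel": uppercase the next character only
my $shout = "\U$word\E train";       # "CAMEL train": uppercase until \E
my $quiet = "\LLOUD\E and clear";    # "loud and clear": lowercase until \E
my $safe  = "\Qa.b*c\E";             # "a\.b\*c": backslash the metacharacters

print "$title, $shout, $quiet, $safe\n";
```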
You may also embed newlines directly in your strings; that is, they can begin and end on different lines. This is often useful, but it also means that if you forget a trailing quote, the error will not be reported until Perl finds another line containing the quote character, which may be much further on in the script. Fortunately, this usually causes an immediate syntax error on the same line, and Perl is then smart enough to warn you that you might have a runaway string where it thought the string started.
Besides the backslash escapes listed above, double-quoted strings are subject to variable interpolation of scalar and list values. This means that you can insert the values of certain variables directly into a string literal. It's really just a handy form of string concatenation.[12] Variable interpolation may be done for scalar variables, entire arrays (but not hashes), single elements from an array or hash, or slices (multiple subscripts) of an array or hash. Nothing else interpolates. In other words, you may only interpolate expressions that begin with $ or @, because those are the two characters (along with backslash) that the string parser looks for. Inside strings, a literal @ that is not part of an array or slice identifier but is followed by an alphanumeric character must be escaped with a backslash (\@), or else a compilation error will result. Although a complete hash specified with a % may not be interpolated into the string, single hash values or hash slices are okay, because they begin with $ and @ respectively.
[12]With warnings enabled, Perl may report undefined values interpolated into strings as using the concatenation or join operations, even though you don't actually use those operators there. The compiler created them for you anyway.
The following code segment prints out "The price is $100.":
    $Price = '$100';                    # not interpolated
    print "The price is $Price.\n";     # interpolated
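The forms of variable interpolation described above can be sketched as follows. The data is made up for illustration, and the results in the comments assume the default list separator (a single space).

```perl
use strict;
use warnings;

my @pets = ('cat', 'dog', 'emu');
my %age  = (cat => 3, dog => 7, emu => 12);

my $all     = "Pets: @pets";                # "Pets: cat dog emu"
my $one     = "First pet: $pets[0]";        # "First pet: cat"
my $slice   = "Last two: @pets[1,2]";       # "Last two: dog emu"
my $value   = "The dog is $age{dog}";       # "The dog is 7"
my $hslice  = "Ages: @age{'cat','emu'}";    # "Ages: 3 12"
my $whole   = "Hash: %age";                 # "Hash: %age" (hashes do not interpolate)
my $address = "mail fred\@example.com";     # a literal @ needs its backslash

print "$_\n" for $all, $one, $slice, $value, $hslice, $whole, $address;
```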
As in some shells, you can put braces around the identifier to distinguish it from following alphanumerics: "How ${verb}able!". An identifier within such braces is forced to be a string, as is any single identifier within a hash subscript. For example:
    $days{'Feb'}
can be written as:
    $days{Feb}
and the quotes will be assumed. Anything more complicated in the subscript is interpreted as an expression, and then you'd have to put in the quotes:
    $days{'February 29th'}   # Ok.
    $days{"February 29th"}   # Also ok. "" doesn't have to interpolate.
    $days{ February 29th }   # WRONG, produces parse error.
In particular, you should always use quotes in slices such as:
    @days{'Jan','Feb'}       # Ok.
    @days{"Jan","Feb"}       # Also ok.
    @days{ Jan, Feb }        # Kinda wrong (breaks under use strict)
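A small sketch of the brace and subscript rules, run under `use strict`; the contents of `%days` are invented for the example.

```perl
use strict;
use warnings;

my $verb = "read";
my %days = (Feb => 28, 'February 29th' => 'leap day');

my $braced = "How ${verb}able!";     # braces end the name: "How readable!"
my $bare   = $days{Feb};             # a single identifier is quoted for you
my $quoted = $days{'Feb'};           # exactly the same element
my @pair   = @days{'Feb', 'February 29th'};   # slices: quote every key

print "$braced $bare $quoted @pair\n";
```

Without the braces, `"How $verbable!"` would look for a variable named `$verbable`, which does not compile under `use strict` unless such a variable has been declared.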
Apart from the subscripts of interpolated array and hash variables, there are no multiple levels of interpolation. Contrary to the expectations of shell programmers, backticks do not interpolate within double quotes, nor do single quotes impede evaluation of variables when used within double quotes. Interpolation is extremely powerful but strictly controlled in Perl. It happens only inside double quotes, and in certain other "double-quotish" operations that we'll describe in the next section:
    print "\n";     # Ok, print a newline.
    print \n ;      # WRONG, no interpolative context.
Although we usually think of quotes as literal values, in Perl they function more like operators, providing various kinds of interpolating and pattern-matching capabilities. Perl provides the customary quote characters for these behaviors, but also provides a more general way for you to choose your quote character for any of them. In Table 2-3, any nonalphanumeric, nonwhitespace delimiter may be used in place of /. (The newline and space characters are no longer allowed as delimiters, although ancient versions of Perl once allowed this.)
Customary | Generic | Meaning | Interpolates |
---|---|---|---|
'' | q// | Literal string | No |
"" | qq// | Literal string | Yes |
`` | qx// | Command execution | Yes |
() | qw// | Word list | No |
// | m// | Pattern match | Yes |
s/// | s/// | Pattern substitution | Yes |
y/// | tr/// | Character translation | No |
(none) | qr// | Regular expression | Yes |
Some of these are simply forms of "syntactic sugar" to let you avoid putting too many backslashes into quoted strings, particularly into pattern matches where your regular slashes and backslashes tend to get all tangled.
If you choose single quotes for delimiters, no variable interpolation is done even on those forms that ordinarily interpolate. If the opening delimiter is an opening parenthesis, bracket, brace, or angle bracket, the closing delimiter will be the corresponding closing character. (Embedded occurrences of the delimiters must match in pairs.) Examples:
    $single = q!I said, "You said, 'She said it.'"!;
    $double = qq(Can't we get some "good" $variable?);
    $chunk_of_code = q {
        if ($condition) {
            print "Gotcha!";
        }
    };
The last example demonstrates that you can use whitespace between the quote specifier and its initial bracketing character. For two-element constructs like s/// and tr///, if the first pair of quotes is a bracketing pair, the second part gets its own starting quote character. In fact, the second pair needn't be the same as the first pair. So you can write things like s<foo>(bar) or tr(a-f)[A-F]. Because whitespace is also allowed between the two inner quote characters, you could even write that last one as:
    tr (a-f) [A-F];
Whitespace is not allowed, however, when # is being used as the quoting character. q#foo# is parsed as the string 'foo', while q #foo# is parsed as the quote operator q followed by a comment. Its delimiter will be taken from the next line. Comments can also be placed in the middle of two-element constructs, which allows you to write:
    s {foo}    # Replace foo
      {bar};   # with bar.
    tr [a-f]   # Transliterate lowercase hex
       [A-F];  # to uppercase hex
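The delimiter rules can be tried out with a sketch like the one below. The sample strings and variable names are invented for illustration.

```perl
use strict;
use warnings;

my $variable = "coffee";

my $single = q!I said, "You said, 'She said it.'"!;      # no interpolation
my $double = qq(Can't we get some "good" $variable?);    # interpolates $variable
my @words  = qw(Mon Tue Wed);                            # list of three strings
my $nested = q{outer {inner} outer};                     # nested braces balance

(my $moved = "/usr/local/bin") =~ s{/usr/local}{/opt};   # no backslashed slashes
(my $hex   = "deadbeef")       =~ tr(a-f)[A-F];          # a different pair per half

print "$single\n$double\n@words\n$nested\n$moved\n$hex\n";
```

Choosing braces for the substitution avoids having to backslash every slash in the path, which is the main practical payoff of the generic forms.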
A name that has no other interpretation in the grammar will be treated as if it were a quoted string. These are known as barewords.[13] As with filehandles and labels, a bareword that consists entirely of lowercase letters risks conflict with future reserved words. If you have warnings enabled, Perl will warn you about barewords. For example:
    @days = (Mon,Tue,Wed,Thu,Fri);
    print STDOUT hello, ' ', world, "\n";
[13] Variable names, filehandles, labels, and the like are not considered barewords because they have a meaning forced by a preceding token or a following token (or both). Predeclared names such as subroutines aren't barewords either. It's only a bareword when the parser has no clue.
sets the array @days to the short form of the weekdays and prints "hello world" followed by a newline on STDOUT. If you leave the filehandle out, Perl tries to interpret hello as a filehandle, resulting in a syntax error. Because this is so error-prone, some people may wish to avoid barewords entirely. The quoting operators listed earlier provide many convenient forms, including the qw// "quote words" construct which nicely quotes a list of space-separated words:
    @days = qw(Mon Tue Wed Thu Fri);
    print STDOUT "hello world\n";
You can go as far as to outlaw barewords entirely. If you say:
    use strict 'subs';
then any bareword will produce a compile-time error. The restriction lasts through the end of the enclosing scope. An inner scope may countermand this by saying:
    no strict 'subs';
Note that the bare identifiers in constructs like:
    "${verb}able"
    $days{Feb}
are not considered barewords since they're allowed by explicit rule rather than by having "no other interpretation in the grammar".
An unquoted name with a trailing double colon, such as main:: or Dog::, is always treated as the package name. Perl turns the would-be bareword Camel:: into the string "Camel" at compile time, so this usage is not subject to rebuke by use strict.
Array variables are interpolated into double-quoted strings by joining all elements of the array with the separator specified in the $" variable[14] (which contains a space by default). The following are equivalent:
    $temp = join( $", @ARGV );
    print $temp;
    print "@ARGV";
Within search patterns, which also undergo double-quotish interpolation, there is an unfortunate ambiguity: is /$foo[bar]/ to be interpreted as /${foo}[bar]/ (where [bar] is a character class for the regular expression) or as /${foo[bar]}/ (where [bar] is the subscript to array @foo)? If @foo doesn't otherwise exist, it's obviously a character class. If @foo exists, Perl takes a good guess about [bar], and is almost always right.[15] If it does guess wrong, or if you're just plain paranoid, you can force the correct interpretation with braces as shown earlier. Even if you're merely prudent, it's probably not a bad idea.
[14]$LIST_SEPARATOR if you use the English module bundled with Perl.
[15]The guesser is too boring to describe in full, but basically takes a weighted average of all the things that look like character classes (a-z, \w, initial ^) versus things that look like expressions (variables or reserved words).
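The role of the list separator can be seen in a short sketch. The array contents are made up, and `local` is used so that the change to `$"` is confined to one block.

```perl
use strict;
use warnings;

my @args = ('alpha', 'beta', 'gamma');

my $default = "@args";             # "alpha beta gamma"
my $joined  = join($", @args);     # the same string, built by hand

my $custom;
{
    local $" = ', ';               # change the separator for this block only
    $custom = "@args";             # "alpha, beta, gamma"
}

print "$default\n$joined\n$custom\n";
```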
A line-oriented form of quoting is based on the Unix shell's here-document syntax. It's line-oriented in the sense that the delimiters are lines rather than characters. The starting delimiter is the current line, and the terminating delimiter is a line consisting of the string you specify. Following a <<, you specify the string to terminate the quoted material, and all lines following the current line down to but not including the terminating line are part of the string. The terminating string may be either an identifier (a word) or some quoted text. If quoted, the type of quote determines the treatment of the text, just as it does in regular quoting. An unquoted identifier works as though it were in double quotes. A backslashed identifier works as though it were in single quotes (for compatibility with shell syntax). There must be no space between the << and an unquoted identifier, although whitespace is permitted if you specify a quoted string instead of the bare identifier. (If you insert a space, it will be treated as a null identifier, which is valid but deprecated, and matches the first blank line--see the first Hurrah! example below.) The terminating string must appear by itself, unquoted and with no extra whitespace on either side, on the terminating line.
    print <<EOF;    # same as earlier example
    The price is $Price.
    EOF

    print <<"EOF";  # same as above, with explicit quotes
    The price is $Price.
    EOF

    print <<'EOF';    # single-quoted quote
    All things (e.g. a camel's journey through
    A needle's eye) are possible, it's true.
    But picture how the camel feels, squeezed out
    In one long bloody thread, from tail to snout.
                                    -- C.S. Lewis
    EOF

    print << x 10;    # print next line 10 times
    The camels are coming! Hurrah! Hurrah!

    print <<"" x 10;  # the preferred way to write that
    The camels are coming! Hurrah! Hurrah!

    print <<`EOC`;    # execute commands
    echo hi there
    echo lo there
    EOC

    print <<"dromedary", <<"camelid"; # you can stack them
    I said bactrian.
    dromedary
    She said llama.
    camelid

    funkshun(<<"THIS", 23, <<'THAT');  # doesn't matter if they're in parens
    Here's a line or two.
    THIS
    And here's another.
    THAT
Just don't forget that you have to put a semicolon on the end to finish the statement, because Perl doesn't know you're not going to try to do this:
    print <<'odd'
    2345
    odd
        + 10000;   # prints 12345
If you want your here docs to be indented with the rest of the code, you'll need to remove leading whitespace from each line manually:
    ($quote = <<'QUOTE') =~ s/^\s+//gm;
        The Road goes ever on and on,
        down from the door where it began.
    QUOTE
You could even populate an array with the lines of a here document as follows:
    @sauces = <<End_Lines =~ m/(\S.*\S)/g;
        normal tomato
        spicy tomato
        green chile
        pesto
        white wine
    End_Lines
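To see what these constructs actually produce, here is a sketch that captures here documents in variables instead of printing them. It reuses the `$Price`, `$quote`, and `@sauces` examples from the text, with `my` declarations added so that it runs under `use strict`; the other variable names are illustrative.

```perl
use strict;
use warnings;

my $Price = '$100';

my $interpolated = <<"EOF";     # double-quoted: $Price is expanded
The price is $Price.
EOF

my $literal = <<'EOF';          # single-quoted: the text is taken as is
The price is $Price.
EOF

# Indented body: strip the leading whitespace from every line afterward.
(my $quote = <<'QUOTE') =~ s/^\s+//gm;
    The Road goes ever on and on,
    down from the door where it began.
QUOTE

# One array element per nonblank line, with surrounding whitespace trimmed.
my @sauces = <<End_Lines =~ m/(\S.*\S)/g;
    normal tomato
    spicy tomato
    green chile
End_Lines

print $interpolated, $literal, $quote;
print scalar(@sauces), " sauces, starting with '$sauces[0]'\n";
```

Note that the terminating lines still sit at the left margin, since the terminator must appear with no extra whitespace on either side.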
A literal that begins with a v and is followed by one or more dot-separated integers is treated as a string literal composed of characters with the specified ordinal values:
    $crlf = v13.10;   # ASCII carriage return, line feed
These are called v-strings, short for "vector strings" or "version strings" or anything else you can think of that starts with "v" and deals with lists of integers. They provide an alternate and more legible way to construct strings when you want to specify the numeric values of each character. Thus, v1.20.300.4000 is a more winsome way to produce the same string value as any of:
    "\x{1}\x{14}\x{12c}\x{fa0}"
    pack("U*", 1, 20, 300, 4000)
    chr(1) . chr(20) . chr(300) . chr(4000)
If such a literal has two or more dots (three or more integers), the leading v may be omitted.
    print v9786;              # prints UTF-8 encoded SMILEY, "\x{263a}"
    print v102.111.111;       # prints "foo"
    print 102.111.111;        # same thing
    use 5.6.0;                # require a particular Perl version (or later)
    $ipaddr = 204.148.40.9;   # the IPv4 address of oreilly.com
V-strings are useful for representing IP addresses and version numbers. In particular, since characters can have an ordinal value larger than 255 these days, v-strings provide a way to represent version numbers of any size that can be correctly compared with a simple string comparison.
Version numbers and IP addresses stored in v-strings are not human readable, since the individual integers are stored as arbitrary characters. To produce something legible, use the v flag in a printf mask, like "%vd", as described under sprintf in Chapter 29, "Functions". For more on Unicode strings, see Chapter 15, "Unicode" and the use bytes pragma in Chapter 31, "Pragmatic Modules"; for comparing version strings using string comparison operators, see $^V in Chapter 28, "Special Names"; and for representing IPv4 addresses, see gethostbyaddr in Chapter 29, "Functions".
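A short sketch of the "%vd" format and of comparing v-strings as strings. The version numbers are invented for illustration; the address is the one used in the text.

```perl
use strict;
use warnings;

my $version  = v1.22.333;
my $readable = sprintf("%vd", $version);     # "1.22.333"

my $ipaddr = v204.148.40.9;
my $dotted = sprintf("%vd", $ipaddr);        # "204.148.40.9"

my $as_vstring = (v5.10.0 gt v5.6.0);        # true: chr(10) sorts after chr(6)
my $as_text    = ("5.10.0" gt "5.6.0");      # false: "1" sorts before "6"

my $spelled = (v102.111.111 eq "foo");       # true: ordinals 102, 111, 111

print "$readable $dotted\n";
```

The pair of comparisons shows why the v-string form is useful for versions: plain dotted text compares character by character, which puts 5.10.0 before 5.6.0.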
You should consider any identifier that both begins and ends with a double underscore to be reserved for special syntactic use by Perl. Two such special literals are __LINE__ and __FILE__, which represent the current line number and filename at that point in your program. They may only be used as separate tokens; they will not be interpolated into strings. Likewise, __PACKAGE__ is the name of the package the current code is being compiled into. If there is no current package (due to an empty package; directive), __PACKAGE__ is the undefined value. The token __END__ (or alternatively, a Control-D or Control-Z character) may be used to indicate the logical end of the script before the real end-of-file. Any following text is ignored, but may be read via the DATA filehandle.
The __DATA__ token functions similarly to the __END__ token, but opens the DATA filehandle within the current package's namespace, so that files you require can each have their own DATA filehandles open simultaneously. For more information, see DATA in Chapter 28, "Special Names".
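The special literals can be sketched as below. Because they are separate tokens, they must be concatenated rather than written inside a quoted string; the variable names are illustrative.

```perl
use strict;
use warnings;

my $line    = __LINE__;       # line number of this statement
my $file    = __FILE__;       # name of the file being compiled
my $package = __PACKAGE__;    # "main" unless a package statement says otherwise

my $not_interpolated = "at line __LINE__";     # stays as literal text
my $concatenated     = "at line " . __LINE__;  # the token is evaluated

print "$file, line $line, package $package\n";
print "$not_interpolated\n$concatenated\n";
```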
Copyright © 2001 O'Reilly & Associates. All rights reserved.