class String
Overview
A String represents an immutable sequence of UTF-8 characters.
A String is typically created with a string literal, enclosing UTF-8 characters in double quotes:
"hello world"
A backslash can be used to denote some characters inside the string:
"\"" # double quote
"\\" # backslash
"\e" # escape
"\f" # form feed
"\n" # newline
"\r" # carriage return
"\t" # tab
"\v" # vertical tab
You can use a backslash followed by a u and four hexadecimal digits to denote a Unicode code point:
"\u0041" # == "A"
Or you can use curly braces and specify up to six hexadecimal digits (0 to 10FFFF):
"\u{41}" # == "A"
A string can span multiple lines:
"hello
 world" # same as "hello\n world"
Note that in the above example trailing and leading spaces, as well as newlines, end up in the resulting string. To avoid this, you can split a string into multiple lines by joining multiple literals with a backslash:
"hello " \
"world, " \
"no newlines" # same as "hello world, no newlines"
Alternatively, a backslash followed by a newline can be inserted inside the string literal:
"hello \
world, \
no newlines" # same as "hello world, no newlines"
In this case, leading whitespace is not included in the resulting string.
If you need to write a string that has many double quotes, parentheses, or similar characters, you can use alternative literals:
# Supports double quotes and nested parentheses
%(hello ("world")) # same as "hello (\"world\")"
# Supports double quotes and nested brackets
%[hello ["world"]] # same as "hello [\"world\"]"
# Supports double quotes and nested curlies
%{hello {"world"}} # same as "hello {\"world\"}"
# Supports double quotes and nested angles
%<hello <"world">> # same as "hello <\"world\">"
To create a String with embedded expressions, you can use string interpolation:
a = 1
b = 2
"sum = #{a + b}" # "sum = 3"
This ends up invoking Object#to_s(IO) on each expression enclosed by #{...}.
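As a small illustration of the point above, a type that defines to_s(io) controls how it appears inside an interpolated string. The Point struct below is hypothetical and exists only for this sketch:

```crystal
# Hypothetical type used only to illustrate interpolation.
struct Point
  def initialize(@x : Int32, @y : Int32)
  end

  # Interpolation writes the value through this method.
  def to_s(io : IO) : Nil
    io << "(" << @x << ", " << @y << ")"
  end
end

point = Point.new(1, 2)
message = "point = #{point}"
puts message # prints "point = (1, 2)"
```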
If you need to dynamically build a string, use String.build or IO::Memory.
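A minimal sketch of both approaches, assuming a small list of words to join (the variable names are illustrative only):

```crystal
words = ["alpha", "beta", "gamma"]

# String.build yields an IO and returns everything written to it as a String.
built = String.build do |io|
  words.each_with_index do |word, index|
    io << ", " if index > 0
    io << word
  end
end
puts built # prints "alpha, beta, gamma"

# IO::Memory is an in-memory IO; to_s returns its contents as a String.
memory = IO::Memory.new
memory << "sum = " << 1 + 2
from_memory = memory.to_s
puts from_memory # prints "sum = 3"
```

String.build is usually the more direct choice when the only goal is the resulting string; IO::Memory is useful when the code already works against an IO.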
Non UTF-8 valid strings
A String might end up being composed of bytes that form an invalid byte sequence according to UTF-8. This can happen if the string is created via one of the constructors that accept bytes, or when getting a string from String.build or IO::Memory. No exception will be raised, but invalid byte sequences, when read as chars, will be replaced by the Unicode replacement char (value 0xFFFD). For example:
# here 255 is not a valid byte value in the UTF-8 encoding
string = String.new(Bytes[255, 97])
string.valid_encoding? # => false
# The first char here is the unicode replacement char
string.chars # => ['�', 'a']
You can also create strings containing specific byte values by using octal and hexadecimal escape sequences:
# Octal escape sequences
"\101" # => "A"
"\12" # => "\n"
"\1" # string with one character with code point 1
"\377" # string with one byte with value 255
# Hexadecimal escape sequences
"\x41" # => "A"
"\xFF" # string with one byte with value 255
The reason for allowing strings that don't have a valid UTF-8 sequence is that the world is full of content that isn't properly encoded, and having a program raise an exception or stop because of this is not good. It's better if programs are more resilient, but show a replacement character when there's an error in incoming data.
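As a sketch of how a program might handle such input, the example below checks the encoding of incoming bytes and uses String#scrub to obtain a valid string in which invalid sequences are replaced by the replacement character. The byte values are made up for illustration:

```crystal
# Two valid ASCII bytes ("hi") followed by 255, which is never valid in UTF-8.
raw = String.new(Bytes[104, 105, 255])
puts raw.valid_encoding? # prints false

# scrub returns a new string with invalid byte sequences replaced by U+FFFD.
clean = raw.scrub
puts clean.valid_encoding? # prints true
puts clean.size            # prints 3
```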
Included Modules
- Comparable(String)
Defined in:
crystal_on_steroids/string/remove.cr
crystal_on_steroids/string/squish.cr
crystal_on_steroids/string/truncation.cr
Instance Method Summary
- #remove(*patterns)
  Returns a copy of the string with all occurrences of the patterns removed.
- #squish
  Removes leading and trailing whitespace and collapses each internal run of whitespace into a single space.
- #truncate(truncate_at, options = {} of Symbol => String)
  Truncates a given text after a given size if text is longer than size.
- #truncate_words(words_count, options = {} of Symbol => String)
  Truncates a given text after a given number of words (words_count).
Instance methods inherited from class Object
in?(another_object), presence, presence_in(another_object), present?, to_param, to_query(namespace), to_query
Class methods inherited from class Object
❨╯°□°❩╯︵┻━┻
Instance Method Detail
Returns a copy of the string with all occurrences of the patterns removed. Because a String is immutable, the receiver itself is left unchanged.
str = "foo bar test"
str.remove(" test", /bar/) # => "foo "
str # => "foo bar test"
source: Rails ActiveSupport
Removes leading and trailing whitespace and collapses each internal run of whitespace into a single space.
" foo bar \n \t boo".squish # => "foo bar boo"
source: Rails ActiveSupport
Truncates a given text after a given size if text is longer than size:
"Once upon a time in a world far far away".truncate(27)
# => "Once upon a time in a wo..."
Pass a string or regexp :separator to truncate text at a natural break:
"Once upon a time in a world far far away".truncate(27, separator: " ")
# => "Once upon a time in a..."
The last characters will be replaced with the :omission string (defaults to "...") for a total size not exceeding size:
"And they found that many people were sleeping better.".truncate(25, omission: "... (continued)")
# => "And they f... (continued)"
source: Rails ActiveSupport
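The size arithmetic in the examples above can be sketched as follows. This is not the library's implementation: truncate_sketch is a hypothetical helper that ignores the :separator option and only shows how the omission string counts toward the total size.

```crystal
# Hypothetical helper, written only to illustrate the size arithmetic.
def truncate_sketch(text : String, size : Int32, omission : String = "...") : String
  # Text that already fits is returned unchanged.
  return text if text.size <= size

  # Reserve room for the omission so the result never exceeds `size`.
  keep = size - omission.size
  keep = 0 if keep < 0
  text[0, keep] + omission
end

short = truncate_sketch("Once upon a time in a world far far away", 27)
puts short # prints "Once upon a time in a wo..."
```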
Truncates a given text after a given number of words (words_count):
"Once upon a time in a world far far away".truncate_words(4)
# => "Once upon a time..."
Pass a string :separator to specify a different separator of words:
"Once<br>upon<br>a<br>time<br>in<br>a<br>world".truncate_words(5, separator: "<br>")
# => "Once<br>upon<br>a<br>time<br>in..."
The last characters will be replaced with the :omission string (defaults to "..."):
"And they found that many people were sleeping better.".truncate_words(5, omission: "... (continued)")
# => "And they found that many... (continued)"
source: Rails ActiveSupport
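The word-based behaviour in the examples above can be approximated with split, first and join. This is a sketch rather than the library's code: truncate_words_sketch is a hypothetical helper, and it assumes a plain string separator with a single space as the default.

```crystal
# Hypothetical helper, written only to illustrate word-based truncation.
def truncate_words_sketch(text : String, words_count : Int32,
                          separator : String = " ", omission : String = "...") : String
  words = text.split(separator)

  # Nothing to cut when the text has no more than words_count words.
  return text if words.size <= words_count

  words.first(words_count).join(separator) + omission
end

result = truncate_words_sketch("Once upon a time in a world far far away", 4)
puts result # prints "Once upon a time..."
```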