List of Regular Expressions

note

For a full list of supported metacharacters and syntax, see the ICU Regular Expressions documentation.


Term

Representation/Use

Any character

The given character, unless it is a regular expression meta character. The list of meta characters follows in this table.

.

Any single character except a line break or a paragraph break. For example, the search term "sh.rt" matches both "shirt" and "short".
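The behaviour of "." can be tried outside the search dialog. The sketch below uses Python's `re` module, which is an assumption made for illustration: it is not the ICU engine that the dialog uses, although "." behaves the same way in this simple case. The word list is made up.

```python
import re

# "." stands for exactly one character, but never a line break.
pattern = re.compile(r"sh.rt")

# Hypothetical candidates: two real words, one too short, one with a line break.
words = ["shirt", "short", "shrt", "sh\nrt"]
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)
```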

^

The beginning of a paragraph or cell. Special objects such as empty fields or character-anchored frames at the beginning of a paragraph are ignored. Example: "^Peter" matches the word "Peter" only when it is the first word of a paragraph.

$

The end of a paragraph or cell. Special objects such as empty fields or character-anchored frames at the end of a paragraph are ignored. Example: "Peter$" matches only when the word "Peter" is the last word of a paragraph. Note that "Peter" cannot be followed by a period.

$ on its own matches the end of a paragraph. This makes it possible to search for and replace paragraph breaks.
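A minimal sketch of the two anchors follows. It assumes each paragraph can be modelled as a separate Python string and uses Python's `re` module as a stand-in for the dialog's ICU engine. The sample paragraphs are invented.

```python
import re

# Each string stands in for one paragraph of a hypothetical document.
paragraphs = ["Peter went home", "Ask Peter", "Ask Peter."]

# "^Peter" only matches when "Peter" opens the paragraph.
starts_with_peter = [p for p in paragraphs if re.search(r"^Peter", p)]

# "Peter$" only matches when "Peter" closes the paragraph;
# the trailing period in the third paragraph prevents a match.
ends_with_peter = [p for p in paragraphs if re.search(r"Peter$", p)]

print(starts_with_peter, ends_with_peter)
```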

*

Zero or more of the regular expression term immediately preceding it. For example, "Ab*c" matches "Ac", "Abc", "Abbc", "Abbbc", and so on.

+

One or more of the regular expression term immediately preceding it. For example, "AX.+4" finds "AXx4", but not "AX4".

The longest possible string that matches this regular expression in a paragraph is always matched. If the paragraph contains the string "AX 4 AX4", the entire passage is highlighted.
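The longest-match rule can be checked with a short script. This is only an approximation of the dialog: Python's `re` module is assumed here in place of ICU, but its "+" is greedy by default in the same way.

```python
import re

pattern = re.compile(r"AX.+4")

# Greedy matching: ".+" grabs as much as it can, so the match
# runs to the last "4" in the paragraph.
longest = pattern.search("AX 4 AX4").group()

# "+" needs at least one character between "AX" and "4".
has_axx4 = pattern.fullmatch("AXx4") is not None
has_ax4 = pattern.search("AX4") is not None

print(repr(longest), has_axx4, has_ax4)
```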

?

Zero or one of the regular expression term immediately preceding it. For example, "Texts?" matches "Text" and "Texts" and "x(ab|c)?y" finds "xy", "xaby", or "xcy".
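Both examples for "?" can be reproduced with the sketch below, which assumes Python's `re` module rather than the ICU engine. The non-matching strings "Tex" and "xabcy" are added for contrast and are not from the table.

```python
import re

texts = re.compile(r"Texts?")               # optional single character
optional_group = re.compile(r"x(ab|c)?y")   # optional group of alternatives

text_hits = [s for s in ["Text", "Texts", "Tex"] if texts.fullmatch(s)]
group_hits = [s for s in ["xy", "xaby", "xcy", "xabcy"]
              if optional_group.fullmatch(s)]

print(text_hits, group_hits)
```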

\

The special character that follows it is interpreted as a normal character and not as a regular expression meta character (except for the combinations "\n", "\t", "\b", "\>" and "\<"). For example, "tree\." matches "tree.", not "treed" or "trees".
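To see what the backslash changes, the following sketch compares the escaped and unescaped patterns. Python's `re` module is assumed as a stand-in for the dialog's engine.

```python
import re

escaped = re.compile(r"tree\.")   # "\." is a literal period
unescaped = re.compile(r"tree.")  # "." is any single character

candidates = ["tree.", "treed", "trees"]
escaped_hits = [s for s in candidates if escaped.fullmatch(s)]
unescaped_hits = [s for s in candidates if unescaped.fullmatch(s)]

print(escaped_hits, unescaped_hits)
```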

\n

When entered in the Find text box, finds a line break that was inserted with the Shift+Enter key combination in Writer, or the Ctrl+Enter key combination in a Calc cell.

When entered in the Replace text box in Writer, inserts a paragraph break that can be inserted with the Enter or Return key. It has no special meaning in Calc, and is treated literally there.

To change line breaks into paragraph breaks in Writer, enter \n in both the Find and Replace boxes, and then perform a search and replace.

\t

A tab character. Can also be used in the Replace box.

\b

A word boundary. For example, "\bbook" matches "bookmark" and "book" but not "checkbook" whereas "book\b" matches "checkbook" and "book" but not "bookmark".

Note that this form replaces the obsolete forms "\>" (match end of word) and "\<" (match start of word), which still work for now.
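The word-boundary examples can be verified with a few lines of Python. The `re` module is an assumption used for illustration. It supports "\b" but has no equivalent of the obsolete "\>" and "\<" forms.

```python
import re

words = ["book", "bookmark", "checkbook"]

# "\bbook": "book" must start at a word boundary.
starts = [w for w in words if re.search(r"\bbook", w)]

# "book\b": "book" must end at a word boundary.
ends = [w for w in words if re.search(r"book\b", w)]

print(starts, ends)
```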

^$

Finds an empty paragraph.

^.

Finds the first character of a paragraph.

& or $0

Adds the string that was found by the search criteria in the Find box to the term in the Replace box when you make a replacement.

For example, if you enter "window" in the Find box and "&frame" in the Replace box, the word "window" is replaced with "windowframe".

You can also enter an "&" in the Replace box to modify the Attributes or the Format of the string found by the search criteria.
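The idea of reusing the whole match carries over to other engines, though the syntax differs. In Python's `re` module (assumed here, not the dialog's engine), "&" and "$0" have no special meaning in a replacement. The whole match is written as `\g<0>` instead. The sample sentence is invented.

```python
import re

# "\g<0>" plays the role of "&" / "$0": it inserts the text that was found.
result = re.sub(r"window", r"\g<0>frame", "open the window")
print(result)
```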

[...]

Any single occurrence of any one of the characters that are between the brackets. For example: "[abc123]" matches the characters ‘a’, ‘b’, ’c’, ‘1’, ‘2’ and ‘3’. "[a-e]" matches single occurrences of the characters a through e, inclusive (the range must be specified with the character having the smallest Unicode code number first). "[a-eh-x]" matches any single occurrence of the characters that are in the ranges ‘a’ through ‘e’ and ‘h’ through ‘x’.

[^...]

Any single occurrence of a character, including Tab, Space and Line Break characters, that is not in the list of characters specified. Inclusive ranges are permitted. For example, "[^a-syz]" matches all characters not in the inclusive range ‘a’ through ‘s’ or the characters ‘y’ and ‘z’.
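The bracket forms can be explored with the sketch below. It assumes Python's `re` module, whose bracket syntax agrees with the table for these simple cases. The sample strings are made up.

```python
import re

# A plain list of characters.
listed = re.findall(r"[abc123]", "a1b2c3d4")

# One inclusive range, then two ranges in the same set.
ranged = re.findall(r"[a-e]", "badge")
two_ranges = re.findall(r"[a-eh-x]", "fghi")

# A negated set also matches spaces and tabs.
negated = re.findall(r"[^a-syz]", "say tux\t")

print(listed, ranged, two_ranges, negated)
```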

\uXXXX

The character represented by the four-digit hexadecimal Unicode code (XXXX).

\UXXXXXXXX

The character represented by the eight-digit hexadecimal Unicode code (XXXXXXXX).

note

For certain symbol fonts the symbol (glyph) that you see on screen may look related to a different Unicode code than what is actually used for it in the font. The Unicode codes can be viewed by choosing Insert - Special Character, or by using the Unicode conversion shortcut.
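As a hedged illustration of the two hexadecimal escapes, Python's `re` module (assumed here in place of ICU) accepts the same "\uXXXX" and "\UXXXXXXXX" forms in string patterns. The sample text is invented.

```python
import re

# Sample text: "cafe" with an acute accent on the "e" (U+00E9),
# a space, and a grinning-face emoji (U+0001F600).
text = "caf\u00e9 \U0001F600"

four_digit = re.search(r"\u00e9", text)       # four hex digits
eight_digit = re.search(r"\U0001F600", text)  # eight hex digits

print(four_digit.start(), eight_digit.start())
```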


\N{UNICODE CHARACTER NAME}

Match the Unicode named character.

Some remarkable Unicode named characters are SPACE, NO-BREAK SPACE, SOFT HYPHEN, ACUTE ACCENT, CIRCUMFLEX ACCENT, GRAVE ACCENT.

note

The Unicode character names can be searched and viewed by choosing Insert - Special Character.
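Named characters are handy for invisible characters such as the no-break space. A minimal sketch follows, assuming Python 3.8 or later, whose `re` module accepts the same "\N{...}" form. It is not the engine used by the dialog, and the sample text is made up.

```python
import re

# "10" and "km" are joined by a no-break space (U+00A0).
text = "10\u00a0km"

# Replace every no-break space with an ordinary space.
normalized = re.sub(r"\N{NO-BREAK SPACE}", " ", text)
print(normalized)
```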


|

The infix operator delimiting alternatives. Matches the term preceding the "|" or the term following the "|". For example, "this|that" matches occurrences of both "this" and "that".
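The alternation example behaves as follows in a quick test with Python's `re` module, which is assumed here purely for illustration. The sample sentence is invented.

```python
import re

# Either alternative may match; "those" matches neither.
found = re.findall(r"this|that", "this and that, not those")
print(found)
```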

{N}

The post-fix repetition operator that specifies an exact number of occurrences ("N") of the regular expression term immediately preceding it must be present for a match to occur. For example, "tre{2}" matches "tree".

{N,M}

The post-fix repetition operator that specifies a range (minimum of "N" to a maximum of "M") of occurrences of the regular expression term immediately preceding it that can be present for a match to occur. For example, "tre{1,2}" matches "tre" and "tree".

{N,}

The post-fix repetition operator that specifies a range (minimum "N" to an unspecified maximum) of occurrences of the regular expression term immediately preceding it that can be present for a match to occur. (The maximum number of occurrences is limited only by the size of the document). For example, "tre{2,}" matches "tree", "treee", and "treeeee".
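The three repetition forms can be compared side by side. The sketch assumes Python's `re` module, which uses the same brace syntax. The candidate strings extend the examples from the table.

```python
import re

candidates = ["tr", "tre", "tree", "treee", "treeeee"]

def whole_matches(pattern):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

exactly_two = whole_matches(r"tre{2}")    # {N}
one_or_two = whole_matches(r"tre{1,2}")   # {N,M}
two_or_more = whole_matches(r"tre{2,}")   # {N,}

print(exactly_two, one_or_two, two_or_more)
```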

(...)

The grouping construct that serves three purposes.

  1. To enclose a set of ‘|’ alternatives. For example, the regular expression "b(oo|ac)k" matches both "book" and "back".

  2. To group terms in a complex expression to be operated on by the post-fix operators: "*", "+" and "?" along with the post-fix repetition operators. For example, the regular expression "a(bc)?d" matches both "ad" and "abcd"; "M(iss){2}ippi" matches "Mississippi".

  3. To reference the matched substring inside the parentheses for later use. The "\N" construct is used in the Find box, the "$N" construct is used in the Replace box. "N" is a digit: the reference to the first match is represented by "\1" in the Find box and by "$1" in the Replace box; "\2" and "$2" refer to the second match, and so on.

For example, the regular expression "(890)xy\1z\1" matches "890xy890z890".

With the regular expression "(fruit|truth)\b" in the Find box, and the replacement expression "$1ful" in the Replace box, occurrences of "fruit" and "truth" are replaced with "fruitful" and "truthful" respectively. Note: "\b" prevents "fruitfully" or "truthfully" from matching.
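The three uses of parentheses can be sketched in Python. This assumes the `re` module rather than ICU, and one difference matters: Python writes a replacement reference as "\1", where the Replace box uses "$1". The sample sentence is invented.

```python
import re

# 1. Enclosing alternatives.
alternatives = [w for w in ["book", "back", "beak"]
                if re.fullmatch(r"b(oo|ac)k", w)]

# 2. Grouping terms for a repetition operator.
repeated = re.fullmatch(r"M(iss){2}ippi", "Mississippi") is not None

# 3a. Back-reference inside the search pattern.
backref = re.fullmatch(r"(890)xy\1z\1", "890xy890z890") is not None

# 3b. Reference in the replacement ("\1" in Python, "$1" in the Replace box).
replaced = re.sub(r"(fruit|truth)\b", r"\1ful", "fruit truth fruitfully")

print(alternatives, repeated, backref, replaced)
```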

[:alpha:]

Represents an alphabetic character. Use [:alpha:] to find one of them.

\d

[:digit:]

Represents a decimal digit. Use [:digit:] to find one of them.

[:alnum:]

Represents an alphanumeric character ([:alpha:] and [:digit:]).

\s

[:space:]

Represents a space character (but not other whitespace characters).

[:print:]

Represents a printable character.

[:cntrl:]

Represents a nonprinting character.

[:lower:]

Represents a lowercase character if Match case is selected in Options.

[:upper:]

Represents an uppercase character if Match case is selected in Options.


Regular expression terms can be combined to form complex and sophisticated regular expressions for searches, as shown in the following examples.

Examples

Expression

Meaning

^$

An empty paragraph.

^ specifies that the match must be at the start of a paragraph,

$ specifies that a paragraph mark or the end of a cell must follow the matched string.

^.

The first character of a paragraph.

^ specifies that the match must be at the start of a paragraph,

. specifies any single character.

e([:digit:])?

Matches "e" by itself or an "e" followed by one digit.

e specifies the character "e",

[:digit:] specifies any decimal digit,

? specifies zero or one occurrences of [:digit:].

^([:digit:])$

Matches a paragraph or cell containing exactly one digit.

^ specifies that the match must be at the start of a paragraph,

[:digit:] specifies any decimal digit,

$ specifies that a paragraph mark or the end of a cell must follow the matched string.

^[:digit:]{3}$

Matches a paragraph or cell containing only a three-digit number.

^ specifies that the match must be at the start of a paragraph,

[:digit:] specifies any decimal digit,

{3} specifies that [:digit:] must occur three times,

$ specifies that a paragraph mark or the end of a cell must follow the matched string.

\bconst(itu|ruc)tion\b

Matches the words "constitution" and "construction" but not the word "constitutional".

\b specifies that the match must begin at a word boundary,

const specifies the characters "const",

( starts the group,

itu specifies the characters "itu",

| specifies the alternative,

ruc specifies the characters "ruc",

) ends the group,

tion specifies the characters "tion",

\b specifies that the match must end at a word boundary.
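The worked examples above can be approximated in Python. Two assumptions apply: Python's `re` module stands in for the ICU engine, and because `re` does not understand "[:digit:]", the sketch substitutes "\d". Each test string stands in for one paragraph or cell.

```python
import re

def hits(pattern, candidates):
    """Return the candidates in which the pattern is found."""
    return [s for s in candidates if re.search(pattern, s)]

# "\d" replaces "[:digit:]" in the first three patterns.
e_digit = [s for s in ["e", "e5", "e55"] if re.fullmatch(r"e(\d)?", s)]
one_digit = hits(r"^(\d)$", ["7", "42", "x"])
three_digits = hits(r"^\d{3}$", ["12", "123", "1234"])
const_words = hits(r"\bconst(itu|ruc)tion\b",
                   ["constitution", "construction", "constitutional"])

print(e_digit, one_digit, three_digits, const_words)
```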

