List of Regular Expressions

note

For a full list of supported metacharacters and syntax, see the ICU Regular Expressions documentation.


Term

Meaning and Usage

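Any character

Represents the given character, unless it is a regular expression metacharacter. The metacharacters are listed in the rest of this table.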

.

Any single character except a line break or a paragraph break. For example, the search term "sh.rt" matches both "shirt" and "short".

^

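The start of a paragraph or cell. Special objects at the start of a paragraph, such as empty fields or character-anchored frames, are ignored. Example: "^Peter" matches the word "Peter" only when it is the first word of a paragraph.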

$

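The end of a paragraph or cell. Special objects at the end of a paragraph, such as empty fields or character-anchored frames, are ignored. Example: "Peter$" matches the word "Peter" only when it is the last word of a paragraph; note that the match fails if "Peter" is followed by a period.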

A $ on its own matches the end of a paragraph, which makes it possible to search for and replace paragraph breaks.
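A minimal sketch of the anchors using Python's `re` module, which is only an approximation of the ICU engine LibreOffice uses. It assumes each line of a string stands in for one paragraph, and enables `re.MULTILINE` so that `^` and `$` apply per line. The sample text is made up.

```python
import re

# Hypothetical sample: each line stands in for one paragraph.
text = "Peter went home.\nAsk Peter\nPeter."

# re.MULTILINE makes ^ and $ match at the start and end of every line,
# approximating LibreOffice's per-paragraph anchors.
starts = re.findall(r"^Peter", text, flags=re.MULTILINE)
ends = re.findall(r"Peter$", text, flags=re.MULTILINE)

print(len(starts))  # 2: lines 1 and 3 begin with "Peter"
print(len(ends))    # 1: only line 2 ends with "Peter"; line 3 ends with "."
```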

*

Zero or more occurrences of the character immediately preceding the "*". For example, "Ab*c" matches "Ac", "Abc", "Abbc", "Abbbc", and so on.
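The same pattern can be tried in Python's `re` module, used here as a rough stand-in for LibreOffice's ICU-based search. `fullmatch` is a choice made for this sketch so that each sample word must match in its entirety.

```python
import re

pattern = re.compile(r"Ab*c")  # "b" may appear zero or more times
words = ["Ac", "Abc", "Abbc", "Abbbc", "Adc"]
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)  # ['Ac', 'Abc', 'Abbc', 'Abbbc']
```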

+

One or more occurrences of the character immediately preceding the "+". For example, "AX.+4" matches "AXx4" but not "AX4".

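The longest possible string in the paragraph that matches the pattern is always found. If the paragraph contains the string "AX 4 AX4", the entire string is matched and highlighted.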
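A small sketch of the "+" operator and its greedy behavior, using Python's `re` as an approximation of ICU; the sample strings are the ones from the text above.

```python
import re

short = re.search(r"AX.+4", "AXx4")
missing = re.search(r"AX.+4", "AX4")       # .+ needs at least one character
greedy = re.search(r"AX.+4", "AX 4 AX4")   # .+ extends as far as it can

print(short.group())   # AXx4
print(missing)         # None
print(greedy.group())  # AX 4 AX4
```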

?

Zero or one occurrence of the character (or group) immediately preceding the "?". For example, "Texts?" matches "Text" and "Texts", and "x(ab|c)?y" finds "xy", "xaby", or "xcy".
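A sketch of the optional operator in Python's `re`, assumed here to behave like ICU for these simple cases. The extra words "Textss" and "xabcy" are made-up negative examples.

```python
import re

plural = [w for w in ["Text", "Texts", "Textss"] if re.fullmatch(r"Texts?", w)]
group = [w for w in ["xy", "xaby", "xcy", "xabcy"] if re.fullmatch(r"x(ab|c)?y", w)]

print(plural)  # ['Text', 'Texts']
print(group)   # ['xy', 'xaby', 'xcy']
```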

\

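Escape character. The special character that follows the "\" is interpreted as a normal character rather than as a regular expression metacharacter, except for the combinations "\n", "\t", "\b", "\>" and "\<". For example, "tree\." matches "tree.", not "treed" or "trees".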

\n

When used in the Find box, finds a line break inserted with the Shift+Enter key combination in Writer, or with Ctrl+Enter in a Calc cell.

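When used in the Replace box, inserts a paragraph break, as entered with the Enter or Return key. This is not available in Calc, where "\n" has no special meaning and is treated literally.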

To change line breaks into paragraph breaks in Writer, enter "\n" in both the Find and Replace boxes, and then perform a search and replace.

\t

A tab character. Can also be used in the Replace box.

\b

A word boundary. For example, "\bbook" matches "bookmark" and "book" but not "checkbook", whereas "book\b" matches "checkbook" and "book" but not "bookmark".

Note: This form replaces the older "\>" (match the end of a word) and "\<" (match the start of a word), although the older forms can still be used for now.
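The word-boundary examples can be checked with Python's `re`, which supports "\b" but not the older "\<" and "\>" forms; treat it as an approximation of the ICU behavior in LibreOffice.

```python
import re

words = ["bookmark", "book", "checkbook"]
at_start = [w for w in words if re.search(r"\bbook", w)]  # must begin at a boundary
at_end = [w for w in words if re.search(r"book\b", w)]    # must end at a boundary

print(at_start)  # ['bookmark', 'book']
print(at_end)    # ['book', 'checkbook']
```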

\w

Match a word character.

\W

Match a non-word character.

^$

Finds an empty paragraph.

^.

Finds the first character of a paragraph.

& 或 $0

Adds the string that was found by the search criteria in the Find box to the term in the Replace box when you make a replacement.

For example, if you enter "window" in the Find box and "&frame" in the Replace box, the word "window" is replaced with "windowframe".

You can also enter an "&" in the Replace box to modify the Attributes or the Format of the string found by the search criteria.
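For comparison, Python's `re.sub` has no "&" or "$0"; it writes the whole match as "\g<0>" in the replacement string. This sketch reproduces the "window" example with that syntax; the surrounding sentence is made up.

```python
import re

# Python spells "the whole match" as \g<0> in the replacement string.
result = re.sub(r"window", r"\g<0>frame", "Open the window.")
print(result)  # Open the windowframe.
```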

[...]

Any single occurrence of any one of the characters that are between the brackets. For example: "[abc123]" matches the characters ‘a’, ‘b’, ‘c’, ‘1’, ‘2’ and ‘3’. "[a-e]" matches single occurrences of the characters a through e, inclusive (the range must be specified with the character having the smallest Unicode code number first). "[a-eh-x]" matches any single occurrence of the characters that are in the ranges ‘a’ through ‘e’ and ‘h’ through ‘x’.

[^...]

Any single occurrence of a character, including Tab, Space and Line Break characters, that is not in the list of characters specified. Inclusive ranges are permitted. For example, "[^a-syz]" matches all characters not in the inclusive range ‘a’ through ‘s’ or the characters ‘y’ and ‘z’.
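A sketch of bracket expressions in Python's `re`, whose character-class syntax matches ICU for these simple cases. The input strings are made up for illustration.

```python
import re

listed = re.findall(r"[abc123]", "a1x3")       # characters from the list
ranges = re.findall(r"[a-eh-x]", "ahfx")       # a-e or h-x
negated = re.findall(r"[^a-syz]", "sty z\tw")  # anything outside a-s, y, z

print(listed)   # ['a', '1', '3']
print(ranges)   # ['a', 'h', 'x']
print(negated)  # ['t', ' ', '\t', 'w']
```

Note that the negated class also matches the space and the tab, as described above.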

\uXXXX

The character represented by the four-digit hexadecimal Unicode code (XXXX).

\UXXXXXXXX

The character represented by the eight-digit hexadecimal Unicode code (XXXXXXXX).

note

For certain symbol fonts the symbol (glyph) that you see on screen may look related to a different Unicode code than what is actually used for it in the font. The Unicode codes can be viewed by choosing Insert - Special Character, or by using the Unicode conversion shortcut.
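Python's `re` accepts the same "\uXXXX" and "\UXXXXXXXX" escapes in string patterns, so it can serve as a rough illustration. The sample text, containing U+00A0 (no-break space) and U+1F600 (an emoji), is made up.

```python
import re

# Sample text containing U+00A0 (no-break space) and U+1F600 (an emoji).
text = "10\u00a0kg \U0001F600"

nbsp = re.findall(r"\u00A0", text)       # four-digit form
emoji = re.findall(r"\U0001F600", text)  # eight-digit form

print(len(nbsp), len(emoji))  # 1 1
```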


\N{UNICODE CHARACTER NAME}

Match the Unicode named character.

Some remarkable Unicode named characters are SPACE, NO-BREAK SPACE, SOFT HYPHEN, ACUTE ACCENT, CIRCUMFLEX ACCENT, GRAVE ACCENT.

note

The Unicode character names can be searched and viewed by choosing Insert - Special Character.
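Python 3.8 and later also accept "\N{...}" in `re` patterns, which allows a rough sketch of a common clean-up: replacing no-break spaces with ordinary spaces and removing soft hyphens. The sample text is made up, and LibreOffice's own matching follows ICU, not Python.

```python
import re

text = "100\u00a0kg co\u00adoperate"

# \N{...} in re patterns requires Python 3.8 or later.
cleaned = re.sub(r"\N{NO-BREAK SPACE}", " ", text)
cleaned = re.sub(r"\N{SOFT HYPHEN}", "", cleaned)

print(cleaned)  # 100 kg cooperate
```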


|

The infix operator delimiting alternatives. Matches the term preceding the "|" or the term following the "|". For example, "this|that" matches occurrences of both "this" and "that".

{N}

The post-fix repetition operator that specifies the exact number of occurrences ("N") of the immediately preceding regular expression term that must be present for a match to occur. For example, "tre{2}" matches "tree".

{N,M}

The post-fix repetition operator that specifies a range (minimum of "N" to a maximum of "M") of occurrences of the regular expression term immediately preceding it that can be present for a match to occur. For example, "tre{1,2}" matches "tre" and "tree".

{N,}

The post-fix repetition operator that specifies a range (minimum "N" to an unspecified maximum) of occurrences of the regular expression term immediately preceding it that can be present for a match to occur. (The maximum number of occurrences is limited only by the size of the document). For example, "tre{2,}" matches "tree", "treee", and "treeeee".
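The three repetition forms can be compared side by side in Python's `re`, used here as an approximation of ICU. The helper `matching` and the word list are hypothetical, introduced only for this sketch. Note that each operator applies to the single "e" before it, not to "tre".

```python
import re

words = ["tre", "tree", "treee", "treeeee"]

def matching(pattern):
    # Hypothetical helper: keep only sample words that match in full.
    return [w for w in words if re.fullmatch(pattern, w)]

exact = matching(r"tre{2}")
bounded = matching(r"tre{1,2}")
open_ended = matching(r"tre{2,}")

print(exact)       # ['tree']
print(bounded)     # ['tre', 'tree']
print(open_ended)  # ['tree', 'treee', 'treeeee']
```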

(...)

The grouping construct that serves three purposes.

  1. To enclose a set of ‘|’ alternatives. For example, the regular expression "b(oo|ac)k" matches both "book" and "back".

  2. To group terms in a complex expression to be operated on by the post-fix operators: "*", "+" and "?" along with the post-fix repetition operators. For example, the regular expression "a(bc)?d" matches both "ad" and "abcd"; "M(iss){2}ippi" matches "Mississippi".

  3. To reference the matched substring inside the parentheses for later use. The "\N" construct is used in the Find box and the "$N" construct in the Replace box, where "N" is a digit: the first match is referenced by "\1" in the Find box and by "$1" in the Replace box, the second match by "\2" and "$2", and so on.

For example, the regular expression "(890)xy\1z\1" matches "890xy890z890".

With the regular expression "(fruit|truth)\b" in the Find box, and the replacement expression "$1ful" in the Replace box, occurrences of "fruit" and "truth" are replaced with "fruitful" and "truthful" respectively. Note: "\b" prevents "fruitfully" or "truthfully" from matching.
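The three uses of grouping can be sketched with Python's `re`. Back-references in the pattern use "\1" as in LibreOffice, but Python's replacement syntax differs: it writes "\g<1>" (or "\1") where the Replace box uses "$1". The sample strings other than those quoted above are made up.

```python
import re

alternatives = [w for w in ["book", "back", "bock"] if re.fullmatch(r"b(oo|ac)k", w)]
repeated = bool(re.fullmatch(r"M(iss){2}ippi", "Mississippi"))
backref = bool(re.fullmatch(r"(890)xy\1z\1", "890xy890z890"))

# Python writes group 1 as \g<1> in the replacement, not $1.
replaced = re.sub(r"(fruit|truth)\b", r"\g<1>ful", "fruit, truth, fruitfully")

print(alternatives)       # ['book', 'back']
print(repeated, backref)  # True True
print(replaced)           # fruitful, truthful, fruitfully
```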

[:alpha:]

Represents an alphabetic character. Use [:alpha:] to find one of them.

\d

[:digit:]

Represents a decimal digit. Use [:digit:] to find one of them.

[:alnum:]

Represents an alphanumeric character (letters and digits).

\s

[:space:]

Represents a space character (but not other whitespace characters).

[:print:]

Represents a printable character.

[:cntrl:]

Represents a nonprinting character.

[:lower:]

Represents a lowercase character if "Match case" is selected in "Options".

[:upper:]

Represents an uppercase character if "Match case" is selected in "Options".


Regular expression terms can be combined to form complex and sophisticated regular expressions for searches, as shown in the following examples.

Examples

Expression

Meaning

^$

An empty paragraph.

^ specifies that the match must be at the start of a paragraph,

$ specifies that a paragraph mark or the end of a cell must follow the matched string.

^.

The first character of a paragraph.

^ specifies that the match must be at the start of a paragraph,

. specifies any single character.

e([:digit:])?

Matches "e" by itself or an "e" followed by one digit.

e specifies the character "e",

[:digit:] specifies any decimal digit,

? specifies zero or one occurrences of [:digit:].

^([:digit:])$

Matches a paragraph or cell containing exactly one digit.

^ specifies that the match must be at the start of a paragraph,

[:digit:] specifies any decimal digit,

$ specifies that a paragraph mark or the end of a cell must follow the matched string.

^[:digit:]{3}$

Matches a paragraph or cell containing exactly three digits.

^ specifies that the match must be at the start of a paragraph,

[:digit:] specifies any decimal digit,

{3} specifies that [:digit:] must occur three times,

$ specifies that a paragraph mark or the end of a cell must follow the matched string.
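Python's `re` has no "[:digit:]" class, so this sketch substitutes "\d" (which, like ICU, matches Unicode decimal digits) and treats each list item as one paragraph or cell. The helper `matching` and the sample values are hypothetical, and the results only approximate LibreOffice's ICU-based search.

```python
import re

# Hypothetical sample: each string stands in for one paragraph or cell.
paragraphs = ["", "7", "123", "1234", "e5"]

def matching(pattern):
    # Hypothetical helper: keep the paragraphs where the pattern is found.
    return [p for p in paragraphs if re.search(pattern, p)]

print(matching(r"^$"))       # ['']
print(matching(r"^."))       # ['7', '123', '1234', 'e5']
print(matching(r"e(\d)?"))   # ['e5']
print(matching(r"^(\d)$"))   # ['7']
print(matching(r"^\d{3}$"))  # ['123']
```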

\bconst(itu|ruc)tion\b

Matches the words "constitution" and "construction" but not the word "constitutional."

\b specifies that the match must begin at a word boundary,

const specifies the characters "const",

( starts the group,

itu specifies the characters "itu",

| specifies the alternative,

ruc specifies the characters "ruc",

) ends the group,

tion specifies the characters "tion",

\b specifies that the match must end at a word boundary.
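The last example can be run as-is in Python's `re`, used here as a stand-in for ICU; the sentence searched is made up. `finditer` is used because `findall` would return only the captured group ("itu" or "ruc") rather than the whole word.

```python
import re

pattern = re.compile(r"\bconst(itu|ruc)tion\b")
text = "The constitution and the construction are constitutional."

# finditer keeps the whole match; findall would return only the group.
found = [m.group(0) for m in pattern.finditer(text)]
print(found)  # ['constitution', 'construction']
```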

