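List of Regular Expressions

Term

Meaning and usage

Any character

Any given character, unless it is a regular expression metacharacter. The list of metacharacters follows in the rest of this table.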

.

Any single character except a line break or a paragraph break. For example, the search term "sh.rt" matches both "shirt" and "short".

^

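Beginning of a paragraph or cell. Special objects at the beginning of a paragraph, such as empty fields or character-anchored frames, are ignored. For example, "^Peter" matches the word "Peter" only when it is the first word of a paragraph.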

$

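End of a paragraph or cell. Special objects at the end of a paragraph, such as empty fields or character-anchored frames, are ignored. For example, "Peter$" matches the word "Peter" only when it is the last word of a paragraph. Note that the match fails if "Peter" is followed by a period.

$ on its own matches the end of a paragraph. This makes it possible to find and replace paragraph breaks.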
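The anchoring behavior of "^" and "$" can be sketched outside LibreOffice. The snippet below is only an approximation: it uses Python's `re` module rather than the ICU engine behind the Find & Replace dialog, and it assumes that each line of a hypothetical sample string stands in for one paragraph.

```python
import re

# Assumption: each line of this sample string stands in for one paragraph.
text = "Peter went home\nWe met Peter\nWe met Peter."

# re.MULTILINE lets ^ and $ match at the start and end of every line.
starts = re.findall(r"^Peter", text, flags=re.MULTILINE)
ends = re.findall(r"Peter$", text, flags=re.MULTILINE)

print(starts)  # only the first line begins with "Peter"
print(ends)    # only the second line ends with "Peter"; the period on line 3 blocks a match
```

The third line shows the caveat from the description of "$": a trailing period after "Peter" prevents the match.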

*

Zero or more occurrences of the regular expression term immediately preceding it. For example, "Ab*c" matches "Ac", "Abc", "Abbc", "Abbbc", and so on.

+

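One or more occurrences of the regular expression term immediately preceding it. For example, "AX.+4" matches "AXx4" but not "AX4".

The longest possible string in a paragraph that matches this pattern is always found. If the paragraph contains the string "AX 4 AX4", the entire string is matched and highlighted.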

?

Zero or one occurrence of the regular expression term immediately preceding it. For example, "Texts?" matches "Text" and "Texts", and "x(ab|c)?y" finds "xy", "xaby", or "xcy".
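The three quantifiers "*", "+" and "?" can be tried out with a short script. This is a minimal sketch using Python's `re` module as a stand-in for the ICU engine; the helper `full` is a hypothetical name introduced here, and the sample strings are taken from the descriptions above.

```python
import re

def full(pattern, s):
    """Return True when the whole string s matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None

# "*": zero or more of the preceding term.
star = [s for s in ["Ac", "Abc", "Abbbc", "Adc"] if full(r"Ab*c", s)]

# "+": one or more of the preceding term, so "AX4" does not match.
plus = (full(r"AX.+4", "AXx4"), full(r"AX.+4", "AX4"))

# "+" is greedy: the longest possible string is matched.
greedy = re.search(r"AX.+4", "AX 4 AX4").group()

# "?": zero or one of the preceding term, here a whole group.
optional = [s for s in ["xy", "xaby", "xcy", "xabcy"] if full(r"x(ab|c)?y", s)]

print(star, plus, greedy, optional)
```

Here `greedy` holds the entire string "AX 4 AX4", which mirrors the highlighting behavior described for "+".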

\

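Escape character. A single special character that follows it is interpreted as a normal character rather than as a regular expression metacharacter, except for the combinations "\n", "\t", "\b", "\>" and "\<". For example, "tree\." matches "tree." but not "treed" or "trees".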

\n

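When entered in the Find box, finds a line break that was inserted with the Shift+Enter key combination in Writer, or with the Ctrl+Enter key combination in a Calc cell.

When entered in the Replace box, inserts a paragraph break such as the one produced by the Enter or Return key. This does not work in Calc, where "\n" has no special meaning and is treated literally.

To change line breaks into paragraph breaks in Writer, enter "\n" in both the Find and Replace boxes, and then perform a search and replace.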

\t

A tab character. Can also be used in the Replace box.

\b

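A word boundary. For example, "\bbook" matches "bookmark" and "book" but not "checkbook", whereas "book\b" matches "checkbook" and "book" but not "bookmark".

Note: this form replaces the obsolete forms "\>" (match end of word) and "\<" (match start of word), although the old forms still work for now.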
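The "book" example can be reproduced with a few lines of code. The sketch below uses Python's `re` module, whose "\b" behaves like the ICU word boundary for plain ASCII words such as these; the word list is the one from the description.

```python
import re

words = ["book", "bookmark", "checkbook"]

# \b before "book": the match must begin at a word boundary.
starts_with = [w for w in words if re.search(r"\bbook", w)]

# \b after "book": the match must end at a word boundary.
ends_with = [w for w in words if re.search(r"book\b", w)]

print(starts_with)
print(ends_with)
```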

^$

Finds an empty paragraph.

^.

Finds the first character of a paragraph.

& or $0

Adds the string that was found by the search criteria in the Find box to the term in the Replace box when you make a replacement.

For example, if you enter "window" in the Find box and "&frame" in the Replace box, the word "window" is replaced with "windowframe".

You can also enter an "&" in the Replace box to modify the Attributes or the Format of the string found by the search criteria.
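The "window" to "windowframe" replacement can be illustrated as follows. This is only an analogy: Python's `re` module does not understand "&" or "$0" in a replacement string, so the sketch uses Python's own spelling of "the whole match", `\g<0>`.

```python
import re

# In LibreOffice the Replace box would contain "&frame" or "$0frame".
# Python writes the whole-match reference as \g<0> instead.
result = re.sub(r"window", r"\g<0>frame", "open the window")

print(result)
```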

[...]

Any single occurrence of any one of the characters that are between the brackets. For example: "[abc123]" matches the characters ‘a’, ‘b’, ’c’, ‘1’, ‘2’ and ‘3’. "[a-e]" matches single occurrences of the characters a through e, inclusive (the range must be specified with the character having the smallest Unicode code number first). "[a-eh-x]" matches any single occurrence of the characters that are in the ranges ‘a’ through ‘e’ and ‘h’ through ‘x’.

[^...]

Any single occurrence of a character, including Tab, Space and Line Break characters, that is not in the list of specified characters. Inclusive ranges are permitted. For example, "[^a-syz]" matches all characters that are neither in the inclusive range ‘a’ through ‘s’ nor the characters ‘y’ and ‘z’.
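Bracket expressions behave the same way in most regular expression engines, so they are easy to experiment with. The following sketch uses Python's `re` module with hypothetical sample strings to show a plain set, a pair of ranges, and a negated set.

```python
import re

# A plain set: any one of the listed characters.
plain = re.findall(r"[abc123]", "a1z9c")

# Two inclusive ranges in one set: a-e and h-x ("f" and "g" fall in the gap).
ranges = re.findall(r"[a-eh-x]", "afgh")

# A negated set: anything except a-s, "y" and "z"; the space matches too.
negated = re.findall(r"[^a-syz]", "stuz x")

print(plain, ranges, negated)
```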

\uXXXX

The character represented by the four-digit hexadecimal Unicode code (XXXX).

\UXXXXXXXX

The character represented by the eight-digit hexadecimal Unicode code (XXXXXXXX).

note

For certain symbol fonts, the symbol (glyph) that you see on screen may appear to correspond to a different Unicode code than the one the font actually uses for it. The Unicode codes can be viewed by choosing Insert - Special Character, or by using the Unicode conversion shortcut.


\N{UNICODE CHARACTER NAME}

Match the Unicode named character.

Some notable Unicode named characters are SPACE, NO-BREAK SPACE, SOFT HYPHEN, ACUTE ACCENT, CIRCUMFLEX ACCENT, GRAVE ACCENT.

note

The Unicode character names can be searched and viewed by choosing Insert - Special Character.
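The Unicode escapes can be compared side by side in a short script. This sketch assumes Python 3.8 or later, whose `re` module accepts "\uXXXX", "\UXXXXXXXX" and "\N{...}" in patterns much like the forms described above. The sample text is hypothetical.

```python
import re

# Sample text containing U+00E9 (e with acute), U+00A0 (no-break space)
# and U+1F600 (an emoji outside the four-digit range).
text = "caf\u00e9\u00a0bar \U0001F600"

# Raw strings keep the backslash escapes intact so that the regex engine,
# not the Python string parser, interprets them.
e_acute = re.findall(r"\u00E9", text)
emoji = re.findall(r"\U0001F600", text)
nbsp = re.findall(r"\N{NO-BREAK SPACE}", text)

print(len(e_acute), len(emoji), len(nbsp))
```

Each search finds exactly one character, which shows that all three notations address a code point directly rather than a glyph.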


|

The infix operator delimiting alternatives. Matches the term preceding the "|" or the term following the "|". For example, "this|that" matches occurrences of both "this" and "that".

{N}

The post-fix repetition operator that specifies an exact number of occurrences ("N") of the regular expression term immediately preceding it must be present for a match to occur. For example, "tre{2}" matches "tree".

{N,M}

The post-fix repetition operator that specifies a range (minimum of "N" to a maximum of "M") of occurrences of the regular expression term immediately preceding it that can be present for a match to occur. For example, "tre{1,2}" matches "tre" and "tree".

{N,}

The post-fix repetition operator that specifies a range (minimum "N" to an unspecified maximum) of occurrences of the regular expression term immediately preceding it that can be present for a match to occur. (The maximum number of occurrences is limited only by the size of the document). For example, "tre{2,}" matches "tree", "treee", and "treeeee".
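The three repetition forms differ only in their bounds, which a small comparison makes visible. The sketch uses Python's `re` module, which shares this syntax with ICU; `full` is a hypothetical helper that requires the whole string to match.

```python
import re

def full(pattern, s):
    """Return True when the whole string s matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None

candidates = ["tre", "tree", "treee", "treeeee"]

exactly_two = [s for s in candidates if full(r"tre{2}", s)]    # {N}
one_or_two = [s for s in candidates if full(r"tre{1,2}", s)]   # {N,M}
two_or_more = [s for s in candidates if full(r"tre{2,}", s)]   # {N,}

print(exactly_two, one_or_two, two_or_more)
```

Note that the operator applies only to the term immediately before it, here the single character "e".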

(...)

The grouping construct that serves three purposes.

  1. To enclose a set of ‘|’ alternatives. For example, the regular expression "b(oo|ac)k" matches both "book" and "back".

  2. To group terms in a complex expression to be operated on by the post-fix operators "*", "+" and "?" along with the post-fix repetition operators. For example, the regular expression "a(bc)?d" matches both "ad" and "abcd" in a search; the regular expression "M(iss){2}ippi" matches "Mississippi".

  3. To record the matched sub string inside the parentheses as a reference for later use in the Find box using the "\n" construct or in the Replace box using the "$n" construct. The reference to the first match is represented by "\1" in the Find box and by "$1" in the Replace box. The reference to the second matched sub string by "\2" and "$2" respectively, and so on.

For example, the regular expression "(890)7\1\1" matches "8907890890".

With the regular expression "\b(fruit|truth)\b" in the Find box and the regular expression "$1ful" in the Replace box, occurrences of the words "fruit" and "truth" can be replaced with the words "fruitful" and "truthful" respectively without affecting the words "fruitfully" and "truthfully".
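The three uses of parentheses can be sketched in code. The example below uses Python's `re` module as an approximation of the ICU engine. One difference is an assumption worth stating: Python writes a reference in the replacement string as "\1" where the Replace box uses "$1". The sample sentence is hypothetical.

```python
import re

# 1. Enclosing alternatives.
alternatives = [w for w in ["book", "back", "beak"] if re.fullmatch(r"b(oo|ac)k", w)]

# 2. Grouping terms for a post-fix operator.
grouped = re.fullmatch(r"M(iss){2}ippi", "Mississippi") is not None

# 3a. Back-reference inside the search pattern: \1 repeats what group 1 matched.
backref = re.fullmatch(r"(890)7\1\1", "8907890890") is not None

# 3b. Reference in the replacement: Python's \1 plays the role of $1.
replaced = re.sub(r"\b(fruit|truth)\b", r"\1ful",
                  "fruit truth fruitfully truthfully")

print(alternatives, grouped, backref)
print(replaced)
```

The word boundaries keep "fruitfully" and "truthfully" unchanged, because "fruit" and "truth" are not followed by a boundary there.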

[:alpha:]

Represents an alphabetic character. Use [:alpha:] to find one of them.

[:digit:]

Represents a decimal digit. Use [:digit:] to find one of them.

[:alnum:]

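Represents an alphanumeric character (a letter or a digit).

[:space:]

Represents a space character (but not other whitespace characters).

[:print:]

Represents a printable character.

[:cntrl:]

Represents a nonprinting character.

[:lower:]

Represents a lowercase character if Match case is selected in Options.

[:upper:]

Represents an uppercase character if Match case is selected in Options.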


note

For a full list of supported metacharacters and syntax, see the ICU Regular Expressions documentation.


Regular expression terms can be combined to form complex and sophisticated regular expressions for searches, as shown in the following examples.

Examples

Expression

Meaning

^$

An empty paragraph.

^ specifies that the match must be at the start of a paragraph,

$ specifies that a paragraph mark or the end of a cell must follow the matched string.

^.

The first character of a paragraph.

^ specifies that the match must be at the start of a paragraph,

. specifies any single character.

e([:digit:])?

Matches "e" by itself or an "e" followed by one digit.

e specifies the character "e",

[:digit:] specifies any decimal digit,

? specifies zero or one occurrences of [:digit:].

^([:digit:])$

Matches a paragraph or cell containing exactly one digit.

^ specifies that the match must be at the start of a paragraph,

[:digit:] specifies any decimal digit,

$ specifies that a paragraph mark or the end of a cell must follow the matched string.

^[:digit:]{3}$

Matches a paragraph or cell containing only a three-digit number.

^ specifies that the match must be at the start of a paragraph,

[:digit:] specifies any decimal digit,

{3} specifies that [:digit:] must occur three times,

$ specifies that a paragraph mark or the end of a cell must follow the matched string.

\bconst(itu|ruc)tion\b

Matches the words "constitution" and "construction" but not the word "constitutional."

\b specifies that the match must begin at a word boundary,

const specifies the characters "const",

( starts the group,

itu specifies the characters "itu",

| specifies the alternative,

ruc specifies the characters "ruc",

) ends the group,

tion specifies the characters "tion",

\b specifies that the match must end at a word boundary.
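Several of the example expressions can be checked with a short script. This is a rough sketch under two assumptions: Python's `re` module stands in for the ICU engine, and because `re` does not support the "[:digit:]" class notation, the sketch substitutes "\d". Each item of the hypothetical list `paragraphs` stands in for one paragraph or cell.

```python
import re

# Hypothetical sample data: one list item per paragraph or cell.
paragraphs = ["", "7", "123", "1234", "e5"]

# ^$ : an empty paragraph.
empty = [p for p in paragraphs if re.search(r"^$", p)]

# ^(\d)$ : exactly one digit (stand-in for ^([:digit:])$).
one_digit = [p for p in paragraphs if re.search(r"^(\d)$", p)]

# ^\d{3}$ : exactly three digits (stand-in for ^[:digit:]{3}$).
three_digits = [p for p in paragraphs if re.search(r"^\d{3}$", p)]

# \bconst(itu|ruc)tion\b : whole words only, so "constitutional" is skipped.
words = "constitution construction constitutional"
whole_words = [m.group() for m in re.finditer(r"\bconst(itu|ruc)tion\b", words)]

print(empty, one_digit, three_digits)
print(whole_words)
```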

