What regular expression pattern is used to match non-printable characters?


Regular Expression Syntax

EmEditor regular expression syntax is based on Perl regular expression syntax.

Literals

All characters are literals except: ".", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^", "$", "|", and "\". These characters are literals when preceded by a "\". A literal is a character that matches itself. For example, searching for "\?" will match every "?" in the document, and searching for "Hello" will match every "Hello" in the document.
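
The same literal rules can be tried outside the editor. A minimal sketch, assuming Python's re module as a stand-in for EmEditor's engines (Boost.Regex and Onigmo); for these two patterns the syntax is the same:

```python
import re

text = "Hello? Is anyone there? Hello!"

# "\?" escapes the quantifier "?" so it matches a literal question mark.
question_marks = re.findall(r"\?", text)

# Ordinary characters are literals and match themselves.
hellos = re.findall(r"Hello", text)

print(question_marks)  # ['?', '?']
print(hellos)          # ['Hello', 'Hello']
```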

Metacharacters

The following tables contain the complete listing of metacharacters (non-literals) and their behavior in the context of regular expressions.

\

Marks the next character as a special character, a literal, or a back reference. For example, 'n' matches the character "n". '\n' matches a newline character. The sequence '\\' matches "\" and "\(" matches "(".

^

Matches the position at the beginning of the input string. For example, "^e" matches any "e" that begins a string.

$

Matches the position at the end of the input string. For example, "e$" matches any "e" that ends a string.
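
A short sketch of the backslash and the two anchors, again using Python's re module as a stand-in engine (an assumption; this is not EmEditor itself). Note that in Python "^" and "$" anchor to the whole string unless re.MULTILINE is passed:

```python
import re

# "\\" in a pattern matches one literal backslash; "\(" matches "(".
backslash = re.search(r"\\", "C:\\temp")
paren = re.search(r"\(", "f(x)")

# "^e" only matches an "e" at the very start of the input string.
starts = re.search(r"^e", "eagle")
not_start = re.search(r"^e", "bee")

# "e$" only matches an "e" at the very end of the input string.
ends = re.search(r"e$", "bee")

print(backslash.start(), paren.start())         # 2 1
print(starts.start(), not_start, ends.start())  # 0 None 2
```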

*

Matches the preceding character or sub-expression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.

+

Matches the preceding character or sub-expression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.

?

Matches the preceding character or sub-expression zero or one time. For example, "do(es)?" matches the "do" in "do" or "does". ? is equivalent to {0,1}.

{n}

n is a nonnegative integer. Matches exactly n times. For example, 'o{2}' does not match the "o" in "Bob" but matches the two o's in "food".

{n,}

n is a nonnegative integer. Matches at least n times. For example, 'o{2,}' does not match the "o" in "Bob" and matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.

{n,m}

m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers.
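
The quantifier examples above can be checked with Python's re module, used here only as a stand-in that shares this syntax with EmEditor's engines. re.fullmatch requires the whole string to match, which makes the zero/one/many distinctions visible:

```python
import re

# Zero or more / one or more, checked against the whole string.
zo_star = [bool(re.fullmatch(r"zo*", s)) for s in ("z", "zoo")]
zo_plus = [bool(re.fullmatch(r"zo+", s)) for s in ("z", "zo", "zoo")]

# Zero or one: the "es" group is optional.
does = [re.search(r"do(es)?", s).group() for s in ("do", "does")]

# Exact and bounded repeat counts.
exactly_two = (re.findall(r"o{2}", "Bob"), re.findall(r"o{2}", "food"))
at_least_two = re.search(r"o{2,}", "foooood").group()
one_to_three = re.search(r"o{1,3}", "fooooood").group()

print(zo_star, zo_plus)   # [True, True] [False, True, True]
print(does, exactly_two)  # ['do', 'does'] ([], ['oo'])
print(at_least_two, one_to_three)
```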

?

When this character immediately follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
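
A sketch of greedy versus non-greedy matching, assuming Python's re module as a stand-in; the tag string is a hypothetical illustration of where the difference usually matters:

```python
import re

greedy = re.search(r"o+", "oooo").group()   # takes as many o's as possible
lazy = re.search(r"o+?", "oooo").group()    # takes as few o's as possible

# The difference is most visible when something must follow the repeat.
html = "<b>bold</b>"
greedy_tag = re.search(r"<.+>", html).group()
lazy_tag = re.search(r"<.+?>", html).group()

print(greedy, lazy)          # oooo o
print(greedy_tag, lazy_tag)  # <b>bold</b> <b>
```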

.

Matches any single character. For example, ".e" will match text where any character precedes an "e", like "he", "we", or "me". In EmEditor Professional, it matches a newline character within the range specified in the Additional Lines to Search for Regular Expressions text box if the Regular Expression "." Can Match Newline Characters check box is checked.
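
A sketch of "." in Python's re module, used as a stand-in engine. Python has no Additional Lines to Search setting; its closest control is the re.DOTALL flag, which is only a loose analogue of the EmEditor Professional option:

```python
import re

# "." matches any single character except a newline by default.
pairs = re.findall(r".e", "he we me")

# Without a flag, "." does not cross a line break ...
plain = re.search(r"a.b", "a\nb")
# ... but with re.DOTALL it can match "\n" as well.
dotall = re.search(r"a.b", "a\nb", re.DOTALL)

print(pairs)                        # ['he', 'we', 'me']
print(plain, repr(dotall.group()))  # None 'a\nb'
```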

(pattern)

Parentheses serve two purposes: to group a pattern into a sub-expression and to capture what generated the match. For example, the expression "(ab)*" would match all of the string "ababab". Each sub-expression match is captured as a back reference (see below) numbered from left to right. To match parentheses characters ( ), use '\(' or '\)'.

(?<name>pattern)

Captures the string matched by "pattern" into the group "name".
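
A sketch of grouping and named capture using Python's re module as a stand-in. One real difference: Python's built-in re spells a named group "(?P<name>pattern)" rather than "(?<name>pattern)". The group names key and value are hypothetical:

```python
import re

# "(ab)*" groups "ab" so that "*" repeats the whole pair.
whole = re.fullmatch(r"(ab)*", "ababab")

# Python's spelling of a named group is (?P<name>...).
m = re.search(r"(?P<key>\w+)=(?P<value>\w+)", "mode=fast")
key, value = m.group("key"), m.group("value")

# "\(" and "\)" match literal parentheses.
call = re.search(r"\(\w+\)", "print(x)").group()

print(whole.group(0), whole.group(1))  # ababab ab
print(key, value, call)                # mode fast (x)
```

When a group repeats, the captured value is the last repetition, which is why group 1 holds "ab" rather than "ababab".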

\1 - \9

Indicates a back reference. A back reference is a reference to a previous sub-expression that has already been matched. The reference is to what the sub-expression matched, not to the expression itself. A back reference consists of the escape character "\" followed by a digit "1" to "9": "\1" refers to the first sub-expression, "\2" to the second, and so on. For example, "(a)\1" would capture "a" as the first back reference and match any text "aa". Back references can also be used with the Replace feature under the Search menu. Use regular expressions to locate a text pattern, and the matching text can be replaced by a specified back reference. For example, "(h)(e)" will find "he", and putting "\1" in the Replace With box will replace "he" with "h", whereas "\2\1" will replace "he" with "eh".

\k<name>

Indicates a named back reference. A named back reference is a reference to a previous named capturing group using this form: (?<name>expression). If "name" is a number, it indicates a numbered back reference, equivalent to \1, \2, \3, ...
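
The back-reference examples, reproduced with Python's re module as a stand-in engine. Here re.sub plays the role of the Replace With box, and Python writes a named back reference inside a pattern as "(?P=name)" instead of "\k<name>":

```python
import re

# "(a)\1" needs the text captured by group 1 to appear again right after it.
doubled = re.search(r"(a)\1", "banana aardvark").group()

# Back references in the replacement string, like the Replace With box.
first_only = re.sub(r"(h)(e)", r"\1", "he")
swapped = re.sub(r"(h)(e)", r"\2\1", "he")

# A named back reference inside a pattern is written (?P=name) in Python.
named = re.search(r"(?P<ch>o)(?P=ch)", "food").group()

print(doubled, first_only, swapped, named)  # aa h eh oo
```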

(?:pattern)

A subexpression that matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for possible later use with back references. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
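
One way to see the difference between capturing and non-capturing groups is re.findall in Python (a stand-in for EmEditor's engines): when a pattern contains a capturing group, findall returns only what that group captured:

```python
import re

text = "industry and industries"

# A capturing group makes findall return only the captured part ...
captured = re.findall(r"industr(y|ies)", text)
# ... while a non-capturing group returns the whole match.
whole = re.findall(r"industr(?:y|ies)", text)

print(captured)  # ['y', 'ies']
print(whole)     # ['industry', 'industries']
```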

(?=pattern)

A subexpression that performs a positive lookahead search, which matches the string at any point where a string matching pattern begins. For example, "x(?=abc)" matches an "x" only if it is followed by the expression "abc". This is a non-capturing match; that is, the match is not captured for possible later use with back references. pattern cannot contain a newline character.

(?!pattern)

A subexpression that performs a negative lookahead search, which matches the search string at any point where a string not matching pattern begins. For example, "x(?!abc)" matches an "x" only if it is not followed by the expression "abc". This is a non-capturing match; that is, the match is not captured for possible later use with back references. pattern cannot contain a newline character.
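
A sketch of both lookahead forms, assuming Python's re module as a stand-in. The match positions show that only the "x" is consumed; the lookahead text is checked but not included in the match:

```python
import re

text = "xabc xab"

# Positions of an "x" that is followed by "abc".
followed = [m.start() for m in re.finditer(r"x(?=abc)", text)]
# Positions of an "x" that is NOT followed by "abc".
not_followed = [m.start() for m in re.finditer(r"x(?!abc)", text)]

# The matched text is just "x"; "abc" is not part of the match.
first = re.search(r"x(?=abc)", text).group()

print(followed, not_followed, first)  # [0] [5] x
```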

(?<=pattern)

A subexpression that performs a positive lookbehind search, which matches the search string at any point where a string matching pattern ends. For example, "(?<=abc)x" matches an "x" only if it is preceded by the expression "abc". This is a non-capturing match; that is, the match is not captured for possible later use with back references. pattern cannot contain a newline character. pattern must be of fixed length.

(?<!pattern)

A subexpression that performs a negative lookbehind search, which matches the search string at any point where a string not matching pattern ends. For example, "(?<!abc)x" matches an "x" only if it is not preceded by the expression "abc". This is a non-capturing match; that is, the match is not captured for possible later use with back references. pattern cannot contain a newline character. pattern must be of fixed length.
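
A sketch of both lookbehind forms with Python's re module as a stand-in. Python enforces the same fixed-length restriction and raises re.error for a variable-length lookbehind:

```python
import re

text = "abcx yx"

preceded = [m.start() for m in re.finditer(r"(?<=abc)x", text)]
not_preceded = [m.start() for m in re.finditer(r"(?<!abc)x", text)]

# A variable-length lookbehind such as (?<=a+) is rejected at compile time.
try:
    re.compile(r"(?<=a+)x")
    variable_ok = True
except re.error:
    variable_ok = False

print(preceded, not_preceded, variable_ok)  # [3] [6] False
```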

x|y

Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
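
A sketch of alternation with Python's re module as a stand-in. The grouped example uses a non-capturing "(?:z|f)" instead of "(z|f)" so that re.findall returns the whole match rather than only the group:

```python
import re

text = "z food zood"

# Alternation spans the whole pattern: "z" OR "food".
either = re.findall(r"z|food", text)
# A group limits the alternation to the first letter.
grouped = re.findall(r"(?:z|f)ood", text)

print(either)   # ['z', 'food', 'z']
print(grouped)  # ['food', 'zood']
```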

[xyz]

A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".

[^xyz]

A negative character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".

[a-z]

A range of characters. Matches any character in the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' through 'z'.

[^a-z]

A negative range of characters. Matches any character not in the specified range. For example, '[^a-z]' matches any character not in the range 'a' through 'z'.
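
The character-set examples, checked with Python's re module as a stand-in engine:

```python
import re

word = "plain"

any_of = re.search(r"[abc]", word).group()    # first of a, b, or c
none_of = re.search(r"[^abc]", word).group()  # first character not a, b, or c

lowercase_runs = re.findall(r"[a-z]+", "Hello World")
non_lower = re.findall(r"[^a-z]", "Hello World")

print(any_of, none_of)   # a p
print(lowercase_runs)    # ['ello', 'orld']
print(non_lower)         # ['H', ' ', 'W']
```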

Character Classes

The following character classes are used inside a character set such as "[:classname:]". For instance, "[[:space:]]" is the set of all whitespace characters.

alnum

Any linguistic character or number: alphabetical, syllabary, or ideographic.

alpha

Any linguistic character: alphabetical, syllabary, or ideographic.

blank

Any blank character, either a space or a tab.

cntrl

Any control character.

digit

Any digit 0-9.

graph

Any graphical character.

lower

Any lowercase character a-z, and other lowercase characters.

print

Any printable character.

punct

Any punctuation character.

space

Any whitespace character.

upper

Any uppercase character A-Z, and other uppercase characters.

xdigit

Any hexadecimal digit character, 0-9, a-f and A-F.

word

Any word character - all alphanumeric characters plus the underscore.

unicode

Any character whose code is greater than 255. (Boost.Regex only)
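
Returning to the question in the title: using the classes above, a set such as "[^[:print:]]" (any character outside the print class) or "[[:cntrl:]]" should find non-printable characters in EmEditor. Python's re module has no POSIX classes, so this sketch substitutes hypothetical ASCII-only ranges; the ASCII_CLASSES table and posix_class helper are illustrations, and they are narrower than EmEditor's Unicode-aware classes:

```python
import re

# Hypothetical ASCII-only stand-ins for a few POSIX classes.
ASCII_CLASSES = {
    "digit": r"0-9",
    "xdigit": r"0-9A-Fa-f",
    "blank": r" \t",
    "cntrl": r"\x00-\x1f\x7f",  # ASCII control codes 0-31 and 127
    "print": r"\x20-\x7e",      # space through tilde
}


def posix_class(name: str, negate: bool = False) -> str:
    """Build a bracket expression such as "[0-9]" or its negation "[^0-9]"."""
    return "[" + ("^" if negate else "") + ASCII_CLASSES[name] + "]"


text = "tab\there, bell\x07, ok"
controls = re.findall(posix_class("cntrl"), text)
non_printable = re.findall(posix_class("print", negate=True), text)

print(controls == non_printable, controls)  # True ['\t', '\x07']
```

For plain ASCII text the control set and the non-printable set coincide, which is why both searches return the same characters here.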

Character Properties

Syntax:

\p{property-name}

\P{property-name}  (negative)

\p{^property-name}  (negative)  (Onigmo only)

The following property names can be used. For example, "\p{alnum}" is any alphanumeric character, and "\P{alnum}" is its negative form.

alnum

Any linguistic character or number: alphabetical, syllabary, or ideographic.

alpha

Any linguistic character: alphabetical, syllabary, or ideographic.

blank

Any blank character, either a space or a tab.

cntrl

Any control character.

digit

Any digit 0-9.

graph

Any graphical character.

lower

Any lowercase character a-z, and other lowercase characters.

print

Any printable character.

punct

Any punctuation character.

space

Any whitespace character.

upper

Any uppercase character A-Z, and other uppercase characters.

xdigit

Any hexadecimal digit character, 0-9, a-f and A-F.

word

Any word character - all alphanumeric characters plus the underscore.

unicode

Any character whose code is greater than 255. (Boost.Regex only)

ascii

Any ASCII character. (Onigmo only)

Hiragana

Any Hiragana character. (Onigmo only)

Katakana

Any Katakana character. (Onigmo only)

Han

Any Han character. (Onigmo only)

Hangul

Any Hangul character. (Onigmo only)

See Unicode Properties for the complete property name list. (Onigmo only)
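
Python's built-in re module has no "\p{...}" syntax. A sketch under that assumption classifies characters one at a time with the standard unicodedata module and str.isprintable(); the three helpers are hypothetical, rough analogues of "\p{cntrl}", "\p{print}" and "\p{Hiragana}", not exact equivalents of EmEditor's properties:

```python
import unicodedata


def is_control(ch: str) -> bool:
    # Rough analogue of \p{cntrl}: Unicode general category "Cc".
    return unicodedata.category(ch) == "Cc"


def is_printable(ch: str) -> bool:
    # Rough analogue of \p{print}: str.isprintable() rejects characters in
    # the Unicode "Other" and "Separator" categories, except the ASCII space.
    return ch.isprintable()


def is_hiragana(ch: str) -> bool:
    # Rough analogue of \p{Hiragana}, based on the character's Unicode name.
    return unicodedata.name(ch, "").startswith("HIRAGANA")


# "café", a tab, HIRAGANA LETTER A, a bell, a space, and LINE SEPARATOR.
text = "caf\u00e9\t\u3042\x07 \u2028"
controls = [ch for ch in text if is_control(ch)]
non_printable = [ch for ch in text if not is_printable(ch)]
hiragana = [ch for ch in text if is_hiragana(ch)]

print(controls, non_printable, hiragana)
```

Unlike the control check, the printable check also flags U+2028, a separator that is not a control character.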

Single character escape sequences

The following escape sequences are aliases for single characters:

0x07

\a

Bell character.

0x0C

\f

Form feed.

0x0A

\n

Newline character.

0x0D

\r

Carriage return.

0x09

\t

Tab character.

0x0B

\v

Vertical tab.

0x1B

\e

ASCII Escape character.

0dd

\0dd

An octal character code, where dd is one or more octal digits.

0xXX

\xXX

A hexadecimal character code, where XX is one or more hexadecimal digits (a Unicode character).

0xXXXX

\x{XXXX}

A hexadecimal character code, where XXXX is one or more hexadecimal digits (a Unicode character).

Z-'@'

\cZ

An ASCII escape sequence control-Z, where Z is any ASCII character greater than or equal to the character code for '@'.
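
A sketch of several escape sequences in Python's re module, used as a stand-in. Python accepts "\t", "\v", octal "\0dd" and "\xXX" in patterns, but spells "\x{XXXX}" as "\uXXXX" and rejects "\e" and "\cZ", so those two are shown in other forms:

```python
import re

text = "a\tb\x0bc\x1bd\u00e9"

tab = re.search(r"\t", text).start()          # \t, 0x09
vtab = re.search(r"\v", text).start()         # \v, 0x0B
octal_tab = re.search(r"\011", text).start()  # octal 011 is 0x09
escape = re.search(r"\x1b", text).start()     # hex form of ASCII Escape
e_acute = re.search(r"\u00e9", text).start()  # Python's form of \x{00E9}

# \cZ names the character whose code is ord("Z") - ord("@"), i.e. 0x1A.
ctrl_z = chr(ord("Z") - ord("@"))

print(tab, vtab, octal_tab, escape, e_acute, repr(ctrl_z))  # 1 3 1 5 7 '\x1a'
```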

Word Boundaries

The following escape sequences match the boundaries of words:

\<

Matches the start of a word. (Boost.Regex only)

\>

Matches the end of a word. (Boost.Regex only)

\b

Matches a word boundary (the start or end of a word).

\B

Matches only when not at a word boundary.
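
A sketch of "\b" and "\B" with Python's re module as a stand-in (Python has no "\<" or "\>"):

```python
import re

text = "cat concat cats cat."

# \b on both sides keeps "cat" from matching inside "concat" or "cats".
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \B requires a non-boundary, so this finds "cat" only inside a longer word.
inside = [m.start() for m in re.finditer(r"\Bcat", text)]

print(whole_words, inside)  # [0, 16] [7]
```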

Character class escape sequences

The following escape sequences can be used to represent entire character classes:

\w

Any word character - all alphanumeric characters plus the underscore.

\W

Complement of \w - finds any non-word character.

\s

Any whitespace character.

\S

Complement of \s.

\d

Any digit 0-9.

\D

Complement of \d.

\l

Any lower case character a-z.

\L

Complement of \l.

\u

Any upper case character A-Z.

\U

Complement of \u.

\C

Any single character, equivalent to '.'.
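
A sketch of the class escapes with Python's re module as a stand-in. Two differences to keep in mind: Python has no "\l", "\L", "\u" or "\U" classes (in Python "\u" starts a Unicode escape), and in str patterns "\d" and "\w" also match non-ASCII digits and letters unless re.ASCII is passed:

```python
import re

text = "id_42 = x-7"

words = re.findall(r"\w+", text)     # letters, digits and underscore
non_word = re.findall(r"\W", text)   # everything else
digits = re.findall(r"\d+", text)
no_space = re.findall(r"\S+", text)

# Without \l, an explicit range does the job of "any lowercase a-z".
lower = re.findall(r"[a-z]", "aBc")

print(words, non_word)            # ['id_42', 'x', '7'] [' ', '=', ' ', '-']
print(digits, no_space, lower)    # ['42', '7'] ['id_42', '=', 'x-7'] ['a', 'c']
```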

\Q

The begin quote operator, everything that follows is treated as a literal character until a \E end quote operator is found.

\E

The end quote operator, terminates a sequence begun with \Q.
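
Python's re module (used here as a stand-in) has no "\Q...\E"; the closest tool is re.escape(), which backslash-escapes the metacharacters in a string so that it is matched literally:

```python
import re

literal = "1+1=2 (true?)"
haystack = "check: 1+1=2 (true?)"

# re.escape() escapes "+", "(", "?" and the other metacharacters,
# much like wrapping the text in \Q...\E.
found = re.search(re.escape(literal), haystack)

# Unescaped, "+" and "?" act as quantifiers and the text no longer matches.
unescaped = re.search(literal, haystack)

print(found.group(), unescaped)  # 1+1=2 (true?) None
```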

Replacement Expressions

See Replacement Expression Syntax.

Notes

In Find in Files and in Replace in Files, the carriage return (\r) and the line feed (\n) must be specified carefully. See To Specify newline characters for details.

In order for some escape sequences to work in EmEditor, like "\l", "\u" and their complements, the Match Case option has to be selected.

Copyright Notice

The regular expression routines used in EmEditor use Boost library Regex++ and Onigmo.

Copyright (C) Dr John Maddock

Copyright (C) K. Takata, based on Oniguruma Copyright (C) by K. Kosako.

See Also

Q. What are examples of regular expressions?

To Specify newline characters

Boost.Regex: Regular Expression Syntax

Onigmo: Regular Expression Syntax

Onigmo: Unicode Properties

Copyright © 2003-2022 by Emurasoft, Inc.


Source: http://www.emeditor.org/en/howto_search_search_regexp_syntax.html
