
Regular Expressions in Shell Scripts

2020-06-16 16:29:15

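I. Definition of regular expressions

A regular expression (often abbreviated in code as regex, regexp, or RE) is a single string that describes and matches a set of strings conforming to a syntactic rule. Put simply, it is a way of matching strings: with a handful of special symbols you can quickly find, delete, or replace particular strings.
A regular expression is a text pattern made up of ordinary characters and metacharacters. The pattern describes one or more strings to match when searching text, and acts as a template that is compared against the string being searched. Ordinary characters include uppercase and lowercase letters, digits, punctuation, and some other symbols. Metacharacters are characters with a special meaning in a regular expression; they can specify how the preceding character (the one immediately before the metacharacter) may occur in the target text.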

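1. Basic regular expressions

Regular expression syntax comes in two flavors that differ in strictness and features: basic regular expressions (BRE) and extended regular expressions (ERE). Basic regular expressions are the most fundamental part of everyday regular expression use. Among the common Linux file-processing tools, grep and sed support basic regular expressions, while egrep and awk support extended regular expressions.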
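The practical difference between the two flavors shows up with braces. The following is a minimal sketch, assuming a grep that accepts -E (the modern spelling of egrep); the three sample lines are made up for illustration and are not the article's test.txt.

```shell
#!/bin/sh
# Hypothetical sample data used only for this illustration.
sample=$(mktemp)
printf '%s\n' 'a wood cross!' 'google is the best tools' 'o{2} is literal text here' > "$sample"

# BRE: unescaped braces are ordinary characters, so this finds only the literal text "o{2}".
bre_plain=$(grep -c 'o{2}' "$sample")

# BRE: escaped braces form an interval, here "two consecutive o".
bre_escaped=$(grep -c 'o\{2\}' "$sample")

# ERE (grep -E, equivalent to egrep): braces are metacharacters without escaping.
ere_plain=$(grep -E -c 'o{2}' "$sample")

printf 'BRE plain=%s, BRE escaped=%s, ERE=%s\n' "$bre_plain" "$bre_escaped" "$ere_plain"
rm -f "$sample"
```

The escaped BRE pattern and the unescaped ERE pattern select the same lines; only the spelling differs.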

First prepare a test file named test.txt with the following content:

[root@CentOS01 ~]# vim test.txt
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
PI=3.14148223023840-2382924893980--2383892948
a wood cross!
Actions speak louder than words

#wooood #
#woooood #
AxyzxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.

1) Basic regular expression examples:

[root@centos01 ~]# grep -n 'the' test.txt         <!--find a specific string; -n shows line numbers-->
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
[root@centos01 ~]# grep -in 'the' test.txt    <!--find a specific string; -i ignores case, -n shows line numbers-->
3:The home of Football on BBC Sport online.
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
[root@centos01 ~]# grep -vn 'the' test.txt    <!--find lines that do not contain the string; -v inverts the match, -n shows line numbers-->
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
7:PI=3.14148223023840-2382924893980--2383892948
8:a wood cross!
9:Actions speak louder than words
10:
11:
12:#wooood #
13:#woooood #
14:AxyzxyzxyzxyzxyzC
15:I bet this place is really spooky late at night!
16:Misfortunes never come alone/single.
17:I shouldn't have lett so tast.

2) Using brackets "[]" in grep to match a set of characters

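[root@centos01 ~]# grep -n 'sh[io]rt' test.txt      <!--brackets match a set of characters;
however many characters appear inside "[]", the set stands for exactly one character,
so "[io]" matches either "i" or "o"-->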
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants.
[root@centos01 ~]# grep -n 'oo' test.txt     <!--find a repeated single character-->
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
12:#wooood #
13:#woooood #
15:I bet this place is really spooky late at night!
[root@centos01 ~]# grep -n '[^w]oo' test.txt   <!--find "oo" not preceded by "w",
using the negated set "[^]"-->
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
12:#wooood #
13:#woooood #
15:I bet this place is really spooky late at night!
[root@centos01 ~]# grep -n '[^a-z]oo' test.txt        <!--find "oo" not preceded by a lowercase letter-->
3:The home of Football on BBC Sport online.
[root@centos01 ~]# grep -n '[0-9]' test.txt        <!--find lines containing a digit-->
4:the tongue is boneless but it breaks bones.12!
7:PI=3.14148223023840-2382924893980--2383892948
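The bracket patterns above can be checked in isolation. This is a small sketch on four lines borrowed from the test file; it also shows the POSIX class [[:digit:]] as an alternative spelling of [0-9]. Setting LC_ALL=C is an assumption made here so that ranges such as a-z behave the same on every system.

```shell
#!/bin/sh
# Assumption: force the C locale so that [a-z] means exactly the 26 lowercase ASCII letters.
LC_ALL=C
export LC_ALL

sample=$(mktemp)
cat > "$sample" <<'EOF'
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
EOF

# [io] matches exactly one character: either i or o.
set_hits=$(grep -n 'sh[io]rt' "$sample" | cut -d: -f1 | paste -sd, -)

# [^a-z]oo: "oo" preceded by one character that is not a lowercase letter.
negated_hits=$(grep -n '[^a-z]oo' "$sample" | cut -d: -f1 | paste -sd, -)

# [[:digit:]] is the POSIX character class equivalent to [0-9].
digit_hits=$(grep -n '[[:digit:]]' "$sample" | cut -d: -f1 | paste -sd, -)

printf 'sh[io]rt: %s\n[^a-z]oo: %s\n[[:digit:]]: %s\n' "$set_hits" "$negated_hits" "$digit_hits"
rm -f "$sample"
```

Each variable holds the matching line numbers joined by commas.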

3) Matching the start "^" and the end "$" of a line with grep

[root@centos01 ~]# grep -n '^the' test.txt      <!--find lines that begin with "the"-->
4:the tongue is boneless but it breaks bones.12!
[root@centos01 ~]# grep -n '^[a-z]' test.txt      <!--find lines that begin with a lowercase letter-->
1:he was short and fat.
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
8:a wood cross!
[root@centos01 ~]# grep -n '^[A-Z]' test.txt       <!--find lines that begin with an uppercase letter-->
2:He was wearing a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
6:The year ahead will test our political establishment to the limit.
7:PI=3.14148223023840-2382924893980--2383892948
9:Actions speak louder than words
14:AxyzxyzxyzxyzxyzC
15:I bet this place is really spooky late at night!
16:Misfortunes never come alone/single.
17:I shouldn't have lett so tast.
[root@centos01 ~]# grep -n '^[^a-zA-Z]' test.txt    <!--find lines that do not begin with a letter-->
12:#wooood #
13:#woooood #
[root@centos01 ~]# grep -n 'w..d' test.txt      <!--"." matches any single character; "*" (used in the next examples) repeats the preceding character-->
5:google is the best tools for search keyword.
8:a wood cross!
9:Actions speak louder than words
[root@centos01 ~]# grep -n 'ooo*' test.txt     <!--find strings containing at least two consecutive o's-->
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
11:#woood #
13:#woooooood #
19:I bet this place is really spooky late at night!
[root@centos01 ~]# grep -n 'woo*d' test.txt      <!--find strings that start with w, end with d, and have at least one o in between-->
8:a wood cross!
11:#woood #
13:#woooooood #
[root@centos01 ~]# grep -n '[0-9][0-9]*' test.txt   <!--find lines containing a run of one or more digits-->
4:the tongue is boneless but it breaks bones.12!
7:PI=3.141592653589793238462643383249901429
[root@centos01 ~]# grep -n 'o\{2\}' test.txt       <!--find two consecutive o's using the interval "\{\}"; in basic regular expressions the braces must be escaped-->
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
11:#woood #
13:#woooooood #
19:I bet this place is really spooky late at night!
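To see how "*" and the interval forms differ, the sketch below counts matches against five made-up words that vary only in the number of o's. The word list is hypothetical; the patterns are ordinary basic regular expressions.

```shell
#!/bin/sh
# Hypothetical word list: zero to four o's between w and d.
sample=$(mktemp)
printf '%s\n' 'wd' 'wod' 'wood' 'woood' 'wooood' > "$sample"

# o* means zero or more o, so wo*d also matches "wd".
star=$(grep -c 'wo*d' "$sample")

# oo* means one or more o.
one_or_more=$(grep -c 'woo*d' "$sample")

# \{2\} means exactly two; the surrounding w and d keep longer runs from matching.
exactly_two=$(grep -c 'wo\{2\}d' "$sample")

# \{2,3\} means two to three; \{3,\} means three or more.
two_to_three=$(grep -c 'wo\{2,3\}d' "$sample")
three_plus=$(grep -c 'wo\{3,\}d' "$sample")

printf '%s %s %s %s %s\n' "$star" "$one_or_more" "$exactly_two" "$two_to_three" "$three_plus"
rm -f "$sample"
```

Without the anchoring w and d, a pattern such as o\{2\} also matches inside longer runs of o, which is why it selects the same lines as 'oo' in the transcript above.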

2. Summary of metacharacters

<img src="https://s1.51cto.com/images/blog/201911/09/07a1db13ccef928a82d18582046e1a41.png" alt="Regular Expressions in Shell Scripts" />

II. Extended regular expression metacharacters

<img src="https://s1.51cto.com/images/blog/201911/09/6a07d0597ea832b49889593b878078a2.png" alt="Regular Expressions in Shell Scripts" />

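III. Text processors

Linux/UNIX systems include many text processors and text editors, among them the Vim editor and grep. grep, sed, and awk in particular are used so often in shell programming that they are nicknamed the "three swordsmen" of shell programming.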

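1. The sed tool

sed (Stream EDitor) is a powerful yet simple tool for parsing and transforming text. It reads text, edits it according to the given conditions (deleting, substituting, adding, moving, and so on), and finally outputs either all lines or only the lines it processed. Because sed can carry out fairly complex text processing without user interaction, it is widely used in shell scripts to automate tasks.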

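sed works in a cycle of three steps: read, execute, and display.

  • Read: sed reads one line from the input stream (a file, a pipe, or standard input) and stores it in a temporary buffer called the pattern space.
  • Execute: by default, all sed commands are applied to the pattern space in order. Unless a line address is given, the commands run on every line in turn.
  • Display: the modified content is sent to the output stream. After the data has been sent, the pattern space is cleared. The cycle repeats until all input has been processed.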
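The three steps can be mimicked in plain shell. The sketch below is only an illustration of the cycle, not how sed is implemented: a hypothetical function sed_cycle imitates the single command s/the/THE/ with a shell variable standing in for the pattern space, and its output is compared with real sed on made-up input.

```shell
#!/bin/sh
# sed_cycle is a hypothetical helper that imitates: sed 's/the/THE/'
sed_cycle() {
    # Read: take one line at a time into a variable that plays the pattern space.
    while IFS= read -r pattern_space || [ -n "$pattern_space" ]; do
        # Execute: replace the first "the" on the line, if there is one.
        case $pattern_space in
            *the*) pattern_space="${pattern_space%%the*}THE${pattern_space#*the}" ;;
        esac
        # Display: send the (possibly modified) line to the output stream.
        printf '%s\n' "$pattern_space"
        # The next read overwrites the variable, much like the cleared pattern space.
    done
}

input='he was short and fat.
google is the best tools for search keyword.
to the limit and then the end'

emulated=$(printf '%s\n' "$input" | sed_cycle)
real=$(printf '%s\n' "$input" | sed 's/the/THE/')

printf '%s\n' "$emulated"
```

Both variables should hold the same text, with only the first "the" on each line changed.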

2. Common usage of the sed command

sed [options] 'command' arguments
sed [options] -f scriptfile arguments

The most common sed options are:

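  • -e or --expression=: process the input text with the given command or script.
  • -f or --file=: process the input text with the given script file.
  • -h or --help: show help.
  • -n, --quiet, or --silent: suppress automatic printing, so that only explicitly printed results are shown.
  • -i: edit the text file in place.

The "command" specifies the action to perform on the file. It usually takes the form "[n1[,n2]]action". n1 and n2 are optional line numbers that select the lines to operate on; for example, to act on lines 5 through 20, write "5,20action". Common actions are:

  • a: append; add a line with the given text below the current line.
  • c: change; replace the selected lines with the given text.
  • d: delete; delete the selected lines.
  • i: insert; add a line with the given text above the selected line.
  • p: print. With an address, print the addressed lines; without one, print everything. It is usually combined with the "-n" option.
  • s: substitute; replace the specified text.
  • y: transliterate characters.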
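The actions a, i, c, and y are not exercised in the transcripts of this article, so here is a brief sketch of them on a made-up three-line file. It uses the portable spelling in which a, i, and c are followed by a backslash, a newline, and then the text.

```shell
#!/bin/sh
# Hypothetical three-line input.
sample=$(mktemp)
printf '%s\n' 'first' 'second' 'third' > "$sample"

# a: append a line after line 1.
appended=$(sed '1a\
added after first' "$sample")

# i: insert a line before line 1.
inserted=$(sed '1i\
added before first' "$sample")

# c: replace line 2 with new text.
changed=$(sed '2c\
replaced second' "$sample")

# y: transliterate characters one for one (s becomes S, t becomes T).
translated=$(sed 'y/st/ST/' "$sample")

# -n with p: print only line 2; without -n the line would be shown twice.
only_second=$(sed -n '2p' "$sample")

printf '%s\n---\n%s\n---\n%s\n' "$appended" "$changed" "$translated"
rm -f "$sample"
```

None of these commands changes the file itself, because -i is not used; the results go to standard output only.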

3. Usage examples

1) Printing text that matches a condition (p means print)

[root@centos01 ~]# sed -n '3p' test.txt        <!--print line 3-->
The home of Football on BBC Sport online.
[root@centos01 ~]# sed -n '3,5p' test.txt  <!--print lines 3 through 5-->
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
[root@centos01 ~]# sed -n 'p;n' test.txt       <!--print all odd-numbered lines-->
he was short and fat.
The home of Football on BBC Sport online.
google is the best tools for search keyword.
PI=3.141592653589793238462643383249901429
Actions speak louder than words
#woood #
#woooooood #

I bet this place is really spooky late at night!
I shouldn't have lett so tast.
[root@centos01 ~]# sed -n 'n;p' test.txt     <!--print all even-numbered lines-->
He was wearing a blue polo shirt with black pants.
the tongue is boneless but it breaks bones.12!
The year ahead will test our political establishment to the limit.
a wood cross!

AxyzxyzxyzxyzC

Misfortunes never come alone/single.
[root@centos01 ~]# sed -n '1,5{p;n}' test.txt <!--print the odd-numbered lines between line 1 and line 5-->
he was short and fat.
The home of Football on BBC Sport online.
google is the best tools for search keyword.

[root@centos01 ~]# sed -n '10,${n;p}' test.txt       <!--from line 10 to the end of the file, print every second line (lines 11, 13, ...)-->
#woood #
#woooooood #

I bet this place is really spooky late at night!
I shouldn't have lett so tast.

2) Combining sed with regular expressions

[root@centos01 ~]# sed -n '/the/p' test.txt <!--print lines containing "the"-->
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
[root@centos01 ~]# sed -n '4,/the/p' test.txt <!--print from line 4 through the next line containing "the"-->
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
[root@centos01 ~]# sed -n '/the/=' test.txt       <!--print the line numbers of lines containing "the";
the equals sign (=) prints the line number-->
4
5
6
[root@centos01 ~]# sed -n '/^PI/p' test.txt <!--print lines that begin with PI-->
PI=3.141592653589793238462643383249901429
[root@centos01 ~]# sed -n '/\<wood\>/p' test.txt  <!--print lines containing the word wood;
\< and \> mark word boundaries-->
a wood cross!
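The effect of the word boundaries is easier to see next to an unanchored pattern. The sketch below uses three made-up lines and assumes GNU sed, where \< and \> are available as an extension; grep -w is shown as a commonly available alternative.

```shell
#!/bin/sh
# Hypothetical sample: "wood" as a word, as a prefix, and as a suffix.
sample=$(mktemp)
printf '%s\n' 'a wood cross!' 'the woodpecker' 'plywood sheets' > "$sample"

# Without boundaries, "wood" also matches inside longer words.
loose=$(sed -n '/wood/p' "$sample" | wc -l | tr -d ' ')

# \< anchors at the start of a word and \> at its end (GNU extension, assumed here).
strict=$(sed -n '/\<wood\>/p' "$sample")

# grep -w selects whole-word matches in a similar way.
strict_grep=$(grep -w 'wood' "$sample")

printf 'unanchored matches: %s\nword matches: %s\n' "$loose" "$strict"
rm -f "$sample"
```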

3) Deleting lines that match a condition (d)

[root@centos01 ~]# nl test.txt | sed '3d'    <!--delete line 3-->
     1  he was short and fat.
     2  He was wearing a blue polo shirt with black pants.
     4  the tongue is boneless but it breaks bones.12!
     5  google is the best tools for search keyword.
     6  The year ahead will test our political establishment to the limit.
     7  PI=3.141592653589793238462643383249901429
     8  a wood cross!
     9  Actions speak louder than words
    10  
    11  #woood #
    12  
    13  #woooooood #
    14  
    15  
    16  AxyzxyzxyzxyzC
    17  
    18  
    19  I bet this place is really spooky late at night!
    20  Misfortunes never come alone/single.
    21  I shouldn't have lett so tast.
[root@centos01 ~]# nl test.txt | sed '3,5d'   <!--delete lines 3 through 5-->
     1  he was short and fat.
     2  He was wearing a blue polo shirt with black pants.
     6  The year ahead will test our political establishment to the limit.
     7  PI=3.141592653589793238462643383249901429
     8  a wood cross!
     9  Actions speak louder than words
    10  
    11  #woood #
    12  
    13  #woooooood #
    14  
    15  
    16  AxyzxyzxyzxyzC
    17  
    18  
    19  I bet this place is really spooky late at night!
    20  Misfortunes never come alone/single.
    21  I shouldn't have lett so tast.
[root@centos01 ~]# sed '/^[a-z]/d' test.txt <!--delete lines that begin with a lowercase letter-->
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
The year ahead will test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
Actions speak louder than words

#woood #

#woooooood #

AxyzxyzxyzxyzC

I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
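A few more deletion patterns follow the same shape. This sketch runs them on a made-up five-line file and also confirms that, without -i, the file on disk keeps all of its lines.

```shell
#!/bin/sh
# Hypothetical five-line input with two empty lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
he was short and fat.

#woood #

Misfortunes never come alone/single.
EOF

# /^$/d deletes empty lines.
no_blank=$(sed '/^$/d' "$sample")

# /^#/d deletes lines that begin with "#"; count what remains.
no_comment_count=$(sed '/^#/d' "$sample" | wc -l | tr -d ' ')

# 2,$d deletes from line 2 through the last line ($), leaving only line 1.
first_only=$(sed '2,$d' "$sample")

# The file itself is untouched because -i was not given.
lines_on_disk=$(wc -l < "$sample" | tr -d ' ')

printf '%s\n' "$no_blank"
rm -f "$sample"
```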

4) Substituting text that matches a condition

[root@centos01 ~]# sed 's/the/THE/' test.txt <!--replace the first "the" on each line with "THE"-->
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
THE tongue is boneless but it breaks bones.12!
google is THE best tools for search keyword.
The year ahead will test our political establishment to THE limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words

#woood #

#woooooood #

AxyzxyzxyzxyzC

I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
[root@centos01 ~]# sed 's/l/L/2' test.txt <!--replace the second "l" on each line with "L"-->
he was short and fat.
He was wearing a blue poLo shirt with black pants.
The home of FootbalL on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tooLs for search keyword.
The year ahead wilL test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words

#woood #

#woooooood #

AxyzxyzxyzxyzC

I bet this place is reaLly spooky late at night!
Misfortunes never come alone/singLe.
I shouldn't have Lett so tast.
[root@centos01 ~]# sed 's/^/#/' test.txt  <!--insert # at the start of every line-->
#he was short and fat.
#He was wearing a blue polo shirt with black pants.
#The home of Football on BBC Sport online.
#the tongue is boneless but it breaks bones.12!
#google is the best tools for search keyword.
#The year ahead will test our political establishment to the limit.
#PI=3.141592653589793238462643383249901429
#a wood cross!
#Actions speak louder than words
#
##woood #
#
##woooooood #
#
#
#AxyzxyzxyzxyzC
#
#
#I bet this place is really spooky late at night!
#Misfortunes never come alone/single.
#I shouldn't have lett so tast.
[root@centos01 ~]# sed '/the/s/o/0/g' test.txt  <!--on every line containing "the", replace every o with 0-->
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the t0ngue is b0neless but it breaks b0nes.12!
g00gle is the best t00ls f0r search keyw0rd.
The year ahead will test 0ur p0litical establishment t0 the limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words

#woood #

#woooooood #

AxyzxyzxyzxyzC

I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
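The s command has a few more features that fit the same examples: the & placeholder, back-references, and alternative delimiters. The sketch below applies them to one line from the test file and to a made-up path; it is an illustration, not part of the original transcripts.

```shell
#!/bin/sh
line='the tongue is boneless but it breaks bones.12!'

# g flag: replace every match on the line, not only the first.
all_o=$(printf '%s\n' "$line" | sed 's/o/0/g')

# & in the replacement stands for the whole matched text.
bracketed=$(printf '%s\n' "$line" | sed 's/[0-9][0-9]*/[&]/')

# \( \) captures a group; \1 and \2 refer back to the groups (here the first two words swap places).
swapped=$(printf '%s\n' "$line" | sed 's/^\([a-z]*\) \([a-z]*\)/\2 \1/')

# Another delimiter such as # avoids escaping slashes in the text (hypothetical path).
path=$(printf '%s\n' '/usr/local/bin' | sed 's#/usr/local#/opt#')

printf '%s\n%s\n%s\n%s\n' "$all_o" "$bracketed" "$swapped" "$path"
```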

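4. The awk tool

On Linux/UNIX systems, awk is a powerful text-processing tool. It reads the input text line by line, searches it according to the given pattern, and formats or filters the content that matches. It can carry out fairly complex text operations without user interaction and is widely used in shell scripts for all kinds of automation tasks.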

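1) Common awk usage

awk is usually invoked in one of the forms shown below. The single quotes enclose the program, and the braces "{}" contain the action to perform on the data. awk can process the target files with a program given directly on the command line, or read the program from a script file with "-f".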

awk [options] 'pattern or condition {action}' file1 file2 ...
awk -f scriptfile file1 file2 ...
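The two invocation forms are interchangeable. As a minimal sketch, the same one-rule program is run below once inline and once from a script file; the file names come from mktemp and the three data lines are borrowed from the test file.

```shell
#!/bin/sh
script=$(mktemp)
data=$(mktemp)
printf '%s\n' 'he was short and fat.' 'a wood cross!' 'Actions speak louder than words' > "$data"

# The script file holds the same "pattern { action }" text that would go inside the quotes.
cat > "$script" <<'EOF'
/wo/ { print NR ": " $0 }
EOF

# Form 1: program given on the command line.
inline=$(awk '/wo/ { print NR ": " $0 }' "$data")

# Form 2: program read from a file with -f.
from_file=$(awk -f "$script" "$data")

printf '%s\n' "$from_file"
rm -f "$script" "$data"
```

A script file is convenient once the program grows beyond one line, since it avoids shell quoting altogether.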

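awk provides several special built-in variables that can be used directly:

  • NF: the number of fields in the line being processed.
  • FS: the field separator for each line of text; by default, spaces or tabs.
  • NR: the number (ordinal position) of the line being processed.
  • $0: the entire content of the line being processed.
  • FILENAME: the name of the file being processed.
  • RS: the record separator; the default is \n, so each line is one record.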
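The built-in variables can be inspected directly. The following sketch prints a few of them for two lines from the test file, and uses made-up inputs to show FS (set through -F) and RS.

```shell
#!/bin/sh
data=$(mktemp)
printf '%s\n' 'he was short and fat.' 'a wood cross!' > "$data"

# NR is the line number, NF the number of fields, and $NF the last field.
report=$(awk '{ print NR, NF, $NF }' "$data")

# FILENAME holds the name of the file currently being read.
fname=$(awk 'NR == 1 { print FILENAME }' "$data")

# -F sets FS; with ":" as the separator the line splits like an /etc/passwd entry.
shell_field=$(printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' | awk -F: '{ print $1, $NF }')

# RS can be changed as well: with RS set to ";" each chunk between semicolons is a record.
records=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } END { print NR }')

printf '%s\n%s\n%s\n' "$report" "$shell_field" "$records"
rm -f "$data"
```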

2) Usage examples

[root@centos01 ~]# awk '{print}' test.txt  <!--print all content-->
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words

#woood #

#woooooood #

AxyzxyzxyzxyzC

I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
[root@centos01 ~]# awk 'NR==1,NR==3{print}' test.txt <!--print lines 1 through 3-->
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
[root@centos01 ~]# awk '(NR%2)==1{print}' test.txt   <!--print all odd-numbered lines-->
he was short and fat.
The home of Football on BBC Sport online.
google is the best tools for search keyword.
PI=3.141592653589793238462643383249901429
Actions speak louder than words
#woood #
#woooooood #

I bet this place is really spooky late at night!
I shouldn't have lett so tast.
[root@centos01 ~]# awk '(NR%2)==0{print}' test.txt   <!--print all even-numbered lines-->
He was wearing a blue polo shirt with black pants.
the tongue is boneless but it breaks bones.12!
The year ahead will test our political establishment to the limit.
a wood cross!

AxyzxyzxyzxyzC

Misfortunes never come alone/single.
[root@centos01 ~]# awk '/^root/{print}' /etc/passwd  <!--print lines that begin with root-->
root:x:0:0:root:/root:/bin/bash
[root@centos01 ~]# awk '{print $1 $3}' test.txt <!--print fields 1 and 3 of each line; without a comma they are joined with no separator-->
heshort
Hewearing
Theof
theis
googlethe
Theahead
PI=3.141592653589793238462643383249901429
across!
Actionslouder

#woood

#woooooood

AxyzxyzxyzxyzC

Ithis
Misfortunescome
Ihave
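The run-together output above comes from how print treats its arguments. A short sketch on the first line of the test file shows the difference between juxtaposition and a comma; the OFS value "-" is chosen arbitrarily for illustration.

```shell
#!/bin/sh
line='he was short and fat.'

# Adjacent expressions are concatenated with nothing between them.
joined=$(printf '%s\n' "$line" | awk '{ print $1 $3 }')

# A comma inserts the output field separator OFS, a single space by default.
spaced=$(printf '%s\n' "$line" | awk '{ print $1, $3 }')

# OFS can be set explicitly, here to a dash.
dashed=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "-" } { print $1, $3 }')

printf '%s\n%s\n%s\n' "$joined" "$spaced" "$dashed"
```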
