
A First Look at Linux Traffic Control (TC)

2020-06-16 16:31:02

1.1 What Is Traffic Control

Traffic control is the umbrella term for the packet receive/transmit mechanisms and queueing systems on a router. It covers deciding, on an input interface, which packets to accept and at what rate, and, on an output interface, which packets to transmit, at what rate, and in what order.

Traditional traffic control involves shaping, scheduling, classifying, policing, dropping, and marking.

  • Shaping. A shaper keeps traffic at or below a configured rate by delaying packets. Packets are delayed in the output queue before transmission so that they leave at a steady rate, keeping the network traffic under a rate ceiling. This is what most users mean by traffic control.
  • Scheduling. Scheduling is arranging the packets in input and output queues. The most common scheduling method is FIFO (first in, first out). More broadly, any traffic control on an output queue can be called scheduling, since packets are being ordered for output.
  • Classifying. Classifying partitions traffic so it can be treated differently, for example by placing it into different output queues. Network devices can classify packets in many ways while receiving, routing, and transmitting them. Classifying includes marking packets; marking can be done by a single control unit at the network edge, or at every hop.
  • Policing. Policing, as part of traffic control, limits traffic. It is commonly used at the network edge to ensure a node does not consume more than its allocated bandwidth. A policer accepts packets up to a specified rate; when traffic exceeds that rate, it takes a configured action on the arriving packets. The harshest action is to drop the packet, although the packet could instead be reclassified.
  • Dropping. Dropping uses some mechanism, such as RED, to choose which packets to discard.
  • Marking. Marking installs a DSCP value in the packet, which other routers in a managed network can recognize and act on (typically used for DiffServ, Differentiated Services).
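As a small sketch of policing with a drop action (combining the policing and dropping elements above), the following commands rate-limit all IPv4 traffic on an assumed interface eth0; the interface name, handle, and rate are illustrative:

```shell
# Attach a root qdisc so filters have something to hang off (assumed: eth0).
tc qdisc add dev eth0 root handle 1: htb default 1

# Police all IPv4 traffic to 1mbit; packets beyond that rate are dropped.
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match u32 0 0 \
    police rate 1mbit burst 10k drop flowid 1:1
```

The `match u32 0 0` condition matches every packet, so the policer applies to all traffic entering the qdisc.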

1.2 Why Traffic Control Is Needed

An important difference between packet-switched and circuit-switched networks is that a packet-switched network is stateless, while a circuit-switched network (such as the telephone network) must maintain state. IP networks, like packet-switched networks generally, were designed to be stateless; in fact, statelessness is one of IP's fundamental strengths.

The drawback of statelessness is that different types of flows cannot be distinguished. With traffic control, however, an administrator can queue and differentiate packets based on their attributes. Traffic control can even be used to emulate a circuit-switched network, simulating stateful behavior on top of a stateless one.

There are many practical reasons to consider traffic control, and it has many meaningful applications. Below are examples of problems that traffic control can solve or mitigate. The list is not exhaustive; it merely introduces a few classes of problems that traffic control can address.

Common traffic control solutions

  • Cap total bandwidth at a known value with TBF, or with HTB plus child classes.
  • Limit the bandwidth of a specific user, service, or client with HTB classes and classifying, combined with filters.
  • Maximize TCP throughput on an asymmetric link by raising the priority of ACK packets, or by using wondershaper.
  • Reserve bandwidth for a particular application or user with HTB child classes and classifying.
  • Improve the performance of latency-sensitive applications with the PRIO (priority) mechanism inside HTB classes.
  • Manage surplus bandwidth with HTB's borrowing mechanism.
  • Achieve a fair distribution of all bandwidth with HTB's borrowing mechanism.
  • Drop a particular type of traffic with a policer plus a filter with a drop action.

1.3 How to Do Traffic Control

1.3.1 General Components of Traffic Control

Depending on the functionality required, a traffic control system roughly comprises the following components:

  • Scheduler
  • Classifier (optional)
  • Policer
  • Filter

The classifier is not mandatory; classless traffic control systems, for example, have none. The table below shows the corresponding components in the Linux implementation.

traditional element | Linux component
--------------------|----------------
shaping             | The class offers shaping capabilities.
scheduling          | A qdisc is a scheduler. Schedulers can be simple, such as the FIFO, or complex, containing classes and other qdiscs, such as HTB.
classifying         | The filter object performs the classification through the agency of a classifier object. Strictly speaking, Linux classifiers cannot exist outside of a filter.
policing            | A policer exists in the Linux traffic control implementation only as part of a filter.
dropping            | To drop traffic requires a filter with a policer which uses "drop" as an action.
marking             | The dsmark qdisc is used for marking.
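The mapping above corresponds directly to the tc subcommands; the sketch below (interface name, handles, and rates are illustrative) shows which Linux object each command creates:

```shell
tc qdisc  add dev eth0 root handle 1: htb           # scheduling: a qdisc
tc class  add dev eth0 parent 1: classid 1:1 \
          htb rate 1mbit                            # shaping: a class with a rate
tc filter add dev eth0 parent 1: protocol ip u32 \
          match ip dport 80 0xffff flowid 1:1       # classifying: a filter using the u32 classifier
```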

1.3.2 Linux TC

Linux TC provides powerful functionality covering every aspect of traffic control. Before using it, it helps to briefly understand the logic behind it.

Glossary of Linux TC traffic control terms:

  • Queueing Discipline (qdisc)

    An algorithm that manages the queue of a device, either incoming (ingress) or outgoing (egress).

  • root qdisc

    The root qdisc is the qdisc attached to the device.

  • Classless qdisc

    A qdisc with no configurable internal subdivisions.

  • Classful qdisc

    A classful qdisc contains multiple classes. Some of these classes contain a further qdisc, which may again be classful, but need not be. According to the strict definition, pfifo_fast is classful, because it contains three bands which are, in fact, classes. However, from the user's configuration perspective, it is classless as the classes can't be touched with the tc tool.

  • Classes

    A classful qdisc may have many classes, each of which is internal to the qdisc. A class, in turn, may have several classes added to it. So a class can have either a qdisc or another class as its parent. A leaf class is a class with no child classes. It has one qdisc attached to it, and this qdisc is responsible for sending the data from that class. When you create a class, a fifo qdisc is attached to it. When you add a child class, this qdisc is removed. For a leaf class, this fifo qdisc can be replaced with another, more suitable qdisc. You can even replace it with a classful qdisc so you can add extra classes.

  • Classifier

    Each classful qdisc needs to determine to which class it needs to send a packet. This is done using the classifier.

  • Filter

    Classification can be performed using filters. A filter contains a number of conditions which if matched, make the filter match.

  • Scheduling

    A qdisc may, with the help of a classifier, decide that some packets need to go out earlier than others. This process is called Scheduling, and is performed for example by the pfifo_fast qdisc mentioned earlier. Scheduling is also called 'reordering', but this is confusing.

  • Shaping

    The process of delaying packets before they go out to make traffic conform to a configured maximum rate. Shaping is performed on egress. Colloquially, dropping packets to slow traffic down is also often called Shaping.

  • Policing

    Delaying or dropping packets in order to make traffic stay below a configured bandwidth. In Linux, policing can only drop a packet and not delay it - there is no 'ingress queue'.

  • Work-Conserving

    A work-conserving qdisc always delivers a packet if one is available. In other words, it never delays a packet if the network adaptor is ready to send one (in the case of an egress qdisc).

  • non-Work-Conserving

    Some queues, like for example the Token Bucket Filter, may need to hold on to a packet for a certain time in order to limit the bandwidth. This means that they sometimes refuse to pass a packet, even though they have one available.
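A few of the terms above (filter, classifier, classful qdisc) can be made concrete with a hypothetical u32 filter carrying two match conditions, both of which must match; the interface, handle, address, and port are illustrative:

```shell
# Classify SSH traffic from 10.0.0.0/8 into class 1:10 of an existing
# classful qdisc with handle 1: (assumed to already exist on eth0).
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip src 10.0.0.0/8 \
    match ip dport 22 0xffff \
    flowid 1:10
```

Here u32 is the classifier embedded in the filter; the filter matches only when every `match` condition holds.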

1.3.3 Linux TC in Detail

First, note that Linux tc implements thorough control only in the egress direction; ingress control is limited. In short, it controls what you send, not what you receive.

Several important concepts in the implementation:

  • Queues. Queues are the fundamental concept of traffic control. Using queues together with other mechanisms, we can perform shaping, scheduling, and so on.

  • Token bucket. This is a crucial element. One way to control the dequeue rate is to directly count the packets or bytes leaving the queue, but accuracy then requires complex computation. The other approach, widely used in traffic control, is the token bucket: tokens are generated at a fixed rate, and a packet or byte may only be dequeued after taking a token from the bucket.

    An analogy: a crowd is queueing to ride the tour bus at an amusement park. Imagine a fixed route on which buses arrive at a fixed rate; everyone must wait for a bus to arrive before riding. The buses and the riders correspond to tokens and packets. This mechanism is rate limiting, or traffic shaping: within a fixed period, only some of the riders can board.

    Continuing the analogy, imagine a large number of buses parked at the station waiting for riders, but currently no riders at all. If a big crowd now arrives at once, they can all board immediately. Here the station corresponds to the bucket: a bucket holds a certain number of tokens, which can be consumed all at once regardless of when the packets arrive.

    To complete the analogy: buses arrive at the station at a fixed rate and, if nobody boards, they fill the station. That is, tokens enter the bucket at a fixed rate; if no tokens are used, the bucket fills up, while if they are consumed continuously, it never fills. The token bucket is the key idea for handling applications that generate bursty traffic, such as HTTP.

    The Token Bucket Filter queueing discipline (TBF qdisc) is a classic example of traffic shaping (the TBF section has a diagram that helps the reader visualize the token bucket). TBF generates tokens at a given rate and transmits data only when tokens are available in the bucket; the token is the basic idea behind shaping.
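    A minimal TBF sketch, assuming an interface named eth0; the rate, bucket size, and latency values are illustrative:

```shell
# Shape all egress traffic on eth0 to 1mbit.
#   rate    = token generation rate (the shaped rate)
#   burst   = bucket size, i.e. how much can be sent at once from a full bucket
#   latency = the longest a packet may wait for tokens before being dropped
tc qdisc add dev eth0 root tbf rate 1mbit burst 32kbit latency 400ms
```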

The main components of Linux tc are qdiscs, classes, and filters.

  • qdiscs divide into classful qdiscs and classless qdiscs. The difference is that a classful qdisc can contain multiple classes, allowing finer-grained control of traffic.

    • Common classless qdiscs: choke, codel, p/bfifo, fq, fq_codel, gred, hhf, ingress, mqprio, multiq, netem, pfifo_fast, pie, red, rr, sfb, sfq, tbf. Linux uses pfifo_fast by default.

    • Common classful qdiscs: ATM, CBQ, DRR, DSMARK, HFSC, HTB, PRIO, QFQ.

  • Classes exist only inside classful qdiscs (e.g. HTB and CBQ). A class can be complex: it may contain multiple child classes, or just a single child qdisc. In extremely complex traffic control scenarios, a class may even contain another classful qdisc.

    Any class can have any number of filters attached to it, which select a child class or apply an action to reorder or drop packets entering the class.

    A leaf class is a terminal class in a qdisc: it contains a qdisc (pfifo by default) and no child classes. Any class containing child classes is an inner class, not a leaf class.

  • Linux filters allow the user to classify packets onto output queues using one or more filters. A filter contains a classifier implementation; a common classifier is u32, which selects packets based on attributes of the packet.

Every qdisc and class needs a unique identifier, called a handle. Handles are written in major:minor form, and note that both parts are parsed as hexadecimal. Their use is demonstrated concretely in the examples.

Next we focus on classful qdiscs, and look at the flow of a packet.

  • flow within classful qdisc & class

    When traffic enters a classful qdisc, it needs to be sent to any of the classes within - it needs to be 'classified'. To determine what to do with a packet, the so called 'filters' are consulted. It is important to know that the filters are called from within a qdisc, and not the other way around!

    The filters attached to that qdisc then return with a decision, and the qdisc uses this to enqueue the packet into one of the classes. Each subclass may try other filters to see if further instructions apply. If not, the class enqueues the packet to the qdisc it contains.

    Besides containing other qdiscs, most classful qdiscs also perform shaping. This is useful to perform both packet scheduling (with SFQ, for example) and rate control. You need this in cases where you have a high speed interface (for example, ethernet) to a slower device (a cable modem).

  • How filters are used to classify traffic

    Recapping, a typical hierarchy might look like this:

                     1:   root qdisc
                      |
                     1:1    child class
                    /  |  \
                   /   |   \
                  /    |    \
                 /     |     \
              1:10   1:11   1:12   child classes
                |      |     |
                |     11:    |    leaf class
                |            |
               10:          12:   qdisc
              /    \       /    \
           10:1   10:2  12:1   12:2   leaf classes

But don't let this tree fool you! You should not imagine the kernel to be at the apex of the tree and the network below, that is just not the case. Packets get enqueued and dequeued at the root qdisc, which is the only thing the kernel talks to.

A packet might get classified in a chain like this: 1: -> 1:1 -> 1:12 -> 12: -> 12:2

The packet now resides in a queue in a qdisc attached to class 12:2. In this example, a filter was attached to each 'node' in the tree, each choosing a branch to take next. This can make sense. However, this is also possible: 1: -> 12:2

In this case, a filter attached to the root decided to send the packet directly to 12:2.

  • How packets are dequeued to the hardware

    When the kernel decides that it needs to extract packets to send to the interface, the root qdisc 1: gets a dequeue request, which is passed to 1:1, which is in turn passed to 10:, 11: and 12:, each of which queries its siblings, and tries to dequeue() from them. In this case, the kernel needs to walk the entire tree, because only 12:2 contains a packet.

    In short, nested classes ONLY talk to their parent qdiscs, never to an interface. Only the root qdisc gets dequeued by the kernel!

    The upshot of this is that classes never get dequeued faster than their parents allow. And this is exactly what we want: this way we can have SFQ in an inner class, which doesn't do any shaping, only scheduling, and have a shaping outer qdisc, which does the shaping.

1.3.4 Configuring and Using HTB

HTB is a classful qdisc implementing hierarchical, class-based traffic control, and is one of the most common tc setups on Linux. Let's look at how to configure it.

Configuring HTB takes four steps:

  • Create the root qdisc
  • Create classes
  • Create filters and attach them to classes
  • Add qdiscs to the leaf classes (optional)
#tc qdisc add dev eth0 root handle 1: htb default 30 //add the root qdisc; 1: is shorthand for 1:0
#tc class add dev eth0 parent 1: classid 1:1 htb rate 6mbit burst 15k //create a class under root 1:
#tc class add dev eth0 parent 1:1 classid 1:10 htb rate 5mbit burst 15k 
#tc class add dev eth0 parent 1:1 classid 1:20 htb rate 3mbit ceil 6mbit burst 15k 
#tc class add dev eth0 parent 1:1 classid 1:30 htb rate 1kbit ceil 6mbit burst 15k 
#tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10 //attach a qdisc to each leaf class (default is pfifo)
#tc qdisc add dev eth0 parent 1:20 handle 20: sfq perturb 10 
#tc qdisc add dev eth0 parent 1:30 handle 30: sfq perturb 10 
# Add filters to steer traffic directly into the corresponding classes:
#U32="tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32"
#$U32 match ip dport 80 0xffff flowid 1:10 //attach the filter to a class
#$U32 match ip sport 25 0xffff flowid 1:20

The parameters used when creating classes have the following meanings:

default

An optional parameter of the HTB qdisc, defaulting to 0. A value of 0 means unclassified traffic bypasses all the classes attached to the root qdisc and is dequeued at full hardware speed.

rate

Sets the minimum desired rate at which traffic is sent. It can be treated as the committed information rate (CIR), or as the guaranteed bandwidth of a given leaf class.

ceil

Sets the maximum desired rate at which traffic is sent. The borrowing mechanism determines the actual effect of this parameter. This rate can be called the "burst rate".

burst

The size of the rate bucket (see the token bucket section). HTB will dequeue burst bytes of packets before more tokens arrive.

cburst

The size of the ceil bucket (see the token bucket section). HTB will dequeue cburst bytes of packets before more ctokens arrive.

quantum

The key parameter by which HTB controls borrowing. Normally HTB computes a suitable quantum itself rather than having the user set one. Even slight adjustments to this value can have enormous effects on borrowing and shaping, because HTB uses it both to distribute traffic among the child classes (each of which should end up with a rate above rate and below ceil) and to decide how much data to dequeue from each child class.

r2q

Normally the quantum value is computed by HTB itself; with this parameter the user supplies a value to help HTB compute an optimal quantum for a class.
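As a rough sketch of the relationship (based on HTB's documented behavior; the numbers are illustrative), the default quantum is the class rate in bytes per second divided by r2q, and the kernel warns and clamps the result if it falls outside sane bounds:

```shell
# quantum = rate_in_bytes_per_sec / r2q (the kernel warns and clamps the
# result if it falls outside roughly [1000, 200000] bytes).
rate_bps=6000000      # the 6mbit rate from the example above
r2q=10                # HTB's default r2q
quantum=$(( rate_bps / 8 / r2q ))
echo "quantum=$quantum"   # prints quantum=75000
```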

mtu

The maximum packet size HTB uses when computing rates (it defaults to 1600 bytes). Packets larger than this are counted as giants and make the rate accounting inaccurate; see the giants counter in the statistics section.

prio

The priority of the class. In the round-robin process, classes with lower prio values are offered packets first; this is how delay-sensitive classes can be favored.

1.3.5 Ingress Traffic Control

The common approach to ingress traffic control is to redirect an interface's traffic to an ifb device and then apply traffic control on the ifb device's egress, indirectly controlling the ingress direction. A simple usage example:

#modprobe ifb    //the ifb module must be loaded

#ip link set dev ifb0 up txqueuelen 1000

#tc qdisc add dev eth1 ingress  //add the ingress qdisc

#tc filter add dev eth1 parent ffff: protocol ip u32 match u32 0 0 flowid 1:1 action mirred egress redirect dev ifb0   //redirect the traffic to ifb

#tc qdisc add dev ifb0 root netem delay 50ms loss 1%  //configure the ifb device; netem is used here, but you can also set up qdiscs, classes, and filters just as in the egress direction.
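To undo the setup above (the commands assume the same eth1/ifb0 names), delete the qdiscs; removing the ingress qdisc also removes the redirect filter attached to it:

```shell
tc qdisc del dev eth1 ingress   # removes the ingress qdisc and its filters
tc qdisc del dev ifb0 root      # removes the netem qdisc from ifb0
ip link set dev ifb0 down       # take the ifb device back down
```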

1.3.6 Viewing Statistics

  • Use tc qdisc show dev xx to view qdiscs
  • Use tc class show dev xx to view classes
  • Use tc filter show dev xx to view filters. Note that these all default to root, i.e. the egress rules; to view the ingress side, use tc filter show dev xx ingress.
The tc tool allows you to gather statistics of queueing disciplines in Linux. Unfortunately, the statistic results are not explained by the authors, so you often can't use them. Here I try to help you understand HTB's stats.
First, the stats of the whole HTB. The snippet below was taken during the simulation from chapter 3.

# tc -s -d qdisc show dev eth0
 qdisc pfifo 22: limit 5p
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0) 

 qdisc pfifo 21: limit 5p
 Sent 2891500 bytes 5783 pkts (dropped 820, overlimits 0) 

 qdisc pfifo 20: limit 5p
 Sent 1760000 bytes 3520 pkts (dropped 3320, overlimits 0) 

 qdisc htb 1: r2q 10 default 1 direct_packets_stat 0
 Sent 4651500 bytes 9303 pkts (dropped 4140, overlimits 34251) 

The first three disciplines are HTB's children. Let's ignore them, as PFIFO stats are self-explanatory.
overlimits tells you how many times the discipline delayed a packet. direct_packets_stat tells you how many packets were sent through the direct queue. The other stats are self-explanatory. Let's look at the class stats:

tc -s -d class show dev eth0
class htb 1:1 root prio 0 rate 800Kbit ceil 800Kbit burst 2Kb/8 mpu 0b 
    cburst 2Kb/8 mpu 0b quantum 10240 level 3 
 Sent 5914000 bytes 11828 pkts (dropped 0, overlimits 0) 
 rate 70196bps 141pps 
 lended: 6872 borrowed: 0 giants: 0

class htb 1:2 parent 1:1 prio 0 rate 320Kbit ceil 4000Kbit burst 2Kb/8 mpu 0b 
    cburst 2Kb/8 mpu 0b quantum 4096 level 2 
 Sent 5914000 bytes 11828 pkts (dropped 0, overlimits 0) 
 rate 70196bps 141pps 
 lended: 1017 borrowed: 6872 giants: 0

class htb 1:10 parent 1:2 leaf 20: prio 1 rate 224Kbit ceil 800Kbit burst 2Kb/8 mpu 0b 
    cburst 2Kb/8 mpu 0b quantum 2867 level 0 
 Sent 2269000 bytes 4538 pkts (dropped 4400, overlimits 36358) 
 rate 14635bps 29pps 
 lended: 2939 borrowed: 1599 giants: 0

I deleted the 1:11 and 1:12 classes to make the output shorter. As you can see, the parameters we set are shown, along with the level and DRR quantum information.
overlimits shows how many times the class was asked to send a packet but couldn't due to rate/ceil constraints (currently counted for leaves only).
rate and pps tell you the actual rate (averaged over 10 seconds) going through the class. It is the same rate as used by gating.
lended is the number of packets donated by this class (from its own rate), and borrowed counts packets for which we borrowed from the parent. Lends are always computed class-locally, while borrows are transitive (when 1:10 borrows from 1:2, which in turn borrows from 1:1, both the 1:10 and 1:2 borrow counters are incremented).
giants is the number of packets larger than the mtu set in the tc command. HTB will still work with these, but rates will not be accurate at all. Add mtu to your tc command (it defaults to 1600 bytes).

1.3.7 Miscellaneous Notes

  • Can't see the rate and similar fields when viewing statistics? For performance, the kernel disables them by default; enable them with echo 1 > /sys/module/sch_htb/parameters/htb_rate_est.


