Fast and Memory-Efficient Regular Expression
Matching for Deep Packet Inspection
Fang Yu
UC Berkeley
fyu@eecs.berkeley.edu
Zhifeng Chen
Google Inc.
zhifengc@google.com
Yanlei Diao
University of Massachusetts,
Amherst
yanlei@cs.umass.edu
T. V. Lakshman
Bell Laboratories, Lucent Technologies
lakshman@research.bell-labs.com
Randy H. Katz
UC Berkeley
randy@eecs.berkeley.edu
ABSTRACT
Packet content scanning at high speed has become ex-
tremely important due to its applications in network secu-
rity, network monitoring, HTTP load balancing, etc. In
content scanning, the packet payload is compared against a
set of patterns specified as regular expressions. In this pa-
per, we first show that memory requirements using tradi-
tional methods are prohibitively high for many patterns
used in packet scanning applications. We then propose
regular expression rewrite techniques that can effectively
reduce memory usage. Further, we develop a grouping
scheme that can strategically compile a set of regular ex-
pressions into several engines, resulting in remarkable im-
provement of regular expression matching speed without
much increase in memory usage. We implement a new
DFA-based packet scanner using the above techniques. Our
experimental results using real-world traffic and patterns
show that our implementation achieves a factor of 12 to 42
performance improvement over a commonly used DFA-
based scanner. Compared to the state-of-the-art NFA-based implementation, our DFA-based packet scanner achieves a 50 to 700 times speedup.
Categories and Subject Descriptors
C.2.0 [Computer Communication Networks]: General –
Security and protection (e.g., firewalls)
General Terms
Algorithms, Design, Security.
Keywords
Regular expressions, DFA, intrusion detection, deep packet in-
spection.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise,
or republish, to post on servers or to redistribute to lists, requires prior
specific permission and/or a fee.
ANCS'06, December 3–5, 2006, San Jose, California, USA.
Copyright 2006 ACM 1-59593-580-0/06/0012...$5.00.
1. INTRODUCTION
Packet content scanning (also known as Layer-7 filtering or pay-
load scanning) is crucial to network security and network moni-
toring applications. In these applications, the payload of packets
in a traffic stream is matched against a given set of patterns to
identify specific classes of applications, viruses, protocol defini-
tions, etc.
Currently, regular expressions are replacing explicit string
patterns as the pattern matching language of choice in packet
scanning applications. Their widespread use is due to their ex-
pressive power and flexibility for describing useful patterns. For
example, in the Linux Application Protocol Classifier (L7-filter)
[1], all protocol identifiers are expressed as regular expressions.
Similarly, the Snort [2] intrusion detection system has evolved
from no regular expressions in its ruleset in April 2003 to 1131
out of 4867 rules using regular expressions as of February 2006.
Another intrusion detection system, Bro [3], also uses regular
expressions as its pattern language.
As regular expressions gain widespread adoption for packet
content scanning, it is imperative that regular expression matching
over the packet payload keep up with the line-speed packet header
processing. Unfortunately, this requirement cannot be met in
many existing payload scanning implementations. For example,
when all 70 protocol filters are enabled in the Linux L7-filter [1], we found that the system throughput drops to less than 10 Mbps, well below current LAN speeds. Moreover, over 90% of
the CPU time is spent in regular expression matching, leaving
little time for other intrusion detection or monitoring functions.
On the other hand, although many schemes for fast string matching [4-11] have been developed recently for intrusion detection systems, they focus on explicit string patterns only and cannot be easily extended to fast regular expression matching.
The inefficiency in regular expression matching is largely due
to the fact that the current solutions are not optimized for the fol-
lowing three unique complex features of regular expressions used
in network packet scanning applications.
• First, many such patterns use multiple wildcard metachar-
acters (e.g., ‘.’, ‘*’). For example, the pattern for identifying
the Internet radio protocol, “membername.*session.*player”,
has two wildcard fragments “.*”. Some patterns even contain
over ten such wildcard fragments. As regular expressions are
converted into state machines for pattern matching, large
numbers of wildcards can cause the corresponding Determi-
nistic Finite Automaton (DFA) to grow exponentially.
• Second, a majority of the wildcards are used with length
restrictions (‘?’, ‘+’). As we shall show later in the paper,
such length restrictions can increase the resource needs for
expression matching.
• Third, groups of characters are also commonly used: for
example, the pattern for matching the ftp protocol,
“^220[\x09-\x0d -~]*ftp”, contains a class (inside the brack-
ets) that includes all the printing characters and space charac-
ters. The class of characters may intersect with other classes
or wildcards. Such interaction can result in a highly complex
state machine.
To the best of our knowledge, there has not been any detailed
study of optimizations for these kinds of regular expressions as
they are so specific to network packet scanning applications. In
this paper, we address this gap by analyzing these regular expres-
sions and developing memory-efficient DFA-based solutions for
high speed processing. Specifically, we make the following con-
tributions:
• We analyze the computational and storage cost of building
individual DFAs for matching regular expressions, and iden-
tify the structural characteristics of the regular expressions in
networking applications that lead to exponential growth of
DFAs, as presented in Section 3.2.
• Based on the above analysis, we propose two rewrite rules
for specific regular expressions in Section 3.3. The rewritten
rules can dramatically reduce the size of resulting DFAs,
making them small enough to fit in memory. We prove that
the patterns after rewriting are equivalent to the original ones
for detecting non-overlapping patterns. While we do not claim to handle all possible cases of dramatic DFA growth (in fact, the worst case cannot be improved), our rewrite rules
do cover those patterns present in common payload scanning
rulesets like Snort and Bro, thus making fast DFA-based pat-
tern matching feasible for today’s payload scanning applica-
tions.
• We further develop techniques to intelligently combine mul-
tiple DFAs into a small number of groups to improve the
matching speed in Section 4, while avoiding the exponential
growth in the number of states in memory.
We demonstrate the effectiveness of our rewriting and group-
ing solutions through a detailed performance analysis using real-
world payload scanning pattern sets. As the results show, our
DFA-based implementation can increase the regular expression
matching speed on the order of 50 to 700 times over the NFA-
based implementation used in the Linux L7-filter and Snort sys-
tem. It also achieves a 12 to 42 times speedup over a commonly used DFA-based parser. The pattern matching speed can reach
gigabit rates for certain pattern sets. This is significant for im-
plementing fast regular expression matching of the packet payload
using network processors or general-purpose processors, as the
ability to classify traffic more quickly and efficiently enables many new
technologies like real-time worm detection, content lookup in
overlay networks, fine-grained load balancing, etc.
2. PROBLEM STATEMENT
In this section, we first discuss regular expressions used in packet
payload scanning applications, then present the possible solutions
for regular expression matching, and finally define the specific
problem that we address in this paper.
2.1 Regular Expression Patterns
A regular expression describes a set of strings without enumerat-
ing them explicitly. Table 1 lists the common features of regular
expression patterns used in packet payload scanning. For exam-
ple, consider a regular expression from the Linux L7-filter [1] for
detecting Yahoo traffic:
“^(ymsg|ypns|yhoo).?.?.?.?.?.?.?[lwt].*\xc0\x80”. This pattern matches any packet payload that starts with ymsg, ypns, or yhoo, followed by seven or fewer arbitrary characters, then a letter l, w, or t, then some arbitrary characters, and finally the two characters with hexadecimal values c0 and 80.
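As a quick sanity check, this pattern behaves as described when run with a conventional backtracking engine; the sample payload below is our own fabrication:

```python
import re

# The L7-filter Yahoo pattern, compiled as a bytes regex so that the
# \xc0\x80 trailer is matched as raw byte values.
yahoo = re.compile(rb"^(ymsg|ypns|yhoo).?.?.?.?.?.?.?[lwt].*\xc0\x80")

payload = b"ymsg123w some yahoo messenger data \xc0\x80"
print(yahoo.search(payload) is not None)   # True: ymsg, then 'w' within 7 chars
print(yahoo.search(b"http get") is None)   # True: unrelated payload, no match
```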
Table 2 compares the regular expressions used in two net-
working applications, Snort and the Linux L7-filter, against those
used in emerging Extensible Markup Language (XML) filtering
applications [12, 13] where regular expressions are matched over
text documents encoded in XML. We notice three main differ-
ences: (1) While both types of applications use wildcards (‘.’, ‘?’,
‘+’, ‘*’), the patterns for packet scanning applications contain
larger numbers of them in each pattern; (2) classes of characters
(“[]”) are used only in packet scanning applications; (3) a high
percentage of patterns in packet payload scanning applications
have length restrictions on some of the classes or wildcards, while
such length restrictions usually do not occur in XML filtering.
This shows that, compared to XML filtering applications, network packet scanning applications face additional challenges. These challenges lead to a significant increase in the complexity of regular expression matching, as we shall show later in this paper.
Table 1. Features of Regular Expressions

Syntax  Meaning                              Example
^       Pattern to be matched at the         ^AB means the input starts with AB. A pattern
        start of the input                   without '^', e.g., AB, can be matched anywhere
                                             in the input.
|       OR relationship                      A|B denotes A or B.
.       A single-character wildcard
?       A quantifier denoting zero or one    A? denotes A or an empty string.
*       A quantifier denoting zero or more   A* means an arbitrary number of As.
{}      Repetition                           A{100} denotes 100 As.
[ ]     A class of characters                [lwt] denotes a letter l, w, or t.
[^ ]    Anything but                         [^\n] denotes any character except \n.
Table 2. Comparison of regular expressions in networking applications against those in XML filtering

                                                   Snort    L7-filter   XML filtering
# of regular expressions analyzed                  1555     70          1,000-100,000
% of patterns starting with "^"                    74.4%    72.8%       ≥80%
% of patterns with wildcards ".", "+", "?", "*"    74.9%    75.7%       50%-100%
Average # of wildcards per pattern                 4.65     7.03        1-2
% of patterns with class "[ ]"                     31.6%    52.8%       0
Average # of classes per pattern                   7.97     4.78        0
% of patterns with length restrictions on
classes or wildcards                               56.3%    21.4%       ≈0
2.2 Solution Space for Regular Expression
Matching
Finite automata are a natural formalism for regular expressions.
There are two main categories: Deterministic Finite Automaton
(DFA) and Nondeterministic Finite Automaton (NFA). In this
section, we survey existing solutions using these automata.
A DFA consists of a finite set of input symbols, denoted as ∑, a finite set of states, and a transition function δ [14]. In networking applications, ∑ contains the 2^8 = 256 symbols of the extended ASCII code. Among the states, there is a single start state q0 and a
set of accepting states. The transition function δ takes a state and
an input symbol as arguments and returns a state. A key feature of
DFA is that at any time there is only one active state in the DFA.
An NFA works similarly to a DFA except that the δ function
maps from a state and a symbol to a set of new states. Therefore,
multiple states can be active simultaneously in an NFA.
A theoretical worst-case study [14] shows that a single regular expression of length n can be expressed as an NFA with O(n) states. When the NFA is converted into a DFA, it may generate O(∑^n) states. The processing complexity for each character in the input is O(1) in a DFA, but is O(n^2) for an NFA when all n states are active at the same time.
To handle m regular expressions, two choices are possible:
processing them individually in m automata, or compiling them
into a single automaton. The former is used in Snort [2] and Linux
L7-filter [1]. The latter is proposed in recent studies [12, 13] so
that the single composite NFA can support shared matching of
common prefixes of those expressions. Despite the demonstrated
performance gains over using m separate NFAs, in practice this
approach experiences large numbers of active states, and has the same worst-case complexity as the sum of m separate NFAs.
Therefore, this approach on a serial processor can be slow, as
given any input character, each active state must be serially exam-
ined to obtain new states.
In DFA-based systems, compiling m regular expressions into a composite DFA provides a guaranteed performance benefit over running m individual DFAs. Specifically, a composite DFA reduces the processing cost from O(m) (O(1) per automaton) to O(1), i.e., a single lookup obtains the next state for any given character. However, the number of states in the composite automaton grows to O(∑^nm) in the theoretical worst case. In fact,
we will show in Section 4 that typical patterns in packet payload
scanning applications indeed interact with each other and can
cause the creation of an exponential number of states in the com-
posite DFA.
Table 3. Worst-case comparisons of DFA and NFA

        One regular expression of length n    m regular expressions compiled together
        Processing       Storage              Processing       Storage
        complexity       cost                 complexity       cost
NFA     O(n^2)           O(n)                 O(n^2 m)         O(nm)
DFA     O(1)             O(∑^n)               O(1)             O(∑^nm)
There is a middle ground between DFA and NFA called the lazy DFA. Lazy DFAs are designed to reduce the memory consumption of conventional DFAs [12, 15]: a lazy DFA keeps in memory the subset of the DFA that matches the most common strings; for un-
common strings, it extends the subset from the corresponding
NFA at runtime. As such, a lazy DFA is usually much smaller
than the corresponding fully-compiled DFA and provides good
performance for common input strings. The Bro intrusion detection system [3] adopts this approach. However, malicious senders can
easily construct packets that keep the system busy and slow down
the matching process.
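The lazy-DFA idea can be sketched in a few lines: DFA states are sets of NFA states, and each transition is computed from the NFA and cached only when traffic actually exercises it. The toy NFA below (for the pattern ab*) and the class itself are our own illustration, not Bro's implementation:

```python
class LazyDFA:
    """Minimal lazy-DFA sketch: DFA states are frozensets of NFA states,
    and each transition is expanded from the NFA and cached the first
    time an input actually exercises it."""

    def __init__(self, nfa_step, start):
        self.nfa_step = nfa_step      # (frozenset, char) -> frozenset
        self.start = frozenset(start)
        self.cache = {}               # (dfa_state, char) -> dfa_state

    def step(self, state, ch):
        key = (state, ch)
        if key not in self.cache:     # expand from the NFA at runtime
            self.cache[key] = self.nfa_step(state, ch)
        return self.cache[key]

    def accepts(self, text, accepting):
        state = self.start
        for ch in text:
            state = self.step(state, ch)
        return bool(state & accepting)

# Toy NFA for the pattern ab*: state 0 -a-> 1, state 1 -b-> 1.
def toy_step(state, ch):
    out = set()
    for q in state:
        if q == 0 and ch == "a":
            out.add(1)
        elif q == 1 and ch == "b":
            out.add(1)
    return frozenset(out)

dfa = LazyDFA(toy_step, {0})
print(dfa.accepts("abb", {1}))  # True
print(len(dfa.cache))           # 2: only the transitions the input took
```

The attack mentioned above corresponds to feeding inputs that keep forcing cache misses, so the system repeatedly pays the NFA-expansion cost.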
Field Programmable Gate Arrays (FPGAs) provide a high
degree of parallelism and thus can be used to speed up the regular
expression matching process. There are existing FPGA solutions
that build circuits based on DFAs [16] or NFAs [17-19]. Recently, Brodie et al. presented a new architecture for matching regular expressions using FPGAs [28]. In this architecture, the finite state machine consumes multiple bytes at a time to achieve very high throughput. These approaches are promising if the extra FPGA
hardware can be embedded in the packet processors. FPGAs,
however, are not available in many applications; in such situa-
tions, a network processor or general-purpose CPU-based imple-
mentation may be more desirable.
2.3 Problem statement
In this paper, we seek a fast and memory-efficient solution to
regular expression matching for packet payload scanning. We
define the scope of the problem as follows:
• We consider DFA-based approaches in this paper, as NFA-
based approaches are inefficient on serial processors or proc-
essors with limited parallelism (e.g., multi-core CPUs in
comparison to FPGAs). Our goal is to achieve O(1) compu-
tation cost for each incoming character, which cannot be ac-
complished by any existing DFA-based solutions due to their
excessive memory usage. Thus, the focus of the study is to
reduce memory overhead of DFA while approaching the op-
timal processing speed of O(1) per character.
• We focus on general-purpose processor-based architectures
and explore the limits of regular expression matching in this
environment. Wherever appropriate, we leverage the trend of
multi-core processors that are becoming prevalent in those
architectures. Nevertheless, our results can be used in FPGA-
based and ASIC-based approaches as well [20].
It is worth noting that there are two sources of memory usage in DFAs: states and transitions. The number of transitions is linear in the number of states, because each state has at most 2^8 = 256 (one per extended-ASCII character) outgoing links to next states. Therefore, we consider the number of states (in the minimized DFA) as the primary factor determining memory usage in the rest of the paper. Recently, Kumar et al. extended our work and proposed algorithms for efficient link compression [20].
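Since each state carries up to 256 outgoing transitions, the memory footprint of an uncompressed transition table follows directly from the state count. A back-of-the-envelope sketch (the 4-byte next-state entry is our assumption, not a figure from the paper):

```python
def dfa_table_bytes(num_states, alphabet_size=256, entry_bytes=4):
    """Size of a flat, uncompressed DFA transition table: one next-state
    entry per (state, input symbol) pair."""
    return num_states * alphabet_size * entry_bytes

# e.g. a 10,000-state DFA already needs about 10 MB before any compression
print(dfa_table_bytes(10_000))  # 10240000 bytes, roughly 9.8 MiB
```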
3. MATCHING OF INDIVIDUAL
PATTERNS
In this section, we present our solution to matching individual
regular expression patterns. The main technical challenge is to
create DFAs that can fit in memory, thus making a fast DFA-
based approach feasible. We first define a few concepts key to
DFA construction in the context of packet payload scanning in
Section 3.1. We then analyze the size of DFAs for typical payload
scanning patterns in Section 3.2. Although theoretical analyses
[12, 14] have shown that DFAs are subject to exponential blow-
up, here, we identify specific structures that can lead to exponen-
tial growth of DFAs. Based on the insights from this analysis, in
Section 3.3, we propose pattern rewrite techniques that explore
the possibility of trading off exhaustive pattern matching (which
real-world applications often allow) for memory efficiency. Fi-
nally, we offer guidelines to pattern writers on how to write pat-
terns amenable to efficient implementation in Section 3.4.
3.1 Design Considerations
Although regular expressions and automata theory can be directly applied to packet payload scanning, there is a notable difference between the two settings. Most existing studies of regular expressions focus on a specific type of evaluation: checking whether a fixed-length string belongs to the language that a regular expression defines. More specifically, a fixed-length string is said to be in the language of a regular expression if the string is matched from start to end by a DFA corresponding to that regular expression. In
contrast, in packet payload scanning, a regular expression pattern
can be matched by the entire input or specific substrings of the
input. Without a priori knowledge of the starting and ending posi-
tions of those substrings, DFAs created for recognizing all sub-
string matches can be highly complex.
For a better understanding, we next present a few concepts
pertaining to the completeness of matching results and the DFA
execution model for substring matching.
Completeness of matching results
Given a regular expression pattern and an input string, a complete
set of results contains all substrings of the input that the pattern
can possibly match. For example, given a pattern ab* and an input
abbb, three possible matches can be reported, ab, abb, and abbb.
We call this style of matching Exhaustive Matching. It is formally
defined as below:
Exhaustive Matching: Consider the matching process M as a
function from a pattern P and a string S to a power set of S,
such that, M(P, S) = {substring S' of S| S' is accepted by the
DFA of P}.
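A naive brute-force realization of M(P, S) makes the definition concrete (this is illustrative only, not the matching algorithm developed in this paper):

```python
import re

def exhaustive_match(pattern, s):
    """Brute-force M(P, S): the set of all substrings of S accepted by
    the pattern P. Quadratic in |S| per anchor -- shown only to make
    the definition of exhaustive matching concrete."""
    p = re.compile(pattern)
    return {s[i:j]
            for i in range(len(s))
            for j in range(i + 1, len(s) + 1)
            if p.fullmatch(s[i:j])}

print(sorted(exhaustive_match("ab*", "abbb")))
# ['a', 'ab', 'abb', 'abbb'] -- the lone 'a' also qualifies, since b* may be empty
```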
In practice, it is expensive and often unnecessary to report all
matching substrings, as most applications can be satisfied by a