0%

CS144-Lab1

CS144 Lab1的主要任务是完成TCPReassembler,其主要功能为:

1
void Reassembler::insert( uint64_t first_index, string data, bool is_last_substring, Writer& output );
其中first_index是数据的逻辑下标,也就是数据到达的顺序,is_last_substring标识了该数据是否代表了最后一份数据。 而output明显就是输出的字节流。

TCP的数据流一般由以下部分组成:

| popped data | unpopped-and-pushed data | arrived-and-unpushed data |

其中popped data已经被Reader获取,而unpopped-and-pushed data已经被Writer写进缓存, 但Reader暂时还未读取,最后的arrived-and-unpushed data是从网络接收,尚未组装传入Writer,非连续的数据, 是本Lab的核心工作对象。

计算机网络的特性决定了:不同的数据到来顺序是乱序的,他们之间可能有重叠(overlapping),而且到来的数据可能已经被push, 而Reassembler要解决这些问题,提供可靠的流服务(Reliable Flow)

设计思路

我的基本想法是用一个类似char数组的缓存存储arrived-and-unpushed的数据, 然后用一个map<int,int>存储已经到达的数据的index区间 \([l,r]\),在数据到达时进行区间的合并, 这一问题和经典算法题插入区间一致。

其他的逻辑比较简单,主要是: - first_index若是arrived-and-unpushed data的首地址,要直接push - 若是空字符则省略,但若有last_string的标识,则要把writer关闭 - 一部分data在push之后,buf之后的数据要往前推(有优化空间?)

Reassembler的成员如下:

1
2
3
4
5
private:
std::map<uint64_t, uint64_t> buffer;
std::string buf;
uint64_t end_index;
uint64_t pending;

实现代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
void Reassembler::insert( uint64_t first_index, string data, bool is_last_substring, Writer& output )
{
// Your code here.
(void)first_index;
(void)data;
(void)is_last_substring;
(void)output;

if ( buf.empty() )
buf.resize( output.capacity() );

uint64_t bias_push = output.bytes_pushed();
uint64_t insert_l = max( output.bytes_pushed(), first_index );
uint64_t insert_r = min( first_index + data.size() - 1, output.available_capacity() + bias_push - 1 );

if ( is_last_substring )
end_index = first_index + data.size() - 1;

if ( data.empty() && is_last_substring ) {
output.close();
return;
}

if ( insert_l - first_index >= data.size() )
return;
if ( insert_l > insert_r )
return;

for ( uint64_t i = insert_l; i <= insert_r; i++ ) {
buf[i - bias_push] = data[i - first_index];
}

bool changed = true;
while ( changed && !buffer.empty() ) {
changed = false;
auto upper = buffer.lower_bound( insert_l );

// upper.first >= l, compare [l,r] with [uf, us]
if ( upper != buffer.end() && insert_r + 1 >= upper->first ) {

insert_r = max( upper->second, insert_r );
pending -= upper->second - upper->first + 1;

buffer.erase( upper );
changed = true;
continue;
}

if ( upper == buffer.begin() || buffer.empty() )
break;
auto lower = --upper;

// lower.first < l, compare [lf, ls] with [l, r]
if ( lower != buffer.end() && lower->second + 1 >= insert_l ) {

insert_l = lower->first;
insert_r = max( lower->second, insert_r );
pending -= lower->second - lower->first + 1;

buffer.erase( lower );
changed = true;
continue;
}
}

if ( insert_l == output.bytes_pushed() ) {
uint64_t old_bias = output.bytes_pushed();
output.push( buf.substr( insert_l - output.bytes_pushed(), insert_r - insert_l + 1 ) );
bias_push = output.bytes_pushed();

if ( insert_r == end_index ) {
output.close();
}

for ( auto it = buffer.begin(); it != buffer.end(); it++ )
for ( uint64_t i = it->first; i <= it->second; ++i )
buf[i - bias_push] = buf[i - old_bias];
} else {
buffer[insert_l] = insert_r;
pending += insert_r - insert_l + 1;
}
}

uint64_t Reassembler::bytes_pending() const
{
return pending;
}

大部分的代码和插入区间问题一致,最后的性能指标为 1.78Gbit/s,有一定的优化空间。

我想用string先存储起来,最后进行合并的话性能会提高不少。