Communication Efficiency in Multi-Agent Systems

Mary Berna-Koes

Robotics Institute

Carnegie Mellon University

Illah Nourbakhsh

Robotics Institute

Carnegie Mellon University

Katia Sycara

Robotics Institute

Carnegie Mellon University

Abstract— Despite the growing number of multi-agent software

systems, relatively few physical systems have adopted multi-agent

systems technology. Agents that interact with a dynamic physical

environment have requirements not shared by virtual agents,

including the need to transfer information about the world and

their interaction with it. The agent communication languages

proven successful in software based multi-agent systems incur

overheads that make them impractical or infeasible for the

transfer of low level data. Instead, real world systems typically

employ application speciﬁc protocols to transfer video, audio,

sensory, or telemetry data. These protocols lack the transparency

and portability of formal agent communication languages and

consequently are limited in their scalability. We propose augmenting the capabilities of current multi-agent systems to provide for the efficient transfer of low level information by allowing backchannels of communication between agents, with flexible protocols, in a carefully principled way. We show that this extension can yield significant performance increases in communication efficiency and discuss the benefits of incorporating backchannels into a search and rescue robot system.

I. INTRODUCTION

Agents in a multi-agent system (MAS) must be able to in-

teract and communicate with each other. This usually requires

a common language, an Agent Communication Language, or

ACL. Much work has been done in developing ACLs that

are declarative, syntactically simple, and readable by people.

KQML [1] and FIPA-ACL [2] are two of the most widely used

ACLs in multi-agent systems. These languages have been very

successful in facilitating the communication and coordination

of software agents in a variety of domains including organi-

zational decision making [3], [4]; ﬁnancial management [5];

and even aircraft maintenance [6]. This approach to interagent

communication, while well suited to communication related

to negotiation or the transfer of high level information, has

signiﬁcant overhead which frequently proves a drawback in

systems that require the transfer of low level data or systems

with stringent time and bandwidth limits. Real robot systems

typically fall into both of these categories. They may require

the transfer of telemetry, video, audio, and sensory data at high

frequencies in real time over relatively slow wireless networks

or RF modems.

The data transferred by real robot systems tends either to

be relatively small messages, such as telemetry commands, or

large multimedia ﬁles, such as streaming video. The overhead

due to the ACL is the most signiﬁcant for small messages.

Multimedia ﬁles are also ill-suited for transfer via ACLs as

the intermediary representation of the ﬁles is not compatible

with a textual representation. Furthermore, most ACLs require

ASCII text messages which results in an inﬂation of the

message size plus additional processing. Consequently, most

physical robot teams do not use the agent communication

languages developed by the multi-agent systems community.

Instead, they typically use an ad hoc solution, deﬁning their

own protocols speciﬁc to their system. This approach lacks

the semantic power and transparency afforded by a language

like KQML and does not allow robots from various teams to

communicate.

The robot community would beneﬁt from the adoption of a

formal ACL inside the framework of a MAS, provided that the

efﬁciency of the current approach of hard coding various pro-

tocols were not signiﬁcantly decreased. This adoption would

enable different robot systems to communicate and cooperate,

allow robots to negotiate the ﬂow of information, and provide

increased transparency. Likewise, the MAS community could

beneﬁt from a more efﬁcient method of transferring low level

data, such as media ﬁles, particularly as web cameras become

more ubiquitous. Thus we have the apparently opposing goals

of improving efﬁciency for the transfer of media ﬁles and small

messages at high frequencies and preserving the portability,

readability, and declarative nature of an agent communication

language like KQML.

We propose a two tiered communication strategy, aug-

menting the current multi-agent system architecture. This

solution realizes both these goals by uniting the strengths of

ACLs and the methods used by the robot community. This

extension, which we call backchannels, has been implemented

on the RETSINA MAS [7], which uses KQML. We present

the backchannel extension, detail the necessary supporting

network drivers, show signiﬁcant analytical and experimental

performance improvements achieved with backchannels, and

relate the successes of this approach in a search and rescue

robot system.

II. TWO TIERED COMMUNICATION

The current approach to multi-agent communication is to

allow one channel of communication between agents and to

constrain communication to one language. This works well

for software agents communicating high level information

(such as commitments or negotiations) when efﬁciency is not

of paramount importance. Systems operating in a dynamic

physical environment need to send low level information (e.g.

telemetry or video data). These low level communications often occur at high frequencies, which can clog the main line of communication. We extend the current architecture to allow multiple lines of communication between agents, as shown in figure 1. The additional lines, or backchannels, are for the transfer of low level information.

Fig. 1. Figure (a) shows the current MAS architecture. Figure (b) illustrates how backchannels augment the current system. The backchannels do not replace current lines of communication using the ACL, nor can they exist between two agents not communicating at the ACL level.

In order to preserve the functionality and elegance of the

ACL, all metainformation pertaining to the backchannels is

related at the ACL level. Accordingly, as shown in ﬁgure 1,

the backchannels are simplex, so that the sender, denoted as

the server, and receiver, or client, are always evident from ACL

level communication. The content of the messages sent over

a backchannel is likewise ﬁxed and agreed upon at the ACL

level. As backchannels are a resource facilitating the transfer

of information, negotiations on the regulation of this resource,

namely the establishment of backchannels and the frequency

of communication, take place at the ACL level.

As explained above, the use of backchannels is not to

replace the current agent communication languages, which are

necessary for communication between heterogeneous agents,

but for the transfer of low level messages. The content and

meaning of the messages to be exchanged is speciﬁed by

referencing a user deﬁned format description library. Each

entry in the library consists of details of how to parse one

message and a semantically meaningful description of the type

of format, for example, “video” or “teleoperation-imperatives.”

Some applications may use a translator agent to convert

messages into the agent communication language or another

human readable form for transparency. Since the efﬁciency of

the message format can have a large impact on performance, as

explained in the Analysis section, protocols should be carefully

designed.
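As an illustration of what one entry in such a format description library might look like, the following Python sketch pairs a semantic label with parsing details. The class name FormatDescription, the use of struct layout strings, and the two-byte wheel velocity layout are assumptions made for this sketch; the source does not specify how library entries are encoded.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatDescription:
    """One hypothetical library entry: a semantic label plus parsing details."""
    label: str      # semantically meaningful type of format
    layout: str     # struct layout string (an assumption of this sketch)
    fields: tuple   # field names, in the order they appear on the wire

    def pack(self, **values) -> bytes:
        """Encode the named values into the compact wire format."""
        return struct.pack(self.layout, *(values[name] for name in self.fields))

    def parse(self, payload: bytes) -> dict:
        """Decode a wire message back into named values."""
        return dict(zip(self.fields, struct.unpack(self.layout, payload)))


# Hypothetical library with a single entry: two unsigned bytes, left then right.
FORMAT_LIBRARY = {
    "teleoperation-imperatives": FormatDescription(
        label="teleoperation-imperatives",
        layout="!BB",
        fields=("leftWheelVelocity", "rightWheelVelocity"),
    ),
}

entry = FORMAT_LIBRARY["teleoperation-imperatives"]
wire = entry.pack(leftWheelVelocity=145, rightWheelVelocity=152)
decoded = entry.parse(wire)
print(len(wire), decoded)
```

Both agents only need to agree on the library key at the ACL level; the bytes sent over the backchannel then carry no field names at all.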

The purpose of the agent communication language is to

facilitate communication between agents in a multi-agent

system. Extending the system architecture therefore necessi-

tates the augmentation of the language with communication

acts relating to this extension. Accordingly, the ACL must

support the establishment, ﬂow control, and termination of

backchannels.

Communication Act      Description
---------------------  ------------------------------
backchannel request    :request <line-type>
                       :protocol <message-type>
                       :reference-number <N>
                       :server-name <name>
accept request         :accept <line-type>
                       :reference-number <N>
                       :server-name <name>
decline request        :decline <line-type>
                       :reference-number <N>
                       :server-name <name>
                       :reason <details>
connection status      :connection <status>
                       :reference-number <N>
                       :server-name <name>

TABLE I
COMMUNICATION ACTS TO SUPPORT ESTABLISHMENT OF BACKCHANNELS

A. Establishing a backchannel

The formation of a backchannel begins with one agent

desiring to send or receive low level information to or from

another agent. If the initiator wishes to be the sender, the

server, she requests a “client line” of the other agent. If the

initiator wishes to receive information, acting as client, she

requests a “server line” of the other agent. In either case, the

request should indicate the type of data to be transferred by

referencing the format description library. Additional technical

information relating to the establishment of the backchannel,

discussed in the Network Driver Details section and illustrated

in ﬁgure 2, accompanies this request. The communication acts

necessary for the formation of a backchannel are described in

table I.

These communication acts were designed for KQML but

can easily be adapted to other ACLs. Backchannels are

distinguished by a reference number, unique to the server.

All communication regarding a particular backchannel must

specify the server and reference number. The <line-type>

may be either server-line or client-line. The connection status,

<status>, of the backchannel connection may be requested,

accepted, connected, failed, declined, or terminated. Imple-

mentation speciﬁcs for TCP are described in the Network

Drivers section of this paper.
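The establishment acts can be sketched as small helper functions that render the parameter lists of table I. This is a minimal Python illustration only: the function names, the enclosing parentheses, and the way the parameters would be embedded in a full KQML message are assumptions, not details given in the source.

```python
# Line types named in the text.
LINE_TYPES = {"server-line", "client-line"}


def _render(pairs):
    """Render (keyword, value) pairs as a parenthesized parameter list."""
    return "(" + " ".join(f":{key} {value}" for key, value in pairs) + ")"


def _check_line_type(line_type):
    if line_type not in LINE_TYPES:
        raise ValueError(f"unknown line type: {line_type}")


def backchannel_request(line_type, protocol, reference_number, server_name):
    """Ask the other agent for a server line or a client line."""
    _check_line_type(line_type)
    return _render([
        ("request", line_type),
        ("protocol", protocol),  # key into the format description library
        ("reference-number", reference_number),
        ("server-name", server_name),
    ])


def accept_request(line_type, reference_number, server_name):
    """Agree to the requested backchannel."""
    _check_line_type(line_type)
    return _render([
        ("accept", line_type),
        ("reference-number", reference_number),
        ("server-name", server_name),
    ])


def decline_request(line_type, reference_number, server_name, reason):
    """Refuse the requested backchannel, giving a reason."""
    _check_line_type(line_type)
    return _render([
        ("decline", line_type),
        ("reference-number", reference_number),
        ("server-name", server_name),
        ("reason", reason),
    ])


request = backchannel_request("client-line", "teleoperation-imperatives", 1, "Agent1")
print(request)
```

Every act carries the server name and reference number, so either agent can tell which backchannel a message refers to.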

B. Flow control of backchannels

Backchannels are simply a means of facilitating the transfer

of information between agents. As information transfer is

a commodity that consumes system resources, negotiations

regarding the ﬂow control of backchannels are necessary.

The subject of negotiation is well explained in many papers

on multi-agent systems [8], [9], and the details are not pre-

sented here. We examine the communication acts speciﬁc to

backchannels which must be supported by the ACL. It is necessary to start and stem the flow of messages, regulate the frequency of transmissions, repeat messages, confirm the connection status of the backchannel, and request and report the total number of transmissions sent. The communication acts necessary for the flow control of backchannels are described in table II.

Communication Act         Description
------------------------  ------------------------------
begin transfer /          :transmission <start/stop>
halt transfer             :reference-number <N>
                          :server-name <name>
repeat messages           :repeat <number of messages>
                          :reference-number <N>
                          :server-name <name>
request frequency         :request <frequency>
                          :reference-number <N>
                          :server-name <name>
accept frequency          :accept <frequency>
                          :reference-number <N>
                          :server-name <name>
deny frequency            :deny <frequency>
                          :reference-number <N>
                          :server-name <name>
request total number      :request num-tranmissions
of transmissions          :reference-number <N>
                          :server-name <name>
report total number       :tell num-tranmissions <n>
of transmissions          :reference-number <N>
                          :server-name <name>

TABLE II
COMMUNICATION ACTS TO SUPPORT CONTROL OF BACKCHANNELS

Communication Act         Description
------------------------  ------------------------------
request termination       :request termination
(Handshake method)        :reference-number <N>
                          :server-name <name>
termination warning       :termination-warning <Time>
(Warning method)          :reference-number <N>
                          :server-name <name>
report termination        :connection terminated
(All methods)             :reference-number <N>
                          :server-name <name>

TABLE III
COMMUNICATION ACTS TO SUPPORT TERMINATION OF BACKCHANNELS

C. Termination of backchannels

As either the server or the client may be the initiator of the

backchannel, so must the ability to end the connection over

a backchannel fall equally to both parties. We provide three

protocols for termination at various levels of social etiquette.

The support required of the ACL for each method is described

in table III. Consider again two agents, Agent 1 and Agent 2,

where Agent 1 wishes to terminate the connection.

In the most polite way to sever the line of communication,

the handshake method, Agent 1 sends a request to Agent 2

through the ACL to end the connection. Agent 2 closes the

TCP connection when ready and then sends a response in

ACL alerting Agent 1 that the connection has been closed.

The second protocol for ending communication across a

backchannel is more similar to the agent shaking her head

than shaking hands. In the warning method, Agent 1 sends

warnings in ACL that the line will be shut down. Warnings

are sent at 30 seconds and 5 seconds. During this time, Agent

2 has the opportunity to close the TCP connection on her side

and then notify Agent 1 that the connection has been closed.

At the end of the time, if Agent 2 has not terminated the

connection, Agent 1 terminates it and then sends a message

of notiﬁcation to Agent 2.

The ﬁnal protocol, the cold shoulder method, is far less

friendly than the other methods as data may be lost. Here,

either agent simply closes the TCP connection without notice.

A message in the ACL is then sent notifying the other agent

that the connection has been closed.

Independent of which protocol is chosen, we insist that a

message is always sent through the ACL when a connection

is closed. With this restriction, termination protocols are

consistent with the rule that all metacommunication be handled

at the ACL level so that the state of a connection be apparent

from the messages. An exception to this rule is the case when

neither agent intentionally terminates the backchannel but the

connection is severed due to a physical loss of communication

between the agents. If this occurs, the agents must recognize

that the connection was terminated from the TCP level, which

sends a warning when a connection is terminated. Our protocol

does not allow agents to reopen a closed backchannel for any

reason, so a new backchannel would need to be established if

the agents wish to reestablish backchannel communication.
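One way to keep the state of a backchannel apparent from ACL messages is to track its connection status explicitly. The Python sketch below uses the six status values named in the text; the transition table itself is an assumption of this sketch, chosen so that a closed backchannel can never be reopened.

```python
from enum import Enum


class Status(Enum):
    REQUESTED = "requested"
    ACCEPTED = "accepted"
    CONNECTED = "connected"
    FAILED = "failed"
    DECLINED = "declined"
    TERMINATED = "terminated"


# Assumed transition table; statuses mapped to empty sets are final.
ALLOWED = {
    Status.REQUESTED: {Status.ACCEPTED, Status.DECLINED},
    Status.ACCEPTED: {Status.CONNECTED, Status.FAILED},
    Status.CONNECTED: {Status.TERMINATED},
    Status.FAILED: set(),
    Status.DECLINED: set(),
    Status.TERMINATED: set(),
}


class Backchannel:
    """Tracks one backchannel, identified by server name and reference number."""

    def __init__(self, server_name, reference_number):
        self.server_name = server_name
        self.reference_number = reference_number
        self.status = Status.REQUESTED

    def advance(self, new_status):
        """Move to new_status, rejecting transitions the table does not allow."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"{self.status.value} -> {new_status.value} is not allowed"
            )
        self.status = new_status


channel = Backchannel("Agent1", 1)
for step in (Status.ACCEPTED, Status.CONNECTED, Status.TERMINATED):
    channel.advance(step)

# A terminated backchannel cannot be reopened; a new one must be requested.
try:
    channel.advance(Status.CONNECTED)
    reopened = True
except ValueError:
    reopened = False

print(channel.status.value, reopened)
```

A physical loss of communication detected at the TCP level would be recorded the same way, by moving the channel to the terminated status.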

III. NETWORK DRIVER DETAILS

Although backchannels, like agent communication lan-

guages, are independent of the transport protocol, their imple-

mentation naturally depends heavily on the transport protocol

used. This section provides sample implementation details in

order to demonstrate the interleaving of low level and ACL

level communication. The speciﬁc implementation presented

here is for the TCP low level transport protocol, the most

widely used protocol in real robot systems. The analysis and

experimental results also assume TCP, though for systems

where efﬁciency is of paramount importance, UDP may be

a better alternative.

The protocol for establishing a connection between a pair of

agents varies slightly depending on whether the initiator wants

to send or receive the information. To illustrate this distinction,

consider Agents 1 (the initiator) and 2 as shown in ﬁgure 2.

In the ﬁrst case, Agent 1 wishes to send information to Agent

2. For example, Agent 1 may wish to control Agent 2 through

teleoperation and needs to send many commands. In this case,

Agent 1 acts as the server while Agent 2 is the client. In

the second case, Agent 1 wishes to receive information from

Agent 2. As an example of this relationship, Agent 1 may

want streaming video at a certain resolution from Agent 2.

Now Agent 1 is the client and Agent 2 is the server.

Since the ﬁrst step in establishing a TCP connection is

for the server to open a passive line on a port, the server

must acquiesce to the agreement before any progress can

be made. The case in which the initiator is the server is

consequently the simplest (ﬁgure 2(a)). Agent 1, the server,

opens a port for connections and then sends a message to

Agent 2 through the ACL requesting permission to set up a line

to send information (from Agent 2’s perspective a client line).

Agent 1 also provides the necessary information for Agent 2 to connect to the available port as well as a reference number. The reference number is a unique number on Agent 1’s side referring to this particular channel for communication. Finally, Agent 1 specifies what information will be sent over the line with the format descriptor, which is a reference into a library of format descriptions. Agent 2 either accepts the request and opens a client connection over TCP, or declines, optionally providing a reason. Once the connection has been established, Agent 2 notifies Agent 1. The backchannel is now ready to be used, controlled through communication between the agents in the ACL.

Fig. 2. Flowchart of interleaved high and low level communication for establishment of a backchannel in the case where Agent 1 wants Agent 2 to be the client (a) and where Agent 1 wants Agent 2 to be the server (b). Messages sent through the ACL are in boxes with rounded edges. Panels: (a) Agent 1 → Agent 2; (b) Agent 1 ← Agent 2.

The protocol for the case in which the initiator wishes to

receive information and act as the client differs only in the

logistics. For a complete description of the interactions of the

agents when the initiator is the client, refer to ﬁgure 2(b).
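The case in which the initiator is the server can be sketched end to end on a single machine. In this minimal Python illustration a plain dictionary stands in for the ACL request, and its "host" and "port" keys are assumptions: the source says only that the request carries the information needed to connect. Both agents run in one process over the loopback interface, which is purely a convenience of the sketch.

```python
import socket

# Step 1: Agent 1 (the server) opens a passive port. Port 0 lets the OS pick one.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
listener.settimeout(5)
host, port = listener.getsockname()

# Step 2: Agent 1 asks Agent 2, through the ACL, to accept a client line.
# The dictionary is a stand-in for the ACL message (hypothetical field names).
acl_request = {
    "request": "client-line",
    "protocol": "teleoperation-imperatives",
    "reference-number": 1,
    "server-name": "Agent1",
    "host": host,
    "port": port,
}

# Step 3: Agent 2 accepts and opens the client side of the TCP connection.
client = socket.create_connection(
    (acl_request["host"], acl_request["port"]), timeout=5
)
server_side, _ = listener.accept()

# Step 4: Agent 2 would now report the connection status through the ACL.
acl_status = {
    "connection": "connected",
    "reference-number": acl_request["reference-number"],
    "server-name": acl_request["server-name"],
}

# Step 5: the simplex backchannel carries compact data from server to client.
server_side.sendall(bytes([145, 152]))
payload = b""
while len(payload) < 2:
    chunk = client.recv(2 - len(payload))
    if not chunk:
        break
    payload += chunk

print(acl_status["connection"], list(payload))

for sock in (client, server_side, listener):
    sock.close()
```

Note that the port is opened before the ACL request is sent, so Agent 2 can connect as soon as it accepts.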

IV. ANALYSIS

There are several reasons to believe that backchannels

will improve system performance, but it is ﬁrst necessary to

deﬁne our metrics. Two of the most important elements of

network performance are bandwidth and latency. Following

the lead of the networking community [11], this analysis

uses frequency, directly related to throughput, and latency

as metrics for evaluating the performance of a system. This

paper equates the bandwidth and throughput of the system

with the frequency of message exchanges though there are

some differences. Bandwidth is the transmission capacity of

the network, usually measured in bits per second. Throughput

is the measurement of real world data across the network and

can never be more than bandwidth, and is frequently less due

to network traffic. We use frequency to describe the number of messages successfully transmitted in a given time, rather than the number of bytes transmitted as throughput. Frequency is important from the multi-agent system perspective because it is the number of messages sent between agents which determines the maximum number of agents and the maximum rate at which information can be exchanged in a network.

[Figure 3 plot: "Effect of backchannel on maximum frequency on 802.11b wireless network". x-axis: useful message size in bytes (0 to 1800); y-axis: maximum frequency (in kHz); curves: with backchannel, without backchannel.]

Fig. 3. A theoretical analysis of the maximum frequency at which messages can be sent over an 11 Mbps network assuming the same message content with or without backchannels and ignoring TCP effects.

Network latency is the amount of time it takes for a packet

to travel from the source to the destination. We use latency

to describe the amount of time it takes for a message to

be sent from the source and be processed at the destination.

Latency is of interest in systems with real-time constraints.

Although bandwidth and latency are partially properties of a

given network (i.e. a network that uses phone lines has lower

bandwidth and higher latency than one that uses Ethernet

lines), they are also dependent on the size of the messages

transmitted over the network. A theoretical analysis of the effect of message size on both throughput and latency illuminates the benefits of the more efficient system we propose.

Fig. 4. Modeled and measured TCP latencies for 403 transfers from the University of Washington to UC Davis, used with permission [10].

For this analysis, consider a multi-agent system composed

of identical agents, each sending the same messages. The

bandwidth required for such a system would depend on the

number of agents, the frequency at which each agent is sending

its messages, and the size of each message.

\[ \text{Bandwidth consumed} = \text{Number of Agents} \times \text{Frequency} \times \text{Message Size} \]

When using the backchannel, the overhead of sending

messages can be signiﬁcantly reduced. Consider the case of

teleoperating a robot with a joystick. An interface agent wishes

to send the left and right wheel velocities to the robot agent.

Using the system implemented for an urban search and rescue

multi-agent system [12], which uses KQML, the message has

the following format:

(tell

:sender Agent2

:receiver Agent1

:language default-language

:ontology default-ontology

:reply-with nob

:in-reply-to nob

:forward-to nob

:content

(:leftWheelVelocity 145

:rightWheelVelocity 152))

Alternatively, if the message were passed over the backchan-

nel with a format description stating that each message was

two characters long and the ﬁrst and second characters were

to be interpreted as the left and right wheel velocities re-

spectively, each message would only be two bytes long, as

opposed to the 200 bytes required to send the same information

over KQML. On the other hand, setting up the backchannel

and controlling the backchannel through the ACL incurs some

overhead.
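The effect of message size on the achievable message rate can be worked through with the bandwidth relation given earlier, solved for frequency. The Python sketch below assumes a nominal 11 Mbps link, takes the 200 byte and 2 byte message sizes from the text, and, like the analysis, ignores TCP level overhead, so the figures are upper bounds rather than predictions of measured rates. The function name max_frequency_hz is introduced here for illustration.

```python
import struct

LINK_BITS_PER_SECOND = 11_000_000  # nominal 802.11b rate (assumed for this sketch)


def max_frequency_hz(message_bytes, agents=1, bits_per_second=LINK_BITS_PER_SECOND):
    """Solve Bandwidth = Agents x Frequency x Size for the per-agent frequency.

    The link rate is in bits per second and the size in bytes, hence the factor 8.
    """
    return bits_per_second / (agents * 8 * message_bytes)


# Backchannel form: one unsigned byte per wheel velocity.
backchannel_message = struct.pack("!BB", 145, 152)
backchannel_bytes = len(backchannel_message)

# ACL form: the size quoted in the text for the KQML message.
acl_bytes = 200

acl_rate = max_frequency_hz(acl_bytes)
backchannel_rate = max_frequency_hz(backchannel_bytes)

print(backchannel_bytes)   # 2
print(acl_rate)            # 6875.0
print(backchannel_rate)    # 687500.0
```

Under these assumptions the ratio of the two bounds equals the ratio of the message sizes, a factor of 100, which is why the format description deserves careful design.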

[Figure 5 plot: "Effect of backchannel on maximum frequency on 802.11b wireless network". x-axis: useful message size in bytes (0 to 1800); y-axis: frequency in kHz; curves: experimental results without backchannel, experimental results with backchannel, theoretical results without backchannel, theoretical results with backchannel.]

Fig. 5. Measured and theoretical effect of backchannels on maximum frequency

The efficiency of the format description is very important. To remove this dependency from the analysis, we assume that the same message :content is sent over the backchannel as through the ACL line, so that any difference is due purely to non-content overhead.

We also ignore the fact that large messages are most likely

either media ﬁles or compressed, which means that they would

need to be encoded and thus enlarged in order to be transmitted

as ASCII text, incurring additional overhead (e.g. roughly 35% or

more using UU-encoding). Finally, we ignore TCP level overhead in this

analysis.

Latency is studied by network researchers because it affects

the quality of service of a network [10]. In a multi-agent

system, it is particularly important in real-time domains where

small changes in latency may mean the difference between

success and failure. The work of Cardwell et al., reproduced

with permission in ﬁgure 4, shows analytically and empirically

that TCP latency increases with message size. For example,

doubling the message size from 20000 to 40000 bytes results

in a nearly 75% increase in latency. Furthermore, the results

indicate an increased sensitivity of latency to message size

for smaller messages. This data suggests that more efﬁcient

protocols may signiﬁcantly decrease latency in a system. As

a caveat, however, the latency of a system depends on the

protocols and algorithms used. The Nagle algorithm [13] or

kernel buffers can cause small messages to be accumulated

and packaged together, increasing the latency.
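Where small messages must not be held back for coalescing, a common option on TCP sockets is to disable the Nagle algorithm with the standard TCP_NODELAY socket option. The Python sketch below shows only how the option is set and read back; the source does not state whether this option was used in the system described, and whether disabling it helps depends on the traffic pattern.

```python
import socket

# Create a TCP socket and ask the stack not to coalesce small writes.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back to confirm it took effect (nonzero means enabled).
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
print(nodelay_enabled)

sock.close()
```

The trade-off is more packets on the wire for the same data, which matters on the slow wireless links the paper targets.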

V. EXPERIMENTAL RESULTS

We augmented the RETSINA multi-agent system [7] to

support backchannels as discussed in Section II. We then

conducted tests to determine the maximum frequency at which

messages could be interchanged between a pair of agents on

two different networks. The results (shown in ﬁgure 5) conﬁrm

our claim that the use of backchannels can lead to signiﬁcant

savings for small messages. The maximum frequencies observed on an 11 Mbps network closely match the predicted results of figure 3.

[Figure 6 plot: "Effect of backchannel on Latency". x-axis: useful message size in bytes (0 to 600); y-axis: average latency in ms; curves: without backchannel on wireless network, with backchannel on wireless network.]

Fig. 6. Measured effect of backchannels on latency

Tests to determine how latency is affected by the addition

of a backchannel also match theoretical results. The tests

described do not measure only TCP latency because we timed

the round trip which includes time to parse the message.

The total savings in latency is an average of 20% on the

wireless network with our implementation, but can be as much

as 80% savings for messages in the 375 to 500 byte range, a

signiﬁcant beneﬁt for real-time systems.

VI. APPLICATION TO SEARCH AND RESCUE ROBOTICS

Urban Search and Rescue (USAR) involves coordination

between people, agents, and robots to explore a space with

many environmental challenges, including limited and spo-

radic communication. We have built a heterogeneous set of

wheeled differential drive robots, and several user interfaces

to allow robots to explore a mock up disaster site. In order

to enable robots, agents, and people to work together, we are

using the RETSINA multi-agent system architecture. Based on

empirical testing of our system, we found that the overhead of

KQML messages was very signiﬁcant. Additionally, sending

real-time video to the human operators could only be done

at enormous cost through RETSINA and so was controlled

outside the architecture. After integrating backchannels into

the RETSINA architecture, we found that the stability and

reliability of the robot system improved signiﬁcantly as latency

was decreased and throughput was increased.

VII. RELATED WORK

Discussion of the balance between efﬁciency and readability

is common in the networking community where HTML is

mixed with other formats using MIME headers to allow media

types other than simple ASCII text to be encoded into a

message. Much work has also been done to improve the speed

of transferring images and multimedia. This work [10], [11],

[14], [13] has contributed to the analysis on throughput and

is an excellent resource for more information on the effect

of message size on latency. To our knowledge, this is the ﬁrst

attempt to integrate backchannels into a MAS in a formal way.

The MAS community has largely ignored the importance of

throughput efﬁciency. Work has been done to develop real-time

multi-agent systems [15], [16], but this work has focused on

higher level algorithms, ignoring the low level protocol and

the potential beneﬁts of improving throughput.

VIII. CONCLUSION

We have presented arguments that a single line and language

for agent communication is inadequate for systems that require

the transfer of multimedia ﬁles and low level data at high

frequencies. We describe a two-tiered communication archi-

tecture using backchannels and the steps necessary to integrate

backchannels into existing MAS architectures in a principled

way. We hope this ﬂexibility will open up multi-agent systems

for use with physical agent systems, helping to bridge the

multi-agent multi-robot system research gap. Theoretical and

experimental tests show improved communication efﬁciency

and the addition of backchannels to the MAS RETSINA

enabled the use of RETSINA for an urban search and rescue

multi-robot system.

REFERENCES

[1] T. Finin, R. Fritzson, and R. McEntire, “KQML as an agent commu-

nication language,” in Proceedings of the 3rd International Conference

on Information and Knowledge Management, November 1994.

[2] (1997) Foundation for intelligent physical agents. [Online]. Available:

http://www.fipa.org

[3] K. Sycara, K. Decker, A. Pannu, M. Williamson, and D. Zeng, “Dis-

tributed intelligent agents,” IEEE Expert, December 1996.

[4] H. Chalupsky et al., “Electric elves: Agent technology for supporting

human organizations,” AI Magazine, Summer 2002.

[5] K. Decker, K. Sycara, and D. Zeng, “Designing a multi-agent portfolio

management system,” in Proceedings of the AAAI Workshop on Internet

Information Systems, 1996.

[6] O. Shehory, G. Sukthankar, and K. Sycara, “Agent aided aircraft

maintenance,” in Proceedings of Autonomous Agents ’99, May 1999,

pp. 306–312.

[7] K. Sycara, M. Paolucci, M. van Velsen, and J. Giampapa, “The

RETSINA MAS infrastructure,” special joint issue of Autonomous

Agents and MAS, vol. 7, no. 1 and 2, July 2003.

[8] C. Li, J. Giampapa, and K. Sycara, “A review of research literature on

bilateral negotiations,” Carnegie Mellon University, Robotics Institute,

Pittsburgh, PA, Tech. Rep. CMU-RI-TR-03-41, Nov. 2003.

[9] G. Zlotkin and J. Rosenschein, “Negotiation and task sharing among

autonomous agents in cooperative domains,” in Proceedings of the

Eleventh International Joint Conference on Artiﬁcial Intelligence, 1989,

pp. 912–917.

[10] N. Cardwell, S. Savage, and T. Anderson, “Modeling TCP latency,” in

Proceedings of IEEE INFOCOM, March 2000.

[11] M. Harchol-Balter. (2002, November) Quality of service lectures.

[Online]. Available: http://www-2.cs.cmu.edu/~srini/15-441/F02/

[12] J. Wang, M. Lewis, and J. Gennari, “USAR: A game based simulation

for teleoperation,” in Proceedings of the 47th Annual Meeting of the

Human Factors and Ergonomics Society, Denver, CO, Oct. 13–17, 2003.

[13] J. Nagle, “Congestion control in IP/TCP internetworks,” Network Infor-

mation Center, SRI International, Menlo Park, CA, RFC 896, January

1984.

[14] J. F. Kurose and K. W. Ross, Computer Networking: A Top-Down

Approach Featuring the Internet. Addison-Wesley, July 2000, ch. 3.7

TCP Congestion Control.

[15] M. Allouche, O. Boisser, and C. Sayettat, “Temporal social reasoning

in dynamic multi-agent systems,” in Proceedings of the Fourth Inter-

national Conference on Multi-Agent Systems (ICMAS-2000). IEEE

Computer Society, 2000, pp. 23–28.

[16] V. Julian and V. Botti, “Developing real-time multi-agent systems,” in

Fourth Iberoamerican Workshop on Multi-Agent Systems, Iberagents

2002. IBERAMIA 2002, the VIII Iberoamerican Conference on

Artiﬁcial Intelligence, November 2002.