From 058ef64db8ce40909a18c91ab4805804362f80cb Mon Sep 17 00:00:00 2001
From: Gilbert Ramirez
Date: Sat, 6 Dec 2003 06:09:13 +0000
Subject: Add the ability to print packet dissections in PDML (an XML-based format) to tethereal.

It could be added to Ethereal, but the GUI changes to allow the user to
select PDML as a print format have not been added.

Provide a python module (EtherealXML.py) to help parse PDML.

Provide a sample app (msnchat) which uses tethereal and EtherealXML.py
to reconstruct MSN Chat sessions from packet capture files. It produces
a nice HTML report of the chat sessions.

Document tethereal's PDML and EtherealXML.py usage in doc/README.xml-output

Update tethereal's manpage to reflect the new [-T pdml|ps|text] option

svn path=/trunk/; revision=9180
---
 doc/README.xml-output | 206 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 206 insertions(+)
 create mode 100644 doc/README.xml-output

diff --git a/doc/README.xml-output b/doc/README.xml-output
new file mode 100644
index 0000000000..df3d77e920
--- /dev/null
+++ b/doc/README.xml-output
@@ -0,0 +1,206 @@
+Protocol Dissection in XML Format
+=================================
+$Id: README.xml-output,v 1.1 2003/12/06 06:09:12 gram Exp $
+Copyright (c) 2003 by Gilbert Ramirez
+
+
+Tethereal has the ability to print its protocol dissection in an
+XML format, by using the "-Tpdml -V" options. Similar functionality
+could be put into the "Print" dialog of Ethereal, but that work has
+not been done yet.
+
+The XML that tethereal produces follows the Packet Details Markup
+Language (PDML) specified by the group at the Politecnico Di Torino
+working on Analyzer. The specification can be found at:
+
+http://analyzer.polito.it/30alpha/docs/dissectors/PDMLSpec.htm
+
+A related XML format, the Packet Summary Markup Language (PSML), is
+also defined by the Analyzer group to provide packet summary information.
+The PSML format is not documented in a publicly-available HTML document,
+but its format is simple. Some day it may be added to tethereal so
+that "-Tpsml" (without "-V") would produce PSML.
+
+One wonders if the "-T" option should read "-Txml" instead of "-Tpdml"
+(and in the future, "-Tpsml"), but if tethereal were required to produce
+another XML-based format of its protocol dissection, then "-Txml" would
+be ambiguous.
+
+PDML
+====
+The PDML that tethereal produces is known not to be loadable into Analyzer.
+It causes Analyzer to crash. As such, the PDML that tethereal produces
+is labeled with a version number of "0", which means that the PDML does
+not fully follow the PDML spec. Furthermore, a creator attribute in the
+"<pdml>" tag gives the version number of [t]ethereal that produced the PDML.
+In that way, as the PDML produced by tethereal matures, but still does not
+meet the PDML spec, scripts can make intelligent decisions about how to
+best parse the PDML, based on the "creator" attribute.
+
+A PDML file is delimited by a "<pdml>" tag.
+A PDML file contains multiple packets, denoted by the "<packet>" tag.
+A packet will contain multiple protocols, denoted by the "<proto>" tag.
+A protocol might contain one or more fields, denoted by the "<field>" tag.
+
+A pseudo-protocol named "geninfo" is produced, as is required by the PDML
+spec, and printed as the first protocol after the opening "<packet>" tag.
+Its information comes from ethereal's "frame" protocol, which serves
+a similar purpose of storing packet meta-data. Both "geninfo" and
+"frame" protocols are provided in the PDML output.
+
+The "<pdml>" tag
+================
+Example:
+    <pdml version="0" creator="ethereal/0.9.17">
+
+The creator is "ethereal" version 0.9.17 (i.e., the "ethereal" engine;
+it will always say "ethereal", not "tethereal").
+
+
+The "<proto>" tag
+=================
+"<proto>" tags can have the following attributes:
+
+    name - the display filter name for the protocol
+    showname - the label used to describe this protocol in the protocol
+               tree.
+               This is usually the descriptive name of the protocol, but
+               dissectors can modify it to include more data (tcp does this).
+    pos - the starting offset within the packet data where this
+          protocol starts
+    size - the number of octets in the packet data that this protocol
+           covers.
+
+The "<field>" tag
+=================
+"<field>" tags can have the following attributes:
+
+    name - the display filter name for the field
+    showname - the label used to describe this field in the protocol
+               tree. This is usually the descriptive name of the field,
+               followed by some representation of the value.
+    pos - the starting offset within the packet data where this
+          field starts
+    size - the number of octets in the packet data that this field
+           covers.
+    value - the actual packet data, in hex, that this field covers
+    show - the representation of the packet data ('value') as it would
+           appear in a display filter.
+
+Some dissectors place text into the protocol tree without using
+a field with a field-name. Those appear in PDML as "<field>" tags with no
+'name' attribute, but with a 'show' attribute giving that text.
+
+Many dissectors label the undissected payload of a protocol as belonging
+to a "data" protocol, and the "data" protocol usually resides inside
+the last protocol dissected. In the PDML, the "data" protocol becomes
+a "data" field, placed exactly where the "data" protocol is in tethereal's
+protocol tree. So, if tethereal would normally show:
+
++-- Frame
+|
++-- Ethernet
+|
++-- IP
+|
++-- TCP
+|
++-- HTTP
+    |
+    +-- Data
+
+In PDML, the "Data" protocol would become another field under HTTP:
+
+<packet>
+    <proto name="frame">
+    ...
+    </proto>
+
+    <proto name="eth">
+    ...
+    </proto>
+
+    <proto name="ip">
+    ...
+    </proto>
+
+    <proto name="tcp">
+    ...
+    </proto>
+
+    <proto name="http">
+    ...
+        <field name="data" value="..."/>
+    </proto>
+</packet>
+
+
+
+tools/EtherealXML.py
+====================
+This is a python module which provides some infrastructure for
+Python developers who wish to parse PDML.
+It reads a PDML file and calls a user-supplied callback function each time
+a packet has been constructed from its protocols and fields.
+
+The python user should import the module, define a callback function
+which accepts one argument, and call the parse_fh function:
+
+------------------------------------------------------------
+import EtherealXML
+
+def my_callback(packet):
+    pass  # placeholder: do something with the packet here
+
+fh = open(xml_filename)  # xml_filename: path to the PDML file
+EtherealXML.parse_fh(fh, my_callback)
+
+# Now that the script has the packet data, do something.
+------------------------------------------------------------
+
+The object that is passed to the callback function is an
+EtherealXML.Packet object, which corresponds to a single packet.
+EtherealXML provides 3 classes, each of which corresponds to a PDML tag:
+
+    Packet - "<packet>" tag
+    Protocol - "<proto>" tag
+    Field - "<field>" tag
+
+Each of these classes has accessors which will return the defined attributes:
+
+    get_name()
+    get_showname()
+    get_pos()
+    get_size()
+    get_value()
+    get_show()
+
+Protocols and fields can contain other fields. Thus, the Protocol and
+Field classes have a "children" member, which is a simple list of the
+Field objects, if any, that are contained. The "children" list can be
+directly accessed by calling users. It will be empty if this Protocol
+or Field contains no Fields.
+
+Furthermore, the Packet class is a sub-class of the PacketList class.
+The PacketList class provides methods to look for protocols and fields.
+The term "item" is used when the item being looked for can be
+a protocol or a field:
+
+    item_exists(name) - checks if an item exists in the PacketList
+    get_items(name) - returns a PacketList of all matching items
+
+
+General Notes
+=============
+Generally, parsing XML is slow. If you're writing a script to parse
+the PDML output of tethereal, pass a read filter with "-R" to tethereal to
+reduce as much as possible the number of packets coming out of tethereal.
+The less your script has to process, the faster it will be.
+
+'tools/msnchat' is a sample Python program that uses EtherealXML to parse PDML.
+Given one or more capture files, it runs tethereal on each of them, providing
+a read filter to reduce tethereal's output. It finds MSN Chat conversations
+in the capture file and produces nice HTML showing the conversations. It has
+only been tested with capture files containing non-simultaneous chat sessions,
+but was written to more-or-less handle any number of simultaneous chat
+sessions.
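
The packet/proto/field nesting described in this README can also be
walked with Python's standard library alone. What follows is a minimal
sketch, not EtherealXML.py itself: it uses xml.etree.ElementTree, and the
sample PDML string, its attribute values, and the helper names
(iter_packets, fields_named, collect_values) are all hypothetical and
chosen only for illustration. It assumes the tag and attribute names
given in this README ("packet", "proto", "field", 'name', 'value').

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical PDML fragment, hand-written for illustration; real tethereal
# output carries many more protocols, fields, and attributes.
SAMPLE_PDML = """\
<pdml version="0" creator="ethereal/0.9.17">
  <packet>
    <proto name="http" showname="Hypertext Transfer Protocol" pos="54" size="3">
      <field name="data" value="616263" show="61:62:63" pos="54" size="3"/>
    </proto>
  </packet>
  <packet>
    <proto name="tcp" showname="Transmission Control Protocol" pos="34" size="20"/>
  </packet>
</pdml>
"""


def iter_packets(fh):
    """Yield one <packet> element at a time from a binary file object."""
    for _event, elem in ET.iterparse(fh, events=("end",)):
        if elem.tag == "packet":
            yield elem
            # Runs when the caller asks for the next packet: drop the
            # finished subtree so large captures do not pile up in memory.
            elem.clear()


def fields_named(packet, name):
    """Return every <field> under the packet whose 'name' attribute matches."""
    return [f for f in packet.iter("field") if f.get("name") == name]


def collect_values(fh, name):
    """Return the hex 'value' of each matching field across all packets."""
    values = []
    for packet in iter_packets(fh):
        values.extend(f.get("value") for f in fields_named(packet, name))
    return values


if __name__ == "__main__":
    sample = io.BytesIO(SAMPLE_PDML.encode("utf-8"))
    print(collect_values(sample, "data"))
```

Because each packet is handled as soon as its closing tag is seen, the
sketch follows the same one-packet-at-a-time pattern as the callback
interface above. A read filter passed to tethereal with "-R" still
matters, since fewer packets in the PDML means less XML to parse.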