
Calligra/Binary MS Office Formats

From KDE Community Wiki

Microsoft Office uses many different binary file formats. The main ones are the various versions of .doc, .xls and .ppt. KOffice can import these formats. Here we list issues with these formats where the existing documentation has proved insufficient.

Word (.doc)

In the KWord filters, the msword -> ODF converter does not recognize tables of contents (ToCs) as ToCs; they come out as several lines of correctly formatted text. They should be imported as ToCs. Finding where a ToC begins is easy: it starts at texthandler::fieldStart. I don't remember the field number, but the logs show it clearly. My problem was finding where the ToC ends, i.e. how far it extends: texthandler::fieldEnd never reports anything comparable to what texthandler::fieldStart reports.
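In the Word binary format, a field is delimited in the text stream by the characters 0x13 (field begin), 0x14 (separator between the field instructions and the field result) and 0x15 (field end), and fields can nest. A ToC's result typically contains nested PAGEREF fields (and HYPERLINK fields when the entries are hyperlinked), so the end of the ToC is not the next field end but the one that brings the nesting depth back to zero. The sketch below shows that depth counting on a plain Python string standing in for the document text; `find_field_end` and `field_instruction` are hypothetical helpers, not part of the filter or wv2. Assuming fieldStart and fieldEnd are called in document order, the same counter (or a stack of open field types) could be kept across those calls.

```python
FIELD_BEGIN = "\x13"      # starts a field; the field instructions follow
FIELD_SEPARATOR = "\x14"  # ends the instructions; the field result follows
FIELD_END = "\x15"        # ends the field


def find_field_end(text: str, start: int) -> int:
    """Return the index of the FIELD_END that closes the field opened at text[start].

    Nested fields (e.g. PAGEREF entries inside a TOC) are skipped by
    tracking nesting depth instead of stopping at the first FIELD_END.
    """
    if text[start] != FIELD_BEGIN:
        raise ValueError("start must point at a field-begin character")
    depth = 0
    for i in range(start, len(text)):
        if text[i] == FIELD_BEGIN:
            depth += 1
        elif text[i] == FIELD_END:
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("field is never closed")


def field_instruction(text: str, start: int) -> str:
    """Return the instruction part of the field opened at text[start], e.g. ' TOC ...'."""
    end = find_field_end(text, start)
    depth = 0
    for i in range(start, end):
        if text[i] == FIELD_BEGIN:
            depth += 1
        elif text[i] == FIELD_END:
            depth -= 1
        elif text[i] == FIELD_SEPARATOR and depth == 1:
            return text[start + 1:i]
    return text[start + 1:end]  # field without a result part


# Hypothetical sample: a TOC field whose result holds two nested PAGEREF fields.
sample = (
    "\x13 TOC \\o \"1-3\" \x14"
    "Intro\x13 PAGEREF _Toc1 \x141\x15\r"
    "More\x13 PAGEREF _Toc2 \x142\x15\r"
    "\x15After"
)
toc_end = find_field_end(sample, 0)
is_toc = field_instruction(sample, 0).split()[0] == "TOC"
```

For this sample, `find_field_end(sample, 0)` skips over both nested PAGEREF fields and returns the position of the last 0x15, so the text after the ToC starts at "After".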

Excel (.xls)

PowerPoint (.ppt)

-- the order of member elements may differ from the documented order, even though the documentation does not mention this

  example: 
     SlideHeadersFootersContainer in DocumentContainer

-- PerSlideHeadersFootersContainer can sometimes have a HeaderAtom

-- SoundCollectionContainer can have a recInstance of 5 instead of 0

-- OutlineViewInfoContainer can have a recInstance of 0

-- VBAInfoAtom

-- DocumentTextInfoContainer has been seen to have two TextMasterStyleAtom members

-- TextPFException can have bulletChar == 0, although the documentation says this value is invalid

-- KinsokuAtom has been seen to have the value 128 for the level
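Because of deviations like these, it can help to parse a container's children generically instead of expecting a fixed sequence. Every PowerPoint record starts with the same 8-byte little-endian header (recVer in the low 4 bits and recInstance in the high 12 bits of the first 16-bit word, followed by a 16-bit recType and a 32-bit recLen), so children can be collected in file order and then looked up by type. The following is a minimal sketch under that assumption; `iter_records` and `find_children` are hypothetical helpers, not the filter's code. It deliberately returns recInstance instead of rejecting unexpected values, and returns every match so that duplicates such as two TextMasterStyleAtom records are not lost.

```python
import struct
from typing import Iterator, List, NamedTuple


class Record(NamedTuple):
    rec_ver: int       # low 4 bits of the first header word (0xF marks a container)
    rec_instance: int  # high 12 bits of the first header word
    rec_type: int
    payload: bytes     # the recLen bytes that follow the 8-byte header


def iter_records(data: bytes) -> Iterator[Record]:
    """Yield the records stored back to back in data, e.g. the body of a container."""
    offset = 0
    while offset < len(data):
        if len(data) - offset < 8:
            raise ValueError(f"truncated record header at offset {offset}")
        ver_inst, rec_type, rec_len = struct.unpack_from("<HHI", data, offset)
        offset += 8
        payload = data[offset:offset + rec_len]
        if len(payload) != rec_len:
            raise ValueError(f"record 0x{rec_type:04X} is truncated")
        offset += rec_len
        yield Record(ver_inst & 0x000F, ver_inst >> 4, rec_type, payload)


def find_children(children: List[Record], rec_type: int) -> List[Record]:
    """Return every child of the given type, in file order, wherever it appears.

    recInstance is not validated here: files have been seen with values the
    documentation does not allow, so callers decide how strict to be.
    """
    return [rec for rec in children if rec.rec_type == rec_type]
```

A caller would run `iter_records` over a container's payload (for example that of the DocumentContainer) and then call `find_children` with the record type constant from the specification, instead of reading members at fixed positions.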

O-DRAW

This is the format for embedded images.