

Trapeze


convert PDF files to HTML, RTF, ASCII or plain text

Version:  1.3.1


PDF to formatted text is tricky

Feedback Type:  Developer Note

Contributed by: Thursday, November 09 2006 @ 05:36 PM PST

Product Platform: MacOSX

Used Product For: Unspecified

Converting PDF to formatted text is an imperfect science. The source PDF carries no layout information (margins, word breaks, line spacing, paragraph starts, etc.), so formatting has to be calculated and extrapolated from the physical location of text in the page space. In other words, it's hard to recompute the layout in a way that works for every possible type of PDF document. More importantly, the result depends heavily on the software that was used to create the PDF (for example, LaTeX strips out all inter-word spacing).
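
To make the idea of extrapolating layout from physical text positions concrete, here is a minimal Python sketch. It is not Trapeze's layout engine. The `Glyph` record, the `rebuild_lines` function, and the `line_tol` and `space_ratio` thresholds are all hypothetical, chosen only for illustration. The sketch assumes the glyphs have already been extracted with their page coordinates, groups them into lines by baseline, and guesses a word break wherever the horizontal gap between neighbours is large relative to the average glyph width.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One positioned character (hypothetical record, coordinates in points)."""
    char: str
    x: float      # left edge
    y: float      # baseline
    width: float  # advance width


def rebuild_lines(glyphs, line_tol=2.0, space_ratio=0.3):
    """Group positioned glyphs into lines and guess where word breaks fall."""
    lines = []  # each entry is [baseline_y, [glyphs on that line]]
    # PDF y coordinates grow upward, so sort descending to read top to bottom.
    for g in sorted(glyphs, key=lambda g: -g.y):
        if lines and abs(lines[-1][0] - g.y) <= line_tol:
            lines[-1][1].append(g)   # close enough to the current baseline
        else:
            lines.append([g.y, [g]])  # start a new line

    result = []
    for _, row in lines:
        row.sort(key=lambda g: g.x)
        avg_width = sum(g.width for g in row) / len(row)
        text = row[0].char
        for prev, cur in zip(row, row[1:]):
            gap = cur.x - (prev.x + prev.width)
            # Assumed heuristic: a gap wider than a fraction of the average
            # glyph width is treated as a space between words.
            if gap > space_ratio * avg_width:
                text += " "
            text += cur.char
        result.append(text)
    return result


# Two lines of text with no explicit space characters in the input.
sample = [
    Glyph("H", 0, 700, 5), Glyph("i", 5, 700, 5),
    Glyph("y", 13, 700, 5), Glyph("o", 18, 700, 5), Glyph("u", 23, 700, 5),
    Glyph("o", 0, 688, 5), Glyph("k", 5, 688, 5),
]
print(rebuild_lines(sample))
```

Even in this toy form, the fixed thresholds hint at why one set of rules cannot suit every document: a value of `space_ratio` that works for one font or PDF generator may merge or split words in another.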

Trapeze works perfectly on many PDFs but poorly on some (though in the majority of our test cases it does a fine job). If you find that Trapeze is not working for your files, the best thing you can do is send us some samples; it's the only way we can trace what is happening in the layout engine. Thanks!
  
