Conversion of Word files into XML and then extracting to an Oracle database

10-29-07, 10:18 PM
Hi guys,
For those who are experts, could you please answer a few questions?

For the company I work for, most of the stuff we do needs to be documented in a pre-formatted Word file (the bread and butter of our company), a checklist of sorts. These Word files contain tables with columns that are consistent across whichever clients we have. Documents can share a particular row or have similar rows.
My question is: how hard or easy would it be to convert these table-laden Word files into XML, which can then be extracted into a database for each client?
The goal of this project is to get away from manual reference/generation of these documents and to control certain aspects of them, i.e. the rows, so they are nearly all consistent between clients.
Is this easy for someone with the know-how? Or inherently hard? Or is there another way?
Any thoughts welcome!

10-31-07, 08:47 PM
Anyone please?

11-01-07, 12:20 AM
If you want to buy software that already does it, here's what Google turned up:

http://rustemsoft.com/ (XML converter 6.00)
http://www.xml.com/pub/a/2003/12/31/qa.html (old discussion on conversion methods)

Never used any of these, so buyer beware. From the looks of it, though, it shouldn't be too bad to code something up if your needs are simple and you are willing to dive into parsing and debugging Word docs (I didn't see a real spec in a quick search, so it may require some trial and error).

Good luck :).

11-01-07, 02:26 AM
Thanks bud, but is there another, easier way? Rather than convert straight to XML, extract to the database maybe?

11-01-07, 11:32 AM
Do you have an example of a Word doc? I don't use Word, but I can see what I come up with.

11-01-07, 09:23 PM
Thanks bud, but is there another, easier way? Rather than convert straight to XML, extract to the database maybe?

XML is probably your best bet for what you want to do.

I bet EG knows how to do it.

EDIT: this may help

It's OpenOffice, but it might help you...

11-01-07, 10:07 PM
Thanks Vin,

But we use MS Office.

The thing is, we are using a converter to convert Word to XML. I am assuming (I cannot confirm) that we are using MS Office to convert the Word files to XML files.

The problem comes when we try to extract the columns into the database. The files all have the same number of columns and nearly the same structure, yet only some of them are parsed, not all. Even for those that are parsed into the database, some rows are missing. When we check the Word and XML files we cannot find anything different! I dunno what gives...

We are talking about a lot of files that we have to move into the database, and obviously we do not want to do it manually...

...cheers for the insight anyway guys

11-01-07, 10:24 PM
I'd chime in but I've got very little experience with Office products and data transformation between them. If I knew how to help I would :(

Can you export to CSV?

11-01-07, 10:53 PM
I don't know how to do it myself just yet, as we only started learning the stuff yesterday, but I believe ADO.NET handles exactly that kind of thing.

11-02-07, 10:53 PM
Thanks bud, but is there another, easier way? Rather than convert straight to XML, extract to the database maybe?

Sounds like you should just write (or macro-record) some VB script to export your Word tables to simple CSV (comma-separated values) files, or tab- or colon-delimited files, and then import those into whatever database you are using. This would at least avoid the need for the XML converter. Presuming you are using Access, your VB script should be able to easily launch both Word and Access and iterate through your files without manual intervention.
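
For what it's worth, the CSV route could be sketched roughly like this in Python instead of VB script (a swapped-in language, not what is proposed above). The `checklist` table, its `item`/`status` columns, and the function names are all made up for illustration, and `sqlite3` is only a stand-in for the real Oracle or Access database. It assumes the table rows have already been pulled out of Word as lists of cell text.

```python
import csv
import io
import sqlite3

# Hypothetical checklist columns; the real documents will have their own.
COLUMNS = ("item", "status")


def rows_to_csv(rows):
    """Serialize table rows (lists of cell strings) to CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def load_csv(conn, csv_text, client):
    """Insert the CSV data rows into a per-client table; return the row count.

    The first CSV line is treated as a header and skipped. A row with an
    unexpected number of cells raises instead of being silently dropped.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checklist (client TEXT, item TEXT, status TEXT)"
    )
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    count = 0
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(COLUMNS):
            raise ValueError(f"line {line_no}: expected {len(COLUMNS)} cells, got {len(row)}")
        conn.execute("INSERT INTO checklist VALUES (?, ?, ?)", (client, *row))
        count += 1
    conn.commit()
    return count
```

Raising on a ragged row is a deliberate choice here: it makes a malformed file fail loudly rather than leaving rows missing from the database with no explanation.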

Good luck.

Here are a few links that describe the basic manual steps:


01-15-08, 10:06 AM
Did you ever figure out how to do this? It's a little (or a lot) late, but I know how to do this now :D

01-15-08, 10:35 PM
Well, we outsourced the problem to an IT firm. Things are going well :)...ish

01-16-08, 11:42 PM
Yikes... you paid someone to do this? I could write you a program in about 30 minutes that would easily take care of it... hmmm... I wonder if there's anyone else out there who would pay for something like this...

01-17-08, 12:56 PM
If you have Word 2007, the .docx file is actually a zip file with the contents in XML.
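
To illustrate, here is a minimal sketch in Python (standard library only) that opens a .docx as a zip, reads `word/document.xml`, and pulls out the text of every table cell. The function name `docx_tables` is made up for this example, and the sketch assumes plain tables: rows wrapped in content controls are not found, and nested tables are only handled crudely.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml of a .docx file.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_tables(path_or_file):
    """Return every table in a .docx as a list of rows of cell strings."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    tables = []
    for tbl in root.iter(W + "tbl"):  # also visits nested tables
        rows = []
        for tr in tbl.findall(W + "tr"):  # direct child rows only
            cells = []
            for tc in tr.findall(W + "tc"):
                # Join all text runs in the cell (includes nested-table text).
                cells.append("".join(t.text or "" for t in tc.iter(W + "t")))
            rows.append(cells)
        tables.append(rows)
    return tables
```

One thing to watch for: horizontally merged cells are stored as a single `w:tc` element, so a row that looks like it has the same number of columns in Word can come back with fewer cells. Comparing `len(row)` across rows is a quick way to spot those before loading anything into a database.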