I was parsing a Word 2007 document for mail merge purposes and found that libxml was the fastest way to do it in Ruby. The XML document uses namespaces heavily, but it is not readily apparent how to search it with libxml using those namespaces.
Here is a sample section, followed by the code I found after an extensive Google search to find the paragraph node "w:p":
<w:doc>
<w:p>
<w:t>Text being sought</w:t>
</w:p>
</w:doc>
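require 'libxml'

# Assumption: document.xml has already been extracted from the .docx archive.
doc = LibXML::XML::Document.file('document.xml')
ns = 'w:http://schemas.openxmlformats.org/wordprocessingml/2006/main'
doc.find('//w:p', ns).each do |para|
  # do something special with the paragraph node here
end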
It took me a while to track this down so I thought I would share here in the hopes of helping someone else.