I have roughly 12000 RTF-filled cells in Excel (including formatting tags). To access the unformatted content, I must parse them.
This is the example of one of the cells with text:
{\rtf1\ansi\deflang1060\ftnbj\uc1
{\fonttbl{\f0 \froman \fcharset0 Times New Roman;}{\f1 \fswiss \fcharset238
Arial;}}
{\colortbl ;\red255\green255\blue255 ;\red0\green0\blue0 ;}
{\stylesheet{\fs24\cf2\cb1 Normal;}{\cs1\cf2\cb1 Default Paragraph Font;}}
\paperw11908\paperh16833\margl1800\margr1800\margt1440\margb1440\headery720\footery720
\deftab720\formshade\aendnotes\aftnnrlc\pgbrdrhead\pgbrdrfoot
\sectd\pgwsxn11908\pghsxn16833\marglsxn1800\margrsxn1800\margtsxn1440\margbsxn1440
\headery720\footery720\sbkpage\pgncont\pgndec
\plain\plain\f1\fs24\pard TPR 0160 000\par IPR 0160 000\par OB-R-02-28\par}
And all I really need is this:
TPR 0160 000
IPR 0160 000
OB-R-02-28
The issue with simply iterating over the cells to eliminate any extraneous formatting is that not all of the 12000 cells are as easy as this. As a result, I would have to manually review numerous versions and write numerous variants, and in the end, there would still be a tonne of manual labor to be done.
However, if I copy the content of a single cell to an empty text document, save it as RTF, and then open it in MS Word, it immediately parses the text and gives me the results I need. Unfortunately, doing so for 12000 cells is rather cumbersome.
Therefore, I was considering using a VBA macro to copy the output back to the original cell after copying the contents of the cell to Word and forcing parsing. Sadly, I'm not really sure how to go about doing that.
Does anyone know the answer? or another strategy? If there is a solution or a push in the right direction, I would be really appreciative.