However if I open a blank file, then 'place' the PDF in the file, it inserts it as an 'Embedded Document', which looks like it should with Georgia being the assigned font. I don't understand why both methods aren't opening the same way?
Just stumbled upon the same problem. I printed a OneNote note to PDF with Microsoft's PDF printer and tried to open the PDF in Affinity. When I open the PDF in Illustrator it displays the text correctly but is still missing those PostScript CIDFont+F1 fonts.
If you want to Open the PDF you need to have the fonts installed. If you don't have them, you could use Place, with the Passthrough option instead, but you will not be able to edit the contents of the PDF.
Here is a little video that shows a good process for this. You will need a trial or access to Acrobat first though: Opening a PDF in Adobe Illustrator or Affinity Designer without the fonts
Does anyone have either:
- a parser (like this one) that copes with CID fonts; or
- some example code for how to parse a page's resource dictionary to find the page's fonts and get its ToUnicode stream, to help finish off this example
And the actual helper class, which I'll post in its entirety, because all of its parts are used in the example, and because I found so few complete examples myself when I was trying to solve this issue. The helper uses both PDFSharp and iTextSharp to be able to open PDFs pre- and post-1.5, ExtractTextFromPDFBytes to read in a standard PDF, and my FindObjects (to search the document tree and return objects) and FromUnicode, which takes encrypted text and a fonts collection to translate it.
The main features that CID fonts add are the ability to have 16-bit values (so up to 65,536 distinct character codes rather than 256) and much more sophisticated and flexible Unicode settings for extraction. Predefined CMaps (or custom ones embedded by the user) allow text extraction to provide appropriate values.
Encoding is far more elaborate for CID fonts, with the CIDSystemInfo key allowing a number of preset values for common languages (e.g. Korean, Japanese, Chinese) and the CIDToGIDMap in CIDFontType2 fonts allowing custom control.
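To make the ToUnicode idea a bit more concrete, here is a minimal sketch in Python of how a ToUnicode CMap can be turned into a lookup table and used to decode a hex string of 2-byte codes. It is not the helper class from the post above. The names `parse_tounicode`, `decode_hex_string` and `SAMPLE_CMAP` are made up for illustration, the sample CMap is invented, and the sketch assumes fixed 2-byte codes, 4-digit destinations in `bfrange` entries, and no array-form ranges (`<lo> <hi> [ ... ]`).

```python
import re

HEX = r"<([0-9A-Fa-f]+)>"

def parse_tounicode(cmap_text):
    """Build {character code: unicode string} from bfchar/bfrange sections."""
    mapping = {}
    # bfchar: one source code maps to one destination (UTF-16BE hex).
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, section):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    # bfrange: consecutive source codes map to consecutive destinations.
    # Assumption: the destination is a single BMP code point (4 hex digits).
    for section in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst in re.findall(HEX + r"\s*" + HEX + r"\s*" + HEX, section):
            start = int(dst, 16)
            for offset, code in enumerate(range(int(lo, 16), int(hi, 16) + 1)):
                mapping[code] = chr(start + offset)
    return mapping

def decode_hex_string(hex_string, mapping, code_bytes=2):
    """Decode a PDF hex string (without the angle brackets) via the mapping."""
    raw = bytes.fromhex(hex_string)
    chars = []
    for i in range(0, len(raw), code_bytes):
        code = int.from_bytes(raw[i:i + code_bytes], "big")
        chars.append(mapping.get(code, "\ufffd"))  # U+FFFD for unmapped codes
    return "".join(chars)

# Invented sample data, not taken from any real PDF.
SAMPLE_CMAP = """
2 beginbfchar
<0003> <0020>
<0024> <0041>
endbfchar
1 beginbfrange
<0044> <0046> <0061>
endbfrange
"""

table = parse_tounicode(SAMPLE_CMAP)
print(decode_hex_string("00240003004400450046", table))
```

A real extractor would still have to locate the font's ToUnicode stream through the page's resource dictionary and decompress it first; this only covers the mapping step.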
No, that's not how it works - this was a mistake in the old post from years ago. That's what you may have found for a file or a number of files. But names like this just mean that the fonts are given random names in the order some app or person used them.
Now the problem: given I had PDF files with embedded fonts -- how can I extract those fonts in a way that they are re-usable as regular font files? Are there (preferably free) tools which can do that? Also: can this be done programmatically with, say, iText?
You have several options. All these methods work on Linux as well as on Windows or Mac OS X. However, be aware that most PDFs do not include the full, complete font face when they have a font embedded. Mostly they include just the subset of glyphs used in the document.
Next, MuPDF. This application comes with a utility called pdfextract (on Windows: pdfextract.exe) which can extract fonts and images from PDFs. (In case you don't know about MuPDF, which still is relatively unknown and new: "MuPDF is a Free lightweight PDF viewer and toolkit written in portable C.", written by Artifex Software developers, the same company that gave us Ghostscript.)(Update: Newer versions of MuPDF have moved the former functionality of 'pdfextract' to the command 'mutool extract'. Download it here: mupdf.com/downloads)
This command will dump all of the extractable files from the pdf file referenced into the current directory. Generally you will see a variety of files: images as well as fonts. These include PNG, TTF, CFF, CID, etc. The image names will be like img-0412.png if the PDF object number of the image was 412. The fontnames will be like FGETYK+LinLibertineI-0966.ttf, if the font's PDF object number was 966.
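If you want to sort the dumped files afterwards, the naming scheme described above can be picked apart with a regular expression. This is only a sketch that assumes the two name patterns quoted here (`img-0412.png` and `FGETYK+LinLibertineI-0966.ttf`); other versions of the tool may name files differently. `describe_extracted` is a hypothetical helper name, not part of MuPDF.

```python
import re

# Optional six-letter subset tag plus "+", then a name, "-", the PDF object
# number, and a file extension.
NAME_RE = re.compile(
    r"^(?:(?P<tag>[A-Z]{6})\+)?(?P<name>.+)-(?P<obj>\d+)\.(?P<ext>[A-Za-z0-9]+)$"
)

def describe_extracted(filename):
    """Split an extracted file name into its parts, or return None."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    return {
        "subset_tag": m.group("tag"),     # None when there is no subset prefix
        "name": m.group("name"),
        "object": int(m.group("obj")),    # PDF object number
        "ext": m.group("ext").lower(),
    }

for f in ("img-0412.png", "FGETYK+LinLibertineI-0966.ttf"):
    print(f, "->", describe_extracted(f))
```

A non-empty `subset_tag` is a hint that the font is a subset and will only contain the glyphs the document actually used.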
Then, Ghostscript can also extract fonts directly from PDFs. However, it needs the help of a special utility program named extractFonts.ps, written in PostScript language, which is available from the Ghostscript source code repository.
To use it, you need to run both this file, extractFonts.ps, and your PDF file. Ghostscript will then use the instructions from the PostScript program to extract the fonts from the PDF. It looks like this on Windows (yes, Ghostscript understands the 'forward slash', /, as a path separator also on Windows!):
I tested the Ghostscript method a few years ago. At the time it did extract *.ttf (TrueType) just fine. I don't know if other font types will also be extracted at all, and if so, in a re-usable way. I don't know if the utility blocks extraction of fonts which are marked as protected.
Finally, Didier Stevens' pdf-parser.py: this one is probably not as easy to use, because you need to have some know-how about internal PDF structures. pdf-parser.py is a Python script which can do a lot of other things too. It can also decompress and extract arbitrary streams from objects, and therefore it can extract embedded font files too.
It tells me that there are two instances of FontFile2 inside the PDF, and these are in PDF objects no. 15 and no. 16, respectively. Object no. 15 holds the /FontFile2 for font /ArialMT, object no. 16 holds the /FontFile2 for font /Arial-BoldMT.
A quick peek into the PDF specification reveals that the keyword /FontFile2 relates to a 'stream containing a TrueType font program' (/FontFile would relate to a 'stream containing a Type 1 font program' and /FontFile3 would relate to a 'stream containing a font program whose format is specified by the Subtype entry in the stream dictionary', hence being either a Type1C or a CIDFontType0C subtype).
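The same lookup can be roughly approximated without pdf-parser.py. The following Python sketch scans the raw bytes of a PDF for /FontFile, /FontFile2 and /FontFile3 references and labels them using the mapping just described. It assumes the font descriptors are stored uncompressed; descriptors packed into object streams (common in PDF 1.5 and later) will not be found this way. `find_font_programs` and `SAMPLE` are illustrative names, and the sample bytes are invented to mirror the two objects mentioned above.

```python
import re

# Key name -> kind of font program, as described in the PDF specification.
FONT_PROGRAM_KINDS = {
    b"FontFile": "Type 1",
    b"FontFile2": "TrueType",
    b"FontFile3": "see stream Subtype (e.g. Type1C or CIDFontType0C)",
}

# Matches an indirect reference such as "/FontFile2 15 0 R".
REF_RE = re.compile(rb"/(FontFile[23]?)\s+(\d+)\s+(\d+)\s+R")

def find_font_programs(pdf_bytes):
    """Return (object number, key, kind) for each font file reference."""
    return [
        (int(m.group(2)), m.group(1).decode("ascii"), FONT_PROGRAM_KINDS[m.group(1)])
        for m in REF_RE.finditer(pdf_bytes)
    ]

# Invented fragment that mimics two font descriptors.
SAMPLE = (
    b"<< /Type /FontDescriptor /FontName /ArialMT /FontFile2 15 0 R >>\n"
    b"<< /Type /FontDescriptor /FontName /Arial-BoldMT /FontFile2 16 0 R >>\n"
)

for obj, key, kind in find_font_programs(SAMPLE):
    print(f"object {obj}: /{key} ({kind})")
```

For a real file you would pass in the result of `open(path, "rb").read()`.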
So Bingo!, we have a winner: pdf-parser.py did indeed extract a valid font file for us. Given the size of this file (778.552 Bytes), it looks like this font had even been embedded completely in the PDF...
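For the curious, here is roughly what such an extraction boils down to, as a Python sketch using only the standard library. It is not how pdf-parser.py is implemented. It assumes the object is stored directly in the file (not inside an object stream), that the stream is either unfiltered or uses a single FlateDecode filter, and that the bytes `endstream` do not happen to occur inside the stream data. `extract_stream` and `sniff_font_format` are made-up names; the second one guesses the font format from well-known leading bytes.

```python
import re
import zlib

def extract_stream(pdf_bytes, obj_num, gen=0):
    """Return the decoded stream of an indirect object, or None if absent."""
    header = re.compile(
        rb"(?<!\d)%d\s+%d\s+obj\b(.*?)\bstream\r?\n" % (obj_num, gen), re.S
    )
    m = header.search(pdf_bytes)
    # Reject matches that ran past the end of the object we asked for.
    if m is None or b"endobj" in m.group(1):
        return None
    end = pdf_bytes.find(b"endstream", m.end())
    if end == -1:
        return None
    raw = pdf_bytes[m.end():end]
    if b"/FlateDecode" in m.group(1):
        # decompressobj tolerates the end-of-line bytes before "endstream".
        return zlib.decompressobj().decompress(raw)
    return raw  # unfiltered; may still carry a trailing end-of-line

def sniff_font_format(data):
    """Guess a font file format from its first bytes."""
    head = data[:4]
    if head in (b"\x00\x01\x00\x00", b"true"):
        return "TrueType"
    if head == b"OTTO":
        return "OpenType (CFF outlines)"
    if head == b"ttcf":
        return "TrueType Collection"
    if head == b"wOFF":
        return "WOFF"
    if head == b"wOF2":
        return "WOFF2"
    if data[:2] == b"%!":
        return "Type 1 (PFA)"
    if data[:2] == b"\x80\x01":
        return "Type 1 (PFB)"
    return "unknown"
```

Usage would look something like `data = extract_stream(open("some.pdf", "rb").read(), 15)` followed by `sniff_font_format(data)`, where the file name and the object number 15 are placeholders. Bare CFF data from a /FontFile3 stream has no such signature and will show up as "unknown".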
In any case you need to follow the license that applies to the font. Some font licences do not allow free use and/or distribution. Pirating fonts is like pirating any software or other copyrighted material.
Using the free online web page by IDR Solutions, PDF to HTML5 (link), convert a PDF to a zip file. In the resulting zip will be a font directory of woff file types. Current Internet browsers support woff files if you were not aware. (reference) These can be examined at the online site FontDrop! (link).
PDF2SVG version 6.0 from PDFTron does a reasonable job. It produces OpenType (.otf) fonts by default. Use --preserve_fontnames to preserve "the font/font-family naming scheme as obtained from the source file."
PDF2SVG is a commercial product, but you can download a free demo executable (which includes watermarks on the SVG output but doesn't otherwise restrict usage). There may be other PDFTron products that also extract fonts, but I only recently discovered PDF2SVG myself.
A font can be embedded only if it contains a setting by the font vendor that permits it to be embedded. Embedding prevents font substitution when readers view or print the file, and ensures that readers see the text in its original font. Embedding increases file size only slightly, unless the document uses CID fonts, a font format commonly used for Asian languages. You can embed or substitute fonts in Acrobat or when you export an InDesign document to PDF.
If you have difficulty copying and pasting text from a PDF, first check if the problem font is embedded (File > Properties > Font tab). For an embedded font, try changing the point where the font is embedded, rather than sending it inside the PostScript file. Distill the PDF without embedding that font. Then open the PDF in Acrobat and embed the font using the Preflight fixup.
The Acrobat installation includes width-only versions of many common Chinese, Japanese, and Korean fonts, so Distiller can access these fonts in Acrobat. Make sure that the fonts are available on your computer. (In Windows, choose Complete when you install Acrobat, or choose Custom and select the Asian Language Support option under the View Adobe PDF category. In Mac OS, these fonts are installed automatically.)
To specify other font folders for Distiller to search, in Acrobat Distiller, choose Settings > Font Locations. Then in the dialog box, click Add to add a font folder. Select Ignore TrueType Versions Of Standard PostScript Fonts to exclude TrueType fonts that have the same name as a font in the PostScript 3 font collection.
You can create a printable preview of your document that substitutes default fonts for any text formatted in fonts that are available on your local computer but are not embedded in the PDF. This preview can help you decide whether to embed those local fonts in the PDF, to achieve the look you want for your document.
The font KozGoPr6N-Medium is considered a restricted font by Adobe, and this behavior occurs because Adobe changed the way it deals with CID (Character Identifier) fonts. It now prompts users to install an Asian language pack even for documents that do not contain a single Eastern character, all because the encoding for the CID fonts used is Eastern (it has to be, because Adobe requires it for CID fonts).
Note: For font-related errors, try changing the font-related settings in the device driver. For example, in the printer's properties (Windows), click Advanced, then, in the Graphics section, change the TrueType Font option to Download as Softfont.
Do you receive the error from more than one application? If the same problem occurs from more than one application, the cause is most likely a problem at the system level. Damaged fonts, damaged system files, damaged printer drivers, insufficient hard disk space, network problems, or hardware problems commonly cause system-level problems. If the problem occurs only from one application, see the "Isolating Application-Specific Problems" section.