Availability of Patent
ALL TEXT, ALL AUTHORITIES - ONE FORMAT
If you prefer, we now have
MAPS-XML for all Authorities. See
the DTD Link, below.
Quantity discounts available on multiple collections
Call us for current pricing
Patent Data Collection
Number of files (rounded down) through End
US Patent Grant Facsimile Images
April 1790 -
Dec 31st 2015
US Patent Full Text (FT) 1976 and
Bibliographic, minimum 1975 and earlier
Dec 31, 2015
SSD or Flash
Application Facsimile Images
Dec 31 2015
Application Full Text
Dec 31 2015
SSD or Flash
and Applications - Facsimile Images
Patents and Applications Full Text
SSD or Flash
Application Facsimile Images
Application Full Text (Note
SSD or Flash
JP Abstracts - Bibliographic data
and Abstract *
(* human translated Text Abstract paragraph)
SSD or Flash
JP Abstract Facsimile images (single
with a single 200 dpi
representative image, if supplied
August 20th 2016
Size of Media will vary
- Text is shipped with each Weekly
issue in 7z archive format
We now ship COMPLETE TEXT collection Sets (4 authorities) on 250GB
Delivered in Original EPC Approved Languages (approx. English 70%, German 20%, French 10%). All files have English Bibliographic Data, Titles and Abstracts.
Grants have claims in all 3 languages, but no abstract para.
Most in English, All contain complete English Abstracts - All PCT
Text is OCR Sourced.
The MAPS 7.0 Specification
is now available!
Current MAPS Specification for Version 7.0
Note: Version 7.0c contains additional minor corrections in the documentation (from V7.0b) and does not affect data contents in the MAPS or MAPS-XML files.
Current MAPS-XML DTD for Version
7.0 (XML DTD):
SAMPLE MAPS and MAPS-XML Data for
For the engineers who must determine whether our data will satisfy their requirements, we wanted to provide the widest range of samples covering all technologies (i.e., Classes). We determined that the best way to do this is to provide the first complete week for every year in each of the collections, in both MAPS and MAPS-XML formats. This is close to 2% of the total data in each of the MAPS or MAPS-XML data collections (each format, with all four Authorities included, is over 1 TB of uncompressed text data). The compressed size of the sample data files is approximately 2.3 GB for each format. This is a very large amount for sample data, but this way there should be no questions about exactly what is included in the collections. It also makes development with a high level of confidence possible prior to purchase. Each 7z archive file for MAPS and MAPS-XML files is between 5.5 MB (earlier years) and 75 MB (recent years). The MathML 7z archive files are between 40 KB and 475 KB.
IMPORTANT NOTE ABOUT USING THIS SAMPLE DATA: Even though this sample data is from our primary working inventory, it is just that, a WORKING INVENTORY. For example, at the time of this writing there were over 750 files, mostly WO and Pre-2006 EP files, in a "Pending State": they have been edited but not yet verified or re-added to our inventories. By "Edited" we mean serious clean-up by one of our editors while they view the facsimile image version on one of the Dual Monitor workstations designed for this purpose. The files that consume the most time are OCR files where the patent deals with HTML commands. If you are a programmer, imagine the possibility for errors when reading HTML examples in an Application dealing with various HTML tags while the OCR system itself generates XML with HTML tags for highlighting. They can be a nightmare and consume untold hours of editing. This is not the fault of the Patent Authorities.

Also, the complete collections must go through one last update with the most recent CPC and IPCR data to ensure that you get the most recent class information in the entire collection back-file. This lets you begin to use the data you index almost immediately for class searches.

We consider the USPTO database systems the ideal systems for comparison searches since they are the primary source for the US data. They are effectively our Gold Standard for US Data search comparisons. The EPO eSpace system is used for EP and JP Data test comparisons, while Patentscope and the EPO systems are used for different aspects when comparing and testing searches for WO applications.

The MAPS-XML data files are generated NEW for every MAPS-XML data order, and they were generated new for the first week of data for these sample files as well. Out of the 71,515 WO MAPS Application files (including Search Reports, Corrected Copies, Amended Claims, etc.), 166 MAPS files did not convert to XML properly and failed our Basic Verification tests. These have been left out of the Sample MAPS-XML archives. The MAPS files that caused the conversion errors will be checked and corrected, and these files will be regenerated and added at some point in the future. The sample data is currently a
Additional Notes on Sample Data:
1) MAPS MVER elements may contain V7.0a, V7.0b or V7.0c. All 3 versions have the same technical specification in Version 7.0, since letter updates (a to b, b to c, etc.) indicate a change to documentation only (corrections of typos, formatting errors, etc.).
2) The MathML files (HTML) described in Appendix-M of the MAPS Specification for MAPS and MAPS-XML are in the 2nd column, after the MAPS files for the applicable issue weeks. The same MathML HTML files are used with both formats (MAPS and MAPS-XML), so they are only provided once.
3) The approximate size for each set of sample files (combined) is listed in megabytes (e.g., ~775 MB) to the right of the name and format of that set. The size includes the archive files for the HTML MathML files, if any are included with that set.
4) MAPS-XML data is created from the MAPS Data. The reason for this is that the authorities provide a large portion of their data created with OCR (Optical Character Recognition) software. The quality of this text, especially for older files, depends greatly on the quality of the scanned images ("WIPO - we feel your pain"). The MAPS format is ideal for human editors (e.g., to clean up and make needed repairs while viewing the scanned images). The files as delivered by the Authorities are very difficult, if not impossible, to edit since large amounts of the data are in HTML Numeric Entities (e.g., &#8242; is the Prime symbol), and ALL of the OCR Text Characters for Chinese, Japanese, Russian and a few others are provided as HTML Numeric Entities (for example, &#1030; is the Cyrillic Byelorussian-Ukrainian Capital Letter "I"). We have made it far more human-friendly by converting ALL of the character sets to the UTF-8 BINARY format, so that any GOOD Text Editor that FULLY SUPPORTS UTF-8 may be used to edit or view both the MAPS and MAPS-XML files while looking at the Image page, which the text contents SHOULD match. XML is still not nearly as easy to edit as the MAPS format, even with a good XML editor.
You may want to download and install a known good UTF-8 editor to make viewing the sample files easier. DO NOT USE NOTEPAD in Windows (it seems to trash UTF-8 files randomly, even though it now claims to support UTF-8). One of the best editors we have found for editing the MAPS or MAPS-XML files in the Windows environment is Notepad2-Mod. It is a "Fork" of Notepad2 made by very competent programmers and their associates (Kai Liu or XhmikosR and others). You can acquire it here (on GitHub): https://xhmikosr.io/notepad2-mod/ Another good editor is Akel-Pad. It has columnar insertion, and when we need that feature, Akel-Pad is used; otherwise we go for Notepad2-Mod. But there are many UTF-8 compatible editors out there now. Just be sure you have one that will let you save a UTF-8 file WITHOUT a Byte Order Mark (BOM).
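For programmers who want to see what the entity-to-UTF-8 conversion described in note 4 looks like in practice, here is a minimal Python sketch. It is not our production converter; the function names are hypothetical and the regular expression is an assumption about how numeric entities appear in authority-supplied text. It deliberately leaves the XML-reserved characters (&, <, >, and the two quote marks) encoded so that markup is not damaged, and it writes UTF-8 without a BOM.

```python
import re
from pathlib import Path

# Matches decimal (&#8242;) and hexadecimal (&#x2032;) numeric references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")
# Characters that must stay encoded so XML/HTML markup is not broken.
XML_RESERVED = {"&", "<", ">", '"', "'"}
UTF8_BOM = b"\xef\xbb\xbf"


def decode_numeric_refs(text):
    """Replace numeric character references with the characters they name."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave invalid code points (zero, out of range, surrogates) untouched.
        if code == 0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        char = chr(code)
        return match.group(0) if char in XML_RESERVED else char
    return NUMERIC_REF.sub(replace, text)


def strip_bom(data):
    """Remove a leading UTF-8 Byte Order Mark from raw bytes, if present."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data


def save_utf8_no_bom(path, text):
    """Write text as UTF-8. The plain 'utf-8' codec never emits a BOM."""
    Path(path).write_bytes(text.encode("utf-8"))
```

For example, decode_numeric_refs("5&#8242;") gives the digit 5 followed by the real Prime character, while "&#38;" (an encoded ampersand) is left as-is.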
Please send any questions or comments you may have about this sample data, or about any of our services, programs, or data products, to SupportIPDataCorp.com and we will be happy to assist you.
Begin MAPS Sample Data, Version 7.0
US Applications, MAPS Format (~779 MB):
Samples/Maps/USA/20020103.7z - Samples/Maps/USA/20020103-MathML.7z
Samples/Maps/USA/20030102.7z - Samples/Maps/USA/20030102-MathML.7z
Samples/Maps/USA/20040101.7z - Samples/Maps/USA/20040101-MathML.7z
Samples/Maps/USA/20050106.7z - Samples/Maps/USA/20050106-MathML.7z
Samples/Maps/USA/20060105.7z - Samples/Maps/USA/20060105-MathML.7z
Samples/Maps/USA/20070104.7z - Samples/Maps/USA/20070104-MathML.7z
Samples/Maps/USA/20080103.7z - Samples/Maps/USA/20080103-MathML.7z
Samples/Maps/USA/20090101.7z - Samples/Maps/USA/20090101-MathML.7z
Samples/Maps/USA/20100107.7z - Samples/Maps/USA/20100107-MathML.7z
Samples/Maps/USA/20110106.7z - Samples/Maps/USA/20110106-MathML.7z
Samples/Maps/USA/20120105.7z - Samples/Maps/USA/20120105-MathML.7z
Samples/Maps/USA/20130103.7z - Samples/Maps/USA/20130103-MathML.7z
Samples/Maps/USA/20140102.7z - Samples/Maps/USA/20140102-MathML.7z
Samples/Maps/USA/20150101.7z - Samples/Maps/USA/20150101-MathML.7z
Samples/Maps/USA/20160107.7z - Samples/Maps/USA/20160107-MathML.7z
US Granted Patents, MAPS Format (~800MB):
Samples/Maps/USG/20020101.7z - Samples/Maps/USG/20020101-MathML.7z
Samples/Maps/USG/20030107.7z - Samples/Maps/USG/20030107-MathML.7z
Samples/Maps/USG/20040106.7z - Samples/Maps/USG/20040106-MathML.7z
Samples/Maps/USG/20050104.7z - Samples/Maps/USG/20050104-MathML.7z
Samples/Maps/USG/20060103.7z - Samples/Maps/USG/20060103-MathML.7z
Samples/Maps/USG/20070102.7z - Samples/Maps/USG/20070102-MathML.7z
Samples/Maps/USG/20080101.7z - Samples/Maps/USG/20080101-MathML.7z
Samples/Maps/USG/20090106.7z - Samples/Maps/USG/20090106-MathML.7z
Samples/Maps/USG/20100105.7z - Samples/Maps/USG/20100105-MathML.7z
Samples/Maps/USG/20110104.7z - Samples/Maps/USG/20110104-MathML.7z
Samples/Maps/USG/20120103.7z - Samples/Maps/USG/20120103-MathML.7z
Samples/Maps/USG/20130101.7z - Samples/Maps/USG/20130101-MathML.7z
Samples/Maps/USG/20140107.7z - Samples/Maps/USG/20140107-MathML.7z
Samples/Maps/USG/20150106.7z - Samples/Maps/USG/20150106-MathML.7z
Samples/Maps/USG/20160105.7z - Samples/Maps/USG/20160105-MathML.7z
EP Applications and Granted Patents (MAPS Format ~nnn MB):
EP Sample files being added now. Links will be updated soon.
WO - World Patent Applications (PCT, MAPS Format ~468 MB):
JP Unexamined Application Abstracts in English (MAPS Format ~nnn MB):
Files being prepared for addition now. Links will be updated soon.
Begin MAPS-XML Sample Data, Version 7.0
These are the same publications listed above in MAPS format, except these are in the MAPS-XML format. Please see the Current MAPS-XML DTD for Version 7.0 on this page under the list of Collections near the top. When using MAPS-XML Data, you
should also consult the Current MAPS Specification (also on this page)
since it contains more detailed information on each of the MAPS codes
and data they contain.
US Applications (MAPS-XML Format ~784MB):
US Granted Patents (MAPS-XML Format ~810MB):
EP Applications and Granted Patents (MAPS XML Format ~nnn MB):
Files being prepared now. Links will be updated when added.
WO World Patent Applications
Patent Cooperation Treaty (PCT, MAPS-XML Format ~ 447MB):
JP Unexamined Application Abstracts in English (MAPS-XML Format ~nnn MB):
Files being prepared now. Links will be updated when added.
US, EP, WO and JP CPC Classification Data
(plus DOCDB Family-ID data)
US, EP, WO and JP CPC Classification Data as well as IPCR Classification Data (IPC Version 8)
is maintained and updated monthly on our FTP servers for subscribers to
any of our Text Data Products (i.e., get one or more text subscriptions, get all of the CPC and IPCR Class Data with it!).
The US CPC Class Data comes from the monthly US CPC Master Class File
("US CPC-MCF" for short) that the USPTO began to produce late in 2015.
All EP, WO,
JP CPC Data, and US Reissue CPC data comes from the Backfile and Weekly
DOCDB Updates. The USPTO is still working on adding Reissue Patent CPC Class data
to the CPC-MCF, but it is still not there as of this writing (and we certainly DO need it).
We download and convert all of this data each month into our SUPER-EASY-TO-USE ST.8 Class Format CSV Files - (CSV is Comma Separated Values - easy - and, we like super-easy-to-use things, Who doesn't? Right?).
We initially updated class data bi-monthly since the EPO's new version of the DOCDB Back-File was not released until January of 2016, and this made the DOCDB AMEND files almost impossible to deal with (the complexity of the updates was approaching the level of insanity). The delay was caused by the EPO's conversion to the new DOCDB format (and that was a GARGANTUAN task - better them than us!). Our processing of all of the Backfile was completed in July, and this now gives us synchronized CPC and IPCR data across all of the Authorities we support. We are also going back to the Bi-monthly class updates.
Ensuring that all of your Class Updates end at the same time, and are ALL applied to your system at the same time, is EXTREMELY IMPORTANT for professional searchers who depend on classifications for their searches. If they are not (e.g., you are missing one authority's updates for a month or so), a document may have been reclassified and pulled out of one Group while your system has no record of where it was added to the new Group. Even if your Searcher is searching both Groups, this document will simply NOT show up in the results. This can spell disaster by missing relevant documents. In fact, being synchronized with all updates is almost always MORE IMPORTANT than having current Class data (provided you are only late by 30, 60 or no more than 90 days at the most). As long as your searchers know their classes and groups, and they cover them properly in their searches, they will not miss documents in a properly synchronized class system.
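One way to enforce the synchronization rule above in loader software is to compute a common cutoff before applying anything. The following Python sketch is only an illustration under stated assumptions: the function name, the authority codes used as dictionary keys, and the (year, week) tuples are all hypothetical and are not part of our delivered files.

```python
def synchronized_cutoff(latest_by_authority, required=("US", "EP", "WO", "JP")):
    """Return the latest (year, week) for which EVERY required authority has updates.

    latest_by_authority maps an authority code to the (year, week) of the
    newest class update on hand for that authority.
    """
    missing = [code for code in required if code not in latest_by_authority]
    if missing:
        # Without data for every authority, no synchronized update is possible.
        raise ValueError("no class updates on hand for: " + ", ".join(missing))
    # Tuples compare by year first, then week, so min() is the common cutoff.
    return min(latest_by_authority[code] for code in required)
```

A loader would then apply updates for all authorities only up to the returned cutoff, and hold back anything newer until the lagging authority catches up.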
The CPC Class Data includes CPC Primary, CPC Invention Related and Non-Invention Related Further Classifications, and it also fully supports the Combination Class data (Groups and CPC Rank entries) as well. The IPC Class data supports the Primary and Further Classification Entries.
In addition to the CPC and IPCR data, we also provide updated CSV files containing the DOCDB Family-ID for every file that is supplied by the EPO for our Collections (which is almost every one of them - only a very small percentage are missing). The EPO has really done a REMARKABLE job getting this information updated. This lets you index the FMID in your database and "Instantly" pull up all family files that you have in the four collections. This is a great way to handle post-search processing of the results to weed out duplicates for searchers reviewing files. It also lets you check the family files for the correct language (or the language preferred by your searchers), and many other things.
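The post-search processing described above can be sketched in a few lines of Python. This is an illustration only: the function names, the publication numbers, and the two lookup dictionaries (publication to FMID, publication to language) are hypothetical stand-ins for whatever your own database provides.

```python
from collections import defaultdict


def group_by_family(hits, fmid_by_pub):
    """Group search hits by Family-ID, keeping the original hit order.

    Publications with no known FMID are kept as single-member groups.
    """
    families = defaultdict(list)
    for pub in hits:
        fmid = fmid_by_pub.get(pub)
        key = ("fmid", fmid) if fmid is not None else ("pub", pub)
        families[key].append(pub)
    return dict(families)


def pick_representative(pubs, language_by_pub, preferred=("en",)):
    """Pick one family member, favoring the searcher's preferred languages."""
    for language in preferred:
        for pub in pubs:
            if language_by_pub.get(pub) == language:
                return pub
    return pubs[0]  # no preferred language available: fall back to first hit
```

A reviewer list with duplicates weeded out is then one representative per family group.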
Below is a link to the TEXT INSTRUCTIONS copied from our FTP Servers out of the CPC Directory this past June. There are also several compressed CPC, IPCR and FMID Update files so you can see the data. Do not use this Class data, since there are newer files on the FTP site, or will be by the time this page is posted on our Web Site. We attempt to be timely in providing this data, but we will PRIORITIZE SYNCHRONIZATION over timeliness every time (as described above). There have been recent delays caused by damaged files provided by the USPTO (which we were the first to report to them this past June, 2016). They PROMPTLY repaired it and had a good file up in 7 or 8 days! (It pays to know whom to contact at each of the Authorities, and we maintain good relationships with all of them - and THANKFULLY, all of the patent authorities seem to have good, hard working, knowledgeable personnel in their support staffs - and we sure hope it stays that way!).
Samples of our Text Instructions in the CPC Directory on the FTP site for subscribers:
Samples (actual data) for CPC, IPCR and FMID data, including Backfiles, Updates and an Amend File
from the CPC Directory on the FTP site for subscribers of any text data:
NOTE on CSV Updates with CREATE and AMEND file CPC Data:
1. AMEND CSV File Updates - Even though the Amend file names are unique (week number and date), we separate them into Weekly Numbered Sub-directories in the Archive for anyone doing manual CPC updates to a system. If your CPC and IPCR Class Update software does NOT use or check the ST.8 Action Date, you MUST apply the Amend Updates in order after each weekly Create file. The order for a complete Update to bare index files is:
A. All Backfile CSV files for this Authority (for the sake of this example, the backfile ends on 2016 Week 5)
B. Next, apply Weekly Create File for Week 6.
C. Next, apply Weekly Amend File for Week 6.
D. Next, apply Weekly Create File for Week 7.
E. Next, apply Weekly Amend File for Week 7.
F. Next, apply Weekly Create File for Week 8.
G. Next, apply Weekly Amend File for Week 8.
Repeat Create then Amend Loop for all weeks on hand, in order.
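The Create-then-Amend loop above can be expressed as a small Python sketch. The function name and the file names in the usage note are hypothetical and do not reflect the actual names on the FTP site; the only thing taken from the instructions is the ordering rule itself.

```python
def update_order(backfiles, creates, amends):
    """Build the order in which class update files should be applied.

    backfiles: list of backfile CSV names, applied first, in the order given.
    creates / amends: dicts mapping (year, week) to the weekly file name.
    """
    order = [("backfile", name) for name in backfiles]
    # Walk the weeks chronologically: Create first, then that week's Amend.
    for week in sorted(set(creates) | set(amends)):
        if week in creates:
            order.append(("create", creates[week]))
        if week in amends:
            order.append(("amend", amends[week]))
    return order
```

With a backfile ending on 2016 Week 5 and Create and Amend files for Weeks 6 through 8, the result lists the backfile first, then Create 6, Amend 6, Create 7, Amend 7, Create 8, Amend 8, matching steps A through G.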
2. FMID data can be updated in any order from any of the FMID files, since the FMID is simply repeated for the same files each time it is saved in the CSV files. If we generated this from one of our database systems, we would not repeat the entries for any publication number, but this is done directly from the DOCDB parser software since we do not want to add the possibility of a problem in our database corrupting the data, however remote the chance may be. If your database already has the Family-ID for a Pub, simply skip the entry. For FMID element format verification, the FMID element is a number from 1 to 2^32 - 1. In other words, a 32 bit unsigned number (1 to 4,294,967,295).
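A minimal Python sketch of the FMID format check and the "skip if already known" rule follows. The function names and the (publication number, FMID text) row shape are assumptions made for illustration; the actual CSV column layout is described in the Text Instructions on the FTP site.

```python
FMID_MAX = 2**32 - 1  # 4,294,967,295: largest 32 bit unsigned value


def is_valid_fmid(value):
    """True if value is a plain decimal number in the range 1..2^32 - 1."""
    text = value.strip()
    # isascii() guards against non-ASCII digit characters that int() rejects.
    return text.isascii() and text.isdigit() and 1 <= int(text) <= FMID_MAX


def merge_fmid_rows(rows, known):
    """Add new (pub_number, fmid_text) rows to known; return how many were added.

    Entries already present in known are skipped, so files can be applied
    in any order and repeated entries are harmless.
    """
    added = 0
    for pub_number, fmid_text in rows:
        if pub_number in known or not is_valid_fmid(fmid_text):
            continue
        known[pub_number] = int(fmid_text)
        added += 1
    return added
```

Because repeated entries are skipped, running the same FMID file twice leaves the index unchanged.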
IP Data Corporation
704 W. Park Ave,
Suite C, Edgewater FL. 32132 U.S.A.
Copyright © 2005-2016 IP Data Corporation, ALL RIGHTS RESERVED