Using AI on patents

August 7, 2018

This article was contributed by Andy Oram

Software patents account for more than half of all utility patents granted in the US over the past few years. Clearly, many companies see these patents as a way to fortune and growth, even while software patents are hated by many people working in the free and open-source movements. The field of patenting has now joined the onward march of artificial intelligence. This was the topic of a talk at OSCON 2018 by Van Lindberg, an intellectual-property lawyer, board member and general counsel for the Python Software Foundation, and author of the book Intellectual Property and Open Source. The disruption presented by deep learning ranges from modest enhancements that have already been exploited—making searches for prior art easier—to harbingers of automatic patent generation in the future.

Automating drudgery

For the past couple of decades, lawyers have been gradually automating searches through patents, as has already been done with case law. Paying a human patent hunter (a job category all its own) to find prior art that can overturn a patent costs $5,000 to $10,000. As Lindberg said: "It's miserable when people have to do a computer's work." Searching patents presents different challenges from searching court rulings, but patents have structural similarities that help overcome these challenges.

The main twist that makes patents harder to search than other types of legal documents springs from the tendency of patent filers to invent idiosyncratic terminology. To grant them some slack, we can acknowledge that a patent is supposed to introduce something new into the world, so how can you use old terminology to describe something novel? But, in reality, many patent filers deliberately use oddball terms for processes that could be described in more recognizable, everyday language. In other words, because a patent has to be novel, the filer may distort the language to emphasize the novelty of the device or process.

How can deep learning enhance the search for prior art? Lindberg said that natural language processing (NLP) can create indexes of terms. Like the search indexes built by tools such as Lucene, these indexes can then be queried to find which terms seem to be associated with others. NLP can do this through a variety of algorithms that look at how often each term appears, how closely together two terms appear in the text, and so on. Algorithms can also create graphs for patent documents and find similarities among patents, much as social scientists create graphs of human networks and figure out who is close to whom; that is how Facebook and LinkedIn can suggest people you may want to connect with.
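To make the two signals mentioned above concrete (which documents a term appears in, and how closely two terms appear together), here is a minimal sketch in plain Python. It is not the tooling Lindberg described; the two-document corpus, the function names, and the window size of three tokens are assumptions chosen for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def cooccurrence(docs, window=3):
    """Count how often two distinct terms appear within `window` tokens."""
    counts = Counter()
    for text in docs.values():
        tokens = tokenize(text)
        for i, left in enumerate(tokens):
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    counts[frozenset((left, right))] += 1
    return counts


# Hypothetical corpus, invented for this example.
docs = {
    "A": "a sleeve surrounds the rotational shaft",
    "B": "the rotational shaft drives a pump",
}
index = build_index(docs)
pairs = cooccurrence(docs)

print(sorted(index["shaft"]))                     # documents mentioning "shaft"
print(pairs[frozenset(("rotational", "shaft"))])  # how often the pair co-occurs
```

A production index would add weighting (TF-IDF, for example) and a far larger corpus, but the lookup and counting steps have the same shape.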

If deep learning turns up two patent applications where terms appear to be used in similar ways—appearing in the same relationship to other common words—it may indicate that the idiosyncratic term in a new patent application is just a synonym for a standard term in an existing patent. And that may indicate prior art that can be used to deny a patent application or overturn a patent that was already granted on the basis that it's not novel.
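The idea that an odd term may be a synonym when it keeps the same company as a standard term can be sketched with simple context vectors and cosine similarity. This is only an illustration under stated assumptions: the sentences are made up, "gripplet" is an invented stand-in for an idiosyncratic patent term, and real systems would use learned embeddings rather than raw neighbor counts.

```python
import math
import re
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """For each term, count the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, term in enumerate(tokens):
            start = max(0, i - window)
            neighbors = tokens[start:i] + tokens[i + 1 : i + 1 + window]
            vectors[term].update(neighbors)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical sentences; "gripplet" is an invented oddball term.
sentences = [
    "the fastener secures the panel to the frame",
    "the gripplet secures the panel to the frame",
    "the pump moves coolant through the pipe",
]
vectors = context_vectors(sentences)
sim_synonym = cosine(vectors["gripplet"], vectors["fastener"])
sim_other = cosine(vectors["gripplet"], vectors["pump"])

print(sim_synonym > sim_other)  # the invented term sits closer to "fastener"
```

A high score is only a hint that two terms are interchangeable; a person would still have to read both documents to decide whether one is prior art for the other.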

Improvements offered by structural work

As partial compensation for the impenetrability of patent language, patent descriptions have a predictable structure: summary, claims, etc. The descriptions are almost always rich with figures that bear labels with useful keywords. Lindberg said that these figures provide a valuable resource for data mining.

He said that, up until the past few years, the hardest part of deep learning for prior-art searches was getting clean, usable data. (This is true for many applications of AI.) He said it could take six months to scrape the patent office's site and run standard analyses such as lemmatization (a more careful relative of stemming that reduces inflected forms such as "patents", "patented", and "patenting" to a common base form) to produce a good index.
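As a rough picture of the normalization step that precedes indexing, the sketch below strips a few common English suffixes. It is deliberately crude and is an assumption made for illustration: real pipelines rely on a proper stemmer or a dictionary-backed lemmatizer, and the suffix list and minimum stem length here are arbitrary.

```python
import re

# Suffixes tried in order; an arbitrary, incomplete list for illustration.
SUFFIXES = ("ing", "ed", "s")
MIN_STEM_LENGTH = 4  # avoid mangling short words such as "this"


def crude_stem(word):
    """Strip the first matching suffix if enough of the word remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, tokenize, and reduce each token to its crude stem."""
    return [crude_stem(token) for token in re.findall(r"[a-z]+", text.lower())]


print(normalize("Patented devices and patenting claims"))
```

Mapping several surface forms to one index key is what lets a query for one form find documents that use another.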

Some of the easier queries you can make include getting the top terms used in a patent, and seeing who has been applying for patents in a particular area. The people applying for patents in your specific area may be your primary competitors. Another query would compare the text of a patent to Wikipedia entries in order to find the article most similar to a patent. In this way, you can get a readable description of the underlying technology.
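Two of these queries, finding the top terms in a patent and picking the most similar reference article, can be sketched with word counts and cosine similarity. The patent text, the two short "articles" standing in for Wikipedia entries, and the stopword list are all invented for this example; nothing is fetched from Wikipedia or any patent database.

```python
import math
import re
from collections import Counter

# A tiny, hand-picked stopword list; real systems use much longer ones.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "that", "for", "by", "with"}


def term_counts(text):
    """Count the non-stopword terms in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(patent_text, articles):
    """Return the title of the article whose wording best matches the patent."""
    patent_vec = term_counts(patent_text)
    return max(articles, key=lambda title: cosine(patent_vec, term_counts(articles[title])))


# Hypothetical inputs, invented for illustration.
patent = "A rotational shaft with a sleeve that dampens shaft vibration in a pump"
articles = {
    "Pump": "A pump moves fluid; many pumps use a rotating shaft and a sleeve bearing",
    "Bread": "Bread is a food made of flour and water that is baked",
}

top_terms = term_counts(patent).most_common(1)
best = most_similar(patent, articles)
print(top_terms, best)
```

The same counting approach, applied to an applicant or assignee field instead of the text, would answer the question of who is filing in a given area.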

In the past few years, Lindberg said, resources for finding and comparing patents have improved dramatically: more sophisticated databases, better search tools, and new AI-driven techniques. Two well-stocked databases are PatentsView and the patent databases in Google BigQuery Public Datasets. PatentsView, although provided by the US Patent Office, covers the entire world and enhances the text of each patent with metadata such as the number of times the patent has been cited and other patents filed by its inventors. The European Patent Office's Espacenet, which also covers patents from around the world, offers a RESTful API called Open Patent Services for searching among patents, downloading their contents, and annotating copies.

Lindberg said we can do much more than look for relationships among words using graphs and comparing distances between words. It is now possible to create a sophisticated ontology around a patent. Going beyond lemmatization, one can find relationships between words that have different roots. For instance, if one patent talks about a "measure" and another talks about a "diagnosis", you may be able to determine that they're talking about the same thing. Subject-predicate-object triples using predicates like "contains" and "interconnects" can also be represented, such as: "sleeve surrounds rotational shaft". This helps find patents covering similar devices or processes.
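Subject-predicate-object triples lend themselves to a very small data structure. The sketch below assumes the triples have already been extracted from the text (the hard part, which is not shown) and simply looks for relations that two patents share; the `Triple` type, the `shared_relations` helper, and the sample triples are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement extracted from a patent."""
    subject: str
    predicate: str
    obj: str


def shared_relations(a, b):
    """Return the (predicate, object) pairs that appear in both patents."""
    relations_a = {(t.predicate, t.obj) for t in a}
    relations_b = {(t.predicate, t.obj) for t in b}
    return relations_a & relations_b


# Hypothetical triples, invented for illustration.
patent_a = {
    Triple("sleeve", "surrounds", "rotational shaft"),
    Triple("housing", "contains", "sleeve"),
}
patent_b = {
    Triple("collar", "surrounds", "rotational shaft"),
    Triple("valve", "interconnects", "pipes"),
}

overlap = shared_relations(patent_a, patent_b)
print(overlap)
```

Here the two patents name different subjects ("sleeve" and "collar") for the same relation, which is the kind of overlap that would flag them as possibly describing the same device.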

AI has also been used for sophisticated legal purposes beyond searching, such as summarizing and translating patents.

The future of patent applications

Sometimes, quantitative changes can lead to qualitative disruption. Thus, Lindberg said that an order-of-magnitude reduction in the cost of patent searches, even if the resulting output is of slightly lower quality, can drive a whole new market. Because few lawyers have Lindberg's background, most of them lack both the technical knowledge to employ AI and the vision of its potential. But some companies are using it in the ways he described in his talk. And he predicted that AI will soon lead to far more disruption in the legal field. In fact, some of our basic notions of human ingenuity may be upended by AI in patenting.

For instance, Lindberg predicted that we're two years away from being able to answer the question: "What is the next area ripe for patenting that is most likely to show up in my field?" Even more disquieting, AI may be able to analyze existing patents and create a novel idea that's patentable. Of course, it would also search prior art to see whether someone else has already thought of the idea.

A computer application cannot be a patent applicant. According to Lindberg, the law considers only actual people to be inventors. A truly novel idea invented by an AI automatically becomes prior art, assuming the AI is even able to assert its authorship, which is likely to be more than just a few years off. Although patent AIs might not achieve Ray Kurzweil's singularity before the rest of the world, there's good reason to expect AI to take an increasing role in the generation and review of patents.

Index entries for this article
GuestArticles	Oram, Andy
Conference	OSCON/2018

This is crazy

Posted Aug 12, 2018 7:39 UTC (Sun) by cpitrat (subscriber, #116459)

The main point of software patents is to give a job to lawyers, this would make them completely useless !

This is crazy

Posted Aug 13, 2018 23:14 UTC (Mon) by k8to (guest, #15413)

Well played, sir. Well played.