Package org.wikidata.wdtk.examples
Class DataExtractionProcessor
java.lang.Object
org.wikidata.wdtk.examples.DataExtractionProcessor
- All Implemented Interfaces:
EntityDocumentProcessor
This simple EntityDocumentProcessor finds all items that have a GND
identifier (property P227) and are also humans (P31 with value Q5). For each
such item it extracts the id, the GND value, and the English and German
labels and Wikipedia articles, if any. The results are written to the CSV file
"extracted-data.csv". The extracted property can be changed by modifying the
value of extractPropertyId. The current code extracts only the first value of
this property if several are given. The filter condition (P31::Q5) can also be
changed in the code.
- Author:
Markus Kroetzsch
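The filter-and-extract logic described above can be illustrated without the toolkit itself. The following is a minimal sketch, not the class's actual implementation: `ExtractionSketch`, its nested `Item` class, and `extractLine` are hypothetical stand-ins for the WDTK data model, the sample data is invented, and Wikipedia article links are left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExtractionSketch {

    /** Hypothetical, simplified stand-in for a WDTK ItemDocument. */
    static final class Item {
        final String id;
        // property id -> values, in the order they were given
        final Map<String, List<String>> statements = new HashMap<>();
        // language code -> label
        final Map<String, String> labels = new HashMap<>();

        Item(String id) {
            this.id = id;
        }

        Item statement(String propertyId, String... values) {
            statements.put(propertyId, Arrays.asList(values));
            return this;
        }

        Item label(String language, String text) {
            labels.put(language, text);
            return this;
        }
    }

    static final String EXTRACT_PROPERTY_ID = "P227"; // GND identifier
    static final String FILTER_PROPERTY_ID = "P31";   // instance of
    static final String FILTER_VALUE = "Q5";          // human

    /** Quotes a field if it contains a comma, a quote, or a line break. */
    static String csvEscape(String field) {
        if (field == null) {
            return "";
        }
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Returns one CSV line for a matching item, or null if it is filtered out. */
    static String extractLine(Item item) {
        List<String> types = item.statements.getOrDefault(
                FILTER_PROPERTY_ID, Collections.<String>emptyList());
        List<String> values = item.statements.getOrDefault(
                EXTRACT_PROPERTY_ID, Collections.<String>emptyList());
        if (!types.contains(FILTER_VALUE) || values.isEmpty()) {
            return null;
        }
        // Only the first value of the extracted property is used.
        return String.join(",",
                csvEscape(item.id),
                csvEscape(values.get(0)),
                csvEscape(item.labels.get("en")),
                csvEscape(item.labels.get("de")));
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        items.add(new Item("Q1").statement("P31", "Q5")
                .statement("P227", "111", "222").label("en", "Doe, Jane"));
        items.add(new Item("Q2").statement("P31", "Q515")
                .statement("P227", "333"));               // not a human
        items.add(new Item("Q3").statement("P31", "Q5")); // no GND identifier
        for (Item item : items) {
            String line = extractLine(item);
            if (line != null) {
                System.out.println(line);
            }
        }
    }
}
```

In this sketch only the first sample item passes both conditions, and its second GND value is dropped, mirroring the first-value-only behaviour noted in the description.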
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type    Method                                          Description
void                 close()
static void          main(String[] args)                             Main method.
static void          printDocumentation()                            Prints some basic documentation about this program.
void                 printStatus()                                   Prints the current status, time and entity count.
void                 processItemDocument(ItemDocument itemDocument)  Processes the given ItemDocument.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.wikidata.wdtk.datamodel.interfaces.EntityDocumentProcessor
processEntityRedirectDocument, processLexemeDocument, processMediaInfoDocument, processPropertyDocument
-
Constructor Details
-
DataExtractionProcessor
- Throws:
IOException
-
-
Method Details
-
main
Main method. Processes the whole dump using this processor. To change which dump file to use and whether to run in offline mode, modify the settings in ExampleHelpers.
- Parameters:
args
- Throws:
IOException
-
processItemDocument
Description copied from interface: EntityDocumentProcessor
Processes the given ItemDocument.
- Specified by:
processItemDocument in interface EntityDocumentProcessor
- Parameters:
itemDocument - the ItemDocument
-
printStatus
public void printStatus()
Prints the current status, time and entity count.
-
printDocumentation
public static void printDocumentation()
Prints some basic documentation about this program.
-
close
public void close()
-