MARLIES is an information extraction system that extracts contact data and detailed data on services offered by companies in the manufacturing industry from their Web sites. The system is one component of the intermediary system tech2select, which is operated by the Austrian company Tech2select GmbH. Its development by FAW is partly funded by the Austrian Research Promotion Agency (FFG) under grant FFG 817789.
Services as treated by MARLIES are machines or manufacturing processes promoted by a supplier. To serve as the basis for a business assignment, the services must be specified in sufficient detail, including information on the processable material and dimensions, which are further described by measurements, units and values. Given these requirements, an ontology- and rule-based approach was implemented. MARLIES faces two major challenges. The first is the proper ontological modelling of highly structured and complexly related technical data, which forms the basis for the implemented approach of ontology-aware annotation. The second is the extraction of relations between the data units despite structural obstacles: related data on a service may be spread over several Web pages and is often concealed in nested tables.
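
To make the shape of the target data more concrete, the following minimal sketch models a service with its processable materials and its dimensions, each dimension being a measurement with a value and a unit. It also shows one naive way in which fragments of the same service, found on different Web pages, could be combined. All class, field and function names, as well as the sample values, are assumptions chosen for illustration; they are not taken from the MARLIES ontology or implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Measurement:
    """One measured property, e.g. a maximum sheet thickness (hypothetical)."""
    name: str
    value: float
    unit: str


@dataclass
class Service:
    """A machine or manufacturing process promoted by a supplier (hypothetical model)."""
    kind: str   # "machine" or "process"
    label: str
    materials: List[str] = field(default_factory=list)
    dimensions: List[Measurement] = field(default_factory=list)

    def is_fully_specified(self) -> bool:
        # Illustrative criterion only: at least one material and one dimension.
        return bool(self.materials) and bool(self.dimensions)


def merge_fragments(fragments: List[Service]) -> Service:
    """Combine partial descriptions of one service collected from several pages.

    Assumes all fragments already refer to the same service; deciding that
    is the hard part and is not addressed here.
    """
    first = fragments[0]
    merged = Service(kind=first.kind, label=first.label)
    for frag in fragments:
        for material in frag.materials:
            if material not in merged.materials:
                merged.materials.append(material)
        for dim in frag.dimensions:
            if dim not in merged.dimensions:
                merged.dimensions.append(dim)
    return merged


# Illustrative data: materials found on one page, dimensions on another.
page_a = Service("process", "laser cutting", materials=["steel"])
page_b = Service(
    "process",
    "laser cutting",
    dimensions=[Measurement("sheet thickness", 20.0, "mm")],
)
combined = merge_fragments([page_a, page_b])
```

Neither fragment alone would satisfy the illustrative completeness check, while the merged record does, which is why relating data units across pages and nested tables matters for the extraction task.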