Webstruct
0.4

Webstruct

Webstruct is a library for creating statistical named entity recognition (NER) systems that work on HTML data. In other words, it is a library for building tools that extract named entities (addresses, organization names, opening hours, etc.) from web pages.
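To make the task concrete, here is a minimal sketch of the general idea using only the Python standard library. It does not use Webstruct's API. The helper names (`TextTokenizer`, `html_to_tokens`, `group_iob`), the sample HTML, the entity labels and the IOB-style tags are all assumptions made for illustration, and the tags are written by hand where a trained model would normally supply them.

```python
from html.parser import HTMLParser


class TextTokenizer(HTMLParser):
    """Collect whitespace-separated tokens from the text nodes of an HTML page."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_data(self, data):
        # Called once per text node; tags themselves are skipped.
        self.tokens.extend(data.split())


def html_to_tokens(html):
    """Return the list of text tokens found in an HTML string."""
    parser = TextTokenizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def group_iob(tokens, tags):
    """Merge IOB-tagged tokens into (entity_type, text) pairs."""
    entities = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # Close any open entity, then start a new one.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" tag: the token is outside any entity.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


html = "<p>Visit <b>Acme Corp</b> at 12 Main St</p>"
tokens = html_to_tokens(html)
# Hand-written tags standing in for a statistical model's predictions.
tags = ["O", "B-ORG", "I-ORG", "O", "B-STREET", "I-STREET", "I-STREET"]
entities = group_iob(tokens, tags)
print(entities)
```

In a statistical NER system the tags would come from a sequence labelling model trained on annotated pages. The tokenizing and grouping steps here only illustrate the shape of the input and output.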

Contents:

  • Webstruct
    • Overview
    • Installation
  • Tutorial
    • Get annotated data
    • From HTML to Tokens
    • Feature Extraction
    • Using a Sequence Labelling Toolkit
    • Named Entity Recognition
    • Entity Grouping
    • Model Development
  • Reference
    • HTML Loaders
    • Feature Extraction
    • Model Creation Helpers
    • Metrics
    • Entity Grouping
    • Wapiti Helpers
    • CRFsuite Helpers
    • WebAnnotator Utilities
    • BaseSequenceClassifier
    • Miscellaneous
  • Changes
    • 0.4 (2016-11-26)
    • 0.3 (2016-09-19)


© Copyright 2014, Scrapinghub Inc. Revision 9b7986be.
