Webstruct

Webstruct is a library for building statistical named entity recognition (NER) systems that work on HTML data: tools that extract named entities (addresses, organization names, opening hours, etc.) from web pages.
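
To make the task concrete, the sketch below shows the general shape of NER on HTML: a page is reduced to text tokens, each token receives a label, and labelled tokens are merged into entities. It uses only the Python standard library and does not call Webstruct's API. The `TextTokenizer` class, the `group_entities` helper, the sample page, and the `ORG` and `STREET` labels are all hypothetical, and the hand-written tag list stands in for what a trained sequence labelling model would predict.

```python
from html.parser import HTMLParser


class TextTokenizer(HTMLParser):
    """Collect whitespace-separated tokens from the text nodes of an HTML page."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_data(self, data):
        self.tokens.extend(data.split())


def group_entities(tokens, tags):
    """Merge IOB-tagged tokens into (entity_type, text) pairs."""
    entities = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # Close any open entity, then begin a new one.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O": the token is outside any entity.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Hypothetical page fragment.
html = "<p>Visit <b>Acme Corp</b> at <span>12 Main St</span> today</p>"

parser = TextTokenizer()
parser.feed(html)
parser.close()
tokens = parser.tokens

# Hand-written labels, one per token; a statistical model would predict these.
tags = ["O", "B-ORG", "I-ORG", "O", "B-STREET", "I-STREET", "I-STREET", "O"]

entities = group_entities(tokens, tags)
print(entities)
```

In a real system the labels come from a model trained on annotated pages, and the features fed to that model can use the HTML structure around each token rather than the text alone.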

Contents:

  • Webstruct
    • Overview
    • Installation
  • Tutorial
    • Get annotated data
    • From HTML to Tokens
    • Feature Extraction
    • Using a Sequence Labelling Toolkit
    • Named Entity Recognition
    • Entity Grouping
    • Model Development
  • Reference
    • HTML Loaders
    • Feature Extraction
    • Model Creation Helpers
    • Metrics
    • Entity Grouping
    • Wapiti Helpers
    • CRFsuite Helpers
    • WebAnnotator Utilities
    • BaseSequenceClassifier
    • Miscellaneous
  • Changes
    • 0.6 (2017-12-29)
    • 0.5 (2017-05-10)
    • 0.4.1 (2016-11-28)
    • 0.4 (2016-11-26)
    • 0.3 (2016-09-19)

Indices and tables

  • Index
  • Module Index
  • Search Page

© Copyright 2014-2017, Scrapinghub Inc. Revision d5a7fcff.
