How To Extract Data From A Wikipedia Article?
I have a question regarding parsing data from Wikipedia for my Android app. I have a script that can download the XML by reading the source from http://en.wikipedia.org/w/api.php?a
Solution 1:
Unfortunately, the mediawiki.org documentation for parse doesn't seem to tell you how to do this, but the documentation in the API itself does: you can use the section parameter to parse a single section, and you can use prop=sections to get the list of sections.
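So, you could first make an action=parse request with prop=sections to get the list of sections, and then make a second action=parse request with the section parameter to get the HTML for a certain section.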
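The two requests described above can be sketched roughly as follows. This only builds the URLs and does not fetch anything. The endpoint is the one from the question; the page title, the function names, and the choice of format=xml and prop=text are assumptions made for illustration, so check them against the API's own help output before relying on them.

```python
from urllib.parse import urlencode

# Endpoint taken from the question.
API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def sections_url(page):
    """Build a URL that asks action=parse for the list of sections of a page."""
    params = {
        "action": "parse",
        "page": page,
        "prop": "sections",  # return only the section list
        "format": "xml",     # assumption: the question mentions downloading XML
    }
    return API_ENDPOINT + "?" + urlencode(params)


def section_html_url(page, section):
    """Build a URL that asks action=parse for the HTML of one section."""
    params = {
        "action": "parse",
        "page": page,
        "section": section,  # section index, as reported by prop=sections
        "prop": "text",      # assumption: return only the rendered HTML
        "format": "xml",
    }
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # "Foo" is a placeholder page title, not one from the original post.
    print(sections_url("Foo"))
    print(section_html_url("Foo", 1))
```

The first URL returns the section indexes; one of those indexes is then passed as the section argument of the second.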
Solution 2:
action=parse doesn't work well with per-section parsing. Consider this short example:
Foo is a bar<ref>really!</ref>
==References==
<references/>
Parsing just the zeroth section will result in a red error message about <ref> without <references/>, while parsing the first one will result in an empty references list.
However, there's a better solution: action=mobileview is not only free from this problem, but it's also specifically intended for mobile apps and gives you mobile-optimized HTML.
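A request to action=mobileview might be built along these lines. This is a sketch under assumptions: the parameter names page, sections, and prop are how I recall the module's interface, and the page title, function name, and format=xml are placeholders, so verify them against the API's own help output. As above, the code only builds the URL.

```python
from urllib.parse import urlencode

# Endpoint taken from the question.
API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def mobileview_url(page, sections="0"):
    """Build a URL asking action=mobileview for the HTML of the given sections.

    `sections` is assumed to be a section index, or several joined with "|".
    """
    params = {
        "action": "mobileview",
        "page": page,
        "sections": sections,  # assumption: parameter name as recalled
        "prop": "text",        # assumption: return the section HTML
        "format": "xml",
    }
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # "Foo" is a placeholder page title, not one from the original post.
    print(mobileview_url("Foo"))
```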