Parsing Nginx Auto Index

Topics: Developer Forum, User Forum
Mar 8, 2014 at 5:45 AM
I'm trying to parse an nginx auto index page to get the links from a download directory and their timestamps.

I have successfully retrieved the links and their "names", so to speak, but I am struggling with the timestamp.

I have the following code:
return doc.DocumentNode.SelectNodes("//a").Select(anchor => new IndexPageLink
                {
                    Link = new Uri(root, anchor.InnerText),
                    Name = anchor.InnerText
                });
This parses the following HTML structure:
<pre><a href="../">../</a>
<a href="file.txt">file.txt</a>      24-Jan-2014 01:50    5M
I've tried looking at the next element after each anchor, which correctly shows up as a text element, but it contains only newline characters. I can definitely see the text when I inspect the document from the pre node, but it would be nice to process it relative to the anchors that I find with the SelectNodes search.
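For illustration, here is a minimal sketch of how the text that trails an anchor could be turned into a timestamp and a size once it has been read. The class AutoIndexText and its TryParse method are hypothetical names, not part of HtmlAgilityPack, and the date format is assumed from the single sample line above (dd-MMM-yyyy HH:mm).

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class AutoIndexText
{
    // Matches text such as "      24-Jan-2014 01:50    5M".
    // Assumption: a date, a time, then one whitespace-free size token.
    private static readonly Regex Trailer = new Regex(
        @"(?<date>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2})\s+(?<size>\S+)");

    // 'text' is whatever follows the anchor on its line.
    // Returns false when no timestamp is present (for example, a bare newline).
    public static bool TryParse(string text, out DateTime modified, out string size)
    {
        modified = default(DateTime);
        size = null;
        if (string.IsNullOrEmpty(text)) return false;

        Match m = Trailer.Match(text);
        if (!m.Success) return false;

        // Invariant culture so "Jan" parses regardless of the machine's locale.
        if (!DateTime.TryParseExact(m.Groups["date"].Value, "dd-MMM-yyyy HH:mm",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out modified))
            return false;

        size = m.Groups["size"].Value;
        return true;
    }
}
```

Inside the Select above, this could be called with the text of the node that follows the anchor, for example anchor.NextSibling?.InnerText. In the sample HTML the "../" anchor is followed only by a newline, so TryParse would return false for that entry and it can be skipped.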

Any ideas?