I recently came across this post and wanted to share it with you to show you what "the other side" of web scraping looks like.
We are in the process of migrating a comparatively small web application (~2k pages), embedded in a large enterprise-scale portal, to a new platform. The rewrite is an enabling step for future expansion, although some minor new features will already be added along the way.
Other than that, the site should look essentially identical to end users after the migration. Internally, though, the DOM as well as the CSS and JS composition are changing significantly, thanks to our excellent front-end designer/developer, who is driving accessibility 'on the side' (i.e. unobtrusive JavaScript only, etc.).
I'm aware of, and make use of, unit and functional testing of web sites via various tools like Selenium, Canoo WebTest, HTTrack etc.; nonetheless, I lack a procedure for achieving this task without a lot of manual test/XPath coding. Here is what's desired:
Here's how I'd approach this, approximately:
This would yield two DOM fragments that should be semantically identical through the users' eyes but will be quite different internally; consequently, I'll need to:
Is there an established procedure or tooling available for this kind of task, or are my automation desires too sophisticated here?
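One way to make "semantically identical through the users' eyes" testable without per-page XPath is to reduce each fragment to the sequence of words a user would see and diff those sequences. The following is a minimal sketch, assuming that visible text plus image `alt` text is a good enough proxy for what users perceive. It uses only Python's standard library (`html.parser`, `difflib`). The names `VisibleTextExtractor`, `visible_words` and `semantic_diff`, as well as the chosen set of block-level tags, are hypothetical illustrations rather than part of any existing tool. It also ignores CSS, so content hidden or reordered by stylesheets is not accounted for.

```python
from difflib import unified_diff
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects user-visible words from an HTML fragment (hypothetical helper)."""

    # Elements whose content is never rendered as text to the user.
    SKIP_TAGS = {"script", "style", "noscript", "template"}
    # Block-level elements imply a word break even when the markup has
    # no whitespace between them, e.g. <td>a</td><td>b</td>.
    # This list is an assumption and not exhaustive.
    BLOCK_TAGS = {"address", "article", "aside", "blockquote", "br", "dd", "div",
                  "dl", "dt", "footer", "form", "h1", "h2", "h3", "h4", "h5",
                  "h6", "header", "hr", "li", "main", "nav", "ol", "p", "pre",
                  "section", "table", "td", "th", "tr", "ul"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif self._skip_depth == 0:
            if tag in self.BLOCK_TAGS:
                self._parts.append(" ")
            elif tag == "img":
                # Alt text is what assistive technology presents, so treat
                # it as visible content.
                alt = dict(attrs).get("alt")
                if alt:
                    self._parts.append(f" {alt} ")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS and self._skip_depth == 0:
            self._parts.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def words(self):
        # Splitting on any whitespace makes indentation and line breaks
        # in the markup irrelevant to the comparison.
        return "".join(self._parts).split()


def visible_words(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.words()


def semantic_diff(old_html, new_html):
    """Return unified-diff lines between the visible words of two fragments.

    An empty list means both fragments show the same words in the same
    order, however differently they are marked up.
    """
    return list(unified_diff(visible_words(old_html), visible_words(new_html),
                             fromfile="old", tofile="new", lineterm="", n=2))


old = '<table><tr><td>Price</td><td>42 EUR</td></tr></table><script>var x=1;</script>'
new = '<dl class="price"><dt>Price</dt><dd>42 EUR</dd></dl>'
print(semantic_diff(old, new))  # [] : same visible words, different DOM
```

In a migration setting, the old and new fragments could come from whatever already fetches pages (e.g. Selenium's page source), with only the non-empty diffs reported for manual review. Diffing word sequences rather than DOM trees is a deliberate trade-off: it tolerates arbitrary restructuring but will not catch changes that only affect layout or styling.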
Source: stackoverflow.com