Sitemap Module not reflecting robots.txt permissions in sitemap.xml

n8tgc

Joined: 2012-02-28
Posts: 8
Posted: Tue, 2014-01-21 23:37

I have what I believe to be a properly formatted robots.txt file. Yet when I use the Sitemap module to generate a sitemap.xml, it lists folders/albums that are explicitly marked "Disallow" in robots.txt.

This is resulting in many crawl errors with Google.

Thinking that I might have a syntax error in my robots.txt file, I tried an independent, third-party sitemap generator (freesitemapgenerator.com), and the sitemap it generated was correct (it left out the folders disallowed in robots.txt).
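For anyone who wants to rule out a robots.txt syntax problem without a third-party site, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content and the album paths below are placeholders (assumptions, not my real file), so substitute your own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the folder names are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private_album/
Disallow: /var/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/private_album/photo1",
        "http://www.example.com/public_album/photo1",
    ):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, url) else "disallowed"
        print(url, verdict)
```

If a path you meant to block comes back as allowed, the rule is probably malformed or attached to the wrong User-agent group.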

Is there something I may be missing in the installation and use of this module that is preventing it from performing as expected?

 
floridave

Joined: 2003-12-22
Posts: 27300
Posted: Wed, 2014-01-22 01:49

I doubt that the Sitemap module takes the robots.txt file into account.
I don't know if it takes permissions into account either.

Dave
____________________________________________
Blog & G2 || floridave - Gallery Team

 
floridave

Joined: 2003-12-22
Posts: 27300
Posted: Wed, 2014-01-22 01:51
Quote:
I tried an independent, third-party sitemap generator (freesitemapgenerator.com), and the sitemap that was generated was correct

Use that then.

Dave

 
n8tgc

Joined: 2012-02-28
Posts: 8
Posted: Wed, 2014-01-22 15:28

Thanks, Dave.

I figured that the third-party solution was going to be the way to go. Sad though. Having an integrated Module to perform this (while acknowledging robots.txt) would have been nice.

Also, this morning, I found that my sitemap submission to Bing was VASTLY improved. It went from zero URLs successfully submitted (using the Sitemap module) to nearly 400. I'll have to wait a few days for all of the crawl results to appear in Bing's management console, though. Regardless, it was a dramatic improvement for both Bing and Google.

Perhaps, in my free time some day in the future, I'll see what I can do to help come up with a module that accomplishes what I am looking for. I can't be the only person that would benefit, right?
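As a rough idea of what such a step could look like, here is a minimal Python sketch that post-processes an already generated sitemap.xml and drops every URL that robots.txt disallows. This is not how the Sitemap module works; the function name filter_sitemap and the sample data are hypothetical, and it only assumes the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def filter_sitemap(sitemap_xml: str, robots_txt: str, agent: str = "*") -> str:
    """Return sitemap XML with every <url> that robots.txt disallows removed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Keep the default namespace unprefixed when serializing again.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)

    # Iterate over a copy of the list because we remove elements as we go.
    for url_el in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url_el.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
        if not parser.can_fetch(agent, loc):
            root.remove(url_el)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder data; replace with the real files from your site.
    robots = "User-agent: *\nDisallow: /private_album/\n"
    sitemap = (
        f'<urlset xmlns="{SITEMAP_NS}">'
        "<url><loc>http://www.example.com/public_album/</loc></url>"
        "<url><loc>http://www.example.com/private_album/</loc></url>"
        "</urlset>"
    )
    print(filter_sitemap(sitemap, robots))
```

Running something like this after the module writes sitemap.xml would give a result similar to the third-party generator, at the cost of an extra step.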

~Christian

 
floridave

Joined: 2003-12-22
Posts: 27300
Posted: Wed, 2014-01-22 15:58

Any help would be beneficial to the community, even if you just add some notes to the module's docs.

Dave

 
slart

Joined: 2013-11-11
Posts: 112
Posted: Mon, 2014-01-27 18:21

Hi n8tgc,
I have fixed the problem with the sitemap.xml.
You can read the tutorial here:
http://galleryproject.org/node/112448

As for robots.txt, I noticed in the admin area of the module that a "." (dot) is missing. Maybe you can find where and add it in the right place. Otherwise, involving robots.txt with the sitemap is rather unimportant; you can simply disable that. Entering the sitemap in robots.txt is not necessarily recommended.

The Sitemap module works fine on my Gallery 3.

 
slart

Joined: 2013-11-11
Posts: 112
Posted: Mon, 2014-01-27 18:42

Oh sorry, my English is not so good. I wrote my own robots.txt and Google accepts it, and the output of the Sitemap module as well. With the fixed Sitemap stylesheet it also looks good in the web browser: www.example.com/sitemap.xml
What Google does, however, is include the words from the sitemap in its analysis. You can see this in Google Webmaster Tools or Analytics.

robots.txt info: http://www.robotstxt.org/robotstxt.html