Custom site

June 21, 2015, 8:48 am

Hello,

I have a site that publishes RSS feeds by category, for example:

http://site.com/xbox/rss.xml
http://site.com/ps4/rss.xml

I have created the file site.com.txt in the custom/ directory. Now I want different rules for each RSS category; for example, I do not want images from xbox, but I do want them for ps4. I tried creating files in custom/ such as site.com.xbox.txt and site.com.ps4.txt and then adding &siteconfig=site.com.xbox.txt to the URL, but no luck. Is it possible to specify which .txt file to use for site patterns?

Regards

↧

Replace the title

June 25, 2015, 6:43 am

Hello, I have an RSS feed with truncated titles. I can get the full title from the h1 in the content, but editing the site config with a line like this:

title: //h1

does not replace the final title. What am I missing?

↧

Images are broken in feed

July 7, 2015, 1:54 am

Hi! I can't get blogs from United Bloggers to work correctly (self-hosted). The images are broken, and it looks like an encoding issue. Two test feeds:

http://www.julianyland.com/feed/
http://www.anettemarie.no/feed/

Is there anything I can do? Any help is greatly appreciated :-)

↧

Combining two XPATHs in body: (site configuration for theregister.co.uk)

August 18, 2015, 4:42 pm

Hi, I've been trying to edit a custom site config for theregister.co.uk and the body: rule is not working. Here's my config:

    single_page_link: //link[contains(@href, 'm.theregister')]
    strip: //div[@class='wptl btm']
    body: //div[contains(@class,'article_head')]//h2 | //div[@id='body']

The strip rule is one I'd recommend adding to the custom config in the next release of FTR. I haven't yet had occasion to test the single_page_link rule (but the Reg doesn't seem to use /PRINT/ anymore, so the default site config for it will need updating anyway).

The body rule is what is not working: it gets the body of the articles (//div[@id='body']) fine, but not the subtitle (//div[contains(@class,'article_head')]//h2). I've also tried the more complex XPath generated by http://siteconfig.fivefilters.org:

    //div[contains(concat(' ',normalize-space(@class),' '),' article_head ')]//h2 | //div[@id='body']

Or the even simpler XPath:

    //div[@class="article_head multi_page"]//h2 | //div[@id='body']

It doesn't make any difference; the subtitle is never included. Am I making a stupid mistake, and if so, what is it?

↧

unable to retrieve full-text content - problem parsing the rss feed

July 30, 2015, 6:05 am

FTR seems to have trouble parsing this RSS feed: although the feed has five items, FTR appears to find only one (judging by the debug info of feeds that work fine with FTR), which it then can't analyse or load. This looks like it could be a bug. Here's the end of the debug information:

* Fetching feed items
* Starting parallel fetch (curl_multi_*)
* Processing set of 1
* ...http://www.infiniteideasmachine.com/feed/
* ......in memory
* --------
* Processing feed item 1
* Item URL: http://www.infiniteideasmachine.com/feed/
* ** Loading class FeedItem (feedwriter/FeedItem.php)
* URL already fetched - in memory (http://www.infiniteideasmachine.com/feed/, effective: http://www.infiniteideasmachine.com/feed/)
* Done!

↧

Problem with escaping characters

July 9, 2015, 1:59 am

If I use the form and enter the following in the feed URL field, it generates the feed:

http://content.met.police.uk/cs/Satellite?c=MPSSafer_C&cid=1257246961010&feed=news&p=1257246745756&pagename=MPS_CMS_Internet/MPSRSSLayout

However, if I try to use the generated feed, FTR is unable to retrieve the full content. This is how it encodes the URL in the generated feed:

content.met.police.uk/cs/Satellite?c=MPSSafer_C&cid=1257246961010&feed=news&p=1257246745756&pagename=MPS_CMS_Internet/MPSRSSLayout

If I try with the URL encoded using the Meyerweb tools:

content.met.police.uk%2Fcs%2FSatellite%3Fc%3DMPSSafer_C%26cid%3D1257246961010%26feed%3Dnews%26p%3D1257246745756%26pagename%3DMPS_CMS_Internet%2FMPSRSSLayout

it still doesn't work. BTW, if I try the encoded URL in the form, it produces an "Invalid URL supplied" message (and debug doesn't help). As it works fine when entered unencoded in the form, I strongly suspect that something is not handling the decoding correctly somewhere.
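For readers comparing encodings, here is a minimal Python sketch of how a feed URL that itself contains a query string can be percent-encoded as a single url parameter. The endpoint is the public one that appears elsewhere in this archive; the helper name build_ftr_url and the max value are illustrative assumptions, and this only shows the encoding step, not what FTR does internally.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint taken from another post in this archive; swap in your own install.
FTR_ENDPOINT = "http://ftr.fivefilters.org/makefulltextfeed.php"


def build_ftr_url(feed_url, max_items=3):
    """Hypothetical helper: wrap feed_url as one percent-encoded parameter."""
    # urlencode escapes &, =, ?, / and : inside the value, so the feed's own
    # query string cannot be mistaken for extra request parameters.
    return FTR_ENDPOINT + "?" + urlencode({"url": feed_url, "max": max_items})


feed = (
    "http://content.met.police.uk/cs/Satellite?c=MPSSafer_C&cid=1257246961010"
    "&feed=news&p=1257246745756&pagename=MPS_CMS_Internet/MPSRSSLayout"
)
request_url = build_ftr_url(feed)

# Decoding the request again should give back exactly the original feed URL.
recovered = parse_qs(urlsplit(request_url).query)["url"][0]
print(request_url)
print(recovered == feed)
```

If the inner ampersands are left unescaped, a server will normally read cid, feed, p and pagename as separate parameters of the outer request, which is one plausible way for a feed URL like this to end up truncated.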

↧

Filtering options

July 8, 2015, 8:29 am

Hi Keyvan, would you consider adding some filtering options in the near future, for example to exclude feed items that contain certain keywords? If that's not possible, could you tell us whether you know of any service that can do this? Regards
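To make the request concrete, the kind of keyword filtering being asked about can be sketched outside Full-Text RSS with Python's standard library. This is only an illustration run on an inline sample feed; the function name, the sample items and the choice to match on title and description are all assumptions, not an existing FTR feature.

```python
import xml.etree.ElementTree as ET

# Inline sample feed (made up for this sketch).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Demo</title>
<item><title>Xbox sale</title><description>Discounts this week</description></item>
<item><title>PS4 update</title><description>New firmware</description></item>
<item><title>Weekly roundup</title><description>Xbox and more</description></item>
</channel></rss>"""


def drop_items_with_keywords(rss_text, keywords):
    """Return the RSS document without items that mention any keyword."""
    root = ET.fromstring(rss_text)
    channel = root.find("channel")
    lowered = [k.lower() for k in keywords]
    # Copy the list first so items can be removed while looping.
    for item in list(channel.findall("item")):
        text = " ".join(
            item.findtext(tag) or "" for tag in ("title", "description")
        ).lower()
        if any(k in text for k in lowered):
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")


filtered = drop_items_with_keywords(SAMPLE_RSS, ["xbox"])
remaining_titles = [i.findtext("title") for i in ET.fromstring(filtered).iter("item")]
print(remaining_titles)
```

The match is a case-insensitive substring test, so a keyword such as "box" would also drop items mentioning "Xbox"; word-boundary matching would need a regular expression instead.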

↧

Feed of 20 items is recognised as having only one item; a feed parsing bug?

July 9, 2015, 12:39 pm

The feed http://ftr.fivefilters.org/makefulltextfeed.php?url=http%3A%2F%2Fhackney.greenparty.org.uk%2Fnews.rss&max=3&debug is parsed as just one item, although the source feed contains 20 items.
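When reporting a mismatch like this, it can help to confirm the item count of the source feed independently. The following is a rough Python sketch using only the standard library; count_feed_items is a hypothetical helper, the sample feed is generated inline rather than fetched, and it assumes the feed is well-formed RSS 2.0 or Atom.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def count_feed_items(feed_text):
    """Count RSS <item> elements or Atom <entry> elements in a feed."""
    root = ET.fromstring(feed_text)
    rss_items = root.findall("./channel/item")    # RSS 2.0 layout
    atom_entries = root.findall(ATOM_NS + "entry")  # Atom layout
    return len(rss_items) + len(atom_entries)


# Stand-in for the downloaded feed body: an RSS document with 20 items.
sample = (
    "<rss version='2.0'><channel>"
    + "".join(f"<item><title>Post {n}</title></item>" for n in range(20))
    + "</channel></rss>"
)
item_count = count_feed_items(sample)
print(item_count)
```

If a check like this raises a parse error on the real feed, that in itself is useful to include in the report, since a feed that is not well-formed XML may be handled differently by a feed parser.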

↧

Something weird RSS and related question

August 11, 2015, 4:47 am

http://help.fivefilters.org/customer/portal/questions/263879-differences-between-real-article-s-address-and-rss-s-address

First of all, it's too late to buy your best product; sorry about that. I have a question about reading the full text from a partial RSS feed.

One of the popular blog sites in Korea is called "NAVER Blog". I previously requested common support for NAVER Blog, and after that I could get a full-text RSS feed from any NAVER Blog site. But I then ran into another problem with NAVER Blog. It supports "blog.naver.com/OOO", and recently they added the domain "OOO.blog.me". When I tried to get the RSS from "OOO.blog.me" with the same extraction rule, it failed, because "OOO.blog.me" has a very strange structure for the address of each RSS item.

For example, the site I want the full text from is "http://santa_croce.blog.me/". It provides the feed "http://santa_croce.blog.me/rss", and the first link in the RSS points to "http://santa_croce.blog.me/220418678330". And then, in the HTML source of the site, "" So, to in the "src", ""

It is almost the same as the custom extraction rule for "blog.naver.com", but compared to "blog.naver.com" it has one more frame and one more step to get the text. I know that I can't get a custom rule for personal use, but I think this needs a new way of getting the text from RSS, and it could help to get more feeds at better quality.

Thank you for your continuous updates and for making Full-Text RSS.

↧

CloudFlare & email obfuscation

July 26, 2015, 11:49 pm

I have a feed that uses CloudFlare for email obfuscation. When the feed article is scraped, these emails are converted to "email protected" and can't be retrieved. Wondering if anyone else has come across this or similar js obfuscation and if there is a workaround. Thanks!
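As a possible starting point for a workaround, here is a small Python sketch of the decoding step. It rests on an assumption that is not stated in the post: that the protected addresses appear in the page source as a hex string in a data-cfemail attribute, where the first byte is an XOR key for the remaining bytes. Both function names are hypothetical, and the encoder exists only so the sketch can be checked against itself.

```python
def decode_cfemail(encoded_hex):
    """Decode a hex string assumed to follow the data-cfemail scheme."""
    data = bytes.fromhex(encoded_hex)
    key, payload = data[0], data[1:]
    # Every byte after the first is XORed with the first byte (the key).
    return bytes(b ^ key for b in payload).decode("utf-8")


def encode_cfemail(address, key=0x5A):
    """Inverse of decode_cfemail, used here only to build test input."""
    raw = address.encode("utf-8")
    return bytes([key] + [b ^ key for b in raw]).hex()


sample_hex = encode_cfemail("someone@example.com")
decoded = decode_cfemail(sample_hex)
print(sample_hex)
print(decoded)
```

Whether the attribute survives extraction depends on the site and on which elements the site config strips, so the hex string may have to be read from the original page rather than from the generated feed.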

↧

problems with retrieving date

July 27, 2015, 2:12 am

Hello, we are having difficulty getting the date from a tag. The example is 21.07.15 15:19, and we want to get the date with the following pattern:

date: //div[@class='pull-right']//small[@class='muted']

We still don't get it. Can you please help us? Thank you
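One thing worth ruling out is the date format itself: a two-digit, dot-separated date is ambiguous to a generic date parser even when the XPath matches. The Python sketch below shows the example string parsed with an explicit format. It assumes the order is day.month.year (which fits the July 2015 posting date, but is a guess), and parse_site_date is a hypothetical helper, not part of Full-Text RSS.

```python
from datetime import datetime

# Assumed layout: DD.MM.YY HH:MM (two-digit year, 24-hour clock).
SITE_DATE_FORMAT = "%d.%m.%y %H:%M"


def parse_site_date(text):
    """Parse the site's date text, ignoring surrounding whitespace."""
    return datetime.strptime(text.strip(), SITE_DATE_FORMAT)


parsed = parse_site_date(" 21.07.15 15:19 ")
print(parsed.isoformat())
```

If the text selected by the date: rule contains anything besides the date (a label or an author name, for instance), strptime raises ValueError, which makes this a quick way to check what the selector really returns.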

↧

Wordpress

July 29, 2015, 4:12 pm

Is there a plugin for the Full-Text RSS API, and are we able to import these feeds as WordPress posts with a featured image? Also, can Full-Text RSS take the partial feeds but import the full articles too? Thanks!

↧

URLS IN FEED

August 8, 2015, 1:47 am

I thought we could include one or more websites per feed... Is this not the case?

↧

Images are not showing in the Full Text RSS Feeds.

August 20, 2015, 3:11 am

Images are not showing in the Full Text RSS Feeds. They are showing in the original rss feeds but not the full text rss versions. Why?

↧

Full-text for Forbes?

September 1, 2015, 5:11 pm

Forbes recently redesigned their website and I can't figure out how to create a site config file that will work. Using Full-Text RSS 3.4, but I've also tried with 3.5 here on FiveFilters -- no luck. I think they may be serving all their content via Javascript (which could be the problem), but would appreciate a second opinion. I might just be doing this wrong! Testing things with the following article: http://www.forbes.com/sites/abigailtracy/2015/08/21/its-a-verizon-world-we-just-live-in-it/ Thank you in advance for any help.

↧

How to use item_desc?

August 24, 2015, 3:17 pm

Hello, I am trying to generate a feed from http://www.tori.fi/uusimaa?f=&q=&cg=5030&w=1&st=s&st=g&c=0&ca=18&l=0&md=th

The problem is that I cannot get the price shown in the feed description using the item_desc parameter. The description field is just empty no matter which CSS div or class I use. Am I doing something wrong?

↧

img tags after parsing just return the alt value

August 27, 2015, 12:49 pm

Purchased this because of its ability to be easily modified... and why reinvent the wheel. That being said, it's late and I can't figure this out for the life of me...

I've got two custom rules set up, blog.cleveland.com.txt and cleveland.com.txt. Pretty sure I need both, as the links go back and forth (when running debug it looks for both, although really it's the cleveland.com one doing the work). Here's my custom config for cleveland.com:

    body: //div[@id='content']
    tidy: no
    #parser: html5lib
    strip_id_or_class: entry_widget_right
    strip_id_or_class: ArticleSidebar
    strip_id_or_class: best_of
    strip_id_or_class: newrelated
    strip: //div[@id='social_bottom']
    strip: //div[@class='social_simple']
    strip: //div[@class='CommentCount']
    find_string: onerror="resimg.imageError(this)"
    replace_string: ""
    #find_string: Johnny Manziel

And that's actually progress that I've just noticed since my last test (evidently I didn't bother to look at the markup): the span and everything in it is new; for hours it's just been Johnny Manziel.

Someone please help me with this...

↧

feed creator; more than 20?

August 27, 2015, 1:36 pm

Hi, I was curious whether there is a way to combine more than 20 feeds using the Feed Creator tool. Would it work for me to combine 10 feeds into 1 feed, repeat that 3 times so that I have 3 feeds, and then combine those 3 into 1? Thanks! Adam

↧

Image not shown

September 7, 2015, 2:48 am

Hi, I just tested this RSS feed: http://detik.feedsportal.com/c/33613/f/656124/index.rss Why is the image not shown? Thanks

↧

Extract Date from Articles?

September 7, 2015, 2:35 am

Hi guys! First of all, I really love your products, and have bought the hosted versions of both Full-Text RSS and Feed Creator ;) I've been testing both with great results. I saw the following in the latest release:

- New parameter: item_date - CSS selector to pick out item dates (extract.php endpoint)

How can we use this? In particular, I'm using Full-Text RSS on the Engadget.com website, just to grab some specific articles. The description extraction goes OK, but I can't get the articles' dates. How can dates be extracted from articles? Can you provide an example? Thanks a lot!!

↧