CoderZone.org
Topic: Auto-link text, but not if already inside a link...?
Cruise Elroy
« on: Feb 11, 2011, 05:24:45 pm »

I have a list of words that I'd like to make into links if they appear in a block of text, but only if they aren't already inside of a link. For example, using this text:

The dog is a domesticated form of the gray wolf, a member of
the Canidae family of the order Carnivora. The term "dog" is used for
both feral and pet varieties. The domestic dog has been the most
widely kept working, hunting and companion animal in human history.

Let's say I'd like to link the words "wolf", "dog", and "hunting", but only if they aren't already linked in the text. The first instance of dog would be linked, but the second instance would be left as is since it's already within a link.

Is there a bit of regex and/or code that could do this?
Cruise Elroy
« Reply #1 on: Feb 11, 2011, 08:34:08 pm »

Regex was not the way to go... After some research, it seems that it can be done (sort of), but it gets extremely messy and fragile if you have nested tags and whatnot. I found this solution on phpbuilder.com:

http://www.phpbuilder.com/board/showpost.php?p=10411554&postcount=12

Essentially, it works by temporarily taking the text you want to protect out of the string and leaving placeholders behind. You can then work on the remaining text to your heart's content, and then use str_replace() to put the protected bits back in. It's not elegant, but it works like a charm. Here's the post from the link above, slightly condensed.

Here's the function:

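Code (PHP):
// Swap every match of $find in $text for a unique placeholder.
// The removed text is stored in $contents, the placeholders in $placeholders.
function protect_from_borking($find, &$contents, &$placeholders, &$text)
{
    static $n = 0; // counts calls, so each call generates different placeholders
    $count = 0;
    while (preg_match($find, $text, $matches)) {
        $contents[$count] = $matches[0];
        $placeholders[$count] = "@@" . $n . "_" . $count . "@@";
        // Limit of 1: replace only the match that was just recorded
        $text = preg_replace($find, $placeholders[$count], $text, 1);
        ++$count;
    }
    ++$n;
}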

To use it, supply four parameters:

  • $find contains a regexp pattern that matches the sort of text one wants to protect. If, for example, you don't want to touch anything between <b>...</b> tags, the pattern would be #<b>.*?</b>#. Note that the entire match is stored, so in this example, the bold tags themselves would be protected.
  • &$contents is a reference to an array that should be empty when the function is called. After the function is completed, it will contain all the protected text that has been taken out.
  • &$placeholders is also initially empty, and afterwards contains the placeholder strings that replaced the corresponding entries in $contents.
  • &$text is the text being worked on.

The function only takes text out. After altering the remaining text, put the protected bits back in with:

$text = str_replace($placeholders, $contents, $text);
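
Putting the pieces together for the original question, here is a minimal sketch, assuming the function is used exactly as posted. The sample text, the $terms list, the example.com URLs, and the autolink_unlinked() wrapper are made up for illustration and are not part of the phpbuilder post. It takes existing links out, links the first remaining occurrence of each term in a single pass, then puts the links back:

```php
<?php
// Function from the post above (reformatted, behavior unchanged).
function protect_from_borking($find, &$contents, &$placeholders, &$text)
{
    static $n = 0;
    $count = 0;
    while (preg_match($find, $text, $matches)) {
        $contents[$count] = $matches[0];
        $placeholders[$count] = "@@" . $n . "_" . $count . "@@";
        $text = preg_replace($find, $placeholders[$count], $text, 1);
        ++$count;
    }
    ++$n;
}

// Hypothetical wrapper: link the first unlinked occurrence of each term.
// $terms maps a word to its URL.
function autolink_unlinked($text, array $terms)
{
    if (!$terms) {
        return $text; // nothing to link
    }

    // 1. Swap existing <a>...</a> elements for placeholders.
    $contents = array();
    $placeholders = array();
    protect_from_borking('#<a\b[^>]*>.*?</a>#is', $contents, $placeholders, $text);

    // 2. Link terms in one pass, so a later term can never match inside
    //    a link that was just inserted for an earlier term.
    $quoted = array();
    foreach (array_keys($terms) as $word) {
        $quoted[] = preg_quote($word, '/');
    }
    $seen = array();
    $text = preg_replace_callback(
        '/\b(?:' . implode('|', $quoted) . ')\b/',
        function ($m) use ($terms, &$seen) {
            $word = $m[0];
            if (isset($seen[$word])) {
                return $word; // already linked once, leave later ones alone
            }
            $seen[$word] = true;
            return '<a href="' . $terms[$word] . '">' . $word . '</a>';
        },
        $text
    );

    // 3. Put the protected links back.
    return str_replace($placeholders, $contents, $text);
}

$text = 'The dog is a form of the gray wolf. The term '
      . '<a href="http://doggie.com">dog</a> is used for pets.';
$terms = array(
    'dog'  => 'http://example.com/dog',
    'wolf' => 'http://example.com/wolf',
);
echo autolink_unlinked($text, $terms), PHP_EOL;
```

The single-pass callback is a design choice of this sketch, not something the quoted post prescribes. Matching is case-sensitive, and the \b anchors assume every term starts and ends with a word character.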

The function may be used to protect several different classes of thing: this helps simplify the task of determining appropriate regexps, because you don't need to write one regexp to do everything. It automatically makes sure that each call to the function produces different placeholders by using a static variable $n which in effect counts how many times the function has run, and using that number in the placeholders it generates.
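
As a sketch of that multi-class use, assume two made-up classes to protect (existing links and <code> spans) and a made-up sample string. Each call gets its own pair of arrays, and the placeholders differ because of $n. One point the post does not spell out: when a later call can swallow placeholders left by an earlier call, as happens here with a link inside a <code> span, restoring in the reverse order of protection is the safe choice.

```php
<?php
// Function from the post above (reformatted, behavior unchanged).
function protect_from_borking($find, &$contents, &$placeholders, &$text)
{
    static $n = 0;
    $count = 0;
    while (preg_match($find, $text, $matches)) {
        $contents[$count] = $matches[0];
        $placeholders[$count] = "@@" . $n . "_" . $count . "@@";
        $text = preg_replace($find, $placeholders[$count], $text, 1);
        ++$count;
    }
    ++$n;
}

// Hypothetical sample text: one plain link, one link nested in <code>.
$text = 'A <a href="/pet">pet dog</a>, <code><a href="/dog">dog</a></code>, and a dog.';

// Protect links first, then <code> spans (which now contain a link placeholder).
$linkText = $linkMarks = $codeText = $codeMarks = array();
protect_from_borking('#<a\b[^>]*>.*?</a>#is', $linkText, $linkMarks, $text);
protect_from_borking('#<code>.*?</code>#is', $codeText, $codeMarks, $text);

// Work on what is left: link the first remaining "dog".
$text = preg_replace('/\bdog\b/', '<a href="http://example.com/dog">dog</a>', $text, 1);

// Restore in reverse order: <code> spans first, so the link placeholders
// they contain are back in $text before the links are restored.
$text = str_replace($codeMarks, $codeText, $text);
$text = str_replace($linkMarks, $linkText, $text);

echo $text, PHP_EOL;
```

Restoring in the other order would leave the link placeholder inside the <code> span unreplaced, because it is not present in $text until the <code> content has been put back.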
cuberat
« Reply #2 on: Feb 12, 2011, 09:51:36 am »

Different approach ...

Code (PHP):
<?php
$string='The dog is a domesticated form of the gray wolf, a member of
the Canidae family of the order Carnivora. The term <a href="http://doggie.com">"dog"</a> is used for
both feral and <a href="http://pet.com">pet</a> varieties. The domestic dog has been the most
widely kept working, hunting and companion animal in human history.';

echo 'Original string'.PHP_EOL.$string.PHP_EOL;

// Remove all but the link tags
$stripped=strip_tags($string,'<a>');

// Extract all the links: $matches[0] holds the full tags, $matches[1] the link text
$matches=array();
$links=preg_match_all('#<a[^>]+>([^<]*)</a>#s',$stripped,$matches);

// Array with links for new terms
$link_array=array('"dog"'=>'http://doggie2.com','cat'=>'http://kitty.com','pet'=>'http://pet.com','companion'=>'http://buddy.com');

$patterns=array();
$replacements=array();

// If there are links
if ($links)
{
       // Go through all the terms.  Reuse the original URL for a term that was already linked, otherwise use the new URL
       foreach ($link_array as $k => $v)
       {
               if (($i=array_search($k,$matches[1]))!==false)
                       $link=preg_replace('/<a href=[\'\"]([^>^\'^\"]*)[\'\"]*[^>]+>(.*)$/','$1',$matches[0][$i]);
               else
                       $link=$v;
               // Pass the delimiter to preg_quote so a "/" in a term is escaped too
               $patterns[]='/\W+'.preg_quote($k,'/').'\W+/';
               $replacements[]=' <a href="'.$link.'">'.$k.'</a> ';
       }
       // Remove all tags
       $stripped=strip_tags($string);

       $new=preg_replace($patterns,$replacements,$stripped);
       echo 'Updated string'.PHP_EOL.$new.PHP_EOL;
}
else
       echo 'No links'.PHP_EOL;

Output

Original string
Code (HTML):
The dog is a domesticated form of the gray wolf, a member of
the Canidae family of the order Carnivora. The term <a href="http://doggie.com">"dog"</a> is used for
both feral and <a href="http://pet.com">pet</a> varieties. The domestic dog has been the most
widely kept working, hunting and companion animal in human history.

Updated string
Code (HTML):
The dog is a domesticated form of the gray wolf, a member of
the Canidae family of the order Carnivora. The term <a href="http://doggie.com">"dog"</a> is used for
both feral and <a href="http://pet.com">pet</a> varieties. The domestic dog has been the most
widely kept working, hunting and <a href="http://buddy.com">companion</a> animal in human history.
Max
« Reply #3 on: Feb 13, 2011, 08:32:51 am »

Did that work right, or did I miss something? It looks like the word "dog" didn't get linked in the updated string. Or maybe I didn't understand what Cruise wanted. If I understood him right, he wanted to link all the listed terms that weren't already linked.

Original string
Code (HTML):
The dog is a domesticated form of the gray wolf, a member of
the Canidae family of the order Carnivora. The term <a href="http://doggie.com">"dog"</a> is used for
both feral and <a href="http://pet.com">pet</a> varieties. The domestic dog has been the most
widely kept working, hunting and companion animal in human history.

Updated string
Code (HTML):
The dog is a domesticated form of the gray wolf, a member of
the Canidae family of the order Carnivora. The term <a href="http://doggie.com">"dog"</a> is used for
both feral and <a href="http://pet.com">pet</a> varieties. The domestic dog has been the most
widely kept working, hunting and <a href="http://buddy.com">companion</a> animal in human history.
cuberat
« Reply #4 on: Feb 13, 2011, 08:46:11 am »

The first dog didn't get auto-linked because the list entry includes the quotes ("dog") and that word in the text doesn't have them. The "dog" did not get updated because it already had a link.

I didn't test this as carefully as I could have. Concerns/issues/notes:

It doesn't keep track of whether it has already placed a link for a term, and preg_replace replaces every match by default, so a term that appears more than once would be linked each time unless a limit is passed.

All the other tags are removed, which may be a good thing - or not.

It didn't replace the 'cat' in 'domesticated', which is good; that's because of the 'non-word' (\W+) delimiters.
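
A side effect of \W+ delimiters is worth noting: they consume the characters around the term, so with a replacement wrapped in spaces, "wolf," would lose its comma, and a term at the very start or end of the string would not match at all because \W+ needs at least one character. One possible variation, sketched here with a made-up helper name and example.com URLs, uses lookarounds, which check the neighbouring characters without consuming them:

```php
<?php
// Hypothetical helper: link the first whole-word occurrence of $word.
function link_first_word($text, $word, $url)
{
    // (?<!\w) and (?!\w) assert "no word character here" without consuming
    // anything, so surrounding spaces and punctuation are left untouched.
    $pattern = '/(?<!\w)' . preg_quote($word, '/') . '(?!\w)/';
    $link = '<a href="' . $url . '">' . $word . '</a>';

    // A callback returns $link literally, so "$" or "\" in it is not
    // treated as a backreference. The limit of 1 links only the first match.
    return preg_replace_callback(
        $pattern,
        function () use ($link) {
            return $link;
        },
        $text,
        1
    );
}

$text = 'The gray wolf, a member of the Canidae family.';
echo link_first_word($text, 'wolf', 'http://example.com/wolf'), PHP_EOL;
```

Unlike \b, these lookarounds also work for a term such as '"dog"' that begins and ends with a non-word character.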

cuberat
« Reply #5 on: Feb 13, 2011, 05:48:17 pm »

Minor updates - added the limit parameter on the preg_replace to ensure only the first instance is replaced.  Removed the test for links, which was unnecessary.

Code (PHP):
<?php
$string='The dog is a domesticated form of the gray wolf, a member of
the Canidae family of the order Carnivora. The term <a href="http://doggie.com">"dog"</a> is used for
both feral and <a href="http://pet.com">pet</a> varieties. The domestic dog has been the most
widely kept working, hunting and companion animal in human history.';

echo 'Original string'.PHP_EOL.$string.PHP_EOL;

// Remove all but the link tags
$stripped=strip_tags($string,'<a>');

// Extract all the links: $matches[0] holds the full tags, $matches[1] the link text
$matches=array();
$links=preg_match_all('#<a[^>]+>([^<]*)</a>#s',$stripped,$matches);

// Array with links for new terms
$link_array=array('dog'=>'http://puppy.com','"dog"'=>'http://doggie2.com','cat'=>'http://kitty.com','pet'=>'http://pet.com','companion'=>'http://buddy.com');

$patterns=array();
$replacements=array();

// Go through all the terms.  Reuse the original URL for a term that was already linked, otherwise use the new URL
foreach ($link_array as $k => $v)
{
       if (($i=array_search($k,$matches[1]))!==false)
               $link=preg_replace('/<a href=[\'\"]([^>^\'^\"]*)[\'\"]*[^>]+>(.*)$/','$1',$matches[0][$i]);
       else
               $link=$v;
       // Pass the delimiter to preg_quote so a "/" in a term is escaped too
       $patterns[]='/\W+'.preg_quote($k,'/').'\W+/';
       $replacements[]=' <a href="'.$link.'">'.$k.'</a> ';
}
// Remove all tags
$stripped=strip_tags($string);

// Limit of 1: replace only the first match of each pattern
$new=preg_replace($patterns,$replacements,$stripped,1);
echo 'Updated string'.PHP_EOL.$new.PHP_EOL;


Original string
Code (HTML):
The dog is a domesticated form of the gray wolf, a member of
the Canidae family of the order Carnivora. The term <a href="http://doggie.com">"dog"</a> is used for
both feral and <a href="http://pet.com">pet</a> varieties. The domestic dog has been the most
widely kept working, hunting and companion animal in human history.

Updated string
Code (HTML):
The <a href="http://puppy.com">dog</a> is a domesticated form of the gray wolf, a member of
the Canidae family of the order Carnivora. The term <a href="http://doggie.com">"dog"</a> is used for
both feral and <a href="http://pet.com">pet</a> varieties. The domestic dog has been the most
widely kept working, hunting and <a href="http://buddy.com">companion</a> animal in human history.
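
Two details of preg_replace with array patterns matter for this approach, shown here on toy strings that are not from the thread: the limit counts replacements per pattern rather than in total, and each pattern runs on the result of the previous one, so a later term could in principle match inside a link inserted for an earlier term.

```php
<?php
$patterns     = array('/a/', '/b/');
$replacements = array('x', 'y');

// Default limit is -1, meaning every match of every pattern is replaced.
$all = preg_replace($patterns, $replacements, 'aabb');

// A limit of 1 allows one replacement per pattern, not one overall.
$first = preg_replace($patterns, $replacements, 'aabb', 1);

// Patterns are applied in order, each on the output of the previous one.
$chained = preg_replace(array('/cat/', '/dog/'), array('dog', 'bird'), 'cat');

echo $all, PHP_EOL;     // xxyy
echo $first, PHP_EOL;   // xayb
echo $chained, PHP_EOL; // bird
```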