Acmlm's Board - I2 Archive - Programming - Grabbing URL's with PHP?
windwaker Ball and Chain Trooper WHY ALL THE MAYONNAISE HATE Level: 61 Posts: 1562/1797 EXP: 1860597 For next: 15999 Since: 03-15-04 Since last post: 4 days Last activity: 6 days |
I'm trying to build something in PHP that'll go to another site of mine; however, I need to be able to grab URLs from the page it's viewing (because I need it to differentiate between images and links). Any ideas?
Ramsus Octoballoon Level: 19 Posts: 64/162 EXP: 34651 For next: 1126 Since: 01-24-05 From: United States Since last post: 39 days Last activity: 71 days |
Couldn't you just use a regex like /<a href="(http:\/\/.*?)">/ ?

EDIT: In case you're not familiar with using regular expressions in PHP, you'd use the following code with a buffer full of HTML (in this case, $buffer) to get an array of URLs:

<?php
preg_match_all("/<a href=\"(http:\/\/.*?)\">(.*?)<\/a>/", $buffer, $links);

// $links[0] is an array filled with all of the anchor tags
// $links[1] is an array filled just with the URLs from those tags
// $links[2] is an array filled with the names of the links
foreach ($links[1] as $link) {
    echo "URL: $link \n";
}
?>

(edited by Ramsus on 05-20-05 08:15 PM) (edited by Ramsus on 05-20-05 08:19 PM)
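Since the original question was about telling images apart from links, here is a minimal sketch of how the same preg_match_all idea might be stretched to cover both. The sample $buffer, the example.com URLs, and the variable names are made up for illustration, and the patterns assume double-quoted attributes and http:// URLs only, just like the regex above.

```php
<?php
// Hypothetical sample page; in practice $buffer would hold the fetched HTML.
$buffer = '<p><a href="http://example.com/page.html">A page</a> '
        . '<img src="http://example.com/pic.png" alt="pic"> '
        . '<a href="http://example.com/other.html">Other</a></p>';

// Anchor targets: capture the href value of each <a> tag.
preg_match_all('#<a\s[^>]*href="(http://[^"]*)"#i', $buffer, $anchorMatches);

// Image sources: capture the src value of each <img> tag.
preg_match_all('#<img\s[^>]*src="(http://[^"]*)"#i', $buffer, $imageMatches);

// Index 1 holds the first capture group, i.e. just the URLs.
$links  = $anchorMatches[1];
$images = $imageMatches[1];

foreach ($links as $url) {
    echo "LINK:  $url\n";
}
foreach ($images as $url) {
    echo "IMAGE: $url\n";
}
```

Keeping the two patterns separate means each URL already comes labelled by the kind of tag it was found in, so no second pass is needed to sort them.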
windwaker Ball and Chain Trooper WHY ALL THE MAYONNAISE HATE Level: 61 Posts: 1575/1797 EXP: 1860597 For next: 15999 Since: 03-15-04 Since last post: 4 days Last activity: 6 days |
Ah, I see. I'd used regular expressions before, but I'd never really written them on my own; where'd you learn to do that?
Ramsus Octoballoon Level: 19 Posts: 70/162 EXP: 34651 For next: 1126 Since: 01-24-05 From: United States Since last post: 39 days Last activity: 71 days |
I read a man page one day (a few years ago, I think) and played around with it. Just check out the perlre manual page. It's definitely a must if you do web development, since it is the absolute easiest tool to filter and secure user input with. Perl in taint mode even requires you to use regexes on all input and external variables before you can use them.
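As a rough illustration of the "filter user input" point, the sketch below checks a value against a whitelist pattern before it gets used anywhere. The function name is_valid_username and the 3 to 16 character rule are invented for this example, not taken from any library.

```php
<?php
// Hypothetical helper: accept only 3-16 letters, digits, or underscores.
function is_valid_username($input)
{
    // \A and \z anchor the pattern to the entire string, so stray
    // characters (including a trailing newline) cause a rejection.
    return preg_match('/\A[A-Za-z0-9_]{3,16}\z/', $input) === 1;
}

$okName  = is_valid_username('windwaker');
$badName = is_valid_username('bad name<script>');

var_dump($okName);
var_dump($badName);
```

Describing what is allowed, rather than trying to list everything that is forbidden, tends to be the safer direction for this kind of check.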
windwaker Ball and Chain Trooper WHY ALL THE MAYONNAISE HATE Level: 61 Posts: 1577/1797 EXP: 1860597 For next: 15999 Since: 03-15-04 Since last post: 4 days Last activity: 6 days |
:O Thanks tons man! This's what I've been looking for, for quite a while.
kode54 Level: 4 Posts: 5/7 EXP: 246 For next: 33 Since: 05-09-05 Since last post: 154 days Last activity: 133 days |
Here's a regular expression I kind of borrowed from Invision Power Board months ago, and later modified. As you can see, it uses the case-insensitive (i) modifier and also PHP's extended execute (e) modifier, which treats the replacement string as a piece of code to execute instead of merely a replacement. I think you can still catch the echoed output by assigning the return value to an array, but this works as well:

$urls = array();
preg_replace('#(^|\s|"|'."'".')((http|https|news|ftp)://\w+[^\s\(\)\[\]"'."'".']+)#ie', '\$urls[] = "\2"', $input_text);

Token 2 (the complete link) in every match will be pushed into $urls. Token 1 is only there so the original code could preserve the preceding whitespace, but I added various quotation marks since I encountered various IRC logs where clients or servers added the quotes or other characters.

It is probably not a good idea to run this expression over the same data again and again; better to process the data once and record somewhere that you processed it. Well, since you're processing a web page, that should mean less complexity than what I was doing. (Thousands of lines of IRC logs, all processed from a MySQL server, every time the page is loaded... No I won't demonstrate.)

Also, if you know what you are doing, and you will always be processing properly formed X/HTML content, it may be more secure to parse the pages with the XML extension and locate all anchor tags. Then worry about the regex if/when you need fulltext scanning.
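For anyone trying this on a current PHP: the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so the call above will not run there. A minimal sketch of collecting the same token 2 without executing anything, using preg_match_all with the same pattern minus the e flag, might look like this. The sample $input_text is made up for illustration.

```php
<?php
// Hypothetical input standing in for a chunk of log or page text.
$input_text = 'https://example.net/x?y=1 see "http://example.com/a" '
            . 'and ftp://files.example.org/pub (mirror)';

// Same expression as in the post, without the "e" modifier.
// Group 1: start of string, whitespace, or a quote before the link.
// Group 2: the complete link.
$pattern = '#(^|\s|"|\')((http|https|news|ftp)://\w+[^\s\(\)\[\]"\']+)#i';

preg_match_all($pattern, $input_text, $matches);

// $matches[2] holds token 2 (the complete link) from every match.
$urls = $matches[2];

print_r($urls);
```

Because nothing from the input is ever evaluated as code here, the question of whether a crafted link could break out of the replacement string does not come up.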
HyperLamer <||bass> and this was the soloution i thought of that was guarinteed to piss off the greatest amount of people Sesshomaru Tamaranian Level: 118 Posts: 4702/8210 EXP: 18171887 For next: 211027 Since: 03-15-04 From: Canada, w00t! LOL FAD Since last post: 2 hours Last activity: 2 hours |
Treating user input as code could lead to some nasty security flaws though. You'd need to make sure you sealed up any possible holes.
kode54 Level: 4 Posts: 6/7 EXP: 246 For next: 33 Since: 05-09-05 Since last post: 154 days Last activity: 133 days |
That does not treat user input as code. It merely processes a token of the data which the expression finds using its own code. I don't think it's vulnerable to double-quotes faking out the processor code either. Even if that were possible, the expression cuts off at the first single or double quote character. There may yet be a vulnerability somewhere in there, but since IPB is using almost the same code, I presume it to be safe.

Just to clarify the preg_replace "e" flag: it specifies that your replacement string, which in this case is a string constant, is a piece of PHP code to be executed once per match. I will have to look it up again, as I am not sure whether that code can "echo" or otherwise write to standard output and have that output piped in as the replacement text, which eventually ends up in the return value of preg_replace(). My example simply ignores the return value and stores the data in an array created outside of the function. In fact, you could do as my code does and go on to foreach() over an array of strings, or while() over a SQL result set.

As I said, regex isn't the only way. You may want to experiment with the XML extension; it may prove to be faster for handling just anchors and/or img tags than scanning the raw text with a regular expression.
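A rough sketch of the parser route, for comparison. It assumes the DOM extension (DOMDocument) is what is available, which is one possible reading of "the XML extension" and not necessarily what was meant in the post. The sample $html is invented for the example.

```php
<?php
// Hypothetical sample page; in practice $html would be the fetched page.
$html = '<html><body>'
      . '<a href="http://example.com/page.html">A page</a>'
      . '<img src="http://example.com/pic.png" alt="pic">'
      . '<a name="top">No href here</a>'
      . '</body></html>';

// Real pages are often malformed; collect parser warnings quietly
// instead of letting them be printed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Every <a> element that actually carries an href attribute.
$links = array();
foreach ($doc->getElementsByTagName('a') as $anchor) {
    if ($anchor->hasAttribute('href')) {
        $links[] = $anchor->getAttribute('href');
    }
}

// Every <img> element that carries a src attribute.
$images = array();
foreach ($doc->getElementsByTagName('img') as $img) {
    if ($img->hasAttribute('src')) {
        $images[] = $img->getAttribute('src');
    }
}

print_r($links);
print_r($images);
```

Unlike the regex versions, this does not care about attribute order or quoting style, at the cost of parsing the whole document.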
windwaker Ball and Chain Trooper WHY ALL THE MAYONNAISE HATE Level: 61 Posts: 1630/1797 EXP: 1860597 For next: 15999 Since: 03-15-04 Since last post: 4 days Last activity: 6 days |
Well, while we're on the subject of PHP... How do you do something like...

$functionname = "mycustomfunction";
execute_function($functionname);

Where execute_function is a function that runs a function called mycustomfunction()?
kode54 Level: 4 Posts: 7/7 EXP: 246 For next: 33 Since: 05-09-05 Since last post: 154 days Last activity: 133 days |
Here is information on the function handling functions. call_user_func may be what you want, but create_function may also be handy for declaring the function at once, or even for generating custom function code on the fly, in the event that you find it more efficient to generate one function to be executed repeatedly.
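A small sketch of what that could look like for the question above. The body of mycustomfunction and the variable names are invented for illustration. Note that create_function was deprecated in PHP 7.2 and removed in PHP 8.0, so the last part of the sketch uses an anonymous function (available since PHP 5.3) as a stand-in for it.

```php
<?php
// Hypothetical function to be called by name.
function mycustomfunction($name = 'world')
{
    return "Hello, $name!";
}

$functionname = 'mycustomfunction';

// Option 1: a "variable function". PHP looks up a function whose
// name matches the string and calls it.
$direct = $functionname();

// Option 2: call_user_func, with any arguments passed after the name.
$viaCall = call_user_func($functionname, 'windwaker');

// Checking first avoids a fatal error if the name is wrong.
$exists = is_callable($functionname);

// A function defined on the spot, here as an anonymous function
// standing in for create_function.
$shout = function ($text) {
    return strtoupper($text);
};
$loud = call_user_func($shout, 'on the fly');

echo $direct, "\n", $viaCall, "\n", $loud, "\n";
```

If the function name ever comes from user input, checking it against a fixed list of allowed names before calling it would be a sensible extra step.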