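I am trying to find a way, using R but not Selenium (RSelenium), to scrape the table on the page 'https://www.icicidirect.com/idirectcontent/Research/TechnicalAnalysis.aspx/balancesheet/tcs'.

This is what I tried:

library(rvest)
Link = 'https://www.icicidirect.com/idirectcontent/Research/TechnicalAnalysis.aspx/balancesheet/tcs'
read_html(Link) %>% html_nodes("#Table1") %>% html_text()
## character(0)

But with this code I get an empty result. Any pointer in the right direction would be much appreciated.

1 Answer:

Answer 0 (score: 1)

The table is not in the HTML you request from the site. It is loaded dynamically by JavaScript on the page through an XHR POST request. You can find that request in the Chrome or Firefox developer tools. The good news is that, by following the same links the browser does, you can still get what you want in R:
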
library(httr)
library(rvest)
base_url <- "https://www.icicidirect.com/idirectcontent/"
url1 <- paste0(base_url, "Research/TechnicalAnalysis.aspx/balancesheet/tcs")
url2 <- paste0(base_url, "basemasterpage/ContentDataHandler.ashx?icicicode=TCS")
response_1 <- GET(url1) # This is the page you can't scrape
# Set the parameters for the POST call (found from developer tools)
parameters <- list(pgname = "BalanceSheet_NonBanking",
                   ismethodcall = 0,
                   mthname = "")
# Now post the form and we'll get our table as a response
response_2 <- POST(url2, body = parameters)
# Process it as you did before:
read_html(response_2) %>% html_nodes("#Table1") %>% html_text()
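
If a data frame is more convenient than one long text string, html_table() can be used on the node in place of html_text(). The following is only a minimal sketch: it parses a small hand-written HTML snippet as a stand-in for the real response, so the column names and values are made up for illustration and the real table on the site will look different.

```r
library(rvest)

# Hypothetical stand-in for the HTML returned by the POST request;
# the real response will have different rows and columns.
sample_html <- '<table id="Table1">
  <tr><th>Item</th><th>Value</th></tr>
  <tr><td>A</td><td>1</td></tr>
  <tr><td>B</td><td>2</td></tr>
</table>'

page <- read_html(sample_html)

# html_node() returns the first match for the selector;
# html_table() converts that <table> into a data frame
balance_sheet <- page %>% html_node("#Table1") %>% html_table()
print(balance_sheet)
```

With the real data, the same two calls should apply after read_html(response_2), assuming the response still contains an element with id "Table1".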