Ohh... I got what you need.
Try this:
html_data = """ <td colspan="3"><b>"Assemble under Siegfried!"</b>
<a href="/wiki/index.php/File:Continuous.png" class="image" title="CONT"><img alt="CONT" src="/wiki/images/thumb/7/78/Continuous.png/14px-Continuous.png" width="14" height="17" srcset="/wiki/images/thumb/7/78/Continuous.png/21px-Continuous.png 1.5x, /wiki/images/7/78/Continuous.png 2x">
</a> This unit gains +10 attack for each
<a href="/wiki/index.php/File:Black.png" class="image" title="Black"><img alt="Black" src="/wiki/images/thumb/7/71/Black.png/15px-Black.png" width="15" height="15" srcset="/wiki/images/thumb/7/71/Black.png/23px-Black.png 1.5x, /wiki/images/thumb/7/71/Black.png/30px-Black.png 2x">
</a> and
<a href="/wiki/index.php/File:White.png" class="image" title="White"><img alt="White" src="/wiki/images/thumb/8/80/White.png/15px-White.png" width="15" height="15" srcset="/wiki/images/thumb/8/80/White.png/23px-White.png 1.5x, /wiki/images/thumb/8/80/White.png/30px-White.png 2x">
</a> ally besides this unit.
</td>"""
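from bs4 import BeautifulSoup
html = BeautifulSoup(html_data, "html.parser")
# start with the bold heading text
texts = [html.find("b").get_text()]
for a in html.find_all("a"):
    # the icon name is stored in the link's title attribute
    texts.append(a.attrs.get("title"))
    # the plain text that follows the closing </a> is the link's next sibling
    texts.append(a.next_sibling.strip())
print(" ".join(texts))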
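If walking the siblings by hand feels fragile, another way is to swap every image for its alt text and let get_text() join the pieces. This is only a sketch: the shortened `sample` string and the `text_with_alt` name are made up for illustration, and I have not run it against the real wiki page.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for html_data (hypothetical sample, not the full markup)
sample = (
    '<td colspan="3"><b>"Assemble under Siegfried!"</b>'
    '<a title="CONT"><img alt="CONT" src="/wiki/images/7/78/Continuous.png"></a>'
    " This unit gains +10 attack for each "
    '<a title="Black"><img alt="Black" src="/wiki/images/thumb/7/71/Black.png/15px-Black.png"></a>'
    " ally besides this unit.</td>"
)


def text_with_alt(markup):
    """Return the visible text with each <img> replaced by its alt text."""
    soup = BeautifulSoup(markup, "html.parser")
    for img in soup.find_all("img"):
        # put the alt text where the image used to be
        img.replace_with(img.get("alt", ""))
    # join all text fragments with one space, dropping surrounding whitespace
    return soup.get_text(" ", strip=True)


print(text_with_alt(sample))
```

This reads the alt attribute of the image instead of the title of the link, so it only gives the same result when both carry the same name, as they do in your HTML.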
I'm not sure what exactly you want, but I suppose you need the attrs of the Tag.
Example:
from bs4 import BeautifulSoup
html = BeautifulSoup(html_data, "html.parser")
for a in html.find_all("a"):
    print(a.attrs.get("title"))
Output:
CONT
Black
White
If you want to download the images:
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup
cdn_url = "http://some.com/"  # root URL of the site that serves the static content
html = BeautifulSoup(html_data, "html.parser")
for img in html.find_all("img"):
    img_response = requests.get(urljoin(cdn_url, img.attrs.get("src")))
    # img_response.content holds the image bytes; save them to a file
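For the saving step, here is a minimal sketch using only the standard library. The `save_image` helper and the `images` folder name are my own assumptions, not something from your code; it names the file after the last segment of the src path.

```python
import posixpath
from pathlib import Path
from urllib.parse import urlparse


def save_image(content, src, out_dir="images"):
    """Write image bytes to out_dir, named after the last path segment of src."""
    # use only the URL path so a query string does not end up in the file name
    name = posixpath.basename(urlparse(src).path) or "image"
    target_dir = Path(out_dir)
    # create the folder if it does not exist yet
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    target.write_bytes(content)
    return target
```

Inside the loop you could then call save_image(img_response.content, img.attrs.get("src")). Note that files with the same name overwrite each other, so add a check if the site reuses names in different folders.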