We propose a large-scale outdoor multimodal dataset, the OMMO dataset, containing complex objects and scenes with calibrated images, point clouds, and prompt annotations. We also establish a new benchmark for several outdoor NeRF-based tasks, such as novel view synthesis, surface reconstruction, and multimodal NeRF. To create the dataset, we capture and collect a large number of real fly-view videos and select high-quality, high-resolution clips from them. We then design a quality review module that refines images and removes low-quality frames and scenes that fail to calibrate, combining learning-based automatic evaluation with manual review. Finally, volunteers annotate each scene and keyframe with text descriptions. Compared with existing NeRF datasets, our dataset contains abundant real-world urban and natural scenes with various scales, camera trajectories, and lighting conditions.
The pipeline for our dataset generation. The original videos are either collected from YouTube or captured by us, and are then fed into the review and annotation modules. The review module mainly removes low-quality frames and failed scenes; the annotation module annotates scenes and keyframes with text descriptions.
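To make the frame-filtering step of the review module concrete, here is a minimal sketch of automatic frame screening. It does not reproduce the learning-based evaluator described above, whose details are not given here; instead it swaps in a simple hand-crafted sharpness score (variance of a discrete Laplacian), a common blur heuristic. The function names and the default `threshold` are hypothetical, and flagged frames are assumed to go on to manual review rather than being discarded outright.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Sharpness proxy: variance of the 4-neighbour discrete Laplacian.

    `gray` is a 2-D list of grayscale intensities (rows of equal length,
    at least 3x3). Border pixels are skipped so every neighbour exists.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def review_frames(frames, threshold=100.0):
    """Split frame indices into (kept, flagged) by sharpness score.

    Hypothetical stand-in for a learned quality score: frames scoring
    below `threshold` are flagged for manual review.
    """
    kept, flagged = [], []
    for i, frame in enumerate(frames):
        score = laplacian_variance(frame)
        (kept if score >= threshold else flagged).append(i)
    return kept, flagged
```

A perfectly flat (featureless or heavily blurred) frame yields a Laplacian variance of zero and is flagged, while a frame with strong edges scores high and is kept. A learned evaluator would replace `laplacian_variance` while leaving the split-then-review structure unchanged.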
It is a night city scene with some lights. In the center of the scene is a hexagonal white building. The central building emits a yellowish light through its glass. There are some rectangular buildings around it, and some of them have white illuminated signs on top. The buildings are separated by empty roads.
a white building made of white marble; the building consists of a nave, a large minaret, and four smaller minarets on the nave; four towers surround the building; both the nave and the minaret have openwork arched windows; there is a car park on either side of the building; in front of the building is a square red open space
In the center of the image is a small three-story white building. The Xiaobailou (Little White Building) stands in the corner of a fenced basketball court. Outside the fence are green trees and grass, with a lake in the middle of the trees. In the distance are some neatly arranged buildings.
a night view of a temple with a multi-tiered loft; the eaves of the temple building are raised at the corners; the top tier has gilt copper tiles and a copper ridge; the lower tier has green glazed tiles and glazed ridge ornaments; each floor of the pavilion has observation platforms; red pillars hold up the roof; the lights set off the temple magnificently
the ground floor of the white building is supported by columns; on the top floor of the building, many carved portraits flank the windows; buildings with flat glass roofs; a glass pyramid in the center of the square; the large pyramid is surrounded by three small glass pyramids; under the glass pyramid are the exhibition space and a staircase; to the right of the building is a river
a memorial hall in the center of the square; a lawn in front of the square; trees are planted around the memorial hall and the lawn; a road in front of the lawn; various buildings around the trees; a playground behind the memorial hall
a waterfront building with a white shell-shaped roof; the glass on the building's surface is plain and topaz-colored; a white carport in front of the building with some cars parked in it; behind the building is a round island planted with many trees; a river runs in front of the building; a bridge crosses the river to the front left of the building; in the distance are some clusters of buildings
a day view of a temple with a multi-tiered loft; the temple stands on top of a hill covered with trees; the white eaves of the mountain gate; the eaves of the temple building are raised at the corners; the top tier has gilt copper tiles and a copper ridge; the lower tier has green glazed tiles and glazed ridge ornaments; each floor of the pavilion has observation platforms; red pillars hold up the roof; in the distance are many tall buildings
a huge oval black flat-topped stadium; a large lawn in the middle of the stadium; the lawn is surrounded by stepped stands; the body of the stadium is clad in glass and steel panels; a T-shaped building is on the left side of the stadium; the building is surrounded by a lawn with the words "TO DARE IS TO DO" on it; in the background are residential buildings and trees
a building surrounded by red city walls; the lower part of the building is a nine-story square structure with windows on both sides of the front, a balcony with golden curtains in the middle, and windows on the other three sides; the middle part of the building consists of a white-walled layer with windows and a black-walled layer with balconies; the top of the building has a golden roof with upturned eaves, surrounded by golden columns; in front of the building are two gates with raised eaves; the building is surrounded by trees
a sign composed of several columns on the square; an irregular pool below the sign; trees planted at equal intervals along the side of the square; bicycles and other non-motorized vehicles parked along the trees; parallel rows of low shrubs on the square; three pedestrians are walking along the street; three children are riding kick scooters in the square; people are standing in the square
a variety of buildings in the background; a T-shaped, 10-story white residential building in the middle; seven 6-story residential buildings beside the white building; low old houses of different heights surround the white building; cars are parked among the houses; an ancient tower beside the complex; a road separates the tower from the building complex
The scene consists of many small islands, with the sea in the distance. The islets are separated by regularly spaced rivers. Both the rivers and the sea are blue. There are dense buildings on the islands, connected by roads. There are many trees and much grass around the buildings.
a building with a cubic plinth among many buildings; the edges of the building are chamfered; the facades are composed of slender isosceles-triangle glass panels; the top of the building has a square plan; the spire is a hybrid structure consisting of an antenna and a three-tiered platform ring; the antenna is wrapped in a radome, and the geometry of the radome housing is based on a repetitive modular system; the building is taller than all the buildings around it
a tall cylindrical residential building in the middle; brown residential buildings beside it; dense low buildings beside the brown residential buildings; a wide road on one side of the buildings; an overpass over the road; cars are passing on the road and the overpass; a river flows under the overpass
a church with spires of different heights under construction; trees are arranged in a rectangle in front of the church; dense residential buildings around the church; cars are passing by the church
buildings at night in the background; skyscrapers in the middle; roads among the buildings; cars are passing on the roads
rolling mountains in the background; low residential buildings surrounded by mountains; two rivers crossing the city; cars cross the rivers on bridges
In the center of the scene is a small white cylindrical building. The roof of the building is a gray cone. There is a white windmill with four blades beside the building. The scene is surrounded by a vast expanse of green and yellow grass. There are a few groves scattered across the grass.
mountains with granite formations on top; the summit is covered with meadows, lichens, and mosses; the hillside valley is lush with a large number of pine trees; stone buildings with all-stone walls on top of the hill; winding stone paths beside the steep rocks; the boulders at the top are irregularly rounded with steep slopes; in the distance are continuous mountains
an irregularly shaped tree beside the clearing; an electric bike next to the tree; two adults and a child on the lawn; an iron plate near the tree; a path also runs beside the tree; in the distance are square lawns and a residential area; cars driving on a distant street
a flagstone road crosses the grassland; three families are having a picnic on the grassland; an open area below the grassy slope; a woman and a child are playing badminton in the open area; low vegetation is sparsely distributed over the grassland; a pavilion between the flagstone road and the open area; two paths connect the open area and the flagstone road
In the center of the scene is a dead tree branch. There is a gray bird's nest on the branch. The scene is surrounded by dense green trees. There is a narrow road in the center of the scene. There is a white platform next to the nest.
a wide expanse of rocks in the background; a huge stepped pit in the middle; a small lake at the bottom of the pit; snow covers the rocks; a construction site at the top of the pit
Benchmark for novel view synthesis. We report the performance of six representative state-of-the-art methods on our dataset. ↑ means higher is better.
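The caption marks higher-is-better metrics with ↑ but does not name them here. As an illustration only, assuming PSNR (peak signal-to-noise ratio) is one such metric, as is common in novel view synthesis benchmarks, the sketch below computes it for a rendered view against the held-out ground-truth view. Images are flattened to pixel lists with values in `[0, max_val]`; the function name is hypothetical.

```python
import math


def psnr(pred, target, max_val=1.0):
    """PSNR in dB between two equal-length lists of pixel values.

    PSNR = 10 * log10(max_val**2 / MSE); higher is better, since a
    smaller mean squared error gives a larger ratio.
    """
    if not pred or len(pred) != len(target):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_val ** 2 / mse)
```

For example, a uniform error of 0.1 on images in `[0, 1]` gives an MSE of 0.01 and a PSNR of 20 dB.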
More sub-benchmarks for novel view synthesis. We divide our dataset into subsets by scene type, camera trajectory, and lighting condition, and provide a sub-benchmark for each setting. ↑ means higher is better.
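The subset split can be sketched as a group-by over per-scene results: each evaluation record is tagged with its scene's attributes, and a metric is averaged per method within each attribute value. This is a minimal illustration under assumptions; the record layout and the keys `scene_type`, `trajectory`, and `lighting` are hypothetical, and averaging per scene is one plausible aggregation rather than necessarily the one used for the tables.

```python
from collections import defaultdict
from statistics import mean


def sub_benchmark(records, attribute, metric="psnr"):
    """Average `metric` per method within each value of `attribute`.

    `records` holds one dict per (method, scene) evaluation, e.g.
    {"method": ..., "scene_type": ..., "trajectory": ...,
     "lighting": ..., "psnr": ...}  (hypothetical layout).
    Returns {attribute_value: {method: mean_metric}}.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for r in records:
        groups[r[attribute]][r["method"]].append(r[metric])
    return {
        value: {method: mean(scores) for method, scores in by_method.items()}
        for value, by_method in groups.items()
    }
```

Calling `sub_benchmark(records, "lighting")` and `sub_benchmark(records, "trajectory")` on the same records yields the different sub-benchmark views without re-running any evaluation.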